PROJECTS (Not updated frequently; please see the publications page or the Personal Robotics group page.)


Personal Robots learning from online 3D data about human usage of objects

By observing online 3D data, such as the models on Google 3D Warehouse, robots learn how humans use objects and environments. Applied to robotic arrangement of objects.

Selected Papers: ICML'12, ISER'12.
Students: Yun Jiang, Marcus Lim, Alejandro Perez.
Research/Code/Data: project webpage

Personal Robots: Learning Object Placements

Learning algorithms to predict robotic placements, even for objects of types never seen before by the robot. Applied to tasks such as arranging a cluttered room, loading items onto a dish-rack, or putting items in a fridge, etc.

Selected Papers: ICRA'12, IJRR'12.
Students: Yun Jiang, Marcus Lim, Changxi Zheng.
Research/Code/Data: project webpage
Popular Press: Newswise, Zee News, News Tonight, ACM Technews, Communications of the ACM, UPI, NDTV, CBS WBNG Action News.

Personal Robots: Learning Human Activities

Learning algorithms to predict human activities. In order for personal robots to be useful to humans, they first need to understand human activities using image and depth data.

Selected Papers: ICRA'12.
Students: Jae Y. Sung, Colin Ponce, Hema Koppula.
Research/Code/Data: project webpage
Popular Press: R&D magazine, Gizmag.

Personal Robots: 3D Scene Understanding

Learning algorithms to understand the 3D structure of scenes.

Selected Papers: NIPS'11a, TPAMI'12, NIPS'11b.
Students: Abhishek Anand, Hema Koppula, Congcong Li.
Research/Code/Data: project webpage
Popular Press: New Scientist (July 21)

Make3D: Single Image Depth Perception

Learning algorithms to predict depth and infer 3-d models, given just a single still image. Applications include creating immersive 3-d experiences from users' photos, improving the performance of stereovision, creating large-scale models from a few images, and robot navigation. Tens of thousands of users have converted their single photographs into 3D models.

 

Personal Robots: Learning Robotic Grasps

Learning algorithms to predict robotic grasps, even for objects of types never seen before by the robot. Applied to tasks such as unloading items from a dishwasher, clearing up a cluttered table, opening new doors, etc.

Selected Papers: NIPS'06, IJRR'08, AAAI'08a, AAAI'08b, ICRA'11, ICRA'12.
Research/Code/Data: Dishwasher, Cluttered (Barrett), Opening New Doors.
Details: STAIR, Personal Robotics.
Popular Press: New York Times, Wired Magazine, NBC, ABC, BBC, CBS WBNG Action News.

 

Holistic Scene Understanding: Combining Models as Black-boxes using Cascades

Holistic scene understanding requires solving several tasks simultaneously, including object detection, scene categorization, labeling of meaningful regions, and 3-d reconstruction. We develop a learning method that couples these individual sub-tasks for improving performance in each of them.

Selected Papers: NIPS'08, ECCV-workshop'10, NIPS'10, NIPS'11, TPAMI'12.
Related: Make3D.
Popular Press: New Scientist (May 24, 2011).

 

Visual Navigation: Miniature Aerial Vehicles

Use monocular depth perception and reinforcement learning techniques to drive a small rc-car at high speeds in unstructured environments. Also fly indoor helicopters/quadrotors autonomously using a single onboard camera.

Selected Papers: ICML'05, IJCV'07, IROS 2009, UAI'10, ICRA 2011.
Research/Code/Data: Car, MAVs.
Video: Youtube.
Popular Press: KTVU news, New Scientist.

STAIR: Opening New Doors

For robots to be practically deployed in home and office environments, they should be able to manipulate their environment to gain access to new spaces. We present learning algorithms to do so, thus making our robot the first one able to navigate anywhere in a new building by opening doors and elevators, even ones it has never seen before.

Selected Papers: RSS Manipulation workshop'08, IROS 2010.
Research/Code/Data: Opening New Doors.
Details: STAIR, Personal Robotics.

 

Sound Location from Single Microphone

The ability to perform monaural (single-ear) localization is important to many animals; indeed, monaural cues are also the primary method by which humans decide if a sound comes from the front or back, as well as estimate its elevation. In this paper, we propose a machine learning approach to monaural localization, using only a single microphone and an "artificial pinna" (that distorts sound in a direction-dependent way).

Selected Papers: ICRA 2009a.
Details: Project page.

 

STAIR: Optical Proximity Sensors

We propose novel optical proximity sensors for improving grasping. These sensors, mounted on fingertips, allow pre-touch pose estimation, and therefore allow for online grasp adjustments to an initial grasp point without the need for premature object contact or regrasping strategies.

Selected Papers: ICRA'09b.
Details: Project page, Manipulation group.

 

Zunavision

We developed algorithms to automatically modify videos by adding textures to them. Our algorithms perform robust tracking, occlusion inference, and color correction to make the texture look like part of the original scene.

Details: Project page

 

3D Object Orientation from an Image

Orientation learning is a difficult problem because the space of orientations is non-Euclidean, and in some cases (such as quaternions) the representation is ambiguous, in that multiple representations exist for the same physical orientation. Learning is further complicated by the fact that most man-made objects exhibit symmetry, so that there are multiple "correct" orientations. In this paper, we propose a new representation for orientations---and a class of learning and inference algorithms using this representation---that allows us to learn orientations for symmetric or asymmetric objects as a function of a single image.
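
The quaternion ambiguity mentioned above can be checked numerically: a unit quaternion q and its negation -q map to the same rotation matrix. The sketch below only illustrates that ambiguity; it is not the representation proposed in the paper. The function name quat_to_matrix and the (w, x, y, z) component order are assumptions made for this example.

```python
import math


def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix.

    Illustrative helper (hypothetical name); assumes q has unit norm.
    """
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


# A 90-degree rotation about the z-axis, written as a unit quaternion.
theta = math.pi / 2
q = (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))

# Negating every component gives a different 4-vector ...
neg_q = tuple(-c for c in q)

# ... but every matrix entry is built from products of two components,
# so the signs cancel and the physical orientation is identical.
same_rotation = quat_to_matrix(q) == quat_to_matrix(neg_q)
print("q != -q:", q != neg_q, "| same rotation:", same_rotation)
```

A learner that regresses directly onto quaternion components would therefore see two different targets for one physical orientation, which is one reason a different representation is attractive.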

Selected Papers: ICRA 2009.

 
 

Visual Navigation: High speed obstacle avoidance

Use monocular depth perception and reinforcement learning techniques to drive a small rc-car at high speeds in unstructured environments.

Selected Papers: ICML'05, IJCV'07.
Research/Code/Data: here.
Video: Youtube.

 

Make3D extension: Large Scale Models from Sparse View

Create 3-d models of large environments, given only a small number of (possibly) non-overlapping images. This technique integrates Structure from Motion (SFM) techniques with Make3D's single image depth perception algorithms.

Selected Papers: IJCAI'07, ICCV-VRML'07, AAAI-Nectar'08, IEEE-PAMI.
Research/Code/Results: here.

 

Improving Stereovision using monocular cues

Stereovision is fundamentally limited by the baseline distance between the two cameras; i.e., the depth estimates tend to be inaccurate when the distances considered are large. We believe that monocular visual cues give largely orthogonal, and therefore complementary, types of information about depth. We propose a method that combines monocular cues with stereo (triangulation) cues to obtain significantly more accurate depth estimates than is possible with either alone.
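
The baseline limitation can be made concrete with the standard triangulation relation Z = f * B / d (focal length f in pixels, baseline B, disparity d). A first-order error analysis gives dZ ~= Z^2 / (f * B) * dd, so a fixed disparity error costs quadratically more depth accuracy as distance grows. The sketch below uses this textbook pinhole-stereo model only; the camera values (f = 800 px, B = 0.25 m, 0.5 px disparity error) and the function names are hypothetical and are not taken from the papers.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from triangulation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_err_px):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Hypothetical camera parameters, chosen only for illustration.
FOCAL_PX = 800.0
BASELINE_M = 0.25
DISPARITY_ERR_PX = 0.5

for depth_m in (10.0, 20.0, 40.0):
    err = depth_error(FOCAL_PX, BASELINE_M, depth_m, DISPARITY_ERR_PX)
    print(f"depth {depth_m:5.1f} m -> approx. error {err:.2f} m")
```

Under these assumed values, quadrupling the distance multiplies the approximate depth error by sixteen, which is the regime where cues that do not depend on the baseline become most useful.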

Selected Papers: IJCAI'07, IJCV'07.
Research/Code/Data: here.

 

6-D wireless sourceless mouse

This device uses accelerometers and gyrometers to estimate its 3-d location and 3-d orientation. This device can be used, for example, to conveniently navigate in a 3-d virtual world.

Selected Papers: LNCS-KES'05.
Research page: here.
Video: wmv.

 

Noise tolerant Locally Linear Isomaps

Isomaps (for non-linear dimensionality reduction) suffer from the problem of short-circuiting, which occurs when the neighborhood distance is larger than the distance between the folds in the manifold. We proposed a new variant of the Isomap algorithm based on local linear properties of manifolds to increase its robustness to short-circuiting.
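
Short-circuiting is easy to reproduce on a toy example. The sketch below builds the epsilon-neighborhood graph that standard Isomap uses to approximate geodesic distances, on a U-shaped curve whose two arms are 2 units apart. It illustrates the failure mode only and is not the noise-tolerant variant proposed in the paper; the point layout, the eps values, and the function name geodesic_distance are assumptions made for this example.

```python
import heapq
import math


def geodesic_distance(points, eps, src, dst):
    """Shortest-path length from src to dst in the eps-neighborhood graph.

    Two points are connected when their Euclidean distance is <= eps;
    Dijkstra's algorithm then gives the graph approximation of the geodesic.
    """
    dist = [math.inf] * len(points)
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == dst:
            return d
        for v, p in enumerate(points):
            w = math.dist(points[u], p)
            if v != u and w <= eps and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return math.inf


# A U-shaped 1-d manifold: bottom arm, a connector, then the top arm.
bottom = [(float(i), 0.0) for i in range(11)]
connector = [(10.0, 1.0)]
top = [(float(i), 2.0) for i in range(10, -1, -1)]
points = bottom + connector + top
start, end = 0, len(points) - 1  # (0, 0) and (0, 2): the two open ends

# eps smaller than the gap between the arms: the path follows the curve.
along_manifold = geodesic_distance(points, 1.5, start, end)
# eps larger than the gap: an edge jumps straight across the fold.
short_circuited = geodesic_distance(points, 2.5, start, end)
print(f"eps=1.5: {along_manifold:.2f}  eps=2.5: {short_circuited:.2f}")
```

With the larger neighborhood, the two ends of the curve appear to be neighbors, so any embedding computed from these distances would fold the manifold onto itself.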

Selected Papers: LNCS-ICONIP'05.

Data-driven Robotics

The question of what data is available to learn from is at the heart of all learning algorithms---often even an inferior learning algorithm will outperform a superior one if it is given more data to learn from. We proposed a novel and practical solution to the dataset collection problem: we first use a green screen to rapidly collect data and then use a probabilistic model to rapidly synthesize a much larger training set. We used this data to build reliable classifiers for our robots.

Selected Papers: AAAI'08.
Research/Code/Data: here.
Video: coming soon.

 
 

Expression/Gesture Recognition

Infer facial expressions (e.g., smile, surprise, disgust) given an image of a face. This algorithm builds a sparse geometric model of the face, and uses the parameters of the geometric model as features in a learning algorithm. Reasonably robust to partial occlusions. In a similar project, we use a web camera to track the hand and to infer hand gestures for controlling a simple computer GUI. (No other equipment, such as gloves, was needed.)

Selected Papers: ICONIP'04.

 

Converting insulator polystyrene to moderately conducting polymer

We described a simple, bioinspired approach for the conversion of an insulator, polystyrene, to a moderately conducting polymer by introducing adenine nucleobases.

Selected Papers: Chemistry Letters'04.

 

ELifebelt: Wristworn device to save a person from electric shock

We developed an electronic device that, when worn as a wrist-watch, protects the person from electric shocks. It monitors the skin potentials continuously and trips the power circuit wirelessly to save the person's life.

Selected Papers: NPSC'04, Extended version.

 

Other projects

See publications page for more. E.g., speech recognition, etc.