Personal Robots learning from online 3D data about human usage of objects
By observing online 3D data, such as models in the Google 3D Warehouse, robots learn how humans use objects and
environments. Applied to robotic arrangement of objects.
Learning algorithms to predict robotic placements, even for objects of types never seen
before by the robot. Applied to tasks such as arranging a cluttered room, loading items onto
a dish-rack, or putting items in a fridge, etc.
Selected Papers: ICRA'12, IJRR'12.
Students: Yun Jiang, Marcus Lim, Changxi Zheng.
Research/Code/Data: project webpage
Popular Press: Newswise, Zee News, News Tonight, ACM Technews, Communications of the ACM, UPI, NDTV, CBS WBNG Action News.
Learning algorithms to predict human activities. In order for personal robots to be useful
to humans, they first need to understand human activities using image and depth data.
Selected Papers: ICRA'12.
Students: Jae Y. Sung, Colin Ponce, Hema Koppula.
Research/Code/Data: project webpage
Popular Press: R&D magazine, Gizmag.
Learning algorithms to predict depth and infer 3-d models, given just a single still image.
Applications include creating immersive 3-d experiences from users' photos,
improving performance of stereovision,
creating large-scale models from a few images, robot navigation, etc.
Tens of thousands of users have converted their single photographs into 3D models.
Learning algorithms to predict robotic grasps, even for objects of types never seen
before by the robot. Applied to tasks such as unloading items from a dishwasher, clearing
up a cluttered table, opening new doors, etc.
Holistic scene understanding requires solving several tasks
simultaneously, including object detection, scene categorization,
labeling of meaningful regions, and 3-d reconstruction.
We develop a learning method that couples
these individual sub-tasks to improve
performance on each of them.
Paper: NIPS'08, ECCV-workshop'10, NIPS'10, NIPS'11, TPAMI'12.
Related: Make3D.
Popular Press: New Scientist (May 24, 2011).
Use monocular depth perception and reinforcement learning
techniques to drive a small rc-car at high speeds in unstructured
environments. Also fly indoor helicopters/quadrotors autonomously using
a single onboard camera.
For robots to be practically deployed in home and office environments,
they should be able to manipulate their environment to gain access
to new spaces.
We present learning algorithms to do so, thus making our robot the
first one able to navigate anywhere in a
new building by opening doors and elevators, even ones it has never seen before.
The ability to perform monaural (single-ear) localization
is important to many animals; indeed, monaural cues are also the primary method
by which humans decide if a sound comes from the front or back, as well as
estimate its elevation.
In this paper, we propose a machine
learning approach to monaural localization, using only a single microphone and
an "artificial pinna" (that distorts sound in a direction-dependent way).
We propose novel optical proximity sensors for improving grasping.
These sensors, mounted on fingertips, allow pre-touch pose
estimation, and therefore allow for online grasp adjustments
to an initial grasp point without the need for premature
object contact or regrasping strategies.
We developed algorithms to automatically modify videos by adding
textures to them. Our algorithms perform robust tracking,
occlusion inference, and color correction to make the texture
look like part of the original scene.
Orientation learning is a difficult problem because the
space of orientations is non-Euclidean, and in some cases (such as quaternions)
the representation is ambiguous, in that multiple representations exist
for the same physical orientation. Learning is further complicated by
the fact that most man-made objects exhibit symmetry, so that there are
multiple "correct" orientations. In this paper, we propose a
new representation for orientations---and a class of learning and
inference algorithms using this representation---that allows us to
learn orientations for symmetric or asymmetric objects as a function
of a single image.
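The quaternion ambiguity mentioned above can be checked directly: a unit quaternion q and its negation -q describe the same physical rotation. The following is a minimal sketch of that fact only, not the representation proposed in the paper; the helper names (quat_from_axis_angle, quat_to_matrix, same_rotation, rotation_distance) are made up for this illustration.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_to_matrix(q):
    """3x3 rotation matrix (nested lists) for a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def same_rotation(q1, q2, tol=1e-9):
    """True if both quaternions map to the same rotation matrix."""
    m1, m2 = quat_to_matrix(q1), quat_to_matrix(q2)
    return all(abs(a - b) < tol for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

def rotation_distance(q1, q2):
    """Angle (radians) between two rotations, treating q and -q as identical."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))

q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 3)
neg_q = tuple(-c for c in q)

print(same_rotation(q, neg_q))      # the two 4-vectors differ, the rotation does not
print(rotation_distance(q, neg_q))  # close to zero once the sign is factored out
```

A learner that regresses the four quaternion components with a plain squared error would treat q and -q as far apart, which is one reason a sign-invariant distance (here, the absolute value of the dot product) is needed. Object symmetry adds further equivalent orientations that this sketch does not model.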
Make3D extension: Large Scale Models from Sparse View
Create 3-d models of large environments, given only a small number
of (possibly) non-overlapping images. This technique integrates
Structure from Motion (SFM) techniques with Make3D's single image
depth perception algorithms.
Stereovision is fundamentally limited by the baseline distance between the
two cameras: depth estimates tend to be inaccurate when
the distances considered are large. We believe that monocular visual
cues give largely orthogonal, and therefore complementary, types of
information about depth. We propose a method to combine monocular
cues with stereo (triangulation) cues to obtain significantly more
accurate depth estimates than is possible with either alone.
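The baseline limitation follows from the standard triangulation relation Z = f B / d (focal length f in pixels, baseline B, disparity d): a fixed disparity error produces a depth error that grows with the square of the distance. The sketch below illustrates this, and then fuses a stereo estimate with a second, independent estimate by inverse-variance weighting. The fusion rule is a simple stand-in chosen for illustration, not the method proposed in the paper, and the camera parameters and noise levels are hypothetical.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Triangulated depth Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def depth_error(focal_px, baseline_m, depth_m, disparity_error_px):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px

def fuse(z_a, sigma_a, z_b, sigma_b):
    """Inverse-variance weighted average of two independent depth estimates."""
    w_a, w_b = 1.0 / sigma_a ** 2, 1.0 / sigma_b ** 2
    z = (w_a * z_a + w_b * z_b) / (w_a + w_b)
    sigma = (w_a + w_b) ** -0.5
    return z, sigma

# Hypothetical camera: 700 px focal length, 10 cm baseline, 0.5 px disparity noise.
F_PX, BASELINE_M, DISP_ERR_PX = 700.0, 0.10, 0.5

for depth in (2.0, 20.0):
    err = depth_error(F_PX, BASELINE_M, depth, DISP_ERR_PX)
    print(f"depth {depth:5.1f} m -> stereo uncertainty about {err:.3f} m")

# Fuse a far-range stereo estimate with an assumed monocular estimate (sigma = 3 m).
stereo_sigma = depth_error(F_PX, BASELINE_M, 20.0, DISP_ERR_PX)
fused_z, fused_sigma = fuse(21.5, stereo_sigma, 19.0, 3.0)
print(f"fused depth {fused_z:.2f} m, uncertainty {fused_sigma:.2f} m")
```

Under these assumptions the fused uncertainty is always smaller than either input uncertainty, which is the sense in which complementary cues can beat either cue alone.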
This device uses accelerometers and gyrometers to estimate its
3-d location and 3-d orientation. This device can be used, for
example, to conveniently navigate in a 3-d virtual world.
Isomap (for non-linear dimensionality reduction) suffers from the problem of
short-circuiting, which occurs when the neighborhood distance is larger
than the distance between the folds in the manifold. We proposed a
new variant of the Isomap algorithm based on local linear properties of
manifolds to increase its robustness to short-circuiting.
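Short-circuiting can be reproduced on a toy manifold. The sketch below samples a U-shaped curve whose two arms are 1 unit apart, builds an epsilon-neighborhood graph, and measures the graph geodesic between the two free ends. It shows the failure mode only, not the proposed variant; the function names and the sampling parameters are assumptions made for this example.

```python
import heapq
import math

def folded_strip(length=10.0, gap=1.0, step=0.5):
    """Sample a U-shaped curve: two parallel arms `gap` apart, joined at one end."""
    n = int(round(length / step))
    m = int(round(gap / step))
    bottom = [(i * step, 0.0) for i in range(n + 1)]
    side = [(length, j * step) for j in range(1, m)]
    top = [(length - i * step, gap) for i in range(n + 1)]
    return bottom + side + top

def graph_geodesic(points, eps, src, dst):
    """Shortest-path length from src to dst in the eps-neighborhood graph (Dijkstra)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, p in enumerate(points):
            w = math.dist(points[u], p)
            if v != u and w <= eps and d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return math.inf

points = folded_strip()
start, end = 0, len(points) - 1  # the two free ends of the U

# Neighborhood smaller than the fold gap: the path follows the curve.
print(graph_geodesic(points, eps=0.6, src=start, dst=end))
# Neighborhood larger than the fold gap: an edge jumps straight across the fold.
print(graph_geodesic(points, eps=1.1, src=start, dst=end))
```

With the smaller neighborhood the graph distance matches the length along the curve; with the larger one a single edge bridges the two arms and the estimated geodesic collapses to the gap width, which would corrupt the embedding Isomap computes from these distances.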
The question of what data is available to learn from is at the heart
of all learning algorithms---often even an inferior learning
algorithm will outperform a superior one, if it is given
more data to learn from. We proposed a novel and practical
solution to the dataset collection problem; we first use a green
screen to rapidly collect data and then use a probabilistic
model to rapidly synthesize a much larger training set. We
used this data to build reliable classifiers for our robots.
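The green-screen step can be pictured as chroma-key compositing: pixels that are clearly green are treated as background and replaced, so one captured object yields many training images. The sketch below is a toy version under stated assumptions: images are nested lists of RGB tuples, the greenness threshold is hypothetical, and uniform random choice of a background stands in for the probabilistic synthesis model, which the text above does not specify.

```python
import random

def is_green(pixel, margin=60):
    """Hypothetical chroma-key rule: green exceeds both red and blue by `margin`."""
    r, g, b = pixel
    return g - max(r, b) > margin

def composite(foreground, background):
    """Replace green-screen pixels of `foreground` with the matching background pixels."""
    return [
        [bg if is_green(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]

def synthesize(foreground, backgrounds, count, seed=0):
    """Generate `count` composites, each over a randomly chosen background."""
    rng = random.Random(seed)
    return [composite(foreground, rng.choice(backgrounds)) for _ in range(count)]

# A 1x3 "image": green screen, a reddish object pixel, green screen.
captured = [[(10, 240, 20), (200, 30, 30), (0, 255, 0)]]
backgrounds = [
    [[(90, 90, 90), (90, 90, 90), (90, 90, 90)]],     # gray wall
    [[(120, 80, 40), (120, 80, 40), (120, 80, 40)]],  # wooden table
]

training_set = synthesize(captured, backgrounds, count=5)
print(len(training_set), training_set[0])
```

A realistic pipeline would also vary scale, lighting, and placement of the object; those steps are omitted here.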
Infer facial expressions (e.g., smile, surprise, disgust)
given an image of a face. This algorithm builds a sparse geometric
model of the face, and uses the parameters of the geometric model
as features in a learning algorithm. Reasonably robust to
partial occlusions. In a similar project, we use a web camera
to track the hand and to infer the hand gestures for controlling
a simple computer GUI. (No other equipment, such as gloves, was
needed.)
Converting insulator polystyrene to moderately conducting polymer
We described a simple, bioinspired approach for the conversion
of an insulator, polystyrene, to a moderately conducting
polymer by introducing adenine nucleobases.
ELifebelt: Wrist-worn device to save a person from electric shock
We developed an electronic device that, when worn as a wristwatch,
protects the person from electric shocks. It monitors the skin
potentials continuously and trips the power circuit wirelessly
to save the person's life.