Motion Tracking and Visual Guidance

One of the tasks that we have applied this method to is "coarse" motion tracking, where relatively large objects moving nonrigidly are to be tracked over time. The key observation underlying the method is that as an object moves with respect to the camera, the change in the image can be modeled in two parts: (1) a motion in the image, and (2) a change in the image shape. We recover the motion in the image by finding the best match of the model from one time frame to the image at the next time frame. The shape change is then handled by building a new two-dimensional model at the subsequent time frame. Thus an object is modeled as a sequence of two-dimensional edge images that change as a function of time. At each time, a given object is modeled as a two-dimensional bitmap (e.g., the model of Kevin above). The transformation that best matches the model to the image at the next time frame is found, and then a new model is formed from the subset of the image edges that lie close to the transformed model.
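The sketch below illustrates one such tracking step, assuming a translation-only search and a chamfer-style edge distance; the function names, the distance-transform matching cost, and the threshold values are our own illustration rather than the system's actual implementation.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    DIST_THRESH = 3.0   # pixels: image edges this close to the moved model join the new model

    def best_translation(model_edges, image_edges, search=10):
        """Find the integer translation of the model that best matches the image edges.

        model_edges and image_edges are boolean edge bitmaps of the same size.
        """
        # Distance from every pixel to the nearest image edge (chamfer-style matching cost).
        dist_to_image = distance_transform_edt(~image_edges)
        ys, xs = np.nonzero(model_edges)
        best, best_cost = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = ys + dy, xs + dx
                ok = (y >= 0) & (y < image_edges.shape[0]) & (x >= 0) & (x < image_edges.shape[1])
                cost = dist_to_image[y[ok], x[ok]].mean()
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
        return best

    def track_step(model_edges, image_edges):
        """Match the current model to the next frame, then rebuild the model there."""
        dy, dx = best_translation(model_edges, image_edges)
        # Shift the model by the recovered motion (wraparound at the border is ignored here).
        moved = np.roll(np.roll(model_edges, dy, axis=0), dx, axis=1)
        # New model: the subset of image edges lying close to the transformed model.
        dist_to_model = distance_transform_edt(~moved)
        new_model = image_edges & (dist_to_model <= DIST_THRESH)
        return (dy, dx), new_model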

As the method requires an initial two-dimensional model of the object, we use the optical flow (the local motion of image points from one time frame to the next) to identify regions of the image where there is motion. For each such region, if the system does not already have a model covering it, a new model is formed and tracked. An object must be successfully tracked for 1/2 second (15 frames) before it is displayed. Each object that the system is tracking is displayed in a different color.
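A minimal sketch of this initialization and display rule follows; the Track class, the covered() test, and the color assignment are hypothetical names we introduce for illustration (the 15-frame threshold corresponds to 1/2 second at 30 frames per second).

    MIN_TRACKED_FRAMES = 15   # 1/2 second at 30 frames per second

    class Track:
        def __init__(self, model_edges, color):
            self.model = model_edges
            self.color = color    # each tracked object is drawn in its own color
            self.age = 0          # number of frames over which the object has been tracked

    def update_tracks(tracks, image_edges, motion_regions, covered, next_color):
        """Advance existing tracks one frame, then start tracks for unmodeled motion regions."""
        for t in tracks:
            _, t.model = track_step(t.model, image_edges)   # track_step as sketched above
            t.age += 1
        for region in motion_regions:          # boolean masks of significant optical flow
            if not covered(region, tracks):    # hypothetical test: no existing model in this region
                tracks.append(Track(image_edges & region, next_color()))
        # Only objects tracked long enough are displayed.
        return [t for t in tracks if t.age >= MIN_TRACKED_FRAMES]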

An MPEG image sequence illustrates the tracker output (300 frames, 1 Mbyte; this takes about two minutes to play, and it is only 10 seconds of original video at 30 frames per second).

The use of 2D edge models for motion tracking is described further in our paper Tracking Non-Rigid Objects in Complex Scenes.

Mobile Robot Navigation

We have used the idea of building successive two-dimensional geometric (edge) models of an object to guide a mobile robot to a target in the visual field. The basic idea is to center the landmark in the field of view and then move towards it. The method does not require prior calibration of the camera system. The key observation is that when a camera moves directly towards an object, the range remaining to that object after the move is given by m/(s-1), where m is the distance that the camera moved and s is the factor by which the apparent size of the object in the image has grown. Thus we determine the bearing (orientation) of the landmark, rotate so that the robot (and camera) is heading in that direction, move forward some distance, use the change in apparent size of the landmark to compute the range to the landmark, and continue moving towards the landmark until the range estimate is small.
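The following is a minimal sketch of the range-from-scale computation, assuming the apparent size of the landmark (for instance, the extent of its matched edge model) can be measured in each image; the function name and the example numbers are ours, for illustration only.

    def range_after_move(move_distance, size_before, size_after):
        """Range remaining to the landmark after moving move_distance towards it.

        With apparent size inversely proportional to range, the size ratio
        s = size_after / size_before satisfies s = (r + m) / r, so r = m / (s - 1).
        """
        s = size_after / size_before
        return move_distance / (s - 1.0)

    # Example: if the landmark appears 1.25 times larger after moving forward 1.0 m,
    # then roughly 1.0 / 0.25 = 4.0 m remain.
    print(range_after_move(1.0, 80.0, 100.0))   # -> 4.0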

As the robot moves towards the landmark, the range and bearing estimates are updated and the path is corrected based on these measurements. By correcting the path as the robot moves, the method can compensate for the camera being misaligned (not pointing in the same direction as the "forward" motion of the robot) and for the robot not moving in exactly the commanded direction. In order to cope with moving targets, the navigation system predicts the location of the target at the next frame and rotates so that the predicted location will be centered in the field of view.
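A minimal sketch of this homing loop follows, assuming hypothetical helpers: locate_landmark() returns the landmark's bearing and apparent size from the matched edge model, and rotate()/move_forward() command the robot; the step size and stopping distance are illustrative values, not the system's.

    STEP = 0.5          # metres moved between images (illustrative)
    STOP_RANGE = 0.75   # stop when the range estimate falls below this (metres, illustrative)

    def home_to_landmark(robot, camera):
        bearing, size_before = locate_landmark(camera)   # match the 2D edge model to the image
        robot.rotate(bearing)                            # center the landmark in the view
        while True:
            robot.move_forward(STEP)
            bearing, size_after = locate_landmark(camera)
            remaining = range_after_move(STEP, size_before, size_after)   # as sketched above
            if remaining < STOP_RANGE:
                break
            # Re-centering each step corrects for camera misalignment and drive error.
            robot.rotate(bearing)
            size_before = size_after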

The sequence of images taken by Tommy (the mobile robot) as it navigates around two obstacles towards Kevin, who is sitting on a couch, illustrates the landmark-based navigation method. The grey-level images (42 frames, 162 Kbytes) show the scene as Tommy moves (the frames were taken several seconds apart). The match images (42 frames, 568 Kbytes) show the intensity edges for each frame (in green), together with the best match of the model overlaid in red (yellow indicates points of the model that were directly superimposed on image edges). The model at each time frame was in general extracted from the previous image frame.

A similar method can be used to follow a moving target. In this case, the location of the target at the next time frame is predicted based on the motion between the previous two frames, so that the target can be kept centered in the field of view (this is particularly important because images are currently taken several seconds apart). The sequence of images taken by Tommy as it chases Lily (another mobile robot) down the hallway illustrates the method. The grey-level images (42 frames, 162 Kbytes) show the scene as Tommy moves.
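A minimal sketch of the prediction step, assuming a constant-velocity extrapolation in the image from the previous two target positions; the function name is ours, for illustration.

    def predict_next_location(prev_location, current_location):
        """Predict where the target will appear in the next frame (constant-velocity assumption)."""
        px, py = prev_location
        cx, cy = current_location
        return (2 * cx - px, 2 * cy - py)   # current position plus the most recent displacement

    # The robot then rotates so this predicted location falls at the center of the
    # field of view, which matters because successive images are several seconds apart.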

We are currently developing a "pathfinder" robot, which uses the visually guided navigation method to move the robot to landmarks specified through a graphical user interface. Sequences of these landmark targets then serve as a route that the robot can re-traverse.

The use of 2D edge models for visually guided navigation and homing of mobile robots is described further in the report Visually-Guided Navigation by Comparing Two-Dimensional Edge Images.