The object recognition
problem is that of finding instances of object classes in
images or video sequences: faces, giraffes, the digit 5,
chairs, etc. We base our approach on deformable shape matching
using relational descriptors based on "shape contexts"
and "geometric blur". This enables one to compute
similarity measures between shapes which, together with similarity
measures for texture and color, can be used to drive object
recognition. I will show results on a variety of 2D and 3D
datasets such as handwritten digits and the Caltech-101 dataset
of visual categories.
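As a rough illustration of the shape-matching idea above, the following is a minimal sketch of a shape-context-style descriptor: a log-polar histogram of where the other sample points of a shape lie relative to one reference point, compared with a chi-squared cost. The function names, bin counts, and radius limits are assumptions chosen for the example, not values taken from the talk, and the sketch omits the deformable alignment step entirely.

```python
import math


def shape_context(points, index, n_radial=5, n_angular=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points relative to points[index].

    Bin counts and radius limits are illustrative assumptions.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Normalize distances by the mean pairwise distance so the
    # descriptor does not change when the shape is rescaled.
    dists = [
        math.hypot(ax - bx, ay - by)
        for i, (ax, ay) in enumerate(points)
        for (bx, by) in points[i + 1:]
    ]
    mean_dist = sum(dists) / len(dists)
    log_min, log_max = math.log(r_min), math.log(r_max)
    px, py = points[index]
    hist = [0] * (n_radial * n_angular)
    for i, (x, y) in enumerate(points):
        if i == index:
            continue
        dx, dy = x - px, y - py
        r = math.hypot(dx, dy) / mean_dist
        if r == 0:
            continue  # skip duplicates of the reference point
        # Radial bin on a log scale, clamped to the histogram range.
        t = (math.log(r) - log_min) / (log_max - log_min)
        r_bin = min(max(int(t * n_radial), 0), n_radial - 1)
        # Angular bin over [0, 2*pi).
        theta = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = min(int(theta / (2 * math.pi) * n_angular), n_angular - 1)
        hist[r_bin * n_angular + a_bin] += 1
    return hist


def chi2_cost(h1, h2):
    """Chi-squared distance between two histograms after normalization."""
    s1, s2 = sum(h1), sum(h2)
    if s1 == 0 or s2 == 0:
        raise ValueError("empty histogram")
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / s1, b / s2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost
```

In a full matcher, a cost of this kind would be computed between every pair of points on two shapes and passed to an assignment step. Because offsets are taken relative to the reference point and distances are normalized, the histogram is unchanged when the shape is translated or uniformly scaled.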
The action recognition problem is that of finding instances
of actions in video sequences: run, jump, kick, etc. We have
developed two approaches to the recognition of actions. In
low-resolution data ("far field"), the approach is based
on collecting low-resolution optical flow measurements over
a spatiotemporal volume for each moving figure, constructing
a robust descriptor from this volume, and then matching it
against stored sequences. In high-resolution data ("near field"),
the approach is based on extracting stick figures in each
frame and relying on joint-level human body tracking to provide
a complete intermediate representation that is robust to
variations in lighting, clothing, and pose.
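The far-field pipeline described above (flow over a spatiotemporal volume, a descriptor, matching to stored sequences) can be sketched roughly as follows. This is an assumption-laden illustration, not the method presented in the talk: it takes optical flow as already computed, uses half-wave rectification into four non-negative channels as one plausible way to build the descriptor, omits any blurring, and matches by normalized correlation with a nearest-neighbor rule. All function names and the data layout are hypothetical.

```python
import math


def rectified_channels(flow_volume):
    """Flatten a volume of (fx, fy) flow vectors into four non-negative channels.

    flow_volume is a list of frames; each frame is a list of rows of
    (fx, fy) tuples. This layout is an assumption made for the sketch.
    """
    desc = []
    for frame in flow_volume:
        for row in frame:
            for fx, fy in row:
                desc.extend([max(fx, 0.0), max(-fx, 0.0),
                             max(fy, 0.0), max(-fy, 0.0)])
    return desc


def normalized_correlation(a, b):
    """Cosine-style similarity between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def classify(query_volume, stored):
    """Return (label, score) of the stored sequence most similar to the query.

    stored is a list of (label, flow_volume) pairs.
    """
    q = rectified_channels(query_volume)
    best_label, best_score = None, -math.inf
    for label, volume in stored:
        score = normalized_correlation(q, rectified_channels(volume))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy usage: two frames of 1x2 flow fields per stored action.
stored = [
    ("run_right", [[[(1.0, 0.0), (1.0, 0.0)]], [[(1.0, 0.0), (1.0, 0.0)]]]),
    ("jump", [[[(0.0, 1.0), (0.0, 1.0)]], [[(0.0, 1.0), (0.0, 1.0)]]]),
]
query = [[[(0.9, 0.1), (0.8, 0.0)]], [[(1.1, -0.1), (0.9, 0.0)]]]
label, score = classify(query, stored)
```

Splitting each flow component into positive and negative parts keeps opposite motions from cancelling if the channels are later smoothed, which is one reason a rectified representation is a reasonable choice for noisy low-resolution flow.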
This talk is based on joint work; please visit
http://http.cs.berkeley.edu/projects/vision/vision_group.html
for pointers to publications.