An Artificial Training Set for the Recognition and Tracking of Hand Gestures

Arvind Sastry, Carlo Tomasi

Hand gestures are examples of fast and complex motions. Computers fail to track these motions in video, but sleight of hand fools humans as well: what happens too quickly we simply cannot see. Our research is based on the idea of a three-dimensional (3D) tracker for these motions that relies on recognizing familiar configurations in two-dimensional (2D) images (classification) and filling the gaps in between (interpolation). We illustrate this idea with experiments on hand motions similar to finger spelling. Classification of hand configurations, as a component of the overall hand-tracking procedure and as such the focal point of this dissertation, involves processing a 2D video frame to determine its associated 3D pose and configuration parameters.

With this goal in mind, we first build an artificial (and therefore scalable) training set of ‘familiar’ hand frames that is invariant to the shape, scale, rotation, position, and configuration of the hand. Next, we propose a feature design technique that generates image features best suited to this particular shape recognition task. To differentiate between two such features, and thereby between the configurations they represent, we put forward an appropriate distance metric based on the Earth Mover’s Distance. The final design of the classifier then applies this distance metric and feature extraction process to achieve efficient and effective recognition.
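To make the comparison step concrete, the following is a minimal sketch of the Earth Mover’s Distance between two weighted feature sets (signatures), posed as a transportation linear program. This is a generic illustration of the metric, not the dissertation’s actual feature representation; the ground-distance matrix, SciPy solver, and equal-total-weight assumption are all choices made here for the example.

```python
import numpy as np
from scipy.optimize import linprog

def emd(supply, demand, cost):
    """Earth Mover's Distance between two signatures.

    supply, demand: non-negative weight vectors with equal total mass
        (an assumption of this sketch; general EMD normalizes otherwise).
    cost: cost[i, j] = ground distance from supply point i to demand point j.

    Solves min sum_ij f[i, j] * cost[i, j] subject to row/column
    mass-conservation constraints, i.e. the transportation problem.
    """
    m, n = len(supply), len(demand)
    c = np.asarray(cost, dtype=float).ravel()  # flatten flow matrix f[i, j]
    rows = []
    for i in range(m):                         # flow out of supply point i
        a = np.zeros((m, n)); a[i, :] = 1.0
        rows.append(a.ravel())
    for j in range(n):                         # flow into demand point j
        a = np.zeros((m, n)); a[:, j] = 1.0
        rows.append(a.ravel())
    b_eq = np.concatenate([supply, demand])
    res = linprog(c, A_eq=np.array(rows), b_eq=b_eq, bounds=(0, None))
    return res.fun

# Two 1-D histograms with mass at positions {0, 1} and {1, 2}:
# shifting each half-unit of mass one bin to the right costs 0.5 + 0.5 = 1.
supply = np.array([0.5, 0.5])
demand = np.array([0.5, 0.5])
cost = np.array([[1.0, 2.0],   # |0-1|, |0-2|
                 [0.0, 1.0]])  # |1-1|, |1-2|
print(emd(supply, demand, cost))
```

A classifier of the kind described above could, for instance, label a query frame by the training signature with the smallest such distance (nearest neighbor), though the dissertation’s exact decision rule is not restated here.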