User dependent: the gestures in each batch are performed by a single user. A batch contains a single labeled training example of each gesture in the vocabulary. The goal of the challenge is, for each batch, to train a system on the training examples and to predict the labels of the test examples. The test labels of the validation batches are withheld. Additional batches (finalXX) will be provided for final testing.
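To make the protocol concrete, here is a toy, runnable sketch of the per-batch one-shot setup. The vocabulary, the random "features", and the nearest-neighbor classifier are placeholders for illustration only; they are not the challenge data or any competitor's method.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["clap", "wave", "point"]                  # hypothetical gesture vocabulary
train = {g: rng.normal(size=64) for g in vocab}    # one labeled example per gesture
test = [train[g] + rng.normal(scale=0.1, size=64)  # unlabeled clips, same user
        for g in ("wave", "clap")]

def predict(x):
    """Label of the nearest training example (Euclidean distance)."""
    return min(vocab, key=lambda g: np.linalg.norm(x - train[g]))

print([predict(x) for x in test])                  # expected: ['wave', 'clap']
```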
Overall analysis
https://docs.google.com/file/d/0B08QS7nJpK7mX3pFalVzckxoU2M/edit
Winners' methods
1st place
https://docs.google.com/file/d/0B4jW8HPqnNiuU2RiQWl6TnpfQzQ/edit
Initial image preprocessing
- Used depth information only
- Identify outlier pixels (those returned as 0 by the Kinect) and remove them via a simple/fast wavelet reconstruction (see the sketch below)
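The write-up says only "simple/fast wavelet reconstruction", so here is a minimal sketch of one plausible reading: iterative wavelet thresholding that re-imposes the measured pixels on each pass, built on PyWavelets. The function name, wavelet choice, and all parameters are assumptions, not the winner's code.

```python
import numpy as np
import pywt  # PyWavelets

def inpaint_depth(depth, n_iter=20, wavelet="db2", level=3, keep=0.2):
    """Fill zero-valued (missing) Kinect depth pixels by alternating a
    sparse wavelet reconstruction with re-imposing the known pixels."""
    known = depth > 0                          # Kinect returns 0 where depth failed
    est = depth.astype(float)
    est[~known] = depth[known].mean()          # crude initial fill for the holes
    for _ in range(n_iter):
        coeffs = pywt.wavedec2(est, wavelet, level=level)
        arr, slices = pywt.coeffs_to_array(coeffs)
        thresh = np.quantile(np.abs(arr), 1 - keep)   # keep largest coefficients
        arr = pywt.threshold(arr, thresh, mode="soft")
        est = pywt.waverec2(pywt.array_to_coeffs(arr, slices,
                                                 output_format="wavedec2"), wavelet)
        est = est[:depth.shape[0], :depth.shape[1]]   # waverec2 may pad odd sizes
        est[known] = depth[known]              # measured pixels stay fixed
    return est
```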
Representation of visual features
- Mimic behavioral and neural mechanisms underlying visual processing
- Select features of interest (emphasizing gestures performed close to the camera)
- Feature/background separation
- Encode each feature's time-varying shape and trajectory
- Similarity measure (robust to variability in feature selection or location)
- General Bayesian network model, similar to those in the speech recognition literature
- Can perform simultaneous recognition and segmentation
- Compute similarities between each input video frame and the sample gesture video frames (a sketch of this step follows the list)
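The summary does not specify the similarity measure or the decoding details, so the following is a minimal sketch under assumptions: cosine similarity between per-frame feature vectors stands in for the actual measure, and a plain Viterbi pass over left-to-right gesture models, concatenated end-to-start, stands in for the Bayesian network. All names (`cosine_sim`, `decode`, `templates`) are illustrative.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between every row of a (T, D) and of b (S, D)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return a @ b.T  # (T, S)

def decode(test_feats, templates):
    """Jointly segment and recognize: Viterbi over left-to-right models,
    one per training gesture, with end->start jumps between gestures.
    templates: list of (L_g, D) arrays, one per vocabulary gesture."""
    states = np.vstack(templates)                    # all template frames
    owner, starts, ends, s = [], [], [], 0
    for g, t in enumerate(templates):
        starts.append(s); s += len(t); ends.append(s - 1)
        owner += [g] * len(t)
    owner = np.array(owner)

    logp = np.log(np.clip(cosine_sim(test_feats, states), 1e-6, None))
    T, S = logp.shape
    score = np.full(S, -np.inf)
    score[starts] = logp[0, starts]                  # must begin at a gesture start
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        new = np.full(S, -np.inf)
        for j in range(S):
            cands = [j]                              # stay in the same state
            if j in starts:
                cands += ends                        # enter after any gesture ends
            else:
                cands.append(j - 1)                  # advance within the gesture
            k = max(cands, key=lambda i: score[i])
            new[j], back[t, j] = score[k] + logp[t, j], k
        score = new

    path = [int(np.argmax(score))]                   # backtrace the best path
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    labels = owner[path[::-1]]
    # Collapse runs of frames assigned to the same gesture (note: this also
    # merges back-to-back repeats of one gesture, a sketch-level shortcut).
    return [int(g) for i, g in enumerate(labels) if i == 0 or g != labels[i - 1]]
```

Decoding jointly over all gestures is what lets segmentation fall out of recognition: gesture boundaries appear wherever the best path jumps from one model's end state to another model's start state.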