Tuesday, 23 April 2013

mmread.m

mmread.m is a Matlab function for reading a variety of media files on Linux, Windows and Mac. It is used in the sample code of the 2011/2012 one-shot-learning gesture recognition challenge.

When I first ran example.m from the sample code, it asked me to install libavbin.so and I clicked yes. However, I got a segmentation fault after the installation. I also tried installing libavbin.so from the version downloaded from https://code.google.com/p/avbin/, but this caused an error in FFGrab.mexa64 (undefined symbol: avbin_open_filename_with_format). The author noted that he had fixed this bug, so I downloaded mmread from the Matlab File Exchange and used it to replace the copy bundled with the sample code. Now mmread works.


Tuesday, 16 April 2013

A Unified Framework for Concurrent Usage of Hand Gesture, Shape and Pose

Natural interaction with hands

  • hand trajectory
    • HMM based continuous recognition
    • not really natural
  • hand shape
    • Use depth image directly / label / recognition
  • articulated hand pose
    • Hand skeleton extraction



    One-shot gesture recognition

    Data
    User dependent: the gestures in one batch are performed by a single user. Each batch contains a single labeled training example of each gesture in the vocabulary. The goal of the challenge is, for each batch, to train a system on the training examples and to predict the labels of the test examples. The test labels in the validation batches are withheld. Additional batches finalXX will be provided for final testing.

    Overall analysis
    https://docs.google.com/file/d/0B08QS7nJpK7mX3pFalVzckxoU2M/edit

    Winners' methods
    1st place
    https://docs.google.com/file/d/0B4jW8HPqnNiuU2RiQWl6TnpfQzQ/edit

    Initial image preprocessing

    • used depth information only
    • Identified outliers (pixels that the Kinect returns as 0) and removed them with a simple, fast wavelet reconstruction
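The preprocessing step above can be illustrated with a minimal sketch. It does not reproduce the winner's wavelet reconstruction; as a simpler stand-in, it fills each invalid pixel with the median of its valid 8-neighbours. The function name `fill_invalid_depth`, the `max_passes` parameter and the toy frame are all assumptions made for illustration.

```python
from statistics import median


def fill_invalid_depth(depth, invalid=0, max_passes=10):
    """Return a copy of `depth` with invalid pixels filled in.

    Stand-in for the wavelet reconstruction mentioned in the notes:
    each invalid pixel is replaced by the median of its valid
    8-neighbours, repeating until no holes remain or no progress is made.
    """
    rows, cols = len(depth), len(depth[0])
    out = [row[:] for row in depth]  # leave the input untouched
    for _ in range(max_passes):
        holes = [(r, c) for r in range(rows) for c in range(cols)
                 if out[r][c] == invalid]
        if not holes:
            break
        updates = {}
        for r, c in holes:
            neighbours = [
                out[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c) and out[rr][cc] != invalid
            ]
            if neighbours:
                updates[(r, c)] = median(neighbours)
        if not updates:
            break  # no valid pixel anywhere nearby, e.g. an all-zero frame
        for (r, c), value in updates.items():
            out[r][c] = value
    return out


# Toy 3x3 depth frame (values in millimetres); 0 marks a missing reading.
frame = [
    [800, 810, 805],
    [795,   0, 800],
    [790, 800,   0],
]
filled = fill_invalid_depth(frame)
```

Updates are collected per pass and applied together, so a hole is only ever filled from pixels that were valid at the start of that pass.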

    Representation of visual features

    • Mimic behavioral and neural mechanisms underlying visual processing
    • Selected features of interest (emphasizing gestures that move close to the camera)
    • Feature/background separation
    • Encode the time-varying shape and trajectory of the features
    • Similarity measure (robust to variability in feature selection or location)

    Gesture recognition algorithm

    • General Bayesian network model similar to speech recognition literature
    • Can perform simultaneous recognition and segmentation
    • Compute similarities between each input video frame and the sample gesture video frames
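
The frame-to-frame similarity idea can be sketched as follows. This is not the winner's Bayesian network model: it swaps in plain dynamic time warping (DTW) over per-frame feature vectors, and it classifies an already segmented input rather than doing simultaneous recognition and segmentation. The feature vectors, the gesture labels and all function names are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_matrix(input_frames, template_frames):
    """Similarity of every input frame to every template frame.

    Similarity is taken as negative distance, so larger means more alike.
    """
    return [[-frame_distance(f, t) for t in template_frames]
            for f in input_frames]


def dtw_cost(input_frames, template_frames):
    """Cost of the best monotonic alignment between the two sequences."""
    n, m = len(input_frames), len(template_frames)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(input_frames[i - 1], template_frames[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # input frame repeats
                                 cost[i][j - 1],      # template frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(input_frames, templates):
    """One-shot classification: pick the template with the lowest DTW cost."""
    return min(templates,
               key=lambda label: dtw_cost(input_frames, templates[label]))


# One training example per gesture, as in the challenge setup
# (2-D toy features, hypothetical labels).
templates = {
    "wave": [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]],
    "push": [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]],
}
query = [[0.0, 0.0], [0.0, 0.9], [0.0, 1.1], [0.0, 2.0]]
label = classify(query, templates)
```

Because each gesture has only a single training example, the template itself serves as the model. DTW lets the query be performed faster or slower than the template without changing the per-frame similarity measure.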