I am using an accelerometer-equipped device (a mobile phone, to be specific) that can sample acceleration at roughly 20 samples per second. Each sample contains three values, corresponding to the X, Y, and Z components of the acceleration as measured in the device's frame of reference.
I've built a system in which I've logged several example gestures, each as a time-dependent series of samples $f(t)$. I'd like to match incoming data on the device against these examples in order to classify gestures and trigger actions when one is recognized.
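For concreteness, this is roughly how I represent the data (a Python sketch; the names `Sample` and `GestureTemplate` are just illustrative, not from any particular API):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    t: float  # timestamp in seconds
    x: float  # acceleration along X (m/s^2), as reported by the device
    y: float  # acceleration along Y
    z: float  # acceleration along Z

# A logged gesture f(t) is simply an ordered series of samples,
# captured at roughly 20 Hz (~50 ms apart).
GestureTemplate = List[Sample]
```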
Ideally, the evaluation would take place on the device itself, but given its limited computational capacity and the need for near-real-time results, the algorithm needs to be quite efficient.
How do I approach such a classification problem?
Addendum: An additional problem I've thought of is that the gesture could be located anywhere in the stream, i.e. it could start and end at any time during capture. Would I use a sliding window and run a comparison each time a new sample comes in, truncating the oldest samples off the start of the stream?
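To make the sliding-window idea concrete, here is a rough sketch of what I have in mind (Python just for illustration; `compare_to_templates` is a hypothetical placeholder for whatever matching algorithm I end up with):

```python
from collections import deque

WINDOW_SIZE = 40  # ~2 seconds of samples at 20 Hz

window = deque(maxlen=WINDOW_SIZE)  # oldest samples fall off automatically

def compare_to_templates(samples):
    """Hypothetical placeholder: return a gesture label if the current
    window matches a logged template closely enough, else None."""
    return None

def on_new_sample(sample):
    window.append(sample)
    if len(window) == WINDOW_SIZE:
        match = compare_to_templates(list(window))
        if match is not None:
            print(f"recognized gesture: {match}")  # trigger the action here
```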
Addendum 2: It seems someone has already tackled this problem using FFTs and an SVM. Does anyone have a good explanation of this method, pointers to an implementation, or a sense of its feasibility for real-time recognition?
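My rough reading of that approach (purely my own sketch, not the authors' code) is that each window of samples gets mapped to frequency-domain features, which are then fed to a trained SVM, along these lines:

```python
import numpy as np
from sklearn.svm import SVC

WINDOW_SIZE = 40  # ~2 s of samples at 20 Hz

def fft_features(window):
    """Map a (WINDOW_SIZE, 3) array of X/Y/Z samples to a vector of
    per-axis FFT magnitudes. Keeping only the low-frequency bins is
    my own assumption (human gestures are slow movements)."""
    spectrum = np.abs(np.fft.rfft(window, axis=0))
    return spectrum[:8].flatten()  # 8 bins x 3 axes = 24 features

# Offline training on logged gestures (synthetic data here, only to
# show the shapes involved):
windows = np.random.normal(size=(20, WINDOW_SIZE, 3))  # 20 logged windows
labels = np.array([0, 1] * 10)                         # two gesture classes

clf = SVC(kernel="rbf")
clf.fit([fft_features(w) for w in windows], labels)

# At runtime, classify the current sliding window:
current = np.random.normal(size=(WINDOW_SIZE, 3))
print(clf.predict([fft_features(current)]))
```

The SVM training could presumably happen offline, with only the FFT and the prediction running on the device, which is part of why the approach sounds attractive for my real-time constraint.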