
I am using an accelerometer-enabled device (mobile phone, to be specific) that enables sampling acceleration at a rate of about 20 samples per second. The samples contain three values, each corresponding to the X, Y and Z component of the measured acceleration as perceived by the device.

I've built a system where I have logged several gestures (as a time-dependent series of samples, $f(t)$) as examples that I'd like to match against input on the device to classify gestures and execute actions based on recognized gestures.

Ideally, I'd like the evaluation to take place on the device, but given the low computational capacity and the need for near-realtime evaluation, the algorithm would need to be pretty efficient.

How do I approach such a classification problem?

Addendum: An additional problem I've thought of is that the gesture could be located anywhere in the stream, i.e. it could start and end at any time during capture. Would I use a sliding window and run a comparison each time a new sample arrives, truncating the start of the stream?
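For what it's worth, the sliding-window idea can be sketched with a bounded buffer; the names here (`WINDOW`, `on_sample`, `classify`) are illustrative placeholders, not a specific API:

```python
from collections import deque

WINDOW = 40  # ~2 seconds of data at a 20 Hz sampling rate (an assumption)

buffer = deque(maxlen=WINDOW)  # the oldest samples are truncated automatically

def on_sample(sample, classify):
    """sample: an (x, y, z) tuple; classify: any window-matching function."""
    buffer.append(sample)
    if len(buffer) == WINDOW:
        # compare the current window against the logged gesture templates
        return classify(list(buffer))
    return None  # not enough data buffered yet
```

This way the comparison runs on every new sample without ever copying or re-scanning the whole stream.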

Addendum 2: It seems someone's already been tackling this problem using FFTs and SVM. Does anyone have a good explanation and/or pointers as to the implementation of this method and its feasibility for real-time recognition?
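As a rough sketch of the FFT half of that approach (the function name, window shape, and number of retained coefficients are my own assumptions, not taken from that work): compute the magnitude spectrum of each axis and keep the low-frequency coefficients as a fixed-size feature vector for the SVM.

```python
import numpy as np

def fft_features(window, n_coeffs=8):
    """window: array of shape (N, 3) holding X, Y, Z acceleration samples."""
    feats = []
    for axis in range(3):
        spectrum = np.abs(np.fft.rfft(window[:, axis]))
        feats.extend(spectrum[:n_coeffs])  # keep low-frequency magnitudes
    return np.array(feats)
```

An FFT over a few dozen samples is cheap enough to run on every window, so the feasibility question is mostly about evaluating the SVM, which is also just a fixed number of dot products once trained.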

1 Answer


This sounds like an ideal problem for a neural network.

You should create a training set of examples $(x_i, y_i)$ where each $y_i$ corresponds to one of your gestures, and the $x_i$ is a vector of your measurements (i.e. the time series of $x$, $y$ and $z$ accelerations unpacked into a vector).
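As a minimal sketch of that unpacking step (assuming each gesture has already been trimmed or resampled to a common length $T$):

```python
import numpy as np

def to_input_vector(samples):
    """samples: sequence of T (x, y, z) tuples for one logged gesture."""
    arr = np.asarray(samples, dtype=float)  # shape (T, 3)
    return arr.reshape(-1)  # [x1, y1, z1, x2, y2, z2, ...]
```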

If you have $p$ different gestures, then your neural network will output $p$ numbers, which are the probabilities that the gesture performed by the user is each of the $p$ possible gestures. You simply select the gesture with the highest output value.
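That selection step is just an argmax over the $p$ outputs; the gesture names below are illustrative placeholders:

```python
import numpy as np

def predict_gesture(outputs, gesture_names):
    """outputs: length-p vector of per-gesture scores or probabilities."""
    return gesture_names[int(np.argmax(outputs))]

# predict_gesture([0.1, 0.7, 0.2], ["shake", "flip", "tap"]) -> "flip"
```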

Once you have fit a network to the training data, it will be very fast to evaluate which gesture has been performed: the computations taking place inside a neural network are nothing more than addition, multiplication, division and exponentiation.
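To illustrate that point, here is the forward pass of a one-hidden-layer network with sigmoid units, written out with nothing but those elementary operations (the shapes and weights are arbitrary placeholders, not a recommended architecture):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # division and exponentiation only

def forward(x, W1, b1, W2, b2):
    """x: input vector; W1/b1, W2/b2: hidden- and output-layer weights/biases."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * v for w, v in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]
```

On a phone this is a handful of multiply-accumulates per hidden unit, which is easily real-time at 20 Hz.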

The tricky part will be fitting the neural network. If you can get your data into an easily accessible format (e.g. a CSV) then you could import it into R and use one of the many excellent neural network packages to get the coefficients that you can then hard-code into your application.

  • As regards the input data - yes, you would probably need to map the time series to a fixed-size vector in order for a neural network to properly interpret it. This might be desirable, though - presumably different people will perform the gestures at different speeds, and mapping each input to a fixed-size vector at least gets rid of this ambiguity. (2011-11-02)
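One simple way to realize that fixed-size mapping (a sketch; the target length is an assumption) is to linearly resample each captured gesture to a fixed number of samples per axis, so fast and slow performances of the same gesture yield equally sized input vectors:

```python
import numpy as np

def resample_gesture(series, target_len=32):
    """series: sequence of T (x, y, z) samples; returns a 3*target_len vector."""
    series = np.asarray(series, dtype=float)  # shape (T, 3)
    t_old = np.linspace(0.0, 1.0, len(series))
    t_new = np.linspace(0.0, 1.0, target_len)
    return np.column_stack(
        [np.interp(t_new, t_old, series[:, k]) for k in range(3)]
    ).reshape(-1)  # flatten to one fixed-size input vector
```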