Hi!
I am trying to achieve reliable and responsive detection of a few short gestures (currently 2, but up to 5-6 eventually), let’s say swipe left and swipe right, plus an idle state. My current input is accelerometer data (with gravity included). I want to build a dataset that shows enough variance to the model. Currently I have three labels, “left”, “right”, and “still”, the latter covering everything that is not a swipe-left or swipe-right gesture. I have 10 instances of each class. So here are some questions:
-
Which is the best object for this use case: mubu.gmm, mubu.hhmm, or mubu.xmm?
-
How many examples per class are needed for consistent results? The case is 2 short gestures (roughly 0.3-0.6 s long) plus the class for filtering out everything else (for which the examples are 5-6 s of random movement). Coming from an LSTM background, I am used to much bigger datasets, but my hunch is that that may not be necessary here. Is 10 per class enough? Fewer? More?
-
I started with mubu.gmm, then moved on to mubu.hhmm. With some tweaking of the parameters I get fairly OK results, without much noise or misfiring. But it always takes some time for mubu.hhmm to go back from either “left” or “right” to “still”, so I cannot quickly retrigger the same gesture over and over: the return to “still” always takes around 0.5-1.5 s. Which setting could make this transition faster (without losing precision)?
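To illustrate the behavior I am after, here is a rough sketch of the kind of fast-reset post-processing I could apply to the per-frame output myself if no built-in setting exists. It is in Python just to show the logic; the thresholds and the likelihood format are made up, not taken from MuBu:

```python
# Hypothetical sketch: snap back to "still" quickly using hysteresis
# on per-frame class likelihoods (values and format are invented).
ENTER = 0.8   # likelihood needed to trigger "left"/"right"
EXIT = 0.4    # likelihood below which we reset to "still"

state = "still"

def update(likelihoods):
    """likelihoods: e.g. {"left": 0.1, "right": 0.05, "still": 0.85}."""
    global state
    best = max(likelihoods, key=likelihoods.get)
    if state == "still":
        if best != "still" and likelihoods[best] >= ENTER:
            state = best      # confident onset: fire the gesture once
    elif likelihoods[state] <= EXIT:
        state = "still"       # likelihood collapsed: reset immediately
    return state
```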
-
About the training data: I noticed that I got better results with mubu.hhmm when I recorded the gestures with not much “idle” before and after the action. How extreme should I be with this? Should I precisely select the gesture with markers, removing as much idle from the recording as possible? Would this possibly also help with the previous issue?
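In case it helps to show what I mean, here is a minimal sketch (Python/NumPy just for readability; the threshold value and function name are mine) of the kind of automatic trimming I am considering:

```python
import numpy as np

def trim_idle(acc, threshold=0.15, pad=5):
    """Trim idle frames from an (N, 3) accelerometer recording.

    threshold: movement magnitude below which a frame counts as idle
               (made-up value; depends on the sensor's units and range).
    pad: frames of context kept on each side of the detected gesture.
    """
    mag = np.linalg.norm(acc - acc.mean(axis=0), axis=1)  # movement energy
    active = np.flatnonzero(mag > threshold)
    if active.size == 0:
        return acc                                # nothing detected
    start = max(active[0] - pad, 0)
    stop = min(active[-1] + pad + 1, len(acc))
    return acc[start:stop]
```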
-
I also noticed that with gestures that mirror each other (swipe left is, let’s say, trough-then-peak in the sensor readout, and swipe right is peak-then-trough), it is much easier to confuse the model than with, say, swipe up vs. swipe left. Any advice on how to improve accuracy in these cases?
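My rough guess as to why this happens (sketched below with made-up numbers, please correct me if I am wrong): the two gestures contain the same sample values, just in reverse order, so a model that ignores temporal order sees identical distributions:

```python
import numpy as np

# Invented 1-D readouts: swipe left = trough-then-peak,
# swipe right = the same shape reversed (peak-then-trough).
left = np.array([0.0, -0.8, -0.3, 0.9, 0.2, 0.0])
right = left[::-1]

# The frame values are identical as a set, so an order-blind model
# (like a plain GMM) cannot separate the two gestures...
print(np.array_equal(np.sort(left), np.sort(right)))  # True

# ...whereas the frame-to-frame differences have opposite sign
# patterns, which is exactly the temporal structure an HMM can use.
print(np.diff(left))   # [-0.8  0.5  1.2 -0.7 -0.2]
print(np.diff(right))  # [ 0.2  0.7 -1.2 -0.5  0.8]
```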
-
How important is scaling? Will the model (or the mubu.track?) autoscale everything anyway, or should I take care of it myself? Is -1 to 1 or 0 to 1 better as a normalized range (does it even matter)? When I scale, should the signal be as “hot” as possible, as if it were a mic signal? Some accelerometers are scaled extremely broadly so that they do not clip even on the most extreme movements, which means most “normal” gestures barely move the numbers away from the middle position. Would “zooming into” the useful range (i.e. gaining up the signal) help, or does it not matter?
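To make the question concrete, this is the kind of rescaling I have in mind (Python sketch; the full-scale and useful-range values are invented examples, e.g. a ±8 g sensor whose gestures stay within about ±2 g):

```python
import numpy as np

def rescale(acc_raw, full_scale=8.0, useful_range=2.0):
    """Gain raw accelerometer data up into a 'hot' -1..1 range.

    acc_raw: readings already normalized to -1..1 at the sensor's
             full scale (example: +/-8 g).
    full_scale / useful_range is the gain that 'zooms into' the band
    my gestures actually occupy (both values are invented examples).
    """
    hot = acc_raw * (full_scale / useful_range)
    return np.clip(hot, -1.0, 1.0)   # clip only truly extreme movements
```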
-
Any other advice, comments, or suggestions are more than welcome!
Thanks a lot!!!
Balint