Mubu.gmm "poly recognition"

alessandrobaril · January 17, 2016, 2:12pm

Hi everybody,

I’m trying to use mubu.gmm to recognize different kinds of strokes on a snare drum (the center of the head, the edge of the head, and the rim), similarly to the example patch gmm_scratching, which recognizes different kinds of scratching (soft scratching, hard scratching, and tapping).
The system works, but I’d like to find a way to make it “polyphonic”.
In the case of the scratching example it means having a system able to recognize the simultaneous presence of two types of scratching, like for example soft scratching with your left hand and tapping with your right hand at the same time.
Is there a way to achieve this result? Until now I have used a standard dynamic microphone, but in a few days I’ll start using a contact microphone, which should be more suitable for the purpose.
Any suggestion will be appreciated. Thanks for your attention.

Best, Alessandro

francoise · January 17, 2016, 10:11pm

Hi Alessandro,

If I understand correctly, you have 3 classes (say, center/edge/rim) that you want to recognize from audio input, and you want to be able to identify when 2 classes are played at the same time with the left and right hands?

I can think of two things to improve this:

You can try increasing the “varianceoffset” attribute. It specifies the minimal variance of each Gaussian component in the model, so increasing its value might increase the overlap between your different classes.
You can define new classes with examples of mixed strokes, for example center+edge, center+rim, and edge+rim.
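
To illustrate the first suggestion, here is a minimal Python sketch of why a larger variance makes two classes overlap more. It is not how mubu.gmm is implemented: the two one-dimensional models, their means and variances, and the idea that the offset is simply added to each trained variance are all assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

def posterior_of_a(x, offset):
    """Normalized likelihood of class A at x.

    Assumption: `offset` is simply added to each trained variance.
    The real regularization in mubu.gmm may work differently.
    """
    # Hypothetical trained models: one feature, one Gaussian per class.
    like_a = gaussian_pdf(x, mean=0.0, variance=0.01 + offset)
    like_b = gaussian_pdf(x, mean=1.0, variance=0.01 + offset)
    return like_a / (like_a + like_b)

x = 0.4  # an ambiguous frame, slightly closer to class A than to class B
for offset in (0.0, 0.05, 0.5):
    print(f"offset={offset}: P(A)={posterior_of_a(x, offset):.3f}")
```

In this toy setup the normalized likelihood of class A moves toward 0.5 as the offset grows, so both classes stay "active" on ambiguous frames instead of one class winning outright.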

Does either of these solutions work?

Best,
Jules

alessandrobaril · January 18, 2016, 2:06pm

Hi Jules,

Thanks a lot for your help. Today I’ll follow your advice and let you know how the system responds, but I’m not sure that defining new classes that describe combinations of sounds will work for me.

Referring to the scratching example, every kind of scratch is tied to a specific resonant filter. Let’s say that soft scratching activates filter A and tapping activates filter B. Defining a new class to describe soft scratching and tapping at the same time could activate a filter C, but that would not be the same as having A (applied only to the soft scratching sounds) + B (applied only to the tapping sounds).
Even if event C meant activating both filters A and B, these filters would be applied to the entire signal, without discriminating between the sounds/gestures.
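
As a rough sketch of that limitation, a composite class can be mapped back to its constituent filters, but the mapping only decides which filters are switched on. The class labels and filter names below are hypothetical and are not taken from the gmm_scratching patch.

```python
# Hypothetical mapping from recognized class labels to the filters they enable.
FILTERS_FOR_CLASS = {
    "soft": {"A"},
    "tap": {"B"},
    "soft+tap": {"A", "B"},  # composite class trained on mixed examples
}

def active_filters(label):
    """Return the set of filters to enable for a recognized class label."""
    return FILTERS_FOR_CLASS.get(label, set())

print(sorted(active_filters("soft+tap")))
```

Both filters would still receive the same mixed input signal: the lookup says which gestures are present, not which part of the sound belongs to which gesture.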

In other words, what I’m looking for is a way to recognize 3 different timbres, recorded in a previous phase, and to be able to detect when there’s a combination of sounds, but treat them as independent events with different functions. I know it is a difficult thing to achieve… but not impossible, right?

Thanks again for your attention

Best, Alessandro

bevilacq · January 18, 2016, 2:55pm

Hi Alessandro

Reading your initial email,

“like for example soft scratching with your left hand and tapping with your right hand at the same time.”

it seems that the most practical way of achieving this type of “polyphony” would be to use several contact microphones (for example, one close to the left hand and one close to the right hand) and duplicate the patches. How many timbres (or playing techniques) would you like to superimpose?

best
Fred

alessandrobaril · January 18, 2016, 7:36pm

Hi Fred,

Thanks for your reply. Since I’m going to use a contact microphone, which should keep out environmental sounds, I think that two timbres at the same time should be enough.

I also thought about duplicating the patch, but I don’t understand the advantage of using two contact microphones.
Sorry, maybe I’m missing something, but it seems to me that even with two contact microphones we still have to solve how to recognize a timbre A played alone or together with another timbre B. For example, mic1 could be near the left hand, which is playing timbre A, but timbre B, played by the right hand, will also reach mic1, because the surface is the same.

From what I have understood about mubu.gmm, you train some models, and then the system compares the input signal to these models and chooses the one most similar to the input signal. It is always a comparison. Maybe the most helpful thing to do is to choose the models wisely? For example, training timbre A as the 1st model, timbre A + white noise as the 2nd model, timbre A + filtered white noise as the 3rd model, etc., in order to recognize timbre A in every circumstance?
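
To make that description concrete, here is a minimal Python sketch of the "compare and choose" idea, assuming a single feature and one Gaussian per class. The class names, means, and variances are invented for illustration, and the code does not reproduce the mubu.gmm internals.

```python
import math

# Hypothetical trained models: (mean, variance) of one feature per class.
MODELS = {"center": (0.2, 0.01), "edge": (0.5, 0.01), "rim": (0.9, 0.02)}

def log_likelihood(x, mean, variance):
    """Log-density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)

def classify(x):
    """Return the most likely class and the normalized likelihood of every class."""
    scores = {name: log_likelihood(x, m, v) for name, (m, v) in MODELS.items()}
    best = max(scores, key=scores.get)
    # Subtract the best score before exponentiating to avoid underflow.
    weights = {name: math.exp(s - scores[best]) for name, s in scores.items()}
    total = sum(weights.values())
    normalized = {name: w / total for name, w in weights.items()}
    return best, normalized

label, likelihoods = classify(0.25)
print(label, {name: round(p, 3) for name, p in likelihoods.items()})
```

The normalized likelihoods are worth watching alongside the winning label: in this toy model a frame that falls between two classes splits its likelihood between them rather than matching either one cleanly.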

Best, Alessandro