Detecting repetition in audio or other signals

jamesstaub · February 7, 2017, 11:04pm

Hi folks

I was hoping to get some guidance on how to go about detecting repetition of a phrase or gesture in a single recording using MUBU.

I want to record some audio, segment and analyze it with either MFCC or YIN, then be able to find series of segments that are the most similar. So if I play the same phrase on an instrument 3 times in a given recording, I want to detect that repetition occurred, and which segments are contained in each iteration of the phrase.

Is this something autocorrelation could be used for?
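As a rough illustration of the autocorrelation idea, here is a minimal sketch in plain Python (outside Max). It assumes each segment has already been reduced to one number, for example a per-segment pitch estimate from YIN. The function names and the pitch values are made up for the example. If a phrase repeats, the autocorrelation of that sequence should peak at a lag equal to the phrase length in segments.

```python
def autocorrelation(values):
    """Normalized autocorrelation of a 1-D sequence for lags 0..n-1."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    energy = sum(c * c for c in centered)
    if energy == 0:
        # A constant sequence carries no information about repetition.
        return [1.0] + [0.0] * (n - 1)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(n)
    ]


def best_repeat_lag(values, min_lag=1):
    """Lag (in segments) with the strongest autocorrelation, up to half the length."""
    ac = autocorrelation(values)
    max_lag = len(values) // 2
    return max(range(min_lag, max_lag + 1), key=lambda lag: ac[lag])


# Hypothetical per-segment pitches in Hz: a 4-note phrase played three times.
pitches = [220.0, 247.0, 262.0, 294.0] * 3
print(best_repeat_lag(pitches))
```

This only reports the most likely repetition period. It does not say where each repetition starts, and it would struggle if the phrase is played at different tempos.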

One approach might be to loop over the segments, run KNN on each one, and look for runs of adjacent segments whose nearest neighbors are also adjacent to each other. It doesn’t seem like you can exclude segments from KNN, though, so a given segment always finds itself as the nearest neighbor in a given buffer (please correct me if I’m wrong on that!).
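To make the nearest-neighbor idea concrete, here is a small sketch in plain Python. It is not the MuBu KNN object or its API, just a brute-force search that skips the query segment itself (and optionally its immediate neighbors). The feature vectors and the `exclude_radius` parameter are assumptions for illustration.

```python
import math


def nearest_other(features, index, exclude_radius=0):
    """Nearest neighbor of features[index], ignoring indices within exclude_radius of it."""
    query = features[index]
    best_index, best_dist = None, math.inf
    for j, candidate in enumerate(features):
        if abs(j - index) <= exclude_radius:
            continue  # skip the segment itself (and nearby segments, if requested)
        dist = math.dist(query, candidate)
        if dist < best_dist:
            best_index, best_dist = j, dist
    return best_index, best_dist


# Hypothetical 2-D features: a 3-segment phrase, then a slightly varied repeat.
features = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0),
            (0.1, 0.0), (1.0, 0.1), (2.1, 1.0)]
matches = [nearest_other(features, i)[0] for i in range(3)]
print(matches)
```

When consecutive segments map to consecutive neighbors, as the first three do here, that run is a candidate repetition. With a KNN tool that cannot exclude the query, a common workaround is to ask for one extra neighbor and drop the first result.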

A friend told me to look into dynamic time warping or recurrence matrices, but I’m not sure how to go about that in Max.

Any input would be appreciated!

James

tothesun · April 25, 2017, 11:00pm

This is exactly what I’m trying to do as well. I believe an LSTM-type recurrent model is what you’d want to utilize rather than DTW if you’re trying to identify patterns in audio. Dynamic time warping could be better for gestures, though, if you want a gesture executed slowly to be equivalent to a fast execution.
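For reference, the point about slow and fast executions can be shown with a textbook dynamic time warping distance. This is a minimal sketch in plain Python on made-up 1-D gesture data, not the implementation of any Max external.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences using absolute difference."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b holds
                                 cost[i][j - 1],      # b advances, a holds
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical gesture: the same up-and-down shape at two speeds.
fast = [0.0, 1.0, 2.0, 1.0, 0.0]
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
other = [2.0, 1.0, 0.0, 1.0, 2.0]

print(dtw_distance(fast, slow))
print(dtw_distance(other, slow))
```

The fast and slow versions align with zero cost because DTW lets one sequence hold while the other advances, whereas the inverted gesture does not. The cost table grows with the product of the two lengths, which matters for long recordings.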

I haven’t actually messed around with this external in a long time, but as I remember it the issue was that it was centered around supervised learning (requiring you to provide examples of the pattern ahead of time) rather than unsupervised (which might find patterns regardless of what form they take).

jamesstaub · April 26, 2017, 2:35am

Thanks for the reply. I saw your other post about LSTM with MUBU; I’m assuming there’s no way of doing that in Max currently.

From this paper it seems like a self-similarity matrix could also be used to identify repeated phrases, but I’m not sure where to start with implementing that in Max. It probably makes sense to use the Python Max object or create a Python OSC server and send data back and forth that way.

Have you looked at the catoracle/pyoracle patches provided in the MUBU examples at all? They use the [py] object to interface a Python script with MUBU. I don’t fully understand what the oracle algorithm is doing, but it appears to have some means of tracking similarity between phrases and building new phrases. The oracle playback seems a little buggy in that patch, so I haven’t got it totally working yet.

Anyway, I’d love to learn more about what you’re doing and the approaches you’re taking.

tothesun · May 18, 2017, 10:16pm

I just figured out that my spam filter was deleting my notifications from this forum. Lame.

That’s an interesting paper. The self-similarity matrices sound like just an extensive amount of fairly simple math. It could probably be done in Max, but a language like Python would be more efficient at it. I suppose the idea would be that you’d have to compare every “frame” in your recording to every other “frame” and see which are most similar. I’m not sure how important the dynamic time warping would be to getting good results, though, and they don’t go into how to implement that. I’m certainly no expert, but it sounds kind of like the longhand way to go about something that machine learning would “do for you” with the right algorithm.
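The "compare every frame to every other frame" idea might look roughly like the sketch below, in plain Python. It is not the method from the paper: the frame values, the distance threshold, and the helper names are all assumptions for illustration. A repeated phrase shows up as a run of small distances along a diagonal of the matrix, away from the main diagonal.

```python
import math


def self_similarity(frames):
    """Pairwise Euclidean distance between all feature frames (0 = identical)."""
    return [[math.dist(a, b) for b in frames] for a in frames]


def repeated_runs(matrix, threshold, min_length):
    """Diagonal runs (i, j, length) where frames i.. and j.. stay within threshold."""
    n = len(matrix)
    runs = []
    for offset in range(1, n):  # offset 0 is the trivial main diagonal
        start = None
        for i in range(n - offset + 1):  # one extra step flushes a run at the end
            close = i < n - offset and matrix[i][i + offset] <= threshold
            if close and start is None:
                start = i
            elif not close and start is not None:
                if i - start >= min_length:
                    runs.append((start, start + offset, i - start))
                start = None
    return runs


# Hypothetical frames: a 3-frame phrase, 2 unrelated frames, then the phrase again.
frames = [(0, 0), (1, 0), (2, 1), (5, 5), (6, 4), (0, 0), (1, 0), (2, 1)]
matrix = self_similarity(frames)
print(repeated_runs(matrix, threshold=0.5, min_length=2))
```

Each result names where the phrase first occurs, where it recurs, and how many frames it spans. Straight diagonals only catch repeats at the same tempo, so tempo changes would need something like time warping on top.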

What I’m trying to do needs to be done in real time so I don’t think those methods would work for me. I want the computer to analyze everything you play in a free-form jam and provide a similarity percentage compared to what it thought might come next. I’ve been thinking about augmenting a project of mine with this functionality for a long time, but I haven’t been able to get my head around a way to do it yet. I will probably have to jump into a Python library like TensorFlow and create my custom network from the ground up. This guy’s project would be fairly similar to what I’m trying to do, except with raw audio instead of MIDI:

I’m not quite sure what those example patches are doing either, but it doesn’t help that I can’t try them out. The newest version of mubu doesn’t seem to be working for me.