
KNN finding a segment from the same sound with mubu.track

Hello,
I’d like to do offline concatenative synthesis/mosaicing. As a sanity check, I decided to use mubu.track to get the actual matrix columns and run a KNN search to find the corresponding segment in the same track. Since the list I’m feeding into KNN has an exact match in the data, I figured it should find that segment easily, at a distance of 0. What I’ve found is that KNN is unable to find the correct segment. Also, if I search with only 2 matrix columns (for example frequency mean and energy mean), it performs significantly better, but it still usually doesn’t find the right segment. When I use more than 2 columns, the distance is usually around 20.
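For anyone who wants to reproduce this sanity check outside Max, here is a minimal Python sketch. It assumes a plain Euclidean distance over descriptor rows, which is an assumption and not necessarily what mubu.knn computes. The function names and the three made-up rows (columns Duration, FrequencyMean, EnergyMean) are hypothetical. Every row used as a query should find itself at distance 0, and the length check raises an error as soon as the query and the data rows stop lining up.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor rows of the same length."""
    if len(a) != len(b):
        # A length mismatch means the columns no longer line up.
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, rows):
    """Exhaustively scan all rows; return (index, distance) of the closest one."""
    best_i, best_d = -1, math.inf
    for i, row in enumerate(rows):
        d = euclidean(query, row)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

def self_match_check(rows):
    """Query each row against the whole track; list rows that don't find themselves at distance 0."""
    failures = []
    for i, row in enumerate(rows):
        j, d = nearest(row, rows)
        if j != i or d != 0.0:
            failures.append((i, j, d))
    return failures

# Made-up rows with columns [Duration, FrequencyMean, EnergyMean]
track = [
    [0.25, 440.0, -20.0],
    [0.50, 220.0, -12.0],
    [0.10, 880.0, -30.0],
]
print(self_match_check(track))  # → []
```

One caveat: if a track contains two identical rows, the later one reports the earlier index, so it shows up as a "failure" even though its distance is 0.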

My question is: surely I’m doing something horribly horribly wrong, right?
I have some understanding of how KNN should work, I think, but maybe I’m missing something crucial?

Here’s my patch (I labeled everything important in pink and hid the other stuff away so the problem is easy to reproduce, and added numbered comments showing the order of the steps): https://pastebin.com/26CAhL6j

Thank you so much for your time.
All the best,
Greg

Edit: I looked into it a bit more: https://stackoverflow.com/questions/5751114/nearest-neighbors-in-high-dimensional-data

Seems like KNN loses effectiveness in higher dimensions. I’m guessing the knn object is meant for real-time use, so it is less focused on finding the best match and instead tries to find the fastest “good enough” match. What are my options for doing mosaicing offline as accurately as possible? I suppose I could just work out a brute-force method, since I don’t mind waiting a long time for it to finish; I just want it to sound cool. If anyone has any advice or knows of anything that does this, please let me know.
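The brute-force offline approach described above can be sketched in a few lines of Python. This is a hypothetical illustration, not anything from MuBu: it assumes each segment has already been reduced to a list of descriptor values and that plain Euclidean distance is good enough. For every target segment it scans the whole corpus and keeps the index of the closest corpus segment, which gives the sequence of corpus segments to play back.

```python
import math

def distance(a, b):
    """Plain Euclidean distance between two descriptor lists of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mosaic(target_rows, corpus_rows):
    """For each target segment, return the index of the nearest corpus segment."""
    choice = []
    for t in target_rows:
        # Exhaustive scan: every corpus segment is compared with this target segment.
        best = min(range(len(corpus_rows)), key=lambda i: distance(t, corpus_rows[i]))
        choice.append(best)
    return choice

# Made-up descriptors with columns [FrequencyMean, EnergyMean]
corpus = [[440.0, -20.0], [220.0, -12.0], [880.0, -30.0]]
target = [[230.0, -13.0], [870.0, -29.0], [430.0, -21.0]]
print(mosaic(target, corpus))  # → [1, 2, 0]
```

The cost grows with (number of target segments) × (number of corpus segments) × (number of descriptors), which is usually fine offline.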

Hi Greg,

My question is: surely I’m doing something horribly horribly wrong, right?

Your hypothesis is correct =-)

I have some understanding of how KNN should work, I think, but maybe I’m missing something crucial?

For your target list, you’re slicing off the first column (Duration), but with list input, knn compares the list with the data rows element by element. With Duration removed, every value in your list is shifted by one position and compared against the wrong column.
Instead, keep the full row and set the first weight to 0 to ignore the Duration column.
If you want to use a subset of columns for matching, have a look at the columns and select messages.
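A small Python sketch can illustrate the misalignment. It assumes that list input is compared with each row element by element using a weighted Euclidean distance; that distance is an assumption made for illustration, not mubu.knn’s source code, and the track values are made up. With Duration sliced off, FrequencyMean gets compared with Duration, EnergyMean with FrequencyMean, and the search lands on the wrong row. Keeping the full row and giving Duration a weight of 0 finds the exact match.

```python
import math

def weighted_distance(query, row, weights):
    # zip pairs query[0] with row[0], query[1] with row[1], ... (element by element)
    return math.sqrt(sum(w * (q - r) ** 2 for q, r, w in zip(query, row, weights)))

def nearest(query, rows, weights):
    """Return (index, distance) of the closest row under the weighted distance."""
    dists = [weighted_distance(query, row, weights) for row in rows]
    best = min(range(len(rows)), key=dists.__getitem__)
    return best, dists[best]

# Made-up track with columns [Duration, FrequencyMean, EnergyMean]
track = [
    [0.25, 110.0, -20.0],
    [0.50, 440.0, -12.0],
    [0.10, 880.0, -30.0],
]
target = track[1]

# Wrong: Duration sliced off, so every value is compared with the wrong column.
sliced = target[1:]
print(nearest(sliced, track, [1.0, 1.0, 1.0]))  # finds row 0, not row 1

# Right: keep the full row and give Duration a weight of 0 to ignore it.
print(nearest(target, track, [0.0, 1.0, 1.0]))  # → (1, 0.0)
```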

Edit: I looked into a bit more - https://stackoverflow.com/questions/5751114/nearest-neighbors-in-high-dimensional-data

Seems like knn loses effectiveness in more dimensions, I’m guessing the knn object is meant for real time so it is less focused on finding the best match and is instead trying to find the fastest “good enough” match.

This is irrelevant here; that discussion is only about computational efficiency. The knn search is always exhaustive and finds the k best matches.
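To make the “exhaustive” point concrete, here is a generic brute-force k-NN sketch in Python. It compares the query with every row and keeps the k smallest distances, so no row is ever skipped. The weighted Euclidean distance and the function name are assumptions for illustration only; this is not mubu.knn’s implementation, which may organize its search differently internally while still returning the k best matches.

```python
import heapq
import math

def knn_exhaustive(query, rows, k, weights=None):
    """Compare query with every row; return the k best (index, distance) pairs."""
    if weights is None:
        weights = [1.0] * len(query)
    # Score every row; nothing is pruned or approximated.
    scored = (
        (math.sqrt(sum(w * (q - r) ** 2 for q, r, w in zip(query, row, weights))), i)
        for i, row in enumerate(rows)
    )
    # nsmallest keeps the k lowest distances (ties broken by row index).
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]

rows = [[1.0, 0.0], [0.0, 0.0], [3.0, 4.0], [0.5, 0.0]]
print(knn_exhaustive([0.0, 0.0], rows, k=2))  # → [(1, 0.0), (3, 0.5)]
```

Because every row is scored, adding more descriptor columns only makes each comparison slower; it does not make the search miss the best match.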

Best, and let us hear when it sounds cool =-)

Hahah yes, unfortunately that took me all day to realize. Thank you so much for responding. I actually wrote my own nearest neighbor search before I realized I had stripped the duration, hahah. I don’t know why, but I think I initially thought I was slicing off the time column and not the duration. Derp.

I did get it working in the end the way I was imagining, but it still didn’t sound that great haha. I’ll keep hacking at it and post what I come up with. :slight_smile:

Hey schwarz,
So I got a version up and working, but it’s… well… let’s say it has a lot of room for improvement, hahah. It does what I originally intended and gives some interesting sounds, but I think I’m gonna dig into mubu.process and try some other ways of slicing up the data, maybe with the IRCAM descriptors? Do you have any recommendations? Have you tried this before?

https://pastebin.com/N637xGDz

Thanks again for your help, I’m still embarrassed I didn’t catch that haha.
Greg