
A bunch of questions regarding mubu

Dear mubu developers and users,

As the title says, I’m stuck with a couple of questions regarding mubu/pipo, and I would be very grateful for any hints! Here we go:

. I need to segment an audio buffer with onseg and then apply descr or ircamdescriptors to the resulting segments, but it does not seem to work. Is there a way to apply the pipo modules “descr” and/or “ircamdescriptors” with a custom segmentation, i.e. not the module’s standard segmentation (defined by winsize and hopsize)? The “catoracle” example provides an analysis procedure that looks similar (I replaced “basic” with “descr” in the syntax):
mubu.process #1 audio descr:onseg @name descr @process 0 @prepad 0 @priority 2 @progressoutput input @timetagged 1 @descr.winsize 2048 @descr.hopsize 512 @descr.minfreq 24 …
=> but what actually happens is exactly the opposite: the analysis is first done on constant frames with descr, and this data is then used to generate the onset markers. I need it the other way round: first segment (using a quick fft analysis, for instance), then run the descriptor analysis exactly on the segments between the onset markers. Possible?

. almost the same question with onseg and yin: I’d like to segment an audio buffer according to onsets (so using slice:fft:sum:scale:onseg) and then analyze the resulting segments with yin (I don’t want to use the output of slice for yin). As far as I understand, this means first re-segmenting the original audio buffer, not with slice (which does regular slicing), but using the information provided by onseg. Is that possible?

. How can I calculate not the total energy of a segment (as provided by ircamdescriptors in “TotalEnergy”) but the average power, i.e. the total energy divided by the number of samples in the segment?

. What exactly does the descriptor “loudness” in the pipo module descr mean? Is it the average power expressed in dBFS? The total energy? Or is it a more accurate loudness estimation (for instance with K-weighting) that takes the segment duration into account?

. mubu.knn: is there a way to select a subset for the unit selection? For instance, all segments selected in the imubu scatterplot, or all segments within the time selection of the imubu audio view?

. Is there no way to switch the playback direction (i.e. to play backwards) in mubu.concat~ and mubu.granular~?

Many thanks in advance for your help,

Alexis

Hi Alexis,

the intention behind your first question seems to be quite similar to mine in Column write/add operations on track matrices. My approach, however, is different. Could it be interesting for you, too?

Regarding the segmentation algorithms implemented in MuBu, refer to the tutorial patch MuBu-howto-segmentation. There is also an example of yin-based segmentation. By the way, the tutorial patch also includes an example of selecting a segment by sample index; you could adapt this for segment selection based on information outside the markers track.

Limit processing to a segment: to my knowledge this is not possible, but you could achieve it by splitting buffers.

Average energy/loudness: onseg can write the minimum, maximum and mean loudness of a segment, as well as the standard deviation, to the marker matrix. Some valuable configuration examples are hidden in the locked state of the help patches; take a look inside the hidden config subpatches.

The meaning of “loudness”: an interesting question, indeed. Even without knowing how it is computed, the results seem reasonable to me. Whether the maximum, the mean loudness, the standard deviation, or some relation of these is the most meaningful indicator depends on the content of the segments and, more importantly, on your musical approach; all of them may even be too rough, e.g. when you are interested in the distribution of energy within a segment. So it can indeed be useful to understand how “loudness” is computed.

Unit selection: do you mean including/excluding buffers? Yes, this can be done in mubu.knn; see the reference for the attribute/message names.

Playback direction: yes, it can be reversed; see the attribute inspector in the mubu.concat~ help patch.

Sorry, I actually didn’t answer all the questions, but maybe one or the other hint is helpful.

Hi Alexis,

thanks for these awesome questions! I’ll answer the simple ones first.

. Is there no way to switch the playback direction (i.e. to play backwards) in mubu.concat~ and mubu.granular~?

of course! Use the message/attribute @reverse.
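
For instance (a minimal sketch; “mymubu” is just a placeholder container name, and like most attributes @reverse can also be set by message at runtime):

mubu.concat~ mymubu @reverse 1 <- object box: segments play backwards
reverse 0 <- message: back to forward playback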

. What exactly does the descriptor “loudness” in the pipo module descr mean? Is it the average power expressed in dBFS? The total energy? Or is it a more accurate loudness estimation (for instance with K-weighting) that takes the segment duration into account?

It is ITU-R 468-weighted signal power in dB, calculated as in the pipo.onseg help.
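
In other words, roughly (a sketch of the idea only, not the exact pipo code; the precise scaling is in the pipo source, link below): with $w_{468}(f_k)$ the ITU-R 468 weighting curve and $X(k)$ the spectrum of the analysis frame,

$\mathrm{loudness} \approx 10 \log_{10} \Big( \sum_k w_{468}(f_k)\,|X(k)|^2 \Big)\ \mathrm{dB}$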

. How can I calculate not the total energy of a segment (as provided by ircamdescriptors in “TotalEnergy”) but the average power, i.e. the total energy divided by the number of samples in the segment?

Look at the options of slice (it can normalise the window) and fft (power mode).
For the precise formulas: the pipo modules are now open source at https://github.com/Ircam-RnD/pipo
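
Spelled out with the definition from the question, the desired quantity is simply

$P_{\mathrm{avg}} = \frac{E_{\mathrm{total}}}{N} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]^2$

with $N$ the number of samples in the segment (often then expressed as $10 \log_{10} P_{\mathrm{avg}}$ in dB).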

. mubu.knn: is there a way to select a subset for the unit selection? For instance, all segments selected in the imubu scatterplot, or all segments within the time selection of the imubu audio view?

You can go the classic way of defining an “active” descriptor with a high weight (see the sketch below).
Also look at the soundset selection that Aaron implemented in catart-mubu-live.
This is a nice tutorial idea for the upcoming catart-by-mubu github repository.
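
A rough sketch of the “active descriptor” trick (note: the exact mubu.knn message for per-column weights is an assumption on my part, please check the reference):

1. add a column “active” to the descriptor track: 1 for segments in the current selection, 0 for all others
2. give that column a very large weight in mubu.knn, e.g. something like
weights 1. 1. 1. 1000.
3. always query with target value 1 in the “active” dimension

Unselected units then end up far away in the weighted distance and are effectively never chosen.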

. I need to segment an audio buffer with onseg and then apply descr or ircamdescriptors to the resulting segments, but it does not seem to work. Is there a way to apply the pipo modules “descr” and/or “ircamdescriptors” with a custom segmentation, i.e. not the module’s standard segmentation (defined by winsize and hopsize)? The “catoracle” example provides an analysis procedure that looks similar (I replaced “basic” with “descr” in the syntax): “mubu.process #1 audio descr:onseg @name descr @process 0 @prepad 0 @priority 2 @progressoutput input @timetagged 1 @descr.winsize 2048 @descr.hopsize 512 @descr.minfreq 24 …” => but what actually happens is exactly the opposite: the analysis is first done on constant frames with descr, and this data is then used to generate the onset markers. I need it the other way round: first segment (using a quick fft analysis, for instance), then run the descriptor analysis exactly on the segments between the onset markers. Possible?

This can be achieved by copy-pasting the segments to a new buffer and then running the descr analysis without slice.
This will become easier with pipo2, where we will separate segmentation from temporal modeling.
Also note that ircamdescriptor~ has an option to work on a whole buffer~ without windowing.

. almost the same question with onseg and yin: I’d like to segment an audio buffer according to onsets (so using slice:fft:sum:scale:onseg) and then analyze the resulting segments with yin (I don’t want to use the output of slice for yin). As far as I understand, this means first re-segmenting the original audio buffer, not with slice (which does regular slicing), but using the information provided by onseg. Is that possible?

As above.

Hi Kyl and Diemo,

thanks a lot for the hints, this is already a lot of info. The help patch “MuBu-howto-segmentation.maxpat” is quite helpful for understanding the workflow and the syntax in more detail. So:

. for reverse playing with the @reverse attribute: great, I hadn’t seen this one. It would be great if it were documented (at least in the reference). BTW, what exactly do @cyclic and @microtiming mean (I don’t hear any difference with/without)?

. great that the pipo modules are now open source, this will help a lot in understanding what exactly they do

. for the subset selection with knn: Kyl, I don’t mean buffer selection, but segments within one buffer. Diemo, I’ll have a deeper look at catart-mubu-live to understand how it works, thanks for the tip

. about the complex segmentation/processing question: Diemo, the idea with copy/paste is interesting but seems quite complex to implement. Let’s say I have 300 segments in a buffer, based on onset markers. You mean I have to do the following:
1/ create a new buffer that will contain the table of per-segment ircamdescriptors results
2/ loop over all segments and, for each iteration:
. copy segment n into a new buffer
. process that buffer with ircamdescriptor
. copy the output of ircamdescriptor into the nth line of the table
. delete the newly created buffer

=> is that correct?

Alexis

. for reverse playing with the @reverse attribute: great, I hadn’t seen this one. It would be great if it were documented (at least in the reference).

will do, but also look at the long attribute names in the inspector

BTW, what exactly do @cyclic and @microtiming mean (I don’t hear any difference with/without)?

@cyclic: cyclic audio (wraparound at buffer end)
@microtiming: sub-sampling, audible for short periods, like gabor interp

. about the complex segmentation/processing question: Diemo, the idea with copy/paste is interesting but seems quite complex to implement.

luckily, Joseph has done the “copy segments into new mubu buffer” part already in mubu.stats.marker.track, although it doesn’t use the more efficient copy/paste functions yet

Hi Diemo,

BTW, what exactly do @cyclic and @microtiming mean (I don’t hear any difference with/without)?

@cyclic: cyclic audio (wraparound at buffer end)

=> OK

@microtiming: sub-sampling, audible for short periods, like gabor interp

=> sorry, I don’t get it. Do you mean FIR interpolation? I tried with a massive slow-down of the audio (“resampling -2400”) and I still hear mirroring/interpolation errors with @microtiming 1. Can you provide a concrete example?

. about the complex segmentation/processing question: Diemo, the idea with copy/paste is interesting but seems quite complex to implement.

luckily, Joseph has done the “copy segments into new mubu buffer” part already in mubu.stats.marker.track, although it doesn’t use the more efficient copy/paste functions yet

Yes, this is what I mean by “quite complex”… :)

As far as I can judge, the copy/paste operation is done sample by sample, which won’t work for big files. But you gave me the hint about the copy/paste commands, which I did not know. Thanks!

Alexis

Hi Diemo,

Thanks for pointing out the copy-paste functions. I made a messy segmentation -> feature-stats-per-segment patch using the [@progressoutput input] feature, but the mubu.stats.marker.track implementation is much tidier. In my case, I needed to run the feature-stats calculation on a big audio corpus (several GB), so loading all the audio files at once was not possible. Hence, the patch goes through the files in bunches and generates feature stats per segment.

Please keep us in the loop about the new copy-paste functions, I am quite interested.

Thank you,

@microtiming means sub-sample precise paste position into the output buffer in granular synthesis; it is audible for short @period / high frequencies with @play 1, like the interpolation in gabor
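
A quick way to hear it (a sketch; the values are only an example): send mubu.granular~ the messages

period 5
play 1

i.e. a very short grain period, then toggle microtiming 0/1 and listen; with longer periods the difference disappears.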

is that clearer?

Hi Diemo, yes, that’s much clearer.

Thanks,

Alexis