Understanding Database Type and Solution Display for feature extraction

I have been experimenting with MaxOrch for a few months now and enjoy it very much.
I am hoping to understand better how to use MaxOrch to help get features I am looking for from a given sound file.

I wanted to ask for further clarification/documentation or suggested reading to understand better the
‘database type’ field (1b on the interface) :
In other words, what are the differences between
‘mfcc vs moments vs specenv vs specpeaks vs spectrum’ ?

Similarly,
in the ‘Solutions Corpus’ window, the grid is configurable according to ‘skewness vs. centroid, segnum vs solnum, or segnum vs centroid’

Is there any documentation or perhaps someone could suggest some reading so that I can understand these terms better (and how the results are organised)?

Thanks a lot in advance for any suggestions or clarifications,
J

Hello Jorge,

MaxOrch is built around the Orchidea package for Max, so we’ve applied the Orchidea team’s choices… We’re waiting for more documentation ourselves. The documentation included in the patch is what we can say about the parameters.

For 1b, no more information…

Carmine Cella is the right person for these questions. @carminecella

Best,

Jerome

Thank you for your reply Jerome, I will keep exploring and perhaps try to contact @carminecella if I don’t find anything.

All the best,
Jorge

I actually asked Carmine about this in a conversation quite some time ago. I took notes as quickly as I could, but unfortunately I don’t think I was fast enough… Here are my notes:

to preserve the real pitches present in the target, use “spectrum”
to maximize the timbre similarity, use mfcc (Mel frequency cepstral coefficients)

You can use a set of different databases together, the only constraint is that each database should contain the same type of features (ex. MFCC).

Notes from convo w/ Carmine:
Overall the main variables are:

  • emphasis on pitch vs. timbre
  • number of dimensions examined (which affects speed vs. accuracy, I think)

About specific types:

  • spectrum (Fourier spectrum)
    best to preserve the real pitches present in the target

  • mfcc (Mel frequency cepstral coefficients)
    cepstral - describes the envelope, a smoothed over version of the peaks in the fft
    best to maximize the timbre similarity with the target

  • specpeaks
    just the peaks of the spectrum. focus on most significant peaks of spectrum
    fewer dimensions

  • logspec
    same as specpeaks, but log of the peaks
    between spectrum and mfcc

  • specenv - ?

  • moments
    looks at the first 4 spectral moments (1. centroid: center of spectrum (brightness), 2. spread (bandwidth), 3. skewness (how shifted to left or right), 4. kurtosis (how peaked or flat))
    gives good descriptors for timbres vs. pitch. Good intermediate between spectrum and mfcc.
    also fewer dimensions (4), so faster
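
For anyone who wants to see what those four numbers mean in practice, here is a minimal sketch of how spectral moments are commonly computed from a magnitude spectrum. It uses the textbook definitions (normalized magnitudes treated as a distribution over frequency), which may differ from what Orchidea does internally. The function name `spectral_moments` and the toy spectra are made up for illustration.

```python
import math

def spectral_moments(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of a magnitude spectrum.

    Textbook definitions; a hypothetical helper, not Orchidea's actual code.
    """
    total = sum(mags)
    if total == 0:
        raise ValueError("spectrum is silent")
    # Normalize magnitudes so they sum to 1 and act as weights.
    weights = [m / total for m in mags]
    centroid = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    spread = math.sqrt(variance)
    if spread == 0:
        # All energy in a single bin: higher moments are undefined, report 0.
        return centroid, 0.0, 0.0, 0.0
    skewness = sum((f - centroid) ** 3 * w
                   for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w
                   for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis


# Toy spectra on the same bins: energy concentrated low ("dark")
# versus concentrated high ("bright").
freqs = [100.0, 200.0, 300.0, 400.0, 500.0]
dark = [5.0, 3.0, 1.0, 0.5, 0.5]
bright = [0.5, 0.5, 1.0, 3.0, 5.0]

dark_moments = spectral_moments(freqs, dark)
bright_moments = spectral_moments(freqs, bright)
print("dark:  ", dark_moments)
print("bright:", bright_moments)
```

The dark spectrum gets the lower centroid and a positive skewness (its tail leans toward high frequencies), while the bright one gets the higher centroid and a negative skewness.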

wow, super. That is quite helpful in fact.
I have read about mfcc and what you have written here makes sense to me. It is also good to have confirmation that spectrum refers to Fourier.

This is a very good starting point.

My other issue concerns what I am seeing when I view the solutions in the Solutions Corpus window. I can hear as I audition solutions, that in some cases they are getting progressively ‘darker’. This is what I would expect if I have the window arranged according to ‘skewness vs. centroid’

But what do segnum vs solnum and segnum vs centroid refer to? Is segnum = Seg Num, as in ‘segment number’, referring to the part of the original sound analysed? So, e.g., moving from left to right would move timewise across the sample?

perhaps I am totally off, but it would be helpful to know because while any window can be used to audition any sample by ear, it would be great to know which view to use and where to look to hunt down the type of result you are looking for (as you might do with other types of analysis).

Amazing work meanwhile!
Jorge

Sorry it’s taken me a few months to get to this - but yes:

  • segnum is segment number (which only makes sense in a dynamic orchestration, where the onsetthreshold is less than 1 and multiple segments are generated)
  • solnum is solution number (which only makes sense if the maxexport parameter is greater than 1 and multiple solutions are generated per segment)

When running a Static orchestration, it really only makes sense to look at skewness vs. centroid, since there’s only one segment.

When running a Dynamic orchestration, I only use segnum vs. solnum.

That 3rd option, segnum vs. centroid, is definitely only useful in a dynamic orchestration. In fact it might be the most useful one after a Dynamic orchestration with a maxexport of 1. It will show each segment across the grid, left to right, and give you the centroid info about each one.
It ends up being a bit chaotic if there are multiple segments AND multiple solutions.
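
To make that concrete, here is a rough sketch of why the segnum vs. centroid view gets crowded. It is only an illustration: the record fields, the centroid values, and the assumption that segnum runs along the horizontal axis are mine, not taken from MaxOrch itself.

```python
from collections import defaultdict

def grid_points(solutions, x_key, y_key):
    """Map each solution record to an (x, y) position for a given view."""
    return [(s[x_key], s[y_key]) for s in solutions]

def points_per_column(points):
    """Count how many points share each x value."""
    counts = defaultdict(int)
    for x, _ in points:
        counts[x] += 1
    return dict(counts)

# Made-up data: 3 segments, maxexport = 1 (one solution per segment).
single = [
    {"segnum": 0, "solnum": 0, "centroid": 820.0},
    {"segnum": 1, "solnum": 0, "centroid": 1430.0},
    {"segnum": 2, "solnum": 0, "centroid": 610.0},
]
# Made-up data: the same 3 segments with maxexport = 2.
multi = single + [
    {"segnum": 0, "solnum": 1, "centroid": 900.0},
    {"segnum": 1, "solnum": 1, "centroid": 1250.0},
    {"segnum": 2, "solnum": 1, "centroid": 700.0},
]

single_view = grid_points(single, "segnum", "centroid")
multi_view = grid_points(multi, "segnum", "centroid")
print(points_per_column(single_view))  # one point per segment column
print(points_per_column(multi_view))   # several points stacked per column
```

With one solution per segment, each column holds a single point and the view reads as a timeline of centroids. With several solutions per segment, the points stack up in each column, and segnum vs. solnum separates them more cleanly.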

Hope that helps!