
CataRT Descriptor Explanations

Does anyone have detailed information on exactly how the CataRT descriptors work?

What happens during analysis for CentroidMean, AC1Mean, KurtosisMean, SkewnessMean, etc.?

Duration, Loudness, Frequency and Energy are more intuitive, but it would still help to read a more detailed explanation of each descriptor. Does this exist anywhere?

The documentation is quite minimal:

Following the trail of links from this page,

this paper is described as the “most complete article, with architecture and applications”:
Principles and Applications of Interactive Corpus-Based Concatenative Synthesis

It lists the descriptors on p. 4 and cites a paper by Julien Bloit, Analyse temps réel de la voix pour le contrôle de synthèse audio (real-time voice analysis for controlling audio synthesis), but this isn’t publicly available anywhere.

The only link I could find was on this page,

It points here, but the “Online version” URL is broken.

The appendix of this master’s thesis has more general information on descriptors used in concatenative synthesis, but it’s not based on the Ircam implementation.

Are the descriptors used in the CataRT-mubu max package detailed anywhere?

Hi srs,
Centroid, spread, skewness and kurtosis are the first four statistical moments of the spectrum.
The canonical reference is the Peeters 2004 technical report.
The definitive source is the code in the pipo and rta-lib GitHub projects.
HTH, Diemo
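As a rough illustration of what “first four moments of the spectrum” means, here is a minimal Python sketch that treats one magnitude spectrum as a probability distribution over frequency and computes its centroid, spread, skewness and kurtosis, following the general definitions in the Peeters report. It is not the pipo or rta-lib code. Several choices are my own assumptions: weighting bins by linear magnitude (an implementation might use power), returning plain rather than excess kurtosis (a Gaussian-shaped spectrum gives 3), and the hypothetical `segment_mean` helper, which guesses that the `Mean` suffix in names like CentroidMean means the frame-wise value averaged over a segment.

```python
import math
from typing import Sequence, Tuple


def spectral_moments(freqs: Sequence[float],
                     mags: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (centroid, spread, skewness, kurtosis) of one spectral frame.

    Assumption: each bin is weighted by its linear magnitude, normalised
    so the weights sum to 1 (the spectrum is treated as a distribution).
    """
    total = sum(mags)
    if total <= 0.0:
        raise ValueError("spectrum has no energy")
    weights = [m / total for m in mags]

    # 1st moment: centroid, the weighted mean frequency ("centre of mass").
    centroid = sum(w * f for w, f in zip(weights, freqs))

    # 2nd central moment -> spread: standard deviation around the centroid.
    variance = sum(w * (f - centroid) ** 2 for w, f in zip(weights, freqs))
    spread = math.sqrt(variance)
    if spread == 0.0:
        raise ValueError("zero spread: all weight is in a single bin")

    # 3rd standardised moment: skewness, asymmetry around the centroid
    # (> 0 means a longer tail towards high frequencies).
    skewness = sum(w * (f - centroid) ** 3
                   for w, f in zip(weights, freqs)) / spread ** 3

    # 4th standardised moment: kurtosis, peakedness of the spectral shape
    # (plain kurtosis here, not "excess" kurtosis minus 3).
    kurtosis = sum(w * (f - centroid) ** 4
                   for w, f in zip(weights, freqs)) / spread ** 4

    return centroid, spread, skewness, kurtosis


def segment_mean(values: Sequence[float]) -> float:
    """Hypothetical helper: average a frame-wise descriptor over one segment."""
    if not values:
        raise ValueError("empty segment")
    return sum(values) / len(values)


if __name__ == "__main__":
    freqs = [100.0, 200.0, 300.0, 400.0, 500.0]
    # Two toy "frames" of one segment: a symmetric peak and a low-heavy tilt.
    frames = [[1.0, 2.0, 4.0, 2.0, 1.0], [4.0, 2.0, 1.0, 0.5, 0.25]]
    per_frame = [spectral_moments(freqs, m) for m in frames]
    for c, s, sk, k in per_frame:
        print(f"centroid={c:.1f} spread={s:.1f} skew={sk:.3f} kurt={k:.3f}")
    # A "CentroidMean"-style value under the averaging assumption above.
    print("mean centroid:", segment_mean([p[0] for p in per_frame]))
```

For the symmetric toy frame, the centroid is 300 Hz, the skewness is 0 and the kurtosis is about 2.5. The low-heavy frame has a lower centroid and a positive skewness, because its tail points towards the high bins.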

Great, thanks Diemo. This is really helpful!

For anyone else looking for these, here are the direct links:
G. Peeters. A Large Set of Audio Features for Sound Description … (similarity and classification) in the CUIDADO project. Technical Report version 1.0, Ircam – Centre Pompidou, Paris, France, April 2004.