Is there anyone anywhere who has some detailed information on how the CataRT descriptors work exactly?
What happens during analysis for CentroidMean, AC1Mean, KurtosisMean, SkewnessMean, etc.?
Duration, Loudness, Frequency and Energy are more intuitive, but it would still be helpful to read a more detailed explanation of each descriptor - does this exist anywhere?
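For reference, here is my current understanding of what these descriptors typically compute, using the standard MIR definitions: Centroid, Skewness and Kurtosis as moments of the magnitude spectrum, AC1 as the first-lag autocorrelation coefficient, and the "*Mean" variants as per-frame values averaged over a unit. This is only a sketch under those assumptions - the actual IRCAM implementation may differ in windowing, normalisation or scale, and the function names and frame sizes here are illustrative:

```python
import numpy as np

def frame_descriptors(frame, sr, n_fft=1024):
    """Textbook per-frame descriptors (assumed, not the IRCAM code):
    spectral moments of the magnitude spectrum + lag-1 autocorrelation."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    p = spec / (spec.sum() + 1e-12)          # magnitudes as a distribution
    centroid = np.sum(freqs * p)             # 1st moment: spectral centre of mass
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))   # 2nd moment
    skewness = np.sum((freqs - centroid) ** 3 * p) / (spread ** 3 + 1e-12)
    kurtosis = np.sum((freqs - centroid) ** 4 * p) / (spread ** 4 + 1e-12)
    # AC1: normalised time-domain autocorrelation at lag 1
    f = frame - frame.mean()
    ac1 = np.dot(f[:-1], f[1:]) / (np.dot(f, f) + 1e-12)
    return centroid, skewness, kurtosis, ac1

def unit_means(signal, sr, frame_len=1024, hop=512):
    """The '*Mean' descriptors: per-frame values averaged over the unit."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    vals = np.array([frame_descriptors(f, sr) for f in frames])
    return dict(zip(["CentroidMean", "SkewnessMean", "KurtosisMean", "AC1Mean"],
                    vals.mean(axis=0)))
```

If this matches the real analysis even roughly, then e.g. a pure 440 Hz sine should give a CentroidMean near 440 Hz and an AC1Mean close to 1 - but I'd like to confirm against the actual documentation.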
The documentation is quite minimal:
Following the trail of links from this page,
I found this paper, described as the “most complete article, with architecture and applications”:
Principles and Applications of Interactive Corpus-Based Concatenative Synthesis
It has a list of the descriptors on p. 4, which references a paper by Julien Bloit - Analyse temps réel de la voix pour le contrôle de synthèse audio (real-time voice analysis for audio synthesis control) - but this isn’t publicly available anywhere.
The only link I could find was on this page;
it directs to here, but the “Online version” URL is broken…
The appendix of this master’s thesis has some more general information on descriptors used in concatenative synthesis, but it’s not based on the Ircam implementation.
Are the descriptors used in the CataRT-MuBu Max package documented in detail anywhere?