Sound descriptors references?

Hi there,
I’d like to know if you have any links or references for the sound descriptors used with the usual MuBu segmentation.

Here is how I understand them (or don’t):

  • frequency mean : pitch
  • energy mean : like a mean of all bark energy bins?
  • periodicity mean : ?
  • AC1 mean : ?
  • loudness mean : perceived “volume”
  • centroid mean : spectrum barycenter
  • spread mean : how the spectrum spreads around the centroid (kind of noisiness)
  • skewness mean : kind of measure of the asymmetry of the spectrum around the centroid
  • kurtosis mean : ?

If you can give some “perception or musical or more listener understandable” interpretations of skewness, kurtosis, periodicity and AC1, I’d be happy :slight_smile:

Links are welcome, we can read them :slight_smile:

I’ll put this in pipo.descr help, if it is clear enough:

The first 4 descriptors are from yin pitch tracking:

  • Frequency: the f0 or fundamental frequency
  • Energy: linear signal energy
  • Periodicity [0…1]: f0 quality factor, pitch-ness
  • AC1 [0…1]: first order autocorrelation coefficient, corresponds to spectral tilt, related to vocal or instrumental effort. Very effective e.g. for trumpet “brassyness”.
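
To make the link between AC1 and spectral tilt concrete, here is a minimal sketch of the textbook normalized first-order autocorrelation coefficient. It is not taken from the pipo source: the function names `ac1` and `sine`, the frame length, and the test frequencies are all assumptions for illustration. With this textbook definition, adding high-frequency energy lowers the value; pipo's own scaling and orientation to [0…1] may differ, so compare trends rather than absolute numbers.

```python
import math


def ac1(frame):
    """Textbook normalized lag-1 autocorrelation: r(1) / r(0).

    Assumption: this is NOT pipo's exact formula, only the standard
    definition, which lies in [-1, 1].
    """
    energy = sum(x * x for x in frame)  # r(0)
    if energy == 0.0:
        return 0.0  # silent frame: no meaningful coefficient
    lag1 = sum(a * b for a, b in zip(frame, frame[1:]))  # r(1)
    return lag1 / energy


def sine(freq_hz, sample_rate=44100, n=2048):
    """Hypothetical helper: n samples of a unit-amplitude sine."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# A dull tone, and the same tone with a strong high partial added.
dull = sine(200.0)
bright = [d + 0.8 * h for d, h in zip(dull, sine(6000.0))]

print("dull  :", round(ac1(dull), 3))
print("bright:", round(ac1(bright), 3))
```

The two frames differ only in their high-frequency content, so the change in the coefficient reflects the change in spectral tilt and nothing else.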

The next descriptors are derived from the signal spectrum:

  • Loudness [cB]: ITU-R 468 weighted FFT to respect the frequency-specific sensitivity of the human ear.
  • Centroid: brilliance (1st spectral moment)
  • Spread: width of spectrum (2nd spectral moment)
  • Skewness: imbalance around centroid (3rd spectral moment)
  • Kurtosis: skirt width (4th spectral moment)
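
For readers who prefer to see the four spectral moments spelled out, here is a rough sketch that treats the normalized magnitude spectrum as a distribution over frequency. The function name `spectral_moments`, the toy spectra, and the choices of magnitude (rather than power) weighting and of standard deviation for the spread are assumptions, not necessarily what pipo.descr does internally.

```python
import math


def spectral_moments(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of a magnitude spectrum.

    Assumption: bins are weighted by magnitude; an implementation may use
    power instead, or report variance rather than standard deviation.
    """
    total = sum(mags)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [m / total for m in mags]

    # 1st moment: where the spectrum "balances" (brilliance).
    centroid = sum(f * w for f, w in zip(freqs, weights))
    # 2nd central moment: how wide the spectrum is around the centroid.
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    spread = math.sqrt(variance)
    if spread == 0.0:
        return centroid, 0.0, 0.0, 0.0  # single-bin spectrum
    # 3rd and 4th standardized moments: imbalance and tail weight.
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis


freqs = [100.0 * k for k in range(1, 6)]  # toy bins at 100..500 Hz
symmetric = [1.0, 2.0, 3.0, 2.0, 1.0]     # balanced around 300 Hz
low_heavy = [5.0, 3.0, 1.0, 0.5, 0.5]     # energy piled up at the bottom

print(spectral_moments(freqs, symmetric))
print(spectral_moments(freqs, low_heavy))
```

The symmetric spectrum gives a skewness of about zero, while the low-heavy one gives a positive skewness, because its tail extends toward the high frequencies.
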
Clear enough.
I don’t know how to use AC1 btw.

Sorry for having missed the pipo.descr help…

As always, many of these will show fine nuances only when the big ones (loudness, pitch, brilliance, periodicity) are constant.
For AC1, try shouting, or a trumpet going from mf to fff: loudness saturates, but AC1 increases.