
Transposition using command line

Hello Antoine

After Frederic’s long mail you already have most of the information, so just a few comments:

The supervp command line you learned at IRCAM is not for an old version; it just uses different options. The optimal window size depends on the sound material. I can only repeat Frederic’s suggestion to try out different settings in AS and use those you are happy with. One question remained about -Afft 70: this means use an autoregressive (LPC) model for envelope estimation, with AR model order 70. In nearly all cases you get better results with -Atenv instead of lpc for envelope estimation. You then have to select the maximum fundamental frequency as the order parameter, but you can do this in AS as well (process file dialog).

-FcombineMul affects filtering operations only. It’s a noop if there are no filter operations active.

As Frederic said: please be sure to activate -shape 1 only for monophonic speech and singing voice.

Now to your latest questions:

I never obtained a command line with an understandable size of the window, for example…
-M0.0227272734045982s means window size is 0.0227272734045982 seconds. Why is this not understandable?

  1. It looks like I cannot use a fundamental analysis to improve the result of the transposition; I get an error (42) (cannot find F0 data).
    Sure you can, but you need to set the correct path. It is difficult to answer here, as I don’t see the command you ran.
  2. If I manually edit the command line from the -transke factor, how will I calculate the -D option? Because obviously, transposing the sound up or down chromatically implies stretching the sound accordingly…

Your statement “chromatically implies stretching the sound accordingly…” is not correct. -transke automatically compensates for the stretching it implies, so it does not affect the duration of the sound. If you don’t want to time stretch, you don’t need to change -D1 at all. (For your information: the only transposition that does not compensate the stretching is -transnc, which means -trans with no time correction.)
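To make the difference concrete, here is a small arithmetic sketch (not supervp code) of what would happen to the duration without time compensation, as with tape playback. It assumes that a transposition by the ratio 2^(cents/1200) divides the duration by that ratio; the function name `duration_without_compensation` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: duration (seconds) after a tape-style transposition
# with no time compensation. Assumes the playback speed-up equals the
# frequency ratio 2^(cents/1200), so the duration is divided by it.
duration_without_compensation() {
    awk -v d="$1" -v c="$2" 'BEGIN { printf "%.3f\n", d / exp(log(2) * c / 1200) }'
}

duration_without_compensation 5 1200    # one octave up: 2.500
duration_without_compensation 5 -1200   # one octave down: 10.000
```

For a 5-second sound this gives 2.5 s one octave up and 10 s one octave down; with -transke the duration stays at 5 s, so -D1 can be left alone.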

  3. Is it Frederic who answers me because I wrote in English? I am asking because perhaps Axel would know things that Frederic does not… and I can ask my questions in French as well…

No, I prefer English. But I simply had no time to read and answer, and Frederic was so kind as to help out.

Best
Axel

Sorry, this is misleading:

In nearly all cases you get better results with -Atenv instead of lpc for envelope estimation.

It should be

In nearly all cases you get better results with tenv instead of lpc for envelope estimation.

If the maximum f0 of your sound is 200Hz, then you select tenv by means of -Afft 200Hz.
So the change is only visible in the order selection.

Axel

Hello,
Thank you Axel for your explanations.
I was able to do what I wanted… except that the results are a bit surprising, so I tried to explain them… please confirm.

  1. I produced a low A (55Hz) using frequency modulation in Max/MSP and wrote that sound to disk.
  2. I then transposed the sound up chromatically, up to 3200Hz. Up to around 440Hz the transposition works well. At around 880Hz I no longer hear an A: I hear an A, then a Bflat where the sound is loudest, then back to A. Why is that? Is it because of how the ear perceives things, or is it that AudioSculpt cannot transpose that much?
  3. At around 1600Hz and above I hear only silence plus a low noise. I have no explanation for that. Perhaps the harmonic partials are too high to be processed… It takes my computer about 50 seconds to transpose my 5-second sound that high.
  4. For almost all my transpositions I need to normalize the output sound, otherwise I get clicks.
  5. I wish the output sounds could be named automatically, because manually naming 100 files on the command line is annoying (but I agree that inventing and implementing AudioSculpt must be quite a bit of work too, so I should not complain ;-))

Regards,

Antoine Escudier

Hello Antoine,

Unfortunately I cannot tell you much, as I don’t have the input sound you used and don’t know the command lines. I would guess that the sound has a rather low bandwidth, and in such cases envelope preservation is not very appropriate. The extreme case is a stationary sinusoid: its spectral envelope is large only around the sinusoid’s own frequency, so if you transpose it up strongly while preserving the envelope, the transposed sinusoid lands where the envelope is nearly zero, and the result is practically silence.

This, however, does not fit with your remark that you need to normalise to avoid clicks (clipping), because what I described above should lead to decreasing amplitude, not to clipping. It would be best if you put your input sound somewhere for download and post here the command line you use, for example, to reach 880Hz.

  5. Since you use the command line, there are numerous scripting languages you could use to automatically generate command lines and output file names. In a shell you could use

```sh
transp=100
while [ $transp -lt 2000 ] ; do
  supervp … -S infile.wav … -transke $transp ./outdir/outfilename_$transp.wav
  transp=$((transp+100))   # next transposition step, in cents
done
```

which will generate all commands and name all files reasonably. You could also use “bc” (see man bc) to calculate the output pitch and name the files according to pitch. If you want note names, then you are better off using a more advanced language such as Python, Perl or Ruby to generate names and command lines.
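For a fixed chromatic scale, a plain sh `case` statement is already enough to produce note names. The sketch below only prints the commands (a dry run). Assumptions, all hypothetical: the input is FM02_055.aiff with f0 = 55 Hz (A1, taking A4 = 440 Hz), the flags are copied from the command Axel suggests later in this thread, and the names `note_name` and `chromatic_commands` are invented for illustration.

```sh
#!/bin/sh
# Dry run: print one supervp command per chromatic step above the original,
# naming each output file after its note and rounded frequency.
# Remove "echo" to actually run supervp.
infile=FM02_055.aiff
base_hz=55

note_name() {  # $1 = pitch class, 0 = C ... 11 = B ("s" stands for sharp)
    case $1 in
        0) echo C ;;  1) echo Cs ;;  2) echo D ;;  3) echo Ds ;;
        4) echo E ;;  5) echo F ;;   6) echo Fs ;; 7) echo G ;;
        8) echo Gs ;; 9) echo A ;;  10) echo As ;; 11) echo B ;;
    esac
}

chromatic_commands() {  # $1 = number of semitones above the original
    semi=0
    while [ "$semi" -le "$1" ]; do
        cents=$((semi * 100))
        pc=$(( (9 + semi) % 12 ))          # A is pitch class 9
        octave=$(( 1 + (9 + semi) / 12 ))  # the original A is in octave 1
        hz=$(awk -v f="$base_hz" -v c="$cents" \
            'BEGIN { printf "%.0f", f * exp(log(2) * c / 1200) }')
        out="./FM02_$(note_name "$pc")${octave}_${hz}Hz.aiff"
        echo supervp -trans "$cents" -Z -Afft -S "$infile" \
            -Wblackman -N4096 -M4000 -norm "$out"
        semi=$((semi + 1))
    done
}

chromatic_commands 12
```

The first line printed ends in ./FM02_A1_55Hz.aiff and the last one in ./FM02_A2_110Hz.aiff.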

Best
Axel

Hello,

Attached are the sound file and the command lines.

Regards,

Antoine Escudier

FM01.txt (8.87 KB)

The sound file was not attached. I suspect the forum site prevents such attachments. Could you put it into a zip file and attach that instead?

Hello,

Here is the zipped sound file again.

Regards,

Antoine Escudier

FM01_055.aif_.zip (640 KB)

Hello Antoine,

To understand the results you have obtained, you can use AudioSculpt to see what the different options on your command line imply.
First you have -Afft 100. This means you estimate the spectral envelope by means of an LPC with order 100. You can load the input sound into AS and make a spectrogram analysis with LPC and order 100, selecting a window size of 4000. This shows you the envelope that will be preserved while transposing pitch.

You can see first that your sound does not have a lot of headroom. This is the reason why transposition with envelope preservation risks producing clipping. When you transpose with envelope preservation (-transke), you had better make sure your input sound has a bit more headroom; otherwise, when sinusoids are relocated with respect to the envelope formants, you may increase the sound amplitude, which will then lead to clipping.

You may notice that the envelope has a few peaks between 0 and 600Hz and attenuates rather quickly above 600Hz. This means that whenever you transpose above 600Hz, the transposed sound will change its timbre significantly. In fact, because the envelope decreases so strongly, and because for a pitch above 600Hz the second partial lies far above the first, in the strongly attenuated region, you will basically generate a modulated sinusoid in these cases. For even higher pitches the fundamental becomes small compared to the noise around DC, which is transposed into the main resonances of the envelope. For transpositions that move the pitch to intermediate values, on the other hand, the alignment of the sinusoids with the formant structure (the peaks in the envelope) can create amplifications that lead to clipping.

So the -norm flag is essential here.

  2. I then transpose the sound up chromatically up to 3200Hz. Up to around 440Hz the transposition works well. At around 880Hz I do not hear a A anymore, I hear an A and then where the sound is loudest I hear a Bflat then back to A. Why is that? Is it because of how the ear perceives things or is it that audiosculpt cannot transpose that much?

I don’t get this. I tried transposing by 4800 cents using your command line, and I get a sound that stays at 880Hz all the time. At around 2s there is a small frequency modulation, which is most likely due to the fact that the fundamental changes sign at that time, which then turns into a frequency modulation in the phase vocoder representation. You can also see that the noise around DC in the original sound starts to dominate the transposed sound, because the fundamental is attenuated by the envelope while the noise around DC is still located in the strong part of the envelope.

A question that only you can answer is whether you really want to preserve the envelope (which leads to single sinusoids and, for the strongest transpositions, only the DC noise). Alternatively, you may choose to transpose pitch and envelope together (-trans instead of your -transke). As I don’t know what you want to achieve, I cannot tell whether this is more interesting, but for this kind of synthetic sound, and especially for transpositions well above one octave, preserving the envelope is not necessarily the best option, because it leads to dramatic changes of timbre.

I hope I have been able to explain all effects that you observe.
If not let me know what is not clear.

Best
Axel

Hello,
First let me tell you that I am flattered and embarrassed to have such good scientists helping me…
From what you wrote, I came up with a different command line that produces an error… see attachment.
I am also attaching the sound I got, which goes up to Bflat and back. Perhaps this is a problem due to Time Machine running on my Mac, or perhaps I call Bflat what you call modulation…
What I am trying to achieve is the same sound transposed on every degree of a chromatic scale. If you give me the command line, that might save you time, and perhaps I will understand the rest on my own. I did make some effort, though, trying supervp -h, supervp -hi and so on… I understand some things but not everything.
Best regards,
Antoine Escudier

error.tiff (127 KB)

I think the problem is that -Atenv will output the envelope, which is not what you want. If you go back to Axel’s correction of his post above, he wrote:

If the maximum f0 of your sound is 200Hz, then you select tenv by means of -Afft 200Hz.

So you might want to try something like “-Afft 55Hz”. I’m likely to get this wrong, so you’d better look at the help for the analysis options. There might also be an “advanced” option related to “true envelope”, or something like that, somewhere in the analysis panel of AudioSculpt, which would give you the proper syntax.

Hi Frederic

-Afft 55Hz is perfect if true envelope analysis is desired. But in Antoine’s last trials there is no -transke, so he transposes the envelope, and none of these envelope parameters has any impact.

I think the command line should be

```sh
supervp -trans 6000 -Z -Afft -S FM02_055.aiff -Wblackman -N4096 -M4000 -norm ./FM02_6000.aiff
```

Best
Axel

Hello,
Sorry to disturb you again. I tried to solve the example below myself but was not successful.
The command line

```sh
supervp -t -Z -S…-trans -100 …
```

works. However, if you do

```sh
transp=-100
supervp -t -Z -S…-trans $transp …
```

it does not work…
The help reference did not help me… I tried various things, with no success.
Thanks again.
Regards,
Antoine Escudier

Hello,

However if you do:…
In fact, if I do this it works; I have tried it :-)

What do you mean when you say the command does not work? Does it quit with an error, or is the transposition not performed?
Can you copy and paste the list of commands you type after transp=-100, together with the error message you get?
If there is no error message, what is the problem?

Best
Axel

Hello Axel,

You are right, if you do transp=-100 it works…
If you do transp=-0.01 it works too, but then you get the error when transp=-.02.
But if you do transp=-.01 it does not work. I guess this is because the period has a special meaning under Unix.
Below is the beginning of my script; you are welcome to try it (and thank you for telling me about bc):
```sh
transp=-.01
while [ $(echo "$transp>=-0.03" | bc -l) -ne 0 ] ; do
supervp -t -Z -S"/Users/Ant/Music/AudioSculpt/Sounds/pizz_mi_vln01_net.aif" -Afft -Np0 -M0.0909091010689735s -oversamp 8 -Wblackman -norm -P0 -td_ampfac 1 -FCombineMax -shape 1 -trans $transp "pizz_mi_vln01_0.$transp.aif"
transp=$(echo "$transp-0.01" | bc -l)
echo $transp
done
```
Below is the error after running the loop once, if you initialize transp=-0.01:

```text
-.02
Initializing…
error opening bpfile /Users/Ant/Music/AudioSculpt/Parameters/-.02 for reading
```

Regards,

Antoine Escudier

Hello Antoine,

Indeed, -.01 does not work. With -trans you can give either constants (as numbers) or file names that are supposed to contain a time-varying transposition. The file name variant is selected if neither of the first two characters is a digit, which is why supervp tried to open a bpfile named -.02. You can add the leading zero to the bc output as follows:

```sh
transp=-0.01
while [ $(echo "$transp>=-0.03" | bc -l) -ne 0 ] ; do
  supervp…
  # bc prints -.02 for -0.02; the sed call restores the leading zero
  transp=$(echo "$transp - 0.01" | bc -l | sed 's/^-\./-0./')
  echo "$transp"
done
```
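To check which arguments fall under this rule before running a long batch, the rule as stated here can be mirrored in a few lines of sh. This re-implements the rule described above; it is not taken from supervp itself, and the name `trans_arg_kind` is hypothetical.

```sh
#!/bin/sh
# Hypothetical check: a -trans argument counts as a constant if at least one
# of its first two characters is a digit, otherwise as a BPF file name.
trans_arg_kind() {
    case $1 in
        [0-9]* | ?[0-9]*) echo constant ;;
        *) echo file ;;
    esac
}

for arg in -100 -0.02 -.02 .5 shift.bpf; do
    printf '%s -> %s\n' "$arg" "$(trans_arg_kind "$arg")"
done
```

Only -.02 and shift.bpf are classified as file names; after the sed call above, -.02 becomes -0.02 and counts as a constant.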

Best
Axel

Hello Axel,

It is working for me now, so I was able to try several things. However, I am surprised by the results:

  1. I transposed my frequency modulation sound (the one I attached to this thread some time ago), with its fundamental at 55Hz, by 0.01 cent and played the original and the transposed sound simultaneously: I do not hear any significant effect. Going up in steps of 0.01 up to 1 cent, nothing significant can be heard. Then, going up in steps of 1 cent, I begin to hear interesting effects from around 10 cents. Remembering my courses of 20 years ago, I expected to hear first chorusing, then phasing, then beating. I have no idea what effect I am actually hearing when playing the non-transposed sound together with the sound transposed 20 cents up (it sounds like beating)… Hence my question: is there an accurate formula for how much a sound must be transposed to get chorusing, phasing or beating?
  2. Since AS allows pitch shifting, and considering that pitch shifting is a transposition that does not keep the frequency ratios between partials, I thought that by strongly pitch shifting a harmonic sound I would get an inharmonic sound. I also thought that by pitch shifting a small amount and playing the original and the pitch-shifted sound together, I could get interesting effects. So far, no luck… Can you confirm whether my reasoning is correct for both assertions? If yes, then I will just try again and again…
  3. May I hope to get answers to further questions during July?

Regards,

Antoine Escudier

Hi Antoine,

  1. Flanging and chorus effects are generally realised with time-varying delay lines. You will find a very good explanation of these effects in Julius O. Smith’s online book here:

https://ccrma.stanford.edu/~jos/pasp/Time_Varying_Delay_Effects.html

The central point is that delays create a comb filter effect. The difference between delay lines and transposition is that with transposition the different sinusoids are displaced in frequency and therefore cannot cancel each other completely. Moreover, the frequency displacement increases with the frequency of the sinusoids, and the phase displacement varies over time differently for different frequencies. All this means that the regular structure of the comb filter effect cannot be created with the standard AS transposition.

You can get the effect you need in AS very easily by doing a transposition (without envelope preservation) with the time compensation deactivated in the transposition dialog. Deactivating this compensation produces the historical effect of playing tapes faster or slower, and is therefore exactly what you need. You can then modify the transposition over time with the BPF editor. I have never tried this, but you should be able to generate the flanging effect without problems. For chorusing you would simply superimpose more than two transformed signals using different transposition functions.
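The comb filter mentioned above can be made concrete with a little arithmetic. Assuming an equal-gain mix of a signal with a copy delayed by a fixed τ (a plain feedforward comb, nothing specific to AS), the two copies cancel at odd multiples of 1/(2τ); sweeping τ over time moves these notches, which is what is heard as flanging. The function name `comb_notches` is hypothetical.

```sh
#!/bin/sh
# Notch frequencies (Hz) of y(t) = x(t) + x(t - tau) for a delay in ms:
# cancellation happens where the delay is an odd number of half periods,
# i.e. at f = (k + 1/2) / tau for k = 0, 1, 2, ...
comb_notches() {
    awk -v ms="$1" -v n="$2" 'BEGIN {
        tau = ms / 1000
        for (k = 0; k < n; k++) printf "%.1f\n", (k + 0.5) / tau
    }'
}

comb_notches 1 3    # 1 ms delay: notches at 500, 1500 and 2500 Hz
```

The notches are evenly spaced, which is exactly the regular structure that a transposition cannot reproduce.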

As a little note, I’d like to mention that transposing a 100Hz sinusoid by 0.01 cents produces a frequency difference of about 0.000578Hz. That means the period of the beating would be nearly 30min; for another partial at 1000Hz the period would become about 3min.
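These numbers can be reproduced with two awk one-liners: the frequency difference is f·(2^(c/1200) − 1), and the beat period is its inverse. The function names `beat_hz` and `beat_period_min` are hypothetical.

```sh
#!/bin/sh
# Frequency difference (Hz) between a sinusoid at f Hz and a copy
# transposed by c cents: f * (2^(c/1200) - 1).
beat_hz() {
    awk -v f="$1" -v c="$2" 'BEGIN { printf "%.6f\n", f * (exp(log(2) * c / 1200) - 1) }'
}
# Beat period in minutes: the inverse of that difference, divided by 60.
beat_period_min() {
    awk -v f="$1" -v c="$2" 'BEGIN { printf "%.1f\n", 1 / (f * (exp(log(2) * c / 1200) - 1)) / 60 }'
}

beat_hz 100 0.01            # 0.000578 Hz
beat_period_min 100 0.01    # 28.9 minutes
beat_period_min 1000 0.01   # 2.9 minutes
```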

  2. The term pitch shifting is unfortunate because it is misleading. I don’t think we use it in AS. We have named the transformation “transposition”, and this transposition preserves the frequency ratios between all sinusoids. On the other hand, we also have a frequency shifting operation (similar to what may elsewhere be called ring modulation); you find this effect in the same transformation menu of AS. This effect shifts the spectrum and therefore creates inharmonicity.

  3. I will be on holiday in July, but I will certainly read my mail regularly and will try to respond.
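Why frequency shifting creates inharmonicity while transposition does not can be seen from the first harmonics of a 55 Hz tone. This is plain arithmetic, not AudioSculpt code: transposition multiplies every frequency by the same ratio, frequency shifting adds the same number of Hz. The function name `compare_shift` is hypothetical.

```sh
#!/bin/sh
# For harmonics h = 1..4 of f0 print:
#   h, transposed frequency, its ratio to the transposed fundamental,
#   shifted frequency, its ratio to the shifted fundamental.
# Transposition multiplies by 2^(cents/1200); frequency shifting adds df Hz.
compare_shift() {
    awk -v f0="$1" -v cents="$2" -v df="$3" 'BEGIN {
        r = exp(log(2) * cents / 1200)
        for (h = 1; h <= 4; h++) {
            t = h * f0 * r
            s = h * f0 + df
            printf "%d %.2f %.2f %.2f %.2f\n", h, t, t / (f0 * r), s, s / (f0 + df)
        }
    }'
}

compare_shift 55 1200 20    # one octave up versus a 20 Hz shift
```

The transposed partials keep the ratios 1, 2, 3, 4, while the shifted ones (75, 130, 185, 240 Hz) give 1, 1.73, 2.47, 3.20, which are no longer integer multiples.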

Best
Axel

Hello Axel,

Thank you very much for Julius O. Smith’s online book!
Thank you also for explaining that it makes no sense to transpose by 0.01 cents. But since I do not know how to go from Hz to cents or how to infer the period of the beating, I just try anything, as many other musicians probably do… Is there somewhere a summary of useful simple calculations that I could do without asking you?
2)
I apologize, I did mean frequency shifting when I used the term pitch shifting.

Thank you a thousand times for taking the time to answer me.

Best regards,

Antoine Escudier

Hi Antoine,

The cents measure specifies a ratio in units of semitone steps times 100 (1 semitone = 100 cents).
The formula to go from cents to ratios is

rat = 2^(cents/1200)

so for 1200 cents you find

rat = 2^(1200/1200) = 2^1 = 2

that means 1200 cents == 1 octave.
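The formula above and its inverse for going from two frequencies in Hz to cents, cents = 1200 · log2(f2/f1), fit into two small shell functions (hypothetical names, using awk’s `exp` and `log`):

```sh
#!/bin/sh
# Ratio for an interval in cents: 2^(cents/1200).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.6f\n", exp(log(2) * c / 1200) }'
}
# Interval in cents between two frequencies: 1200 * log2(f2 / f1).
hz_to_cents() {
    awk -v f1="$1" -v f2="$2" 'BEGIN { printf "%.2f\n", 1200 * log(f2 / f1) / log(2) }'
}

cents_to_ratio 1200   # 2.000000 (one octave)
cents_to_ratio 100    # 1.059463 (one semitone)
hz_to_cents 55 880    # 4800.00, the -trans value that takes 55 Hz to 880 Hz
```

For example, hz_to_cents 55 1760 gives 6000.00, the value used in the -trans 6000 command earlier in this thread.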

Now to the beating. Beating occurs when overlapping sinusoids cancel or reinforce each other because their phase relation changes. If you observe this in AS, the critical factor is the window size: the smaller the window, the larger the frequency distance between two sinusoids that still show up as beating.
The period of one cycle of the phase relation between two sinusoids is the inverse of their frequency distance. So if you start with a sinusoid at F=100Hz, transpose it by 100 cents and listen to the original mixed with the transposed sound, the frequency difference will be about 6Hz, and so the beating frequency will be about 6Hz as well.

Whether or not you hear two sinusoids beating obviously does not depend on the window size you use to analyze the sounds in AS; it depends on sound perception in the ear. As you may know, you can express two sinusoids equivalently as one amplitude-modulated sinusoid or as two stationary sinusoids. Which frequency ranges lead to the perception of beating is a psychoacoustical question outside my scope. I am pretty certain that for a frequency difference above 50Hz you don’t hear beating but two sinusoids, and at the other end, below 12Hz, you will hear beating (amplitude tremolo). I am not sure which processes in the ear create these limits. Maybe the critical bands are involved?
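The equivalence between two sinusoids and one amplitude-modulated sinusoid is the sum-to-product identity. Written out for two sinusoids of equal, unit amplitude at frequencies f1 and f2 (equal amplitudes assumed for simplicity):

```latex
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  = 2\cos\big(\pi (f_1 - f_2)\, t\big)\,\sin\big(\pi (f_1 + f_2)\, t\big)
```

Read from right to left, this is a single sinusoid at the mean frequency (f1 + f2)/2 whose amplitude envelope |2cos(π(f1 − f2)t)| repeats every 1/|f1 − f2| seconds, so the loudness fluctuates |f1 − f2| times per second: about 6 times per second in the 100Hz / 100 cents example.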

Best
Axel