Transposition using command line

Hello,
I would like to do numerous transpositions and thus put the command lines in a shell script and execute that shell script. However after executing a transposition and looking at the command line, I find this :
supervp -t -Z -S"/Users/Ant/Music/AudioSculpt/Sounds/lagrave-out.aif" -Afft -Np0 -M0.0227272734045982s -oversamp 8 -Wblackman -norm -P1 -td_thresh 1.39970004558563 -td_G 2.5 -td_band 0,22050 -td_nument 10 -td_minoff 0.02s -td_mina 9.99999974737875e-06 -td_minren 0 -td_evstre 1 -td_ampfac 1 -FCombineMul -shape 1 -trans "/Users/Ant/Music/AudioSculpt/Temp/transpfile1" -D1 "/Users/Ant/Music/AudioSculpt/Sounds/lagrave-out 3.aif"

I would like to be able to give an input sound and, after running my shell script, obtain the same sound transposed chromatically to every note in a multiple octave scale. I took the ircam training so I understand some of the command line. However, for a transposition up of 100 cents, I was expecting to find somewhere something like:
-transp 100
which I could have edited to become (for each successive command line):
-transp 200
-transp 300
-transp 400
etc…
but no luck, it is not so simple…
I thought of comparing two command lines, one a half step up (command line above) and one a whole step up (command line below), but that does not help much :
supervp -t -Z -S"/Users/Ant/Music/AudioSculpt/Sounds/lagrave-out.aif" -Afft -Np0 -M0.0227272734045982s -oversamp 8 -Wblackman -norm -P1 -td_thresh 1.39970004558563 -td_G 2.5 -td_band 0,22050 -td_nument 10 -td_minoff 0.02s -td_mina 9.99999974737875e-06 -td_minren 0 -td_evstre 1 -td_ampfac 1 -FCombineMul -shape 1 -trans "/Users/Ant/Music/AudioSculpt/Temp/transpfile2" -D1 "/Users/Ant/Music/AudioSculpt/Sounds/lagrave-out 4.aif"
Any help welcome.
Regards,
Antoine Escudier

Your idea of using -trans 100, -trans 200… should work. AudioSculpt lets you define transposition factors that change over time, this is why it uses a file. This is the general case, but when the transposition factor is constant you can specify it on the command line, instead of a file name.

Let me know if you run into problems

Frederic Cornu

Hello Frederic
Thank you for your answer, but it does not answer my question at all. If you look at the two command lines that I pasted in my previous message, there is no -trans followed by a number… This is the problem.
Regards,
Antoine Escudier

Hello again Frederic
Sorry about my previous message. I just understood what you meant.
The strange thing in the command line that I obtained is that it does not look like the command line that I got when I did the training at Ircam:
supervp -t -Z -S"Voix-tibet1.aif" -Afft 70 -N4096 -M4000 -Wblackman -transke 500 -D3 "Voix-tibet1_500.aif"
Regards,
Antoine Escudier

AudioSculpt put in a lot of additional “advanced” options, if memory serves most come from the “apply transformation” panel. I don’t know whether these are useful for what you’re trying to do, but they are unrelated to the -trans issue.

Some of these differences are different ways of saying the same thing. For instance, AudioSculpt specifies the window size (-M) in seconds, while your training command line uses samples (though 4000 samples seems much bigger than 0.0227s), and -Np0 expresses the FFT size relative to the window size, while -N4096 sets the size in samples. -oversamp 8 sets the step size to one eighth of the window size; I’m not sure what the default is there.
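
To compare the two window sizes, it helps to convert between seconds and samples. The following is only a minimal sketch: it assumes a sampling rate of 44100 Hz (suggested by the 22050 upper limit in -td_band, but not stated anywhere in the thread), and the function names are made up for illustration.

```shell
#!/bin/sh
# Assumed sampling rate in Hz. It is not stated in the thread; the
# 22050 Hz upper limit of -td_band merely suggests 44100 Hz.
SR=44100

# Convert a window size in seconds to a (rounded) number of samples.
seconds_to_samples() {
    awk -v s="$1" -v sr="$SR" 'BEGIN { printf "%.0f\n", s * sr }'
}

# Convert a window size in samples to seconds.
samples_to_seconds() {
    awk -v n="$1" -v sr="$SR" 'BEGIN { printf "%.4f\n", n / sr }'
}

seconds_to_samples 0.0227272734045982   # the -M value written by AudioSculpt
samples_to_seconds 4000                 # -M4000 from the training command line
```

Under that assumption the AudioSculpt window is about 1002 samples, and the 4000-sample window lasts about 0.0907 s, roughly four times longer.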

Some are related to additional features. For instance, -P1 and all these -td_… options configure transient detection and preservation, and -shape enables “shape invariant” mode, which I think is most appropriate when transposing voice (you’d have to double check that, or just try it out). I think -norm will normalize the output so that the maximum amplitude is 1, but I could be wrong.

I’m not sure what the -FCombineMul does, I thought it only applied with two input sounds, and I don’t remember what the 70 does in -Afft 70.

There are also definite differences, the main one being that -D3 in the training command line will stretch the sound to 3 times the duration of the input. I also think -transke will transpose the pitch but not the envelope, whereas -trans will affect both.

I think a reasonable thing to do would be to use AudioSculpt to try options, and when you’re happy with the results, go back to your original idea, copy the command line, and replace the file name for -trans (or -transke) with the constant factors in your script

If you need more information, I could try to find out more about these options next week, the limitation being that I’m no audio expert, just a developer, so most of the advanced options are Greek to me. I mean, I don’t even know what transients are, so I won’t be able to explain the -td_… options. I think the command line supervp has a -h option that will give you more details, hopefully that can help

Hello Frederic,
Thank you for your long answer. I just spent one hour trying various settings and looking at the resulting command lines… You are right, the -td_… options have to do with transient preservation etc… But there are some options that I do not understand… Perhaps the command line given to me at the training at Ircam comes from an older version of AudioSculpt… I never obtained a command line with an understandable size of the window for example…
Anyway, I still run into other problems:

  1. It looks like I cannot use a fundamental analysis to improve the result of the transposition, I get an error (42) (cannot find F0 data)
  2. If I edit manually the command line from the -transke factor, how will I calculate the -D option? Because obviously, transposing the sound up or down chromatically implies stretching the sound accordingly…
  3. Is it Frederic who answers me because I wrote in English? I am asking because perhaps Axel would know things that Frederic does not…and I can ask my questions in french as well…
    Best regards,
    Antoine Escudier

Hello Antoine

After the long mail from Frederic you have most of the information already, so just a few comments:

The supervp command line you learned at IRCAM is not for an old version. It is just using different options. The optimal window size depends on the sound material, so I can only repeat Frederic’s suggestion to try out different commands in AS and use the settings you are happy with. There was also the remaining question about -Afft 70: this means using an autoregressive model (LPC) for envelope estimation, with AR model order 70. In nearly all cases you get better results with -Atenv instead of lpc for envelope estimation. You then have to select the maximum fundamental frequency as the order parameter, but you can do this in AS as well (process file dialog).

-FCombineMul affects filtering operations only. It’s a no-op if there are no filter operations active.

As Frederic said: please be sure to activate -shape 1 only for monophonic speech and singing voice.

Now to your latest questions:

> I never obtained a command line with an understandable size of the window for example…
-M0.0227272734045982s means window size is 0.0227272734045982 seconds. Why is this not understandable?

> 1. It looks like I cannot use a fundamental analysis to improve the result of the transposition, I get an error (42) (cannot find F0 data)

Sure you can, but you need to set the correct path. It is difficult to answer here, as I don’t see the command you run.

> 2. If I edit manually the command line from the -transke factor, how will I calculate the -D option? Because obviously, transposing the sound up or down chromatically implies stretching the sound accordingly…

Your statement “chromatically implies stretching the sound accordingly…” is not correct. -transke automatically compensates for the stretching it implies, so it does not affect the duration of the sound. If you don’t want to time stretch, you don’t need to change -D1 at all. (For your information: the only transposition that does not compensate for the stretching is -transnc, which means -trans with no time correction.)

> 3. Is it Frederic who answers me because I wrote in English? I am asking because perhaps Axel would know things that Frederic does not…and I can ask my questions in french as well…

No, I prefer English. I simply had no time to read and answer, and Frederic was kind enough to help out.

Best
Axel

Sorry, this is misleading:

In nearly all cases you get better results with -Atenv instead of lpc for envelope estimation.

It should be

In nearly all cases you get better results with tenv instead of lpc for envelope estimation.

If maximum f0 for your sound is 200Hz then you select tenv by means of -Afft 200Hz
So the change is only visible in the order selection.

Axel

Hello,
Thank you Axel for your explanations.
I was able to do what I wanted…except that the results are a bit surprising so I tried to explain these results…please confirm.

  1. I produced a low A (55Hz) using frequency modulation in max msp and wrote that sound to disk.
  2. I then transposed the sound up chromatically up to 3200Hz. Up to around 440Hz the transposition works well. At around 880Hz I do not hear an A anymore: I hear an A, and then where the sound is loudest I hear a Bflat, then back to A. Why is that? Is it because of how the ear perceives things, or is it that AudioSculpt cannot transpose that much?
  3. At around 1600Hz and up I hear only silence plus a low noise. I do not have an explanation for that. Perhaps the harmonic partials are too high to be treated… It takes about 50 seconds for my computer to transpose my 5-second sound that high.
  4. For almost all my transpositions, I need to normalize the output sound, otherwise I get clicks in the sound.
  5. I wish the output sounds’ name could be labeled automatically because naming manually 100 files in the command line is annoying (but I agree that inventing and implementing audiosculpt must be quite a bit of work also, so I should not complain ;-))

Regards,

Antoine Escudier

Hello Antoine,

Unfortunately I cannot tell you much as I don’t have the input sound you used and don’t know the command lines. I would guess that the sound has rather low bandwidth, and for these cases envelope preservation is not so appropriate. The extreme is a stationary sinusoid. Preserving the envelope of a stationary sinusoid that you transpose up will lead to silence because the envelope of a sinusoid has very small amplitude if you transpose strongly.

This, however, would not fit with your remark that you need to normalise to avoid clicks (clipping), because what I mentioned before should lead to decreasing amplitude and not to clipping. It would be best if you put your input sound somewhere for download and posted here the command line that you use, for example, to reach 880Hz.

  5. I understand you use the command line. There are numerous scripting languages you could use to automatically generate command lines and output file names. In a shell you could use

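# Transpose in 100-cent steps; fill in the elided options (…) yourself.
transp=100
while [ "$transp" -lt 2000 ] ; do
    # The output file name carries the transposition in cents.
    supervp … -S infile.wav … -transke "$transp" "./outdir/outfilename_$transp.wav"
    transp=$((transp + 100))
done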

which will generate all commands and name all files reasonably. You could also use “bc” (see man bc) to calculate the output pitch and name the files according to pitch. If you want note names, then you are better off using a more advanced language such as Python, Perl or Ruby to generate names and command lines.
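
For a simple chromatic scale, pitch and note names can still be computed in the shell. The sketch below is only an illustration: the helper names and the out_… file naming scheme are hypothetical, it uses awk rather than bc for the arithmetic, the base note A at 55 Hz is taken from the sound discussed in this thread, and equal temperament (f = f0 · 2^(cents/1200)) is assumed.

```shell
#!/bin/sh
# Hypothetical helpers for naming output files by pitch.

# Frequency in Hz after transposing base frequency $1 by $2 cents
# (equal temperament: f = f0 * 2^(cents/1200)).
cents_to_hz() {
    awk -v f0="$1" -v c="$2" 'BEGIN { printf "%.2f\n", f0 * 2 ^ (c / 1200) }'
}

# Note name for $1 cents above an A; $1 must be a non-negative
# multiple of 100. The octave is not included in the name.
cents_to_note() {
    semitone=$(( ($1 / 100) % 12 ))
    set -- A Bb B C Db D Eb E F Gb G Ab
    shift "$semitone"
    echo "$1"
}

# Print one candidate output file name per semitone over one octave.
cents=0
while [ "$cents" -le 1200 ]; do
    printf 'out_%s_%sHz.aif\n' \
        "$(cents_to_note "$cents")" "$(cents_to_hz 55 "$cents")"
    cents=$((cents + 100))
done
```

The printed names could then be used as the output file argument of each supervp call. Because the octave is left out of the note name, the frequency is what keeps names unique across octaves.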

Best
Axel

Hello,

Attached are the sound file and the command lines.

Regards,

Antoine Escudier

FM01.txt (8.87 KB)

The sound file was not attached. I suspect the forum site prevents such attachments. Could you put it into a zip file and attach that instead?

Hello,

Here is the zipped sound file again.

Regards,

Antoine Escudier

FM01_055.aif_.zip (640 KB)

Hello Antoine,

to understand the results you have obtained you can make use of AudioSculpt to see what the different options on your command line imply.
First you have -Afft 100. This means you estimate the spectral envelope by means of an LPC with order 100. You can load the input sound into AS and make a spectrogram analysis with LPC and order 100, selecting a window size of 4000. This shows you the envelope that will be preserved while transposing pitch.

You can see first that your sound does not have a lot of headroom. This is the reason why transposition with envelope preservation risks producing clipping. When you transpose with envelope preservation (-transke) you had better make sure your input sound has a bit more headroom; otherwise, when sinusoids are relocated with respect to the envelope formants, you may increase the sound amplitude, which will then lead to clipping.

You may notice that the envelope has a few peaks between 0 and 600Hz and attenuates rather quickly above 600Hz. This means that whenever you transpose above 600Hz the transposed sound will change its timbre significantly. In fact, due to the strong decrease of the envelope, and because for pitches above 600Hz the second partial is much higher than the first, you will basically generate a modulated sinusoid in these cases. For even higher pitches the fundamental becomes small compared to the noise around DC that is transposed into the main resonances of the envelope. Then, for transpositions that change the pitch to intermediate values, the alignment of sinusoids and the formant structure (the peaks in the envelope) can create amplifications that lead to clipping.

So the -norm flag is essential here.

> 2. I then transpose the sound up chromatically up to 3200Hz. Up to around 440Hz the transposition works well. At around 880Hz I do not hear a A anymore, I hear an A and then where the sound is loudest I hear a Bflat then back to A. Why is that? Is it because of how the ear perceives things or is it that audiosculpt cannot transpose that much?

I don’t have this. I tried transposing by 4800 cents using your command line and I get a sound that has 880Hz all the time. At around 2s there is a small frequency modulation, which is most likely due to the fact that the fundamental changes sign at that time, which then transforms into a frequency modulation in the phase vocoder representation. You can see as well that the noise around DC in the original sound starts to dominate the transposed sound, because the fundamental is attenuated by the envelope while the noise around DC is still located in the strong part of the envelope.

A question that you can only answer yourself is whether you really want to preserve the envelope (which will lead to having single sinusoids and later only the dc noise for the strong transpositions). Alternatively you may select to transpose pitch and envelope together (your -trans instead of -transke). As I don’t know what you want to achieve I can not tell whether this is more interesting, but for this kind of synthetic sounds and especially for transpositions well above one octave preserving the envelope is not necessarily the best option, because the preservation of the envelope leads to dramatic changes of the timbre.

I hope I have been able to explain all effects that you observe.
If not let me know what is not clear.

Best
Axel

Hello,
First let me tell you that I am flattered and embarrassed to have such good scientists helping me…
From what you wrote, I came up with a different command line that produces an error (see attachment).
I am also attaching the sound I got, which goes up to Bflat and back. Perhaps this is a problem due to Time Machine running on my Mac at the time, or perhaps I call Bflat what you call modulation…
What I am trying to achieve is the same sound transposed to every degree of a chromatic scale. If you give me the command line, that might save you time and perhaps I will understand on my own. I made some efforts though, trying supervp -h, supervp -hi and so on… I understand some things but not everything.
Best regards,
Antoine Escudier

error.tiff (127 KB)

I think the problem is that -Atenv will output the envelope, which is not what you want. If you go back to Axel’s correction of his post above, he wrote:

If maximum f0 for your sound is 200Hz then you select tenv by means of -Afft 200Hz

So you might want to try something like “-Afft 55Hz”. I’m likely to get this wrong, so you’d better look at the help for analysis options. There might also be an “advanced” option related to “true envelope” or something like that somewhere in the analysis panel of AudioSculpt, which would give you the proper syntax

Hi Frederic

-Afft 55Hz is perfect if true envelope analysis is desired. But in Antoine’s last trials there is no -transke, so he transposes the envelope as well, and all these envelope parameters don’t have any impact.

I think the command line should be

supervp -trans 6000 -Z -Afft -S FM02_055.aiff -Wblackman -N4096 -M4000 -norm ./FM02_6000.aiff

Best
Axel

Hello,
Sorry to disturb you again. I tried to solve the below example myself but was not successful.
the command line:
supervp -t -Z -S…-trans -100 …
works.
However if you do:
transp=-100
supervp -t -Z -S…-trans $transp …
it does not work…
The help reference did not help me…I tried various things…no success.
Thanks again.
Regards,
Antoine Escudier

Hello,

> However if you do:…

In fact, if I do this it works - I have tried :-)

What do you mean when you say the command does not work? Does it quit with an error, or is the transposition not performed?
Can you copy paste the list of commands you type after transp=-100 together with the error message you get?
If there is no error message what is the problem?

Best
Axel

Hello Axel,

You are right, if you do transp=-100 it works…
If you do transp=-0.01 it also works, but then you get the error when transp becomes -.02.
And if you do transp=-.01 it does not work. I guess this is because the period has a special meaning under Unix.
Below is the beginning of my script, you are welcome to try it. (And thank you for telling me about bc):
transp=-.01
while [ $(echo "$transp>=-0.03" | bc -l) -ne 0 ] ; do
supervp -t -Z -S"/Users/Ant/Music/AudioSculpt/Sounds/pizz_mi_vln01_net.aif" -Afft -Np0 -M0.0909091010689735s -oversamp 8 -Wblackman -norm -P0 -td_ampfac 1 -FCombineMax -shape 1 -trans $transp "pizz_mi_vln01_0.$transp.aif"
transp=$(echo "$transp-0.01" | bc -l)
echo $transp
done
Below is the error after doing the loop once if you initialize transp=-0.01:
-.02
Initializing…
error opening bpfile /Users/Ant/Music/AudioSculpt/Parameters/-.02 for reading

Regards,

Antoine Escudier