Source filter synthesis in SuperVP

roebel · February 9, 2021, 12:57pm

Hello,

Sorry, I lost track of the question.

Does a smaller f0 mean greater frequency resolution in the processing?

Yes, that is correct. A smaller F0 means you have more samples of the spectral envelope on the frequency axis. The frequency resolution is the F0, so a smaller F0 means a smaller step, which in turn means a higher resolution.
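
To make the resolution argument concrete, here is a minimal Python sketch. It is not part of SuperVP; the function name and the 5 kHz band are assumptions chosen only for illustration. It lists the harmonic frequencies at which a harmonic sound with a given F0 samples its spectral envelope.

```python
def envelope_sample_points(f0_hz, max_freq_hz):
    """Frequencies (Hz) of the harmonics k * f0 that lie at or below max_freq_hz.

    For a harmonic sound these are the only places where the spectral
    envelope is observed. Hypothetical helper, not a SuperVP function.
    """
    if f0_hz <= 0 or max_freq_hz <= 0:
        raise ValueError("f0_hz and max_freq_hz must be positive")
    count = int(max_freq_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


low_pitch = envelope_sample_points(100.0, 5000.0)
high_pitch = envelope_sample_points(400.0, 5000.0)
print(len(low_pitch), len(high_pitch))  # prints: 50 12
```

With F0 = 100 Hz the envelope is observed every 100 Hz (50 points below 5 kHz), with F0 = 400 Hz only every 400 Hz (12 points).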

In fact this question is a bit confusing. You probably need to distinguish between the F0 that is actually in the sound (which you cannot choose anyway) and the F0 that you choose as a parameter to get a spectral envelope for a given sound. I will denote these as SndF0 and EnvF0.

So first the SndF0: the higher the pitch, the lower the resolution. While you don’t hear this as a quality reduction (because you are used to it), it is in fact one, and it leads to problems in perception. You probably know this effect from soprano singers, whose sung text is often difficult to understand. A soprano sings so high that your perception does not manage to get the formant positions correctly, which in turn hinders understanding.

Now the EnvF0. Here we try to find the formants from the sampled filter envelope. We don’t do this for understanding (except, for example, in text recognition) but for sound modification. If we use

EnvF0 == SndF0

the spectral envelope will gather all the details that are available, so you get the best quality for all transformations. Let’s assume we want to do transposition. If we don’t transpose, all errors will be compensated, so it does not matter. If we transpose up, the necessary resolution decreases, so we have not lost anything, and in a first approximation you don’t perceive problems due to the subsampling of the envelope. If you transpose down, you would need more resolution to construct the features of a voice with the new pitch; you cannot get that, and the sound will be perceived as strange. Most of the time this generates an effect that resembles a voice speaking while the nose is pinched shut with the fingers from both sides. The more you transpose down from the original F0, the stronger the effect will be.
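
The transposition argument can be sketched in a few lines of Python. This is only the first approximation described here, not how SuperVP decides anything internally; both function names are hypothetical. The envelope is assumed to be sampled every EnvF0 Hz, and the transposed sound is assumed to need one envelope value per new partial.

```python
def transposed_f0(snd_f0_hz, semitones):
    """F0 after transposing by the given number of equal-tempered semitones."""
    return snd_f0_hz * 2.0 ** (semitones / 12.0)


def envelope_resolution_is_sufficient(env_f0_hz, target_f0_hz):
    """First approximation: the envelope is sampled every env_f0_hz, and the
    transposed sound needs one envelope value every target_f0_hz.
    Hypothetical helper, not a SuperVP function."""
    return target_f0_hz >= env_f0_hz


snd_f0 = 200.0
env_f0 = snd_f0  # the EnvF0 == SndF0 case
for semitones in (12, 0, -12):
    target = transposed_f0(snd_f0, semitones)
    ok = envelope_resolution_is_sufficient(env_f0, target)
    print(semitones, target, ok)
```

For the octave up and for no transposition the check passes; for the octave down the new partials are spaced 100 Hz apart while the envelope was only observed every 200 Hz, so the check fails.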

Now if you choose

EnvF0 > SndF0

Problems with the closed-nose effect will start earlier and will be stronger.

If you want to be very clever and choose

EnvF0 < SndF0

so that you try to extract more resolution than there is, then you will get even more problems, because the filter you estimate now also encodes the partial positions (which it should not), and when you transpose (up or down) you will get additional timbre modulations depending on the transposition you choose.
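
The three cases can be summarised in a small Python helper. It is only a qualitative restatement of the cases above; the function name and the 5% tolerance for treating the two F0 values as equal are assumptions, not SuperVP parameters.

```python
def describe_env_f0_choice(env_f0_hz, snd_f0_hz, rel_tolerance=0.05):
    """Summarise the expected behaviour for a given EnvF0 relative to SndF0.

    rel_tolerance is a hypothetical threshold for "equal enough";
    it is not a SuperVP parameter.
    """
    if env_f0_hz <= 0 or snd_f0_hz <= 0:
        raise ValueError("both F0 values must be positive")
    ratio = env_f0_hz / snd_f0_hz
    if abs(ratio - 1.0) <= rel_tolerance:
        return "matched: all available detail is captured"
    if ratio > 1.0:
        return "too high: detail is lost, closed-nose effect starts earlier"
    return "too low: envelope encodes partial positions, timbre modulations"


for env_f0 in (200.0, 300.0, 100.0):
    print(env_f0, describe_env_f0_choice(env_f0, snd_f0_hz=200.0))
```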

Obviously, all of this is only an approximation, but it should be good enough to explain the principal effects.

Best
Axel

srs · March 29, 2021, 3:25pm

Thanks for such a detailed clarification Axel, this is very helpful.

For the case of doing cross-synthesis using the supervp.sourcefilter (instead of transposition as in your example), would you recommend setting the maximal f0 for envelope estimation as the ‘fundamental frequency’ of the source and filter material?

And if either the source or filter material has melodic content so the fundamental frequency is constantly changing, is there an f0 that is ideal?

For example, when using a melodic solo guitar part as the filter for an orchestral texture source? Or is supervp.sourcefilter better suited to using tonal<>noise or noise<>tonal source/filter combinations?

roebel · May 21, 2021, 8:50pm

Hello @srs

For the case of doing cross-synthesis using the supervp.sourcefilter (instead of transposition as in your example), would you recommend setting the maximal f0 for envelope estimation as the ‘fundamental frequency’ of the source and filter material?

The envf0 flag always needs to be adapted to the sound for which you want to estimate the filter component of a source-filter model. When you work with the source-filter model, all sounds are seen as composed of source and filter. For your example you would normally first remove the filter component of the source sound to get only the source component of the source sound. For this you would set the envF0 parameter to the F0 of the source sound. Then, to extract the filter component of the filtering sound, you would estimate the filter using the F0 of the filter sound as the envF0 parameter. For the filtering itself you don’t need any other parameter, because the filtering works independently of any estimation.
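
The order of operations can be sketched for a single analysis frame in plain Python. This is a conceptual illustration only, not SuperVP's implementation: it assumes the two envelopes have already been estimated (each with the envF0 of its own sound) and are given as per-bin magnitudes. The function name and the `floor` guard are hypothetical.

```python
def cross_synthesis_magnitudes(source_mag, source_env, filter_env, floor=1e-12):
    """Per-bin magnitudes of one cross-synthesised frame.

    source_mag : magnitude spectrum of the source sound
    source_env : envelope of the source sound (estimated with its own F0)
    filter_env : envelope of the filtering sound (estimated with its own F0)
    floor      : hypothetical guard against division by zero
    """
    if not (len(source_mag) == len(source_env) == len(filter_env)):
        raise ValueError("all inputs must have the same number of bins")
    output = []
    for mag, s_env, f_env in zip(source_mag, source_env, filter_env):
        excitation = mag / max(s_env, floor)  # remove the source's own filter
        output.append(excitation * f_env)     # apply the other sound's filter
    return output


frame = cross_synthesis_magnitudes(
    source_mag=[2.0, 4.0],
    source_env=[2.0, 2.0],
    filter_env=[0.5, 1.0],
)
print(frame)  # prints: [0.5, 2.0]
```

Note that the last step uses no F0 at all: the F0 values matter only for estimating the two envelopes.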

Now to the question of the dynamic F0s. As explained above, you lose details if you are too high, and you risk artifacts that may become rather extreme if you are too low. So you had better stay at the upper end. I often position the envF0 parameter 30% below the maximum F0, because with 30% too low you generally don’t hear any artifacts.
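
As a minimal sketch of this rule of thumb in Python: the function below takes an F0 track and returns a value 30% below its maximum. The function name, and the convention that unvoiced frames are marked with 0, are assumptions for illustration; how you obtain the F0 track is up to you.

```python
def env_f0_from_track(f0_track_hz, margin=0.30):
    """Pick an envF0 value `margin` (30% by default) below the maximum F0.

    Frames with F0 <= 0 are treated as unvoiced and ignored (assumed
    convention). Hypothetical helper, not a SuperVP function.
    """
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    voiced = [f0 for f0 in f0_track_hz if f0 > 0]
    if not voiced:
        raise ValueError("no voiced frames in the F0 track")
    return max(voiced) * (1.0 - margin)


melody_f0 = [0.0, 220.0, 330.0, 440.0, 0.0]
print(env_f0_from_track(melody_f0))
```

For this track, whose highest note is 440 Hz, the result is roughly 308 Hz.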

For example, when using a melodic solo guitar part as the filter for an orchestral texture source? Or is supervp.sourcefilter better suited to using tonal<>noise or noise<>tonal source/filter combinations?

No. While the parameter settings are rather uncritical if you want to extract the filter component of a noise sound, with a bit more effort invested in parameter selection all these effects work exactly the same with tonal sounds. For noise sources it is better (mathematically speaking) to use the LPC estimator to extract the envelope (the filter component). For estimating envelopes of orchestral sounds you need to see whether there is one instrument standing out (in which case true env would be better) or whether the partials are rather dense (in which case you can favor LPC).

If you want to apply the guitar as a filter to the orchestral sound (and you don’t want to remove the sound color of the orchestral sound), you only need to consider envelope estimation for the guitar.

Best
Axel