
2 questions about transient detection

Hi all,

I have two questions regarding transient detection in Audiosculpt/supervp:

1/ I have a multichannel sound recording of a piano/cello/bass trio (spot microphones, main pair, flanks), and I’d like to time-stretch a few seconds of it, where only the piano is playing new notes (the strings are playing continuo). Such time-stretching only works well if the transients of the piano are correctly detected. If I stretch only the spot microphones of the piano, it works like a charm, but with all channels it seems that not all transients are detected, probably because most of them are hidden by reverb and leakage in the other channels. My question is: is it possible to force supervp to detect the transients on one or two channels only and ignore the other ones? As far as I understand, either it detects all the transients in all channels independently, or it tries to find transients which are common to all channels.

2/ The sound quality of the transients is heavily dependent on the window size: if the window is too long, transients are smoothed and lose all their precision. Question: is there a way to have a variable window size for the synthesis (small size on transients, longer window size for the rest)? Something similar to the so-called “Block-Switching” algorithm in MP3/AAC… I guess that it would dramatically improve the transient synthesis quality. Actually, as far as I remember, in older versions of Audiosculpt/supervp, transients were processed separately. Or maybe I missed something in the settings?

All the best,

Alexis

Hello Alexis,

  1. as you mentioned already there are only two modes of transient synchronization in SuperVP:

Either transients are synchronized between channels - this means that whenever a transient is detected in different channels, its preservation time position is synchronized between the related channels, to avoid introducing artificial delays between the different channels. Or they are not synchronized - which means transient detection and preservation are completely independent in the different channels. I don’t see how the transients that are detected could depend on the number of channels. If you want different sets of transients in different groups of channels, you might try processing these channel groups independently: split them first, process each group, and then multiplex them back into the multichannel format.

  2. Yes, you can use time-varying window sizes in SuperVP and to some extent also in AS. In AS you have the AAAS analysis that automatically adapts the window size to the sound, but as far as I remember this can for the moment only be used for the spectrogram display. In SuperVP you can provide a bpf file as argument to the -M flag and you will get time-varying window sizes. You can see these bpf files in action if you look into the command line that is used for calculating the spectrogram in AAAS mode. The bpf format is described in the processing section of the SuperVP help under the -M flag. SuperVP will not use an arbitrary number of window sizes: it calculates the min and max window size that is used and then establishes a grid of sizes that logarithmically samples this range, with the number of sizes given by the winn flag. Note that time-varying window sizes will be applied to all channels. You cannot have window sizes varying independently within different channels.
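To illustrate what a logarithmically sampled grid of window sizes looks like, here is a minimal Python sketch. It is not SuperVP code: the function names are hypothetical, and the idea that a requested size is snapped to the nearest grid entry on a log scale is an assumption made only for the example.

```python
import math


def window_size_grid(min_size, max_size, n_sizes):
    """Return n_sizes window sizes spaced logarithmically between min and max.

    Illustrative only: n_sizes plays the role of the count given by the
    winn flag; how SuperVP rounds the sizes internally is not modelled.
    """
    if n_sizes < 1:
        raise ValueError("n_sizes must be at least 1")
    if n_sizes == 1 or min_size == max_size:
        return [float(min_size)]
    # Constant ratio between neighbouring sizes = equal steps on a log axis.
    ratio = (max_size / min_size) ** (1.0 / (n_sizes - 1))
    return [min_size * ratio ** k for k in range(n_sizes)]


def snap_to_grid(size, grid):
    """Assumed behaviour: pick the grid entry closest to size on a log scale."""
    return min(grid, key=lambda g: abs(math.log(g) - math.log(size)))


if __name__ == "__main__":
    grid = window_size_grid(512, 4096, 4)
    print([round(g) for g in grid])
    print(round(snap_to_grid(900, grid)))
```

With a range of 512 to 4096 and four sizes, neighbouring grid entries differ by a factor of two, so a bpf that asks for many intermediate sizes would still be rendered with only those four.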

Hope this helps

Best
Axel

Thanks for your tips Axel!

  1. Demultiplexing/remultiplexing the channels will not help, since what I want to do is to use the transient information detected on some channels to force transient processing on the other ones, even if no transients were found there. Something great, for instance, would be to force transient processing at positions based on markers (either generated by “Generate Markers”, or manually positioned), thus bypassing the supervp transient detector.

  2. Good to know. In practice it may be quite tricky to configure, since again I first need a time description of the transients to reduce the window size there. Another attempt I made today was to first split the signal into two parts: transients only, and all the rest (partials and noise). I then processed each file separately with a different window size. In practice, however, it did not work well: too many glitches occurred in the reconstruction.

Best

Alexis

Hello Alexis,

a further little hint about using variable window sizes in re-synthesis: as Axel said, in AS the AAAS analysis works for display only. But the optimal window sizes chosen are stored in a bpf in the AS FFt folder; this bpf can be used (as it is or edited, if needed) by SuperVP with the -M flag.

Best. marco

Something great, for instance, would be to force transient processing at positions based on markers (either generated by “Generate Markers”, or manually positioned), thus bypassing the supervp transient detector.

I see the idea, but it would not work so well - at least not as well as the detected transients mode - because transients are not time markers, they are time-frequency areas. It would be possible, however, to modify the transient sensitivity around user-supplied markers, which should then give the effect you are looking for.

For the moment, if you want transients to be detected, you may want to lower the transient sensitivity: you can lower the threshold parameter and, if this gives too many detections, increase the min interval parameter in turn.

Best
Axel

Marco, Axel, thanks a lot for your answers, this was really helpful. I also had a further look at the supervp options and ended up finding a solution that works OK: I did a first transient analysis with “Generate Markers”, modified it by hand, and exported it to an ASCII file. Then I forced supervp to focus on those time areas with the option “-td_beatpos”, which does exactly what you mentioned, Axel (reduce the transient sensitivity around those time areas). I double-checked by synthesizing the transients only. It’s not perfect (some transients were not detected, and some regions were wrongly detected as transients), but OK. The synthesis works way better with this method.

To improve it further, I would have needed to reduce the window size on transients with the bpf method, but it was too time-consuming and I gave up. I’m pretty sure that this would be a really interesting development of supervp/AS if it could do that automatically (namely having two block sizes, a short one for transients and a long one for the rest). This is for instance one of the major improvements between MPEG1-layer2 (mp2) and MPEG1-layer3 (mp3). In MPEG this is a little bit tricky, since windows are not symmetrical when switching the block size (to allow an optimal reconstruction), but it works quite well and is still used intensively nowadays in more modern algorithms like AAC. As you mentioned, Axel, as far as I understand it would however require two distinct transient detection algorithms: one purely time-related to generate reliable markers, and then the current one to do the separation between transients, partials and noise afterwards.

Best,

Alexis

Hi Alexis,

Funny, I had forgotten the td_beatpos option. It has been a while since I worked on that; the idea was to use beat markers to modulate the sensitivity of the transient detection algorithm. But you are completely right, this can be done with manually annotated transients as well. I am really happy that you reported that. I don’t see how this method can lead to misclassified regions, as you write. This should be a question of the transient sensitivity that is selected for the standard and beatpos regions.

With respect to the window size, as we mentioned before, you can already do this today, just not yet in AS. But for the moment we are concentrating on AudioSculpt 4.
On the other hand, it is not obvious that this would help, because reducing the window size around transient regions may introduce disturbances in non-transient components, which may well be much more annoying than the missing transients. The transient preservation is there to deal with these issues without requiring a change of resolution. I am aware of the effects in compression, but this is not the same. Missing resolution is much less critical for compression (mp3) than for time stretching. Anyway, time-varying resolution is a very interesting avenue that will certainly be extended in the future.

Best
Axel

Hi Axel,

I don’t see how this method can lead to misclassified regions, as you write. This should be a question of the transient sensitivity that is selected for the standard and beatpos regions.

As far as I understood -td_beatpos, it reduces the confidence threshold provided by -td_G. So for example, with the options:
“-td_G 1.5 -td_beatpos "markers.txt",1,0.01,2”
… the confidence threshold is 1.5 everywhere, except around markers, where it falls to 0.5. Am I right?

I then tried the highest possible confidence threshold (5, even if only 4 works in AS, but I tried 4 as well), with a confidence threshold reduction of 3.5 in -td_beatpos. Even so, the algorithm still detects some transients which do not correspond to the markers I entered.

Concerning the variable time resolution question, I imagine that if the reduction of the block size is limited to a few tens of ms before and after the transient, the disturbance of the non-transient components may not be that critical in most cases (i.e. if the transient is loud enough and full-band), since transients disturb the phases of the non-transient components anyway, but on the other hand mask those disturbances. When I have time, I will try to write a small program that automatically computes the “block size” bpf from an ASCII marker file and see if it works.
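The small program mentioned above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the marker file is taken to hold one time in seconds per line, the function names are hypothetical, and the two-column "time size" text output is only a placeholder, since the actual bpf format expected by -M is the one described in the SuperVP help. The default durations (20 ms before, 30 ms after, 5 ms ramps) and the unit of the window sizes are arbitrary example values.

```python
def read_marker_times(text):
    """Parse marker times in seconds: one number per line, '#' comments skipped."""
    times = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            times.append(float(line.split()[0]))
    return sorted(times)


def add_point(points, t, size):
    """Append a breakpoint while keeping times strictly increasing."""
    if points and t <= points[-1][0]:
        points[-1] = (points[-1][0], size)  # same instant: the later value wins
    else:
        points.append((t, size))


def block_size_breakpoints(markers, long_size, short_size,
                           pre=0.02, post=0.03, ramp=0.005):
    """Short window from (marker - pre) to (marker + post), long window elsewhere."""
    # Merge the regions around markers that overlap or nearly touch.
    intervals = []
    for m in sorted(markers):
        start, end = max(0.0, m - pre), m + post
        if intervals and start <= intervals[-1][1] + 2 * ramp:
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([start, end])

    points = [(0.0, long_size)]
    for start, end in intervals:
        add_point(points, start - ramp, long_size)  # last long-window point
        add_point(points, start, short_size)        # short window begins
        add_point(points, end, short_size)          # short window ends
        add_point(points, end + ramp, long_size)    # back to the long window
    return points


def format_bpf(points):
    """Placeholder layout (assumed): one 'time size' pair per line."""
    return "\n".join(f"{t:.6f} {size}" for t, size in points)


if __name__ == "__main__":
    markers = read_marker_times("# piano onsets\n1.0\n2.0\n")
    print(format_bpf(block_size_breakpoints(markers, 4096, 512)))
```

Merging neighbouring regions avoids switching back to the long window for only a few milliseconds between two close transients.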

Extra related question while I am at it: if the block sizes are variable, are there specific precautions to take regarding overlapping and time positions of the windows, or does supervp do that automatically (I mean repositioning the blocks to ensure the best possible reconstruction)?

Best

Alexis

As far as I understood -td_beatpos, it reduces the confidence threshold provided by -td_G. So for example, with the options:
“-td_G 1.5 -td_beatpos "markers.txt",1,0.01,2”
… the confidence threshold is 1.5 everywhere, except around markers, where it falls to 0.5. Am I right?

Yes

I then tried the highest possible confidence threshold (5, even if only 4 works in AS, but I tried 4 as well), with a confidence threshold reduction of 3.5 in -td_beatpos. Even so, the algorithm still detects some transients which do not correspond to the markers I entered.

The upper limit for -td_G is 10, so you may try with that. The main problem seems to be that you use the sub sampling parameter 2 at the end. This will lower the G threshold not only at the specified positions but also in the middle of all segments defined by the markers in markers.txt. I think you want sub sampling 1, which means no sub sampling at all.
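The effect of the sub sampling parameter can be pictured with a toy model in Python. This is not how SuperVP is implemented: it only reproduces the behaviour described in this thread, where the threshold drops by the given reduction near each marker and, with sub sampling 2, also near the middle of each segment between markers. Treating the 0.01 argument as a half-width in seconds, and generalizing sub sampling k to k-1 evenly spaced extra points per segment, are both assumptions; the function names are hypothetical.

```python
def lowered_positions(markers, subsampling=1):
    """Positions where the threshold is lowered.

    subsampling=1 keeps only the markers; subsampling=2 adds the midpoint
    of every segment between consecutive markers (assumed generalization:
    k - 1 evenly spaced extra points per segment).
    """
    markers = sorted(markers)
    positions = list(markers)
    for a, b in zip(markers, markers[1:]):
        for j in range(1, subsampling):
            positions.append(a + (b - a) * j / subsampling)
    return sorted(positions)


def effective_threshold(t, base, reduction, half_width, positions):
    """Base threshold, reduced by `reduction` within half_width of a position."""
    near = any(abs(t - p) <= half_width for p in positions)
    return base - reduction if near else base


if __name__ == "__main__":
    markers = [1.0, 2.0, 4.0]
    for k in (1, 2):
        pos = lowered_positions(markers, subsampling=k)
        # Threshold halfway between the first two markers (t = 1.5 s).
        print(k, pos, effective_threshold(1.5, 1.5, 1.0, 0.01, pos))
```

In this model, with -td_G 1.5 and a reduction of 1, the point halfway between two markers keeps the threshold 1.5 with sub sampling 1 but drops to 0.5 with sub sampling 2, which would explain detections away from the markers.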

Concerning the variable time resolution question, I imagine that if the reduction of the block size is limited to a few tens of ms before and after the transient, the disturbance of the non-transient components may not be that critical in most cases (i.e. if the transient is loud enough and full-band), since transients disturb the phases of the non-transient components anyway, but on the other hand mask those disturbances. When I have time, I will try to write a small program that automatically computes the “block size” bpf from an ASCII marker file and see if it works.

The reduction of the resolution, even for a few ms, can nevertheless generate phase jumps and therefore artifacts. The question is whether the stationary components are perceptually more important than the transients. We had such cases in our trials, and that is why the full method in fact allows different window sizes in different frequency bands.

Extra related question while I am at it: if the block sizes are variable, are there specific precautions to take regarding overlapping and time positions of the windows, or does supervp do that automatically (I mean repositioning the blocks to ensure the best possible reconstruction)?

SuperVP does everything automatically unless you tell it not to. Specifically, you should use the -oversamp parameter to control the analysis step size, and not the fixed step size parameter -I. The oversamp parameter controls the step size relative to the window size.
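A short Python sketch shows why a relative step matters once the window size varies. It assumes that the step is simply the window size divided by the oversampling factor, which is one plausible reading of "relative to the window size" and not a statement about SuperVP internals; the function names and the numbers are illustrative.

```python
def step_from_oversamp(window_size, oversamp):
    """Assumed relation: analysis step = window size / oversampling factor."""
    return window_size / oversamp


def overlap_factor(window_size, step):
    """Number of analysis windows covering each sample, on average."""
    return window_size / step


if __name__ == "__main__":
    for window_size in (512, 4096):
        fixed = overlap_factor(window_size, 256)       # fixed step of 256 samples
        relative = step_from_oversamp(window_size, 8)  # step follows the window
        print(window_size, fixed, relative)
```

With a fixed step of 256 samples, a 512-sample window overlaps only by a factor of 2 while a 4096-sample window overlaps by a factor of 16. With a relative step, the overlap stays constant when the window size changes.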

Best
Axel