100% phase coherence between 2 similar audio files

Hi there,

I am trying to alter a series of similar waveforms (let’s say 2 different people singing “ah” at approximately the same pitch). I want to alter a waveform (destructive DSP) by forcing it to have exactly the same frequency over time as the “source” waveform. The syncing can be by zero crossings, crests, or whatever works best. Or maybe there is another way to achieve this, by tuning each sound file 100% so there is no phase difference when they are overlapped. I think the former would sound more natural, though, since it keeps the human tuning:

Example Image

Maybe one way would be to run a pitch tracker (using autocorrelation or something more advanced) and then use a time-domain technique like PSOLA (pitch-synchronous overlap-add) to recombine snippets of the target waveform. Another way would be to do it in the frequency domain using a phase vocoder. If the pitches are pretty similar to begin with, PSOLA would probably be the simplest.
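
To make the autocorrelation idea concrete, here is a rough single-frame sketch in NumPy. The function name `estimate_f0_autocorr` and the `f_min`/`f_max` search range are my own assumptions, and a real tracker would add framing, voicing decisions, and smoothing:

```python
import numpy as np

def estimate_f0_autocorr(frame, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame, or None.

    Hypothetical helper: picks the strongest autocorrelation peak
    inside the lag range implied by f_min and f_max (assumed values).
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC offset
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(acf) - 1)
    if lag_max <= lag_min or acf[0] <= 0:
        return None  # frame too short for the range, or silent
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag
```

Called on successive short frames of each file, this would give two coarse F0 contours to compare. The resolution is limited to whole-sample lags, so it is only good enough to estimate a reasonable period range.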

It might be possible to skip the pitch-tracking step altogether, except maybe to estimate a reasonable pitch range. Something like this could be a possibility:

  1. For each major pitch peak in the source waveform, find the nearest peak in the target waveform
  2. Window it with something like a Hanning window
  3. Overlap-add it to where it was in the source waveform
  4. Do whatever PSOLA does to make this work out right - I forget the details
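
The steps above can be sketched roughly as follows in NumPy. Everything here is an assumption on my part: the names `find_pitch_peaks` and `resync_to_source`, the greedy peak picker, and the choice of a two-period Hann grain for step 4 (which is the usual PSOLA convention as far as I remember):

```python
import numpy as np

def find_pitch_peaks(signal, min_period):
    """Greedy picker: strongest positive local maxima that are at
    least min_period samples apart (hypothetical helper)."""
    x = np.asarray(signal, dtype=float)
    candidates = [i for i in range(1, len(x) - 1)
                  if x[i] > 0 and x[i] > x[i - 1] and x[i] >= x[i + 1]]
    candidates.sort(key=lambda i: x[i], reverse=True)
    peaks = []
    for i in candidates:
        if all(abs(i - p) >= min_period for p in peaks):
            peaks.append(i)
    return np.array(sorted(peaks), dtype=int)

def resync_to_source(source, target, min_period):
    """Rebuild the target so its pitch peaks sit on the source's peaks."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_peaks = find_pitch_peaks(source, min_period)
    tgt_peaks = find_pitch_peaks(target, min_period)
    out = np.zeros(len(source))
    if len(src_peaks) < 2 or len(tgt_peaks) == 0:
        return out
    for k, s in enumerate(src_peaks):
        # Local source period: distance to the next (or previous) peak.
        if k + 1 < len(src_peaks):
            period = src_peaks[k + 1] - s
        else:
            period = s - src_peaks[k - 1]
        # Step 1: nearest target peak to this source peak.
        t = tgt_peaks[np.argmin(np.abs(tgt_peaks - s))]
        # Step 2: Hann window spanning two source periods.
        offsets = np.arange(-period, period + 1)
        window = np.hanning(len(offsets))
        src_idx = s + offsets
        tgt_idx = t + offsets
        valid = ((src_idx >= 0) & (src_idx < len(out)) &
                 (tgt_idx >= 0) & (tgt_idx < len(target)))
        # Step 3: overlap-add the target grain at the source position.
        out[src_idx[valid]] += window[valid] * target[tgt_idx[valid]]
    return out
```

With a steady period the overlapping Hann windows sum to one, so the grains recombine without amplitude ripple. On real voices the peak picking is the fragile part, and unvoiced segments would need separate handling.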

Do any of IRCAM’s software packages offer something that could help me with this?

Thanks for any help.

hello,

Your goal is quite complex, and I don’t see anything we have available that can help you achieve phase synchronisation between two independent files. Below I outline the means we have and how you can combine them to achieve what you want. As you will see, quite some programming effort is required on your side.

To get exactly the same pitch contour, you can use the phase vocoder in AS (AudioSculpt): calculate the F0 contours of both files, calculate the transposition by dividing one by the other, then apply the transposition using shape-invariant mode (you find that in AS in the processing dialog; you need to enable Waveform preservation). To manage the F0 files and calculate the transposition, you would need to write a little program that reads SDIF or ASCII F0 data and outputs the transposition contour. I’d suggest Python with the EASdif extension, which you can download from SourceForge.
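
As a rough illustration of the division step, the following plain-Python sketch takes two F0 contours that are already loaded as lists of `(time_s, f0_hz)` pairs and returns the transposition to apply to the target. The function names, the in-memory format, the output in cents, and treating `f0 <= 0` as unvoiced are all assumptions; reading the actual SDIF or ASCII files is left out:

```python
import bisect
import math

def _interp(times, values, t):
    """Linear interpolation, holding the end values outside the range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect.bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def transposition_contour(source_f0, target_f0):
    """Return [(time_s, cents)] that moves the target onto the source.

    Frames with f0 <= 0 are treated as unvoiced (an assumption) and
    get a transposition of 0 cents.
    """
    voiced = [(t, f) for t, f in source_f0 if f > 0]
    if not voiced:
        return [(t, 0.0) for t, _ in target_f0]
    times = [t for t, _ in voiced]
    values = [f for _, f in voiced]
    contour = []
    for t, f_tgt in target_f0:
        if f_tgt <= 0:
            contour.append((t, 0.0))
            continue
        f_src = _interp(times, values, t)
        # Ratio of the two F0 values, expressed in cents.
        contour.append((t, 1200.0 * math.log2(f_src / f_tgt)))
    return contour
```

For example, a target at 110 Hz against a source at 220 Hz gives +1200 cents (one octave up). Whether the processing tool wants cents or a plain ratio should be checked before writing the contour out.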

Problems will arise because the phoneme segments will most likely not be exactly aligned. To fix this, you could manually place phoneme markers on both speech signals, indicating the beginning and end of each phoneme, then apply time stretching to align those, and afterwards do the transposition. Again, you need to write your own programs to handle the markers and time-stretching parameters.
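
The marker bookkeeping could start from something like this sketch, which turns two matching lists of phoneme boundary times into one stretch factor per segment. The function name and the marker format (sorted boundary times in seconds, same count in both files) are assumptions for illustration:

```python
def stretch_factors(source_markers, target_markers):
    """Return [(target_start, target_end, factor)] per phoneme segment.

    factor > 1 means the target segment must be lengthened to match
    the source, factor < 1 means it must be shortened.
    """
    if len(source_markers) != len(target_markers):
        raise ValueError("both files need the same number of markers")
    if len(source_markers) < 2:
        raise ValueError("at least two markers are needed")
    segments = []
    for i in range(len(source_markers) - 1):
        src_dur = source_markers[i + 1] - source_markers[i]
        tgt_dur = target_markers[i + 1] - target_markers[i]
        if src_dur <= 0 or tgt_dur <= 0:
            raise ValueError("markers must be strictly increasing")
        segments.append((target_markers[i], target_markers[i + 1],
                         src_dur / tgt_dur))
    return segments
```

For instance, source markers `[0.0, 0.5, 1.5]` and target markers `[0.0, 1.0, 2.0]` mean the first target segment is halved in length and the second is left unchanged.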

All this will, however, only produce synchronous pitch contours, not phase-synchronous waveforms. For the latter you will have to program a PSOLA cross-synthesis algorithm yourself. You may use the AS glottal pulse analysis to find the glottal pulse positions; for this, however, you need to use SuperVP on the command line, because AS does not output the pulse positions in the RD pulse shape analysis. With those markers you would then need to program a PSOLA algorithm, as I don’t think this kind of cross synthesis exists anywhere. This is not extremely difficult, but it is not a 5-minute project either. Personally, I would use NumPy/SciPy for this. I did not check, but you may be able to find a PSOLA algorithm in Python to start with, which may also provide the markers and maybe voiced/unvoiced detection.

Best
Axel

Hi Axel,

Thanks so much for answering. As you suggested, I think we are going to build our own tool using Python and available modules, and hopefully it will not sound like a chainsaw :)

Again… thanks so much for sharing your thoughts and approach. At least I know I wasn’t going crazy by not finding a solution in the IRCAM suite.

Best,

Thonex

Hi

This does not specifically answer your question, but you may find some interesting Python modules for audio signal processing…

N.