
Lpcformants pipo giving weird results

Hi,
I’m trying to build a real-time IPA vowel chart, or F2-F1 graph visualization.
The only way I found to extract formants from real-time audio in Max without doing the math myself (which is way beyond my skills) is the lpcformants pipo, but it seems to give rather strange results.

I use it as [pipo~ slice:lpcformants @slice.hop 512 @lpcformants.Bandwidth 0 @lpcformants.nFormants 3 @lpcformants.threshold 50] on a monophonic source with proper gain and a clean signal.
F1 seems fine, but F2 is always around 5-10 kHz while I would expect it around 1-3 kHz. The same goes for F3, which sits around 10-15 kHz where I would expect 2-4 kHz.

Here’s a simple test patch:

<pre><code>
----------begin_max5_patcher----------
650.3ocwVssaiBCD8YxWgkebU1Hr4RR5SU6C8mX0pHGvMwQfMxXRytUMe6qY
.xkcS.ZJo8A.4ismyblwyfecjCdoZGOGid.8Sjiyqibb.nR.m5wN3T1tnDVN
rLbjJMkKM3wUyY36L.9SdMPYLSzZgb0BMOxTYZ+otSbGi7ozxOTO3sEB8q58
HKRUElDtAHw8HpP1.RpAEw.epka9NcFtD6sQiJeM9ip.ZKJfFDbWTvzAUAjV
Tf276h.BuEAXs5RttsCLyqC2jxOyAWuzyuh2RqQqfL+NiWYFLdLBujIWgOci
sKH+itklkxMb8BtjsLALo68PrzYteUh06SWrg9eUZkN3Zk+h0v+WUXgbiRHQ
dcFDnyCK+DFF1QTvusnvgmdGIH2TSmmSTVC2hnlAkrj.nASfaOyrOqzoLvBg
CWtl3+Ikq+1dj6jfNyzjYUm56JnPtVHHWrRxRtR.fdg..8VRw7+vhi12oZBp
9SRPy62asasXF2grtTdkfGtbWlHSsGkmHh3OjjEAGDklbzi.zj0pLT.ghd7j
4l7ClL9EQrYMx87IjO0rcuymvrVyyWqRhskDc2QfNE5HDzUeQu1pUrUp1Jp2
UKgCGgyYa4wKrPVOaAyXzhkElpaH5bH75fYQFwVfSBf.Yjl7BvJNQH+2aVB5
tD+7jUtpPG0HfZuAcT4w7biPxLBk7j0XahAK5hmG5KQ1xjtYJXHHpzaocPT4
UOFFlHcwj2PwTWAOJcHXxuOGHnUB+iwDoGQOxfnIRe0z4LUUbwxx1x040qFH
w1yaiRWNb9XXnPVMDJ2wZ9VQy5gK2iYZaSHisNuPCNFdWX0uMwopXtVVHpqu
sxyRIzOUZ+YZdFqRIPa2QuM5ubyiSE.
-----------end_max5_patcher-----------
</code></pre>

Is there anything I’m missing?

Hi, sorry for the holiday-induced wait. To check this, do you have a set of standard vowel recordings with known formant frequencies? Do you have access to other calculation methods (in Praat/Matlab/Python)?

Hi, no problem for the delay.
Sadly I have none of this.
Would the audio from the IPA vowel chart on Wikipedia be considered a set of standard recordings? After a quick search I couldn’t find a set of audio/formant pairs.
As for another calculation method, I can look on GitHub, unless you have a go-to solution to suggest?

That would be a good start. From this page linked in the WP article, there are URLs for Excel charts with formant frequencies, other vowel collections, and the Praat app, with which you can measure vowel formants for reference.

Thanks a lot for pointing out these resources!
I’ll get back to this later when I have more time available to do proper testing.

Sure, take your time. I did observe that the existing implementation gives more expected results when resampling to 8000 Hz, which was the standard rate when the research behind the implemented formulas was done. Use

mubu.process lpcform-help-mubu audio resample:slice:lpcformants @name formants @resample.targetrate 8000 @lpcformants.sr 8000

for that, as in this patch: lpcformants vowel triangle.maxpat (41.2 KB) [updated]

It would still be good to validate with some verified example data.
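For such a cross-check, something along these lines in Python (numpy/scipy) could serve as a rough reference. This is only a sketch of the textbook autocorrelation-LPC plus root-finding method, not pipo's actual implementation; the file name, the LPC order and the pole-filtering thresholds below are assumptions:

<pre><code>
# Rough offline cross-check of the textbook LPC-root-finding approach
# (autocorrelation LPC), resampled to 8 kHz as discussed above.
import numpy as np
from scipy.io import wavfile
from scipy.linalg import solve_toeplitz
from scipy.signal import resample_poly

TARGET_SR = 8000        # analysis rate, as suggested above
LPC_ORDER = 10          # rule of thumb: 2 + sr/1000

def lpc_coeffs(frame, order):
    """Autocorrelation-method LPC polynomial [1, a1, ..., ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = solve_toeplitz(r[:order], -r[1:order + 1])
    return np.concatenate(([1.0], a))

def formant_candidates(frame, sr, order=LPC_ORDER):
    """Formant frequency candidates from the roots of the LPC polynomial."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    frame = frame * np.hamming(len(frame))
    roots = np.roots(lpc_coeffs(frame, order))
    roots = roots[np.imag(roots) > 0.01]           # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * sr / np.pi
    keep = (freqs > 90) & (bws < 400)              # heuristic: drop near-DC and very wide poles
    return np.sort(freqs[keep])

sr, x = wavfile.read("vowel_a.wav")                # hypothetical steady vowel recording
x = x.astype(float)
if x.ndim > 1:
    x = x.mean(axis=1)                             # mix down to mono
x = resample_poly(x, TARGET_SR, sr)                # resample to 8 kHz
frame = x[len(x) // 2:][:int(0.02 * TARGET_SR)]    # one 20 ms frame from the middle
print(formant_candidates(frame, TARGET_SR)[:3])    # compare against published F1/F2/F3
</code></pre>

The rule-of-thumb order of 2 + sr/1000 gives 10 at 8 kHz, which leaves roughly enough pole pairs for three or four formants plus spectral tilt; at 44.1 kHz the same budget of poles is spread over the whole 0-22 kHz band, which is consistent with the very high "formants" reported above.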

Thanks for the suggestion! I tried to adapt it to pipo~ so I can use it in real time, like so:

pipo~ resample:slice:lpcformants @resample.targetrate 8000. @lpcformants.sr 8000. @slice.size 20 @slice.hop 10 @lpcformants.Bandwidth 0 @lpcformants.nFormants 2 @lpcformants.threshold 50

but I get bursts of this error message:

pipo~: “resample” audio or pipo processing load too heavy, skipped 101 output frames since last warning (828 total, reporting every 100)

I tried to lower the sample rate in the audio settings, but I cannot go lower than 44100 Hz, at least with CoreAudio and the built-in mic as input.

In the doc, resample is said to take data as input instead of audio, so I guess that’s the issue.
Any idea?

You are missing the @slice.unit ms, otherwise a result every 10 samples really overloads the scheduler…
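So, keeping the line above and just adding the missing attribute, the chain would presumably look like:

pipo~ resample:slice:lpcformants @resample.targetrate 8000. @lpcformants.sr 8000. @slice.unit ms @slice.size 20 @slice.hop 10 @lpcformants.Bandwidth 0 @lpcformants.nFormants 2 @lpcformants.threshold 50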

BTW, I chose the 20 ms windows more or less out of thin air; it would be good to look up the most appropriate size for speech analysis.
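(For reference: at the 8000 Hz analysis rate, a 20 ms window corresponds to 160 samples and a 10 ms hop to 80 samples.)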