
Lpcformants pipo giving weird results

Hi,
I’m trying to build a real-time IPA vowel chart, or F2-F1 graph visualization.
The only way I found to extract formants from real-time audio in Max without doing the math myself (which is way beyond my skills) is the lpcformants pipo, but it seems to give rather strange results.

I use it as [pipo~ slice:lpcformants @slice.hop 512 @lpcformants.Bandwidth 0 @lpcformants.nFormants 3 @lpcformants.threshold 50] on a monophonic source with proper gain and a clean signal.
F1 seems fine, but F2 is always around 5-10 kHz while I would expect it around 1-3 kHz. The same goes for F3, which sits around 10-15 kHz where I would expect 2-4 kHz.

Here’s a simple test patch:

<pre><code>
----------begin_max5_patcher----------
650.3ocwVssaiBCD8YxWgkebU1Hr4RR5SU6C8mX0pHGvMwQfMxXRytUMe6qY
.xkcS.ZJo8A.4ismyblwyfecjCdoZGOGid.8Sjiyqibb.nR.m5wN3T1tnDVN
rLbjJMkKM3wUyY36L.9SdMPYLSzZgb0BMOxTYZ+otSbGi7ozxOTO3sEB8q58
HKRUElDtAHw8HpP1.RpAEw.epka9NcFtD6sQiJeM9ip.ZKJfFDbWTvzAUAjV
Tf276h.BuEAXs5RttsCLyqC2jxOyAWuzyuh2RqQqfL+NiWYFLdLBujIWgOci
sKH+itklkxMb8BtjsLALo68PrzYteUh06SWrg9eUZkN3Zk+h0v+WUXgbiRHQ
dcFDnyCK+DFF1QTvusnvgmdGIH2TSmmSTVC2hnlAkrj.nASfaOyrOqzoLvBg
CWtl3+Ikq+1dj6jfNyzjYUm56JnPtVHHWrRxRtR.fdg..8VRw7+vhi12oZBp
9SRPy62asasXF2grtTdkfGtbWlHSsGkmHh3OjjEAGDklbzi.zj0pLT.ghd7j
4l7ClL9EQrYMx87IjO0rcuymvrVyyWqRhskDc2QfNE5HDzUeQu1pUrUp1Jp2
UKgCGgyYa4wKrPVOaAyXzhkElpaH5bH75fYQFwVfSBf.Yjl7BvJNQH+2aVB5
tD+7jUtpPG0HfZuAcT4w7biPxLBk7j0XahAK5hmG5KQ1xjtYJXHHpzaocPT4
UOFFlHcwj2PwTWAOJcHXxuOGHnUB+iwDoGQOxfnIRe0z4LUUbwxx1x040qFH
w1yaiRWNb9XXnPVMDJ2wZ9VQy5gK2iYZaSHisNuPCNFdWX0uMwopXtVVHpqu
sxyRIzOUZ+YZdFqRIPa2QuM5ubyiSE.
-----------end_max5_patcher-----------
</code></pre>

Is there anything I’m missing?

Hi, sorry for the holiday-induced wait. To check this, do you have a set of standard vowel recordings with known formant frequencies? Do you have access to other calculation methods (in Praat/Matlab/Python)?

Hi, no problem for the delay.
Sadly I have none of this.
Would the audio from the IPA vowel chart on Wikipedia be considered a set of standard recordings? After a quick search I couldn’t find a set of audio/formant pairs.
As for another calculation method, I can look on GitHub, unless you have a go-to solution to suggest?

That would be a good start. From this page linked in the WP article, there are URLs for Excel charts with formant frequencies, other vowel collections, and the Praat app, with which you can measure vowel formants for reference.

Thanks a lot for pointing out these resources!
I’ll get back to this later when I have more time available to do proper testing.

Sure, take your time. I did observe that the existing implementation gives more expected results when resampling to 8000 Hz, which was the standard rate when the research behind the implemented formulas was done. Use

mubu.process lpcform-help-mubu audio resample:slice:lpcformants @name formants @resample.targetrate 8000 @lpcformants.sr 8000

for that, as in this patch: lpcformants vowel triangle.maxpat (41.2 KB) [updated]

It would still be good to validate with some verified example data.
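For such a cross-check, something along these lines in Python (numpy/scipy) could serve as a rough reference. This is only a sketch of the textbook autocorrelation-LPC plus root-finding method, not pipo's actual implementation; the file name, the LPC order and the pole-filtering thresholds below are assumptions:

<pre><code>
# Rough offline cross-check of the textbook LPC-root-finding approach
# (autocorrelation LPC), resampled to 8 kHz as discussed above.
import numpy as np
from scipy.io import wavfile
from scipy.linalg import solve_toeplitz
from scipy.signal import resample_poly

TARGET_SR = 8000        # analysis rate, as suggested above
LPC_ORDER = 10          # rule of thumb: 2 + sr/1000

def lpc_coeffs(frame, order):
    """Autocorrelation-method LPC polynomial [1, a1, ..., ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = solve_toeplitz(r[:order], -r[1:order + 1])
    return np.concatenate(([1.0], a))

def formant_candidates(frame, sr, order=LPC_ORDER):
    """Formant frequency candidates from the roots of the LPC polynomial."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    frame = frame * np.hamming(len(frame))
    roots = np.roots(lpc_coeffs(frame, order))
    roots = roots[np.imag(roots) > 0.01]           # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * sr / np.pi
    keep = (freqs > 90) & (bws < 400)              # heuristic: drop near-DC and very wide poles
    return np.sort(freqs[keep])

sr, x = wavfile.read("vowel_a.wav")                # hypothetical steady vowel recording
x = x.astype(float)
if x.ndim > 1:
    x = x.mean(axis=1)                             # mix down to mono
x = resample_poly(x, TARGET_SR, sr)                # resample to 8 kHz
frame = x[len(x) // 2:][:int(0.02 * TARGET_SR)]    # one 20 ms frame from the middle
print(formant_candidates(frame, TARGET_SR)[:3])    # compare against published F1/F2/F3
</code></pre>

The rule-of-thumb order of 2 + sr/1000 gives 10 at 8 kHz, which leaves roughly enough pole pairs for three or four formants plus spectral tilt; at 44.1 kHz the same budget of poles is spread over the whole 0-22 kHz band, which is consistent with the very high "formants" reported above.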

Thanks for the suggestion! I tried to adapt it to pipo~ so I can use it in real time, like so:

pipo~ resample:slice:lpcformants @resample.targetrate 8000. @lpcformants.sr 8000. @slice.size 20 @slice.hop 10 @lpcformants.Bandwidth 0 @lpcformants.nFormants 2 @lpcformants.threshold 50

but I get bursts of this error message:

pipo~: “resample” audio or pipo processing load too heavy, skipped 101 output frames since last warning (828 total, reporting every 100)

I tried to lower the sample rate in the audio settings, but I cannot go lower than 44100 Hz, at least with CoreAudio and the built-in mic as input.

In the doc, resample is said to take data as input instead of audio, so I guess that’s the issue.
Any idea?

You are missing the @slice.unit ms, otherwise a result every 10 samples really overloads the scheduler…
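So, keeping the line above and just adding the missing attribute, the chain would presumably look like:

pipo~ resample:slice:lpcformants @resample.targetrate 8000. @lpcformants.sr 8000. @slice.unit ms @slice.size 20 @slice.hop 10 @lpcformants.Bandwidth 0 @lpcformants.nFormants 2 @lpcformants.threshold 50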

BTW, I chose the 20 ms windows more or less out of thin air; it would be good to look up the most appropriate size for speech analysis.
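(For reference: at the 8000 Hz analysis rate, a 20 ms window corresponds to 160 samples and a 10 ms hop to 80 samples.)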