
Naive Question on Audio Spatial Representation

Hi,

My question is specifically about the reverse of audio spatialization; one could say that it’s about forensics rather than production. It’s naive and somewhat rough around the edges, but here goes:

Assume we’re speaking of the recording of an actual live event in a physical setting. How far can we go in assessing/representing the actual spatial setup of that event from the final audio render alone? And how does our ability to assess it change if the final render is monophonic, stereophonic, etc.?

Any hints and insights or free-floating thoughts on this issue would be much appreciated.

All the best,
António

Hi,

All I can think of is using metrics, scopes and imagers to probe how the signal of the audio render is distributed across the stereo or 3D field and get an overall image… (but the question is whether the final mix corresponds to the exact physical setup?)
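For the stereo case, two of the numbers such meters commonly display can be computed directly. Below is a minimal sketch in Python/NumPy, assuming the render is already loaded as two float arrays; the function name `stereo_image_metrics` is made up for illustration, and real correlation meters usually compute these over short sliding windows rather than over a whole file. As the caveat in parentheses says, a narrow or wide reading only describes the mix, not necessarily the physical setup.

```python
import numpy as np


def stereo_image_metrics(left, right, eps=1e-12):
    """Whole-file correlation and side-to-mid energy ratio of a stereo signal.

    correlation:    +1 for identical channels (narrow, mono-like image),
                    about 0 for uncorrelated channels (wide or diffuse),
                    -1 for polarity-inverted channels.
    side_to_mid_db: energy of S = (L - R) / 2 relative to M = (L + R) / 2;
                    very negative for a mono-like image, about 0 dB when
                    the channels are uncorrelated.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    correlation = np.sum(left * right) / (np.sqrt(np.sum(left**2) * np.sum(right**2)) + eps)
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    side_to_mid_db = 10.0 * np.log10((np.sum(side**2) + eps) / (np.sum(mid**2) + eps))
    return float(correlation), float(side_to_mid_db)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    source = rng.standard_normal(48000)
    # Same signal in both channels: a centred, point-like image.
    print(stereo_image_metrics(source, source))
    # Independent noise in each channel: a diffuse, enveloping image.
    print(stereo_image_metrics(source, rng.standard_normal(48000)))
```

Both numbers are blind to where the sources physically were: a close-miked source panned in the mix and a distant source picked up by a spaced pair can produce similar readings.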

Bye,

N.

Hi Nadir,

Perhaps it becomes easier to understand as a simple thought experiment: imagine that you have access to the final render alone. What can we actually assess or represent of the physical space in which the recorded event took place (e.g., a couple talking, cars passing by, sirens buzzing, glass breaking; any random set of actual events will do)? There’s no one-to-one mapping; it can only be a fuzzy sketch at most.

My question is originally about forensics rather than production.

All the best,
António

Hmm… my second-to-last attempt at free-floating thoughts:

  • Is it indoors or outdoors?
  • If it is indoors: is it a large or a small space?
  • If it is outdoors: is it an open space or surrounded by buildings (distribution of early reflections)?
  • Are the events, as you call them, moving or not?
  • An impression of foreground and background events, depending on the complexity of the recording setting
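The "large or small space" point has a fairly direct acoustic correlate: how fast the reverberation dies away. A rough sketch, assuming you can isolate a short impulsive event (a clap, the glass breaking) whose decay is not masked by other sources or by the noise floor: apply Schroeder backward integration to the tail and extrapolate the -5 to -25 dB slope to a 60 dB decay (a T20-style estimate). This is the standard approach for measured impulse responses; applied to an arbitrary recorded event it is only an approximation, and any artificial reverb added in post-production would be measured along with the room. The helper name `estimate_rt60` is hypothetical.

```python
import numpy as np


def estimate_rt60(tail, sr, upper_db=-5.0, lower_db=-25.0):
    """Rough reverberation time (s) from the decay that follows an impulsive event.

    `tail` should start at the event's peak; the decay must stay above the
    noise floor and free of other sources down to `lower_db`.
    """
    energy = np.asarray(tail, dtype=float) ** 2
    # Schroeder backward integration: energy remaining from each sample onward.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-300)
    # Linear fit of the decay curve between upper_db and lower_db (dB vs. seconds).
    idx = np.nonzero((edc_db <= upper_db) & (edc_db >= lower_db))[0]
    if idx.size < 2:
        raise ValueError("decay does not span the fitting range")
    slope_db_per_s, _ = np.polyfit(idx / sr, edc_db[idx], 1)
    # Extrapolate the fitted slope to a 60 dB decay.
    return -60.0 / slope_db_per_s


if __name__ == "__main__":
    sr = 16000
    rt60_true = 0.8  # synthetic example: exponentially decaying noise
    t = np.arange(int(2 * rt60_true * sr)) / sr
    rng = np.random.default_rng(0)
    tail = rng.standard_normal(t.size) * np.exp(-np.log(1000.0) * t / rt60_true)
    print(f"estimated RT60: {estimate_rt60(tail, sr):.2f} s")
```

A longer decay hints at a larger or more reverberant space, but absorption matters as much as volume, so this is a clue rather than a measurement of size.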

And I agree there’s no one-to-one mapping; even ambisonic recordings give only a general impression of the surroundings… (unless it is emphasised in post-production).

N.

In the small thought experiment I laid out above, we’re supposed to be agnostic about everything other than what the rendered audio can tell us on its own. We’re assuming that it’s the only source of information we’ve got. So, what properties (format, number of tracks, sampling rate, etc.) does the rendered audio file need to have for us to answer any subset of the points about the physical space that you raise?
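On the sampling-rate part of the question, one piece of arithmetic is easy to make concrete: with more than one channel, direction cues partly come down to time differences between channels, and one sample at rate f_s corresponds to a path-length difference of c / f_s. A small sketch, assuming c = 343 m/s (air at roughly 20 °C) and a hypothetical pair of microphones 0.2 m apart; the function names are illustrative. Sub-sample interpolation can refine delays below one sample, so this is a grid spacing rather than a hard limit.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def path_difference_per_sample(sr):
    """Path-length difference (m) that a one-sample inter-channel delay represents."""
    return SPEED_OF_SOUND / sr


def max_lag_samples(spacing_m, sr):
    """Largest possible inter-channel delay (samples) for two mics spacing_m apart."""
    return spacing_m / SPEED_OF_SOUND * sr


if __name__ == "__main__":
    for sr in (8000, 44100, 48000, 96000):
        print(
            f"{sr:>6} Hz: {path_difference_per_sample(sr) * 1000:5.2f} mm per sample, "
            f"at most {max_lag_samples(0.2, sr):5.1f} samples of delay for a 0.2 m spacing"
        )
```

For that 0.2 m pair, the whole range of directions maps onto roughly ±4.7 samples at 8 kHz but roughly ±28 samples at 48 kHz, so a low sampling rate makes time-based direction estimates much coarser.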

Yes, I am completely agnostic, though this information is gathered empirically: everyday life, memory… all beings have a sense of “alert”. To relate these cues to the properties you listed and to brain processing, one would need to run large experiments with different categories of listeners (musicians, non-musicians, etc.). There must be a study somewhere.
The format is definitely relevant: monophonic audio lacks any information about width, for instance, while ambisonics aims at a spherical reproduction close to how we perceive a real environment.
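To illustrate what a second channel adds that mono cannot carry: with two spaced microphones, the delay between the channels can be estimated by cross-correlation and turned into a rough direction. A minimal sketch, assuming a single dominant source in the far field, a known microphone spacing, no artificial panning or delay added in the mix, and c = 343 m/s; the function names are hypothetical, and plain cross-correlation is used here where practical systems often prefer weighted variants such as GCC-PHAT. A coincident pair (XY, M/S) encodes direction mostly as level differences instead, which this sketch does not handle.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def inter_channel_delay(left, right, max_lag):
    """Delay (samples) of `right` relative to `left`, searched within +/- max_lag.

    A positive value means the sound reached the left microphone first.
    """
    left = np.asarray(left, dtype=float) - np.mean(left)
    right = np.asarray(right, dtype=float) - np.mean(right)
    # np.correlate(a, v, "full")[i] = sum_n a[n + k] * v[n], with k = i - (len(v) - 1)
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    allowed = np.abs(lags) <= max_lag
    return int(lags[allowed][np.argmax(xcorr[allowed])])


def azimuth_deg(delay_samples, sr, spacing_m):
    """Far-field angle from straight ahead (broadside); positive means towards the left mic."""
    s = SPEED_OF_SOUND * delay_samples / sr / spacing_m
    return float(np.degrees(np.arcsin(np.clip(s, -1.0, 1.0))))


if __name__ == "__main__":
    sr, spacing = 48000, 0.2
    rng = np.random.default_rng(0)
    x = rng.standard_normal(sr // 10 + 5)
    left, right = x[5:], x[:-5]  # right channel lags left by 5 samples
    max_lag = int(np.ceil(spacing / SPEED_OF_SOUND * sr))
    d = inter_channel_delay(left, right, max_lag)
    print(d, round(azimuth_deg(d, sr, spacing), 1))
```

Running the same estimate over successive short windows and watching whether the delay drifts is one crude way to probe the "moving or not" question raised earlier in the thread.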

Thanks to your […], I feel like reading some books and articles! Any references? Maybe the ear-training approaches for sound engineers?


I’d love to dive deeper into this issue too; at this (naive) stage of mine I don’t have any scholarly references to go by either… It would be awesome if some fellow IRCAMian could point us in the right direction.

All the best,
António

So far I have found this paper and its bibliography: https://www.researchgate.net/publication/8337538_Perceptual_evaluation_of_multi-dimensional_spatial_audio_reproduction

To be continued…

All the best,

N.

I forgot to ask: what is the aim of your investigation?

Thanks, it’s an interesting paper. But like every publication on immersive audio I have come across, it addresses the issue from the (re)production side of things (i.e., audio spatialization). I have never come across one that addresses the strictly analytical or forensic side of things.

There’s no specific or professional aim to this inquiry; once again, I’m just trying to better understand natural sound and audio, and hopefully gain some insights from it that can be applied artistically/musically.

All the best,
António

Hi Nadir,

On the issue of immersive audio, this is the most in-depth textbook I found while browsing and braving the interwebs: Immersive Sound: The Art and Science of Binaural and Multi-Channel Audio

There’s also a lot to be said about how it’s being implemented and embedded in Spatial Computing generally, especially with the rising tide of Augmented and Virtual Reality applications coming our way in the near future.

All the best,
António

Thank you, António, for the reference.
I’ve been thinking about other fields of research that might give some insight into how one could describe and model the perception of sound and space: auditory scene description and procedural audio (cf. how Andy Farnell, in his sound design book Designing Sound, elaborated a protocol to describe and reconstruct a sound object and its context…)

Well, we definitely need the help of experts on the illusory perception of sound and space, I do at least! And as Henri Bergson put it :-)): “The thing and the perception of the thing are the same thing, except that the perception of the thing is the thing minus what does not interest us…”

Bye,

N.

Hi Nadir,

Funnily enough, while searching for the works you mentioned, I just found the field of research that studies my original “forensic” question… It’s called Auditory Scene Analysis/Detection.

We all need (other) researchers and inquisitive minds in general; we’d be even more naive about everything if it weren’t for them.

Thanks for your input on this issue, it was very helpful.

All the best,
António