Lab 3: The Synthetic Psychology of Sound Localization

Return to the lab overview, or move on to task 3.

Task 2: Learn some background theory

The question we are going to concern ourselves with is how your two ears are able to perceive a 3D stereo image. This is sometimes called the problem of "auditory scene analysis."

David Heeger describes the above pictures this way: "Al Bregman calls this the problem of auditory scene analysis and he uses this picture as an analogy for what your auditory system must do. The lake corresponds to your auditory world, the waves on the lake correspond to sound waves, the two channels correspond to your ear canals, and the two pieces of cloth correspond to your two ear drums. Just from the motion of the cloths, you have to figure out what's happening on the lake."

The overall solution to this problem is not completely understood, but particular aspects of the problem are. In particular, there is a pretty well developed theory of the spatial localization of single sound sources in the auditory field. This refers to the ability to "perceive" the location of a single audio source such as a person talking. This will be the topic of the lab.

First, a bit of terminology... The direction of a single point sound source can be described using two coordinates: the azimuth, which is the angle of the source along the horizontal plane relative to straight ahead, and the elevation, which is the angle above or below ear level. Together these two coordinates specify the direction of a sound source relative to your head (pinning down its full location would additionally require distance).
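To make the two coordinates concrete, here is a minimal sketch that converts an (azimuth, elevation) pair into a 3D direction vector. The sign conventions (azimuth 0° = straight ahead, positive to the right; elevation 0° = ear level, positive upward) are assumptions for illustration, not a standard fixed by the lab.

```python
import math

def direction_vector(azimuth_deg, elevation_deg):
    """Convert (azimuth, elevation) into a 3D unit vector.
    Assumed convention: azimuth 0 = straight ahead, positive to the
    listener's right; elevation 0 = ear level, positive upward.
    Axes: x = right, y = ahead, z = up."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.sin(az)
    y = math.cos(el) * math.cos(az)
    z = math.sin(el)
    return (x, y, z)

# A source 90 degrees to the right at ear level points (approximately)
# along the +x axis:
print(direction_vector(90, 0))
```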

To perceive these differences there are two "cues" or features of the world that your brain appears to rely heavily on: the inter-aural intensity differences (IID) and the inter-aural timing difference (ITD).

Inter-aural intensity differences (IID)

IID refers to the fact that the loudness of sound radiating from a single point source will differ between the two ears. There are a variety of reasons for this. First, the intensity of sound drops off as a function of distance. You know this to be true because yelling at someone far away is sometimes impossible. Sound travels through the air, but the air itself absorbs some of the energy, so the amplitude of the wave is reduced. This is reflected in the image below in the sense that there is an "extra length" that the sound wave has to travel to reach the ear furthest from the sound.
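A quick back-of-the-envelope sketch of how much that "extra length" matters: in free field, intensity falls off with the inverse square of distance, so the level difference between the ears is 20·log10(r_far/r_near) dB. The ~0.2 m extra path is an assumed round number for illustration.

```python
import math

def path_length_level_diff(source_dist_m, extra_path_m=0.2):
    """Level difference (dB) between the two ears from the
    inverse-square law alone, ignoring the head shadow.
    extra_path_m ~ 0.2 m is an assumed ear-to-ear path difference."""
    r_near = source_dist_m
    r_far = source_dist_m + extra_path_m
    return 20 * math.log10(r_far / r_near)

# For a very close source the difference is appreciable...
print(round(path_length_level_diff(0.5), 2))   # roughly 2.9 dB
# ...but for a distant source it is tiny:
print(round(path_length_level_diff(10.0), 3))  # well under 0.2 dB
```

Note how quickly this cue fades with distance, which is one reason the head shadow (next paragraph) carries so much of the IID.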

Another reason is that you have a big fleshy mound between your ears... your head (and brain). Your head tends to absorb sound energy, casting what is known as a "sound shadow" on the far side of your head. This is depicted very clearly in the figure: the head absorbs the sound arriving from the left, making it much quieter at the right ear. You may not realize this is happening because it has been constant your whole life; your brain is continually compensating for it.

Interestingly, this "acoustic shadow" created by your head is more apparent for higher-frequency sounds. This is because low-frequency sounds (which have a relatively large wavelength) pass around and through your head without being as readily absorbed. In contrast, your head tends to reflect or absorb much of the high-frequency content of a wave.
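The rough rule of thumb is that a wave diffracts around obstacles smaller than its wavelength, so the head only casts a strong shadow once the wavelength drops below about the head's diameter. A minimal sketch, using assumed round numbers for the speed of sound (~343 m/s) and an adult head diameter (~0.18 m):

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
HEAD_DIAMETER = 0.18    # m, rough adult head width (assumption)

def wavelength(freq_hz):
    """Wavelength of a sound wave at the given frequency."""
    return SPEED_OF_SOUND / freq_hz

for f in (200, 2000, 8000):
    lam = wavelength(f)
    shadowed = lam < HEAD_DIAMETER  # crude diffraction rule of thumb
    print(f"{f} Hz: wavelength {lam:.3f} m, strong head shadow: {shadowed}")
```

By this crude criterion, a 200 Hz tone (wavelength ~1.7 m) wraps around the head almost unimpeded, while sounds above roughly 2 kHz are strongly shadowed.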

Inter-aural timing differences (ITD)

A second cue that your brain can exploit to help localize a single sound source is the slight difference in arrival time between the two ears. For example, consider the figure below. Here a complex waveform is generated at a slight angle to the right of the observer along the azimuth. The sound wave generated by this event takes time to travel to the ears. Since the left ear is a little further from the sound source, it receives the sound a fraction of a millisecond later. Amazingly, your brain is sensitive to these tiny differences and can use the ITD as a cue to locate a sound.

The magnitude of the ITD relates to the offset angle along the azimuth. A sound source that is located centrally will arrive at both ears simultaneously. If a sound arrives a little later at the right ear than the left, the source is located on the left side of the body, and vice versa for the right side. The next graph shows the time difference in milliseconds as a function of the angle from directly ahead.
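The relationship between azimuth and ITD is often approximated with Woodworth's spherical-head formula, ITD = (r/c)(θ + sin θ), where r is the head radius, c the speed of sound, and θ the azimuth in radians. A minimal sketch, with the head radius (~0.0875 m) and speed of sound (~343 m/s) as assumed average values:

```python
import math

HEAD_RADIUS = 0.0875     # m, assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, assumed value

def itd_ms(azimuth_deg):
    """Approximate ITD (ms) for a distant source at the given azimuth,
    using Woodworth's spherical-head formula."""
    theta = math.radians(azimuth_deg)
    return 1000 * (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

print(round(itd_ms(0), 3))   # 0.0 ms: a central source arrives simultaneously
print(round(itd_ms(90), 3))  # roughly 0.66 ms for a source directly to one side
```

The maximum delay of roughly two-thirds of a millisecond matches the scale of the graph described above: tiny, yet well within what the auditory system can detect.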

As I've just said, it can be helpful at first to think of the timing differences as delays in the arrival of a sound. However, with a continuous sound source this is also equivalent to a phase shift.
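The delay-to-phase conversion is just the delay expressed as a fraction of the tone's cycle: phase (in cycles) = f · ITD. Once this exceeds half a cycle, the shift becomes ambiguous (the brain cannot tell which ear leads), which is one reason ITDs are most useful for low frequencies. A sketch, with the 0.6 ms ITD below chosen as an assumed example value for a source far off to one side:

```python
def phase_shift_cycles(freq_hz, itd_s):
    """Express an interaural time delay as a fraction of one cycle
    of a continuous tone at freq_hz."""
    return freq_hz * itd_s

itd = 0.0006  # s, assumed example ITD (~source far to one side)
for f in (200, 500, 1500):
    cycles = phase_shift_cycles(f, itd)
    label = "ambiguous" if cycles > 0.5 else "unambiguous"
    print(f"{f} Hz: {cycles:.2f} cycles ({label})")
```

At 200 Hz the same delay is a small, unambiguous fraction of a cycle, while at 1500 Hz it wraps past half a cycle and the phase cue breaks down.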

Copyright © 2013 Todd Gureckis. Diagrams and schematics of the Parallax robot come from the Parallax website; much excellent material was taken from David Heeger's course notes on sound localization.