Chapter 1:
The nature of sound and the structure and function
of the auditory system
-
Psychoacoustics or auditory psychophysics attempts to
specify the relationships between the physical characteristics of sounds
that enter the ear and the sensations that they produce.
-
Emphasis on underlying mechanisms at various levels
of explanation.
-
Must first know something about the physical nature
of sound and basic anatomy and physiology of the auditory system.
-
The physical characteristics of sounds.
-
The nature of sound.
-
Sound originates from the motion or vibration of an
object that is imposed upon the surrounding medium as a pattern of changes
in pressure.
-
The sound wave is propagated in all directions from
the vibrating object by transmission of the vibration (condensation and
rarefaction) of individual molecules along the axis of propagation through
the surrounding medium.
-
Such a longitudinal wave weakens with distance from
the object and is subject to reflections and refractions caused by objects
in its path.
-
The simplest type of such vibration mathematically,
physically and auditorily is the sine wave, which can be modeled by a pendulum,
a spring, or a tuning fork.
-
Plotting pressure variations against time to obtain
a waveform for the sine wave yields a function of the form $A \sin(2\pi f t)$,
where A is the maximum amplitude of the vibration, f is the frequency
of the vibration, and t is time, as shown in Fig. 1.1.
-
A single continuous sine wave can thus be completely
specified by two parameters.
-
A is the maximum amount of pressure variation about the mean.
-
The frequency parameter f is the number of times per
second the waveform repeats itself, specified in hertz (where 1 Hz = 1
cps).
-
Instead of frequency, one can specify the period, which
is simply the reciprocal of frequency.
-
If the sine wave is turned on and off or we are interested
in the relationship between two or more different sine waves, the phase
must also be given to specify the portion of the cycle through which the
wave has advanced in relation to some fixed point in time.
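A minimal sketch of sampling the waveform $A \sin(2\pi f t)$ with an optional phase term. The 16-kHz sampling rate and the names `sine_wave`, `period_s` and `phase_rad` are illustrative assumptions, not part of the notes. The sketch shows that the period is the reciprocal of the frequency, and that the phase sets where in its cycle the wave is at t = 0.

```python
import math

def sine_wave(amplitude, freq_hz, duration_s, fs=16000, phase_rad=0.0):
    """Sample A*sin(2*pi*f*t + phase) at fs samples per second (fs is an assumption)."""
    n = int(round(duration_s * fs))
    return [amplitude * math.sin(2 * math.pi * freq_hz * (i / fs) + phase_rad)
            for i in range(n)]

def period_s(freq_hz):
    """The period (seconds) is the reciprocal of the frequency (Hz)."""
    return 1.0 / freq_hz

tone = sine_wave(1.0, 1000.0, 0.01)                             # 10 ms of a 1000-Hz tone
shifted = sine_wave(1.0, 1000.0, 0.01, phase_rad=math.pi / 2)   # a quarter cycle ahead: starts at its peak
print(period_s(1000.0))                          # 0.001 s, i.e. 16 samples at 16 kHz
print(round(tone[0], 6), round(shifted[0], 6))   # 0.0 1.0
```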
-
The perceptual qualities of the pure tone to which a
sine wave gives rise are related to the parameters of the wave.
-
Pitch is related monotonically to frequency (demo).
-
Loudness is related monotonically to amplitude (demo).
-
Fourier analysis and spectral representations.
-
All sounds can be specified in terms of variations in
sound pressure over time, but when sounds are complex it is often more
useful to specify them in the frequency domain.
-
This is made possible by Fourier analysis that breaks
down the complex wave into a series of sinusoids, each with a specific
frequency, amplitude and phase.
-
Adding these sinusoids together produces the original
complex wave and is referred to as Fourier synthesis.
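The analysis/synthesis round trip can be sketched with a direct discrete Fourier transform evaluated only at the known component frequencies. This assumes every component completes a whole number of cycles in the analysis window, so there is no spectral leakage. A practical analysis would compute all bins with an FFT (e.g. `numpy.fft`). The component values are made up for illustration.

```python
import cmath
import math

def synthesize(components, n, fs):
    """Fourier synthesis: sum of cosines given as (freq_hz, amplitude, phase_rad)."""
    return [sum(a * math.cos(2 * math.pi * f * i / fs + p) for f, a, p in components)
            for i in range(n)]

def analyze_at(signal, freq_hz, fs):
    """Fourier analysis at one frequency: returns (amplitude, phase_rad).

    Assumes the window holds a whole number of cycles of freq_hz, so the
    frequency falls exactly on a DFT bin and there is no leakage."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs) for i, x in enumerate(signal))
    return 2 * abs(acc) / n, cmath.phase(acc)

FS, N = 8000, 800   # 0.1-s window, so DFT bins are spaced FS / N = 10 Hz apart
original = [(200.0, 1.0, 0.0), (400.0, 0.5, math.pi / 4), (600.0, 0.25, -math.pi / 2)]
wave = synthesize(original, N, FS)

# Analysis: recover the amplitude and phase of each component from the complex wave.
recovered = [(f, *analyze_at(wave, f, FS)) for f, _, _ in original]
# Synthesis: adding the recovered sinusoids back together rebuilds the original wave.
rebuilt = synthesize(recovered, N, FS)
max_error = max(abs(x - y) for x, y in zip(wave, rebuilt))
print(max_error < 1e-9)  # True
```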
-
The simplest type of complex tone is one that is periodic.
-
Such periodic or harmonic complex tones are composed
of a number of sinusoids whose frequencies are integer multiples of a (not
necessarily present) fundamental frequency.
-
The fundamental frequency equals the repetition rate
of the complex waveform as a whole.
-
The components of the tone are called harmonics and
are numbered beginning with the fundamental as one.
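A small sketch of the point that the fundamental need not be present. It sums harmonics 2 to 5 of 200 Hz (400 to 1000 Hz) and then searches for the shortest lag at which the waveform repeats. Equal amplitudes, sine starting phase and a 16-kHz sampling rate are assumptions chosen for illustration.

```python
import math

def harmonic_complex(f0_hz, harmonics, duration_s, fs=16000):
    """Sum of equal-amplitude, sine-phase harmonics of f0 (f0 itself need not be included)."""
    n = int(round(duration_s * fs))
    return [sum(math.sin(2 * math.pi * h * f0_hz * i / fs) for h in harmonics)
            for i in range(n)]

def repetition_period_samples(wave, max_lag, tol=1e-9):
    """Smallest lag (in samples) at which the waveform repeats itself."""
    span = len(wave) - max_lag
    for lag in range(1, max_lag + 1):
        if all(abs(wave[i] - wave[i + lag]) < tol for i in range(span)):
            return lag
    return None

# Harmonics 2-5 of 200 Hz; the 200-Hz fundamental itself is absent.
wave = harmonic_complex(200.0, [2, 3, 4, 5], 0.05)
lag = repetition_period_samples(wave, max_lag=200)
print(lag, 16000 / lag)  # 80 200.0  -> the waveform repeats at the 200-Hz fundamental rate
```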
-
Fig. 1.2 illustrates how a complex tone can be built
up from a series of sinusoids (demo).
-
The structure of a sound, in terms of its frequency
components, is often represented by its magnitude spectrum, a plot of sound
amplitude, energy or power as a function of frequency. Fig. 1.3 shows examples
of magnitude spectra.
-
The term partial is used to describe any discrete sinusoidal
component of a complex sound, whether it is a harmonic or not.
-
The measurement of sound level.
-
Instruments used to measure magnitudes of sounds, such
as microphones, normally respond to changes in air pressure.
-
Sound magnitudes are often specified in terms of intensity,
which is the sound energy transmitted per second (i.e. the power) through
a unit area in a sound field.
-
For our purposes, acoustic intensity is proportional
to the square of the pressure variation.
-
Our auditory systems can deal with a huge range of sound
intensities, which makes it inconvenient to work with intensities directly.
Instead, a log scale expressing the ratio of two intensities is used.
-
One intensity, $I_0$, is chosen as a reference
and the other intensity, $I_1$, is expressed relative to this.
-
One bel is defined to be an intensity ratio of 10:1.
Thus the number of bels $= \log_{10}(I_1/I_0)$.
-
The bel is a rather large unit and thus is usually divided
into 10 decibels (dB), so that the number of decibels $= 10\log_{10}(I_1/I_0)$.
-
When the magnitude of a sound is specified in dB, it
is referred to as a sound level.
-
The sound level is an intensity ratio, not an absolute
intensity.
-
To specify the absolute intensity it is necessary to
state that the sound, $I_1$, is n dB above or below some reference
intensity $I_0$.
-
The most common reference intensity for sound measurements
is $10^{-12}\,\mathrm{W/m^2}$ (equivalent to a sound pressure of about 20 µPa), which was chosen to be close to the
average human absolute threshold for a 1000-Hz pure tone. A sound level
specified using this reference is referred to as sound pressure level (SPL).
-
It is sometimes useful to choose as a reference level
the threshold of a subject for the sound being used. A sound level specified
in this way is referred to as sensation level (SL).
-
It is also useful to adapt the dB notation for ratios
of pressures. Because intensity is proportional to the square of pressure,
number of decibels $= 10\log_{10}(I_1/I_0) = 10\log_{10}\left[(P_1/P_0)^2\right] = 20\log_{10}(P_1/P_0)$.
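The decibel formulas above can be sketched as small helper functions. The $10^{-12}\,\mathrm{W/m^2}$ reference comes from the notes. The 20-µPa pressure reference is the standard, approximately equivalent value and is an addition here. The function names are hypothetical.

```python
import math

I_REF = 1e-12   # W/m^2, reference intensity for dB SPL (from the notes)
P_REF = 20e-6   # Pa, the approximately equivalent reference pressure (standard value, added here)

def db_from_intensities(i1, i0=I_REF):
    """Number of decibels = 10 log10(I1/I0)."""
    return 10 * math.log10(i1 / i0)

def db_from_pressures(p1, p0=P_REF):
    """Number of decibels = 20 log10(P1/P0), because intensity goes as pressure squared."""
    return 20 * math.log10(p1 / p0)

def sensation_level(level_db_spl, threshold_db_spl):
    """dB SL: level relative to the listener's own threshold for the same sound."""
    return level_db_spl - threshold_db_spl

print(round(db_from_intensities(1e-6), 1))        # 60.0 dB SPL
print(round(db_from_intensities(2 * I_REF), 2))   # 3.01: doubling intensity adds about 3 dB
print(round(db_from_pressures(2 * P_REF), 2))     # 6.02: doubling pressure adds about 6 dB
print(sensation_level(60.0, 20.0))                # 40.0 dB SL
```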
-
Table 1.1 gives some examples of sound levels, in dB
SPL, corresponding to various common sounds.
-
Beats.
-
When two sinusoids with slightly different frequencies
are added together, the resulting wave resembles a single sinusoid, with
frequency equal to the mean frequency of the two components, but with amplitude
fluctuating at a regular rate equal to the difference between the two frequencies.
-
These fluctuations are known as beats and occur because
of the changing phase relationship between the two sinusoids, which causes
them alternately to reinforce and cancel one another (Fig. 1.3b).
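A numerical check of the beat description. It relies on the identity $\sin a + \sin b = 2\cos\frac{a-b}{2}\,\sin\frac{a+b}{2}$: the sum of two tones equals a carrier at the mean frequency multiplied by a slow envelope. The envelope passes through zero |f1 - f2| times per second. The 1000/1004-Hz pair and the 48-kHz sampling are illustrative choices.

```python
import math

def two_tone(f1, f2, t):
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def beat_form(f1, f2, t):
    """The same signal written as a slow envelope times a carrier at the mean frequency."""
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)
    carrier = math.sin(2 * math.pi * (f1 + f2) / 2 * t)
    return envelope * carrier

F1, F2 = 1000.0, 1004.0
beat_rate = abs(F1 - F2)                       # expected beats per second

FS = 48000
times = [(i + 0.5) / FS for i in range(FS)]    # one second, sampled mid-interval
max_diff = max(abs(two_tone(F1, F2, t) - beat_form(F1, F2, t)) for t in times)

# Each beat "null" is a point where the envelope passes through zero.
env = [math.cos(math.pi * (F1 - F2) * t) for t in times]
nulls = sum(1 for a, b in zip(env, env[1:]) if a * b < 0)
print(beat_rate, nulls)  # 4.0 4
```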
-
Beats are heard as loudness fluctuations and can be
a problem in some experiments (demo).
-
The concept of linearity.
-
The auditory system is often conceived as being made
up of a series of devices or systems, each with input from the previous
device and output to the subsequent device.
-
Such a device is said to be linear if its input and output
satisfy the following two conditions.
-
Superposition--The output of the device in response
to a number of independent inputs presented simultaneously should be equal
to the sum of the outputs that would have been obtained if each input were
presented alone.
-
Homogeneity--If the input to the device is changed in
magnitude by a factor k, then the output should also change in magnitude
by a factor k, but be otherwise unaltered.
-
The output of a linear device never contains frequency
components that were not present in the input signal.
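A sketch that tests the two linearity conditions numerically on two stand-in devices. One is a 3-point moving average (linear); the other is a squaring device (nonlinear). Both are hypothetical examples, not models of any part of the ear. A check on one pair of inputs can only expose nonlinearity, not prove linearity. The last two lines show the squarer creating a component at twice the input frequency, which the linear device never does.

```python
import cmath
import math

def moving_average(x):
    """Linear device: each output is the mean of 3 successive inputs.
    Indexing wraps around (x[-1], x[-2]), so a whole number of cycles has no start-up transient."""
    return [(x[i - 2] + x[i - 1] + x[i]) / 3 for i in range(len(x))]

def square_law(x):
    """Nonlinear device: output is the square of the input."""
    return [v * v for v in x]

def is_linear(device, x1, x2, k=3.0, tol=1e-9):
    """Check superposition and homogeneity for one pair of test inputs."""
    out1, out2 = device(x1), device(x2)
    together = device([a + b for a, b in zip(x1, x2)])
    superposition = all(abs(t - (a + b)) < tol for t, a, b in zip(together, out1, out2))
    scaled = device([k * a for a in x1])
    homogeneity = all(abs(s - k * a) < tol for s, a in zip(scaled, out1))
    return superposition and homogeneity

def amplitude_at(x, freq_hz, fs):
    """Amplitude of the (on-bin) component at freq_hz, from a single DFT bin."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs) for i, v in enumerate(x))
    return 2 * abs(acc) / len(x)

FS, N = 8000, 800
x1 = [math.sin(2 * math.pi * 500 * i / FS) for i in range(N)]
x2 = [0.5 * math.sin(2 * math.pi * 1300 * i / FS) for i in range(N)]

print(is_linear(moving_average, x1, x2), is_linear(square_law, x1, x2))  # True False
# Squaring a 500-Hz sinusoid creates a new 1000-Hz component; the linear device does not.
print(round(amplitude_at(square_law(x1), 1000, FS), 6))      # 0.5
print(round(amplitude_at(moving_average(x1), 1000, FS), 6))  # 0.0
```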
-
Some parts of the auditory system are approximately
linear, while other parts behave in a grossly nonlinear way.
-
If a device is linear, measuring its response to a sinusoidal
input as a function of frequency tells us all we need to know to predict
its response to any input.
-
Perform a Fourier analysis of the arbitrary complex
input.
-
The response to the complex input can then be calculated
as the sum of the responses to its sinusoidal components.
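The two-step prediction can be sketched as follows. First, measure the device's gain and phase shift at each frequency using sinusoidal probes. Then Fourier-analyze a complex input, apply each measured gain and phase shift to the matching component, and add the results. The prediction is compared with feeding the complex input to the device directly. The device (a wrap-around moving average) and the component values are hypothetical, and all frequencies are chosen to fit a whole number of cycles in the window.

```python
import cmath
import math

FS, N = 8000, 800  # 0.1-s window; all frequencies below complete whole numbers of cycles

def device(x):
    """Stand-in linear device: wrap-around 3-point moving average (hypothetical)."""
    return [(x[i - 2] + x[i - 1] + x[i]) / 3 for i in range(len(x))]

def cosine(freq, amp=1.0, phase=0.0):
    return [amp * math.cos(2 * math.pi * freq * i / FS + phase) for i in range(N)]

def dft_bin(x, freq):
    return sum(v * cmath.exp(-2j * math.pi * freq * i / FS) for i, v in enumerate(x))

def sinusoidal_response(freq):
    """Complex gain at freq measured with a sinusoidal probe: output / input."""
    probe = cosine(freq)
    return dft_bin(device(probe), freq) / dft_bin(probe, freq)

# An arbitrary complex input built from three sinusoids.
freqs = [200.0, 1500.0, 2600.0]
complex_input = [sum(s) for s in zip(cosine(200.0, 1.0, 0.3),
                                      cosine(1500.0, 0.7, -1.0),
                                      cosine(2600.0, 0.4, 2.0))]

# Step 1: Fourier analysis of the input (amplitude and phase of each component).
# Step 2: apply the measured gain and phase shift to each component and add the results.
predicted = [0.0] * N
for f in freqs:
    c = dft_bin(complex_input, f)
    amp, phase = 2 * abs(c) / N, cmath.phase(c)
    h = sinusoidal_response(f)
    for i, v in enumerate(cosine(f, amp * abs(h), phase + cmath.phase(h))):
        predicted[i] += v

direct = device(complex_input)
print(max(abs(d - p) for d, p in zip(direct, predicted)) < 1e-9)  # True
```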
-
This is one of the reasons sinusoidal stimuli are so
frequently studied in psychoacoustics.
-
If a device is not linear, its response to complex inputs
cannot generally be predicted from its response to sinusoidal inputs.
-
To discover the characteristics of such a nonlinear
device its response to both sinusoidal and various complex inputs of interest
must be studied directly.
-
Filters and their properties.
-
Filters are used to manipulate the spectra of stimuli
for psychoacoustic experiments and provide models of how certain parts
of the auditory system behave.
-
Filters are linear devices that attenuate some frequencies
more than others.
-
A highpass filter removes all frequency components below
a certain cutoff frequency, but does not affect components above that frequency.
-
A lowpass filter does just the opposite.
-
A bandpass filter has two cutoff frequencies, passing
components between those two frequencies and removing components outside
this passband.
-
A bandstop filter also has two cutoff frequencies, but
it removes components between these two frequencies, leaving other components
intact.
-
In practice it is not possible to design filters with
perfectly sharp cutoffs. Instead there is some range of frequencies over
which components are increasingly attenuated, but not completely eliminated.
-
Thus, in order to specify a filter we have to define
both its cutoff frequency and the slope of the filter response curve.
-
Some typical filter characteristics are shown in Fig.
1.5.
-
The cutoff frequency is usually defined as the frequency
at which the output of the filter has fallen by 3 dB (i.e., to half power)
relative to the output in the passband.
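A sketch of locating the 3-dB (half-power) point by bisection. The example filter is an idealised first-order (RC) lowpass with power gain $1/(1 + (f/f_c)^2)$; that filter is an assumption chosen because its half-power point is known to be exactly $f_c$. Note that a 3-dB drop is $10\log_{10}2 \approx 3.01$ dB.

```python
import math

def rc_lowpass_power_gain(f, fc):
    """Power gain of an idealised first-order (RC) lowpass filter (passband gain = 1)."""
    return 1.0 / (1.0 + (f / fc) ** 2)

def gain_db(power_gain):
    return 10 * math.log10(power_gain)

def find_cutoff(power_gain_fn, lo, hi, drop_db=10 * math.log10(2), iters=60):
    """Bisect for the frequency where the gain has fallen by drop_db (half power by default).

    Assumes the gain falls monotonically between lo and hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gain_db(power_gain_fn(mid)) > -drop_db:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

cutoff = find_cutoff(lambda f: rc_lowpass_power_gain(f, 1000.0), 1.0, 100000.0)
print(round(cutoff, 3))  # 1000.0
```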
-
For a bandpass or bandstop filter, the range of frequencies
between the two cutoffs defines the bandwidth of the filter and the midpoint
of the pass or stop band is called the filter's center frequency (CF).
-
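An alternative way to measure bandwidth is the equivalent
rectangular bandwidth (ERB), which is simply the bandwidth of a rectangular
filter with the same height (maximum transmission) and the same area under its
power-response curve as our filter, so that both pass the same power of white noise.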
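A numerical sketch of the ERB: the area under the power response divided by its peak height. The example shape is a textbook second-order (single-resonance) bandpass filter, an assumption unrelated to any auditory-filter model. For that shape the ERB is known to be π/2 times its 3-dB bandwidth ($f_c/Q$), which gives a check on the numerical result.

```python
import math

def bandpass_power_gain(f, fc, q):
    """Power response of a second-order bandpass filter, peak = 1 at fc."""
    if f <= 0:
        return 0.0
    return 1.0 / (1.0 + q * q * (f / fc - fc / f) ** 2)

def erb_numeric(power_gain_fn, f_max, step=1.0):
    """ERB = area under the power response / its peak height (trapezoid rule, 0 to f_max)."""
    freqs = [i * step for i in range(int(f_max / step) + 1)]
    gains = [power_gain_fn(f) for f in freqs]
    area = sum((a + b) / 2 * step for a, b in zip(gains, gains[1:]))
    return area / max(gains)

fc, q = 1000.0, 10.0                  # 3-dB bandwidth = fc / q = 100 Hz
erb = erb_numeric(lambda f: bandpass_power_gain(f, fc, q), f_max=50000.0)
print(round(erb, 1), round(math.pi / 2 * fc / q, 1))   # numerical vs. analytic ERB (Hz)
```

The small shortfall in the numerical value comes from truncating the integral at `f_max`, where the filter's skirt has not quite reached zero.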
-
The characteristics of filters outside their pass bands
are often approximately straight lines when plotted on dB versus log-frequency
coordinates. Thus, slopes are often specified in dB/octave.
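A sketch of measuring a slope in dB/octave: compare the response at f and 2f well outside the passband. The idealised Butterworth lowpass used here is an assumption for illustration. Far above cutoff its response falls by about 6 dB/octave per order of the filter.

```python
import math

def butterworth_lowpass_db(f, fc, order):
    """Magnitude response in dB of an idealised Butterworth lowpass filter."""
    return -10 * math.log10(1 + (f / fc) ** (2 * order))

def slope_db_per_octave(response_db, f):
    """Change in response (dB) over one octave, from f to 2f."""
    return response_db(2 * f) - response_db(f)

for order in (1, 2, 4):
    s = slope_db_per_octave(lambda f: butterworth_lowpass_db(f, 1000.0, order), 32000.0)
    print(order, round(s, 1))   # 1 -6.0 / 2 -12.0 / 4 -24.1
```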
-
A filter does not change the shape of a sinusoid's waveform (the output is
a sinusoid of the same frequency, altered only in amplitude and phase),
but usually does alter waveforms that are more complex. E.g., passing white
noise through a narrow bandpass filter produces a waveform that resembles a
sinusoid whose amplitude fluctuates from moment to moment; the sound has a
pitch-like quality corresponding to the center frequency of the filter.
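A sketch of narrowband noise. Gaussian white noise is passed through a single narrow biquad bandpass filter whose coefficients follow the RBJ "Audio EQ Cookbook" (constant 0-dB peak gain). The sketch then measures two things: the zero-crossing rate, which gives the frequency of the fine structure, and the frame-by-frame RMS, which shows the fluctuating envelope. Center frequency, Q, seed and frame length are illustrative assumptions. The fine-structure estimate should land near the center frequency, and the envelope should vary substantially between frames.

```python
import math
import random

def bandpass_coeffs(fc, q, fs):
    """Biquad bandpass (0 dB peak gain) after the RBJ Audio EQ Cookbook; returns b0, b1, b2, a1, a2."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return alpha / a0, 0.0, -alpha / a0, -2 * math.cos(w0) / a0, (1 - alpha) / a0

def biquad(x, coeffs):
    """y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for v in x:
        y = b0 * v + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, v, y1, y
        out.append(y)
    return out

FS, FC, Q = 16000, 1000.0, 50.0              # roughly a 20-Hz-wide passband
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 * FS)]           # 2 s of white noise
narrow = biquad(noise, bandpass_coeffs(FC, Q, FS))[FS // 10:]  # skip the start-up

# Fine structure: zero crossings occur at about twice the center frequency.
crossings = sum(1 for u, v in zip(narrow, narrow[1:]) if u * v < 0)
fine_structure_hz = crossings / (len(narrow) / FS) / 2

# Envelope: the RMS in 10-ms frames wanders up and down from frame to frame.
frame = FS // 100
rms = [math.sqrt(sum(v * v for v in narrow[i:i + frame]) / frame)
       for i in range(0, len(narrow) - frame + 1, frame)]
print(round(fine_structure_hz), round(max(rms) / min(rms), 1))
```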
-
The characteristics of a filter can be obtained by a
Fourier analysis of its impulse response.
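A sketch of obtaining a filter's frequency characteristic from its impulse response. The impulse response assumed here is an exponentially damped cosine, $r^k\cos(2\pi f_0 k/f_s)$ with $r = e^{-\pi B/f_s}$. For this single-resonance shape the 3-dB bandwidth is approximately B. A Fourier analysis of the response should peak near $f_0$, with a half-power width near B. All parameter values are illustrative.

```python
import cmath
import math

FS = 16000

def damped_oscillation(f0, bw, n, fs=FS):
    """Impulse response r**k * cos(2*pi*f0*k/fs), decaying at a rate set by bw (Hz)."""
    r = math.exp(-math.pi * bw / fs)
    return [r ** k * math.cos(2 * math.pi * f0 * k / fs) for k in range(n)]

def magnitude_response(h, freqs, fs=FS):
    """Fourier analysis of the impulse response: |H(f)| at each requested frequency."""
    return [abs(sum(v * cmath.exp(-2j * math.pi * f * k / fs) for k, v in enumerate(h)))
            for f in freqs]

h = damped_oscillation(1000.0, 50.0, 3000)     # long enough for the ringing to die away
freqs = list(range(900, 1101))                 # 1-Hz grid around the resonance
mag = magnitude_response(h, freqs)
peak = freqs[mag.index(max(mag))]
half_power = [f for f, m in zip(freqs, mag) if m * m >= max(mag) ** 2 / 2]
bandwidth = half_power[-1] - half_power[0]
print(peak, bandwidth)
```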
-
A filter's response is not instantaneous.
-
The narrower the bandwidth and the steeper the slope
of a filter, the longer its response time.
-
Thus, an increase in frequency selectivity can only
be obtained at the expense of a loss of resolution in time.
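The trade-off can be put in numbers for a single-resonance filter. The assumption is that its ringing decays with envelope $e^{-\pi B t}$, where B is the 3-dB bandwidth in Hz. The time for the ringing to fall by a chosen amount (40 dB here, an arbitrary choice) is then inversely proportional to B.

```python
import math

def ringing_time_s(bandwidth_hz, drop_db=40.0):
    """Time for the ringing of a single-resonance filter to decay by drop_db.

    Assumes an exponential envelope exp(-pi * B * t), with B the 3-dB bandwidth
    in Hz; in dB that envelope is -20 * pi * B * t / ln(10)."""
    return drop_db * math.log(10) / (20 * math.pi * bandwidth_hz)

for bw in (10.0, 50.0, 250.0):
    print(f"{bw:5.0f}-Hz bandwidth: rings for about {1000 * ringing_time_s(bw):5.1f} ms")
```

Making the filter five times narrower makes it ring five times longer, because the product of bandwidth and ringing time is fixed for this filter shape.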
-
Basic structure and function of the auditory system.
-
The outer and middle ear.
-
The outer ear consists of the pinna and auditory canal,
as shown in Fig. 1.7.
-
The pinna modifies incoming sound, particularly at high
frequencies, and plays a role in sound localization.
-
The auditory canal channels the sound to the middle
ear.
-
The middle ear consists of the tympanic membrane (eardrum)
and three small bones (ossicles), the malleus (hammer), incus (anvil),
and stapes (stirrup).
-
Sound arriving at the tympanic membrane sets it into
vibration that is transmitted through the ossicles, acting as a series
of levers, to the inner ear.
-
The main function of the middle ear is to match the
acoustic impedance of the air to that of the inner ear.
-
Transmission of sound through the middle ear is most
efficient at middle frequencies (500-4000 Hz).
-
Middle ear reflex.
-
The inner ear and the basilar membrane.
-
The inner ear consists of the cochlea, which is a rigid,
bony, snail-shaped structure filled with almost incompressible fluids.
-
The cochlea is divided along its length by the basilar
membrane into two connected chambers, the scala vestibuli and the scala
tympani.
-
The stapes contacts the oval window at the base of the
cochlea and its movements in response to sound force fluid from the upper
to the lower chamber of the cochlea, with the pressure producing an outward
movement of the round window.
-
This creates a pressure difference across the basilar
membrane that causes it to move.
-
The response of the basilar membrane to sinusoidal stimulation
takes the form of a travelling wave that moves along the membrane from
base to apex.
-
The amplitude of this wave increases at first and then
decreases abruptly, as shown in Fig. 1.8, producing an envelope maximum
at a particular position along the membrane.
-
The position of the peak vibration along the basilar
membrane varies with frequency of stimulation, as shown in Fig. 1.9.
-
High frequencies produce a maximum displacement near
the base of the basilar membrane.
-
Low frequencies produce patterns of vibration that extend
all along the membrane, but reach a maximum near the apex.
-
The frequency that gives maximum response at a particular
point on the basilar membrane is known as the characteristic frequency
of that place.
-
In response to steady sinusoidal stimulation, each point
on the basilar membrane vibrates in an approximately sinusoidal manner
with a frequency equal to that of the input waveform.
-
Each point on the basilar membrane may be considered
a bandpass filter with a center frequency, bandwidth and slopes outside
the passband. The bandwidth increases roughly in proportion to the center
frequency (typical bandwidths are 0.15-0.5 octaves).
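A short worked conversion from octave bandwidths to bandwidths in Hz. It assumes each band is centred geometrically on the center frequency, so a band b octaves wide spans $f_c(2^{b/2} - 2^{-b/2})$ Hz. A fixed octave width means the width in Hz grows in proportion to the center frequency.

```python
import math

def octaves_to_hz(center_hz, octaves):
    """Width in Hz of a band `octaves` wide, centred geometrically on center_hz."""
    return center_hz * (2 ** (octaves / 2) - 2 ** (-octaves / 2))

for cf in (500.0, 1000.0, 4000.0):
    narrow, wide = octaves_to_hz(cf, 0.15), octaves_to_hz(cf, 0.5)
    print(f"CF {cf:6.0f} Hz: {narrow:6.1f} to {wide:6.1f} Hz wide "
          f"(relative to CF: {narrow / cf:.3f} to {wide / cf:.3f})")
```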
-
If two sinusoids of different frequencies are presented
simultaneously, the response of the basilar membrane is somewhat more complex
and depends on the frequency separation of the two tones.
-
At large separations, two separate peaks of vibration
are obtained, much as if each tone had been presented separately.
-
When the tones are closer together, some points respond
to both tones and those points have a complex, rather than a sinusoidal
vibration.
-
If the tones are yet closer together there is simply
a single peak vibration, but that peak is somewhat wider than for a single
tone.
-
For frequencies above 500 Hz, the position on the basilar
membrane most excited by a given frequency varies approximately with the
logarithm of frequency, and relative bandwidths of the vibration patterns
are approximately constant.
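To illustrate the roughly logarithmic place map, this sketch uses Greenwood's place-frequency function with its commonly quoted human constants. This is an outside model swapped in for illustration, not something the notes derive. It computes what fraction of the membrane each octave from 125 Hz to 8 kHz occupies. Under this model, octaves below about 500 Hz are squeezed into less space, while higher octaves approach a constant share.

```python
import math

# Greenwood's place-frequency function, F = A * (10**(ALPHA * x) - K), with commonly
# quoted human constants (an outside model, not from these notes); x is the
# fractional distance along the basilar membrane measured from the apex.
A_HZ, ALPHA, K = 165.4, 2.1, 0.88

def place_from_apex(freq_hz):
    """Inverse of Greenwood's function: fractional distance from the apex (0 to 1)."""
    return math.log10(freq_hz / A_HZ + K) / ALPHA

octave_edges = [125.0 * 2 ** n for n in range(7)]   # 125 Hz ... 8000 Hz
spacing = [place_from_apex(hi) - place_from_apex(lo)
           for lo, hi in zip(octave_edges, octave_edges[1:])]
for lo, s in zip(octave_edges, spacing):
    print(f"{lo:6.0f}-{2 * lo:6.0f} Hz occupies {100 * s:4.1f}% of the membrane")
# At high frequencies the share per octave approaches log10(2) / ALPHA (about 14%).
```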
-
The impulse response of a particular point on the basilar
membrane resembles a damped oscillation with a frequency corresponding
to the center frequency of that point.
-
The transduction process and the hair cells.
-
How is information about frequency, amplitude and time,
which is carried in the vibration patterns of the basilar membrane, converted
or coded into neural signals in the auditory nervous system?
-
Between the basilar membrane and the tectorial membrane
are hair cells, which form part of a structure called the organ of Corti,
as shown in Fig. 1.13.
-
On the side of the tunnel of Corti nearest the outside,
there are about 25,000 outer hair cells, each with about 140 'hairs'.
-
On the other side of the tunnel are about 3,500 inner
hair cells, each with about 40 'hairs'.
-
The functions of the inner and outer hair cells are
different from each other.
-
Motion of the basilar membrane excites the inner hair
cells that in turn excite the approximately 20 auditory neurons that contact
each cell.
-
The outer hair cells receive efferents from higher brain
centers and affect the mechanics of the cochlea to produce high sensitivity
and sharp tuning.
-
Cochlear echoes.
-
Kemp found that if a low-level click is applied to the
ear, then it is possible to measure sound being reflected from the ear,
using a microphone sealed into the ear canal.
-
The early part of this reflected sound comes from the
middle ear, but at longer delays it reflects active processes occurring
inside the cochlea.
-
Investigation of these cochlear echoes suggests several
conclusions.
-
These processes have a strong nonlinear component.
-
They are biologically active.
-
They are physiologically vulnerable.
-
They appear to be responsible for the sensitivity and
sharp tuning of the basilar membrane.
-
Neural responses in the auditory nerve.
-
Spontaneous firing rates and thresholds.
-
Auditory nerve fibers have spontaneous firing rates
from less than 0.5 to about 250 spikes/sec.
-
The spontaneous rates are correlated with position and
size of the synapses on the inner hair cells.
-
High rates go with large synapses on the side of the
inner hair cells facing the outer hair cells.
-
Low rates go with small synapses on the other side of
the inner hair cells.
-
High spontaneous rates are correlated with low thresholds
and vice versa.
-
Thresholds vary from near 0 dB to 80 dB SPL or more.
-
Tuning curves and iso-rate contours.
-
Frequency selectivity of a single nerve fiber can be
illustrated by a tuning curve, which plots the fiber's threshold as a function
of frequency, as shown in Fig. 1.14.
-
On the log frequency scale, the tuning curves are steeper
on their high frequency side.
-
The frequency at which a fiber's threshold is lowest
is called its center frequency.
-
The frequency selectivity of a fiber is derived from
the frequency selectivity of the point on the basilar membrane that activates
it.
-
The tonotopic or place representation of frequency on
the basilar membrane is preserved in the auditory nerve bundle with high
center frequencies in the periphery of the bundle and an orderly decrease
in center frequency towards the center of the bundle.
-
Sharpness of tuning on the basilar membrane now appears
to be the same as for single neurons in the auditory nerve.
-
Iso-rate contours can be used to describe the characteristics
of single fibers above threshold.
-
The intensity of sinusoidal stimulation required to
produce a predetermined firing rate in the neuron is plotted as a function
of frequency.
-
Resulting curves have the same general shape as tuning
curves, but sometimes broaden at high sound levels.
-
Another alternative is iso-intensity contours that plot
firing rates at equal sound levels as a function of tone frequency.
-
Shape depends on sound level chosen.
-
Difficult to interpret because the relationship between
firing rate and intensity of stimulation is nonlinear.
-
Show the potentially important result that for some fibers
the frequency that gives the maximum firing rate varies with sound level.
-
Rate versus level functions.
-
Fig. 1.16 shows how the rate of firing of an auditory
neuron varies with intensity of a sinusoid at the neuron's center frequency.
-
Neurons differ in their spontaneous and maximum firing rates.
-
A neuron is said to saturate when further increases in
intensity produce no further increases in firing rate.
-
Range between threshold and saturation is called dynamic
range, and is between 20 and 50 dB for most neurons.
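A sketch of a rate-versus-level function and its dynamic range. The sigmoid shape and every parameter value (spontaneous rate, maximum rate, midpoint, slope) are hypothetical. So is the operational definition used here: threshold and saturation are the levels where the driven rate reaches 10% and 90% of its full swing. For a logistic curve that range is $2s\ln 9 \approx 4.4s$, so a 6-dB slope parameter gives a dynamic range of roughly 26 dB.

```python
import math

def rate_level(level_db, spont=50.0, max_rate=250.0, mid_db=30.0, slope_db=6.0):
    """Hypothetical sigmoidal rate-versus-level function (spikes/s) for a tone at CF."""
    return spont + (max_rate - spont) / (1 + math.exp(-(level_db - mid_db) / slope_db))

def dynamic_range(rate_fn, lo_frac=0.1, hi_frac=0.9):
    """Return (threshold, saturation, range) in dB.

    By assumption, 'threshold' and 'saturation' are the levels where the driven
    rate reaches lo_frac and hi_frac of its full swing."""
    levels = [lv / 10 for lv in range(-200, 1201)]   # -20 to 120 dB SPL in 0.1-dB steps
    rates = [rate_fn(lv) for lv in levels]
    r_min, r_max = min(rates), max(rates)
    thresh = next(lv for lv, r in zip(levels, rates) if r >= r_min + lo_frac * (r_max - r_min))
    sat = next(lv for lv, r in zip(levels, rates) if r >= r_min + hi_frac * (r_max - r_min))
    return thresh, sat, sat - thresh

thresh, sat, dr = dynamic_range(rate_level)
print(f"threshold ~{thresh:.1f} dB, saturation ~{sat:.1f} dB, dynamic range ~{dr:.1f} dB")
```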
-
Some neurons show sloping saturation, or a gradual increase
in firing rate even at high sound levels. This occurs mainly for neurons
with low spontaneous rates.
-
Neural excitation patterns.
-
In response to low levels of sinusoidal stimulation,
activity is high in neurons with center frequencies close
to that of the stimulus and falls off rapidly to either side.
-
At higher levels of stimulation, saturation can produce
a high level of activity across units with a wide range of center frequencies.
-
Phase locking.
-
Information about the stimulus is carried not only in
the firing rate of neurons, but also in the temporal pattern of these firings.
-
In response to sinusoidal stimulation, nerve firings
tend to be phase locked or synchronized to the stimulating waveform.
-
A given fiber does not necessarily fire on every cycle
of the waveform, but its firings occur at roughly the same phase of the
waveform.
-
Thus, the time intervals between firings are approximately
integer multiples of the period of the waveform.
-
One way to demonstrate phase locking in a single auditory
nerve fiber is to plot a histogram of the time intervals between successive
firings as in Fig. 1.19.
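A sketch of the interval histogram for a simulated phase-locked fiber. The firing model is hypothetical: on each cycle of a 500-Hz tone the fiber fires with a fixed probability, at a fixed phase (the waveform peak) plus a small Gaussian timing jitter. Refractoriness and rate adaptation are ignored. The resulting intervals cluster at integer multiples of the 2-ms period, with empty bins in between.

```python
import random
from collections import Counter

def phase_locked_spikes(freq_hz, duration_s, p_fire=0.3, jitter_s=1e-4, seed=0):
    """Hypothetical fiber: on each cycle it fires with probability p_fire, at a
    fixed phase (the waveform peak, a quarter period in) plus Gaussian jitter."""
    rng = random.Random(seed)
    period = 1.0 / freq_hz
    return [cycle * period + period / 4 + rng.gauss(0.0, jitter_s)
            for cycle in range(int(duration_s * freq_hz))
            if rng.random() < p_fire]

FREQ = 500.0                          # period = 2 ms
period = 1.0 / FREQ
spikes = phase_locked_spikes(FREQ, 1.0)
intervals = [b - a for a, b in zip(spikes, spikes[1:])]

# How far each interval lies from the nearest whole number of periods.
deviation = [iv - round(iv / period) * period for iv in intervals]

# Interval histogram in 0.25-ms bins (intervals up to 10 periods): only bins
# near 2, 4, 6, ... ms are occupied.
BIN = 0.25e-3
histogram = Counter(round(iv / BIN) for iv in intervals if iv < 10 * period)
for b in sorted(histogram):
    print(f"{1000 * b * BIN:5.2f} ms {'#' * histogram[b]}")
```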
-
There is variability in the exact instant of initiation
of a nerve impulse and as frequency increases the period of the waveform
eventually becomes as short as this variability.
-
Thus, phase locking in the human auditory system breaks
down above 4-5 kHz.
-
Two-tone suppression.
-
The tone-driven activity of a single fiber in response
to one tone can be suppressed by the presence of a second tone.
-
For a neuron responding to a tone near its center frequency,
a second tone presented within the excitatory area bounded by the tuning
curve for that neuron usually increases its firing rate.
-
When the second tone falls just outside this area, the
firing rate is usually reduced, as shown in Fig. 1.20.
-
Suppression has a very rapid onset and offset,
and is thought to originate on the basilar membrane.
-
Phase locking of the neuron may also shift from the
original tone to the suppressor tone.
-
Limited investigations of phase locking to stimuli that
are more complex have begun, but are beyond the scope of our course.
-
Neural responses at higher levels in the auditory system.
-
The anatomy is well known, but the physiology is much less well understood.
-
Feature detectors are involved, but cataloging of relevant
features is just beginning.