Chapter 2: The perception of loudness

Introduction.

The human ear has incredible absolute sensitivity and dynamic range.

The most intense sound we can hear without immediate damage to the ear is at least 120 dB above the faintest sound we can just detect.
This corresponds to an intensity ratio of 1,000,000,000,000 : 1.
How could such a range be encoded?
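The arithmetic behind the 120 dB figure can be checked in a few lines. This is a minimal sketch; the function name is illustrative and does not come from the text.

```python
def db_to_intensity_ratio(level_difference_db: float) -> float:
    """Convert a level difference in decibels to an intensity (power) ratio."""
    return 10 ** (level_difference_db / 10)


# 120 dB corresponds to twelve factors of ten in intensity.
print(f"{db_to_intensity_ratio(120):.0e}")
```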

How does the loudness of sounds depend on frequency and intensity?
How can the loudness (as opposed to the intensity) of sounds be measured?
What adaptation and fatigue occur in the auditory system?
How can hearing disorders be diagnosed?

Absolute thresholds.

The absolute threshold of a sound is the minimum detectable level of that sound in the absence of other external sounds.
Two methods of measuring physical intensity of the threshold stimulus yield slightly different results.

Minimum audible pressure (MAP) is measured with a small microphone at or inside the ear canal; the sound is usually presented through earphones.
Minimum audible field (MAF) is measured by presenting sound via loudspeakers in an anechoic chamber; sound pressure is measured, with the listener absent, by a microphone placed where the center of the listener's head would be.
Average results for these two methods in young listeners are shown in Fig. 2.1.

Both MAP and MAF curves are lowest in the middle frequencies.

The outer ear enhances sound level at the eardrum by as much as 15 dB between 1.5 and 6 kHz.
Transmission by the middle ear is most efficient at middle frequencies.

At low frequencies, MAPs are 5-10 dB higher than MAFs due to physiological noise of vascular origin.
Highest audible frequency may be as much as 20 kHz but decreases and becomes more variable with age--a condition known as presbyacusis.
The low frequency limit for true hearing is about 16 Hz.
In most practical situations, detection depends more on masked threshold than absolute threshold.
In clinical measurement of hearing, thresholds are specified relative to the average threshold for young healthy listeners with 'normal' hearing.

Thresholds specified this way have units dB HL (hearing level) or dB HTL (hearing threshold level).
Fig. 2.1b shows typical audiograms for a young graduate student and an old professor.

Equal-loudness contours.

It is often useful to have a subjective scale for the loudness of a sound. Since sounds are often analyzed in terms of their individual frequency components, a useful first step is to devise such a scale for pure tones.

One way to do this is the loudness level, which tells us not how loud a tone is, but how intense a 1000 Hz tone must be to sound equally loud.
The loudness level in phons of a 1000 Hz pure tone is defined to be equal to its sound pressure level in dB SPL.
The loudness level in phons of any other pure tone is the sound pressure level in dB SPL of a 1000 Hz pure tone judged to be equally loud.
If subjects are alternately presented various pure tones with a 1000 Hz pure tone, and asked to adjust one or the other of the tones until they have the same loudness, the equal loudness contours shown in Fig. 2.3 result.

The equal loudness contours are similar in shape to threshold functions, but become flatter at higher loudness levels.
This means that the rate of growth of loudness with intensity differs for different frequencies.
Specifically, loudness level grows faster with intensity at low frequencies (and to some extent high frequencies) than at middle frequencies.

Practical implications of equal loudness contours for the reproduction of sounds.

The relative loudness of different frequency components in a sound changes as a function of overall level, thus affecting the tonal balance.
At low levels, we are less sensitive to the very low and very high frequencies.
Many amplifiers have a 'loudness' control which boosts bass and treble at low listening levels.
Sound level meters have weighting scales (A, B and C) to crudely correct for the effect of overall sound level on the contribution of different frequencies to overall loudness.
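As an illustration of what such a weighting scale does, the sketch below evaluates the A-weighting curve at a few frequencies. The constants are not from this chapter; they are the commonly quoted analytic form of A-weighting used in sound level meter standards (IEC 61672), and the function name is illustrative. The B and C scales are not shown.

```python
import math


def a_weighting_db(frequency_hz: float) -> float:
    """Approximate A-weighting gain in dB (commonly quoted analytic form)."""
    f2 = frequency_hz ** 2
    numerator = 12194.0 ** 2 * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the gain to about 0 dB at 1000 Hz.
    return 20 * math.log10(numerator / denominator) + 2.00


for f in (50, 100, 1000, 4000, 10000):
    print(f"{f:>6} Hz: {a_weighting_db(f):+6.1f} dB")
```

Low frequencies are attenuated heavily, which roughly mirrors the reduced sensitivity to low frequencies seen in the equal loudness contours at low levels.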

The scaling of loudness.

Development of scales relating the physical magnitude of sounds to their subjective loudness was pioneered by S. S. Stevens, primarily using two methods.

In magnitude estimation, sounds with different levels are presented and the subject is asked to assign a number to each one according to its perceived loudness. Sometimes a reference sound is also provided.
In magnitude production, the subject is asked to adjust the level of a test sound until it has a specified loudness, either in absolute terms, or relative to a reference sound.

Results of such studies indicate that loudness (L) is a power function of intensity (I): L = kI^0.3, where k is a constant depending on the subject and the units used.

A simple approximation is that a two-fold change in loudness is produced by a 10-dB change in level.
The sone is a unit of loudness equivalent to the loudness of a 1000 Hz pure tone at 40 dB SPL. Fig. 2.5 shows the relationship between sones and phons for a 1000 Hz pure tone.
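The power law and the 10-dB doubling rule can be tied together numerically. The sketch below assumes the exponent of 0.3 given above and extends the doubling rule from the 40-phon reference point; the function names are illustrative, and the phon-to-sone conversion is only the simple approximation, which should not be trusted at low loudness levels.

```python
def loudness_ratio(level_change_db: float, exponent: float = 0.3) -> float:
    """Loudness ratio predicted by L = k * I**exponent for a given level change."""
    intensity_ratio = 10 ** (level_change_db / 10)
    return intensity_ratio ** exponent


def phons_to_sones(loudness_level_phons: float) -> float:
    """Approximate loudness in sones: 1 sone at 40 phons, doubling every 10 phons."""
    return 2 ** ((loudness_level_phons - 40) / 10)


print(round(loudness_ratio(10), 3))      # close to 2: a 10-dB step roughly doubles loudness
for phons in (40, 50, 60, 80):
    print(phons, "phons ->", phons_to_sones(phons), "sones")
```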

Criticisms of loudness scaling.

Susceptible to bias caused by a number of factors.

The range of stimuli presented.
The first stimulus presented.
The instructions to the subject.
The range of permissible responses.
Symmetry of the response range.
Other factors related to experience, motivation, training and attention.

Large individual differences and within-subject variability require the averaging of many subjects and responses for consistent results.
We are used to judging the loudness of sources, but not of sensations.
Scaling assumes that the relationship between sensation and response is linear, but that assumption is not independently verifiable.

Models of loudness.

As an alternative to the scaling of loudness, models have been constructed which are fairly successful in predicting the loudness of simple and complex sounds from their physical parameters.
Treatment of these models is beyond the scope of this course.

Temporal integration.

For tone durations in excess of 500 ms, sound intensity at threshold is independent of duration.
For durations less than about 200 ms, the sound intensity necessary for detection increases as duration decreases.
Over a reasonable range of durations, the ear appears to integrate the energy of the stimulus over time in the detection of short duration tones.

If this were exactly true, then the threshold intensity (I) times the tone's duration (t) would be a constant for a particular frequency.
A better representation of actual results is that (I - I_L) x t = I_L x τ = constant, where I_L is the threshold intensity for a long-duration tone, t is the tone's duration, and τ is the integration time of the auditory system.
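Writing the integration time as τ to keep it distinct from the tone duration t, the relation (I - I_L) x t = I_L x τ can be solved for I to give I = I_L x (1 + τ/t). The sketch below turns this into a threshold elevation in dB relative to a long-duration tone. The value τ = 200 ms is an assumption chosen only for illustration, and the function name is hypothetical.

```python
import math


def threshold_elevation_db(duration_ms: float, tau_ms: float = 200.0) -> float:
    """Threshold for a tone of the given duration, in dB re the long-tone threshold.

    Derived from (I - I_L) * t = I_L * tau, i.e. I / I_L = 1 + tau / t.
    tau_ms = 200 is an illustrative assumption, not a measured value.
    """
    return 10 * math.log10(1 + tau_ms / duration_ms)


for duration in (10, 20, 50, 100, 200, 500, 1000):
    print(f"{duration:>5} ms: +{threshold_elevation_db(duration):.1f} dB")
```

When t is much shorter than τ, I is approximately I_L x τ/t, so I x t is roughly constant and the threshold rises by about 10 dB per tenfold reduction in duration. When t is much longer than τ, the elevation approaches 0 dB.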

The auditory system almost certainly integrates neural activity, rather than stimulus energy.
The auditory system may not actually perform the operation of integration, but simply have more opportunities to detect the stimulus as its duration increases (multiple looks).
Some investigators have found that the time constant of integration (τ) decreases with increasing frequency, but others have found it to remain relatively constant.
The limits of energy integration have been studied using tones of various durations, but constant energy (I x t). Typical results are shown in Fig. 2.7.

Using a constant-energy, 1 kHz tone with durations from 5 to 500 ms, it was found that detectability (d') was constant for durations from 15 to 150 ms, but fell off for durations longer and shorter than this.
Other investigators have found that the plateau occurs at longer durations for lower frequencies and shorter durations for higher frequencies.
The fall in detectability at very short durations may indicate that energy can only be integrated over a narrow range of frequencies, since the spectrum of a tone broadens as its duration is shortened.

The detection of intensity changes and the coding of loudness.

The smallest detectable change in intensity (difference threshold) has been measured for many types of stimuli by a variety of methods.
Most of these methods use two-interval, two-alternative forced-choice (2AFC). Two stimuli differing in intensity are presented successively in random order. The threshold is usually defined as the intensity difference which yields 75% correct responding.

Modulation detection. In one interval, the stimulus is unmodulated and in the other it is amplitude modulated at a low rate. Subjects must indicate which interval contained the modulation.
Increment detection. A continuous stimulus is presented and an increment in level is imposed in one of two intervals. Subjects must indicate which interval contained the increment.
Intensity discrimination of pulsed stimuli. Two pulses of sound are presented successively, one being more intense than the other. Subjects must indicate which was more intense.

Results for these three methods are similar and are usually specified in decibels such that ΔL = 10 log₁₀{(I + ΔI)/I}.

For wideband or bandpass-filtered noise, Weber's law holds.

The smallest detectable change is a constant fraction of the intensity of the stimulus, i.e., ΔI/I (the Weber fraction) is constant.
If ΔL is expressed in decibels, it too is constant at 0.5-1 dB.
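With ΔL the just-detectable level change in dB and ΔI/I the Weber fraction, the two measures are related by ΔL = 10 log₁₀(1 + ΔI/I). A minimal sketch of the conversion in both directions follows; the function names are illustrative.

```python
import math


def weber_fraction_to_delta_l_db(weber_fraction: float) -> float:
    """Level difference in dB corresponding to a Weber fraction delta_I / I."""
    return 10 * math.log10(1 + weber_fraction)


def delta_l_db_to_weber_fraction(delta_l_db: float) -> float:
    """Weber fraction delta_I / I corresponding to a level difference in dB."""
    return 10 ** (delta_l_db / 10) - 1


for delta_l in (0.5, 1.0):
    fraction = delta_l_db_to_weber_fraction(delta_l)
    print(f"delta L = {delta_l} dB -> delta I / I = {fraction:.3f}")
```

A ΔL of 0.5-1 dB therefore corresponds to an intensity change of roughly 12 to 26 percent.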

For pure tones a 'near miss' to Weber's law is obtained.

If ΔI is plotted against I (both in dB), a line of slope 0.9 is obtained instead of the slope 1.0 predicted by Weber's law.
Discrimination, as measured by the Weber fraction, improves at high levels.
For a 1000 Hz pure tone, ΔL ranges from 1.5 dB at 20 dB to 0.3 dB at 80 dB.
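A slope of 0.9 means that ΔI grows as I^0.9, so the Weber fraction ΔI/I, expressed in dB, falls by 0.1 dB for every 1-dB increase in level. The sketch below extrapolates ΔL from one reference point under that assumption. It is a straight-line idealization, not a fit to data, and the function name and parameters are illustrative.

```python
import math


def predicted_delta_l_db(level_db: float, ref_level_db: float,
                         ref_delta_l_db: float, slope: float = 0.9) -> float:
    """Extrapolate delta L (dB) assuming delta_I (dB) vs I (dB) is a line of given slope."""
    # Weber fraction at the reference level, expressed in dB.
    ref_weber_db = 10 * math.log10(10 ** (ref_delta_l_db / 10) - 1)
    # A slope of 1.0 leaves the Weber fraction unchanged (Weber's law).
    weber_db = ref_weber_db + (slope - 1.0) * (level_db - ref_level_db)
    return 10 * math.log10(1 + 10 ** (weber_db / 10))


for level in (20, 40, 60, 80):
    print(f"{level} dB: delta L = {predicted_delta_l_db(level, 20, 1.5):.2f} dB")
```

Anchored at 1.5 dB at 20 dB, a slope of exactly 0.9 predicts a ΔL of about 0.4 dB at 80 dB, the same order as the 0.3 dB quoted above.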

A suitable account of the physiological encoding of intensity in the auditory system must thus account for a 120-dB dynamic range, Weber's law for noise bursts, and improved discrimination with level up to about 100 dB for pure tones.
The dynamic range of the auditory system.

If intensity discrimination were based on the firing rates of neurons with center frequencies close to the frequency of the stimulus, we might expect discrimination to worsen at sound levels above about 60 dB SPL since most of these neurons would be saturated.
Since discrimination does not worsen above 60 dB SPL, there might be other mechanisms for coding of intensity changes at high intensities.

One possibility is that when neurons at the center of the excitatory pattern are saturated, changes in intensity could still be signaled by changes in the firing rates of neurons at the edges of the pattern.
Attempts to selectively mask the edges of the excitatory pattern with noise show that this information may play a role in intensity discrimination, but is not necessary to the wide dynamic range of the auditory system.
Another possibility is that even when neurons are saturated, an increase in intensity increases phase locking to the stimulus (quantity and quality).
Studies using stimuli containing only frequencies above the range where phase locking occurs indicate that although changes in phase-locking may play a role, they are also not necessary for the wide dynamic range of the auditory system.

Newer studies change the problem: they show that individual neurons carry information about intensity changes both in the shape of the rate vs. level function and in its variability at each level.

Simulations show that such information from about 100 neurons is sufficient to account for intensity discrimination.
There are about 30,000 neurons in the auditory nerve, so the question arises as to why intensity discrimination is not finer than it is.
The problem seems to be more one of understanding the limited capacity of more central parts of the auditory system to use information carried in the firing rates of neurons, than one of how intensity changes can be coded in those firing rates.

Weber's law.

At one time it was thought that Weber's law held for bands of noise because the statistical fluctuations in the noise limited performance.

In intensity discrimination, a device that chooses the noise burst containing the greater energy on each trial can be shown to conform to Weber's law.
However, even noise without random fluctuations from trial to trial produces results conforming to Weber's law, indicating that it must arise instead from the operation of the auditory system.

The information conveyed by a single neuron is optimal over a small range of sound levels.

Levels close to or below threshold result in minimal changes in firing rate with level.
Poor coding at high levels is the result of saturation.
Thus, if discrimination were based on information from single neurons, it would not conform to Weber's law.

Weber's law for bands of noise can be predicted by models which combine the firing rate information from a small number of neurons (about 100) whose thresholds and dynamic ranges are appropriately staggered so as to cover the dynamic range of the auditory system.

Such models assume that information from a number of neurons with similar center frequencies is combined.
It is further assumed that there are many independent channels, each responding to a limited range of center frequencies.
Weber's law is assumed to hold for each of these channels.
In later chapters we shall consider this notion that there are many frequency channels in the auditory system, each conforming to Weber's law, in more detail and see that there is some evidence to the contrary.

The near miss to Weber's law.

If Weber's law reflects the normal mode of operation of a given frequency channel, we need to explain why intensity discrimination of pure tones and very narrow bands of noise deviates from it.
There are probably at least two factors contributing to the improvement in intensity discrimination of pure tones and narrow bands of noise at high sound levels.

Zwicker, who studied modulation detection for pure tones, described the first factor.

He assumed that Weber's law holds for all frequency channels and that the Weber fraction is about 1 dB.
The high-frequency side of excitation patterns (estimated from masking studies, Chapter 3) grows in a nonlinear way with increasing intensity, as shown in Fig. 2.9.
Thus, for example, a 1-dB change in stimulus level produces greater than a 1-dB change on the high-frequency side of the pattern at high sound levels and the Weber fraction appears to decrease.
This idea is supported by Zwicker's demonstration that addition of a highpass noise to mask the high-frequency side of the excitation pattern produces discrimination results very close to Weber's law.

The second factor contributing to the near miss to Weber's law is Florentine and Buus's suggestion that subjects combine information across a number of frequency channels, i.e., the whole excitation pattern.

As the level of a tone is increased, more channels become excited which allows for improved intensity discrimination.
A model based on this idea is capable of predicting both the near miss to Weber's law and the effects of highpass masking noise on intensity discrimination.

In summary, the near miss to Weber's law for pure tones can be accounted for by two factors: the nonlinear growth of the high-frequency side of the excitation pattern, and the ability of subjects to combine information from different parts of the excitation pattern.

Loudness adaptation, fatigue and damage risk.

The distinction between adaptation and fatigue.

In all sensory systems, exposure to a stimulus of sufficient duration and intensity produces reductions in responsiveness.
Auditory fatigue results from application of a stimulus which is usually much in excess of that required to sustain the normal physiological response of the receptor and is measured after the fatiguing stimulus has been removed.
Auditory adaptation refers to a decline in response of a receptor to a steady stimulus as a function of time until it reaches a steady value.

Post-stimulatory auditory fatigue.

The most common measure of auditory fatigue is a temporary threshold shift (TTS).

The subject's absolute threshold at a particular frequency is measured.
A fatiguing stimulus is presented for a specified time, and then removed.
The threshold is again measured and any increase would be taken as a measure of fatigue.

There are five major factors that influence the size of the TTS.

The TTS generally increases with the intensity of the fatiguing stimulus.

At low intensities of the fatiguing stimulus TTS changes slowly with intensity and only occurs for test tones with frequencies close to that of the fatiguing stimulus.
As intensity increases, the TTS increases, as does the range of frequencies affected. At very high intensities, the maximum TTS may occur for test frequencies a half octave or more above the frequency of the fatiguing stimulus, as shown in Fig. 2.10.
For fatiguing stimuli above 90-100 dB, there is a precipitous increase in TTS that may represent the transition between fatigue that is physiological and transient and fatigue that is more permanent and pathological.

TTS generally increases with duration of exposure to the fatiguing stimulus and is often linearly related to the log of the duration.
TTS generally increases with frequency of the fatiguing stimulus, at least up to 4-6 kHz. This is also the range in which permanent hearing loss resulting from exposure to intense sounds or from presbyacusis tends to be greatest.
TTS generally decreases with time since the fatiguing stimulus was removed, but is often diphasic at higher test frequencies, as shown in Fig. 2.11.

There is some suggestion that high levels of sound may be less permanently damaging if the sound is pleasant (e.g., music) than if it is unpleasant (e.g., industrial noise). However, long enough exposure to sufficiently intense sound will produce permanent damage.

Exposure to a sound level of 85 dB SPL for 8 hours a day is currently considered safe.
If the exposure duration is halved, permissible intensity is doubled (i.e., increased by 3 dB).
Sound levels over 110 dB can produce permanent damage very quickly.
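The halving rule above can be written as a simple formula: each 3-dB increase over 85 dB SPL halves the permissible daily exposure. The sketch below assumes only the figures given in these notes (85 dB SPL for 8 hours, 3 dB per halving); the function and parameter names are illustrative, and the result should not be read as a safety guarantee, particularly at the very high levels where damage can occur very quickly.

```python
def permissible_hours(level_db_spl: float,
                      reference_level_db: float = 85.0,
                      reference_hours: float = 8.0,
                      exchange_rate_db: float = 3.0) -> float:
    """Permissible daily exposure under a 3-dB-per-halving (equal energy) rule."""
    halvings = (level_db_spl - reference_level_db) / exchange_rate_db
    return reference_hours / 2 ** halvings


for level in (85, 88, 91, 94, 100):
    print(f"{level} dB SPL: {permissible_hours(level):.2f} hours per day")
```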

Auditory adaptation.

Early studies of auditory adaptation used a simultaneous dichotic loudness balance test in which a continuous tone is applied through earphones to one ear and the subject adjusts the level of a similar tone applied occasionally to the other ear until it sounds equally loud.
Such studies found large amounts of auditory adaptation in that the test tone was adjusted to lower levels as time passed.
Newer techniques that eliminate binaural interaction between the adapting and the comparison tones either use very different frequencies for the two tones, or use just an adapting tone that the subject adjusts to maintain constant loudness.
Such techniques find little if any auditory adaptation for tones well above absolute threshold (50-90 dB SPL).
Significant amounts of auditory adaptation only appear to occur for low level, high frequency sounds, and even then there are large individual differences in the results obtained.

Abnormalities of loudness perception in impaired hearing: Loudness recruitment and pathological adaptation.

Types of hearing loss.

Conductive hearing loss refers to a defect in the outer or middle ear that reduces transmission of sound to the inner ear.

Could be produced by conditions such as ossification of the ossicles, growth of bone over the oval window, build up of fluid due to middle ear infection, or wax in the ear canal.
Produces a simple attenuation of the incoming sound so that the difficulty experienced by the sufferer can be well predicted from the elevation in absolute threshold (audiogram).
Usually amenable to treatment by a hearing aid to amplify sound or surgery to remove the obstruction.

Sensorineural hearing loss (sometimes inaccurately called nerve deafness) refers to a defect in the cochlea (cochlear loss) or, less commonly, in the auditory nerve or higher centers in the nervous system (retrocochlear loss).

Could be produced by such things as birth defects, anoxia, traumatic injury, prolonged exposure to intense sounds, age or certain drugs.
Often the extent of loss increases with frequency and difficulties experienced by the sufferer are not always well predicted from the audiogram.
Sufferers often experience difficulty in understanding speech in noisy settings.
The condition is usually not completely alleviated by conventional hearing aids, nor is it usually treatable by surgery.

Loudness recruitment.

Cochlear hearing loss is almost always accompanied by loudness recruitment, which is an unusually rapid growth of loudness as the sensation level of a tone is increased.
Thus, absolute thresholds on an audiogram would be elevated, but loudness levels at high sound levels (perhaps as indicated by loudness discomfort levels) would be similar to those for normal ears.
Loudness recruitment occurs in normal ears for very low and very high frequencies.
When only one ear is affected, loudness recruitment can be measured with the alternate binaural loudness balance test.

A tone of a given level and frequency is presented to one ear and alternated with a variable tone of the same frequency to the other ear.
The level of the variable tone is adjusted to have the same loudness.
If this is repeated for a number of different levels, the rate of growth of loudness in the normal and impaired ear can be compared, as shown in Fig. 2.12.

There are other clinical tests for loudness recruitment based on the assumption that if loudness is increasing more rapidly than normal as the stimulus intensity is increased, then a smaller than normal intensity change should be required for a just-noticeable difference in loudness.

Intensity discrimination is affected not only by the slope of the loudness-growth function, but also by internal variability that also tends to increase in impaired ears and may offset the gain from the steeper growth.
These tests are of questionable validity and should not be relied on.

The simplest and best clinical test for loudness recruitment at this time is to look for a combination of elevated absolute thresholds (audiogram) and normal loudness discomfort levels. This is a very reliable indicator of cochlear damage.
Without going into detail, it appears that loudness recruitment is caused by a steepening of the input-output function (velocity of movement as a function of sound level) of the basilar membrane when the cochlea (probably the outer hair cells) is damaged.

Pathological adaptation.

Abnormal processes in the auditory nerve (much less commonly in the cochlea) sometimes result in very rapid decreases in neural responses after a nearly normal onset response.
Perceptually this leads to more extreme and rapid than normal adaptation and can be used to diagnostically identify the source of a hearing loss as retrocochlear.
Methods of measurement.

The simultaneous dichotic loudness balance procedure, which is also used to study normal adaptation.
The most common clinical procedure is called a tone decay test and simply measures threshold (audiogram) for continuous vs. interrupted tones. For persons with retrocochlear hearing loss, the threshold for continuous tones may be 20-30 dB higher than for interrupted tones.

In summary, recruitment is the hallmark of cochlear hearing losses, while pathological adaptation is usually indicative of a retrocochlear loss.