Year: 2016 | Volume: 22 | Issue: 2 | Page: 92-99
Temporal fine structure frequency bands criticality in perception of the speech in the presence of noise
School of Communication Sciences and Disorders, The University of Memphis, Memphis, TN 38105, USA
Date of Web Publication: 11-May-2016
Source of Support: None, Conflict of Interest: None
Context: In competing background noise, speech cues are compromised. In quiet and in noisy conditions, the distribution and weighting of temporal cues for speech perception vary across the frequency spectrum. Temporal fine structure (TFS) cues help in perceiving speech in noise. Knowledge about the perceptual weighting of fine structure cues is essential for the design of assistive listening devices.
Aims: The present study measured the perceptual weighting of fine structure cues across frequency bands in different listening conditions.
Settings and Design: Data were collected in a double-room setup, and an adapted Plomp (1986) sentence speech recognition threshold (sSRT) method was used.
Subjects and Methods: Forty normal-hearing individuals were presented with hearing in noise test (HINT) sentences without filtering and with the fine structure of one band filtered out (very low frequency [VLF], low frequency [LF], mid frequency [MF], or high frequency [HF]). sSRTs were measured for uninterrupted and interrupted stimuli in different signal-to-noise conditions (quiet, 0 dB, +10 dB, and −10 dB). ANOVA, post hoc tests, and bivariate Karl Pearson correlation were used for statistical analysis.
Results: Sentence recognition for filtered stimuli in quiet and in noise showed a significantly higher relative perceptual weight placed on the TFS in the VLF band, and on the MF band at 0 dB signal to noise ratio (SNR) (P < 0.001). The relative weighting of fine structure cues demonstrated the importance of the VLF band in quiet and noisy conditions and of the MF band, which contains dynamic formant movement, in the 0 dB SNR condition.
Keywords: Different signal to noise conditions, Low frequency, Mid-frequency and high-frequency bands, Temporal fine structure, Very low frequency
How to cite this article:
Yellamsetty A. Temporal fine structure frequency bands criticality in perception of the speech in the presence of noise. Indian J Otol 2016;22:92-9
How to cite this URL:
Yellamsetty A. Temporal fine structure frequency bands criticality in perception of the speech in the presence of noise. Indian J Otol [serial online] 2016 [cited 2020 Sep 26];22:92-9. Available from: http://www.indianjotol.org/text.asp?2016/22/2/92/182274
Introduction
In background noise, the quasi-periodic, spectrally and temporally varying speech signal interacts with the noise spectrum, which weakens the cues for perceiving speech. This effect may vary with the kind of noise and with the signal to noise ratio (SNR). When a complex mixed signal such as speech in noise passes through the cochlear band-pass filters, the output consists of the rapidly varying fine structure and the slowly varying amplitude envelope of both the speech signal and the noise. The fine structure cues are specific to a given signal; they contribute to the perception of speech even in the presence of noise, and to music perception. Temporal fine structure (TFS) encodes the dynamic properties of speech and provides essential information for distinguishing phonemes (Price and Simon, 1984; Van Tasell, Soli, Kirby and Widin, 1987).
Audiological studies have shown that damage to the peripheral auditory system (e.g., cochlear lesions) degrades the ability to use TFS cues but preserves the ability to use envelope (E) cues. Individuals with moderate to severe hearing loss encode TFS more poorly than individuals with mild hearing loss and normal-hearing listeners (Hopkins and Moore, 2008). Nie et al. (2005) indirectly indicated the importance of TFS in their studies, in which cochlear implant (CI) users had difficulty recovering TFS precisely in the presence of noise. A number of studies have tested the importance of TFS in normal-hearing and hearing-impaired listeners by altering the signals presented.
The effect of temporal cues varies between quiet and different signal-to-noise conditions. Studies in quiet showed that fine structure information is neither necessary nor sufficient for understanding speech in quiet (Smith, Delgutte, and Oxenham, 2002). In contrast, listeners with normal hearing showed perfect consonant identification with only TFS information (Narne, Manjula, and Vanaja 2008), which may be due to preserved formant transitions. Drullman (1995b) showed that listeners with normal hearing obtained 100% scores in quiet when only envelope cues and eight channels of spectral information (speech-shaped noise) were provided, whereas in the presence of noise at critical SNRs, speech understanding dropped dramatically and was as low as 10–20% when the SNR was 0 dB (Dorman, Loizou, and Tu, 1998; Stickney). They attributed the reduction in identification scores in adverse noise conditions to the removal of fine structure information. Temporal envelope information is critical for understanding speech both in quiet and in the presence of noise, while TFS plays an important role in understanding speech at critical SNRs. In the presence of a competing signal such as noise, perceptual weighting shifts. Studies of spectral energy distribution in normal listeners showed that the high frequency (HF) spectral region (>1.5 kHz) of speech contains significantly more energy in the mid-frequency (MF) modulation spectrum (10–25 Hz) than does the lower-frequency spectral region (<1000 Hz), where modulation energy lies below 12 Hz (Silipo et al. 1999). Hopkins and Moore then explored the contribution of TFS acoustic information in five independent frequency regions spanning 100–8000 Hz. They found that TFS contributed significantly and equally to speech recognition thresholds (SRTs) in each frequency band, above what was provided by envelope information alone.
Fogerty (2011) reported the perceptual weighting functions of envelope (E) and TFS information across frequency bands. For uninterrupted meaningful sentences, normal-hearing listeners weighted the MFs the most for both E and TFS processing, with envelope cues weighted more than TFS across all frequency bands. In CIs, the signal is encoded mainly through envelope information in different frequency bands. With advances in technology, TFS is partly coded, yet complete TFS is not restored, which compromises the perception of speech in competing noise. Few studies have evaluated the potential contribution of TFS information to speech recognition in noise through acoustic stimulation in CI users. Results revealed that adding TFS cues in the form of a slowly varying FM signal improved perception in noise by 71% (Nie et al., 2005). In CI processing, restored low frequency (LF) information aids the intelligibility of speech in the presence of reversed or competing speech (Kong and Carlyon, 2007; Li and Loizou, 2008).
In electro-acoustic stimulation, the HF information in the acoustic signal is coded by electrical stimulation, whereas the LF signal is delivered acoustically; the benefit of electro-acoustic stimulation in delivering TFS cues would therefore depend on the amount of residual hearing at the LFs. To date, however, very few studies have examined how much residual LF information is important for the perception of TFS. Consequently, limited information is available on the spectral bands that are important for the perception of TFS, and on the corresponding length of the cochlea for the band that aids speech perception in noise. If the spectral bands important for the perception of speech in noise can be identified, one may be able to predict the outcomes of electro-acoustic stimulation for the perception of speech in noise.
In a multilingual and multicultural country like India, the many languages in use tend to influence the perception and production of the link language, i.e., Indian English. English as used in India is thus known to differ from English as used in America or the UK, where some studies have been carried out on the spectral frequency bands of TFS that are important for the perception of speech in noise in both normal-hearing and hearing-impaired individuals. As there are no such studies for Indian English on the criticality of TFS frequency bands for the perception of speech in noise, the current study was undertaken.
The present study aimed to measure the perceptual weighting of fine structure across frequency ranges, to identify which range becomes important for the perception of speech in noise. The objective was to gather information on which frequency band of fine structure is critical for the perception of speech at different competing noise levels and in quiet. This information may be useful in designing signal processing strategies that can at least provide the TFS of the frequencies that are important for the perception of speech in background noise.
Subjects and Methods
A total of 40 participants, 12 males and 28 females, in the age range of 18–25 years (mean age = 21.5 years) were selected for the study after passing the ANSI subject selection criteria (audiometric thresholds of 20 dB HL, normal immittance measures, and histories consistent with normal hearing). All subjects volunteered to participate in the study. Written consent was obtained from all participants after the nature and purpose of the study had been explained.
Stimuli and design
The sentences used in this study were from the HINT in Indian English (unpublished dissertation, Apurva, 2004).
Synthesis of the stimuli
The stimuli consisted of two sets of sentences: unprocessed (UP) and processed (P). The UP sentences were the HINT quiet-condition sentences, used without any modification. The P stimuli were the same HINT quiet-condition sentences with the TFS of one frequency band selectively filtered out, resulting in four types of sentence lists: very low frequency (VLF) band TFS filtered out, LF band TFS filtered out, MF band TFS filtered out, and HF band TFS filtered out. To determine the frequency ranges for filtering, it was necessary to consider the bandwidths and the frequency allocation along the basilar membrane (BM) of the cochlea.
The frequency range selected for the study spanned 100 Hz to 10 kHz, covering the length of the cochlea essential for the perception of speech. The selected range was divided into 32 equal bands, each corresponding to 1 mm on the BM. These were further grouped into four bands of eight equivalent rectangular bandwidth number (ERBn) units each: VLF, LF, MF, and HF. The bandwidth of each frequency band was determined using the formula given by Greenwood (1990).
In this way, 32 length points and their bandwidths were obtained. Using the bandwidth data and the ERBn, the values of the four band groups were calculated. The apical BM from 3.84 to 9.74 mm corresponds to the frequency range from 116.2 to 520 Hz (VLF), from 10.58 to 16.48 mm to 521–1560 Hz (LF), from 17.33 to 23.23 mm to approximately 1.56–4.176 kHz (MF), and from 24.91 to 29.9 mm to approximately 4.2–10.2 kHz (HF). These frequency ranges, constituting the four bands, were used for further signal synthesis.
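The place-to-frequency mapping referred to above can be sketched as follows. This is a minimal illustration of Greenwood's function in the form F = A(10^(ax) − k), not the computation actually used in the study. The constants (A = 165.4, a = 0.06 per mm, k = 0.88) are the values commonly quoted for the human cochlea and are an assumption here, since the text does not state which constants were used; the printed band edges therefore need not match the ranges reported above. The function names are hypothetical.

```python
import math

# Human constants commonly quoted for Greenwood's (1990) function.
# Assumption: the article does not state which constants it used.
A = 165.4      # Hz
ALPHA = 0.06   # per mm of basilar membrane, distance measured from the apex
K = 0.88       # dimensionless constant


def greenwood_hz(x_mm: float) -> float:
    """Characteristic frequency (Hz) at x_mm from the cochlear apex."""
    return A * (10 ** (ALPHA * x_mm) - K)


def place_mm(freq_hz: float) -> float:
    """Inverse mapping: distance from the apex (mm) for a frequency in Hz."""
    return math.log10(freq_hz / A + K) / ALPHA


if __name__ == "__main__":
    # Band edges (mm from the apex) as listed in the text.
    bands = {"VLF": (3.84, 9.74), "LF": (10.58, 16.48),
             "MF": (17.33, 23.23), "HF": (24.91, 29.9)}
    for name, (lo, hi) in bands.items():
        print(f"{name}: {greenwood_hz(lo):8.1f} Hz to {greenwood_hz(hi):8.1f} Hz")
```

The inverse function is included because the clinical question raised later (how much apical BM length carries a given band) runs from frequency to place rather than from place to frequency.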
Synthesis of the processed stimuli
The processed stimuli were synthesized using MATLAB 2010a (MathWorks). Full-wave rectification and low-pass filtering (using Chebyshev filters with a roll-off of 12 dB/octave) were used to extract the fine structure (Ardoint and Lorenzi 2010; Moore et al., 2006) of all the HINT quiet-condition sentences. The signal processing program yielded four types of stimuli: VLF, LF, MF, and HF processed sentences. For example, the VLF processed stimuli contain the envelope and fine structure of all bands except the VLF fine structure; the same holds for the other band groups. All stimuli were normalized. Reversed speech babble was used as noise to avoid any spectro-temporal dip cues.
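Full-wave rectification followed by low-pass filtering is the classic way of obtaining the slowly varying envelope of a band signal, from which the fine structure can then be separated; the text does not spell out that second step. The sketch below illustrates only the rectify-and-smooth stage. It is not the MATLAB code used in the study: a simple one-pole smoother stands in for the Chebyshev filter, and the 50 Hz cutoff and the function name are illustrative assumptions.

```python
import math


def envelope(signal, sample_rate_hz, cutoff_hz=50.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter.

    Assumption: a one-pole smoother stands in for the Chebyshev filter
    mentioned in the text; cutoff_hz is an illustrative value.
    """
    # Smoothing coefficient of the one-pole low-pass filter.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for sample in signal:
        rectified = abs(sample)                # full-wave rectification
        state += alpha * (rectified - state)   # low-pass smoothing
        out.append(state)
    return out


if __name__ == "__main__":
    fs = 16000
    # Hypothetical test signal: 1 kHz carrier with 4 Hz amplitude modulation.
    x = [(1.0 + 0.5 * math.sin(2 * math.pi * 4 * n / fs))
         * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
    env = envelope(x, fs)
    print(min(env[fs // 2:]), max(env[fs // 2:]))
```

In the test signal, the smoothed output follows the 4 Hz modulation while the 1 kHz carrier, which plays the role of the fine structure, is largely removed.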
For all participants, stimuli were delivered monaurally: the signal alone, or the signal together with speech babble, was presented to the right ear at a starting level of pure-tone average (PTA) + 15 dB, and the sentence SRT (sSRT) was established. Participants were instructed to repeat the whole sentence. The standard 2 dB procedure for measuring sSRT given by Plomp (1987) was adapted to track the sSRT in the present study.
The unprocessed sentences were presented to the participants to find the sSRTs in quiet and in noise at different SNRs, namely 0 dB, +10 dB, and −10 dB SNR. Similarly, the processed sentences were presented to track the sSRTs for all four lists of TFS-filtered sentences (VLF, LF, MF, and HF filtered bands) in quiet and in noise at the same SNRs. The final sSRT for each condition was the average of the minimum level (dB) at which the whole sentence was repeated and the maximum level (dB) at which five consecutive sentences were correctly repeated by the participant.
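Presenting a sentence at a given SNR amounts to scaling the babble so that the ratio of speech level to noise level, in dB, equals the target value. The following sketch shows one common way to do this using RMS levels. How the levels were actually set in the study is not described, so the RMS-based definition and the function names are assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise RMS ratio equals snr_db, then add.

    Returns the mixture and the scaled noise (both hypothetical helpers'
    outputs, useful for checking the achieved SNR).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled
```

With `snr_db = 0` the scaled babble has the same RMS level as the speech; with `snr_db = -10` the babble is 10 dB above the speech, which is the condition in which most listeners gave no responses.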
Results
Listeners were tested on perceptual weighting for processed and unprocessed stimuli in quiet and in noise conditions.
Analysis of response to unprocessed signal in quiet and noise at various signal to noise ratio
To test whether the observed differences in mean sSRT between quiet and noise conditions, and between processed and unprocessed sentences with different band-filtered stimuli, were statistically significant, the group mean (m) and standard deviation (SD) were calculated, and a one-way ANOVA and post hoc tests were run using SPSS v20.0 (IBM, New York, U.S.). The data are presented below.
As shown in [Figure 1], the mean value for the no noise (UPq) condition was lower than for the noise conditions at UPn 0 dB and UPn +10 dB SNR. At the negative SNR of −10 dB, however, no responses were obtained. The SD indicates that the scores were homogeneous. One-way ANOVA showed a statistically significant difference between the quiet and noise conditions for unprocessed stimuli at the 0.05 level (F (2, 15) = 256.189, P < 0.001).
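For readers unfamiliar with the statistic, the F ratio of a one-way ANOVA is the between-group mean square divided by the within-group mean square. The sketch below computes it directly; it is an illustration of the standard formula, not the SPSS procedure used in the study, and the function name is hypothetical. Any data passed to it in an example are made up and do not come from the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The two degrees of freedom returned are the values reported in parentheses after F, for example F (2, 15) for three conditions.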
Figure 1: The mean speech recognition threshold across all three conditions (0 dB signal to noise ratio, +10 dB signal to noise ratio, and no noise conditions)
Analysis of response of sentence speech recognition threshold for the processed stimuli in no noise condition
Filtering a specific frequency band of the TFS resulted in VLFq, LFq, MFq, and HFq filtered sentences. sSRTs were tracked for the filtered sentences in the quiet condition.
The lowest mean sSRT was obtained for the MFq band filtered sentences and the highest for the VLFq filtered band [Figure 2]. The differences were statistically significant (F = 8.221; P < 0.001). This implies that filtering out the MF or HF components of TFS affects the sSRT less than removing the LFq or VLFq TFS in the no noise condition.
Figure 2: The mean speech recognition threshold and standard deviation for Pq stimuli (very low frequency, low frequency, mid frequency, and high-frequency bands in the no noise condition)
To determine which TFS band filtration affected the sSRT significantly more than the others, a Bonferroni post hoc test was carried out. The comparisons of VLFq filtered TFS sentences in the quiet condition with each of the other bands (LFq, MFq, and HFq) were all statistically significant (P < 0.001).
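The Bonferroni adjustment behind such post hoc comparisons is simple enough to sketch: each raw p-value is multiplied by the number of comparisons and capped at 1. The code below illustrates that adjustment for the six pairwise comparisons among four bands. It is a sketch of the general method rather than the SPSS output, and the function name is hypothetical.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# The four TFS-filtered bands give six pairwise comparisons.
bands = ["VLF", "LF", "MF", "HF"]
pairs = list(combinations(bands, 2))
```

A comparison is then declared significant only if its adjusted p-value stays below the chosen level, which keeps the overall chance of a false positive across all six comparisons at or below that level.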
Analysis of processed stimuli in different noise conditions
At 0 dB SNR, the lowest mean sSRT was obtained for the MF 0 dB band filtered sentences and the highest for the VLF 0 dB filtered band [Figure 3]. This follows the same trend seen in the no noise condition [Figure 2]. It suggests that filtering out the MF 0 dB and HF 0 dB components of TFS affects the sSRT less than removing the LF 0 dB and VLF 0 dB components, for which the sSRT levels increased.
Figure 3: The mean speech recognition threshold and standard deviation for processed stimuli (very low frequency, low frequency, mid frequency, and high-frequency bands) at the 0 dB signal to noise ratio noise condition. The very low-frequency band has the highest mean and standard deviation, and the mid frequency band has the lowest mean value
When the VLF 0 dB band was filtered, the mean sSRT was higher than for the other band-filtered stimuli (F (3, 12) = 11.098, P < 0.001). However, unlike the no noise condition, where filtration of the VLF band was significant, post hoc analysis here showed that the contribution of the MF 0 dB band to the sSRT was important at 0 dB SNR (versus VLF 0 dB [P < 0.003], LF 0 dB [P < 0.001], and HF 0 dB [P < 0.001]).
At +10 dB SNR, the lowest mean sSRT was obtained with the MF filtered band sentences and the highest with the VLF +10 dB band filtered stimuli [Table 1]. These changes in mean sSRT followed the same trend seen in the 0 dB SNR condition. Filtering the VLF +10 dB band of TFS altered the sSRT significantly (F (3, 12) = 16.01, P < 0.001). In the Bonferroni post hoc test, the mean sSRT of the VLF +10 dB filtered sentences differed significantly from that of the +10 dB LF (P < 0.005), MF (P < 0.001), and HF (P < 0.001) filtered band TFS sentences. VLF band filtration had the greatest effect on sSRT (higher thresholds) relative to all other bands, followed by LF band filtration. The effect of VLF filtration, followed by LF filtration, on sSRT values was pronounced in all signal-to-noise conditions. MF and HF filtration had variable outcomes on the sSRT across SNR conditions.
Table 1: The mean and SD values of sSRT with differently filtered band stimuli at the +10 dB SNR condition
At −10 dB SNR, the processed signals (Pn) were presented at −10 dB SNR; no responses could be obtained for VLF, LF, and HF band filtered sentence material, as was also found for unprocessed sentences in noise at −10 dB SNR. However, a few responses were seen for the MF band filtered sentences. No further statistical analysis was computed.
The effect was compared between genders. No gender effect was seen in either the no noise or the noise conditions.
Significance of the various bands
The analyzed data indicated that filtration of different frequency bands contributed differently to sSRT in the three conditions evaluated, i.e., quiet, 0 dB SNR, and +10 dB SNR.
It was also necessary to know whether the mean sSRT for each filtered band of TFS varied across the no noise, 0 dB SNR, and +10 dB SNR conditions. For the VLF filtered TFS sentences, the mean sSRT values became higher across the no noise, 0 dB SNR, and +10 dB SNR conditions (mean square = 2888.656), and the difference among the three conditions was statistically significant (F (2, 15) = 24.161, P < 0.001), meaning that VLF TFS filtration affects the perception of speech. The variation was least for MF filtered TFS (mean square = 2888.656; F (2, 15) = 77.000, P < 0.001). This indicates that the weights are distributed heavily on the VLF band, i.e., the contribution of the VLF band is high for the perception of speech at 0 dB and +10 dB SNR.
To check, for each frequency band, the correlation between SNR conditions (i.e., between the no noise condition and the 0 dB SNR and +10 dB SNR noise conditions), the bivariate Karl Pearson correlation was computed.
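The Pearson coefficient used for this comparison is the covariance of two sets of paired scores divided by the product of their standard deviations. A minimal sketch of the calculation follows; it is an illustration of the textbook formula with a hypothetical function name, not the SPSS routine used in the study, and it would be applied to listeners' paired sSRTs from two conditions for a given filtered band.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 means that listeners with higher thresholds in one condition also tended to have higher thresholds in the other, which is the sense of a positive correlation across conditions.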
[Table 2] indicates a positive correlation among the no noise, +10 dB SNR, and 0 dB SNR conditions for the VLF band, i.e., when the VLF band was removed, sSRT scores in the no noise, 0 dB SNR, and +10 dB SNR conditions varied together. A strong correlation was also observed when the MF band was removed in the no noise and +10 dB SNR conditions and when the HF band was removed in the +10 dB SNR condition.
Table 2: The bivariate Karl Pearson correlation when the VLF, LF, MF, and HF bands are removed, for all three conditions (0 dB SNR, +10 dB SNR, and no noise)
Discussion
For both the unprocessed and processed conditions, performance was better in the no noise, 0 dB, and +10 dB SNR conditions; at −10 dB SNR, all the participants failed to respond. For the processed stimuli in the no noise condition, the VLFq band had the highest and the MFq band the lowest mean values. This shows that in the no noise condition the MF cues are readily available to the listener, and that these cues contribute substantially to the perception of speech in quiet. However, the post hoc differences were significant only for the VLFq band filtered TFS stimuli. This contrasts with the study by Hopkins and Moore, who examined the contribution of TFS acoustic information in five independent frequency regions spanning 100–8000 Hz and reported that TFS contributed significantly and equally to SRTs in each frequency band, above what was provided by envelope (E) information alone.
In the frequency hierarchy of the present study, filtering the HFq, MFq, and LFq bands of TFS had no significant impact. When the VLFq (100–500 Hz) TFS was filtered, however, the sSRT levels became significantly higher, meaning that it affects the perception of speech even in quiet. This finding is also supported by the study of Ardoint and Lorenzi (2010), which states that for both E and TFS speech, the greatest effect of low-pass and high-pass filtering was found for cutoff frequencies between 1.3 and 3.4 kHz. Some improvement was also noted when adding TFS above 4000 Hz when no other TFS information was available, suggesting that even though phase locking is reduced at these frequencies, TFS cues presented in quiet still provide important speech information. This suggests that TFS may contain partially redundant information in multiple frequency regions. Hopkins and Moore investigated the spectral distribution of TFS cues more comprehensively. They incrementally added TFS to E-only speech, starting first with the LF bands and then with the HF bands. The greatest improvement in intelligibility occurred when TFS was added to frequency bands below 1000 Hz, consistent with TFS serving as an important cue for coding F0 (Moore and Moore, 2003).
In the noise conditions (0 dB, +10 dB, and −10 dB SNR), the perceptual weighting distribution of the TFS varied with SNR. At 0 dB SNR, the contribution of the MF 0 dB band to the sSRT was significant, even though the trend in mean values was similar to the no noise condition, with the VLF 0 dB band having the highest value. These results are consistent with the findings of Ardoint and Lorenzi (2010), which suggest that E and TFS cues convey important but distinct phonetic information between 1 and 2.5 kHz; the gradients in their study differed across speech processing schemes in various filtering conditions, suggesting that E and TFS do not convey identical cues in the MF range. Similarly, Doherty and Turner (1996) found that for most nonsense syllables the majority of listeners weighted the MF band (750–2500 Hz). In natural and modulated speech conditions, the importance of TFS cues increases during interrupted speech, while the availability of E cues appears to decrease. This suggests that listeners shift their perceptual weights to the more available or informative cues depending on the speech context. In the present study, as the stimuli were meaningful sentences, there may be phonetic (informative) cues in the MF region that contribute to the perception of the stimuli. Fogerty (2011) reported the perceptual weighting functions of E and TFS information across frequency bands: during uninterrupted meaningful sentences, normal-hearing listeners weighted the MFs the most for E and TFS processing.
Unlike at 0 dB SNR, the VLF +10 dB and HF +10 dB filtered band sentences turned out to be significant. These findings are supported by the results of Hopkins and Moore, which showed that normal-hearing subjects benefited more from TFS information when listening in modulated noise than in steady noise; at +10 dB SNR the signal is 10 dB higher than the noise, which may result in slight modulation of the presented sentences. They also compared E and TFS cues and found that TFS is most important for masking release, particularly the TFS in high spectral frequency bands, as it contributes to the intelligibility of speech in fluctuating noise but not in steady-state noise (Hopkins and Moore, 2010). This is consistent with the idea that hearing-impaired subjects may be able to use TFS information at LFs but are unable to use higher-frequency TFS information, even at frequencies where phase locking is believed to be robust in the normal auditory system (Palmer and Russell, 1986; Moore, 2003; Stone, Moore, and Hopkins, 2008). It can therefore be inferred that removal of the VLF +10 dB band has the greatest effect on sSRT in the +10 dB condition, followed by LF band filtration, and that the HF band contribution to sSRT was significant at +10 dB SNR.
Traditionally, it has been assumed that TFS information carried by phase locking is unusable at HFs, above 4000–5000 Hz. Heinz et al. used an auditory nerve model to investigate whether human pure-tone frequency and level discrimination could be accounted for by rate-place information alone at HFs. They found that for frequencies up to at least 10,000 Hz, psychophysical performance was best predicted when both rate-place and temporal information were included in the model. This result suggests that, even if phase-locking characteristics in humans are similar to those in other mammalian species, some TFS information could be useful even at very HFs.
Significance of the various bands
When the effect of filtering individual frequency bands was tested across the different SNR and no noise conditions, the results showed a significant effect for all the bands.
This suggests that the cues differ with SNR across the noise conditions for the processed and unprocessed stimuli. This is explained by recent research showing that changes in SNR alone can result in substantial changes in the fluctuating masker benefit (FMB) that occurs with a target speech signal and a fluctuating masker. Specifically, these results have shown that the magnitude of the FMB systematically increases with decreasing SNR (Bernstein and Grant, 2009; Oxenham and Simonson, 2009).
At different SNRs, the slopes for sSRTs differ. This may be because of the interaction between masker level and the shape of the intensity importance function, which describes the distribution of speech information across the dynamic range. The portion of the dynamic range unmasked by dips in the fluctuating masker level might contribute more to overall intelligibility at lower SNRs than at higher SNRs (Bernstein and Grant, 2009; Freyman et al., 2008). In a similar study, Fogerty varied the signal to noise ratio in each channel independently and estimated the relative perceptual weight a listener placed on that channel for each interruption condition. Overall, the results demonstrated similar relative weighting patterns across the interruption conditions, with the majority of weight placed on the HF band envelope. In the present study, however, the effect was associated mainly with the signal from which the VLF TFS had been filtered, and comparatively less with the HF band; this may be because of the difference in the type of stimulus used.
To summarize, the present study investigated the contribution of TFS to the perception of speech in difficult listening conditions in normal-hearing individuals. The VLF band, with center frequencies below 520 Hz, which covers the apical-most ~9.74 mm of the BM (calculated using the Greenwood frequency-place mapping formula), contributed the most to the perception of speech both in quiet and in background noise (speech babble). It was followed by the LF band, with center frequencies below 1500 Hz, which covers the apical ~16.48 mm of the BM. This indicates how much of the apical BM length has to be retained in CI candidates to restore LF information consistent with the contribution of TFS measured in the present study; the retained apical areas aid speech perception in challenging conditions, including the intelligibility of speech in the presence of reversed or competing speech (in Indian English). In hybrid implants, the finding can help determine the depth of insertion of the electrode array so that the apical portion (~16.48 mm), which codes the TFS cues that contribute most to the perception of speech in difficult listening conditions and of music, can be preserved.
The author would like to specially thank Dr. M. N. Nagaraja, Dr. Madhuri Gore, Dr. Rashmi Bhat, Dattareya Madiraju, Nisha Sara Dickson, JISNAR, who were instrumental in supporting and offering comments regarding this work. This work was conducted as part of a dissertation completed at Dr. S. R. Chandrasekhar Institute of Speech and Hearing, Bangalore, Karnataka. Special thanks to Dr. Shaum P. Bhagat for his support in publishing this work.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Price PJ, Simon HJ. Perception of temporal differences in speech by "normal-hearing" adults: Effects of age and intensity. J Acoust Soc Am 1984;76:405-10.
Van Tasell DJ, Soli SD, Kirby VM, Widin GP. Speech waveform envelope cues for consonant recognition. J Acoust Soc Am 1987;82:1152-61.
Lorenzi C, Gilbert G, Carn H, Garnier S, Moore BCJ. Speech perception problems of the hearing impaired reflect inability to use temporal fine structure. Proc Natl Acad Sci 2006;103:18866-9.
Hopkins K, Moore BCJ. Moderate cochlear hearing loss leads to a reduced ability to use temporal fine structure information. J Acoust Soc Am 2007;122:1055-68.
Zeng FG, Nie K, Liu S, Stickney G, Del Rio E, Kong YY, et al. On the dichotomy in auditory perception between temporal envelope and fine structure cues. J Acoust Soc Am 2004;116:1351-54.
Smith ZM, Delgutte B, Oxenham AJ. Chimaeric sounds reveal dichotomies in auditory perception. Nature 2002;416:87-90.
Narne VK, Vanaja CS. Speech identification and cortical potentials in individuals with auditory neuropathy. Behav Brain Funct 2008;4:15.
Drullman R. Temporal envelope and fine structure cues for speech intelligibility. J Acoust Soc Am 1995;97:585-92.
Dorman MF, Loizou PC, Fitzke J, Tu Z. The recognition of sentences in noise by normal-hearing listeners using simulations of cochlear-implant signal processors with 6–20 channels. J Acoust Soc Am 1998;104:3583-85.
Silipo R, Greenberg S, Arai T. Temporal constraints on speech intelligibility as deduced from exceedingly sparse spectral representations. Eurospeech 1999.
Hopkins K, Moore BCJ. The importance of temporal fine structure information in speech at different spectral regions for normal hearing and hearing-impaired subjects. J Acoust Soc Am 2010;127:1595-608.
Fogerty D. Perceptual weighting of individual and concurrent cues for sentence intelligibility: Frequency, envelope, and fine structure. J Acoust Soc Am 2011;129:977-88.
Zeng FG, Nie K, Stickney GS, Kong YY, Vongphoe M, Bhargave A, et al. Speech recognition with amplitude and frequency modulations. Proc Natl Acad Sci 2005;102:2293-98.
Kong YY, Carlyon RP. Improved speech recognition in noise in simulated binaurally combined acoustic and electric stimulation. J Acoust Soc Am 2007;121:3717-27.
Li N, Loizou PC. A glimpsing account for the benefit of simulated combined acoustic and electric hearing. J Acoust Soc Am 2008;123:2287-94.
Greenwood DD. A cochlear frequency-position function for several species – 29 years later. J Acoust Soc Am 1990;87:2592-605.
Moore BCJ, Glasberg BR, Hopkins K. Frequency discrimination of complex tones by hearing-impaired subjects: Evidence for loss of ability to use temporal fine structure. Hear Res 2006;222:16-27.
Ardoint M, Lorenzi C. Effects of low-pass and high-pass filtering on the intelligibility of speech based on temporal fine structure or envelope cues. Hear Res 2010;260:89-95.
Moore BCJ. The role of temporal fine structure processing in pitch perception, masking, and speech perception for normal-hearing and hearing-impaired people. J Assoc Res Otolaryngol 2008;9:399-406.
Moore BCJ, Moore GA. Discrimination of the fundamental frequency of complex tones with fixed and shifting spectral envelopes by normally hearing and hearing-impaired subjects. Hear Res 2003;182:153-63.
Hopkins K, Moore BCJ, Stone MA. Effects of moderate cochlear hearing loss on the ability to benefit from temporal fine structure information in speech. J Acoust Soc Am 2008;123:1140-53.
Doherty KA, Turner CW. Use of a correlational method to estimate a listener's weighting function for speech. J Acoust Soc Am 1996;100:3769-73.
Palmer AR, Russell IJ. Phase-locking in the cochlear nerve of the guinea-pig and its relation to the receptor potential of inner hair-cells. Hear Res 1986;24:1-15.
Heinz MG, Colburn HS, Carney LH. Evaluating auditory performance limits: I. One-parameter discrimination using a computational model for the auditory nerve. Neural Comput 2001;13:2273-316.
Oxenham AJ, Simonson AM. Masking release for low- and high-pass-filtered speech in the presence of noise and single-talker interference. J Acoust Soc Am 2009;125:457-68.
Bernstein JG, Grant KW. Auditory and auditory-visual intelligibility of speech in fluctuating maskers for normal-hearing and hearing-impaired listeners. J Acoust Soc Am 2009;125:3358-72.
Freyman RL, Balakrishnan U, Helfer KS. Spatial release from masking with noise-vocoded speech. J Acoust Soc Am 2008;124:1627-37.