Phonemic restoration in a sentence context: Evidence from early and late ERP effects
Introduction
In everyday life, speech is often heard in noisy environments, and yet we are able to understand it quite effortlessly. Even when some of the speech sounds are not acoustically present, we may not notice any disturbance in continuous speech (Warren, 1970). In such cases, semantic context and the properties of the intervening sounds influence the perceptual synthesis of missing speech sounds. Warren and his colleagues have studied behaviorally situations in which a speech sound is replaced by a non-speech sound (Warren, 1970, 1984; Warren and Obusek, 1971; Warren and Sherman, 1974). In their original study, participants heard a sentence in which one phoneme of a word was replaced by a cough. As the participants were not aware of the missing phoneme and could not specify the location of the non-speech sound in the written form of the sentence that they had just heard, the phenomenon was called “phonemic restoration”. Subsequent behavioral experiments (e.g., Samuel, 1997, 2001) have indicated that lexical context can influence the perceptual restoration of a phoneme. The present study aimed to shed light on the time course of bottom-up vs. top-down processes in the perception of words when their initial phoneme is replaced by a non-speech sound. We examined the electrophysiological and behavioral responses to sentence-final words which varied in their phonological completeness as well as in their expectancy in the given context.
Models of the recognition of isolated words usually assume a bottom-up analysis of the acoustic signal constituting the word's beginning, thus emphasizing the importance of the word onset in auditory processing. The onset activates a group of candidate words, and the analysis continues until only one word matches the signal (recognition point) and becomes selected (Marslen-Wilson, 1987; Norris, 1994). On the other hand, when words are presented within sentences, context appears to influence the recognition point of the word, making it earlier for the expected word (Zwitserlood, 1989). The models differ with respect to the effect of top-down context information on word recognition. It has been suggested that context can affect word recognition at an early sensory level (Morton, 1979; McClelland and Elman, 1986), or at some later level (lexical access, selection, or integration phase). Furthermore, the information processing between different levels (e.g., sensory, semantic) has been suggested to interact (McClelland and Elman, 1986), or to proceed, at least partly, in a parallel manner (Marslen-Wilson, 1987).
In order to examine the processing of incomplete speech input and the influence of missing phonetic information on word recognition, we conducted an event-related potential (ERP) experiment and a separate reaction time (RT) experiment. In a semantically restricting sentence context, a highly or a less expected final word was presented. Half of the words were manipulated to have a cough replacing the initial phoneme of the word. The expectancy of the final word was measured by the cloze probability, i.e., the percentage of participants who use a certain word in the sentence-final position when only the beginning of the sentence is presented to them. Henceforth, we use the term expectancy to refer to the cloze probability. Auditory ERPs can reflect different cognitive stages. Rather automatic stimulus-driven processes are reflected by the N1 response (Näätänen and Picton, 1987). The N400 effect (a more negative response to less expected compared to highly expected words) is related to semantic processing of language, thus reflecting higher-level processing. We used the N400 effect (Kutas and Hillyard, 1980, 1984) as an index of the semantic processing demands of sentence-final words with obliterated onset phonemes. The amplitude of the N400 effect is known to be influenced by the expectancy of a word in a given context, the word's position in a sentence, and the semantic relationship between the words (see Kutas and Federmeier, 2000, for a review). The N400 has been interpreted to reflect the relative difficulty of contextual integration, or semantic working memory processing of an unexpected sentence ending. Therefore, it provides a suitable measure for the possible increased processing demands of incomplete words. However, there is still debate regarding whether the N400 reflects lexical access from long-term memory, or the relative difficulty of semantic integration of the word within the context (Brown and Hagoort, 1993, 2000; Chwilla et al., 1995; Holcomb, 1993; Kutas and Federmeier, 2000).
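The cloze probability defined above can be illustrated with a minimal sketch. The completions listed here are invented for illustration and are not taken from the study's norming data; the function name `cloze_probabilities` is likewise hypothetical.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Return, for each word, the percentage of participants who produced it.

    `completions` holds one sentence-final word per participant, collected
    after presenting only the beginning of the sentence.
    """
    # Normalize case and whitespace so "Door" and "door" count as one word.
    normalized = [w.strip().lower() for w in completions if w.strip()]
    counts = Counter(normalized)
    total = len(normalized)
    return {word: 100.0 * n / total for word, n in counts.items()}


# Hypothetical completions from ten participants for one sentence frame.
responses = ["door", "door", "Door", "window", "door",
             "gate", "door", "door", "window", "door"]
probs = cloze_probabilities(responses)
print(probs)
```

In this invented example, "door" would serve as the highly expected ending and "window" or "gate" as less expected endings.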
Furthermore, it has been suggested that an early peak of the N400 deflection would reflect a different processing stage, either phonetic (PMN) or lexical (N200) processing. Focusing on the phonological aspects of the target word, Connolly and Phillips (1994), Connolly et al. (1992, 2001), and Kujala et al. (2004) observed a negative peak preceding the N400 response in the time range of 250–350 ms when a semantically appropriate but less expected word was presented in a highly constraining sentence context. Thus, the word began with a different phoneme than the most expected word. This phonological mismatch negativity (PMN) was interpreted to reflect a phonological stage of word recognition (mapping a sound onto an abstract phoneme representation). Lexical or semantic top-down processes did not seem to influence the PMN (Connolly et al., 2001). Connolly and Phillips (1994) reported the absence of an N400 response following the PMN when the sentence-final word was semantically appropriate, but was less expected. However, a rather long-lasting negative response following the peak of the PMN was observed for these words in their figure, although it was less pronounced than for semantically inappropriate words and was not reported to be significant using peak amplitude analysis. Thus, the PMN could reflect an early phase of lexical processing. Indeed, other interpretations concerning the early negativity have been suggested. Van Petten et al. (1999) suggested that the early negativity (N200) reflects semantic processing which begins upon receipt of partial information about the upcoming word. Hagoort and Brown (2000) and Van den Brink et al. (2001) have further proposed that the N200 reflects a lexical selection process in word recognition that precedes the integration of the selected word into the sentence context.
The present experiments used both behavioral and electrophysiological measures to uncover the time course and interplay of bottom-up and top-down processes. Highly restricting sentences with either the highly expected final word or one of the less expected ones were presented acoustically. In half of both highly and less expected final words, the beginning of the word was replaced by a non-speech sound, i.e., a cough (see Table 1 for sentence examples). Because naturally pronounced connected speech was used, the selection of the sentences was limited to those that had a final word beginning with a plosive or a fricative (see Experimental procedures). These phoneme types have clearer onsets compared to other phoneme classes, such as vowels and nasals, and therefore, the beginning of the word could be more easily defined for setting the trigger point to the word onset. The duration of the cough replacements was adjusted according to the duration of the different beginning phonemes. In summary, the study had a 2 × 2 × 2 factorial design with the factors onset phoneme/cough duration (short/long), expectancy (high/low), and onset manipulation (normal/cough). In addition, a separate behavioral experiment was carried out to study whether the cough replacement would have an effect on the recognition speed of the sentence-final words. The behavioral experiment was run separately in order to avoid contamination of the ERP components of interest by those related to response preparation. On the other hand, ERPs alone would not reveal whether the distorted words could be understood.
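The factorial structure described above can be enumerated in a short sketch. The factor and level labels follow the text; the dictionary layout and the function name `design_cells` are illustrative assumptions, not part of the original procedure.

```python
from itertools import product

# Factors and levels as named in the text; the key names are hypothetical.
FACTORS = {
    "duration": ("short", "long"),    # plosive/short cough vs. fricative/long cough
    "expectancy": ("high", "low"),    # cloze probability of the final word
    "onset": ("normal", "cough"),     # intact vs. replaced initial phoneme
}


def design_cells(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product(*(factors[name] for name in names))]


cells = design_cells(FACTORS)
for cell in cells:
    print(cell)
```

Crossing three two-level factors yields eight cells, one for each type of sentence-final word.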
With regard to the ERPs, the N400 response was investigated as the marker for semantic processing of words. We first hypothesized that normal less expected sentence-final words would elicit the N400 response, as shown by several earlier auditory studies (Friederici et al., 1993; Holcomb et al., 1992), which would be preceded by an earlier negativity (PMN/N200). Since we replaced the initial phonemes of final words by a cough and hence violated the acoustic/phonetic expectation, our second hypothesis assumed that an early negative response could also be observable for the manipulated words. However, it is also possible that no early response is elicited because the cough might be processed as a non-speech sound, which does not violate phonemic expectation. Thirdly, we had two alternative hypotheses concerning the N400. The absence of an N400 enhancement for the manipulated words compared to the normal words would indicate that the cough does not disturb semantic processing, i.e., that the detection of a non-expected beginning sound has no effect on word recognition or semantic integration processes. On the other hand, an enhancement of the N400 could be taken as evidence that the obliterated onset made word recognition more difficult. As suggested by the Cohort model (Marslen-Wilson, 1987), bottom-up activation of possible word candidates cannot be initiated by a non-speech sound and word recognition should become impossible, leading to a drop in behavioral recognition performance. However, the sentential context may help word recognition, under the assumption that bottom-up and top-down information interact (McClelland and Elman, 1986). Moreover, more difficult word recognition could influence the integration of the lexical element with the context, possibly delaying the N400 effect. Finally, as fricatives were longer, and thus replaced by a longer cough (see Experimental procedures), the influence of the long cough could be more disturbing and elicit an even larger N400. However, since for both phoneme types only the first phoneme was replaced, the longer cough did not reduce phonetic information more than a short cough. Therefore, we did not expect an increase of the N400 due to the length of the replacement.
Results
Behavioral experiment
Fig. 1 illustrates mean reaction times (RTs) for repeating the final words in the behavioral experiment. The results show that all three factors, Duration, Replacement, and Expectancy, increased the voice onset times of the oral responses independently. The RTs were longer for repeating the final words beginning with a fricative or a long cough compared to a plosive or a short cough (820 vs. 726 ms), longer for manipulated words compared to normal words (793 vs. 753 ms), and longer for less expected words compared to highly expected words (810 vs. 737 ms). Statistical analysis confirmed significant main effects of the three factors: Duration, F(1,19)=199.19, p<0.001, Replacement, F(1,19)=25.49, p<0.001, and Expectancy, F(1,19)=233.49, p<0.001, and no significant interactions. The mean number of missed responses (RTs longer than 3 SD from the mean of the subject) in the less expected word condition was 1.65 (long coughs), 1.20 (short coughs), 0.25 (fricatives) and 0.20 (plosives). For the highly expected word condition, the mean number was 0.45 (long coughs) and 0.10 (plosives), while the other two conditions had no misses.
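The criterion for missed responses can be sketched as follows. This is a minimal illustration under stated assumptions: the mean and sample standard deviation are computed over all trials of one subject (including the slow trial itself), and the RT values are invented. The original analysis may have differed in these details.

```python
from statistics import mean, stdev


def missed_responses(rts_ms, n_sd=3.0):
    """Return the RTs lying more than `n_sd` standard deviations above the
    subject's own mean RT (assumption: mean and SD over all trials)."""
    cutoff = mean(rts_ms) + n_sd * stdev(rts_ms)
    return [rt for rt in rts_ms if rt > cutoff]


# Hypothetical RTs (ms) for one subject: 19 ordinary trials and one very slow one.
subject_rts = [740, 760] * 9 + [750, 2500]
missed = missed_responses(subject_rts)
print(missed)
```

Note that with few trials a single slow response inflates the standard deviation so much that it can never exceed the cutoff, which is why the example uses twenty trials.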
Electrical responses
Fig. 2 shows responses to normal final words beginning with either a plosive or a fricative in the highly and less expected word conditions, averaged over all participants at a selected set of EEG electrodes. The responses to less expected words were more negative compared to highly expected words for both phoneme types, showing the N400 effect. The difference seems to be larger for words beginning with a plosive compared to a fricative. Fig. 3 shows the electric responses to the manipulated final words. The N1 response is elicited by both the highly and less expected words. It is more prominent for the words beginning with longer coughs, which replaced fricatives. Further, the N400 effect is observed for words beginning with a short cough, whereas the effect appears to be negligible for words beginning with a long cough. Fig. 4A shows the averaged responses over the set of central electrodes used in the statistical analysis. This figure shows more clearly that when words begin with a plosive, the cough replacement delays the onset of the N400 effect such that it begins after 300 ms. For words beginning with a fricative, the N400 effect for normal words seems to be smaller relative to words beginning with a plosive. The effect almost disappears for words beginning with a long cough. Fig. 4B displays the averaged responses for normal and manipulated highly expected words beginning with a plosive/short cough or with a fricative/long cough. While a tendency for an N400 effect is observed for the short cough as compared to normal words, no difference is observed for the long cough. Statistical analysis partly confirmed these observations. Table 2 summarizes the significant main effects and interactions.
N1 response (120–180 ms)
In the time window covering the N1 response, which was prominent for the manipulated words, the main effect of Duration was significant. An interaction between Duration and Replacement refined the finding by showing that the N1 was of equal amplitude for words beginning with a plosive (mean amplitude −0.59 μV) or a short cough (−0.55 μV). However, long coughs elicited a larger N1 (mean −1.92 μV) compared to the normal words beginning with a fricative (mean −0.84 μV; Tukey–Kramer p<0.05). The responses to long coughs were also larger relative to short coughs (p<0.001) and to normal words beginning with a plosive (p<0.01). Thus, the largest N1 response was elicited by words beginning with a long cough. The mean N1 peak latency to coughs was 146.2 ms with no significant difference between the two cough durations. Since connected speech was used, the normal words did not elicit a sufficiently reliable N1 for the analysis. An interaction of Expectancy × Replacement showed that the responses to the manipulated highly expected words were more negative (mean −1.71 μV) compared to the corresponding less expected words (mean −0.76 μV; p<0.05) and to the normal highly expected words (mean −0.38 μV; p<0.05). The responses did not differ significantly from the normal less expected words (mean −1.06 μV, n.s.). The responses to normal words did not differ significantly between highly and less expected words, although they were more negative to less expected words than to those of high expectancy (mean amplitudes −1.06 and −0.38 μV, respectively).
Early effect of Expectancy for plosive words (180–280 ms)
A significant main effect of Expectancy was found in the time window of 180–280 ms. The interaction of Expectancy × Replacement further indicated that the responses to the manipulated words were, on average, positive to both less and highly expected words (means 0.66 and 0.56 μV, respectively) and did not differ significantly from each other. For the normal words, the responses to less expected words were significantly more negative compared to the responses to highly expected words (means −1.82 and −0.13 μV; p<0.05). The responses to both the less and highly expected manipulated words were more positive compared to responses to normal less expected words (p<0.001 for both comparisons), indicating a P2 response to the manipulated words. Finally, an overall interaction of Expectancy × Duration × Replacement indicated that the expectancy effect was larger for normal words which began with a plosive (difference less minus highly expected −2.02 μV; p<0.05), whereas the difference did not reach significance for words beginning with a fricative (−1.16 μV, n.s.). The difference between highly and less expected words was not significant for manipulated words of either cough length. In addition, the responses elicited by the manipulated highly expected words were significantly more negative for the short coughs compared to the long coughs (means −0.50 and 1.63 μV; p<0.01), since the short coughs did not elicit a clear P2 response. The responses to manipulated less expected words did not differ significantly between short (−0.001 μV) and long coughs (1.32 μV).
Later N400 effects
In the time window of 280–380 ms, covering the earlier part of the N400, a significant main effect of Expectancy and an interaction of Expectancy × Replacement were found. The responses were more negative to less expected compared to highly expected words, and although less expected words elicited more negative responses in the case of both normal and manipulated words, the difference reached significance only in the case of the normal words (less minus highly expected −2.11 μV for normal words, p<0.01, and −0.23 μV for manipulated words, n.s.). The responses to normal less expected words were also more negative than to the manipulated less (p<0.05) or highly (p<0.01) expected words. Finally, the time window of 380–520 ms, covering the later part of the N400 effect for the normal words, also showed a significant main effect of Expectancy. An Expectancy × Duration interaction indicated that the effect was more pronounced for words beginning with a plosive or a short cough (difference less minus highly expected −1.48 μV; p<0.001) as compared to a fricative or a long cough (−0.53 μV, n.s.). Furthermore, a significant Duration × Replacement interaction indicated that the words with a short cough tended to elicit more negative responses (−0.86 μV) compared to the words with a long cough (0.22 μV; p=0.057), whereas the responses to normal words did not differ significantly between plosive and fricative beginnings (−0.13 and −1.01 μV). The difference between words beginning with a short cough and words beginning with a plosive also approached significance (p=0.060), the short coughs eliciting more negative responses. Although the averaged responses were more negative to words beginning with a fricative (−1.01 μV) than to those with a long cough (0.22 μV), the difference did not reach significance (p=0.11). No significant effects were found in the last time window of 520–620 ms.
Highly expected words
In order to explore whether an N400 was elicited by the manipulation of the highly expected words, sentences with normal and manipulated highly expected words were compared in a separate analysis of variance (ANOVA) with factors Duration and Replacement (see Fig. 4B). A significant interaction of Duration × Replacement was found in two time windows, namely in 180–280 ms and 380–520 ms (F(1,25)=16.87, p<0.001, and F(1,25)=7.51, p<0.05). In the first time window, the result showed that more negative responses were elicited for normal fricative words compared to words with a long cough (difference −1.89 μV; p<0.01), indicating a P2 response to the manipulated words. The responses to the short cough words did not significantly differ from normal plosive words, although they were slightly more negative (difference −0.49 μV, n.s.). The responses were more negative to words with a short cough compared to a long cough (difference −2.13 μV; p<0.01). In the later N400 time window (380–520 ms) only two comparisons approached significance in post hoc tests. The responses to the normal fricative words were more negative compared to plosive words (means −0.49 and 0.90 μV; p=0.058). There was no difference between words beginning with a long or short cough. The words with a short cough elicited slightly more negative responses compared to the normal plosive words; however, the difference was not significant (means −0.42 and 0.90 μV; p=0.086).
Topographical distribution
Fig. 5 presents the topographical distribution of the early and late N400 effects in the four time windows for normal and manipulated words. For normal words beginning with a plosive (upper row) the negative effect (blue) is pronounced in posterior areas and begins in the time window of 180–280 ms, whereas for fricatives (second row) it is weaker and appears later. For manipulated words, the words originally beginning with a plosive and replaced with a short cough (third row) have a posterior negative distribution beginning in the time window of 280–380 ms. The effect is later than that for normal words, and further, it has two negative maxima. The topographical distribution did not seem to noticeably differ between the early and later effects. Only the amplitude was increased in the later time window.
Discussion
Behavioral experiment
The present study examined the electrophysiological correlates underlying recognition of spoken words with obliterated initial phonemes in a sentence context. The influence of replacing a word's initial phoneme on word recognition was measured in a separate behavioral experiment as the repeating speed of the sentence-final word. The performance level was good: at worst there were 1.65 missed responses (due to considerably long response times) for the less expected words beginning with a long cough; thus, only a few words were difficult to understand. Both low expectancy and manipulation slowed down the reaction times. Faster repetition of highly compared to less expected words irrespective of the onset manipulation indicated that the semantic context eased recognition of the expected final word. The cough replacement made the recognition of both highly and less expected words slower. However, the advantage of context-based expectation exceeded the disadvantage caused by the cough, as the repetition of highly expected words was faster irrespective of the onset manipulation. Finally, the words which started with a long phoneme (fricative) or a long cough were repeated more slowly compared to words beginning with a short phoneme (plosive) or a short cough. This may reflect a later recognition point of the fricative words and a larger distraction by the long coughs. No interactions between the three factors were found, suggesting that each of the underlying processes had an independent effect on word recognition.
Electrical responses
The N1 effects
For normal words, as expected, no clear N1 was elicited (see Fig. 2). In connected speech, an N1 response to word onsets is usually not elicited (Friederici et al., 1993; Hahne and Friederici, 2001) because the words in fluent speech rarely have breaks between them. The manipulated words elicited a clearer N1 response, reflecting an automatic detection of the cough by the auditory system. This finding, together with the other results, supports the view that the phenomenon of phonemic restoration is a consequence of higher level semantic processing (Samuel, 2001). In Warren's experiments (e.g., Warren, 1970), the restoration may have occurred during the time delay between the perception of the cough and the behavioral response, i.e., indicating the cough's location in a written form of the sentence after hearing it. As the cough in Warren's experiments occurred in the middle of a sentence, it was followed by both a longer auditory input as well as reading of the sentence, which introduced a considerable time delay. As the auditory sensory memory for the stimulus decays rapidly, it is likely that the detection of the cough's location afterwards becomes more difficult.
The longer coughs (145–200 ms) elicited a more prominent N1-P2 response complex, whereas the N1 response to short coughs (80 ms) did not differ significantly from the responses to the normal words beginning with a short phoneme (plosive). Thus, both the N1 amplitude and the behavioral results indicated a greater salience of the long coughs and more severe disruption of the semantic processing. The N1 amplitude has been shown to vary as a function of physical properties of the stimulus, such as intensity and the slope of the energy change (Picton et al., 1970; Elberling et al., 1981; Bak et al., 1985; Kodera et al., 1979; Loveless and Brunia, 1990). In the present experiment, there was no striking difference in the onset slope between the different cough lengths (see Fig. 6 for the average cough amplitudes). Thus, it is unlikely that the N1 amplitude varied due to different onset slopes. If that had been the case, a more prominent N1 would have been elicited by the slightly sharper onset of the short coughs. Instead, another factor may have increased the N1 to long coughs, namely the temporal integration of stimulus energy with increasing duration. Temporal integration results in greater perceived loudness for a longer tone compared to a shorter tone with the same energy. The detection of tones improves with increasing duration up to 250–500 ms: in other words, short tones need greater power than long tones in order to be detected (Moore, 2003). The N1 amplitude has been found to increase with increasing stimulus duration up to 50 ms for sinusoidal tones (Onishi and Davis, 1968). Accordingly, the N1 amplitude of the present study may reflect the greater perceived loudness of longer coughs compared to short ones. In addition, when the average power was calculated for the first 50 ms of the coughs, the relative intensity was larger for long coughs (984 arbitrary units, a.u.) compared to short coughs (736 a.u.), indicating an intensity difference between the cough onsets.
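The onset-power comparison described above can be sketched as the mean of the squared samples over the first 50 ms of a waveform. This is only one plausible reading of "average power"; the sampling rate, the synthetic sine-wave "coughs", and the function name `mean_onset_power` are assumptions made for illustration, not the recorded stimuli or the authors' procedure.

```python
import math


def mean_onset_power(samples, sample_rate_hz, window_ms=50.0):
    """Mean squared amplitude (arbitrary units) over the first `window_ms`."""
    n = int(round(sample_rate_hz * window_ms / 1000.0))
    onset = samples[:n]
    if not onset:
        raise ValueError("waveform is empty or window is too short")
    return sum(x * x for x in onset) / len(onset)


def sine(amplitude, freq_hz, duration_ms, sample_rate_hz):
    """Synthetic stand-in for a recorded sound (hypothetical test signal)."""
    n = int(round(sample_rate_hz * duration_ms / 1000.0))
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


RATE = 44100  # assumed sampling rate in Hz
long_sound = sine(2.0, 1000.0, 180.0, RATE)   # louder, longer stand-in
short_sound = sine(1.0, 1000.0, 80.0, RATE)   # softer, shorter stand-in

p_long = mean_onset_power(long_sound, RATE)
p_short = mean_onset_power(short_sound, RATE)
print(p_long, p_short, p_long / p_short)
```

Because both signals are cut to the same 50 ms window, the comparison reflects onset intensity only and is not affected by the difference in total duration.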
Finally, for the manipulated words, the N1 was larger for highly expected words compared to those of lower expectancy. This result was not hypothesized, since by the latency of the N1 there is not much information about the upcoming word fragment. [Footnote 1: The N1 peaked at about 57 ms after the offset of the short coughs and 0–54 ms after the long coughs. Notice that the mean latency of N1 was measured from the grand averages over all long coughs, since averaging over separate cough durations was not reasonable due to a small number of certain durations. Therefore, the relative N1 latency varies according to the cough duration. The latencies compared to the two most frequently occurring cough durations (180 ms, occurred 62 times, and 150 ms, occurred 14 times) were 34 and 4 ms, respectively.] In principle, an immediate coarticulation at the penultimate word could hint at an unexpected final word even in the case when the onset phoneme was obliterated. In that case, however, an opposite result would have been expected, i.e., more negative responses to less expected words (Praamstra et al., 1994). The sentences were recorded in their presentation form, i.e., the sentences were pronounced as such and not produced by splicing the less expected final word from another sentence. Therefore, it might be that other systematic differences between sentences with a highly or less expected final word, which were not covered by the parameters estimated so far, are still present. Thus, the explanation for the larger N1 to manipulated highly expected words remains open.
Early N200 effect for normal words
In the case of normal words, the responses to less expected words were more negative compared to highly expected words between 180 and 520 ms in total. This expectancy effect was influenced by the type of the beginning phoneme. The effect was present as early as the time window of 180–280 ms for words beginning with a plosive, but not for those beginning with a fricative or for any of the manipulated words, indicating an earlier onset of the negative response for plosive words. In the next time window (280–380 ms), the effect was found for normal words with both phoneme types. As the duration of plosives is shorter, the next phonemes follow earlier and word recognition can be faster, resulting in earlier detection of the less expected word. For reaction times, the difference between phoneme types was around 100 ms, corresponding to the difference in phoneme durations. Further, it is possible that the more variable durations of fricatives have caused variability in the latency of the effect. This, in turn, would reduce its amplitude in the grand averaged responses. So far, most of the studies of the auditory N400 have used target words beginning with variable phonemes. Only one study (Van den Brink et al., 2001) reported using sentence-final words beginning exclusively with a plosive. The authors also found an early negativity (N200) for incongruent compared to congruent words between 150 and 250 ms. They interpreted the response to reflect the assessment of the semantic features that are expected by the contextual specifications, thus, semantic processing based on partial information about the upcoming word. The present results showed that the duration of the onset phoneme has an effect on the early phase of the negative effect.
The early negativity in the present study differs from Connolly's PMN response in response latency and topography. The response for less expected plosive words was earlier than the PMN, which occurred in the time range of 250–350 ms, with a mean latency of 275 ms (Connolly and Phillips, 1994). The later latency can partly be due to the fact that Connolly and his colleagues employed peak analysis. However, the assessment of peaks in individual responses can be rather arbitrary when no clear single peak can be defined. We, therefore, calculated the mean amplitude within different time windows. Additionally, since the final words in Connolly's studies were not restricted to begin with a certain phoneme type, this may have caused a large variability in the response amplitudes and latencies. Their response latency corresponds to our second time window, where the effect was elicited also by the fricative words. Finally, unlike the PMN response that was largest in right frontal electrodes (Connolly et al., 2001), the early effect observed here had a central posterior distribution (see Fig. 5).
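The mean-amplitude measure mentioned above can be sketched as an average of the samples falling inside each analysis window. The window boundaries are those reported in the Results; the sampling rate, the epoch starting at word onset (0 ms), the half-open window convention, and the synthetic ramp signal are assumptions made purely for illustration.

```python
# Analysis windows in ms after word onset, as reported in the Results.
WINDOWS_MS = [(120, 180), (180, 280), (280, 380), (380, 520), (520, 620)]


def mean_amplitudes(epoch_uv, sample_rate_hz, epoch_start_ms=0.0,
                    windows=WINDOWS_MS):
    """Return the mean amplitude of `epoch_uv` in each window.

    A sample at time t belongs to a window if start <= t < end
    (an assumed convention; the original boundary handling is not stated).
    """
    step_ms = 1000.0 / sample_rate_hz
    result = {}
    for start, end in windows:
        values = [v for i, v in enumerate(epoch_uv)
                  if start <= epoch_start_ms + i * step_ms < end]
        if not values:
            raise ValueError(f"no samples in window {start}-{end} ms")
        result[(start, end)] = sum(values) / len(values)
    return result


# Synthetic epoch sampled at an assumed 500 Hz: the "amplitude" simply equals
# the time in ms, so each window mean is easy to check by hand.
SAMPLE_RATE = 500.0
ramp = [i * 2.0 for i in range(400)]  # covers 0 to 798 ms
means = mean_amplitudes(ramp, SAMPLE_RATE)
print(means)
```

Unlike a peak measure, this average does not require a single clear peak to be identified in each individual response.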
These differences support the argument that the early negative effect reflects semantic processing (N200) rather than phonological processing (PMN). If the early effect were only due to the unexpected phoneme, we would not expect a difference between fricatives and plosives because they must be differentiated by the duration of a plosive. Instead, if the early effect reflected semantic processing, more phonetic information than just the first phoneme would be needed. Given the shorter duration of the plosives compared to the fricatives, additional phonemes can come into the system earlier after plosives and therefore, semantic processing can begin earlier. Moreover, as the later N400 effect was also largest centro-parietally (as in Van den Brink et al., 2001), the topographical distribution did not differ noticeably between the early and later effects; only the amplitude increased. The similar topography suggests that the early effect may reflect an early phase of the same (semantic) processing. Thus, as the early effect was modified by the duration of the onset phoneme and no difference was found in the topographical distribution compared to the later N400, we conclude that the response reflects early semantic processing (e.g., lexical selection as suggested by Van den Brink et al., 2001). The expectancy for the possible final word created by the sentence context was thus violated by the detection of the non-expected word, based on the word onset.
The later N400 effect
The recognition of the less expected words was hypothesized to be more difficult when the word's onset was obliterated. In addition to an absent initial phoneme, the following word fragment did not fulfill the expectations created by the preceding sentence context. This hypothesis was confirmed by the delayed behavioral responses. In the EEG experiment, the manipulated words elicited the N400 effect, indicating increased integration demands with the context for less expected words compared to corresponding highly expected words. This, in turn, indicates that the words could be recognized in spite of the manipulated onset. However, the amplitude of the N400 did not increase for the manipulated words as hypothesized. Instead, the effect was elicited in a later time window. For the manipulated words the expectancy effect was present only in the time window of 380–520 ms, whereas for normal words the effect had its maxima in two earlier time windows (between 180 and 380 ms). The later onset of the N400 indicates that the word could only be recognized after some phonetic information following the cough had been presented. If the N400 amplitude had linearly reflected difficulties in semantic processing, it would have been larger to manipulated words compared to normal words, and could have been largest when the word's onset was replaced with a long cough. This was not the case. Actually, for words beginning with a long cough, as compared to a fricative, the effect was the reverse: responses to fricatives were more negative than they were to long coughs, although not significantly. The responses to the words beginning with a short cough tended to be more negative compared to normal plosive words, but mainly because the larger and earlier N400 for normal words was already decaying in this later time window.
Considering the highly expected words, when their initial phoneme was replaced by a cough, the delayed behavioral responses indicated more difficult recognition of these words. This difficulty was not reflected as a significant enhancement of the N400 amplitude compared to normal words. The highly restrictive sentence context seemed to point to the expected final word candidate so efficiently that the word could be accessed without markedly increased effort even though the phonetic information of the word's onset was eliminated. However, a tendency toward an increased N400 was found for the highly expected words beginning with a short cough compared to the normal words beginning with a plosive. No such tendency was found for words beginning with a long cough compared to a fricative (see Fig. 4b). Of course, one could argue that the N400 was not enhanced (at least not for the long coughs) because word recognition was no longer possible. The behavioral results contradict this interpretation, as participants were able to repeat both the highly and the less expected manipulated words. For the degraded highly expected words the mean repetition time was even shorter than for normal but less expected words. If the beginning of the word neither confirms nor refutes the expectation created by the context, the remaining fragment of the word seems to be sufficient to complete the integration of the expected word with relative ease. Thus, both the behavioral and the electrophysiological results suggest easier semantic processing of the highly expected words, even when the initial phoneme is obliterated, compared to less expected words.
As the N400 amplitude is considered to reflect either lexical access or the relative difficulty of semantic integration, the absent N400 for the manipulated highly expected words (or small N400 in the case of short coughs) and the non-enhanced N400 for the manipulated less expected words can be interpreted as follows. Concerning lexical access, some word recognition models (e.g., Marslen-Wilson, 1987) suggest that lexical access is initiated by the word's onset. A cough in the word's beginning cannot initiate activation of any lexical candidates because no word begins with such a sound. Therefore, there should be no activation related to the initial lexical processing if that was solely based on bottom-up information of the speech signal, and no N400 should be elicited if it reflects lexical access only. Since in our study successful word recognition took place even when the word's onset was manipulated, the semantic processing cannot be solely based on bottom-up analysis. Also, concerning selection, competition is not possible between different lexical candidates having an identical onset, as no phonetic information was present, and again, no N400 would be expected. These considerations can also explain the smaller N400 effect to manipulated words compared to normal words in the early time windows. As word access or selection was not possible when the word's onset was obliterated, no early N400 (N200) was elicited. Finally, concerning contextual integration, the online processing of the context may occur in parallel to the analysis of the current auditory speech input and the integration may occur after the analysis. If the speech input is interrupted by a cough, no immediate lexical activation takes place. Lexical reconstruction must take place on the basis of the available speech input prior to and following the cough.
The context analysis, however, can be continued and possibly completed during processing of the cough, supporting activation of the most expected word. When the remaining word fragment follows and fulfills this expectation, no N400 is elicited. In the case of short coughs, however, the time might have been too short for complete contextual analysis, and therefore, a small N400 would appear for highly expected word fragments. Thus, it seems that the processing of contextual information was not completed by the end of the short coughs. No effect was elicited after long coughs, suggesting that the context processing is more advanced given the additional time.
In normal semantic processing the N400 amplitude reflects increased demands of word recognition or semantic integration. Processing of the semantic properties of the word and information about the preceding linguistic context may contribute differently to the overall N400 response (Federmeier and Kutas 1999; Kutas and Federmeier 2000). The N200 and N400 responses have been suggested to reflect separate processes: the earlier N200 recognition of the word, and the N400 integration into the context (Hagoort and Brown, 2000). When the final word's onset is replaced, the semantic processing of words may differ from the normal. The non-enhanced N400 suggests that the underlying processes of word recognition may differ in latency or distribution of activation, both influencing the N400 amplitude. For the manipulated words, there is no difference between the less and highly expected words during the cough replacement. Thus, no information is initially available to indicate that the upcoming word will not be the expected one, and consequently, the manipulation affected the early expectancy effect. Later, the N400 effect was nevertheless elicited, although it was delayed. The delayed N400 may reflect either delayed word recognition, which takes place on the basis of the remaining word fragment, or difficulties in the integration process. As the effect is not enhanced and word recognition is still effective, it may be that the influence of context takes place during the cough processing, as discussed earlier. In the case of short coughs (80 ms), the N400 onset was approximately 220 ms after the beginning of the word fragment. Van den Brink et al. (2001) have suggested that the sentence context has an effect on word recognition at 220 ms, and possibly already at 140 ms after the onset of words which begin with plosives. The present results suggest similar timing in the processing of the remaining word fragment. 
An obliterated onset does not completely hinder word recognition even when the following input does not match the expectations created by the context. The remaining fragment seems to provide sufficient phonetic information for word recognition. In summary, the present findings can be interpreted according to a view that different sub-processes underlying the N400 were differentially activated by the manipulated words. The earlier part (N200), possibly reflecting word recognition or lexical access based on the initial phonetic information, was diminished. The later part (N400), indicating integration with the linguistic context, seemed to be completed in the same time window as for normal words. If the word beginning is obliterated, the listener may rely more on context-based predictions in word recognition. As the N400 was not increased for the manipulated words, these predictions seem to offer considerable help for the processing of a word's meaning.
Conclusions
By using event-related potentials we were able to show the automatic detection of coughs that obliterated the sentence-final word onsets, as reflected by the N1-P2 response complex. Moreover, we showed that in spite of the missing onset phoneme, the semantic processing of the words took place, suggesting successful repair of these words. The difficulty of word recognition was reflected as a delay of the N400 response, but not as an amplitude enhancement. As only a small performance drop was observed, word recognition in the absence of complete phonetic information may rely on predictions based both on the semantic context and on phonetic information of the remaining word fragment. The results support the view that the apparent perception of an absent phoneme is not an early bottom-up phenomenon, but rather reflects top-down expectations. In summary, the experiment demonstrates the importance of bottom-up phonetic information for early semantic processing and the efficient integration of both bottom-up and top-down information in the rapid online processing of fluent speech.
Experimental procedures
Participants
Thirty-one right-handed German volunteers with reported normal hearing and no neurological deficits participated in the experiment. The participants gave their written informed consent after the nature of the experiment was explained to them. The data of five participants were excluded from the analysis because of a low signal-to-noise ratio (fewer than 20 averaged responses in two or more conditions), leaving 26 participants for the analysis (13 female; 21–30 years, mean age 25; handedness 75–100, mean 96). Participants' individual hearing thresholds were determined for each ear separately and stimuli were presented 48 dB above that level.
Stimulus material
Naturally spoken sentences were used as stimuli. Four types of sentences were prepared as follows. First, the cloze probability (i.e., the proportion of respondents who complete the sentence with a given word) of the final word in a set of German sentences was examined. For that purpose 343 sentences were created and divided into two questionnaires. The last word of each sentence was omitted and participants were asked to complete each sentence fragment with the first word that came to mind. Altogether 102 participants (65 females; mean age 23 years) filled out one of the questionnaires. Sentences from the questionnaires were selected for the experiment if they had a final word with a cloze probability of more than 0.5. These highly restricting sentence beginnings were used with the high cloze probability final word (p>0.5; average 0.81, range 0.50–1.00) and one of the low cloze probability words (p<0.5; average 0.07, range 0.02–0.34). Additionally, the final word had to begin either with a plosive (/b/, /d/, /g/, /k/, /p/, /t/) or a fricative (/s/, /f/, /h/). These phoneme types have clearer onsets and, therefore, the trigger points at the word onset can be better defined. Optimally, with respect to possible coarticulation effects, only plosives should have been used. However, in order to increase the variability of the beginning phonemes of the final words, fricatives were also included. If the phoneme requirements were not met for both probability classes, the sentence was used only with the final word that met the criterion. In summary, by selecting highly constraining sentences (cloze probability >0.5), the sentence beginning created high expectation for a certain last word, which then either fulfilled that expectation or did not.
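The cloze norming step can be illustrated with a minimal sketch. It assumes that cloze probability is the proportion of respondents who produced a given word and that responses are compared case-insensitively; the function names and the example responses are hypothetical and are not taken from the questionnaires.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Map each completion to the proportion of respondents who produced it."""
    # Assumption: blank responses are ignored and case is not distinctive.
    normalized = [w.strip().lower() for w in completions if w.strip()]
    counts = Counter(normalized)
    total = len(normalized)
    return {word: n / total for word, n in counts.items()}


def expectancy_class(p, threshold=0.5):
    """Label a final word as 'high' (p > threshold) or 'low' expectancy."""
    return "high" if p > threshold else "low"


# Hypothetical responses of ten participants to one sentence fragment.
responses = ["Baum"] * 8 + ["Zaun"] + ["Hof"]
probs = cloze_probabilities(responses)
labels = {word: expectancy_class(p) for word, p in probs.items()}
```

In this illustration the most frequent completion would qualify the sentence as highly constraining, and either of the rare completions could serve as a low cloze probability final word.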
Next, the sentences were uttered by an experienced female speaker and recorded in a sound-attenuated room to a PC with a 44,100-Hz sampling rate and 16-bit resolution. The mean number of words per sentence was 7.6 and the mean duration of the sentences was 3239 ms (range 1745–5794 ms). The last words had 1–4 syllables, most frequently 1 or 2 syllables (together 83–90% per condition, see Table 1). The mean duration of the last words was 651 ms (138–1153 ms) and the mean word frequencies in each condition were 551–1103 (no significant differences between conditions). Finally, sentences with a high or a low cloze probability final word were further divided into two subgroups with comparable averaged probability and number of different beginning phonemes. One subgroup of each cloze probability was used as such. In the other subgroup the beginning of the final word was replaced by a cough (see Table 1 for examples). The onset of the word was marked by both visual and auditory examination with a speech waveform editor and the cough replacement was inserted to begin at that point. The length of the cough was adjusted according to the length of the phoneme so that it covered the first phoneme and the second phoneme as long as the coarticulation of the first was still audible. This resulted in short coughs of 80 ms for plosives and long coughs of 145 to 200 ms for fricatives in different consonant–vowel combinations. To map the separation between short and long phonemes onto plosives and fricatives, respectively, some sentences with short fricatives (/h/) were excluded from the analysis. However, they were presented as fillers to balance the ratio of plosives and fricatives. The amplitude of the coughs was scaled to correspond to the mean amplitude of each sentence.
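A minimal sketch of the cough replacement is given below. The text does not specify how the mean amplitude was measured, so the sketch assumes the mean absolute sample value; the function names and the toy signals are hypothetical.

```python
def mean_abs_amplitude(samples):
    """Mean absolute sample value (assumed definition of 'mean amplitude')."""
    return sum(abs(s) for s in samples) / len(samples)


def scale_cough_to_sentence(cough, sentence):
    """Scale the cough so that its mean amplitude matches the sentence's."""
    gain = mean_abs_amplitude(sentence) / mean_abs_amplitude(cough)
    return [s * gain for s in cough]


def replace_onset(sentence, cough, onset_index):
    """Overwrite the samples from the word onset with the scaled cough."""
    scaled = scale_cough_to_sentence(cough, sentence)
    end = onset_index + len(scaled)
    return sentence[:onset_index] + scaled + sentence[end:]


# Toy signals: at 44,100 Hz an 80-ms cough would span 3528 samples.
sentence = [0.5, -0.5] * 10
cough = [0.1, -0.1, 0.1, -0.1]
manipulated = replace_onset(sentence, cough, onset_index=12)
```

Because the cough overwrites the onset rather than being spliced in, the duration of the sentence is unchanged.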
Procedure
Sentences were presented in random order with an interstimulus interval (ISI) of 1800 ms in four measurement blocks, each of which included 25% of each sentence type. The order of the blocks was counterbalanced between participants. Participants were seated in a comfortable chair in an electrically shielded and sound-attenuated room. The auditory stimuli were presented via loudspeakers located at the left and right side of a computer screen, which was approximately 1.3 m in front of the participant. Participants were asked to fixate on a cross in the middle of the screen while listening to the sentences in order to answer questions concerning the contents of the sentence that they heard last before the question. This task was employed to encourage the semantic processing of the sentences and to control the attention to the stimulation. Each block was interrupted 11 times for a question, which was presented visually on the computer screen after every 5–22 sentences, and the participants' oral answers were written down by the experimenter. Six to seven of the 11 questions referred to the latter part of the sentence, including the last word, but usually not only to it (the answer often included a preceding preposition and an article or an adjective).
EEG recordings
EEG was recorded with 64 cap-mounted Ag/AgCl electrodes referenced to the nose. Eye movements were measured with electrodes placed above and below the right eye and adjacent to the right and left outer canthi. Electrode impedances were kept below 5 kΩ. EEG was recorded with a low-pass filter of 68 Hz and a sampling rate of 250 Hz. The EEG amplifier resolution was 0.072 μV/bit. Data were further band-pass filtered off-line from 0.36 to 15 Hz. The filter (FIR, 1202 points) was designed to have a strong DC suppression (−100 dB) without significantly suppressing the slow potentials (less than 20% reduction at 0.5 Hz). Epochs from −250 to 1000 ms triggered at the beginning of the last word of the sentence were averaged and the baseline was corrected according to the mean amplitude from −250 to 0 ms with respect to the trigger point. Epochs were excluded from the analysis if the standard deviation within a 200-ms-wide sliding window exceeded the rejection criteria, which were 30 μV for horizontal and vertical eye movements in the EOG, or 30 μV for the EEG. Single trials were excluded if the EEG amplitude variation exceeded 150 μV.
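The baseline correction and the sliding-window rejection criterion can be sketched for a single channel as follows. This is an illustration under stated assumptions rather than the analysis software actually used: the window is assumed to advance one sample at a time, the population standard deviation is assumed, and all names are hypothetical.

```python
from statistics import mean, pstdev

SAMPLING_RATE_HZ = 250   # from the recording parameters
BASELINE_MS = 250        # pre-stimulus interval (-250 to 0 ms)


def baseline_correct(epoch_uv):
    """Subtract the mean of the pre-stimulus samples from the whole epoch."""
    n_baseline = BASELINE_MS * SAMPLING_RATE_HZ // 1000  # 62 samples
    baseline = mean(epoch_uv[:n_baseline])
    return [v - baseline for v in epoch_uv]


def exceeds_sliding_sd(epoch_uv, window_ms=200, threshold_uv=30.0):
    """Return True if the SD within any sliding window exceeds the threshold."""
    width = window_ms * SAMPLING_RATE_HZ // 1000  # 50 samples
    for start in range(len(epoch_uv) - width + 1):
        if pstdev(epoch_uv[start:start + width]) > threshold_uv:
            return True
    return False


# Toy epochs (in microvolts): a clean step response and a large artifact.
clean = baseline_correct([5.0] * 62 + [7.0] * 250)
artifact = [0.0] * 100 + [100.0] * 100 + [0.0] * 112
```

An epoch such as `artifact` would be rejected, whereas `clean` would be retained for averaging.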
Behavioral experiment
Another twenty participants (13 female; 19–27 years, mean age 23), non-overlapping with the ERP experiment group, participated in a separate behavioral experiment. Participants heard two blocks that were selected out of the four used in the EEG experiment and counterbalanced across participants. Sentences were presented with a variable ISI, which consisted of the individual duration of the last word plus 200 ms. The participants' task was to repeat the last word as quickly as possible. The voice onsets of their oral responses were recorded as reaction times with a voice key. A 3-way ANOVA was computed on the reaction times with factors Duration, Replacement, and Expectancy. Reaction times that fell more than 3 SD below the mean of that participant were excluded as outliers and those that fell more than 3 SD above the mean were counted as missed responses.
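The trimming rule for the reaction times can be sketched as below. The sketch assumes that the mean and the sample standard deviation are computed over all of a participant's responses before trimming; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def classify_reaction_times(rts_ms, n_sd=3.0):
    """Split one participant's reaction times into kept, outlier, and missed."""
    m = mean(rts_ms)
    sd = stdev(rts_ms)  # assumption: sample SD over all responses
    lower, upper = m - n_sd * sd, m + n_sd * sd
    kept, outliers, missed = [], [], []
    for rt in rts_ms:
        if rt < lower:
            outliers.append(rt)   # too fast: excluded as an outlier
        elif rt > upper:
            missed.append(rt)     # too slow: counted as a missed response
        else:
            kept.append(rt)
    return kept, outliers, missed


# Hypothetical reaction times (ms) for two participants.
slow_case = classify_reaction_times([500] * 19 + [2000])
fast_case = classify_reaction_times([500] * 19 + [50])
```

Only the retained reaction times would enter the ANOVA described above.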
Statistical analysis
Time windows for statistical analysis were defined at electrode location Cz around the prominent deflections of the grand-averaged responses of all participants, resulting in five time windows (120–180, 180–280, 280–380, 380–520, and 520–620 ms). The first time window covered the N1 response to coughs. The time window of 180–280 ms covered the early part of the N400 response to plosives, and the remaining three covered the later part of the N400. For statistical analysis, mean amplitudes of a subset of central electrodes (see Fig. 4C) were computed in each time window. Repeated-measures ANOVAs were conducted with a mixed model using a restricted maximum likelihood method and an unstructured covariance matrix. The experiment had a 2 × 2 × 2 factorial design with two levels of onset phoneme/cough duration (Duration: short/long), two levels of cloze probability (Expectancy: high/low), and two levels of onset manipulation (Replacement: normal/cough). In the analysis, the factors were treated as fixed effects and the subjects were treated as a random effect. The two levels of Duration (short/long) translate to plosives (=short) and fricatives (=long) in the case of normal words, and to short and long coughs in the case of manipulated words. Tukey–Kramer post hoc tests were applied in cases of significant interactions. In order to study whether the N400 effect was elicited for the manipulated highly expected words, a 2-way ANOVA was performed on the highly expected words with Replacement and Duration factors. The N1 peak latencies for the words beginning with a cough were defined at the most negative peak at Cz (or in two subjects C4) between 100 and 200 ms, and differences were tested with a 2-way ANOVA with Duration and Expectancy factors.
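The dependent measure, the mean amplitude per time window, can be sketched as follows. The sketch assumes half-open windows, so that adjacent windows share no sample, and a 4-ms sampling grid starting at −250 ms; the function names and the toy waveform are hypothetical.

```python
TIME_WINDOWS_MS = [(120, 180), (180, 280), (280, 380), (380, 520), (520, 620)]


def window_mean(erp_uv, times_ms, start_ms, end_ms):
    """Mean amplitude of the samples with start_ms <= t < end_ms."""
    values = [v for v, t in zip(erp_uv, times_ms) if start_ms <= t < end_ms]
    return sum(values) / len(values)


def window_means(erp_uv, times_ms, windows=TIME_WINDOWS_MS):
    """Mean amplitude in each analysis window, keyed by (start, end)."""
    return {w: window_mean(erp_uv, times_ms, *w) for w in windows}


# Toy averaged waveform: -2 microvolts between 180 and 280 ms, 0 elsewhere.
times = list(range(-250, 1000, 4))  # 250-Hz sampling gives 4-ms steps
erp = [-2.0 if 180 <= t < 280 else 0.0 for t in times]
means = window_means(erp, times)
```

One such value per participant, condition, and time window would then serve as input to the mixed-model ANOVA.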
Acknowledgments
This study was supported by the Leibniz Science Prize awarded to A.D.F. by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) and a grant to P.S. by Helsingin Sanomat Centennial Foundation. We thank Stefan Heim, Constance Langheinrich, and Katja Kühn for their help in preparing the stimulus material, Kristiane Werrmann and Sandra Böhme for collecting the EEG data, and Andrea Gast-Sandmann and Stefan Liebig for the graphical work.
Figures and Tables
Table 1
| Condition (mean cloze p, range) | Mean no. of syllables / mean word frequency | Percent of 1- and 2-syllabic words | Example (cloze p of the example final word) |
| High normal (p=0.81, 0.50–1.00) | 1.7/1043 | 41/49% (=90%) | Der Hund jagte die Katze auf den Baum (p=0.57) [The dog chased the cat up the tree] (plosives 58, fricatives 38) |
| High cough (p=0.80, 0.50–1.00) | 1.7/1136 | 42/47% (=90%) | Das Messer war zu stumpf zum #neiden (p=0.98) [The knife was too edgeless to #ut] (plosives 59, fricatives 37) |
| Low normal (p=0.06, 0.02–0.34) | 1.9/619 | 28/55% (=83%) | In seinen Kaffee tat er Zucker und Sahne (p=0.06) [With his coffee he took sugar and cream] (plosives 51, fricatives 39) |
| Low cough (p=0.08, 0.02–0.30) | 1.8/551 | 35/55% (=90%) | Ihre neuen Schuhe hatten die falsche #arbe (p=0.17) [Her new shoes had the wrong #olor] (plosives 58, fricatives 39) |
| The table presents four experimental conditions and examples of each sentence type (# indicates the cough replacement). For each condition, the cloze probability, mean number of syllables, mean word frequencies, percentages of the 1- and 2-syllabic final words from all final words, and the number of words beginning with a plosive or a fricative used in the analysis are shown. |
Table 2
| | 120–180 ms | 180–280 ms | 280–380 ms | 380–520 ms |
| Main effects | ||||
| Expectancy | – | 7.15* | 14.58*** | 9.20** |
| Duration | 12.39** | 11.83** | – | – |
| Replacement | – | 30.39*** | – | – |
| Interactions | ||||
| Exp×Repl | 10.41** | 9.02** | 9.61** | – |
| Exp×Dur | – | – | – | 7.37* |
| Dur×Repl | 11.38** | 9.35** | – | 15.12*** |
| Exp×Dur×Repl | – | 4.31* | – | – |
| F(1,25) values in each time window of the significant main effects and interactions for factors Expectancy, Duration, and Replacement. Significances are indicated as *** p<0.001, ** p<0.01, * p<0.05, and – non-significant. |
References
17. W. Marslen-Wilson, Functional parallelism in spoken word-recognition, Cognition 25 (1987) 71–102.
18. J. McClelland, J. Elman, The TRACE model of speech perception, Cogn. Psychol. 18 (1986) 1–86.
19. B. Moore, An Introduction to the Psychology of Hearing, Academic Press, San Diego, 2003.
21. D. Norris, Shortlist: a connectionist model of continuous speech recognition, Cognition 52 (1994) 189–234.
25. A.G. Samuel, Lexical activation produces potent phonemic percepts, Cogn. Psychol. 32 (1997) 97–127.
29. R. Warren, Perceptual restoration of missing speech sounds, Science 167 (1970) 392–393.
30. R. Warren, Perceptual restoration of obliterated sounds, Psychol. Bull. 96 (1984) 371–383.
31. R. Warren, C. Obusek, Speech perception and phonemic restorations, Percept. Psychophys. 9 (1971) 358–362.
32. R. Warren, G. Sherman, Phonemic restorations based on subsequent context, Percept. Psychophys. 16 (1974) 150–156.