Edited by: Hubert Truckenbrodt, Centre for General Linguistics (ZAS), Germany
Reviewed by: Laura Gonnerman, McGill University, Canada; Hubert Truckenbrodt, Centre for General Linguistics (ZAS), Germany; Katrin Schweitzer, University of Stuttgart, Germany
*Correspondence: Frank Kügler, Department Linguistik, Universität Potsdam, Karl-Liebknecht-Straße 24–25, 14476 Potsdam, Germany
This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
This study investigates the phonetics of German nuclear rise-fall contours in relation to contexts that trigger either a contrastive or a non-contrastive interpretation in the answer. A rise-fall contour can be conceived of as a tonal sequence of L-H-L. A production study elicited target sentences in contrastive and non-contrastive contexts. The majority of the realizations showed a nuclear rise-fall contour. The acoustic analysis of these contours revealed a significant effect of contrastiveness on the height/alignment of the accent peak as a function of focus context. On the other hand, the height/alignment of the low turning point at the beginning of the rise did not show an effect of contrastiveness. In a series of semantic congruency perception tests, participants judged the congruency of congruent and incongruent context-stimulus pairs based on three different sets of stimuli: (i) original data, (ii) manipulation of accent peak, and (iii) manipulation of the leading low. Listeners distinguished nuclear rise-fall contours as a function of focus context (Experiments 1 and 2), but not on the basis of manipulations of the leading low (Experiment 3). The results suggest that the alignment and scaling of the accentual peak are sufficient to license a contrastive interpretation of a nuclear rise-fall contour, leaving the rising part as a phonetic onglide, or as a low tone that does not interact with the contrastivity of the context.
This paper reports the results of a production experiment and a series of perception experiments that concern the prosodic expression of contrast in German. In particular, we investigate the phonetic details of the rise-fall contour in contexts that license either a non-contrastive or contrastive interpretation of the answer. The perception experiments seek to clarify the functional interpretation of the rise-fall contour in these contexts. In the following section a brief background on the focus-to-accent theory and the theory of intonational meaning is provided, which is mostly based on a discussion of English intonation. This discussion is followed by a brief review of German intonation and its relation to the prosodic expression of focus and contrast.
Focus-to-accent theory proposes that the semantic interpretation of a focus in a sentence is distinguished from its phonological interpretation by means of the presence of a pitch accent (Gussenhoven,
(1) a. A: Erzähl mir bitte, was passiert ist. ‘Please tell me, what happened?’
       B: [ Martin hat den Wal gesehen. ]
    b. A: Hat Martin den Frosch gesehen? ‘Has Martin seen the frog?’
       B: Nein. Martin hat den [ Wal ]
The concept of contrast has a long tradition in linguistic research and is generally connected to information-structural categories such as topic or focus. Whether “contrast” forms an independent category of its own in information structure (Molnár,
It is assumed that intonation may, depending on the language and the melody parts, carry post-lexical, sentence-level meaning (Ladd,
Taken together, the two theories, focus-to-accent theory and the theory of intonational meaning, predict that a pitch accent carrying a particular meaning preferably occurs with a context that triggers this particular meaning. In other words, an L+H* pitch accent carrying the contrastive meaning is more likely to occur with a context question (1-b) that requires a contrastive interpretation of a constituent in the answer. On the other hand, speakers are more likely to produce an H* pitch accent, which carries the meaning of providing new information, with a context question that requires new information (1-a). Speakers may, however, vary their prosodic realizations since a context question may allow for different possible answers. Assume, for instance, that a speaker imagines that an actual answer in (1-a) is to be contrasted with another conceivable answer. On the basis of such imagined additional assumptions about the context, the speaker may nevertheless choose a contrastive contour.
The previous discussion was based on English intonation and its analysis of intonational meaning. German intonation differs from English in some respects, yet there are similar assumptions related to the meaning of H* and L+H* pitch accents (Grice et al.,
As discussed above for English intonation (Pierrehumbert and Hirschberg,
(2) | a. | |
b. |
According to Féry (
The prosodic realization of contrast in German has been intensively studied. Generally, a focus is prosodically marked by means of a pitch accent in German (Uhmann,
This difference in prosodic marking of contrast has led to a number of studies investigating the concept of contrast in psycholinguistic research (e.g., Alter et al.,
Previous results indicate some degree of free variation with respect to accent realization, and our results of the production data show that not all speakers use raised accentual peaks in order to signal contrast. While some researchers assume a phonological difference between L+H* and H* pitch accents and an accompanying difference in the meaning that these accents express (e.g., Grice et al.,
In order to study the prosodic expression of contrast in German, the rise-fall contour is particularly suitable since the rise can be attributed to the L+H* pitch accent which is assumed to carry the meaning of contrast (cf. Grice et al.,
A considerable body of intonation research is concerned with the appropriate method for testing intonational categories perceptually (Gussenhoven,
In recent years, however, researchers emphasize the role of functional perception tests (Prieto,
The speech production experiment examines the prosodic realizations of broad and contrastive focused sentences by comparing the phonetics of the nuclear rise-fall contour in German. The experimental sentences contain the word order subject-auxiliary-object-verb (SAuxOV). The target words were embedded as objects in non-final sentence position in order to avoid any intonational phrase boundary effects. The following two factors were manipulated in order to elicit a nuclear rise-fall contour:
The number of syllables of the target word varied between one (Wal [vaːl] “whale”), two (Roman [ro.ˈmaːn] “novel”), and three (Admiral [ad.mi.ˈraːl] “admiral”), all with ultima word stress. Ultima word stress was chosen to provide segmental space for a low leading tone within the accented word if such a category exists. Word-level effects on tonal alignment have been shown for English (Ladd and Schepman,
The length of the sentence: From the basic SAuxOV structure, sentences were gradually lengthened by adding one of two adverbials (gestern “yesterday” or glücklicherweise “luckily”) or a combination of both prior to the target word, in order to increase the interaccentual distance between a prenuclear, sentence-initial accent and the nuclear accent on the target word. We expected that a larger interaccentual distance would increase the chance that speakers realize two single peak accents (Kügler et al.,
As an experimental factor,
(3) a. Erzähl mir bitte, was passiert ist.
    b. Maja hat den Hahn gefüttert.

(4) a. Hat Maja den Hund gefüttert?
    b. Nein, Maja hat den Hahn gefüttert.
The experimental sentences are highly sonorant to allow for a maximally accurate F0 analysis. Sentences were interspersed with fillers (proportion of target-filler sentences was 1:3) and fed into the DMDX presentation software (Forster and Forster,
Eight speakers participated in the experiment. All were female undergraduate students at the University of Potsdam in their twenties. All were native speakers of standard German spoken in the Berlin-Brandenburg region and reported no speech or hearing impairment. They either received course credit or were paid for participation. All subjects of this production study and of subsequent perception experiments gave written informed consent in accordance with the Declaration of Helsinki.
For each sentence, a context eliciting broad focus (3-a) and a context eliciting contrastive focus (4-a), spoken by a male voice, had been recorded beforehand. The contexts were presented together with a target sentence both visually on screen and auditorily over headphones. The pre-recorded context sentences ensured that no uncontrolled variation of an experimenter speaking the context questions would affect the data elicitation. Speakers were asked to read and listen to the context and then to speak the answer displayed on the screen aloud as a response to the question. Subjects were familiarized with the task through written and verbal instructions. In case of hesitations or false starts, participants were asked to repeat the sentence. Recordings took place in a sound-proof chamber equipped with an Audio-Technica AT4033a studio microphone, using a C-Media Wave sound card at a sampling rate of 44.1 kHz with 16-bit resolution. Presentation flow was controlled by the experimenter, and participants were allowed to take a break at any point. A total of 384 target sentences (8 speakers × 2 focus conditions × 6 target words × 4 sentence lengths) were recorded.
As there is a range of possible nuclear intonation contours in German (Féry,
Subgroup (a) contains 255 non-downstepped nuclear rise-fall contours, which comprise contours that contain either a prenuclear rising or falling accent (cf. Uhmann,
Subgroup (b) contains 25 downstepped nuclear rise-fall contours, which comprise either rising or falling prenuclear accents. The H tone of the nuclear accent in Figure
The third subgroup (c) consists of 36 hat patterns (Kohler,
The fourth subgroup (d) contains 68 other types of nuclear accents, such as early peaks. This category comprises cases of a prenuclear accent followed by a nuclear accent, where the nuclear accent displays a different alignment shape than the ones described before. In Figure
Both authors conducted the grouping independently and agreed in about 92% of the cases. For the remaining cases, we discussed each individual contour by listening and looking at the F0 contour to eventually decide on the contour.
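The agreement figure reported here is a simple proportion of matching labels. The following is a minimal sketch of that calculation, assuming each annotator's decisions are stored as a list of subgroup labels; the function name and the ten labels are invented for illustration and are not the study's data.

```python
def percent_agreement(labels_a, labels_b):
    """Share of items (in percent) on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical labels for ten contours, using the subgroup names (a)-(d).
annotator_1 = ["a", "a", "b", "c", "a", "d", "a", "c", "a", "d"]
annotator_2 = ["a", "a", "b", "c", "a", "d", "a", "a", "a", "d"]

# The annotators disagree on one of the ten items.
agreement = percent_agreement(annotator_1, annotator_2)
```

Raw percent agreement does not correct for chance agreement; a chance-corrected measure would be a separate choice that the text does not report.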
As can be seen in Table
(a) Non-downstepped | 115 | 140 | 255 |
(b) Downstepped | 19 | 6 | 25 |
(c) Hat pattern | 29 | 7 | 36 |
(d) Other (early peak) | 29 | 39 | 68 |
Sum | 192 | 192 | 384 |
The 255 experimental sentences of group (a) were hand-annotated and subjected to phonetic analysis using Praat software (Boersma and Weenink,
The pitch peak (H) of the target words in Hertz (Hz), see point (1) in Figure
The corresponding time of the peak (
A low turning point in pitch prior to the peak (l) in Hz, which corresponds to the “elbow” measure in D'Imperio (
The corresponding time of the low turning point (
The beginning and the end of the accented syllable (
Pitch analysis was conducted using a Hanning window of 0.4 s length with a default 10 ms analysis frame. The pitch contour was smoothed using the Praat smoothing algorithm (frequency band 10 Hz) to diminish microprosodic perturbations. Out of these phonetic measurements, the following variables were calculated:
The excursion (E) between the low turning point and the peak:
The velocity (V) of the pitch rise:
The relative alignment of the pitch peak (
The end of the accented syllable was chosen based on the results of Grabe (
The relative alignment of the low turning point (
The duration (D) of the accented syllable:
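The formulas for these variables did not survive in this version of the text, so the following Python sketch only illustrates one plausible way to derive them from the hand-labelled measurements. All names are hypothetical. The alignment definitions in particular are assumptions: the peak is expressed relative to the end of the accented syllable and the low turning point relative to its onset, both as a percentage of syllable duration.

```python
from dataclasses import dataclass


@dataclass
class AccentMeasures:
    """Hand-labelled values for one nuclear accent (field names are illustrative)."""
    h_hz: float        # F0 at the accent peak (H)
    l_hz: float        # F0 at the low turning point (L)
    t_h: float         # time of the peak, in seconds
    t_l: float         # time of the low turning point, in seconds
    syl_start: float   # onset of the accented syllable, in seconds
    syl_end: float     # offset of the accented syllable, in seconds


def derived_variables(m: AccentMeasures) -> dict:
    """Compute excursion, velocity, relative alignment, and duration."""
    duration = m.syl_end - m.syl_start
    rise_time = m.t_h - m.t_l
    if duration <= 0 or rise_time <= 0:
        raise ValueError("syllable duration and rise time must be positive")
    excursion = m.h_hz - m.l_hz                    # E, in Hz
    velocity = excursion / rise_time               # V, in Hz/s
    # Assumed definitions: distance to a syllable edge as % of syllable duration.
    align_h = 100.0 * (m.syl_end - m.t_h) / duration
    align_l = 100.0 * (m.t_l - m.syl_start) / duration
    return {
        "E_hz": excursion,
        "V_hz_per_s": velocity,
        "A_H_pct": align_h,
        "A_L_pct": align_l,
        "D_ms": 1000.0 * duration,
    }


# Invented example values, not taken from the recordings.
example = derived_variables(
    AccentMeasures(h_hz=240.0, l_hz=190.0, t_h=1.20, t_l=1.05,
                   syl_start=1.00, syl_end=1.25)
)
```

With these invented values the rise spans 50 Hz in 150 ms, and both turning points lie about 20% of the syllable duration away from their reference edge.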
The results of the phonetic calculation were evaluated against the fixed factor
The statistical results are shown in Table
L (Hz) | Intercept | 191.187 | 6.623 | 28.439 | |
CF | −2.579 | 2.271 | −1.136 | n.s.a | |
H (Hz) | Intercept | 233.67 | 8.415 | 27.769 | |
CF | 4.728 | 2.471 | 1.913 | ( |
|
E(Hz) | Intercept | 42.416 | 6.287 | 6.746 | |
CF | 7.412 | 3.524 | 2.103 | ||
V (Hz/s) | Intercept | 251.19 | 31.52 | 7.969 | |
CF | 46.85 | 11.03 | 4.247 | ||
A–L (%) | Intercept | 14.275 | 4.568 | 3.125 | |
CF | 4.940 | 4.401 | 1.122 | n.s.a | |
A–H (%) | Intercept | 19.98 | 1.671 | 11.953 | |
CF | −8.31 | 2.223 | −3.738 | ||
D (ms) | Intercept | 243.297 | 8.361 | 29.10 | |
CF | 15.581 | 3.898 | 3.997 |
The analysis of the phonetic variables yields no clear indication that the low F0 turning point prior to the accentual peak represents a systematic difference between the two focus contexts. Neither the model for L-tone scaling nor the model for L-tone alignment showed a systematic difference between a broad and a contrastive context. On the other hand, the model for scaling and the model for alignment of the accentual peak showed differences as a function of focus context. The scaling of the accentual H tone is higher in contrastive focus contexts, which is well in line with previous findings (Bannert,
That H-tone scaling only approaches significance seems to be due to the fact that not all speakers employ this strategy to realize contrastive focus. Model comparison for H-tone scaling applying likelihood ratio tests revealed that when removing the slope factor for the random effect of speaker, the effect of
Furthermore, the phonetic effects triggered by focus should be seen in relation to prenuclear accents. The utterances realized under broad focus exhibit an F0 lowering from prenuclear to nuclear accents, while the reverse holds for the utterances realized under contrastive focus (see Table
Focus condition | Prenuclear accent (Hz) | Nuclear accent (Hz) |
Broad focus | 263.1 | 236.1 |
Contrastive focus | 234.0 | 240.8 |
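The direction of the change from prenuclear to nuclear accent can be made explicit with a short calculation. The sketch below takes the first value in each row as the prenuclear and the second as the nuclear value, as the surrounding text implies. The conversion to semitones is an illustrative addition and not a measure reported in the study.

```python
import math


def semitones(f_from_hz, f_to_hz):
    """Interval from one frequency to another in semitones (negative = lowering)."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


# (prenuclear, nuclear) values in Hz, copied from the table rows.
f0_values = {
    "broad": (263.1, 236.1),
    "contrastive": (234.0, 240.8),
}

changes = {
    condition: {
        "hz": round(nuclear - prenuclear, 1),
        "st": round(semitones(prenuclear, nuclear), 2),
    }
    for condition, (prenuclear, nuclear) in f0_values.items()
}
```

The broad focus row yields a negative change (lowering) and the contrastive focus row a positive one (raising), which is the asymmetry the paragraph describes.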
Taken together, the results of the production study indicate that speakers realize a phonetic difference in intonation as a function of the focus condition. In the following series of studies we will test which parts of the rise-fall contour interact perceptually with the contrastivity of the context.
A series of semantic congruency tasks investigates whether German listeners use the phonetic differences shown in the production study to distinguish the rise-fall contour between contexts that elicit broad or contrastive focus. Semantic congruency tests have been successfully used to explore the perception of functional intonation contrasts (Rathcke and Harrington,
The first experiment investigates whether the acoustic differences found in the production data are perceived as an indicator for the appropriate context they were realized in. Following different perception studies that rely on the speech of one speaker (Kohler,
The target sentences correspond to the six SAuxOV sentences from the production study (cf. Supplementary Material). Each one was uttered in a broad focus (BF) and a contrastive focus (CF) context, resulting in 12 sentences. The semantic congruency experiment consisted of these 12 target sentences where intonation was congruent with the pragmatic context (6 BF–BF dialogs, 6 CF–CF dialogs), and 12 cross-spliced target sentences where intonation was incongruent with the pragmatic context (6 CF–BF dialogs, 6 BF–CF dialogs). Stimuli were scaled to an intensity of 70 dB. Each dialog was presented 3 times, which resulted in a total of 72 dialogs per experiment. The stimuli were auditorily presented over headphones with the MFC Praat software (Boersma and Weenink,
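The trial count follows from crossing item, context focus, and target focus, and then repeating each dialog. The following is a minimal sketch of that design, with hypothetical field names; it only enumerates the conditions and does not model randomization or presentation.

```python
from itertools import product

ITEMS = range(1, 7)        # six SAuxOV target sentences
FOCUS = ("BF", "CF")       # broad focus, contrastive focus
REPETITIONS = 3


def build_dialogs():
    """Pair every item with each context focus and each target-sentence focus."""
    dialogs = []
    for item, context, target in product(ITEMS, FOCUS, FOCUS):
        dialogs.append({
            "item": item,
            "context": context,     # focus type elicited by the context question
            "target": target,       # focus type the answer was originally produced in
            "congruent": context == target,
        })
    return dialogs


dialogs = build_dialogs()
trials = [dialog for dialog in dialogs for _ in range(REPETITIONS)]

n_congruent = sum(d["congruent"] for d in dialogs)
n_incongruent = len(dialogs) - n_congruent
```

Keeping context and target as separate fields avoids relying on the order of the two labels in abbreviations such as BF–CF.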
Thirty-six participants took part in the experiment (10 male, 26 female). They were all undergraduates in their twenties, reported no speech or hearing deficits, and were naïve with respect to the purpose of the study. They were either paid for participation or received course credits.
For the factor C
Figure
For the statistical, frequency-based analysis, we fit a multilevel model (Bates et al.,
The model representing the best fit used both random slopes and intercepts of speaker and item for both fixed factors. The model reveals a significant effect for
(Intercept) | 0.6673 | 0.1297 | 5.143 | 0.00001 | |
context = |
0.8405 | 0.5234 | 1.606 | n.s. | 0.10833 |
congruency = |
0.9649 | 0.2780 | 3.470 | 0.00052 | |
Interaction | 0.2596 | 0.2651 | 0.979 | n.s. | 0.32735 |
The semantic congruency task revealed that listeners judged congruent dialogs as more congruent than incongruent dialogs. The expected effect of
Two subsequent perception experiments were carried out to determine whether the phonetic difference of the high peak or of the low turning point is functionally relevant. The high peak and the low turning point were manipulated separately from each other in two different experiments. The next section describes the effect of the phonetic manipulation of the accentual peak on listeners' interpretation in relation to contrast; the third perception experiment investigates the role of the low turning point itself.
Given that original stimuli are appropriately categorized according to focus contexts (perception Experiment 1), and in line with previous findings on the effect of contrast on accentual peaks (Ladd and Morton,
We test this prediction by manipulating the scaling of the H* accent in successive steps. The sentences for the H* manipulation were taken from the same speaker as in the first experiment. To keep the total number of stimuli manageable for a perception study, four target sentences with disyllabic and trisyllabic target words were chosen for the manipulation procedure. These sentences were realized in broad focus contexts and in contrastive focus contexts, yielding eight sentences in total. For each of the four sentences, the manipulation of the H* peak was done in relation to the corresponding prenuclear accent on the subject; Figure
Forty-eight undergraduate students from Potsdam University (13 male, 35 female) participated in the experiment. They were native speakers of German in their twenties and reported no speech or hearing impairment. The participants were naïve as to the purpose of the experiment and did not participate in perception Experiment 1. Each participant received course credit for participation. Participants were divided into two groups to listen to either the first or the second experimental set.
If a phonetic cue for contrastiveness (e.g., a higher H*) has an effect on the perception of contrast, it will influence the congruency ratings in the two contexts differently: In a contrastive context condition, an effective cue for contrastiveness will lead to more congruency judgements. In a non-contrastive context, an effective cue for contrastiveness will lead to fewer congruency judgements. For the H* accent manipulation, we thus expect that for contrastive contexts higher F0 peaks (manipulation step 5) cause a perceptual impression of contrastiveness, both in originally congruent (CF–CF) and originally incongruent dialogs (CF–BF) (cf. Baumann et al.,
Figure
As described for perception Experiment 1, Section 3.1.4, we fit a multilevel model with
(Intercept) | 0.8907 | 0.1625 | 5.482 | 0.00001 | |
context = |
−0.3338 | 0.3401 | −0.982 | n.s. | 0.32629 |
manipulation = |
0.9754 | 0.2031 | 4.804 | 0.00001 | |
Interaction | 0.9477 | 0.2747 | 3.450 | 0.00056 |
We computed a Pearson product-moment correlation coefficient to assess the relationship between the manipulation steps and the congruency ratings, separately for each dialog type. Figures
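For readers who want to retrace this kind of analysis, the sketch below computes a Pearson product-moment correlation between manipulation step and the share of "congruent" responses for one dialog type. The response shares are invented for illustration; they are not the experimental results, and the function is a plain textbook implementation rather than the authors' script.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical share of "congruent" responses per manipulation step
# for a single dialog type.
steps = [1, 2, 3, 4, 5]
share_congruent = [0.40, 0.48, 0.55, 0.67, 0.75]

r = pearson_r(steps, share_congruent)
```

A coefficient close to 1 for a contrastive context would indicate that congruency ratings rise with peak height; computing it separately per dialog type keeps opposite trends from cancelling out.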
The results reveal two major aspects. First, the manipulation of the pitch peak has a significant effect on the interpretation of the pitch accent. The higher the peak, the more often stimuli were rated as congruent in the contrastive focus context. This result was independent of stimulus origin, i.e., whether a stimulus was originally uttered in a broad or contrastive context did not affect its interpretation. It is thus the F0 height (in relation to the previous pitch accents) that caused the perception of contrastiveness in the experiment. This result is in line with previous findings and assumptions on the relationship between contrastive focus and its prosodic realization in German (Bannert,
Second, the obtained significant effect for m
Given the significant interaction of m
This experiment investigates the role of the low turning point in F0 of the nuclear rise-fall contour, more specifically the question of whether the height of the low turning point interacts with the contrastivity of the context. The sentences for the low turning point manipulation were the same as the ones used for perception Experiment 2. Each sentence was manipulated at the position of the low turning point, cf. Figure
The manipulation was carried out with a Praat script as follows: The F0 contour of the original file was stylized. The F0 points at the onset of the target word and at the accentual peak were retained and the F0 points between them were deleted. At the time of the label of the low turning point (see production study) a pitch point was inserted, and pitch was interpolated between the remaining pitch points. The end points of the F0 height continuum of the inserted pitch points were determined relative to the F0 height that was produced in the utterance. A distance of two standard deviations from the mean in both directions resulted in a manipulation range from 150 to 190 Hz for each sentence. Thus, five stimuli with a difference of 10 Hz between the low turning points were created, cf. Figure
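The manipulation steps can be sketched outside Praat as follows. This Python example is an illustration under stated assumptions, not the authors' script: pitch points are modelled as (time, Hz) pairs, interpolation between them is assumed to be linear, and the onset, peak, and low-point times are invented.

```python
def continuum(low_hz=150.0, high_hz=190.0, n_steps=5):
    """Equally spaced F0 targets for the inserted low turning point."""
    step = (high_hz - low_hz) / (n_steps - 1)
    return [low_hz + i * step for i in range(n_steps)]


def stylized_rise(onset, low_time, peak, low_hz):
    """Keep the onset and peak points and insert one low point between them."""
    return [onset, (low_time, low_hz), peak]


def interpolate(points, t):
    """Linear F0 interpolation between (time, Hz) points sorted by time."""
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise ValueError("time lies outside the stylized contour")


targets = continuum()          # 150, 160, 170, 180, 190 Hz

# Hypothetical pitch points: target-word onset, accentual peak, low-point label.
onset = (0.80, 185.0)
peak = (1.10, 240.0)
low_time = 0.95

contours = [stylized_rise(onset, low_time, peak, hz) for hz in targets]
f0_mid_rise = interpolate(contours[0], 1.025)   # halfway between low point and peak
```

Because only the inserted point changes across the five contours, the onset and peak stay constant and any perceptual difference can be attributed to the height of the low turning point.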
Each manipulated target sentence was concatenated with an originally congruent context question (BF–BF, CF–CF) and with an originally incongruent context question (CF–BF, BF–CF), resulting in a total of 80 target sentences (4 sentences × 2 focus conditions × 2 contexts × 5 manipulations). These 80 target sentences were scaled to an intensity of 70 dB, and stimuli were subdivided into two lists of 40 stimuli each (see the Supplementary Material for the stimuli and their groupings). The experimental task was identical to that of perception Experiment 2. The experiment lasted approximately 15 min.
Forty-eight undergraduate students from Potsdam University (16 male, 32 female) with no hearing deficits took part in this perception experiment. They did not take part in the first or second perception experiment. They were all in their twenties, and were either paid for participation or received course credit points. Participants were divided into two groups to listen to either the first or the second experimental set.
As in the previous experiment, we predict a significant interaction of the factors m
Figure
As described for perception Experiment 1, Section 3.1.4, we fit a multilevel model with
(Intercept) | 0.0888 | 0.1736 | 0.512 | n.s. | 0.6089 |
context = |
0.4521 | 0.2688 | 1.682 | n.s. | 0.0926 |
manipulation = |
0.1771 | 0.1479 | 1.198 | n.s. | 0.2311 |
Interaction | −0.2227 | 0.1479 | −1.506 | n.s. | 0.1321 |
As for Experiment 2, we computed a Pearson product-moment correlation coefficient to assess the relationship between the manipulation steps and the congruency ratings, separately for each dialog type. Figure
The results of the manipulation of the low F0 turning point reveal two aspects. First, independently of the prosodic manipulation, congruent context-target dialogs were rated better than incongruent dialogs. Second, the non-significant interaction of m
This study was concerned with the phonetics of the nuclear rise-fall contour in German. In particular, we investigated how the phonetic realization of the rise-fall contour interacts with contexts that require a contrastive or broad focus interpretation in the answer. To this end, a production experiment and a series of perception experiments were carried out. The analysis of the production data revealed that contrastive focus changes the phonetics of the rise-fall contour. Speakers realized significantly higher and later F0 peaks in contrastive contexts. The realization of the low turning point prior to the accentual peak showed no significant differences. The fact that contrastive focus raises nuclear H* accents in German confirms earlier results (Baumann et al.,
A series of semantic congruency experiments investigated the perceptual role of the phonetic differences found in the production experiment. The first perception experiment investigated whether listeners were able to perceive the phonetic differences found in production as a function of focus using congruent (BF–BF and CF–CF) and incongruent dialogs (BF–CF and CF–BF). Interestingly, the results of the perception study show that listeners are able to distinguish between congruent and incongruent dialogs (see Figure
In order to investigate which parts of the rise-fall contour functionally interact with a contrastive interpretation, two separate perception experiments were conducted that examined whether the higher scaling of H* accents causes the perceptual impression of contrastive focus, or whether the lower scaling of the low turning point is a sufficient phonetic cue. To this end, sentences with manipulated height values of the H* peak and of the low turning point, respectively, were generated. The perception of the H* accent manipulation revealed that a higher scaling of the H* accent increased the perceptual impression of a contrastive accent. Specifically, contrastive contexts required higher F0 values. Broad focus contexts allowed both lower and higher H* values, see (Féry and Kügler,
The experiments presented in this paper are partly related to the debate on how to analyse pitch accents, the so-called “on-ramp” vs. “off-ramp” approach (Gussenhoven,
Related to the present study, a rise-fall contour is phonologically analyzed as L+H* L− in the “on-ramp” approach (Grice et al.,
From the off-ramp perspective, a rise-fall contour is analyzed as a phonological fall H*+L following a phonetic rise (Féry,
Similarly, structural conditions were found as evidence for an off-ramp analysis of Dutch prenuclear falling accents (Chen,
As an alternative to the on-ramp and off-ramp interpretations of tonal contours, there are languages exhibiting tones that do not carry meaning, e.g., the accentual phrase tones in Tokyo Japanese (Gussenhoven,
Along these lines, our perceptual results of the manipulated stimuli may suggest that the onglide toward a high accentual F0 peak is either a phonetic transition (in the sense of the off-ramp approach) or a leading low tone that does not carry meaning. If the rise had been a reflex of a phonological tone (L+) that carries a contrastive meaning as in English (Pierrehumbert and Hirschberg,
This study investigated the phonetics of the rise-fall contour in German. In particular, it was tested whether phonetic differences in the rise-fall contour were realized in relation to contrastive and non-contrastive contexts, and which parts of the rise-fall contour seem to play a functional role in perception. The acoustic analysis of nuclear rise-fall contours elicited in broad and contrastive focus contexts revealed a significant difference for the realization of the accentual high tone, yet not for the low F0 turning point prior to the accentual high. In a series of semantic congruency perception tests, listeners judged the congruency of congruent and incongruent context-stimulus pairs on the basis of three different sets of stimuli: (i) original data from the production study in congruent contexts and cross-spliced yielding incongruent dialogs, (ii) stimuli with manipulated accentual high tone that were combined with originally congruent contexts and, again, cross-spliced with originally incongruent contexts, and (iii) stimuli with manipulated low F0 turning point of the rising part of rising-falling accent shapes, again combined with congruent and incongruent contexts. The first perception experiment revealed that listeners distinguish between nuclear rising-falling contours with respect to their focus context. The second perception experiment revealed that, independent of stimulus origin, higher F0 peaks were rated significantly more frequently as congruent with contrastive focus contexts than lower peaks; hence, the scaling of the nuclear peak determined its contextual interpretation in our experiments as assumed in the literature on German intonation (Bannert,
The results of the perception experiments suggest that the scaling of the accentual peak is sufficient to license a contextual interpretation of a nuclear rising-falling accent shape (perception Experiment 2). The manipulation of a low F0 turning point prior to the accentual peak as a potential reflex of a low leading tone (L+) does not drive the perception as a function of focus context (perception Experiment 3). The results seem to support the view that focus affects the pitch register (Féry and Kügler,
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research was supported by DFG grants (KU 2323/1-2) to the project “Prosody in Parsing,” principal investigators Frank Kügler, Caroline Féry and Shravan Vasishth, and projects D5 and T2 in the SFB 632 “Information structure” at Potsdam University. We are grateful to Tobias Guenther, who provided considerable technical assistance, and to Jana Häussler for statistical advice. Parts of this work were presented at the 17th ICPhS, Hong Kong 2011, published as Kügler and Gollrad (
The Supplementary Material for this article can be found online at: