Multimodal emotion perception after anterior temporal lobectomy (ATL)

Milesi, Valérie; Cekic, Sezen; Péron, Julie; Frühholz, Sascha; Cristinzio, Chiara; Seeck, Margitta; Grandjean, Didier

doi:10.3389/fnhum.2014.00275

ORIGINAL RESEARCH article

Front. Hum. Neurosci., 05 May 2014
Sec. Cognitive Neuroscience
Volume 8 - 2014 | https://doi.org/10.3389/fnhum.2014.00275

Multimodal emotion perception after anterior temporal lobectomy (ATL)

Valérie Milesi^1,2*

Sezen Cekic^1,2

Julie Péron^1,2

Sascha Frühholz^1,2

Chiara Cristinzio^1,2,3

Margitta Seeck⁴

Didier Grandjean^1,2

¹Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
²Neuroscience of Emotion and Affective Dynamics Laboratory, Department of Psychology, Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland
³Laboratory for Neurology and Imaging of Cognition, Department of Neurology and Department of Neuroscience, Medical School, University of Geneva, Geneva, Switzerland
⁴Epilepsy Unit, Department of Neurology, Geneva University Hospital, Geneva, Switzerland

In the context of emotion information processing, several studies have demonstrated the involvement of the amygdala in emotion perception, for unimodal and multimodal stimuli. However, it seems that not only the amygdala, but several regions around it, may also play a major role in multimodal emotional integration. In order to investigate the contribution of these regions to multimodal emotion perception, five patients who had undergone unilateral anterior temporal lobe resection were exposed to both unimodal (vocal or visual) and audiovisual emotional and neutral stimuli. In a classic paradigm, participants were asked to rate the emotional intensity of angry, fearful, joyful, and neutral stimuli on visual analog scales. Compared with matched controls, patients exhibited impaired categorization of joyful expressions, whether the stimuli were auditory, visual, or audiovisual. Patients confused joyful faces with neutral faces, and joyful prosody with surprise. In the case of fear, unlike matched controls, patients provided lower intensity ratings for visual stimuli than for vocal and audiovisual ones. Fearful faces were frequently confused with surprised ones. When we controlled for lesion size, we no longer observed any overall difference between patients and controls in their ratings of emotional intensity on the target scales. Lesion size had the greatest effect on intensity perceptions and accuracy in the visual modality, irrespective of the type of emotion. These new findings suggest that a damaged amygdala, or a disrupted bundle between the amygdala and the ventral part of the occipital lobe, has a greater impact on emotion perception in the visual modality than it does in either the vocal or audiovisual one. We can surmise that patients are able to use the auditory information contained in multimodal stimuli to compensate for difficulty processing visually conveyed emotion.

Introduction

The ability to decode emotional information is crucial in everyday life, allowing us to adapt our behaviors when confronted with salient information, both for survival and for social adaptiveness purposes. The emotional features of objects in the environment have been shown to bring about an increase in the neuronal response, compared with the processing of non-emotional information (for a review, see Phan et al., 2002). The role that different brain regions play in decoding emotional information appears to depend on the modality. Furthermore, research has shown that both the primary and secondary sensory regions are modulated by emotion. For example, visual extrastriate regions are modulated by emotions conveyed by facial expressions (e.g., Morris et al., 1998; Pourtois et al., 2005a; Vuilleumier and Pourtois, 2007), while temporal voice-sensitive areas have been shown to be modulated by emotional prosody (e.g., Mitchell et al., 2003; Grandjean et al., 2005; Schirmer and Kotz, 2006; Wildgruber et al., 2006; Frühholz et al., 2012).

According to Haxby’s face perception model (Haxby et al., 2000), visual information is processed along a ventral pathway leading from the primary visual cortex (V1) to the fusiform face area (FFA) and inferior temporal cortex (ITC). Face perception is sufficient to activate the FFA (see, for example, Pourtois et al., 2005a; Kanwisher and Yovel, 2006; Pourtois et al., 2010), but the activity of this structure is enhanced when the facial information is emotional (see, for example, Breiter et al., 1996; Dolan et al., 2001; Vuilleumier et al., 2001; Williams et al., 2004; Vuilleumier and Pourtois, 2007). Another structure whose activity increases when decoding emotional facial information is the amygdala (see, for example, Haxby et al., 2000; Calder and Young, 2005; Phelps and LeDoux, 2005; Adolphs, 2008). In monkeys, this structure has been shown to project to almost every step along the visual ventral pathway (Amaral et al., 2003). Human studies, meanwhile, have suggested that connectivity between the amygdala and the FFA is modulated by emotion perception (Morris et al., 1998; Dolan et al., 2001; Vuilleumier et al., 2004; Sabatinelli et al., 2005; Vuilleumier, 2005; Vuilleumier and Pourtois, 2007).

Regarding the amygdala’s role in emotion perception, the current hypothesis is that this structure detects salience, a general feature of emotion (for a discussion, see Sander et al., 2003; Armony, 2013; Pourtois et al., 2013), through reciprocal connections with the cortex (Amaral et al., 2003). Its main function is to facilitate attention and perception processing (e.g., Armony and Ledoux, 1997, 1999; Whalen, 1998; Vuilleumier et al., 2001) without explicit voluntary attention (for a review, see Vuilleumier and Pourtois, 2007). According to Ledoux’s (2007) model, the amygdala’s output is directed both to regions that modulate bodily responses (via the endocrine system related to the autonomic system), and to the primary and associative cortices. These encompass regions modulated by emotion such as in the extrastriate visual system, the FFA for face perception, and the voice area in the superior temporal gyrus (STG; including the primary auditory region).

Further insight into emotional face perception and its subcortical bases has been provided by studies of patients with lesions of the amygdala. More specifically, studies have assessed patients with temporal lobe epilepsy whose lesions are linked either to the epileptogenic disease itself or else to its surgical treatment (see, for example, Cristinzio et al., 2007). These studies included patients with congenital or acquired diseases resulting in bilateral lesions, and patients with unilateral epilepsy arising from mesial temporal sclerosis who had undergone lobectomy with amygdalectomy. Patients with bilateral damage have been found to display impaired fearful face perception (Adolphs et al., 1994, 1995; Young et al., 1995; Calder et al., 1996; Broks et al., 1998) and deficits in the perception of surprise and anger (Adolphs et al., 1994). Unilateral lesions have yielded either no differences (Adolphs et al., 1995; prior surgery, Batut et al., 2006) or else a deficit for patients with right-sided lesions covering either a range of emotions (Anderson et al., 2000; Adolphs and Tranel, 2004) or solely fearful faces (prior surgery, Meletti et al., 2003). Palermo et al. (2010) found that both left- and right-lesion groups exhibited a deficit in fear intensity perception, but the left-lesion group was more impaired for fear detection. Anterior temporal lobectomy with amygdalectomy is generally expected to affect the perceived intensity of facial emotional expressions. The functional explanation for this is a lack of modulation by the amygdala of the ventral visual processing network and, more specifically in the case of emotional faces, of the FFA.

In addition to visual emotional information, the amygdala has been shown to be associated with different responses to emotional vocalizations. According to Schirmer et al. (2012), the processing of auditory information takes place along three streams in the temporal lobe: a posterior stream passing through the posterior part of the superior temporal sulcus (pSTS) for sound embodiment; a ventral stream directed toward the middle temporal gyrus (MTG) for concept processing; and an anterior stream extending as far as the temporal pole (TmP) for the perceptual domain (i.e., semantic processing). Another specificity of emotional vocalization perception is the hemispheric specificity modeled by Schirmer and Kotz (2006). In their model, the left temporal lobe has a higher temporal resolution for processing information than the right hemisphere, and is more involved in linguistic signal processing (segmental information), with suprasegmental analysis taking place in the right hemisphere. The amygdala has been shown to be modulated by emotional vocalizations, including onomatopoeia (e.g., Morris et al., 1999; Fecteau et al., 2007; Plichta et al., 2011), and emotional prosody consisting either of pseudowords (e.g., Grandjean et al., 2005; Sander et al., 2005; Frühholz and Grandjean, 2012, 2013), or of words and sentences (e.g., Ethofer et al., 2006, 2009; Wiethoff et al., 2009).

In contrast to research on emotional face perception, studies of auditory emotion processing in patients with bilateral amygdala lesions have produced divergent results. Some have failed to find any effect at all on emotion recognition (semantically neutral sentences: Adolphs and Tranel, 1999; names and onomatopoeia: Anderson and Phelps, 1998). Others have reported either a general impairment (counting sequences: Brierley et al., 2004) or specific impairments for fear (semantically neutral sentences: Scott et al., 1997; non-verbal vocalizations: Dellacherie et al., 2011), surprise (Dellacherie et al., 2011), anger (Scott et al., 1997), or sadness perception (musical excerpt: Gosselin et al., 2007). There is a similar divergence for unilateral lesions, with either no effects (Adolphs and Tranel, 1999; Adolphs et al., 2001) or a specific impairment for fear (counting sequences: Brierley et al., 2004; meaningless words: Sprengelmeyer et al., 2010; non-verbal vocalizations: Dellacherie et al., 2011). To sum up current knowledge about auditory emotion processing, there is a strong hypothesis about right hemispheric involvement for emotional prosody. The amygdala appears to be involved in prosody perception, but may also be sensitive to the proximal context of the stimulus presentation (for a discussion, see Frühholz and Grandjean, 2013).

In the case of face-voice emotion integration, studies featuring audiovisual emotional stimuli have replicated the response facilitation effect at the behavioral level, namely an increase in perceptual sensitivity and reduced reaction times (e.g., Massaro and Egan, 1996; De Gelder and Vroomen, 2000; Dolan et al., 2001; Kreifelts et al., 2007), that has already been demonstrated in non-emotional studies (e.g., Miller, 1982; Schröger and Widmann, 1998). Responsibility for the behavioral improvement has been mainly attributed to various cortical substrates, including the left MTG (e.g., Pourtois et al., 2005b), the posterior STG (pSTG; e.g., Ethofer et al., 2006; Kreifelts et al., 2007), and, interestingly, the amydala, either bilaterally (e.g., Klasen et al., 2011) or the left side (e.g., Dolan et al., 2001; Ethofer et al., 2006; Müller et al., 2012). Animal studies have yielded a more detailed multimodal model, with different levels of integration. For instance, a rhinal cortex lesion, as opposed to a direct lesion of the amygdala, is sufficient to disrupt associative mechanisms (Goulet and Murray, 2001). Meanwhile, a comparison of the roles of the perirhinal cortex (PRC) and the pSTS led Taylor et al. (2006) to suggest that the pSTS plays a presemantic integration role, while the PRC integrates higher level conceptual representations.

In summary, studies of the amygdala’s modal specificity have reported impairments in patients with temporal lobectomy or specific amygdalectomy for faces and either voices (Scott et al., 1997; Sprengelmeyer et al., 1999; Brierley et al., 2004) or emotion in music (Gosselin et al., 2007, 2011). However, some patients seem to have a specific deficit for visual emotional stimuli (Adolphs et al., 1994, 2001; Anderson and Phelps, 1998; Adolphs and Tranel, 1999). Discrepancies between studies have been explained by a number of different factors, including the date of epilepsy onset (e.g., McClelland et al., 2006), the nature and context of the stimuli (e.g., face presentation duration; Graham et al., 2007; Palermo et al., 2010). The fear specificity of amygdala processing has also been strongly called into question (for a discussion, see Cahill et al., 1999; Murray, 2007; Morrison and Salzman, 2010). To the best of our knowledge, however, the role of lesion size has not been taken in account thus far.

Our aim in the present study was to test whether the categorization and intensity perception of unimodal (i.e., either visual or non-verbal auditory emotional stimuli), as opposed to bimodal (i.e., audiovisual) emotional stimuli is modified in patients who have undergone unilateral temporal anterior lobectomy with amygdalectomy. The impact of anterior temporal lobe ablation is assumed to differ with modality. Regarding the auditory network, above and beyond the absence of voice area modulations owing to amygdala resection, Schirmer et al. (2012) suggests that the anterior temporal lobe is more involved in semantic processing, representing the final temporal step before the processing shifts to the frontal regions associated with emotion evaluation. We would therefore expect disruption of this input to have an impact on categorization, with patients making more mistakes or confusing more items than matched controls. For the visual modality, we would expect to find the same kind of deficit, stemming from the lack of emotion-related modulation of visual cortical input. Finally, for audiovisual material, we would expect to observe either a better preserved ability for correct detection and perceived intensity, if an intact pSTS and a more dorsal pathway toward the frontal lobe are sufficient to integrate audiovisual information, or no improvement because of the PRC lesion.

Participants rated the intensity of brief onomatopoeic vocalizations produced by actors (Bänziger et al., 2012) and animated synthetic faces (Roesch et al., 2011) on visual analog scales. At the group level, we expected the patients to have a higher error rate than controls when it came to identifying unimodal emotional stimuli. This has been shown to be the case in the visual modality for fearful faces (bilateral lesion: Adolphs et al., 1994, 1995; Young et al., 1995; Calder et al., 1996; Broks et al., 1998; unilateral lesion: Anderson et al., 2000; McClelland et al., 2006), and in the auditory modality for both fearful voices (bilateral lesion: Scott et al., 1997; Adolphs and Tranel, 2004; unilateral lesion: Scott et al., 1997; Brierley et al., 2004; Sprengelmeyer et al., 2010; Dellacherie et al., 2011) and angry voices (bilateral lesion: Scott et al., 1997). For the audiovisual stimuli, we expected to observe a higher error rate for fear identification, arising from the combined effects of the unimodal deficits in each modality. Regarding intensity perception, we expected to observe similar patterns, even after controlling for the extent of the lesion along the ventral pathway. Finally, we investigated the effects of lesion size on emotion recognition. We predicted that perception of emotion intensity would be modulated by the size of the lesion, with more extensive lesions resulting in impairment at different levels of information processing. We developed an additional hypothesis to explain the discrepant findings of previous studies.

Materials and Methods

Participants

We recruited five patients who had undergone unilateral anteromedial temporal lobectomy together with the unilateral removal of the amygdala. One patient (JP) had a lesion that extended to the occipital and posterior parietal lobes. The surgery had been performed to control the patients’ medically intractable seizures (see Figure 1 for the location and extent of their lesions): four on the left side (FB, 23 years old; CG, 37 years old; JP, 45 years old; and RS, 62 years old) and one on the right (CM, 31 years old). CG was the only woman in the patient group, and FB the only left-handed patient. Controls were recruited via local advertisements: 12 were matched with FB, CM, and CG for sex, handedness, and age; six with JP; and three with RS (see Table 1 for a summary and Table 2 for a detailed description of each patient). Patients did not exhibit any gnosis deficit in their respective neuropsychological tests. The study was approved by the local ethics committee, and all the participants gave their written informed consent. The controls received financial compensation (CHF 15) for taking part in the experiment.

FIGURE 1

FIGURE 1. (A) Anatomical images of the lesions for each patient: each lesion was delineated manually on the axial plane and corrected using the coronal plane. (B) Probability map for the normalized lesion size.

TABLE 1

TABLE 1. Participants.

TABLE 2

TABLE 2. Patient description.

Lesion Delimitation and Description

In order to compute the lesion size of each patient, anatomical images were segmented and normalized using a unified segmentation approach (Ashburner and Friston, 2005) together with the Clinical toolbox¹. Because of the cost function masking purpose (Andersen et al., 2010), lesion masks drawn on the patients’ anatomical scans were included in the brain segmentation. Structural images and lesion masks were normalized to MNI space with the DARTEL toolbox, using individual flow fields, which were estimated on the basis of the segmented gray (GM) and white matter (WM) tissue classes. The normalized lesion masks were used to calculate the lesion size for each patient in standard space.

CG had a left anterior temporal lesion with an intact inferior temporal gyrus (ITG) and lateral occipitotemporal gyrus (LOTG). The lesion area included the periamygdaloid cortex (PAM), entorhinal cortex (Ent), medial occipitotemporal gyrus (MOTG), inferior part of the hippocampus (Hi), parahippocampal gyrus (PHG), and amygdala, and ended in the lateral anterior portion of the temporal lobe, in the MTG and TmP.

CM had a right anterior temporal lesion extending to the middle and ventromedial part of the temporal lobe, including the inferior temporal pole (ITmP), ITG, Ent, PAM, PRC, amygdala, inferior Hi, STG, anterior fusiform gyrus (FuG), and rhinal sulcus. In the posterior part of the lesion, the PPo (planum polare), STG and STS were intact.

FB had a left anterior temporal lesion that included the TmP, MTG, MOTG, Ent, Hi, PAM, amygdala, anterior STG, and posterior temporal cortex (PTe). The lesion ended in two separate tails: one in the lateral anterior part of the temporal lobe, the other in the medial part.

JP had an extended left lateral resection including the temporal, frontal, parietal, and occipital lobes. The temporal part included the TmP, MTG, Ent, MOTG, ITG, PTe, anterior STG, and PHG. The frontal part included the lateral inferior and superior frontal gyri, precentral gyrus and postcentral gyrus. Finally, part of the lateral superior posterior occipital gyrus had been removed, but the FFA was intact.

RS had a left anterior temporal lesion encompassing the TmP, STG, MTG, MOTG, ITG, FuG, amygdala, anterior Hi, Ent, PAM, FuG and anterior PHG. It ended in the lateral anterior part. See Figure 1 for visual descriptions of the patients’ brain damage.

Stimuli and Procedure

Non-verbal auditory expressions were drawn from the validated Geneva Multimodal Emotion Portrayal (GEMEP) corpus (Bänziger et al., 2012). We selected angry, joyful, and fearful non-verbal sounds (“ah”) produced by two male and two female actors, on the basis of the recognition rate established in a previous pilot study. For the neutral stimuli, we chose the most neutrally rated vocal expressions produced by the same actors (neutrality rating: M = 26.5, SD = 15.67), and the fundamental frequency was flattened using Praat (Boersma and Weenink, 2011). Sounds were cut and/or stretched to achieve a duration of 1 s (mean duration before time stretch = 0.92 s, SD = 0.30 s) with SoundForge², and 0.025 s fade-ins and fade-outs were included using Audacity³. The dynamic faces were created with FACSGen (Roesch et al., 2011), which allows for the parametric manipulation of 3D emotional facial expressions according to the Facial Action Coding System (Ekman and Friesen, 1978). They were selected on the basis of results of a previous study in which participants assessed the gender and believability of each avatar (Roesch et al., 2011). The lips were animated to match the intensity contour of each different sound for both unimodal visual and audiovisual items. The action units (AUs) for each emotion began at 0.25 s and ended at 0.75 s after onset, with their apex at 0.5 s (100% intensity). VirtualDub⁴ was used to generate the image sequences and to combine the voiced sounds with them at a rate of 26 frames per second (the final image was a dark screen).

After signing the consent form, participants completed the behavioral inhibition system (BIS)/behavioral approach system (BAS) scales and the state trait anxiety inventory (STAI) on a web interface. They then rated the intensity of 216 items in unimodal [auditory (A), or visual (V)] and audiovisual (AV; congruent: same information in both modalities; incongruent: one modality emotional, the other neutral) conditions. The unimodal and congruent audiovisual stimuli could either express the emotions of anger, fear, or joy, or be neutral (control condition). Each condition (modality, emotion, or congruency) was repeated 12 times. Items were presented using E-Prime (standard v2.08.90⁵) in a pseudorandomized order to avoid repetition of the same stimulus (i.e., synthetic face or actor’s voice) or condition. The participants gave their answers by clicking on a continuous line between Not intense and Very intense for six different emotions (disgust, joy, anger, surprise, fear, sadness), plus neutral. In each trial, they could provide ratings on one or more scales. At the end of the experiment, they completed a debriefing questionnaire.

Statistical Analysis

Since multiple intensity scales were used to collect the answers, our data mostly contained zero ratings. To assess the interactions, we therefore ran a zero-inflated mixed model on congruent trials only, using the glmmADMB package for R⁶. This allowed the excess zeroes and remaining values to be modeled as binomial responses, and modeled the distribution as a generalized linear model (GLM) following a negative binomial distribution. Main effects were tested for group (control vs. patient), modality (audio, visual, audiovisual), and emotion (anger, fear, or joy, plus neutral). Contrasts were performed to test specific hypotheses.

The first hypothesis we tested was a group effect for a specific emotion on the target scale (e.g., fearful item ratings on the fear scale) for each modality (A, V, AV). Sex, age, and normalized lesion size were added as control variables. Participant and stimulus ID were added as random effects. A different model was run for each of the three emotions, plus neutral. Second, four different models, one for each emotion, plus neutral, were tested in order to compare the impact of the three different modalities in each group. For instance, for angry item ratings on the anger scale, the modalities were tested in pairs (AV-A, AV-V, A-V) for the patient group, and individually for the control group. For this second set of models, we added the same control and random variables as for the first model. The third model was run to investigate the lateralization effect of the lesion for a specific modality and a specific emotion, controlling for handedness, age, and sex, and with random effect variables for participant ID and stimulus ID. Owing to the limited size of our patient sample, this comparison was of a purely descriptive and exploratory nature. In order to test whether the effects we found in the different modalities were perceptual or emotional, we ran a complementary analysis to compare emotional versus neutral items in each modality and each group, adding age, sex, and normalized lesion size as control variables, and participant ID and stimulus ID as random effects. Intergroup effects were also tested for emotional versus neutral items in each modality (A, AV, V), with the same control variables. Finally, we tested the impact of lesion size by including the number of voxels in a separate linear model for each emotion and each modality. In this final set of models, random effect variables (participant ID and stimulus ID) were added.

Results

Categorical Responses

Participants could rate the intensity of each item on six different scales (anger, disgust, fear, surprise, joy, sadness, and neutral). For each item, we identified the scale with the highest rating, and calculated a proportional corrected score for each participant (Heberlein et al., 2004; Dellacherie et al., 2011), by looking at how many other members of the participant’s group (patient or control) had given the same response. This score could range from 0, meaning that nobody else in the group had chosen the same scale, to 1, meaning that everyone in the group had chosen the same scale. This type of correction is used to weight labeling errors, bearing in mind that some errors are more correct than others. For instance, it is easier to confuse visual fear and surprise (see, for example, Etcoff and Magee, 1992) than it is to confuse fear and anger, as the first two expressions share a number of AUs. For vocal expressions, confusion is also possible, but between different pairs of emotions (see, for example, Banse and Scherer, 1996; Belin et al., 2008; Bänziger et al., 2009).

Using these corrected scores, we looked for possible differences between the two groups. As our data violated the assumptions of homoscedasticity and normal distribution, we ran non-parametric tests for multiple groups. In order to pinpoint differences between the groups within a specific emotion in a specific modality, we used the Kruskal–Wallis test, calculating z scores and p values corrected for multiple comparisons of mean ranks (z′). These multiple comparisons are summarized in Figure 2. The control group was more accurate than the patient group in recognizing joy, whether it was expressed vocally (z′ = 3.02, p < 0.005), visually (z′ = 3.17, p < 0.005), or bimodally (z′ = 3.19, p < 0.005). Greater accuracy within the control group was also observed for visual anger (z′ = 2.99, p < 0.005), vocal fear (z′ = 2.78, p < 0.01) and - marginally - visual (z′ = 1.69, p = 0.08) and bimodal fear (z′ = 1.89, p = 0.058). Finally, a reverse group effect was observed for the neutral vocal (z′ = 3.64, p < 0.001) and audiovisual (z′ = 3.64, p < 0.001) stimuli.

FIGURE 2

FIGURE 2. Mean proportional corrected scores for patients and controls, taking modality and emotion into account (bars represent the standard error of the mean, *p < 0.05, **p < 0.005, ***p < 0.001).

Finally, we tested the impact of lesion size on the corrected hit rate for emotion recognition. We ran supplementary analyses using a GLM to test this effect with the modality (A, AV, V) and emotion (anger, joy, fear, neutral) factors, and added the normalized lesion size as a covariate. The control variables were age, sex, and lateralization. We observed a significant linear relationship between normalized lesion size and corrected hit score for visual anger (z = -2.91, p < 0.005), visual joy (z = -2.37, p < 0.05) and visual neutral stimuli (z = -3.52, p < 0.001). All the linear regressions were negative, meaning that the more extensive the lesion, the lower the corrected score. We observed no such effect for fear in the visual modality, as patients did not recognize this emotion (their corrected score was equal to 0), confusing it with surprise.

Intensity Perception

Using a GLM, we first compared the two groups on each specific emotion in each specific modality, controlling for sex, age, and normalized lesion size, and adding participant and stimulus ID as random effects. No significant results were observed, even for the fear items. However, when we ran pairwise comparisons of the modalities for a specific emotion on its target scale and for a specific group, we did observe significant effects, especially for the three emotions (see Figure 3). Patients provided higher intensity ratings of audiovisual versus unimodal visual information for angry (z = -4.14, p < 0.001), joyful (z = -6.14, p < 0.001), fearful (z = -8.45, p < 0.001), and neutral (z = -5.61, p < 0.001) items. They also provided higher intensity ratings of auditory versus visual information for the same emotions (anger: z = -4.14, p < 0.001; joy: z = -6.14, p < 0.001; fear: z = -8.45, p < 0.001; neutral: z = -5.61, p < 0.001). The differences between audiovisual and unimodal auditory information were not significant for any of the emotions (p > 0.15). In the control group, a slightly different pattern emerged for anger and joy. Anger was given a higher intensity rating in the audiovisual condition than in either the auditory (z = -3.27, p < 0.001) or visual (z = -10.93, p < 0.001) condition, and a higher rating in the auditory condition than in the visual one (z = -6.94, p < 0.001). For joy, audiovisual information was perceived of as more intense than visual information (z = -12.69, p < 0.001), but auditory information was given a higher intensity rating than both audiovisual information (z = 3.07, p < 0.005) and visual information (z = -15.58, p < 0.001). Finally, fear stimuli were rated as more intense in the audiovisual modality than in the visual one (z = -11.74, p < 0.001), and also more intense in the auditory modality than in the visual one (z = -12.33, p < 0.001). No significant differences were observed between the modalities for neutral stimuli (p > 0.4).

FIGURE 3

FIGURE 3. Boxplot of GLM results for intensity ratings of each emotion on the corresponding target scale. Each box corresponds to a specific modality (A: auditory, AV: audiovisual, V: visual) and a specific group (patients vs. controls). The difference between A and AV for the controls is almost invisible, as zero values data were included in the plot. *Indicates a significant difference between modalities (pairwise).

In order to ascertain whether the results were perceptual or emotional, we tested another model contrasting emotional versus neutral stimuli for each group and each modality (A, V, AV). Controls rated emotional auditory items as more intense than neutral auditory items (z = 5.61, p < 0.001), and this was also the case for audiovisual information (z = 5.15, p < 0.001). By contrast, the patients provided higher intensity ratings for neutral items than they did for emotional items in the auditory (z = -2.64, p < 0.01) and audiovisual (z = -2.42, p < 0.05) modalities. In the visual modality, patients (z = -3.56, p < 0.001) and controls (z = -8.40, p < 0.001) alike gave higher intensity ratings for neutral items than for emotional ones. When we compared the two groups on emotional and neutral items in each modality, we found that the patients rated the intensity of the neutral items more highly than the controls did in the auditory modality (z = 2.24, p < 0.025). For the audiovisual modality, the effect was only marginal (z = 1.86, p = 0.062).

Intensity Perception and Lesion Effect

We then assessed the impact of lesion lateralization for each specific emotion in each specific modality. In this GLM analysis, we compared the patients’ ratings on the target scale according to the side of their lesion, controlling for handedness, age, and sex, and adding participant and stimulus ID as random effects (see Figure 4). The patient with a right lesion was found to provide higher intensity ratings than the patients with left lesions, but only for angry faces (z = -4.36, p < 0.001) and auditory joy (z = -3.23, p < 0.005). All other significant effects concerned the opposite relationship, namely, the patients with left lesions rated the items as more intense than the patient with a right lesion did. This was the case for visual joy (z = 3.19, p < 0.005), auditory fear (z = 8.29, p < 0.001), audiovisual fear (z = 8.23, p < 0.001), and audiovisual neutral items (z = 3.67, p < 0.001).

FIGURE 4

FIGURE 4. Boxplot of GLM results for intensity ratings of each emotion on the corresponding target scale. Each box corresponds to a specific modality (A: auditory, AV: audiovisual, V: visual) and a specific patient subgroup (left lesion vs. right lesion). *Indicates a significant difference between left and right lesion conditions.

When we added the normalized lesion size as a covariate and compared the interactions of perceived intensity and modality for a specific emotion on the target scale, we observed a massive effect in the visual modality across all emotions: the larger the lesion, the less intensely the patients perceived visual anger (z = -2.90, p < 0.005), visual joy (z = -2.79, p < 0.005), and visual fear (z = -2.96, p < 0.005). Neutral visual stimuli, however, failed to reach significance (p > 0.15). This relationship also held good for audiovisual joy (z = -2.15, p < 0.051), but no significant effects were observed either for other audiovisual expressions (angry, fearful or neutral), or for auditory stimuli (p > 0.15).