# **WHAT MAKES WRITTEN WORDS SO SPECIAL TO THE BRAIN**

**Topic Editors Mohamed L. Seghier, Urs Maurer and Gui Xue**

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2015 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

Cover image provided by Ibbl sarl, Lausanne CH

**ISSN** 1664-8714 **ISBN** 978-2-88919-379-0 **DOI** 10.3389/978-2-88919-379-0

# *ABOUT FRONTIERS*

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

# *FRONTIERS JOURNAL SERIES*

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing.

All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

# *DEDICATION TO QUALITY*

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view.

By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

# *WHAT ARE FRONTIERS RESEARCH TOPICS?*

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area!

Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **WHAT MAKES WRITTEN WORDS SO SPECIAL TO THE BRAIN**

Topic Editors:

**Mohamed L. Seghier**, University College London, United Kingdom

**Urs Maurer**, University of Zurich, Switzerland

**Gui Xue**, Beijing Normal University, China

The involvement of different reading routes depends on the type of grapheme/phoneme correspondence of the language being read. Shallow orthographies with consistent grapheme/phoneme correspondences favour encoding via non-lexical pathways (assembled reading; green triangle), whereas deep orthographies with inconsistent grapheme/phoneme correspondences favour lexical pathways (addressed reading; violet triangle). For more details, see Figure 4 of Buetler et al. (2014).

Buetler KA, de León Rodríguez D, Laganaro M, Müri R, Spierer L and Annoni J-M (2014) Language context modulates reading route: an electrical neuroimaging study. Front. Hum. Neurosci. 8:83. doi: 10.3389/fnhum.2014.00083

Reading is an integral part of life in today's information-driven societies. Since the pioneering work of Dejerine on "word blindness" in brain-lesioned patients, the literature has increased exponentially, from neuropsychological case reports to mechanistic accounts of word processing at the behavioural, neurofunctional and computational levels, tapping into diverse aspects of visual word processing. These studies have revealed some exciting findings about visual word processing, including how the brain learns to read, how changes in literacy impact upon word processing strategies, and whether word processing mechanisms vary across different alphabetic, logographic or artificial writing systems. Other studies have attempted to characterise typical and atypical word processes in special populations in order to explain why dyslexic brains struggle with words, how multilingualism changes the way our brains see words, and what the exact developmental signatures are that would shape the acquisition of reading skills. Exciting new insights have also emerged from recent studies that have investigated word stimuli at the system/network level, by looking, for instance, at how the reading system interacts with other cognitive systems in a context-dependent fashion, how visual language stimuli are integrated into the speech processing streams, how both left and right hemispheres cooperate and interact during word processing, and what the exact contributions of subcortical and cerebellar regions to reading are.

The contributions to this Research Topic highlight the latest findings regarding the different issues mentioned above, particularly how these findings can explain or model the different processes, mechanisms, pathways or cognitive strategies by which the human brain sees words. The introductory editorial, summarising the contributions included here, highlights how varieties of behavioural tests and neuroimaging techniques can be used to investigate word processing mechanisms across different alphabetic and logographic writing systems.

# Table of Contents


*Opposite Effects of Visual and Auditory Word-Likeness on Activity in the Visual Word Form Area*

Philipp Ludersdorfer, Matthias Schurz, Fabio Richlan, Martin Kronbichler and Heinz Wimmer

*From Regular Text to Artistic Writing and Artworks: Fourier Statistics of Images with Low and High Aesthetic Appeal*

Tamara Melmer, Seyed A. Amirshahi, Michael Koch, Joachim Denzler and Christoph Redies

*Cross-Modal Integration in the Brain is Related to Phonological Awareness only in Typical Readers, not in those with Reading Difficulty*

Chris McNorgan, Melissa Randazzo-Wagner and James R. Booth


Jiayu Zhan, Hongbo Yu and Xiaolin Zhou

# What makes written words so special to the brain?

#### *Mohamed L. Seghier <sup>1</sup> \*, Urs Maurer <sup>2</sup> and Gui Xue <sup>3</sup>*

*<sup>1</sup> Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, UK*

*<sup>2</sup> Department of Psychology, University of Zurich, Zurich, Switzerland*

*<sup>3</sup> National Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China*

*\*Correspondence: m.seghier@ucl.ac.uk*

#### *Edited and reviewed by:*

*John J. Foxe, Albert Einstein College of Medicine, USA*

**Keywords: reading, word processing, learning, multilingualism, dyslexia, laterality, fMRI, ERP**

Reading is an integral part of life in today's information-driven societies. How the human brain sees and processes words is the focus of this ebook. It includes a collection of 22 papers that illustrate current issues in the neurobiology and psychophysics of word processing. Using a variety of behavioral tests and neuroimaging techniques, these papers investigate word processing mechanisms across different alphabetic and logographic writing systems, such as English, Chinese, Arabic, Japanese, French or German. Each paper provides useful literature reviews, methodological developments and a host of novel findings that will inspire future investigations into the neural systems that support reading.

Several behavioral, fMRI and ERP studies investigated how word-likeness modulated the underlying cognitive and neural processing. The fMRI studies focused on many regions of the reading system, in particular a region in the left ventral occipitotemporal cortex known as the visual word form area (VWFA), whereas the ERP studies featured prominently the N1/N170 component around 200 ms post-stimulus. Participants were mainly healthy skilled readers, although a few studies have also looked at reading in dyslexic, autistic, or congenitally deaf subjects.

Ludersdorfer et al. (2013) investigated how the VWFA responds to visual and auditory stimuli that differed in their word-likeness. VWFA activation decreased for visual stimuli from false fonts through pseudowords to words, presumably reflecting more efficient processing of familiar words. In contrast, auditory stimuli led to a general deactivation in visual areas for all stimuli, except for the VWFA where deactivation was spared for word and pseudoword stimuli, presumably due to modulation by linguistic information. Deng et al. (2013) used two cross-modal tasks, phonological retrieval of visual words and orthographic retrieval of auditory words, to examine the unimodal and multimodal regions for logographic language processing. The VWFA responded exclusively to visual inputs, whereas an adjacent region in the left inferior temporal gyrus showed comparable activation for both visual and auditory inputs. A role of the VWFA in integrating visual and auditory processes is also suggested by McNorgan et al. (2013), who reported correlations between a behavioral phonemic awareness task (phoneme elision) and neural activation in an audiovisual condition in typically developing children. No such correlations were found in children with reading disability, nor in any of the groups for unimodal stimuli.

Using ERP, an inverse word-like effect was also found for single letters by Herdman and Takai (2013), who reported a larger and later N1 for pseudoletters than for letters, which was not modulated by attention. As studies using entire words typically report a reversed effect, this may suggest that processing single letters differs from processing letter strings and that early orthographic processing of letters is largely automatic. Hasko et al. (2013) not only showed a positive word-like effect in the N1 component, as the N1 was larger for letter than false font strings in normally reading children, but they also revealed that this effect was reduced in dyslexic children, presumably reflecting deficient orthographic processing in dyslexics. While Hasko et al. (2013) did not find N1 differences between pseudohomophones and words in children, Taha and Khateb (2013) found a larger N1 for pseudohomophones than words in Arabic, suggesting that such an effect may depend on reading development, task, or properties of the writing system.

Orthographic analysis in the later part of the N1 also seems to be sensitive to stimulus repetition, as shown by Du et al. (2013). However, such N200 repetition effects appear to be delayed if word form configuration is changed, which can be achieved in Chinese by switching characters in two-morphemic words. Regarding the impact of orthographic depth on reading routes, Buetler et al. (2014) recorded electrical brain activity in highly proficient bilinguals who read the same pseudowords either in German or French. The topography of the ERPs to identical pseudowords differed 300–360 ms post-stimulus onset when the pseudowords were read in different orthographic depth contexts. Their findings suggest that reading in a shallow context relies more on non-lexical pathways with greater engagement of frontal phonological areas, whereas reading in a deep orthographic context recruits less non-lexical pathways with greater engagement of visuo-attentional parietal areas.

We note that many of these fMRI and ERP studies used conditions which differed in word-likeness, and reported either positive or negative relations in the observed neural activation. The reason for this divergence is still poorly understood, but it probably reflects the fact that orthographic processing includes both visual processing and modulation by linguistic information.

Regarding the elusive role of the VWFA, a review by Vogel et al. (2014) challenges current models that posit a functional specialization of the VWFA solely for words. They argue that the VWFA is not used specifically or even predominantly for reading. In their model, the VWFA is used in processing visually complex stimuli in "groups," and it is strongly connected to the dorsal attention network so that attention can be directed to familiar stimuli, such as words, in groups. They suggest that the VWFA can be seen as a brain region with specific processing characteristics rather than a brain region devoted to a specific stimulus class. Given the strong interactions between the reading and the attentional system, Montani et al. (2014) conducted a behavioral study to examine the impact of spatial attention on written word perception. They found that high frequency word identification was best in the neutral cue condition when attention was directed to both possible locations, whereas pseudowords (and a similar trend for low frequency words) were better identified in the valid cue condition when attention was focused on the target location.

Besides the ventral visual stream, modality-specific responses and interaction between orthography and linguistic components are also found elsewhere. Kollndorfer et al. (2013) used independent component analysis on fMRI data, collected during two language comprehension and production tasks with visual and auditory stimuli, and showed that the intraparietal sulcus and the hippocampus were predominantly activated in the visual modality. In reading Chinese compound words, Zhan et al. (2013) found that mixed pseudohomophones, which shared the first constituent with the base words, were more difficult to reject than non-pseudohomophone non-words, and pure pseudohomophones, which shared no constituent with their base words. This effect was accompanied by increased activation of bilateral inferior frontal gyrus, left inferior parietal lobule, and left angular gyrus, and stronger effective connectivity of a phonological pathway from left inferior parietal lobule to left inferior frontal gyrus for the mixed pseudohomophones. Hillen et al. (2013) used the "Landolt" paradigm to dissociate linguistic and orthographic brain networks from those involved in oculomotor control and attention. In this paradigm, subjects were asked to scan for targets (Landolt's rings) in a reading-like fashion from left to right when all letters were replaced by closed circles. Significant fMRI activations were identified in right superior parietal cortex and postcentral gyrus, most likely related to gaze-orienting, which suggests the usefulness of the "Landolt" paradigm in dissociating linguistic from non-linguistic factors during reading.

To depict the interactions between semantic and phonological processing areas, Boukrina and Graves (2013) assessed effective connectivity while participants read aloud words of high or low spelling-sound consistency, word frequency, and imageability. Semantic areas significantly interacted with phonological areas, and connectivity patterns depended on word properties. Interestingly, they found that modulation of the inferior temporal and angular gyri connectivity correlated with reading performance. Some of the connectivity patterns were better predicted by the connectionist than the dual-route cascaded model. Regarding the role of subcortical structures in reading, Oberhuber et al. (2013) found that the putamen was mainly involved in articulating speech during reading, as compared to picture and color naming. Intriguingly, pseudowords showed greater activation in the anterior putamen, which is consistent with the role of this subregion in the initiation of novel sequences of movements. In contrast, words showed greater activation in the posterior putamen, which is consistent with studies that associated this putaminal subregion with memory-guided movement.

To investigate the effect of language experience on reading, Li et al. (2014) examined differences in brain activation between Chinese congenitally deaf individuals and hearing controls during character reading. They found that congenitally deaf individuals showed less activation than controls in left inferior frontal gyrus, but greater activation in several right hemisphere regions including inferior frontal gyrus, angular gyrus, and inferior temporal gyrus, and the deaf individuals who are fluent readers showed less activity in the right hemisphere. Regarding reading in other clinical populations, Moseley et al. (2013) used fMRI to investigate semantic deficits in adults with autism spectrum conditions. They found that, compared to typically developing controls, the high-functioning adults with autism showed a deficit in semantic processing of action-related words, which, intriguingly, significantly correlated with the hypoactivity of motor cortex to these items.

Another interesting topic is how lateralized reading processes interact with task and script. In a behavioral study, Perrone-Bertolotti et al. (2013) investigated hemispheric specialization and inter-hemispheric interactions during a lexical decision task within a divided visual field presentation of verbal material. The authors manipulated three types of information (i.e., perceptual, semantic, and decisional) to determine how the type of information modulates inter-hemispheric cooperation. Their findings suggest inter-hemispheric cooperation is less likely to emerge during pre-lexical (perceptual) and/or post-lexical (decision-making) processing, but mainly occurred during lexical semantic processing when the semantic information was shared between hemispheres. Koyama et al. (2014) tested whether left-lateralized fMRI activations for reading differ between first (L1) and second (L2) languages in bilingual L2 readers. They asked late L2 learners to perform a visual one-back matching task either in English or Japanese. Weaker left lateralization was observed in the posterior lateral occipital region for logographic Kanji compared with syllabic (Kana) and alphabetic (English) scripts. When both L1 and L2 scripts were non-logographic, functional lateralization did not differ between L1 and L2 scripts in any region. Remarkably, they showed that functional lateralization for L2 visual word processing predicted L2 reading competency.

As most of the above studies focused on single word learning, little is known about the neural correlates of text comprehension. To address this issue, Swett et al. (2013) reported that, compared to single word comprehension, left posterior cingulate cortex and left angular gyrus were activated only for discourse-level comprehension. Over the course of comprehension, reliance on the same regions in the semantic control network increased, while a region in intraparietal sulcus associated with attention decreased. In addition, central ideas are functionally distinct from peripheral ideas, showing greater activation in the posterior cingulate cortex and precuneus.

Last but not least, two papers used psychophysics and mathematical modeling to characterize other word properties. Starrfelt et al. (2013) explored the word superiority effect, which refers to the observation that when written stimuli are degraded by noise or brief presentation, letters in words are reported more accurately than single letters and letters embedded in non-words. With a novel combination of psychophysics and mathematical modeling, they showed that word superiority is due to single words being simply processed faster than single letters (at least for simple short words). However, there is a limit to this effect, as letters are perceived more easily than words, in particular when multiple stimuli are presented simultaneously. For neuroimaging studies interested in the impact of spatial frequencies upon brain responses along the different reading pathways, Melmer et al. (2013) introduced some Fourier spectrum based measures that are useful for assessing statistical image properties. They showed how those statistical properties can reflect more global aspects of text, including for instance its aesthetic appeal. Their findings suggested that the statistical properties of different categories (regular text, aesthetic writing, calligraphy, ornamental art) were similar across cultures.

Overall, the papers in this ebook illustrate the wide range of techniques that can be used to reveal the functional anatomy and the time course of activity within the reading system. The exciting new insights that emerged from those studies can deepen our understanding of the mechanisms of individual differences in learning to read, and may help to guide the discovery of novel diagnostic tools and biomarkers for reading disorders.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 July 2014; accepted: 30 July 2014; published online: 22 August 2014.*

*Citation: Seghier ML, Maurer U and Xue G (2014) What makes written words so special to the brain? Front. Hum. Neurosci. 8:634. doi: 10.3389/fnhum.2014.00634*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Seghier, Maurer and Xue. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Language context modulates reading route: an electrical neuroimaging study

#### *Karin A. Buetler <sup>1</sup> \*, Diego de León Rodríguez <sup>1</sup>, Marina Laganaro <sup>2</sup>, René Müri <sup>3</sup>, Lucas Spierer <sup>1</sup> and Jean-Marie Annoni <sup>1</sup>*

*<sup>1</sup> Neurology Unit, Laboratory for Cognitive and Neurological Sciences, Department of Medicine, Faculty of Science, University of Fribourg, Fribourg, Switzerland*

*<sup>2</sup> Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland*

*<sup>3</sup> Division of Cognitive and Restorative Neurology, Departments of Neurology and Clinical Research, Inselspital, University Hospital, University of Bern, Bern, Switzerland*

#### *Edited by:*

*Mohamed L. Seghier, University College London, UK*

#### *Reviewed by:*

*Giordana Grossi, State University of New York, USA*

*Yaxu Zhang, Peking University, China*

#### *\*Correspondence:*

*Karin A. Buetler, Neurology Unit, Laboratory for Cognitive and Neurological Sciences, Department of Medicine, Faculty of Science, University of Fribourg, Chemin du Musée 5, CH-1700 Fribourg, Switzerland e-mail: karin.buetler@unifr.ch*

**Introduction:** The orthographic depth hypothesis (Katz and Feldman, 1983) posits that different reading routes are engaged depending on the type of grapheme/phoneme correspondence of the language being read. Shallow orthographies with consistent grapheme/phoneme correspondences favor encoding via non-lexical pathways, where each grapheme is sequentially mapped to its corresponding phoneme. In contrast, deep orthographies with inconsistent grapheme/phoneme correspondences favor lexical pathways, where phonemes are retrieved from specialized memory structures. This hypothesis, however, lacks compelling empirical support. The aim of the present study was to investigate the impact of orthographic depth on reading route selection using a within-subject design.

**Method:** We presented the same pseudowords (PWs) to highly proficient bilinguals and manipulated the orthographic depth of PW reading by embedding them in two separate language contexts, German or French, implying shallow or deep orthography, respectively. High-density electroencephalography was recorded during the task.

**Results:** The topography of the ERPs to identical PWs differed 300–360 ms post-stimulus onset when the PWs were read in different orthographic depth contexts, indicating distinct brain networks engaged in reading during this time window. The brain sources underlying these topographic effects were located within left inferior frontal (German > French), parietal (French > German) and cingular areas (German > French).

**Conclusion:** Reading in a shallow context favors non-lexical pathways, reflected in a stronger engagement of frontal phonological areas in the shallow versus the deep orthographic context. In contrast, reading PWs in a deep orthographic context recruits less routine non-lexical pathways, reflected in a stronger engagement of visuo-attentional parietal areas in the deep versus shallow orthographic context. These collective results support a modulation of reading route by orthographic depth.

#### **Keywords: reading, pseudoword, orthographic depth, grapheme-phoneme conversion, dual-route model, EEG, ERP, bilingual**

# **INTRODUCTION**

Grapheme to phoneme conversion is a critical step in reading processing, as notably evidenced by its role in literacy acquisition (Goswami, 1998; Seymour et al., 2003; Huang et al., 2004; Lallier et al., 2013) and, when impaired, in the emergence of language-related disorders including dyslexia (Goswami, 1998; Wheat et al., 2010). Referred to as orthographic regularity, the rules of grapheme to phoneme conversion vary considerably across stimuli and languages. Consequently, reading strategies must be adjusted depending on the writing systems involved (in terms of orthographic transparency-opacity; Katz and Feldman, 1983). However, how reading strategies and the underlying brain network are actually modified when reading languages with different orthographic regularities remains largely unresolved.

According to the dual route cascade model (Coltheart et al., 2001; for a review see Jobard et al., 2003), after letter identification, word reading may follow two pathways that map graphemes to phonemes differently. On the non-lexical pathway, each grapheme is sequentially mapped to its corresponding phoneme (grapho-phonological assembling). The non-lexical route has been proposed to be predominantly involved when reading unfamiliar words or letter strings (non-words; Coltheart et al., 2001; Proverbio and Zani, 2003; Proverbio et al., 2004; Heim et al., 2005; Lu et al., 2011). In contrast, the reading of familiar words may preferentially involve the faster lexical pathways, where phonemes are retrieved from memory structures, i.e., from orthographic and phonological lexical entries (lexico-semantic access; Coltheart et al., 2001; Proverbio and Zani, 2003; Proverbio et al., 2004; Lu et al., 2011; Fisher et al., 2012).
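The distinction between the two pathways can be illustrated with a deliberately simplified sketch. It is not the dual route cascade model itself, which processes both routes in parallel and in cascade; here a letter string is simply looked up in a lexicon first and assembled by rule otherwise. The lexicon entry, the grapheme-to-phoneme rules, the phoneme symbols and the function names are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: lexicon, rules and phoneme symbols are invented
# for this sketch and are not taken from any study or model implementation.

# Hypothetical whole-word store (lexical / addressed route).
LEXICON = {"yacht": "jot"}

# Hypothetical grapheme-to-phoneme rules (non-lexical / assembled route).
GPC_RULES = {
    "sh": "S", "ch": "tS",
    "a": "a", "c": "k", "h": "h", "p": "p", "t": "t", "y": "j",
}


def assemble(letter_string):
    """Non-lexical route: map graphemes to phonemes one after another."""
    phonemes = []
    i = 0
    while i < len(letter_string):
        # Prefer a two-letter grapheme over a single letter.
        for size in (2, 1):
            grapheme = letter_string[i:i + size]
            if grapheme in GPC_RULES:
                phonemes.append(GPC_RULES[grapheme])
                i += len(grapheme)
                break
        else:
            raise ValueError(f"no rule for {letter_string[i]!r}")
    return "".join(phonemes)


def read_aloud(letter_string):
    """Return (pronunciation, route): lexical if the string is stored, else assembled."""
    if letter_string in LEXICON:
        return LEXICON[letter_string], "lexical"
    return assemble(letter_string), "non-lexical"


print(read_aloud("yacht"))  # stored irregular word, retrieved whole
print(read_aloud("shap"))   # pseudoword, assembled by rule
print(assemble("yacht"))    # rule-based reading regularizes the irregular word
```

In this toy, the pseudoword can only be read through the rules, whereas the irregular word is read correctly only through the lexicon; forcing it through the rules yields a regularized pronunciation.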

Studies identifying the neural correlates of the two routes by contrasting word versus pseudoword (PW) reading yielded diverging results. In reading tasks, greater activation for PWs than words was found in both left occipito-temporal and inferior frontal regions (Xu et al., 2001; Mechelli et al., 2003; Kronbichler et al., 2004; Binder et al., 2005). In contrast, studies using lexical decision tasks have reported greater activation for words than PWs in left occipito-temporal cortices, along with stronger or equivalent activation to PWs in left inferior frontal regions (Fiebach et al., 2002; Rissman et al., 2003; Ischebeck et al., 2004; Binder et al., 2005). Finally, evidence has been found for an equal engagement of occipito-temporal regions in early word and PW reading (Jobard et al., 2003; Wilson et al., 2007). In a review of 35 neuroimaging studies, Jobard et al. (2003) suggest that pre-lexical processing does not differentiate between words and PWs and the selection in favor of one route occurs at later stages. On the non-lexical route, after pre-lexical processing, regular words and PWs are encoded via grapho-phonological conversion. The grapho-phonological route relies on left superior temporal, supramarginal, and inferior frontal areas (pars opercularis; BA 44). On the lexical route, after pre-lexical processing, regular and irregular words are encoded via lexico-semantic representations. These lexico-semantic areas involve basal inferior and posterior middle temporal and inferior frontal areas (pars triangularis; BA 45).

More recent studies confirmed these results by linking the lexical processing of words to bilateral posterior cingular, inferior-middle temporal and temporo-parietal regions (Ischebeck et al., 2004). In contrast, left posterior superior temporal (Graves et al., 2008), supramarginal (Roux et al., 2012) and inferior frontal regions (Ischebeck et al., 2004; Nixon et al., 2004; Heim et al., 2005; Rodriguez-Fornells et al., 2006; Wheat et al., 2010) were identified to subserve non-lexical word and PW processing.

In addition, lesion-based studies of acquired dyslexia found that surface dyslexia, in which impaired lexical pathways leave patients able to read pronounceable PWs but unable to read (irregular) words, is associated with deficits in inferior temporal (Mechelli et al., 2005; Price and Mechelli, 2005), anterior temporal (Woollams et al., 2007) and anterior inferior frontal regions (Mechelli et al., 2005). In contrast, phonological dyslexia, in which impaired non-lexical pathways leave patients able to read most high-frequency words (i.e., regular and irregular) but unable to process simple PWs, has been linked to deficits in left inferior-parietal (Rapcsak et al., 2009) and left inferior frontal regions (Feiz et al., 2006).

The inconsistent findings related to the anatomical underpinnings of reading routes may be due to differences in tasks applied, ranging from reading paradigms (Mechelli et al., 2003; Kronbichler et al., 2004; Binder et al., 2005) to lexical decision (Fiebach et al., 2002; Rissman et al., 2003; Ischebeck et al., 2004; Binder et al., 2005; Wilson et al., 2007) and rhyming tasks (Xu et al., 2001). In addition, the results may be impacted by differences of the stimuli used (especially differences in the lexicality of PWs). Finally, differences in the orthography of the language investigated may have influenced the findings, as some studies investigated languages with regular (Fiebach et al., 2002; Ischebeck et al., 2004; Kronbichler et al., 2004) and irregular (Xu et al., 2001; Mechelli et al., 2003; Rissman et al., 2003; Binder et al., 2005; Wilson et al., 2007; Wheat et al., 2010) orthographies.

In addition to the degree of the lexicality or familiarity of the words being read, the orthographic depth hypothesis (Katz and Feldman, 1983; Katz and Frost, 1992) posits that the differential engagement of each reading pathway depends on the transparency of the language's grapheme to phoneme correspondence. Shallow orthographies (e.g., German and Italian) with consistent grapheme to phoneme correspondences favor encoding via non-lexical pathways (assembled reading strategy), whereas deep orthographies (e.g., French and English), with an inconsistent grapheme to phoneme correspondence favor lexical pathways (addressed reading strategy).

Of note, the engagement of a given pathway is not exclusive, i.e., reading processing generally involves both routes, but one route may be predominantly activated compared to the other depending on the orthographic depth index of the language (Heim et al., 2005; Mousikou et al., 2010; Timmer et al., 2012).

Only a few studies have provided evidence that brain activity during reading is modulated by the orthographic depth of the language used. In a PET study contrasting English and Italian monolinguals, Paulesu et al. (2000) observed that English readers showed stronger activations than Italian readers within areas suggested to be involved in irregular word reading (left posterior inferior temporal and anterior inferior frontal). By contrast, monolingual Italian readers showed stronger activity than English readers in areas involved in phonological transcoding processing (left superior temporal). Simon et al. (2006) further showed in French monolinguals and French-Arabic bilinguals (with Arabic being the deeper orthography) that the N320 electroencephalography (EEG) component differentiated reading French words and PWs and Arabic words. The 300 ms latency has been associated with orthographic-linguistic processing, especially spelling-to-sound conversion (Bentin et al., 1999; Huang et al., 2004; Proverbio et al., 2004; Simon et al., 2004, 2006; Grainger et al., 2006; Hauk et al., 2006; Ashby et al., 2009; Carreiras et al., 2009). Similarly, in a study examining Hebrew bilinguals with a shallow and deep version of Hebrew script, Bar-Kochva and Breznitz (2012) showed larger event-related potential (ERP) amplitudes to the deep script 340 ms after word onset. Using the same paradigm with Hebrew bilinguals, Frost (1994) showed larger word frequency and semantic priming effects when reading words written in deep compared to shallow script. The author interpreted these results in terms of facilitated semantic access due to predominant engagement of lexical pathways.

Of note, relatively early latencies have also been found to be critically engaged in graphemic/phonologic conversion. Wheat et al. (2010) found neurophysiological correlates of phonological processing starting as early as 100 ms after word onset in English readers. Proverbio and Zani (2003) and Sereno et al. (1998) found differences at 160 ms after word onset to support grapheme to phoneme conversion in Italian and English readers, respectively. In contrast, relatively late latencies were found in rhyme tasks conducted by Rugg (1984) and Rugg and Barrett (1987), suggesting the N450 to be critical for phonological processing. However, the studies providing evidence for early (<200 ms; Sereno et al., 1998; Proverbio and Zani, 2003; Wheat et al., 2010) or late (>400 ms; Rugg, 1984; Rugg and Barrett, 1987) grapho-phonological processing did not directly manipulate the impact of orthographic depth on grapheme to phoneme conversion. In contrast, studies explicitly manipulating the effect of orthographic depth consistently report latencies around 300 ms (Simon et al., 2006; Bar-Kochva and Breznitz, 2012) to be critically engaged in grapheme to phoneme mapping.

Collectively, the results indicate that a modulation of orthographic depth may impact reading routes around 300 ms after stimulus onset. Grapheme to phoneme mapping in languages with shallow orthographies seems to rely on regions involved in grapho-phonological processing (superior temporal, supramarginal and opercular inferior frontal regions), indicating an activation of non-lexical pathways. In contrast, grapheme to phoneme mapping in languages with deep orthographies seems to rely on regions involved in lexico-semantic processing (inferior and middle temporal and triangular inferior frontal regions), indicating an activation of lexical pathways. Thus, orthographic depth may indeed impact reading route selection.

However, because previous studies used between-subject or cross-language designs, the conclusions about the effect of orthographic depth that can be drawn from current literature are limited. In between-subject designs (Paulesu et al., 2000), intersubject heterogeneity resulting from a variety of socio-cultural differences may indeed account for the observed effects. For example, the differences found in the cited studies may reflect different reading habits, differences in education or intelligence across groups rather than differences in orthographic processing across languages. One way of minimizing confounds arising from intersubject comparisons is to investigate reading in bilingual subjects. Since there is evidence for a certain degree of independency in word processing for each language (Soares and Grosjean, 1984; Rodriguez-Fornells et al., 2006; Kovelman et al., 2008), bilingualism is an advantageous model to investigate reading strategies. Bilinguals, particularly natural bilinguals, have the possibility to engage in several language modes independently, i.e., to adapt to the specific linguistic constraints of a language (Soares and Grosjean, 1984; Rodriguez-Fornells et al., 2006). A bilingual reader, being native in a shallow and a deep language, should thus be able to apply an assembled strategy when reading the shallow orthography and an addressed strategy when reading the deep orthography. However, so far studies on reading strategies in bilinguals applied cross-language designs (Simon et al., 2006) or altered scripts within one language (Frost, 1994; Bar-Kochva and Breznitz, 2012). In cross-language designs, the effect of orthographic depth may be confounded with effects resulting from comparison across different linguistic stimuli. The same holds for comparisons between different stimuli within one language, as the impact of orthographic depth may be confounded with physical differences between the visual stimuli.

The aim of the present study was to investigate the impact of orthographic depth on reading route selection using an experimental design excluding possible effects related to differences in stimuli or readers. We presented the same PWs to highly proficient bilinguals and manipulated the orthographic depth of PW reading by embedding them among two separated language contexts respectively implicating shallow or deep orthography. The use of PWs as target stimuli will probably strengthen non-lexical processing independent of language contexts. In contrast, the reading route predominantly engaged during a language context will depend on its orthographic depth: the deep language context may strengthen lexical and the shallow language context non-lexical pathways. Consequently, if orthographic depth modulates reading routes, reading in the shallow context will support the non-lexical pathways routinely recruited to process PWs. In contrast, non-lexical pathways routinely recruited to process PWs may be less engaged when reading in the deep compared to the shallow context. Together, we predict a differential engagement of non-lexical pathways between pre-lexical and semantic processing stages (∼300 ms) reflected in a stronger activation of grapho-phonological (superior temporal, supramarginal and inferior frontal) areas in the shallow versus the deep language context when reading identical PWs across language context. The study of early/high proficient bilinguals enabled controlling for sociocultural effects and the use of identical stimuli across conditions for effects due to linguistic and/or physical differences, thus isolating the effect of orthographic depth in the 1 versus 1 within-subject design.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Fourteen healthy female French/German bilinguals participated in the study (all right-handed, Oldfield, 1971), aged 18–24 years (mean = 20.86 years, SD = 2.03 years). All participants learnt French and German before the age of six and showed balanced high proficiency across languages according to the bilingual questionnaire they filled out (**Table 1**, see "Language evaluation"). No participant had a history of reading difficulties, neurological or psychiatric illness and all reported normal or corrected-to-normal vision. Each participant provided written, informed consent to participate in the study. The study was approved by the Ethics Committee of the University of Fribourg.

#### **LANGUAGE EVALUATION**

All participants filled out a questionnaire evaluating French and German language skills consisting of three parts (**Table 1**): immersion, self-evaluation and computer-based evaluation. To assess language immersion, participants were asked for the age of acquisition, how long they lived in a region where predominantly German or French was spoken, which language they spoke with family members, during their childhood, in present activities, and if the language was acquired in school or out of school only. For the self-evaluation part, participants had to indicate in percentages how well they would estimate their reading, speaking, comprehension and writing skills. Finally, a sub-test from the computer-based DIALANG language diagnosis system (Zhang and Thompson, 2004) was performed to evaluate reading performance. Here, the task was to indicate for each of 75 stimuli whether it was a correct word in the corresponding language or a (highly word-like) PW. The score ranged between 0 and 1000, with a score >900 being mother tongue (L1) level and a score from 601 to 900 being fully functional with little to no difficulty in reading.

#### **Table 1 | French-German bilingualism characteristics of participants (***N* **= 14).**


*SD, standard deviation; \*p* < *0.05.*

#### **STIMULI**

Target stimuli of the study were orthotactic (i.e., orthographically legal) PWs composed of 4 to 6 letters (to avoid eye movements). One hundred and twenty PWs were generated using WordGen software (Duyck et al., 2004) and matched for their lexical distance to French and German language respecting summated bigram frequency [the frequency of all adjacent letter pairs of an item: e.g., for the item "word" the frequencies of the bigrams "wo," "or" and "rd" were summed up; French mean = 14005, German mean = 13773; *t*(119) = 0.631, *p* = 0.53; WordGen, Duyck et al., 2004], neighborhood size [the number of existing words that can be obtained by changing one letter of the item; French mean = 1.57, German mean = 1.45; *t*(119) = 0.781, *p* = 0.44; WordGen, Duyck et al., 2004], bi- and tri-gram legality (Lexique, New et al., 2001; lexikalische Datenbank; lexical database (dlexDB), Digitales Wörterbuch der deutschen Sprache; Digital Dictionary of the German Language (DWDS), Geyken, 2007), onset phoneme legality (PWs started with a phoneme frequently used as a first phoneme in real words; Lexique, New et al., 2001; dlexDB, DWDS, Geyken, 2007) and letter (position independent and onset letter) frequency, which was fitted to the letter frequency distribution of each language (Best, 2005; CorpusDeThomas-Tempé, retrieved May 2012). Examples of PWs include: *Nate, Dand, Melle, Apase, Gantel,* and *Grutte*.
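The two matching measures defined above, summated bigram frequency and neighborhood size, can be illustrated with a minimal sketch. The bigram counts and the small lexicon below are invented toy values rather than WordGen, Lexique or dlexDB data, and the function names are hypothetical.

```python
def summated_bigram_frequency(item, bigram_freq):
    """Sum the frequencies of all adjacent letter pairs of an item."""
    item = item.lower()
    bigrams = [item[i:i + 2] for i in range(len(item) - 1)]
    # Bigrams absent from the frequency table contribute 0.
    return sum(bigram_freq.get(bigram, 0) for bigram in bigrams)


def neighborhood_size(item, lexicon):
    """Count lexicon words obtained by changing exactly one letter of the item."""
    item = item.lower()
    count = 0
    for word in lexicon:
        word = word.lower()
        if len(word) != len(item) or word == item:
            continue
        mismatches = sum(1 for a, b in zip(item, word) if a != b)
        if mismatches == 1:
            count += 1
    return count


# Toy values, assumed for illustration only.
toy_bigram_freq = {"wo": 120, "or": 300, "rd": 80}
toy_lexicon = ["date", "note", "name", "nase", "gate", "melle"]

sbf_word = summated_bigram_frequency("word", toy_bigram_freq)  # "wo" + "or" + "rd"
n_nate = neighborhood_size("Nate", toy_lexicon)
```

In practice each PW would be scored once against a French and once against a German frequency table, and the two sets of scores compared across the 120 items.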

French and German words were presented in addition to the PWs to strengthen language context (see "Procedure and Task"). Four hundred and eighty French Words were selected from Lexique database (New et al., 2001) and 480 German Words were selected from CELEX database (Baayen et al., 1995). Words were closely matched across languages on length [WordGen, Duyck et al., 2004; French mean = 5 letters, German mean = 5 letters; *t*(958) = 0.000, *p* = 1.000], log-transformed lexical frequency [French mean = 1.59, German mean = 1.60; *t*(958) = 0.250, *p* = 0.803], neighborhood size [French mean = 3.31, German mean = 3.31; *t*(958) = 0.000, *p* = 1.000], summated bigram frequency [French mean = 11328, German mean = 11447; *t*(958) = 0.312, *p* = 0.755] and length in syllables [French mean = 1.52, German mean = 1.59; *t*(958) = 1.856, *p* = 0.064]. Examples of words include: *Noël, Trou, Année, Maman, Violon,* and *Esprit* (French) and *Maus, Kind, Draht, Seite, Prämie,* and *Lösung* (German).

One hundred and twenty symbol strings (symbols) were created by changing the font of the PWs to "symbols" in MS Word (Microsoft Corporation, 2010). Symbols were intended to be part of future research and were not analyzed in the present study. Examples of symbols are: Nατε, αυδ, Mελλε, Aπασε, Γαυτελ, Γρυττε.

#### **PROCEDURE AND TASK**

The task in this study was to read aloud French and German words, PWs and symbols displayed on a computer screen.

Participants were seated in an electrically shielded and sound attenuated booth 90 cm in front of a 21-inch LCD screen. Stimulus delivery and response recording were controlled using E-Prime 2.0 (Psychology Software Tools, Inc., Pittsburgh, PA, USA). Stimuli were presented in the center of the screen and displayed in black font color on white background. Each trial started with the presentation of a fixation cross of 400 ms duration, followed by a pseudo-randomly determined stimulus (66% word, 17% PW or 17% symbol, see next paragraph) displayed for 472 ms to allow comfortable reading (Courier New, pt. 24). A response window displaying a fixation cross with a random duration between 1200 and 1700 ms was presented after the stimuli (inter trial interval; **Figure 1**).

Since the aim of the study was to investigate the effect of orthographic transparency on reading identical stimuli, orthographic depth of PW reading was manipulated by creating two separated language context sessions (experimental phase; **Figure 1**). To strengthen French language context (deep orthography), the 120 PWs (and 120 symbols) were embedded among 480 French words. In the French language context session, participants were asked to pronounce the PWs as if they were existing French words. To strengthen German language context (shallow orthography), the same 120 PWs (and 120 symbols) were embedded among 480 German words. In the German language context session, participants were asked to pronounce the PWs as if they were existing German words. The same procedure applied for the symbols, except that here, participants should try to recognize the symbols as lexical letter strings and pronounce them as if they were existing words in the given language (e.g., the symbol "μιoτ ε" could be read as "miwote"). E-Prime voice key was used to record audio responses and production latencies.

At the beginning of each language context session, a short text written in the corresponding language was presented in order to activate the given language. Next, a 2 min training block with words (not included in the experimental phase) in the language of the selected context session was run to familiarize participants with the procedure and to verify the apparatus, before initiating the experimental phase.

**FIGURE 1 | Experimental paradigm.** Each trial started with the presentation of a fixation cross of 400 ms duration, followed by a pseudo-randomly determined stimulus (66% word, 17% pseudoword, or 17% symbols) displayed for 472 ms and terminated with a response window displaying a fixation cross with a random duration between 1200 and 1700 ms. Production latencies were recorded throughout each trial. Target stimuli of the study were pseudowords (PWs). To manipulate the orthographic depth of reading, the same PWs were embedded among two separated language context sessions: in the deep orthographic context, the words consisted of French and in the shallow orthographic context of German words. The order of language context sessions was randomized across participants.

To reduce fatigue, stimulus presentation in each language context session was divided into four blocks separated by 1–2 min breaks. Each block comprised 30 randomly selected PWs, 30 symbols, and 120 words and lasted around 6 min. The order of blocks was randomized across participants. The two language context sessions were separated by a pause of at least 10 min. The order of language context sessions was randomized across participants.
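The session structure described above (four blocks of 120 words, 30 PWs and 30 symbols, with fixed fixation and stimulus durations and a jittered response window) can be sketched as follows. This is an illustrative reconstruction and not the E-Prime script used in the study: the function name `build_session`, the placeholder stimulus labels, and the use of a plain shuffle in place of the unspecified pseudo-randomization constraints are all assumptions.

```python
import random


def build_session(words, pseudowords, symbols, n_blocks=4, seed=0):
    """Split the stimuli of one language context session into shuffled blocks."""
    rng = random.Random(seed)
    pools = {
        "word": list(words),
        "pseudoword": list(pseudowords),
        "symbol": list(symbols),
    }
    for pool in pools.values():
        rng.shuffle(pool)

    blocks = []
    for b in range(n_blocks):
        items = []
        for kind, pool in pools.items():
            size = len(pool) // n_blocks  # 120 words, 30 PWs, 30 symbols per block
            items += [(kind, stim) for stim in pool[b * size:(b + 1) * size]]
        rng.shuffle(items)  # assumption: simple shuffle within the block
        blocks.append([
            {
                "type": kind,
                "item": stim,
                "fixation_ms": 400,
                "stimulus_ms": 472,
                "response_window_ms": rng.randint(1200, 1700),
            }
            for kind, stim in items
        ])
    return blocks


# Placeholder labels standing in for the real stimulus lists.
session = build_session(
    words=[f"word{i}" for i in range(480)],
    pseudowords=[f"pw{i}" for i in range(120)],
    symbols=[f"sym{i}" for i in range(120)],
)
```

With these counts, words make up 120 of 180 trials per block (about 66%) and PWs and symbols 30 each (about 17%), matching the proportions stated for the trial sequence.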

#### **EEG ACQUISITION AND PREPROCESSING**

Continuous EEG was acquired at 1024 Hz through a 128-channel Biosemi ActiveTwo system (Biosemi, Amsterdam, Netherlands) referenced online to the CMS-DRL ground, which functions as a feedback loop driving the average potential across the montage as close as possible to the amplifier zero. Electrode impedances were kept below 20 kOhm. EEG data preprocessing and analyses were conducted offline using Cartool (Brunet et al., 2011). EEG epochs from 100 ms pre-stimulus to 500 ms post-stimulus onset (i.e., 102 data points before and 512 data points after stimulus onset) were averaged and ERPs were calculated for each participant and condition (PW in French context versus PW in German context). Symbols were excluded from analyses of the present study as they were intended to be the focus of future research. EEG epochs containing eye blinks or other noise transients were removed after visual inspection in addition to a ±80 μV artifact rejection criterion at any channel. Data were band-pass filtered (0.18–40 Hz), notch filtered at 50 Hz and recalculated against the average reference. By removing slow drifts at the single epoch level, the high-pass filter resulted in a baseline correction on the whole epoch. Before group averaging, data at artifact electrodes from each participant were interpolated using a 3-dimensional spline algorithm (mean 6.25% interpolated electrodes; Perrin et al., 1987). The average number (± SEM) of accepted epochs was 109 ± 2.90 for PWs in French context and 110 ± 2.11 for PWs in German context. These values did not differ statistically [*t*(13) = 0.377, *p* = 0.712], ruling out that our effects result from differences in signal-to-noise ratios across conditions.
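The epoch length in data points and the amplitude criterion stated above follow from simple arithmetic, sketched here. This is not Cartool code; the function names are hypothetical, the toy epochs are invented, and the sketch covers only the ±80 μV criterion, not the visual inspection that was also applied.

```python
SAMPLING_RATE_HZ = 1024


def ms_to_samples(duration_ms, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of data points."""
    return round(duration_ms * sampling_rate_hz / 1000)


def exceeds_threshold(epoch, threshold_uv=80.0):
    """Return True if any channel of the epoch exceeds +/- threshold_uv.

    epoch: list of channels, each a list of amplitudes in microvolts.
    """
    return any(abs(value) > threshold_uv for channel in epoch for value in channel)


pre_samples = ms_to_samples(100)   # 102.4 rounds to 102 data points
post_samples = ms_to_samples(500)  # exactly 512 data points

# Toy two-channel epochs (assumed values).
clean_epoch = [[10.0, -35.5, 12.0], [79.9, 0.0, -60.0]]
noisy_epoch = [[10.0, -80.1, 12.0], [20.0, 0.0, -60.0]]
rejected = [exceeds_threshold(e) for e in (clean_epoch, noisy_epoch)]
```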

#### **STATISTICAL ANALYSES**

#### *Behavioral analysis*

Response accuracy of PWs and words was assessed by auditory inspection of the audio files generated with E-Prime to determine whether different language contexts were created successfully. Expected pronunciations were a priori defined by a native German and a native French speaker. Five types of errors were defined: language intrusion (complete or partial German pronunciation in French context or complete or partial French pronunciation in German context), orthography (adding, exchanging or omitting letters), phonology (unusual phonological coding of correct orthographic form), intonation (wrong lexical stressing), and other errors leading to an incorrect response (e.g., abortion, correction, no response, pronunciation in a third language).

Examples of language intrusions demonstrated on the PWs "nate" (correct response German = ['na:tə]; correct response French = [nat]), "melle" (correct response German = ['mεlə]; correct response French = [mεl]) and "apase" (correct response German = [a'pa:zə]; correct response French = ['apa:z]) are: ['na:tə] or ['mεlə] (both complete) or ['apa:zə] (partial) in French context and [nat] or [mεl] (both complete) or [a'pa:z] (partial) in German context. Examples for orthographic errors demonstrated on the PW "grutte" (correct response German = ['grυtə]; correct response French = [gRyt]) are: ['gυtə] or ['gυrtə] in German and [gyt] or [gyrt] in French. Examples for phonological errors demonstrated on the PW "dand" (correct response German = [dant]; correct response French = [dã]) are: [tand] in German context and [dãd] in French context. Examples for intonation errors demonstrated on the PW "gantel" (correct response German = ['gantl]; correct response French = ['gãtεl]) are: [gan'tel] in German context and [gã'tεl] in French context. Phonetic notations are represented according to the International Phonetic Alphabet (IPA; International Phonetic Association).

To investigate whether response accuracy rates differentiate or interact across conditions, a 2 × 2 repeated-measures analysis of variance (ANOVA) with factors language context (French vs. German) and Stimulus Type (Words vs. PW) was performed. In addition, a paired *t*-test was performed contrasting PWs in French context versus PWs in German context.

To investigate whether error types in PW reading differentiate across language contexts, a one-way repeated-measures multivariate analysis of variance (MANOVA) was performed. Language context (French, German) was included into the analysis as independent and Language Intrusion Errors, Orthographic Errors and Phonological Errors in PW reading were included as dependent variables. Due to low incidences (see first paragraph of "Behavioral Results"), intonation errors and errors labeled as "other" were excluded from the analysis to increase statistical power. A series of one-way univariate analyses were performed as *post hoc* tests. To counteract alpha inflation due to multiple hypotheses testing in univariate analyses, Bonferroni correction was applied and significance threshold set at *p* < 0.02 (Dunn, 1961).
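The corrected threshold is consistent with dividing the conventional alpha of 0.05 by the three univariate tests, as the short sketch below shows. That the reported 0.02 is this quotient rounded to two decimals is an assumption, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Three post hoc tests: intrusion, orthographic and phonological errors.
threshold = bonferroni_threshold(0.05, 3)
rounded_threshold = round(threshold, 2)
```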

Production latencies [reaction times (RT)] were assessed with a speech analysis software (Praat; Boersma and Weenink, 2013) and compared across language context and Stimulus Type to determine whether they varied with the manipulated factors. Twelve participants were included in the behavioral analyses and two had to be excluded due to invalid recordings. Trials containing RTs exceeding ±2 standard deviations (SD) from the mean were considered as outliers/errors and excluded from analysis, which resulted in the removal of a total of 3% of trials from French context condition (mean number of excluded words = 15; mean number of excluded PWs = 4) and 3% of trials from German context condition (mean number of excluded words = 14; mean number of excluded PWs = 3).
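The ±2 SD exclusion rule can be sketched as below. The text does not state whether the mean and SD were computed per participant, per condition, or over all trials, nor whether the sample or population SD was used; this sketch assumes one list of RTs per participant and condition and the sample SD. The RT values are invented.

```python
import statistics


def exclude_rt_outliers(rts_ms, n_sd=2.0):
    """Split RTs into kept and excluded trials using a mean +/- n_sd * SD window."""
    mean = statistics.mean(rts_ms)
    sd = statistics.stdev(rts_ms)  # sample SD (assumption)
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    kept = [rt for rt in rts_ms if lower <= rt <= upper]
    excluded = [rt for rt in rts_ms if not lower <= rt <= upper]
    return kept, excluded


# Toy RTs in milliseconds with one very slow response.
toy_rts = [700, 710, 720, 730, 740, 705, 715, 725, 735, 1500]
kept, excluded = exclude_rt_outliers(toy_rts)
```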

To investigate whether production latencies differentiate or interact across conditions, a 2 × 2 repeated-measures ANOVA with factors language context (French vs. German) and Stimulus Type (Words vs. PW) was performed. In addition, a paired *t*-test was performed contrasting PWs in French context versus PWs in German context.

Unless otherwise stated, significance threshold was set at *p* < 0.05. All data analyses were performed using IBM SPSS Statistics 19 (2012).

#### **ELECTRICAL NEUROIMAGING ANALYSIS**

#### *ERP waveform analyses*

Waveform analyses were performed to determine time periods where ERP amplitude differences occurred between the conditions PWs in French context versus PWs in German context.

Time-frame wise paired *t*-tests were computed between the evoked potentials to the PW read in the French vs. in the German context for each electrode. Only differences lasting at least 11 time frames were retained with an alpha criterion of 0.05.
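The temporal criterion can be sketched for a single electrode as follows: given one *p*-value per time frame, only runs of at least 11 consecutive significant frames are retained (11 frames correspond to roughly 10.7 ms at 1024 Hz). The *p*-values below are invented, the function name is hypothetical, and the computation of the paired *t*-tests themselves is omitted.

```python
def sustained_significant_runs(p_values, alpha=0.05, min_frames=11):
    """Return (first, last) frame indices of runs with p < alpha lasting >= min_frames."""
    runs = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i  # a new significant run begins
        else:
            if start is not None and i - start >= min_frames:
                runs.append((start, i - 1))
            start = None
    # Handle a run that extends to the last time frame.
    if start is not None and len(p_values) - start >= min_frames:
        runs.append((start, len(p_values) - 1))
    return runs


# Toy p-values: a 4-frame run (discarded) and a 12-frame run (retained).
toy_p = [0.5] * 5 + [0.01] * 4 + [0.5] * 3 + [0.01] * 12 + [0.5] * 2
runs = sustained_significant_runs(toy_p)
```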

#### *Topographic patterns analyses*

A topographic pattern analysis was applied to the ERPs to determine whether and when distinct configurations of brain networks were engaged in response to the PWs when read in the French vs. German context. This approach is based on evidence that the ERP map topography does not vary randomly across time, but remains quasi-stable over 20–100 ms functional microstates before rapidly switching to another period of stable topography (Lehmann and Skrandies, 1980; Michel et al., 2004; Murray et al., 2008; Britz and Michel, 2011). Spatio-temporal segmentation summarizes ERP data into a limited number of topographical map configurations and identifies time periods during which different conditions evoke different configurations of the electric field at the scalp. Because a change in the topography of the scalp-recorded electric field necessarily follows from a change in the configuration of the underlying brain's active generators, topographic modulations can be directly interpreted as the engagement of distinct brain networks (e.g., Lehmann and Skrandies, 1980).

The most dominant topographic maps appearing in the visual evoked potentials (VEPs) of the group-averaged ERPs from each condition over time were identified with a modified hierarchical cluster analysis, the topographical atomize and agglomerative hierarchical clustering (T-AAHC; Murray et al., 2008). The optimal number of clusters to describe the data set was identified using a modified Krzanowski–Lai criterion (Tibshirani et al., 2001). Then, differences in the pattern of maps observed between conditions in the group-averaged data were statistically tested by comparing the spatial correlation between these template maps from the group-averaged data and each time point of single-subject data from both experimental conditions. For this procedure, referred to as "fitting," each time point of each ERP from each subject was labeled according to the map with which it best correlated spatially (see Brandeis et al., 1995; Murray et al., 2008). The output of fitting is a measure of relative map presence in milliseconds, which indicates the amount of time over a given interval that each map, which was identified in the group-averaged data, best accounted for the response from a given individual subject and condition. Repeated-measures ANOVA was applied with the factors Condition (PWs in French, PWs in German) and Maps to analyze whether map presence depended on condition.
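The fitting step can be sketched as follows: each time point of a single-subject ERP is assigned to the template map with which it correlates best spatially, and map presence is the summed duration of the assigned time points. This is a simplified illustration and not the Cartool implementation, which may include further steps such as temporal smoothing. The four-electrode maps are invented, and the template labels "F" and "G" are used only as placeholder names.

```python
import math


def spatial_correlation(map_a, map_b):
    """Pearson correlation across electrodes between two scalp maps."""
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    centered_a = [a - mean_a for a in map_a]
    centered_b = [b - mean_b for b in map_b]
    numerator = sum(x * y for x, y in zip(centered_a, centered_b))
    denominator = math.sqrt(
        sum(x * x for x in centered_a) * sum(y * y for y in centered_b)
    )
    return numerator / denominator if denominator else 0.0


def fit_templates(erp, templates, sampling_rate_hz=1024):
    """Label each time point with its best-correlating template map.

    erp: list of scalp maps, one per time point.
    templates: dict mapping a template name to its scalp map.
    Returns the labels and the presence of each template in milliseconds.
    """
    ms_per_sample = 1000 / sampling_rate_hz
    labels = []
    presence_ms = {name: 0.0 for name in templates}
    for scalp_map in erp:
        best = max(
            templates,
            key=lambda name: spatial_correlation(scalp_map, templates[name]),
        )
        labels.append(best)
        presence_ms[best] += ms_per_sample
    return labels, presence_ms


# Toy data: two templates and three time points over four electrodes.
toy_templates = {"F": [1, 1, -1, -1], "G": [1, -1, 1, -1]}
toy_erp = [[2, 1.5, -2, -1.5], [3, 2, -2, -3], [1, -2, 2, -1]]
labels, presence_ms = fit_templates(toy_erp, toy_templates)
```

Because the correlation normalizes each map, this labeling depends on the shape of the topography and not on its overall amplitude.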

The present multivariate topographic analyses have the advantage of being reference-independent (Michel et al., 2001, 2004) and insensitive to pure amplitude modulations across conditions, as topographies of normalized maps are compared. Therefore, unlike classical analyses of single-electrode average evoked potentials, this approach is not biased by a priori hypotheses about electrode location(s) or periods of interest (POIs) at which effects might be expected (Tzovara et al., 2012).

#### *Electrical source estimations*

Electrical source estimations were calculated using a distributed linear inverse solution and the local autoregressive average (LAURA) regularization approach<sup>1</sup> (Grave de Peralta et al., 2001, 2004). The results of the above topographic pattern analysis defined the time period over which intracranial sources were estimated and statistically processed. ERPs for each participant and condition (PW in French context versus PW in German context) were first time-averaged over the period showing a significant topographic modulation. Then, intracranial sources were estimated for the resulting one time-sample ERP for each participant and condition and statistically compared at each solution point between the PWs in French context condition versus PWs in German context condition using paired *t*-tests. The solution space included 3005 nodes, selected from a 6 mm × 6 mm × 6 mm grid equally distributed within the gray matter of the averaged brain of the Montreal Neurological Institute (MNI; courtesy of Grave de Peralta Menendez and Gonzalez Andino, University Hospital of Geneva, Geneva, Switzerland). In order to control for multiple comparisons, only solutions with a minimal cluster size of 15 consecutive points (*k*E) were retained (see also De Lucia et al., 2010; Knebel and Murray, 2012). Significance threshold was set at *p* < 0.05.

# **RESULTS**

#### **BEHAVIORAL RESULTS**

Mean accuracy (SD) over the whole group of 12 subjects was 98% (7%) for words in French context, 98.5% (8%) for words in German context, 93% (4%) for PWs in French context and 93% (3%) for PWs in German context. For words in French context, 0% intrusion, 0.23% (2.43%) orthographic, 1.06% (6.47%) phonological, 0.38% (2.34%) intonation and 0.16% (2.49%) other errors were observed. For PWs in French context, 3.8% (4.09%) intrusion, 1.88% (2.13%) orthographic, 1.03% (1.23%) phonological, 0.48% (0.76%) intonation and 0% other errors were observed. For words in German context, 0% intrusion, 0.38% (3.24%) orthographic, 0.35% (3.30%) phonological, 0.59% (5.16%) intonation and 0.16% (2.49%) other errors were observed. For PWs in German context, 3.4% (3.15%) intrusion, 2.8% (1.75%) orthographic, 0.13% (0.37%) phonological, 0.3% (0.47%) intonation, and 0.13% (0.37%) other errors were observed.

Repeated-measures ANOVA with factors language context (French vs. German) and Stimulus Type (Words vs. PW) was performed to investigate whether accuracy rates differentiate or interact across conditions. This analysis revealed no main effect of language context [*F*(1,11) = 0.85, *p* = 0.378, η<sup>2</sup><sub>p</sub> = 0.071], a main effect of Stimulus Type [*F*(1,11) = 33.09, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.751] and no interaction between language context and Stimulus Type [*F*(1,11) = 0.12, *p* = 0.741, η<sup>2</sup><sub>p</sub> = 0.010]. Paired *t*-test showed no difference in accuracy rates between PW in French context and PW in German context [*t*(11) = 0.61, *p* = 0.553, η<sup>2</sup><sub>p</sub> = 0.33].
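For readers who wish to relate the reported effect sizes to the *F* statistics, partial eta squared can be recovered from *F* and its degrees of freedom with the standard identity η<sup>2</sup><sub>p</sub> = *F*·df<sub>effect</sub> / (*F*·df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies it to the three ANOVA effects above. The function name is hypothetical, and small deviations from the reported values are expected because the *F* values are rounded to two decimals.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared) for the accuracy ANOVA, all with df = (1, 11).
reported = {
    "language context": (0.85, 0.071),
    "stimulus type": (33.09, 0.751),
    "interaction": (0.12, 0.010),
}
recomputed = {
    name: partial_eta_squared(f_value, 1, 11)
    for name, (f_value, _) in reported.items()
}
```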

Repeated-measures MANOVA with language context (French, German) as independent and Language Intrusion Errors, Orthographic Errors and Phonological Errors as dependent variables was performed to investigate whether error types in PW reading differentiate across language contexts. This analysis revealed a significant multivariate effect for language context [*F*(3,11) = 6.28, *p* = 0.010, Wilks' Λ = 0.369, η<sup>2</sup><sub>p</sub> = 0.631]. *Post hoc* univariate tests showed that the dependent variable "Phonological Errors" significantly differentiated across French and German language contexts [French > German; *F*(1,13) = 13.30, *p* = 0.003, η<sup>2</sup><sub>p</sub> = 0.506]. No statistical difference was found across language contexts for the dependent variables "Language Intrusion Errors" [*F*(1,13) = 0.10, *p* = 0.759, η<sup>2</sup><sub>p</sub> = 0.008] and "Orthographic Errors" [*F*(1,13) = 3.72, *p* = 0.076, η<sup>2</sup><sub>p</sub> = 0.223].

Mean RTs (SD) over the whole group of 12 subjects were 720 ms (168 ms) for words in French context, 706 ms (165 ms) for words in German context, 730 ms (172 ms) for PWs in French context and 718 ms (164 ms) for PWs in German context.

Repeated-measures ANOVA with factors language context (French vs. German) and Stimulus Type (Words vs. PW) was performed to investigate whether production latencies differentiate or interact across conditions. This analysis revealed no main effect of language context [*F*(1,11) = 2.59, *p* = 0.136, η<sup>2</sup><sub>p</sub> = 0.191], no main effect of stimulus type [*F*(1,11) = 0.02, *p* = 0.883, η<sup>2</sup><sub>p</sub> = 0.002] and no interaction between language context and stimulus type [*F*(1,11) = 0.96, *p* = 0.349, η<sup>2</sup><sub>p</sub> = 0.080]. Paired *t*-test showed no difference between PW in French context and PW in German context [*t*(11) = 1.58, *p* = 0.143, η<sup>2</sup><sub>p</sub> = 0.03].

<sup>1</sup>LAURA inverse solution is a weighted minimum norm method together with the LAURA regularization approach. The LAURA method calculates a current density value at each solution point. The local auto-regressive average regularization approach describes the spatial gradient across neighboring solution points (Grave de Peralta et al., 2004; Michel et al., 2004). Specifically, the strength of the source regresses with distance according to electromagnetic laws (i.e., the square root of the distance).

# **ELECTRICAL NEUROIMAGING RESULTS**

# *ERP waveforms*

Evoked potential waveforms to the PWs presented in the two language contexts are depicted in **Figure 2** for seven exemplar electrodes and in **Figure 3A** for all 128 electrodes.

Paired *t*-tests between the ERP to the PWs in the French context versus in the German context revealed an increase in the number of electrodes showing a statistically significant difference over the time interval of 220–360 ms post-stimulus onset [*p* < 0.05, >1 ms, **Figure 3B**].

# *Topographic pattern analysis*

**FIGURE 2 | Exemplar ERP waveforms.** Exemplar group-averaged ERP waveforms (Fz, CPz, CP5, CP6, P9, P10, Oz) to PW reading in the French (violet) and German (green) language contexts are plotted in microvolts as a function of time. In the middle of the figure, the array of the 128 electrodes is shown with the positions of the displayed waveforms.

Agglomerative hierarchical clustering was applied to the ERPs to identify the pattern of predominating topographic maps of the electric field at the scalp in the cumulative group-averaged data. The output of the topographic pattern analysis is displayed in **Figure 3C**. The global explained variance of the T-AAHC analysis was 97%. The topographic pattern analysis identified the same sequence of stable topographic maps for group-averaged ERPs from the French context and German context conditions, except for the 300–360 ms post-stimulus onset time period. Over this period, different maps were observed for the PWs in the French context versus German context conditions. The reliability of this observation at the group-average level was assessed at the single-subject level using a spatial correlation fitting procedure (see "Material and Methods"). The individual-subject fitting revealed a significant interaction between language context condition and map over the 300–360 ms period [*F*(1,13) = 5.91, *p* = 0.03]. Map "F" more frequently characterized the response to the PWs in the French context and map "G" the response in the German context condition, indicating the engagement of distinct configurations of intracranial generators in PW reading across language contexts in this time window.
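The general idea behind a spatial correlation fitting step can be illustrated with a small sketch: each time frame of a single-subject ERP is labeled with the template map it correlates with best across electrodes, and the number of frames per map is then counted for each condition. This is a generic illustration under stated assumptions, not the procedure implemented in the authors' software; the function names, the four-electrode toy maps and the map values are hypothetical.

```python
from math import sqrt


def spatial_correlation(map_a, map_b):
    """Pearson correlation across electrodes between two scalp maps."""
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    norm_a = sqrt(sum((a - mean_a) ** 2 for a in map_a))
    norm_b = sqrt(sum((b - mean_b) ** 2 for b in map_b))
    return cov / (norm_a * norm_b)


def fit_templates(erp_frames, templates):
    """Count, for each template map, the time frames it correlates with best."""
    counts = {name: 0 for name in templates}
    for frame in erp_frames:
        best = max(templates, key=lambda name: spatial_correlation(frame, templates[name]))
        counts[best] += 1
    return counts


# Hypothetical toy data: two template maps and three time frames over 4 electrodes
templates = {"F": [1.0, 0.5, -0.5, -1.0], "G": [-0.5, 1.0, -1.0, 0.5]}
frames = [
    [0.9, 0.6, -0.4, -1.1],
    [1.2, 0.3, -0.6, -0.9],
    [-0.4, 1.1, -0.9, 0.2],
]
print(fit_templates(frames, templates))
```

In an analysis of this kind, the per-subject frame counts for each map and condition would then enter a repeated-measures test of the condition × map interaction.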

#### *Electrical source estimations*

To localize the effect in brain space, paired *t*-tests of LAURA distributed source estimations between the PWs in the French context and the PWs in the German context condition were performed for each of the 3005 solution points for time-averaged ERPs over the POI defined by the topographic pattern analysis (300–360 ms post-stimulus). This analysis revealed a significant difference in activation within the left inferior frontal gyrus (German > French; *p* < 0.05; *k*<sup>E</sup> = 15), left superior parietal areas (French > German; *p* < 0.05; *k*<sup>E</sup> = 15) and left anterior cingulum (German > French; *p* < 0.05; *k*<sup>E</sup> = 15; **Figure 3D**).

# **DISCUSSION**

We investigated the spatio-temporal impact of orthographic depth on reading. Identical PWs were presented to highly proficient bilinguals, embedded either in a deep orthographic (French) or in a shallow orthographic (German) language context. The lexical context in which the stimuli were presented (80% words and 20% PWs) was designed to force initial automatic word reading in the pre-activated context and to force PW reading in the corresponding orthographic depth. Our results show that orthographic depth induced by language context indeed impacts the brain response to reading physically identical stimuli. The topography of the ERPs to identical PWs differed 300–360 ms post-stimulus onset when the PWs were read in different orthographic depth contexts, indicating that distinct brain networks were engaged in reading during this time window. Analysis of electrical source estimation over the period of topographic modulation showed a differential engagement of left inferior-frontal, left superior parietal and left anterior cingular areas in the deep versus shallow condition.

# **TIMING OF THE EFFECT OF ORTHOGRAPHIC DEPTH**

The topography of the ERPs to identical PWs differed around 330 ms post-stimulus onset when the PWs were read in different orthographic depth contexts. Because distinct topographies necessarily follow from distinct configurations of the underlying brain network (e.g., Lehmann and Skrandies, 1980), our result indicates that the language context modulates the brain networks involved in reading. Because subject-related factors (proficiency, age of acquisition, immersion) were controlled across language contexts and only the orthographic depth of reading physically identical PWs was modulated, the topographic differences most likely reflect an adaptation of the reading processes to the orthographic depth of the language being read.

significant (*p* < 0.05) topographic differences between the conditions is indicated in green. **(B)** Time-wise electrode-wise *t*-tests. Results of the time-wise paired *t*-tests at each of the 128 scalp electrodes from the group-averaged ERP waveforms are shown (*p* < 0.05). **(C)** Topographic pattern analysis. *Top*: Topographic pattern analyses identified 12 time periods of stable electric field topography across the collective 500 ms post-stimulus period from the group-averaged ERPs. Topographies (i.e., maps) are shown with the nasion upward and left scalp leftward. The dipole represents the positive and negative maximum of the electric field topography measured at the scalp. Two distinct maps were identified for one of these time periods (300–360 ms) for PWs in the French context (map "F") versus German context (map "G") conditions. *Bottom:* The reliability of this observation at the group-averaged level was assessed at the single-subject level for each condition (see "Material and Methods"). Over the 300–360 ms post-stimulus period, map "F" more frequently characterized the response to the PWs in the French context and map "G" the response in the German context condition. There was a significant interaction between language context condition and map presence over the 300–360 ms period [*F*(1,13) = 5.91, *p* = 0.03]. Error bars indicate SEM. **(D)** Distributed LAURA source estimations. Paired *t*-tests were performed for each of the 3005 solution points for time-averaged ERPs over the period of topographic modulation (300–360 ms after stimulus onset), revealing differential (*p* < 0.05) activation of the left inferior frontal gyrus (German > French), left superior parietal gyrus (French > German) and left anterior cingular cortex (German > French) when reading the PWs in the French versus the German language context.

The 330 ms latency of the topographic modulation has been associated with processing stages involved in grapheme to phoneme conversion in previous studies (Bentin et al., 1999; Huang et al., 2004; Proverbio et al., 2004; Simon et al., 2004, 2006; Grainger et al., 2006; Hauk et al., 2006; Ashby et al., 2009; Carreiras et al., 2009). This period precedes the semantic processing previously found to take place around 450 ms (Bentin et al., 1999; Simon et al., 2006) and follows letter identification, which occurs around 200 ms (Maurer et al., 2005; Brem et al., 2006; Martin et al., 2006; Appelbaum et al., 2009; Lin et al., 2011).

The dual route cascade model posits a lexical and a non-lexical route along which graphemes are mapped to phonemes (Coltheart et al., 2001). On the non-lexical route, each grapheme is sequentially mapped to its corresponding phoneme. The non-lexical route does not rely on lexico-semantic representations and is thus preferentially recruited for regular words, non-words and pseudowords (e.g., Jobard et al., 2003). In contrast, on the lexical route, phonemes are retrieved from memory, i.e., from orthographic and phonological lexical representations. The lexical route is efficient for encoding words, especially irregular words, in which phonological codes do not follow simple grapheme-phoneme rules (e.g., Jobard et al., 2003). Thus, whereas PWs predominantly follow the non-lexical route, words may follow either of these routes. The orthographic depth hypothesis (Katz and Feldman, 1983; Katz and Frost, 1992) assumes that for words the predominant engagement of each route depends on the orthographic regularity of a language. In transparent orthographies, non-lexical pathways are preferentially activated to map graphemes to phonemes. In contrast, the sequential mapping on the non-lexical pathways does not fit grapheme to phoneme mapping in languages with irregular orthographies. Instead, irregular languages favor lexical pathways, and phonemes are retrieved from memory structures. With regard to this framework, we propose that the topographic effects reflect a modulation of the engagement of the routine non-lexical route in PW reading across language contexts. Furthermore, we assume that the routine non-lexical route in PW reading was likely modulated by the variable manipulated in the present design, namely the orthographic depth of the language contexts (i.e., word reading).
We suggest that, when reading words across French and German language contexts, the modulation of orthographic depth in French versus German words may lead to different engagement of one or the other reading route. This modulation of reading routes across language contexts in word reading due to differences in orthographic depth might impact the routine non-lexical pathways recruited in PW processing. Thus, the topographic modulation found in PW reading might be explained by the fact that reading (German) words in a shallow context activates predominantly non-lexical pathways, which reinforce the non-lexical processing routinely recruited in PW reading in the shallow versus deep context. In contrast, reading (French) words in the deep context may activate predominantly lexical pathways, which reduce the engagement of the non-lexical pathways routinely recruited in PW reading in the deep versus the shallow context. Thus, the topographic modulation at 330 ms might reflect a modulation of non-lexical processing in PW reading due to a modulation of reading routes by the orthographic depth of language contexts (**Figure 4**).

Alternatively, one might consider that the modulation of reading route selection by orthographic depth was not restricted to word reading, but directly impacted PW processing. Thus, PW reading might have recruited non-lexical pathways in the shallow and lexical pathways in the deep context. However, the use of low-lexical target stimuli (PWs) and the absence of a differential engagement of lexico-semantic networks across conditions in the results of electrical source estimation over the period of topographic modulation (see "Location of the effect of orthographic depth") speak against a direct and in favor of an indirect modulation of reading route by orthographic depth. Nevertheless, further research is needed to unravel the precise mechanism underlying the orthographic-related reading route modulation.

Given that the 300–350 ms time window has also been associated with (early) semantic processing in reading (Proverbio et al., 2008; Yum et al., 2011), an alternative account of our results would be that lexical pathways, when reading in the deep orthographic context, induced stronger attempts at semantic processing of PWs (Holcomb et al., 2002; Deacon et al., 2004; Carreiras et al., 2007; Vergara-Martinez et al., 2013) compared to reading in the shallow orthographic context. This assumption is supported by the relatively late latency of the effect found in the present study, which may reflect a modulation of semantic access rather than of grapheme to phoneme conversion across language contexts. Indeed, studies on phonological processing have suggested latencies around 200 ms to be critically engaged in grapheme to phoneme conversion (Sereno et al., 1998; Proverbio and Zani, 2003; Wheat et al., 2010). In contrast, the few studies focusing on a modulation of orthographic depth consistently reported latencies around 300 ms to be linked to grapheme to phoneme conversion (Simon et al., 2006; Bar-Kochva and Breznitz, 2012). Thus, because the present design isolates orthographic depth, later latencies could have been expected. In addition, the high number of significant electrodes differentiating across language contexts found in the time-wise electrode-wise *t*-test may indicate that the impact of orthographic depth on PW reading was subliminally initiated earlier (around 220 ms; **Figure 3B**). However, independent of the timing of the effect, several reasons speak against a modulation of semantic processing by orthographic depth. First, semantic effects were likely reduced by the use of PWs (Friedrich et al., 2008). Second, the task did not strengthen semantic processing, as participants only had to read the stimuli aloud.
Third, neighborhood size, indicating how many (real) words can be created from a PW by changing one letter without changing letter position, was low and balanced across language contexts. Fourth, a potential semantic meaning due to a resemblance of a PW to a real word should lead to prolonged but not shortened semantic processing and is thus unlikely to be measured at an early latency (<400 ms) but rather at a late latency compared to words (>450 ms; Coch and Mitra, 2010). Finally, our results of electrical source estimation underlying the topographic effects confirm the absence of an engagement of semantic networks, which have anatomically been linked to inferior and middle temporal and inferior parietal areas (Price, 2000).

Our results thus suggest that distinct brain networks support PW reading 300–360 ms post-stimulus onset when they were read in different orthographic depth context. We propose that these distinct brain networks reflect a modulation of the nonlexical grapheme to phoneme conversion routinely engaged in PW reading by the activation of different reading routes in word reading across language contexts. More precisely, reading (German) words in a shallow context may preferentially activate non-lexical pathways, which strengthen the engagement of the non-lexical pathways routinely recruited in PW reading in the shallow versus the deep context. In contrast, reading (French) words in a deep orthographic context may preferentially recruit lexical pathways, which reduce the reliance on routinely recruited non-lexical pathways in PW reading in the deep versus the shallow context. Thus, the topographic modulation in PW reading might indirectly reflect the engagement of different reading routes across the orthographic depth of language contexts.

#### **LOCATION OF THE EFFECT OF ORTHOGRAPHIC DEPTH**

Statistical analyses of electrical source estimations over the period of topographic modulation support the hypothesis of orthographic-related reading route modulation by showing differential engagement in the deep versus shallow conditions in left inferior-frontal, left superior parietal and left anterior cingular areas.

The left inferior frontal region (part of Broca's area complex) was more strongly activated when reading in the German than in the French context. The inferior frontal activation might indicate enhanced engagement of phonological processing when reading PWs in the shallow orthography in contrast to reading in the deep orthography. Previous findings showed that inferior frontal regions are involved in grapheme to phoneme conversion (Fiebach et al., 2002; Heim et al., 2005; Rodriguez-Fornells et al., 2006; Wheat et al., 2010) and in enhanced short term memory capacities of non-lexical pathways compared to lexical pathways (Jobard et al., 2003; Nixon et al., 2004).

However, our findings contrast with the results of Paulesu et al. (2000), who reported an enhanced engagement of Broca's area in the deep (English) versus the shallow (Italian) language, suggesting a stronger involvement of inferior frontal regions in lexical than in non-lexical processing. The contradictory findings on the engagement of inferior frontal regions in reading route processing may originate from the functional distinction of Broca's subunits. While the anterior part [Brodmann area (BA) 45] has been associated with lexical processing, the posterior part (BA 44) has been proposed to play a crucial role in grapheme to phoneme conversion (Fiebach et al., 2002; Heim et al., 2005). Here, the low spatial resolution of inverse solutions restricts an attribution of the source estimation to the anterior or posterior region within the inferior frontal area. However, we consider our results to most likely reflect enhanced engagement of phonological processing in the posterior inferior frontal lobe (BA 44) when reading PWs in the shallow orthography in contrast to reading in the deep orthography. The stronger engagement of phonological processes when reading in the German than in the French context may indicate that the bilingual reader relies more strongly on the non-lexical route because, unlike in the French context, the non-lexical route is strengthened by both the type of stimuli (PW) and the orthographic depth of the language context (shallow). Thus, the stronger activation of phonological inferior frontal regions when reading in the shallow than in the deep orthographic context may indicate enhanced engagement of phonological non-lexical pathways.

Additionally, inferior frontal regions have been associated with the motor control of speech articulators (Wheat et al., 2010). One alternative explanation of our findings may thus be that language context modulated motor planning. However, previous literature indicates that motor preparation, i.e., phonetic encoding, in reading starts later, namely after (approximately) 350 ms (Moller et al., 2007; Laganaro et al., 2013). Thus, the differential engagement of inferior frontal regions across orthographic depth seems more likely to reflect a modulation of phonological processing than of motor planning.

A more pronounced engagement of non-lexical networks could have been expected, as numerous reading studies have demonstrated broad networks to critically underlie grapho-phonological processing covering temporal, parietal and frontal brain regions (Jobard et al., 2003; Ischebeck et al., 2004). The absence of an anatomically broadly distributed difference in activation in nonlexical networks across language contexts might be related to the fact that the present paradigm contrasted PW versus PW reading. In contrast, most studies investigating anatomical correlates of lexical and non-lexical reading routes compared words and PWs (Jobard et al., 2003). In word versus PW contrasts, the extensive differences in network activation found may be related to differences between the stimuli (lexicality, familiarity and/or physical form). In contrast, the present PW versus PW design may reveal networks related specifically to the variable manipulated, i.e., the orthographic depth of PW reading. Consequently, spatially restricted networks might be expected to show differential activity compared to those found in classical studies on reading routes contrasting word versus PW.

The differential engagement of parietal–cingular areas may follow from a modulation of attentional demands across language context.

The activity within superior parietal areas (BA 7) was stronger when reading in the French than in the German context. In reading, superior parietal areas have been proposed to contribute to visual attention, which could be involved in modifying the reading strategy (Rosazza et al., 2009; Lobier et al., 2012). This interpretation is in line with our hypothesis assuming a modulation of the non-lexical route in PW reading by the stronger recruitment of lexical pathways in the deep than in the shallow language context. According to this hypothesis, the stronger engagement of parietal areas might reflect enhanced visual attention related to the recruitment of less routine non-lexical pathways strengthened by PW reading in the deep versus shallow orthographic context. This conclusion is supported by the results of the reading error analysis, which revealed significantly more phonological errors when reading the PWs in the deep (French) versus shallow (German) language context (with comparable overall accuracy rates). Thus, the recruitment of less routine non-lexical pathways may have increased phonological inaccuracy when reading the PWs in the deep versus shallow context.

However, alternative explanations could account for the effect found, as parietal areas have been put forward to be involved in a variety of cognitive tasks, including eye movements (especially IPS; Chen et al., 2013; Zaretskaya et al., 2013), spatial orientation (Cabeza and Nyberg, 2000) and multimodal integration (Macaluso et al., 2003). Even though many variables were controlled in the present design (e.g., short stimuli to avoid eye movements) or not required to perform the task (e.g., multimodal integration), the lack of an a priori hypothesis formulated on the parietal engagement and the complexity of functions attributed to this region limit the credibility of our conclusion.

The anterior cingular activity (parts of BA 24, BA 32, and BA 33) was stronger for reading in the German than French context. The cingulate cortex has been linked to inhibitory control and error detection (Garavan et al., 2002; Ridderinkhof et al., 2004), but to our knowledge, its role in reading is currently unknown. In addition to the lack of literature, the lack of an a priori hypothesis formulated on the engagement of cingular regions in the present study prevents us from drawing reliable conclusions. Further research is required to elucidate its role in reading processing.

Cingular activity has been found in proficiency-related control processes (Abutalebi, 2008; Magezi et al., 2012). In the current study, the modulations of activity in the anterior cingulum could be due to a higher proficiency in French than in German. However, the experimental setting consisted of separated language contexts, which prevented ongoing language selection and in turn minimized proficiency-related effects (Abutalebi, 2008). Behavioral results further showed that production latencies did not differ across language contexts, indicating balanced attentional load across languages. Finally, the scores of the computer-based as well as the self-evaluated proficiency assessments did not differ across languages. Together, the differential engagement of cingular areas is unlikely to reflect proficiency-related control processes. Instead, we consider these effects to be directly related to the modulation of reading routes by orthographic depth.

The temporal dynamics of the responses to PWs in a German versus French context also support our hypothesis of a modulation of reading route by orthographic depth. Given the timing of the effect between early (letter identification) and later (semantic) processing and our design, which isolates the effect of orthographic depth, the topographic differences are likely to reflect different networks engaged in grapheme to phoneme conversion across language contexts 330 ms post-stimulus onset. We propose that the engagement of the non-lexical reading route routinely involved in PW reading is modulated by the activation of distinct reading routes in word reading across language contexts. Reading (German) words in a shallow context may activate non-lexical processing, which reinforces the involvement of the non-lexical pathways routinely recruited in PW reading, reflected in a stronger engagement of frontal phonological areas in the shallow versus the deep orthographic context. In contrast, reading (French) words in a deep orthographic context may weaken the non-lexical pathways routinely recruited in PW reading. The recruitment of less routine non-lexical pathways in PW reading might be reflected in a stronger engagement of visuo-attentional parietal areas in the deep versus shallow orthographic context.

Since many (real) words were used in the present paradigm to create the language context, the additional inclusion of words in the analyses might have helped to disentangle the nature of the effects found. However, we think that a joint analysis of words and PWs would be unlikely to help clarify the interpretation of the present results, for the following reason: to compare words and PWs (e.g., in terms of pathways engaged), an interaction would be needed between the factors stimulus type (words, PWs) and orthographic depth (shallow, deep; Nieuwenhuis et al., 2011). However, when designing the experiment, the words were not selected to be included in the analysis, but to induce a strong language context and to ensure identical performances of natural bilingual reading across languages. As a result, there are important differences in physical properties, semantic categories, letter frequency distribution or familiarity of the word stimuli across languages. These differences cannot be controlled a posteriori [or for some of them even a priori (familiarity)]. The effect of the confounding factors in the word stimuli would impact the results of a 2 × 2 design and could not be disentangled. Thus, the result of a 2 × 2 analysis of our data could not be interpreted.

Several limitations of the current study constrain the interpretability of our results. First, investigating the language system in bilinguals might be less straightforward than investigating monolinguals, due to the two languages cohabiting in the brain. This complexity is mainly linked to switching between languages and inhibition of one language (Golestani, 2014). The results of our study, which investigated low-level pre-semantic processing, are thus unlikely to be affected by higher control strategies related to language switching or selection/inhibition. Moreover, the task was performed in separated language context sessions, further minimizing cognitive control strategies (Abutalebi, 2008). Finally, our bilingual within-subject design minimizes socio-cultural effects and other confounds that between-group comparisons introduce into differences in brain activity, and it increases the statistical sensitivity of our analyses, as each subject serves as their own control.

Second, language skills are probably never perfectly matched across languages, even if statistically comparable in our group. Marginal effects may be argued with regard to the language used to perform mental arithmetic, the first language spoken by the mother and the language preferred for watching TV. However, we consider these differences unlikely to have impacted reading performance for the following reasons. The variables showing differences across languages are related to oral language production, whereas variables directly linked to the task, i.e., written language skills (reading books, school, computer-based reading evaluation), showed no differences across languages. In addition, behavioral results (equal RTs/accuracy across languages) speak in favor of balanced proficiency, and potential effects of proficiency are minimized by the two separated language context sessions (Abutalebi, 2008). Finally, 22 *t*-tests were performed to compare bilingualism variables across languages. Multiple hypothesis testing enhances the risk of false positive findings (Miller, 1966).

Consequently, a correction for multiple comparisons should be applied to counteract false positive findings. In **Table 1**, significance thresholds are depicted as uncorrected, because we wanted to be as conservative as possible. When applying a correction for multiple comparisons [Bonferroni (Dunn, 1961) or Holm-Bonferroni (Holm, 1979)], none of the variables tested reaches significance level.
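To make the two corrections concrete, the following minimal sketch applies the Bonferroni and Holm-Bonferroni procedures to a family of 22 tests. The *p*-values are hypothetical placeholders chosen for illustration, not the values from **Table 1**, and the function names are assumptions rather than the authors' code.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject the null hypothesis for each p-value at or below alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure; rejections are returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k + 1)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are retained as well
    return reject


# Hypothetical family of 22 tests, three of them below the uncorrected 0.05 threshold
p_values = [0.012, 0.030, 0.044] + [0.20 + 0.03 * i for i in range(19)]

print(sum(p < 0.05 for p in p_values))   # nominally significant tests
print(any(bonferroni(p_values)))         # any survivor after Bonferroni?
print(any(holm_bonferroni(p_values)))    # any survivor after Holm-Bonferroni?
```

With 22 tests, the Bonferroni threshold is 0.05/22 ≈ 0.0023, so uncorrected *p*-values in the range of 0.01 to 0.05 no longer reach significance under either procedure.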

Third, the use of PWs as target stimuli might have enhanced attentional demands, and the differences found might reflect controlled instead of automatic processing. Indeed, response accuracy was lower during PW than word reading, indicating that PW reading might have enhanced cognitive control. However, equal RTs across words and PWs suggest that the inaccurate responses occurred pre-attentively during "automatic" reading. Additionally, equal RTs across stimulus types suggest that the "word-likeness" of the PWs (unlike letter strings or non-words) and the strong language context (generated by adding four times more words than PWs) possibly facilitated the task. More importantly, the task was the same across conditions; thus, control strategies are unlikely to explain the results. Further, equal RTs and accuracy of PW reading across conditions support a comparable engagement of cognitive control processes, which should thus be cancelled out in the analysis. Finally, the effects found around 300 ms are unlikely to reflect higher processing mechanisms such as cognitive control strategies (Chouiter et al., 2013). Nevertheless, it is important to note that the use of PWs as target stimuli likely reinforced assembled/non-lexical reading in both languages and may thus not reflect natural everyday reading, especially in the French context.

Fourth, the low spatial resolution of EEG inverse solutions limits the interpretability of the spatial aspects of our data. However, with the high-density EEG montage (128 channels), the localization accuracy of LAURA is on the order of the grid size, i.e., about 0.6 cm<sup>3</sup> (Michel et al., 2004). In addition, these limitations were partially remedied by applying statistical parametric mapping analyses to the source estimation (Michel et al., 2004). Even when the estimated activity in brain regions is of unrealistic size, statistical analysis can reveal whether differences between experimental conditions are reliable. Finally, a conservative statistical approach was applied in order to interpret only the most pronounced effects. Nevertheless, our labeling of areas should be interpreted with caution and with these limitations in mind.

Fifth, a further limitation of the study is the small sample size (*n* = 14), which is explained by the application of rigid inclusion criteria in order to have an optimally balanced group of native French-German bilinguals. Small sample sizes usually have low statistical power, which enhances the risk of false negative and false positive findings (Button et al., 2013). To estimate the statistical power of our study, a compromise power analysis was performed *post hoc* (Erdfelder, 1984) using G\*Power software (Faul et al., 2009). Assuming a medium effect size (Cohen's *d* = 0.5), an alpha level of 0.05, a ratio q = beta/alpha = 1 and a sample size of 14 matched pairs, this analysis resulted in a power (1 − beta error) of 75%, which can be labeled as medium to large power.

Sixth, due to differences in brain representations of language processing between bi- and monolinguals (Hernandez et al., 2000, 2005; Mechelli et al., 2004; Rodriguez-Fornells et al., 2005, 2006; Kovelman et al., 2008), a generalization of our results obtained by investigating highly proficient bilinguals to non-native bi- or monolinguals is limited. In addition, only female subjects participated in the study, further limiting a generalization to male populations. It would be interesting for future research to investigate the manipulation of reading route by orthographic depth in male and mixed populations.

Finally, the dual route cascade model, in its original form, may be too rigid as a template to project our results. Instead, our results should be discussed in terms of a parallel engagement of both routes, but one may be predominantly activated compared to the other depending on the orthographic regularity of the language.

#### **CONCLUSION**

The present study reveals insights into the neural underpinnings of orthographic regularity processing. Our findings complement current literature on reading processing and support the orthographic depth hypothesis (Katz and Feldman, 1983), by showing that not only the lexicality/familiarity of a stimulus, but also its orthographic regularity may modulate the engagement of reading routes.

#### **ACKNOWLEDGMENTS**

This work was supported by a grant from the Swiss National Science Foundation to Jean-Marie Annoni (No. 325130\_138497). We would like to thank Michaël Mouthon for technical assistance with EEG recordings. Cartool software (http://sites.google.com/site/fbmlab/cartool) has been programmed by Denis Brunet, from the Functional Brain Mapping Laboratory, Geneva, Switzerland, and supported by the Center for Biomedical Imaging (CIBM) of Geneva and Lausanne.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 June 2013; accepted: 03 February 2014; published online: 20 February 2014.*

*Citation: Buetler KA, de León Rodríguez D, Laganaro M, Müri R, Spierer L and Annoni J-M (2014) Language context modulates reading route: an electrical neuroimaging study. Front. Hum. Neurosci. 8:83. doi: 10.3389/fnhum.2014.00083*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Buetler, de León Rodríguez, Laganaro, Müri, Spierer and Annoni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Neural networks underlying contributions from semantics in reading aloud

# *Olga Boukrina and William W. Graves \**

*Department of Psychology, Rutgers, The State University of New Jersey, Newark, NJ, USA*

#### *Edited by:*

*Mohamed L. Seghier, University College London, UK*

#### *Reviewed by:*

*Jo S. H. Taylor, University of Cambridge, UK Fiona M. Richardson, Anglia Ruskin University, UK*

#### *\*Correspondence:*

*William W. Graves, Department of Psychology, Rutgers, The State University of New Jersey, Smith Hall, Room 337, 101 Warren Street, Newark, NJ 07102, USA e-mail: william.graves@rutgers.edu*

Reading is an essential part of contemporary society, yet much is still unknown about the physiological underpinnings of its information processing components. Two influential cognitive models of reading, the connectionist and dual-route cascaded models, offer very different accounts, yet evidence for one or the other remains equivocal. These models differ in several ways, including the role of semantics (word meaning) in mapping spelling to sound. We used a new effective connectivity algorithm, IMaGES, to provide a network-level perspective on these network-level models. Left hemisphere regions of interest were defined based on main effects in functional magnetic resonance imaging and included two regions linked with semantic processing (angular gyrus, AG, and inferior temporal sulcus, ITS) and two regions linked with phonological processing (posterior superior temporal gyrus, pSTG, and posterior middle temporal gyrus, pMTG). Participants read aloud words of high or low spelling-sound consistency, word frequency, and imageability. Only the connectionist model predicted increased contributions from semantic areas with those computing phonology for low-consistency words. Effective connectivity analyses revealed that areas supporting semantic processing (e.g., the ITS) interacted with phonological areas (e.g., the pSTG), with the pattern changing as a function of word properties. Connectivity from semantic to phonological areas emerged for high- compared to low-imageability words, and a similar pattern emerged for low-consistency words, though only under certain conditions. Analyses of individual differences also showed that variation in the strength of modulation of ITS by AG was associated with reading aloud performance. Overall, these results suggest that connections with semantic processing areas are not only associated with reading aloud, but that these connections are also associated with optimal reading performance.

#### **Keywords: semantics, effective connectivity, reading, fMRI, phonology, orthography**

# **INTRODUCTION**

The ability to process written language is fundamental to our capacity to encode and transmit the wealth of human knowledge. In contemporary society where text is ubiquitous, reading deficits represent a significant handicap. Yet, despite decades of research into the cognitive and neural mechanisms of reading, several basic questions remain unresolved, such as whether there are multiple routes to reading or a single basic system and whether word meaning plays a role in reading aloud.

Cognitive theories of reading differ with respect to the role that access to meaning (semantics) is thought to play during conversion of spelling into sound. According to single-process connectionist models (e.g., Seidenberg and McClelland, 1989; Plaut et al., 1996), orthography-phonology (orth-phon) mapping develops according to the frequency of exposure to spelling-sound correspondence patterns. This process is mediated by semantics, with the amount of semantic input depending on the nature of the word. Words with consistent spelling-sound correspondence patterns (e.g., UST in DUST) can be pronounced without strong activation of semantic (sem) content, whereas words with inconsistent spelling-sound correspondence patterns (e.g., OST in HOST and COST) rely on access to meaning to a greater degree. In this model, semantics is used to reduce interference between inconsistent features of the input.

In contrast to connectionist models, the dual-route cascaded model (DRC; e.g., Coltheart et al., 1993, 2001) posits that semantic properties do not affect the early stages of word processing. Written words are processed in parallel along two routes: a *direct* look-up in the orth and phon lexicons, or an *indirect* orth-phon conversion. The direct lexical route can be further subdivided into the lexical non-semantic and lexical-semantic pathways, but semantics has not been fully implemented in this model. Although Coltheart et al. (1999) described an implementation of a 3-word lexicon to simulate the Stroop effect, to our knowledge this work has not advanced further. Moreover, any semantic activation in the direct pathway is assumed to be "too slow to influence skilled word pronunciation" (Plaut et al., 1996, p. 60). According to the DRC model, both the direct and the indirect route are engaged during reading aloud; however, for irregular words like YACHT, only the direct route will produce the correct pronunciation. Regular words like SHIP, on the other hand, rely to a considerable degree on processing in the indirect route, where high spelling-sound regularity of such words allows mapping from orth to phon using a set of correspondence rules (Coltheart et al., 1993, 2001).

The two models propose different processing mechanisms for reading aloud, which we infer should rely on different neural substrates. This allows us to test them using neuroimaging data. A recent meta-analysis of neuroimaging studies of reading suggested that localization evidence can help constrain these cognitive models by offering information about functional overlap or separation of lexical processes in the brain (Taylor et al., 2013). The Taylor et al. analysis represents an important step in bringing functional neuroimaging data to bear on models of reading in a systematic way. The authors stated that their primary goal was not to adjudicate between connectionist and dual-route models of reading, and there were at least two factors that would have hindered their ability to do so. The first was that in order to have sufficient data for an effective meta-analysis, they included studies using a variety of reading-related tasks, not only reading aloud but also silent reading, lexical decision, visual feature detection, etc. This variability produced uncertainty regarding the functions ascribed to specific brain regions. For example, the meta-analysis showed that the words > pseudowords contrast was associated with activations in the left anterior fusiform, middle temporal and angular gyri, the putative locations of the orthographic and phonological lexicons proposed as part of the dual-route model (Coltheart et al., 2001). Yet given the overlap of these activations with the Binder et al. (2009) meta-analysis, which applied very strict selection criteria for studies of semantic processing, these brain activations may instead correspond to the computation of word meaning. Such a result would be consistent with the connectionist model. Here we focus on reading aloud because, as Taylor et al. (2013) acknowledge, the model predictions diverge in terms of whether or not they propose a role for semantics in reading aloud.
Such a focus would not have been possible in a meta-analysis given the limited number of studies using reading aloud in functional neuroimaging.

In accordance with the theoretical assumptions of the single-process model, in this study we expected that variations in word frequency, spelling-sound consistency, and imageability would produce different patterns of effective connectivity among the 5 regions of interest (ROIs) shown in **Figure 1** and described below. We were particularly interested in the division of labor between the phonological and the semantic pathways and the predictions that stem from the relative contribution of each pathway to reading aloud. If, for example, semantic access plays a role in reading aloud, then different patterns of network connectivity would be expected when reading words that differ in how much they engage the semantic system through a factor such as imageability (e.g., Strain et al., 1995). Imageability ratings reflect the relative ease with which a word evokes an image. Highly imageable words have richer, more easily computed semantic representations than less imageable words (Paivio, 1991; Schwanenflugel, 1991; Plaut and Shallice, 1993). Such words would be expected to rely to a greater degree on an interaction between regions involved in semantic processing and regions linked with orthographic and phonological processing. Similarly, words with low spelling-sound consistency may require additional involvement of the semantic system to help map their orthography to the correct pronunciation. By associating orth, phon and sem processes with different parts of the reading network, we can make predictions about the connectivity patterns that will be obtained under the assumptions of competing cognitive models.

The connectionist dynamics of reading aloud have been described in detail by Plaut et al. (1996). During typical reading the semantic pathway provides additional input to the phoneme units, pushing them to their correct levels of activation. This additional input from semantics alleviates the pressure for the phonological pathway to master all the pronunciations. When the model is trained extensively, and the semantic pathway gains competence, the phonological pathway becomes increasingly specialized for reading consistent words, at the expense of inconsistent ones. Unlike in the dual-route model, however, even with extensive training the phonological route of the connectionist model can read inconsistent words, particularly those of high frequency. Patient work has shown that when the semantic pathway is impaired the severity of reading disability is correlated with the amount of semantic deterioration (e.g., Patterson and Hodges, 1992). Similarly, as the amount of semantic input to the model's phoneme units is reduced (by degrading the semantic units), a gradual pattern of reading impairments emerges. Performance on the low-frequency inconsistent words is affected first, followed by high-frequency inconsistent words. If semantic input is completely eliminated, performance on the low-frequency consistent words is also affected (Figure 25, p. 98 of Plaut et al., 1996). These findings support the notion that normal reading is accomplished via a division of labor between the phonological and the semantic pathways and that neither of them is completely competent in isolation. Following this logic, we expected that the amount of semantic input would vary depending on the nature of the words.

**FIGURE 1 | Volume rendering of the functional ROIs in the lateral view of the atlas brain.** For coordinates see **Table 1**.

**Table 1 | Volumes (in mm<sup>3</sup>) and center of mass coordinates (in MNI space) for ROIs shown in Figure 1.**

*pSTG, posterior superior temporal gyrus; AG, angular gyrus; ITS, inferior temporal sulcus; pMTG, posterior middle temporal gyrus; pFG, posterior fusiform gyrus.*

A detailed connectionist investigation of the role of semantic information in single-word reading was performed by Harm and Seidenberg (2004). They studied reading for meaning, where orthography mapped to semantics either directly or through phonology. Although they explicitly acknowledge that the network dynamics for reading aloud will be different than reading for meaning, consistency effects were greater for the orth→phon→sem pathway than the orth→sem pathway (Harm and Seidenberg, 2004, p. 691). That is, consistent words were read more accurately than inconsistent words when the pathway involved phonological mapping, whereas inconsistent words benefitted from the essentially arbitrary orth→sem mapping. Because reading aloud involves mapping to phonology rather than stopping at semantics, we expect connections from semantic (ITS, AG) to phonological regions (pSTG, pMTG) to emerge for words that do not have consistent orth→phon mappings<sup>1</sup>.

Using a previously collected fMRI dataset (Graves et al., 2010), we examined patterns of effective connectivity for words that differed in imageability, consistency, and frequency. Functional ROIs were defined in regions that Graves et al. found to be sensitive to lexical characteristics such as imageability [angular gyrus (AG), orth-phon consistency inferior temporal sulcus (ITS), posterior fusiform gyrus (pFG)], bigram frequency [posterior middle temporal gyrus (MTG), posterior superior temporal gyrus (pSTG)], and word frequency (pFG, AG). Other neuroimaging studies have also shown that a similar set of regions reliably activates during reading. For example, the orthographic composition of a visually presented word is thought to be analyzed along the length of the left FG, with increasingly complex elements processed in more anterior parts of the FG (Vinckier et al., 2007). The middle portion of the FG is often referred to as the Visual Word Form Area (VWFA), due to its association with orthographic processing (e.g., Cohen et al., 2000, 2002; Dehaene et al., 2004; Binder et al., 2006). Neural computation of phonology is linked with activity in peri-Sylvian regions such as pSTG (e.g., Hickok and Poeppel, 2004; Graves et al., 2007, 2008; Price, 2012) and orth-phon conversion is associated with activity in pMTG (Jobard et al., 2003; Sandak et al., 2004), as well as other regions, such as SMG (Jobard et al., 2003; Sandak et al., 2004; Katz et al., 2005; Vigneau et al., 2006; Cattinelli et al., 2013) and opercular IFG (Pugh et al., 1996; Fiez and Petersen, 1998; Jobard et al., 2003; Hickok and Poeppel, 2004; Sandak et al., 2004; Katz et al., 2005). Finally, among the areas involved in processing semantics are the ITS (Binder et al., 2009), AG (Binder et al., 1997, 2005, 2009; Price and Mechelli, 2005; Price, 2012) and triangular IFG (Poldrack et al., 1999; Bookheimer, 2002; Jobard et al., 2003; Binder et al., 2009).

The 5 ROIs considered here were selected because they showed reliable activations across participants in our previous univariate analysis of this dataset (Graves et al., 2010), and had clear functional interpretations (see **Figure 1**) on which we also based a structural connectivity analysis (Graves et al., submitted). Under the connectionist account, effective connectivity graphs should exhibit increased engagement of the semantic system during reading of words with high imageability and low consistency. We also expected to find a frequency-by-consistency interaction (e.g., Paap and Noel, 1991), such that the differential contribution of semantics to reading low- and high-consistency words would be greatest when these words are also of low frequency. This should be most evident in areas previously implicated in semantic processing such as the ITS (Binder et al., 2009; Graves et al., 2010) and AG (Binder et al., 1997, 2005, 2009; Price and Mechelli, 2005; Price, 2012). Connectionist models also posit that phonology is assembled rather than accessed from a whole-form lexicon. Therefore, during reading aloud regions involved in orth→phon conversion, such as the pMTG (Jobard et al., 2003; Sandak et al., 2004; Graves et al., 2010), were expected to be co-activated with regions involved in phonological processing (pSTG; e.g., Hickok and Poeppel, 2004, 2007; Graves et al., 2007, 2008; Price, 2012). In cases where the mapping between orthography and phonology was highly consistent, particularly for stimulus words of low imageability, the engagement of semantic processing was expected to be minimal. In this case we expected to see a direct link from occipitotemporal orthographic regions to posterior MTG and STG for engagement of orthography-phonology conversion and phonological recoding prior to speech.

In addition to a differential contribution of semantics to processing high and low levels of word frequency and consistency, we expected that semantic processes may be differentially utilized by individual readers. Recently we showed that the effect of semantic variables on brain activity varied considerably across individuals and was correlated with the volume of the neural pathways that connect posterior temporal areas with inferior temporal and parietal regions, involved in access to meaning (Graves et al., submitted). Although the association between use of semantics in reading and structural connectivity among proficient readers was novel, previous studies have found neural associations with individual differences in other aspects of reading (e.g., Bolger et al., 2008; Seghier et al., 2008; Levy et al., 2009; Seghier and Price, 2009). Plaut et al. (1996) also argue that premorbid differences in the reliance on semantic support, stemming from differences in the nature of reading instruction, the quality of phonological representations, relative experience in reading aloud vs. silently, as well as general computational resources, may explain the differences in the level of reading impairment in patients who have comparable amounts of brain damage. To further investigate patterns of effective connectivity in the functional neural data, we tested whether individual differences in these patterns were associated with measures of the behavioral influence of three relevant stimulus properties: spelling-sound consistency, imageability, and word frequency.

<sup>1</sup>One of the reviewers pointed out that because Harm and Seidenberg (2004) showed essentially equivalent performance for inconsistent and consistent words along the orth→sem pathway, and most reading is presumably for meaning rather than reading aloud, the prediction that inconsistent words would engage semantics more than consistent words for reading aloud may not be obvious. However, as Harm and Seidenberg acknowledge, their focus on reading for meaning leaves open the possibility that in reading aloud, inconsistent words may benefit from semantics more than consistent words. Indeed, even if it also turned out to be the case in reading aloud that the orth→sem pathway were equivalently useful or activated by consistent and inconsistent words, this is only half of the semantic pathway in this task. For semantics to contribute to reading aloud, there must also be a mapping from semantics to phonology. The dynamics of the sem→phon path were not addressed in Harm and Seidenberg (2004), and it may be this part of the pathway that contributes differently for consistent and inconsistent words.

Recent advances in cognitive neuroscience have begun to make distributed, network-level analyses of functional brain imaging data tractable and reliable (Hanson and Halchenko, 2008; Poldrack et al., 2009; Ramsey et al., 2010, 2011). For example, the interactivity within neural systems has been examined using effective connectivity analyses (McIntosh and Gonzalez-Lima, 1994; Friston, 2003; Ramsey et al., 2010, 2011; Schuyler et al., 2010), which comprise algorithms that quantify the influence one brain region exerts on another. Such an approach should be capable of uncovering the underlying causal structure of network activity (Friston and Büchel, 2003). These analyses, combined with previous localization evidence, offer novel ways of investigating the neural dynamics of reading. They can help adjudicate between competing views by uncovering the patterns of neural interactions between areas supporting semantic, orthographic and phonological processes.

The success of a network-level investigation depends crucially on the use of a valid and reliable method of analysis to measure connectivity within the network. A troubling finding from a simulation study by Smith et al. (2011) showed that of the 38 effective connectivity algorithms currently used in neuroimaging, none could find both connections and their orientations in 28 simulated networks without a large number of false positives, false negatives or both. However, Ramsey et al. (2011) showed that a graphical search algorithm, Independent Multiple sample Greedy Equivalence Search (IMaGES; Ramsey et al., 2010), combined with an orientation algorithm, Linear non-Gaussian Orientation, Fixed Structure (LOFS), delivered high precision (>80%) in the reproduction of connections and orientations for all 28 simulated networks. The difficulty in arriving at a correct pattern of effective connectivity stems from computational intractability, which often arises because the number of alternative causal structures for a set of, say, 10 ROIs, could be on the order of billions (Ramsey et al., 2010). IMaGES solves this problem by using Bayesian methodology to partition the connectivity search space into manageable chunks of Markov equivalence classes and by further constraining the search to connections that carry the greatest predictive power (Perez et al., 2010). The advantages of this algorithm include greater flexibility to uncover new connectivity patterns not previously seen in reading studies, and the ability to provide better model fit. Here we used IMaGES analysis in concert with the theoretical approach of connectionist models to develop a network-level neural account of reading.
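To give a rough sense of how quickly the search space discussed above grows, the following sketch counts labeled directed acyclic graphs using Robinson's recurrence. Treating "alternative causal structures" as labeled DAGs is our assumption for illustration, and `count_dags` is a hypothetical helper; it is not part of IMaGES, LOFS, or Tetrad.

```python
from math import comb


def count_dags(n: int) -> int:
    """Number of labeled directed acyclic graphs on n nodes.

    Uses Robinson's recurrence:
        a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k*(m-k)) * a(m-k)
    with a(0) = 1.
    """
    counts = [1]  # a(0): the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


# Illustrative node counts only; 5 matches the number of ROIs in this study.
for n_nodes in (3, 5, 8):
    print(n_nodes, count_dags(n_nodes))
```

Under this count, a 5-node network already admits 29,281 candidate graphs, and the number grows super-exponentially with each added node, which is why an exhaustive search quickly becomes impractical.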

# **METHODS**

#### **PARTICIPANTS**

The participants were 20 (13 female) healthy, literate, right-handed volunteers with normal or corrected-to-normal vision. The mean age of participants at the time of the study was 23.2 years (*SD* = 3.4). All participants provided written informed consent before taking part in the reading aloud fMRI study as described in Graves et al. (2010).

# **MATERIALS**

The stimuli were 465 monosyllabic English words selected such that their length, log-transformed frequency of occurrence, spelling-sound consistency, imageability, and log-transformed position-constrained bigram and biphone frequencies were uncorrelated. A detailed description of the stimuli is available in Graves et al. (2010).

# **TASK AND DATA ACQUISITION**

The experiment used a fast event-related fMRI design with continuous acquisition. On each trial a word appeared on the screen for 1000 ms before being replaced by a fixation cross of variable duration, with a mean intertrial interval of 4.9 s (*SD* = 3.72). Participants' task was to "read each word aloud as quickly and accurately as possible" and their responses were recorded with an MRI-compatible microphone.

A 3.0-T GE Excite (GE Healthcare, Waukesha, WI) MRI scanner with an 8-channel array radio frequency head coil was used for data acquisition. Functional images were acquired using a gradient-echo echoplanar imaging (EPI) sequence (*TE* = 25 ms; *TR* = 2000 ms; FOV = 192 mm; matrix = 64 × 64). Thirty-two interleaved axial slices per volume were obtained (3 × 3 × 2.5 mm voxels, 0.5 mm gap). The data were acquired in 5 functional runs with 240 whole-brain image volumes each. High resolution, T1-weighted anatomical images were acquired using a spoiled-gradient-echo sequence (matrix = 0.938 × 0.938 mm; 134 contiguous 1 mm axial slices).

# **fMRI DATA ANALYSIS**

Image preprocessing was performed using FSL 5.0 software (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl). Functional images were skull stripped using BET (Smith, 2002) and registered to high-resolution anatomical and standard MNI (Montreal Neurological Institute) space images using FLIRT (Jenkinson and Smith, 2001; Jenkinson et al., 2002).

Mean activation timeseries were extracted from each participant's registered and skull-stripped fMRI data for each ROI. The ROIs were defined in MNI space (Grabner et al., 2006) from previous fMRI results (Graves et al., 2010) taken directly from that study using the exact significance and extent criteria described previously. The only modifications made were to apply anatomical masks so the regions did not extend beyond relevant anatomical boundaries, as defined in the Talairach atlas (Lancaster et al., 2000). This ensured that (1) the ROIs did not overlap and (2) they lay within defined anatomical regions. The ITS ROI showed increased blood oxygen level dependent (BOLD) signal for words of decreasing spelling-sound consistency, and was spatially bounded by the inferior and middle temporal gyri. The pFG ROI was defined as an area showing increased BOLD signal with decreasing word frequency, restricted to not extend beyond the atlas definition of the fusiform gyrus. The AG ROI showed increased BOLD signal for reading words of increasing word frequency or imageability, and was masked to not extend beyond the atlas definition of the AG. The pSTG ROI showed increased BOLD signal with increasing response time (RT) for reading aloud, and the posterior MTG ROI (pMTG, bounded by the atlas definition of the MTG) showed increased BOLD signal for words of decreasing bigram frequency (**Figure 1**). These ROIs have previously been linked with different aspects of lexical retrieval: orthographic (pFG, e.g., Cohen et al., 2000; McCandliss et al., 2003; Binder et al., 2006; Vinckier et al., 2007), semantic (AG and ITS, e.g., Binder et al., 2009), and phonological processing (pSTG, e.g., Hickok and Poeppel, 2004, 2007; Graves et al., 2007, 2008; Gow, 2012; Price, 2012) as well as orthography-phonology mapping (pMTG, e.g., Jobard et al., 2003; Brambati et al., 2009; Richlan et al., 2009).

The timeseries of neural activation was separated into 4 sets per participant based on the characteristics of the words presented at each timepoint. The words were divided according to a 2 × 2 design with high and low levels of consistency and imageability, high and low levels of consistency and frequency, or high and low levels of frequency and imageability. Levels of each variable were defined as the upper and lower quartiles of the consistency, imageability, and frequency distributions from the complete stimulus dataset. There were on average 27.1 (*SD* = 10.4) trials in each cell of the 2 × 2 design table. Of these, only the trials on which participants made a correct response were considered for analysis. Responses were counted as incorrect if the participant stuttered, mispronounced the word, failed to respond, or responded with an RT more than 3 SDs from the group mean. The resulting timeseries of neural activation was aggregated across the 5 ROIs and 20 participants and was analyzed by condition using the IMaGES algorithm for effective connectivity (Ramsey et al., 2010; Tetrad software package http://www.phil.cmu.edu/projects/tetrad/). Candidate directed acyclic graphs were obtained using IMaGES search with a penalty discount optimized to find the first non-triangular configuration of connections. Next, connection directions were specified using the LOFS algorithm. LOFS belongs to a family of algorithms that exploit the fact that "the residuals of the correct linear model with independent non-Gaussian sources of error will be less Gaussian than the residuals of any incorrect model" (Ramsey et al., 2011, p. 4). This property of linear models is used in LOFS together with a non-Gaussianity (NG) measure to orient the effective connections. In our analysis we used the Anderson-Darling test of NG (Anderson and Darling, 1952). Model goodness of fit to each dataset was estimated using structural equation parametric modeling with a regression optimizer.
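The quartile-based 2 × 2 split described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the word labels, and the numeric values are all hypothetical, and the choice of the "inclusive" quantile method is an assumption.

```python
from statistics import quantiles


def quartile_level(values):
    """Label each value 'low' (<= Q1), 'high' (>= Q3), or None (middle half)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return ["low" if v <= q1 else "high" if v >= q3 else None for v in values]


def two_by_two(words, var_a, var_b):
    """Assign words to the four cells of a 2 x 2 design.

    Words falling in the middle half of either variable are dropped,
    so only upper- and lower-quartile items enter the cells.
    """
    levels_a = quartile_level(var_a)
    levels_b = quartile_level(var_b)
    cells = {(a, b): [] for a in ("low", "high") for b in ("low", "high")}
    for word, a, b in zip(words, levels_a, levels_b):
        if a is not None and b is not None:
            cells[(a, b)].append(word)
    return cells


# Hypothetical stimuli and ratings, for illustration only.
words = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8"]
consistency = [1, 2, 3, 4, 5, 6, 7, 8]
imageability = [8, 1, 5, 4, 3, 6, 2, 7]

cells = two_by_two(words, consistency, imageability)
for cell, members in cells.items():
    print(cell, members)
```

Because each variable is cut independently, the number of items per cell depends on how the two variables co-vary in the stimulus set, which is consistent with the variable cell sizes reported above.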

#### **INDIVIDUAL DIFFERENCES IN EFFECTIVE CONNECTIVITY**

Follow-up analyses, aimed at better understanding the effective connectivity from AG to ITS that emerged only in the low-frequency, high-imageability word condition (see below), were performed in terms of the association of individual performance parameters with connection strengths. As this was the only connection that deviated from the stable network structure, which emerged across all conditions, we examined it closely. We found that the strength of this connection varied across participants. We tested for correlations between AG→ITS connection strength and mean RT for each participant and for correlations with individual effects of the main stimulus parameters of interest. These analyses were performed separately because including RT in the same multiple linear regression analysis with effects of stimulus properties defined in terms of their effect on RT would have resulted in an over-determined model. RT for in-scanner responses was calculated as the time from stimulus onset to response onset (for details see Graves et al., 2010). The behavioral effects of stimulus parameters were derived from a regression analysis performed separately on each participant, with RT as the dependent variable. RT was analyzed using multiple linear regression with the following six explanatory variables: length in letters, word frequency, consistency, imageability, the multiplicative interaction of word frequency and consistency, and the multiplicative interaction of consistency and imageability. Values for these variables were mean-centered to avoid any multicollinearity that could result from inclusion of interaction terms (Kutner et al., 2005). This analysis resulted in β-weights for each variable in each participant. The β-weights for consistency, imageability, and word frequency were then tested for correlation with AG→ITS connection strength across participants.
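The rationale for mean-centering before forming interaction terms can be illustrated with a small sketch. The values below are toy numbers chosen for illustration (not data from this study), and the helper functions are hypothetical. With these symmetric toy values the correlation between a centered predictor and its interaction term vanishes entirely; with real data, centering typically reduces this correlation without necessarily eliminating it.

```python
from math import sqrt


def mean_center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    xc, yc = mean_center(x), mean_center(y)
    numerator = sum(a * b for a, b in zip(xc, yc))
    denominator = sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))
    return numerator / denominator


def interaction(x, z):
    """Multiplicative interaction term (element-wise product)."""
    return [a * b for a, b in zip(x, z)]


# Toy predictor values, for illustration only.
consistency = [1, 2, 3, 4, 5, 6]
imageability = [2, 1, 4, 3, 6, 5]

# Correlation between a predictor and its raw interaction term.
raw_r = pearson(consistency, interaction(consistency, imageability))

# Same correlation after mean-centering both predictors first.
cons_c = mean_center(consistency)
imag_c = mean_center(imageability)
centered_r = pearson(cons_c, interaction(cons_c, imag_c))

print(f"raw: {raw_r:.3f}, centered: {centered_r:.3f}")
```

A predictor that is strongly correlated with its own interaction term makes the individual β-weights unstable, which is the multicollinearity problem that centering is meant to reduce.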

#### **RESULTS**

#### **THE READING NETWORK**

Graphs of effective connectivity revealed a network of areas with some stable components, such as connections between AG and pMTG and between pMTG and pSTG, and components that varied depending on the nature of the stimuli. For example, the direction of connectivity between pSTG and ITS varied as a function of imageability. When imageability was high, ITS was driving the activity in pSTG, and when imageability was low, the pattern of effective connectivity was reversed (pSTG provided input to ITS, **Figures 2**, **4**, where bold lines represent connections that differ across conditions). This connection was also modulated by frequency and consistency, with input from pSTG to ITS occurring only for words of high consistency and low frequency (**Figure 3C**). The connection between pSTG and pFG was more stable across conditions with only one reversal: for high-frequency, high-imageability words only, pFG drove activity in the pSTG (**Figure 4D**). One additional connection, from the AG to ITS, was found for low-frequency and high-imageability words (**Figure 4B**). The strength of this connection had a direct relationship with performance on reading aloud, as described further below.

#### **EFFECTS OF CONSISTENCY AND IMAGEABILITY**

Across all levels of consistency and imageability (**Figure 2**), activity in the pMTG, an area thought to be involved in orth→phon conversion (Jobard et al., 2003; Sandak et al., 2004; Brambati et al., 2009; Richlan et al., 2009; Graves et al., 2010), was found to influence activity in the AG, an area implicated in processing semantics (Binder et al., 2009), and pSTG, an area involved in processing phonology (Hickok and Poeppel, 2004, 2007; Graves et al., 2007, 2008; Gow, 2012; Price, 2012). Similarly, across all conditions, pSTG influenced activity in the pFG, an area thought to be involved in orthographic processing (Mechelli et al., 2000; Binder et al., 2006; Vinckier et al., 2007), and possibly some aspects of phonological processing (Hillis et al., 2005; Graves et al., 2010; Cattinelli et al., 2013; Mano et al., 2013). The effective connectivity graphs revealed a main effect of imageability. For high-imageability words, ITS, implicated in processing lexical semantics, drove activity in the pSTG, a phonological area (**Figures 2B,D**). The direction of effective connectivity between these areas was reversed for low-imageability words (pSTG provided input to ITS; **Figures 2A,C**). When regression coefficients were entered into an ANOVA with consistency (high or low), imageability (high or low), and connection (ITS to pSTG, pMTG to AG, pMTG to pSTG, and pSTG to pFG) as within-subjects factors, a main effect of connection [*F*<sub>connect</sub>(3, 57) = 8.19, *p* < 0.001] and a marginal interaction between consistency and imageability [*F*<sub>con × imag</sub>(1, 19) = 4.33, *p* = 0.051] were found. For high-imageability words, ITS provided stronger input to pSTG for low-consistency than for high-consistency words (**Figures 2B,D**).

**FIGURE 2 | Effective connectivity results for high (C,D) and low (A,B) levels of spelling-sound consistency crossed with high (B,D) and low (A,C) levels of imageability.** Colors correspond to **Figure 1**. Numbers along connections represent model regression fit coefficients averaged across participants, with standard errors in parentheses.

#### **EFFECTS OF CONSISTENCY AND FREQUENCY**

The effective connectivity graphs (**Figure 3**) revealed the same set of causal connections between pMTG, AG, pSTG, and pFG as in the analysis of consistency and imageability. That is, pMTG modulated activity in AG and pSTG, which then influenced activity in the pFG. There was also an interaction between consistency and frequency, driven by a difference in the connectivity of ITS and pSTG. The pSTG drove activity in ITS only when participants read words of high consistency and low frequency (**Figure 3C**); otherwise, ITS drove activity in pSTG. The interaction between consistency and frequency also influenced the strengths of causal connections across the ROIs. This was revealed by an ANOVA on regression coefficients for each participant, with consistency (high or low), frequency (high or low), and connection (ITS to pSTG, pMTG to AG, pMTG to pSTG, and pSTG to pFG) as within-subjects factors [*F*<sub>con × freq</sub>(1, 19) = 4.64, *p* < 0.05]. There was also a main effect of connection [*F*<sub>connect</sub>(3, 57) = 7.72, *p* < 0.001], and an interaction between consistency and connection [*F*<sub>con × connect</sub>(3, 57) = 2.85, *p* < 0.05]. Connection strength was larger when pSTG modulated ITS in the high-consistency, low-frequency condition (**Figure 3C**) than when ITS modulated pSTG in all other conditions.

#### **EFFECTS OF FREQUENCY AND IMAGEABILITY**

Similar modulation by pMTG of the AG and pSTG was found as in the other analyses. Activity in pSTG modulated pFG in all but one case: when participants read words of high frequency and high imageability, this connection was reversed (**Figure 4D**). A main effect of imageability was observed, with ITS modulating pSTG for high-imageability words regardless of frequency, whereas the direction of effective connectivity between these regions was reversed for low-imageability words (compare **Figures 4A**–**D**). A main effect of imageability was found when regression coefficients were entered into an ANOVA with frequency (high or low), imageability (high or low), and connection (ITS to pSTG, pMTG to AG, pMTG to pSTG, and pSTG to pFG) as within-subjects factors [*F*<sub>imag</sub>(1, 19) = 7.70, *p* < 0.05]. A main effect of connection was also observed [*F*<sub>connect</sub>(3, 57) = 6.82, *p* < 0.005]. A three-way interaction between frequency, imageability, and connection was also found [*F*<sub>freq × imag × connect</sub>(1, 19) = 2.96, *p* < 0.05]. Connection strength differed for high- compared to low-imageability words, such that connections from ITS to pSTG (high-imageability condition, **Figures 4B,D**) showed lower coefficients than connections from pSTG to ITS (low-imageability condition, **Figures 4A,C**). In addition, for high- and low-frequency words with low imageability (**Figures 4A,C**), the connection from pMTG to pSTG was stronger than for high-frequency words with high imageability (**Figure 4D**). An additional connection was found in the graph for low-frequency, high-imageability words, where AG modulated activation of ITS. This connection was found in no other condition. Following the direction of effective connectivity in this condition reveals an apparent cascade of activation as follows: AG→ITS→pSTG→pFG.

#### **INDIVIDUAL DIFFERENCES IN EFFECTIVE CONNECTIVITY**

The values of the regression coefficients for the AG→ITS connection varied considerably across participants, with a maximum strength of 0.97 and a minimum of −1.22. This was associated with individual variability in average RT on correct trials, such that stronger excitatory connectivity corresponded to faster RTs, *r* = −0.63, *t*(18) = −3.40, *p* < 0.005 (**Figure 5**). We also examined the relationship between the strength of this connection and individual differences in how much the other factors being considered here (word frequency, consistency, and imageability) affected RT. A multiple linear regression model showed that all three of these variables contributed to predicting the AG→ITS connection strength, *F*<sub>regression</sub>(3, 16) = 8.52, *p* < 0.005, and their joint effect explained 62% of the variance in the strength of the AG→ITS connection. The x-axis in **Figure 6** represents β-weights for each factor, and each point is the effect for an individual participant. To use imageability as an example, individuals with negative values showed faster responses (lower RT) for higher imageability words, and the opposite was true for those with positive values. A decrease in the imageability [β = −0.53, *t*(16) = 3.33, *p* < 0.005] and consistency [β = −0.69, *t*(16) = 3.89, *p* < 0.005] β-weights, showing faster RT for high-imageability and high-consistency words, predicted a corresponding increase in AG→ITS connectivity. Conversely, an increase in word frequency effect values [*t*(16) = 2.87, *p* < 0.05] predicted an increase in AG→ITS connectivity (**Figure 6**). The latter effect was unexpected and likely driven by a single outlier. With the apparent outlier removed from the analysis, the results revealed that the imageability and consistency effects remained significant [β = −0.46, *t*(15) = 2.48, *p* < 0.05; β = −0.71, *t*(15) = 3.66, *p* < 0.005, respectively], but the frequency effect did not [β = 0.26, *t*(15) = 1.22, *p* = 0.24].
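As a rough consistency check on the statistics reported above, the standard textbook relationships between a Pearson correlation and its *t* statistic, and between a regression R² and its *F* statistic, can be sketched as follows. The function names are hypothetical, and the sample size of 20 participants is inferred here from the reported degrees of freedom rather than stated in this section. Because the inputs are the rounded published values, the recomputed statistics should only approximate the reported *t*(18) = −3.40 and *F*(3, 16) = 8.52.

```python
import math

def t_from_r(r, n):
    """t statistic for a Pearson correlation r from n observations (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def f_from_r2(r2, n, k):
    """Overall F statistic for a regression with k predictors (df = k, n - k - 1)."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

n_participants = 20  # assumption: inferred from t(18) and F(3, 16)

# Correlation between AG->ITS connection strength and mean RT.
print(round(t_from_r(-0.63, n_participants), 2))

# Regression of connection strength on three behavioral effect weights.
print(round(f_from_r2(0.62, n_participants, 3), 2))
```

Both recomputed values land close to the reported ones, with the small gaps attributable to rounding of *r* and of the explained variance.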

#### **DISCUSSION**

In this study we analyzed effective connectivity among 5 ROIs previously shown to support different information processing components of reading aloud, including orthographic processing, orth-phon conversion, semantic access, and phonological processing. We investigated these components by varying levels of spelling-sound consistency, word frequency, and imageability. Following connectionist models of reading, we predicted that semantics would modulate lexical processing, especially under conditions when word imageability was high and word consistency was low, or when orth-phon mapping was not straightforward, such as in the case when low consistency was coupled with low word frequency.

Consistent with our predictions, effective connectivity analyses suggested that semantic access is an important component of reading aloud. Connectivity of ITS, an area previously implicated in processing word meanings (e.g., Binder et al., 2009; Graves et al., 2010; Whitney et al., 2011), and pSTG, a region involved in computing phonology (e.g., Hickok and Poeppel, 2004, 2007; Graves et al., 2007, 2008; Price, 2012), varied as a function of orthographic, phonological, and semantic characteristics of the stimulus words.

Our results are in line with accounts of reading aloud where phonology is computed interactively from semantic and orth-phon projections. The central role for ITS in mapping from semantics to phonology in our analysis is consistent with the Graves et al. (2010) findings, which showed that activity in this region was negatively correlated with spelling-sound consistency. Low-consistency words may invoke competing phonological codes and are the most likely to benefit from computational solutions that use semantic information to separate inconsistent words from their consistent word competitors. This assumption is in line with findings showing a clear link between activity in ITS/anterior MTG and single-word semantics (e.g., Jobard et al., 2003; Hickok and Poeppel, 2004; Binder et al., 2009; Whitney et al., 2011).

The role of AG, another semantic area (e.g., Binder et al., 1997, 2005, 2009; Price and Mechelli, 2005; Mechelli et al., 2007; Binder and Desai, 2011; Price, 2012), seemed to be distinct from that of the ITS. In our analyses, AG received feed-forward projections from pMTG, an area thought to be involved in mapping from spelling to sound (e.g., Jobard et al., 2003; Sandak et al., 2004), and their connectivity remained unchanged across all conditions. This pattern may reflect a more general activation of semantic and associative information in the AG (Graves et al., 2012). Furthermore, a closer look at the connectivity between AG and ITS revealed considerable individual variation in the strength of this connection related to participants' performance on the task, suggesting that AG engagement may be modulated by reading ability and dominant modes of processing by individual readers.

Regarding the cognitive models of reading, it is important to acknowledge that both the connectionist and dual-route models are incomplete with respect to relevant brain data. Indeed, the fact that the Taylor et al. (2013) meta-analysis explicitly addressed the role of effort in interpreting brain activations represents an innovative departure from the models they considered. Although connectionist reading models do offer specific, quantitative accounts of how differences in division of labor across reading pathways arise as a result of differences in stimulus properties, neither model deals with more domain-general issues such as how responses are selected or how working memory demands may be greater for some conditions than others. No doubt further exploring such issues will be an important direction for future studies.

Other models of functional and effective connectivity have also been described in the literature. In a finding largely consistent with the present study, Segal and Petrides (2013) reported that AG and its connectivity patterns play a central role in reading. They showed that functional connectivity between AG and MTG, STG, FG and IFG increased during reading words relative to viewing pictures. Similarly, Levy et al. (2009) used dynamic causal modeling (DCM) to model *effective* connectivity among brain regions. They showed increased connectivity between middle occipital cortex, posterior ventral occipitotemporal cortex, and left parietal lobe (white matter underlying the intraparietal sulcus) during reading word and pseudoword stimuli. Pseudowords and words differed primarily in that pseudowords had an intermediate connection with ventral occipitotemporal cortex, between middle occipital and parietal cortex, whereas effective connectivity for words did not include this ventral occipitotemporal region. Moreover, the strength of the connection from left middle occipital to left parietal cortices was correlated with several measures of reading performance. Overall, the Levy et al. study was similar to the current one in that it examined effective connectivity and individual differences in the neural reading network. However, several methodological differences (e.g., in tasks, contrasts, and analyses) may have contributed to the lack of overlap in results between the studies. Multiple word reading pathways were also reported in a DCM analysis by Richardson et al. (2011). They identified three pathways connecting an occipital visual area with temporal phonological and semantic areas. Two of the pathways traversed the ventral occipito-temporal cortex going either directly to anterior superior temporal sulcus (STS) or indirectly via posterior STS. The third pathway connected occipital visual cortex with posterior and anterior STS. Even though the Richardson et al. 
(2011) model did not include AG, because this area was not activated consistently across participants as a function of their contrast of interest, they showed a pattern of connectivity similar to ours where regions subserving orthographic and phonological processing are connected with regions supporting semantic processing (aSTS). We did not examine the role of aSTS in reading; however, a pathway going from occipitotemporal cortex to aSTS would necessarily pass through ITS, an ROI used in our analysis. This highlights the underlying similarity between our effective connectivity models and that of Richardson et al. On the other hand, the heterogeneity across models may stem from different cognitive processing models assumed by different groups, as well as from limitations on the number of ROIs, connections, and connection directions imposed by the DCM approach. Unlike DCM, the IMaGES approach used here does not incorporate biophysical assumptions. Instead, it offers a computationally tractable solution for discovering (rather than pre-specifying) connections and their directions among numerous ROIs (Ramsey et al., 2010). One promising avenue of future research might be to take multiple network-level neural models of reading that incorporate different cognitive processing assumptions and test them within the same effective connectivity framework, such as IMaGES.

Overall, we take the current results to support the single-process connectionist account, with the caveat that this study is a preliminary investigation into the patterns of effective connectivity supporting reading aloud. The continuous design of the dataset we analyzed here was not optimized for the factorial analysis employed, and future studies are planned to address this limitation. Nevertheless, we found several qualitative and quantitative shifts in the directions of connectivity when contrasting high and low levels of spelling-sound consistency, imageability, and word frequency that paralleled independent predictions from connectionist models of reading aloud.

#### **EFFECTS OF CONSISTENCY AND IMAGEABILITY**

The comparison between high- and low-imageability words revealed a main effect of this variable on patterns of effective connectivity. Regardless of consistency, for highly imageable words the ITS exerted a causal influence on pSTG (**Figures 2B,D**), whereas less imageable words showed the opposite pattern (pSTG→ITS, **Figures 2A,C**). One way to distinguish the two classes of words is to hypothesize that high-imageability words have more salient meanings, operationalized as the number of defining semantic features (e.g., Plaut and Shallice, 1993). When the task is to read aloud as quickly and accurately as possible, the reading system uses whatever information is at its disposal for each word. In the case of highly imageable words, semantic information is particularly salient, and several kinds of semantic information, including imageability, have been shown to facilitate reading aloud (Strain et al., 1995; Hino and Lupker, 1996; Lichacz et al., 1999; Strain and Herdman, 1999; Hino et al., 2002; Shibahara et al., 2003; Balota et al., 2004; Rodd, 2004; Woollams, 2005; Yap et al., 2012). Here we have shown that such facilitation by semantics may be neurally instantiated as an area associated with semantics (ITS) exerting a causal influence on an area associated with phonology (pSTG).

#### **EFFECTS OF CONSISTENCY AND FREQUENCY**

In the analysis of the effects of consistency and frequency, pMTG was found to influence pSTG and this pattern was consistent across all conditions. In contrast, connectivity of ITS and pSTG varied. For example, we observed an interaction of word consistency and frequency, such that ITS influenced activity in pSTG for high-frequency words, regardless of consistency, and for low frequency words of low consistency (**Figures 3A,B**, and **D**). The opposite pattern, effective connectivity from pSTG to ITS, was found for high-consistency, low-frequency words (**Figure 3C**). Thus, we see evidence for the predicted switch between a lexical semantic processing area exerting influence on a phonological processing area for low-consistency words and the opposite pattern for high-consistency words, but only for words of low frequency. This may be a neural instantiation of a behavioral pattern. Psycholinguistic studies typically report an interaction between consistency and frequency, such that low-consistency words elicit longer RTs than high-consistency words, particularly if they are also of low frequency (Seidenberg et al., 1984; Seidenberg, 1985; Waters and Seidenberg, 1985; Taraban and McClelland, 1987; Andrews, 1992). Similarly, patients with surface dyslexia due to semantic dementia typically commit errors when reading aloud low-consistency words, especially when they are of low frequency (e.g., "sew" pronounced as "sue"; Patterson and Hodges, 1992; Woollams et al., 2007). Such an interaction is also inherent in the single-process model, as shown by a derived analytic solution describing the output state of a phoneme unit that should be activated for a given input word (equation 17 in Plaut et al., 1996). The activation state of the unit depends on input from the semantic and phonological pathways. This equation is a formalization of the pattern seen for word frequency and consistency. 
When the value of one variable is low, the value of the other variable matters more. This is paralleled in our analyses of effective connectivity, where consistency exerts an effect in the expected direction for lexical semantic and phonological areas, but only for low-frequency words. Why we did not see the complementary interaction of an enhanced effect of word frequency for low-consistency words is unclear. In addition, the equation discussed above from Plaut et al. (1996) predicts there would be little or no effect of word frequency for high-consistency words, yet our analyses did reveal frequency-related differences within the high-consistency condition (**Figures 3C,D**). Future work will be aimed at clarifying these results by exploring the role of imageability in producing these patterns of effective connectivity for words of high and low consistency and frequency. In the present study, insufficient numbers of trials were available to explore this three-way interaction. As noted above, we are currently planning a follow-up functional neuroimaging study of reading aloud in which the stimulus parameters of interest will be manipulated factorially (rather than continuously, as in the current study), and we expect new results to shed additional light on network-level neural systems that may correspond to network-level models of reading aloud.

#### **EFFECTS OF FREQUENCY AND IMAGEABILITY**

In the frequency by imageability analysis, we replicated the main effect of imageability obtained in the consistency by imageability analysis (**Figure 2**). Specifically, ITS drove activity in pSTG when word imageability ratings were high (**Figures 4B,D**), and the effective connectivity direction was reversed when imageability was low (**Figures 4A,C**). As described above, this pattern of results is interpreted as reflecting a neural correlate of the contribution of semantics to mapping from orthography to phonology.

In addition, reading high-frequency words of high imageability also resulted in modulation of pSTG activity by pFG. The pattern of effective connectivity obtained in this condition suggests that when both word frequency and imageability are high, the input from regions processing semantics (ITS) and regions subserving orthography-phonology mapping (pFG) converges on regions computing phonology (pSTG). This suggests that the highly efficient reading of these words is achieved by the strong convergence of multiple streams of relevant information.

For words of low frequency and high imageability, an additional effective connection emerged from AG to ITS (**Figure 4B**). This is the only condition in which both semantic ROIs were effectively connected to each other, and it is the condition in which semantic information would be both available and beneficial to reading aloud. Importantly, the connectivity between these areas was associated with individual differences in performance measures. Stronger AG→ITS connections were associated with faster responses to high-consistency words, and faster responses to high-imageability words (**Figure 6**). In addition, stronger AG→ITS connections were unexpectedly associated with faster responses to low-frequency words. However, this latter effect seems to have been driven by a single outlier (see below). The strength of AG→ITS connectivity also facilitated RT on correct trials, pointing to a more general role of connectivity between these areas in reading performance. AG has often been associated with skilled reading. Decreased activity in AG (e.g., Shaywitz and Shaywitz, 2005) and pSTG (e.g., Simos et al., 2002; Shaywitz and Shaywitz, 2005; Maisog et al., 2008; Richlan et al., 2009), an area downstream from AG in **Figure 4B**, has been shown in studies comparing cases of developmental dyslexia to typical readers. Individuals with dyslexia (reading disability) typically exhibit poor phonological awareness, poor verbal working memory, and impaired lexical retrieval during visual word recognition (e.g., Shaywitz and Shaywitz, 2005). This is revealed by overregularization of spelling-sound mapping or by an inability to read novel letter-sound combinations (Castles and Coltheart, 1993; Manis et al., 1996; Milne et al., 2003). Our analysis shows variations in AG→ITS connectivity associated with reading RT, suggesting that greater contributions from semantic areas are associated with better reading performance.

We see the current study as providing compelling evidence for the use of network-level effective connectivity analyses of functional neuroimaging data to reveal the information-processing dynamics of reading aloud. This approach also seems to offer clear evidence for modulation of division of labor by word properties of a sort not previously reported in the literature. However, we caution that the current study should be considered somewhat preliminary. The main reason for this (as mentioned above) is that, for purposes of the effective connectivity analyses, we applied a factorial approach to a continuous design study, thereby making it necessary to exclude intermediate-value observations. In addition, it seems that the word frequency result at the bottom of **Figure 6** is being driven by an apparent outlier in the bottom left of the plot. We chose not to remove this data point because there was no obvious numerical criterion to apply that did not also reduce the fit of the overall multiple linear regression model. However, if data from this participant is simply removed, the apparent association between the behavioral word frequency effect and AG→ITS connection strength is no longer reliable, while the associations with consistency and imageability, though somewhat reduced in effect size, remain reliable in the same direction.

#### **CONCLUSION**

In the present study we used effective connectivity analyses to advance the understanding of brain processes that support reading. Although it is often assumed that a network of brain regions must work together to accomplish complex cognitive tasks such as reading (Fiez and Petersen, 1998; Jobard et al., 2003; Indefrey and Levelt, 2004; Price, 2012), and that there is some degree of variability in the function of this network that corresponds to individual differences (Bolger et al., 2008; Seghier et al., 2008; Jobard et al., 2011; Welcome and Joanisse, 2012; Graves et al., submitted), the present study constitutes a detailed, data-driven evaluation of these assumptions. Here we showed that, under conditions predicted by connectionist models of reading aloud, semantics supports mapping from spelling to sound and that a range of individual variability exists in how much participants engage the semantic system. We also found that a core network of brain regions supports reading aloud, regardless of stimulus properties. Future work using a robust and flexible framework for effective connectivity analyses, such as IMaGES, applied across studies testing different cognitive models, may help clarify network-level neural accounts of reading.

#### **ACKNOWLEDGMENTS**

This work was supported by a National Institutes of Health grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (grant number K99/R00 HD065839) to William W. Graves.

#### **REFERENCES**

*Psychol. Learn. Mem. Cogn.* 8, 234–254.

*Cogn. Sci.* 15, 527–536. doi: 10.1016/j.tics.2011.10.001


and Prieto, T. (1997). Human brain language areas identified by functional magnetic resonance imaging. *J. Neurosci.* 17, 353–362.

Binder, J. R., Medler, D. A., Desai, R., Conant, L. L., and Liebenthal, E. (2005). Some neurophysiological constraints on models of word naming. *Neuroimage* 27, 677–693. doi: 10.1016/j.neuroimage.2005.04.029


left posterior superior temporal gyrus participates specifically in accessing lexical phonology. *J. Cogn. Neurosci.* 20, 1698–1710. doi: 10.1162/jocn.2008.20113


the large-scale structure of brain function by classifying mental states across individuals. *Psychol. Sci.* 20, 1364–1372. doi: 10.1111/j.1467-9280.2009.02460.x


contrast of different training conditions. *Cogn. Affect. Behav. Neurosci.* 4, 67–88. doi: 10.3758/CABN.4.1.67


normal following successful remedial training. *Neurology* 58, 1203–1213. doi: 10.1212/WNL.58.8.1203


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2013; accepted: 12 August 2013; published online: 02 September 2013.*

*Citation: Boukrina O and Graves WW (2013) Neural networks underlying contributions from semantics in reading aloud. Front. Hum. Neurosci. 7:518. doi: 10.3389/fnhum.2013.00518*

*This article was submitted to the journal Frontiers in Human Neuroscience. Copyright © 2013 Boukrina and Graves. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Unimodal and multimodal regions for logographic language processing in left ventral occipitotemporal cortex

#### *Yuan Deng<sup>1</sup>\*, Qiuyan Wu<sup>1</sup> and Xuchu Weng<sup>1,2</sup>*

*<sup>1</sup> Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China <sup>2</sup> Center for Cognition and Brain Disorders, Hangzhou Normal University, Hangzhou, China*

#### *Edited by:*

*Gui Xue, Beijing Normal University, China*

#### *Reviewed by:*

*Jason D Zevin, Weill Cornell Medical College, USA*

*Chuansheng Chen, University of California, Irvine, USA*

#### *\*Correspondence:*

*Yuan Deng, Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, 16# Lincui Road, Chaoyang District, Beijing, 100101, China e-mail: dengy@psych.ac.cn*

The human neocortex appears to contain a dedicated visual word form area (VWFA) and an adjacent multimodal (visual/auditory) area. However, these conclusions are based on functional magnetic resonance imaging (fMRI) studies of alphabetic languages, whose clear grapheme-to-phoneme correspondence (GPC) rules make it difficult to disassociate visual-specific processing from form-to-sound mapping. In contrast, the Chinese language has no clear GPC rules. Therefore, the current study examined whether native Chinese readers also have the same VWFA and multimodal area. Two cross-modal tasks, phonological retrieval of visual words and orthographic retrieval of auditory words, were adopted. Different task requirements were also applied to explore how different levels of cognitive processing modulate activation of putative VWFA-like and multimodal-like regions. Results showed that the left occipitotemporal sulcus (LOTS) responded exclusively to visual inputs and an adjacent region, the left inferior temporal gyrus (LITG), showed comparable activation for both visual and auditory inputs. Surprisingly, processing levels did not significantly alter activation of these two regions. These findings indicated that there are both unimodal and multimodal word areas for non-alphabetic language reading, and that activity in these two word-specific regions is independent of task demands at the linguistic level.

**Keywords: fMRI, visual word form area, Chinese, multimodal, task modulation**

# **INTRODUCTION**

Extensive evidence from imaging studies has shown that a region in the human left extrastriate visual cortex responds selectively to written letters over other complex visual stimuli, such as line drawings, faces, and houses, and that these responses are highly invariant with changes in visual script or font (Cohen et al., 2000, 2002; Cohen and Dehaene, 2004; Dehaene et al., 2001, 2002, 2005, 2010; Szwed et al., 2011). This region, located lateral to the middle part of the left fusiform gyrus, was labeled the visual word form area (VWFA; Cohen et al., 2000; Dehaene and Cohen, 2011). However, controversies remain about this region's function in reading and reading development. The main point of debate is whether the specialization of the VWFA is domain-specific (Dehaene and Cohen, 2011) or process-specific (Price and Devlin, 2011).

In contrast to the view of visual-specific selectivity, the interactive view suggests that this region may act as an interface between sensory input and higher level associations (e.g., mapping visual word forms to sounds and meanings) (Price and Devlin, 2011), as functional connectivity studies have shown that the left fusiform gyrus interacts extensively with other regions of the reading network. When there was a strong demand for linguistic judgment, activation of this region was highly correlated with activation of regions associated with semantic and phonological processing (Bitan et al., 2005; Wang et al., 2011) as well as visuospatial processing of logographic writing systems (Deng et al., 2012). Evidence from lexical training studies has shown that the left mid fusiform region is critical for new script learning (Hashimoto and Sakai, 2004; Deng et al., 2008; Dehaene et al., 2010) and that activation of this region increases during phonological and semantic learning of a new script (Sandak et al., 2004; Xue et al., 2006). A recent publication found that congenitally blind subjects exhibited VWFA activation when selectively doing a letter-soundscapes task, suggesting that the VWFA may be responsible for linking letter shape to phonology (Striem-Amit et al., 2012).

Cohen et al. (2004) further verified the exclusive response of the VWFA to visual inputs by directly examining the modality effect in the left temporo-occipital region, and proposed that an adjacent region, the lateral inferotemporal multimodal area (LIMA), showed comparable activation for both visual and auditory inputs. A similar pattern of activation was found by Jobard et al. (2003, 2007), who labeled this multimodal area the basal temporal language area (BTLA). However, because the alphabetic writing systems used in these studies (English and French) have grapheme-to-phoneme correspondence (GPC) rules, it is difficult to dissociate visual-specific processing from form-to-sound mapping in VWFA activation for visual word recognition. These GPC rules may also contribute to the distinct spatial organization of unimodal and multimodal regions in the left inferotemporal cortex of alphabetic language speakers.

Compared to alphabetic language systems, a typical logographic language (such as Chinese) does not follow GPC rules for word form-to-sound mapping. Chinese characters map onto phonology at the mono-syllable level, and the relationship is usually arbitrary. For example, the character " " ("contribute") is pronounced /xian4/ (the number refers to tone); no visual component of this character corresponds to a phoneme of the character's pronunciation. This lack of systematic mapping between visual form and phonology makes Chinese script a unique tool to control for the possible confound of sub-lexical form-to-sound processing by the VWFA and/or associated cortical regions.

Thus, by taking advantage of this unique characteristic of the Chinese language, the current study aimed to examine the following issues regarding the function of the VWFA. First, without simultaneously changing phonological affordances of the stimuli, can different levels of phonological processing of Chinese characters influence VWFA activation? Second, is the VWFA a unimodal region in logographic word reading as it is in languages with GPC rules? Is there also a multimodal region in the ventral temporal cortex for Chinese character reading? Is the VWFA activated during auditory word processing when requiring access to orthographic representations in a logographic language system? Finally, can different orthographic retrieval requirements influence activation of VWFA and/or the putative multimodal region?

In order to answer these questions, the current study employed both visual-to-auditory and auditory-to-visual cross-modality tasks, and modulated task demands for phonological retrieval and character-form retrieval at different sub-lexical levels. If activation of the VWFA is modulated by phonological retrieval at different sub-lexical levels for Chinese reading, it suggests that attention to sub-lexical processing can indeed confound the response properties of the VWFA regardless of the form-to-sound mapping principle. The opposite result suggests that form-to-sound mapping in the VWFA happens at the mono-syllable level, at least for reading Chinese characters. Moreover, if there is a distinct subregion in the left inferotemporal region that shows comparable activation for both auditory and visual inputs, it may suggest a common multimodal region across writing systems. To our knowledge, this is the first study to directly examine the universality of unimodal/multimodal regions in the ventral temporal cortex.

#### **MATERIALS AND METHODS**

#### **SUBJECTS**

Fifteen native Chinese speakers (19–25 years old) participated in this study. All participants were undergraduate or graduate students. Participants were right-handed and had normal hearing and normal or corrected-to-normal vision. They gave informed consent in accordance with guidelines set by the Beijing MRI Center for Brain Research, China.

#### **TASKS AND MATERIALS**

#### *Experimental design*

visual-based tasks and Chinese characters for auditory-based tasks are displayed in parentheses.

As shown in **Figure 1**, a 2 input-modality (visual and auditory) × 2 processing-level (local and global) within-subject design was adopted. There were four experimental tasks: syllabic-unit judgment (local-level) for visual words (Lv), tone judgment (global-level) for visual words (Gv), stroke judgment (local-level) for auditory words (La), and structure judgment (global-level) for auditory words (Ga).
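The 2 × 2 within-subject design can be written out as a small lookup table. The sketch below only restates the factor levels, task labels, and judgments named in the text; the variable names are illustrative and are not taken from the original study materials.

```python
from itertools import product

# Factor levels as named in the design description.
modalities = ["visual", "auditory"]
levels = ["local", "global"]

# Task label and judgment type for each design cell, as given in the text.
tasks = {
    ("visual", "local"): ("Lv", "syllabic-unit judgment"),
    ("visual", "global"): ("Gv", "tone judgment"),
    ("auditory", "local"): ("La", "stroke judgment"),
    ("auditory", "global"): ("Ga", "structure judgment"),
}

# Enumerate every cell of the 2 x 2 design.
design = [(m, l, *tasks[(m, l)]) for m, l in product(modalities, levels)]

for modality, level, label, judgment in design:
    print(f"{label}: {modality} input, {level} level, {judgment}")
```

Each participant completes all four cells, which is what makes the design within-subject.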

Furthermore, a perceptual task was used in an independent scanning session (localizer task) in order to localize the word-specific region for Chinese scripts in the occipitotemporal area (Ma et al., 2011).

#### *Stimuli in visual-based tasks*

One hundred and sixty single-character Chinese words were selected from a pool of the most commonly used characters according to the Modern Chinese Frequency Dictionary (see Supplementary Material). Half of them (80) were used for each task. There was only one phonological correspondence for each visual character, i.e., these characters were not polyphones. The average stroke number of these characters was 9.55 (*SD* = 2.40), suggesting a medium visual complexity. For both Lv and Gv tasks, all characters were presented in black against a white background in Song font (100 × 100 pixels).

In the Lv task, participants determined whether the pronunciation of a character contains the syllabic unit "an." In the Gv task, participants determined whether the character has a falling tone (the fourth tone in Chinese). For both tasks, participants made the yes/no decision by pressing the right or left button on a response box. A perceptual task served as a control. In this task, participants determined whether a caret-like character (/\) was present on the left of a line drawing (/\\) or on the right (//\). They made the left/right decision by pressing the left or right button. There were a total of one hundred and sixty line drawings, eighty for the Lv task and eighty for the Gv task.

#### *Stimuli in auditory-based tasks*

The Chinese language contains a very large number of homophones, i.e., sounds represented by several different (visual) word forms. A key consideration in selecting stimuli for tasks of auditory-based word-form judgment was to make sure that only one specific visual correspondence (character) could be retrieved for each auditory word. To this end, a group of characters that have no homophones or only a few low-frequency homophones were chosen. Then, another 30 subjects from the same sample group, who did not participate in the functional magnetic resonance imaging (fMRI) experiment, were asked to write down the character(s) that first came to their mind when they listened to a speech sound. Only those speech sounds that showed high consistency and accuracy (recognizability) were chosen as final stimuli. Due to these limitations, a total of 80 Chinese single-character words were selected for both tasks, i.e., the La and Ga tasks used the same set of stimuli (see Supplementary Material). According to the dictionary, the majority of final stimuli had no homophones, while some characters (24 out of 80) had a few homophones with extremely low frequency. All stimuli were presented in an auditory format. All auditory words were recorded in a soundproof booth using a digital recorder and a high-quality stereo microphone. A native Chinese woman read aloud each pronunciation in isolation. Sound duration was normalized to 800 ms and all sounds were presented at the same sound intensity (loudness).

**Table 1 | Brain regions showing significant activation for visual and auditory tasks compared to control tasks.**

*L, left hemisphere; R, right hemisphere; BA, Brodmann's areas. The first area for multiple regions indicates the peak of activation in the cluster. Talairach coordinates; Significance at p < 0.05, FDR corrected (cluster size > 10).*

In the La task, participants determined whether the written form of an auditory word contains a specific " " (dot) stroke. In the Ga task, participants determined whether the written structure of an auditory word has a left-right structure, i.e., whether two major visual components of a character are horizontally configured. Again, they made the yes/no decision by pressing the right or left button on the response box. A perceptual task served as a control. In this task, participants were asked to judge whether the volume of the tone was low, and made the yes/no decision by pressing the right or left button.

#### *Validation of experimental tasks*

In order to test the validity of these tasks (i.e., whether different requirements for sub-lexical processing induce different psycholinguistic processing levels), a pilot behavioral study was conducted. Ten subjects from the same sample group, who did not participate in the fMRI experiment, were asked to complete all four tasks. Results showed that subjects performed significantly faster in the global condition (mean RT: 1477.9 ms) than in the local condition (mean RT: 1887.2 ms) in auditory-based tasks [*t*(9) = −10.938, *p* < 0.001]. For visual-based tasks, subjects also demonstrated a consistent trend for better performance in the global condition (mean RT: 1240.6 ms) compared to the local condition (mean RT: 1363.9 ms). Faster performance in the global condition is in accordance with the classic finding of "global precedence" in the domains of visual perception (Navon, 1977), attention (Miller, 1981), and mental imagery (Qiu et al., 2009; Niu and Qiu, 2013), indicating that the tasks employed do indeed require different levels of cognitive processing. In addition, although the global-local difference in visual tasks was not as large as in the auditory tasks, brain imaging studies have consistently found that phoneme/syllabic-unit processing activates a different neural network compared to suprasegmental processing (e.g., tones) in Chinese (Gandour et al., 2003; Tong et al., 2008; Li et al., 2010). A brain connectivity study also found that distinct brain networks were engaged by global and local information processing for mental imagery (a paradigm similar to our auditory-based task) (Li et al., 2008).

#### **Table 2 | Brain regions showing significant differences in activation between visual and auditory tasks.**

*L, left hemisphere; R, right hemisphere; BA, Brodmann's areas. The first area for multiple regions indicates the peak of activation in the cluster. Talairach coordinates; Significance at p < 0.001 corrected at cluster level.*

#### *VWFA localizer*

The stimuli and procedures were adapted from a previous study (Ma et al., 2011). Three categories of stimuli, including Chinese characters, faces, and line-drawings, were used. The stimuli were chosen randomly from a pool of 80 during the experiment. Within each trial, the center of each stimulus was slightly shifted from the center of the fixation point and participants were asked to make a judgment about whether the center of the picture was to the left or the right compared to the fixation point by pressing the left/right button.

#### **fMRI PROCEDURES AND TIMING**

All participants practiced a short version of each experimental task before the fMRI scanning session. Different stimuli were used in the practice and the fMRI sessions. There were a total of six functional scanning runs for each subject, including four runs for experimental tasks (Lv, Gv, La, Ga) and two runs for localizer tasks.

For all four experimental runs, a block design was used for stimulus presentation. There was one run for each task. The task order was counterbalanced across subjects. Each run consisted of four experimental task blocks and four control task blocks. Each trial lasted 2 s. There were 20 trials per block, and a 2 s instruction trial before each block, so each experimental run lasted 336 s.
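The stated run length follows directly from the block structure. As a minimal check using only the numbers given above (the constant and function names are illustrative):

```python
TRIAL_S = 2             # duration of one trial in seconds
TRIALS_PER_BLOCK = 20   # trials in each block
INSTRUCTION_S = 2       # instruction trial shown before each block
TASK_BLOCKS = 4         # experimental task blocks per run
CONTROL_BLOCKS = 4      # control task blocks per run


def experimental_run_duration_s():
    """Total run length: every block is one instruction trial plus 20 trials."""
    n_blocks = TASK_BLOCKS + CONTROL_BLOCKS
    block_s = INSTRUCTION_S + TRIALS_PER_BLOCK * TRIAL_S
    return n_blocks * block_s


print(experimental_run_duration_s())  # 8 blocks x 42 s = 336
```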

**FIGURE 2 | Brain activation maps for both visual and auditory tasks.** Blue indicates activations for visual tasks; red indicates activations for auditory tasks; purple indicates overlapping activations for both tasks. *p* < 0.05, FDR corrected, greater than 10 voxels.

After the four experimental task runs, there were two identical localizer runs. Each localizer run consisted of 3 blocks repeated three times, one block for each of the three stimulus categories (characters/faces/line-drawings). The block order for the three categories was pseudo-randomized, with a 20 s fixation interval between successive blocks. Each block involved the presentation of 20 images (each for 250 ms), interleaved with a central fixation cross shown for 750 ms. Therefore, each localizer run lasted 380 s.
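The localizer run length can be checked in the same way. Nine 20 s blocks account for 180 s, so the stated 380 s implies ten 20 s fixation intervals. The sketch below therefore assumes that a fixation interval also precedes the first block and follows the last one; this is an inference from the arithmetic, not something the text states explicitly, and the names are illustrative.

```python
IMAGE_S = 0.250           # presentation time of each image
FIXATION_CROSS_S = 0.750  # fixation cross between images
IMAGES_PER_BLOCK = 20
N_BLOCKS = 3 * 3          # three categories, each repeated three times
REST_S = 20               # fixation interval between successive blocks


def localizer_duration_s(n_rest_periods):
    """Run length for a given number of 20 s fixation intervals."""
    block_s = IMAGES_PER_BLOCK * (IMAGE_S + FIXATION_CROSS_S)
    return N_BLOCKS * block_s + n_rest_periods * REST_S


# Only the eight intervals strictly between blocks:
between_only = localizer_duration_s(N_BLOCKS - 1)
# Assumption: one extra interval before the first and after the last block:
with_flanking = localizer_duration_s(N_BLOCKS + 1)

print(between_only, with_flanking)
```

Under the flanking assumption the total matches the reported 380 s; counting only the intervals strictly between blocks gives 340 s.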

#### **IMAGE ACQUISITION**

Brain images were obtained on a 3T Siemens Trio scanner at the Beijing MRI Center for Brain Research. Participants lay in the scanner with their head position secured with a specially designed vacuum pillow. Participants were asked to hold an optical response box. The head coil was positioned over the participants' head. Participants viewed visual stimuli that were projected onto a screen via a mirror attached to the inside of the head coil and listened to auditory stimuli via earphones.

For the functional imaging runs, a susceptibility weighted single-shot echo planar imaging (EPI) method with blood oxygenation level-dependent (BOLD) contrast was used. The following scan parameters were used: *TE* = 35 ms, flip angle = 90°, matrix size = 64 × 64, field of view = 24 cm, slice thickness = 4 mm, number of slices = 32, *TR* = 2000 ms. In addition, a high resolution, T1-weighted 3D image was acquired (3D MPRAGE; 1.33 × 1 × 1 mm<sup>3</sup> resolution, 144 slices and 1.33 mm slice thickness with no gap).
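The functional voxel size and the number of volumes per run are not reported directly but can be derived from the parameters above. The sketch below assumes a square field of view, no gap between functional slices, and no discarded dummy volumes, none of which is stated in the text; the 336 s run length is the value given for the experimental runs.

```python
FOV_MM = 240.0          # field of view: 24 cm
MATRIX = 64             # 64 x 64 acquisition matrix
SLICE_MM = 4.0          # slice thickness
N_SLICES = 32
TR_S = 2.0              # repetition time: 2000 ms
RUN_S = 336             # duration of one experimental run

# In-plane voxel edge, assuming a square field of view.
in_plane_mm = FOV_MM / MATRIX

# Voxel volume in cubic millimetres.
voxel_volume_mm3 = in_plane_mm ** 2 * SLICE_MM

# Slab thickness covered by the slices, assuming no inter-slice gap.
coverage_mm = N_SLICES * SLICE_MM

# Volumes per experimental run, assuming none are discarded.
volumes_per_run = int(RUN_S / TR_S)

print(in_plane_mm, voxel_volume_mm3, coverage_mm, volumes_per_run)
```

Under these assumptions each functional voxel measures 3.75 × 3.75 × 4 mm, and each experimental run yields 168 volumes.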

**FIGURE 3 | Significant activation of the left occipitotemporal cortex for each subject during localizer runs for Chinese characters compared to faces.** The two activation maps on the left (upper: coronal view; lower: horizontal view) are from subject 5. Maps on the right are each subject's activation map superimposed on that individual's anatomical map. Blue circles indicate activation of the left occipitotemporal sulcus (LOTS) and green circles indicate activation of the left inferior temporal gyrus (LITG).

#### **DATA ANALYSIS**

Data analysis was performed using BrainVoyager QX 2.0 software (Brain Innovation; Goebel et al., 2006). Due to technical problems, data from four subjects were excluded from the final analysis. The functional images were preprocessed; preprocessing steps included slice scan timing correction, motion correction with respect to the first volume in the run, and high-pass filtering (2 cycles per series cutoff). Functional data were not smoothed. Preprocessed functional data were then coregistered to high-resolution anatomical images, which in turn were normalized to Talairach space (Talairach, 1988). Normalizations were performed by using a piecewise affine transformation based on manual identification of the anterior and posterior commissures and the edges of cortex along each axis on anatomical data.

Data from all four experimental runs for each participant were entered into a general linear model using a block analysis procedure. Parameter estimates from BOLD contrasts in single-participant models were entered into a random-effects model for all participants to determine whether activation was significant for a contrast at the group level. To reveal overall activation patterns for visual and for auditory stimuli, two tasks of the same modality were combined (Lv and Gv for visual, La and Ga for auditory). The threshold was set at *p* < 0.05 FDR-corrected with a cluster size of 10 voxels or greater. Differences between each condition were also examined by paired *t*-test. Statistical threshold was set at *p* < 0.001 and cluster-size threshold estimation was performed for correction of multiple comparisons.
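For readers unfamiliar with false discovery rate (FDR) thresholding, the following is a minimal sketch of the widely used Benjamini-Hochberg step-up rule. The text does not specify which FDR procedure the analysis software applied, so this illustrates the general idea rather than the exact computation used here. The p-values are invented for the example, and the additional cluster-size criterion is not implemented.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag tests that survive FDR control at level q (Benjamini-Hochberg).

    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    # Every test ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


# Hypothetical voxel-wise p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, q=0.05)
print(flags)
```

In this toy example only the two smallest p-values survive, even though five of them fall below an uncorrected 0.05 threshold.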

Based on the two localizer runs, regions-of-interest (ROIs) in the ventral visual pathway for visual word-form processing were selected. Following a previous study (Ma et al., 2011), the contrast between Chinese characters and faces was used to localize the region showing higher activation for words (FDR-corrected, *p* < 0.05). At the single subject level, two regions in the left ventral temporal region showed significantly greater activation for Chinese characters, and this activation pattern was consistent across subjects. Based on the anatomical location of these activated regions, the following two ROIs were recognized: the left occipitotemporal sulcus (LOTS) and the left inferior temporal gyrus (LITG). Accordingly, each participant's individual ROIs were identified with the exception of one participant who showed a similar cortical activation pattern in response to Chinese characters and faces. The mean estimates of ROI activation (Beta value) for each subject and for each experimental task (Lv, Gv, La, Ga) relative to control tasks were then obtained using the ROI GLM tool in the BVQX package. Finally, these data were entered into a 2 region (LOTS and LITG) × 2 input-modality (visual and auditory) × 2 processing-level (local and global) ANOVA.


**Table 3 | Two regions of interest (ROIs) showing significant activation for Chinese characters compared to faces.**

*LOTS, left occipitotemporal sulcus; LITG, left inferior temporal gyrus; Talairach coordinates; T-test, all p < 0.01.*

#### **RESULTS**

#### **BEHAVIORAL RESULTS**

Due to technical problems, the data from four of the 15 participants were not included in the final analysis. The average accuracies were 95.6% for Lv, 91.5% for Gv, 79.1% for La, and 87.5% for Ga. The reaction times (RTs) were 658 ms for Lv, 696 ms for Gv, 885 ms for La, and 902 ms for Ga. Significant main effects of input modality were found for both accuracy [*F*(1, 10) = 41.827, *p* < 0.001] and RT [*F*(1, 10) = 35.305, *p* < 0.001], indicating that participants responded more accurately and faster in visual tasks than in auditory tasks. A two-way interaction was found for accuracy [*F*(1, 10) = 74.391, *p* < 0.001]. *Post-hoc* analysis showed that participants responded more accurately on global judgments than local judgments for auditory tasks [*t*(10) = 7.149, *p* < 0.001], while there was no significant performance difference between global and local processing for visual tasks. These findings are consistent with results from the pilot behavioral study.
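The direction of the modality effect and of the interaction can be read off the reported cell means. The sketch below computes the modality marginal means and a simple interaction contrast from the group averages given above. It is descriptive only: the *F* and *t* statistics depend on per-subject data that are not reproduced here, and the function names are illustrative.

```python
# Group means per task, as reported in the text.
accuracy_pct = {"Lv": 95.6, "Gv": 91.5, "La": 79.1, "Ga": 87.5}
rt_ms = {"Lv": 658, "Gv": 696, "La": 885, "Ga": 902}


def modality_means(cell):
    """Average over processing level within each input modality."""
    visual = (cell["Lv"] + cell["Gv"]) / 2
    auditory = (cell["La"] + cell["Ga"]) / 2
    return visual, auditory


def interaction_contrast(cell):
    """(global - local) for auditory tasks minus (global - local) for visual tasks."""
    return (cell["Ga"] - cell["La"]) - (cell["Gv"] - cell["Lv"])


acc_visual, acc_auditory = modality_means(accuracy_pct)
rt_visual, rt_auditory = modality_means(rt_ms)
acc_interaction = interaction_contrast(accuracy_pct)

print(round(acc_visual, 2), round(acc_auditory, 2))
print(rt_visual, rt_auditory)
print(round(acc_interaction, 2))
```

The global advantage in accuracy appears only in the auditory tasks (87.5% vs. 79.1%), while the visual tasks show a small difference in the opposite direction, which is the pattern behind the reported interaction.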

#### **IMAGING RESULTS**

**Table 1** shows those areas significantly activated by each modality-specific task (visual and auditory) relative to the corresponding control task. **Table 2** shows direct comparisons of the cortical activation patterns between visual and auditory tasks. **Figure 2** presents areas of overlapping activation for both task modalities. As seen in **Figure 2**, both tasks evoked similar activation patterns in the bilateral superior frontal gyrus, bilateral angular gyrus, and posterior cingulate gyrus. However, phonological judgment of visual inputs (Lv and Gv tasks) significantly activated the bilateral superior parietal region, while orthographic judgment of auditory inputs (La and Ga tasks) significantly activated the bilateral temporoparietal junction, including the right superior temporal gyrus and left supramarginal gyrus.

**Figure 3** presents brain maps showing significant activations for Chinese characters compared to faces in localizer runs for each subject, and the selection of each individual's ROIs (also see **Table 3** for their peak coordinates). These two regions were adjacent, with the loci for LOTS activation more mesial and inferior to those for LITG activation. This pattern was highly consistent across subjects.

**Figure 4** shows the average beta values of both word-specific regions (ROIs) for each experimental task (Lv, Gv, La, Ga). A significant main effect of region [*F*(1, 8) = 13.77, *p* < 0.01] and a region by modality interaction [*F*(1, 8) = 64.23, *p* < 0.001] were found. *Post-hoc* analysis revealed that the LITG was significantly activated for both visual-based and auditory-based tasks, but that LOTS was significantly activated only by visual-based tasks [*F*(1, 8) = 10.64, *p* < 0.05], indicating that LOTS may be a modality-specific region, while LITG may be a multimodal region. Surprisingly, there were no significant main effects or two- or three-way interactions involving the processing-level factor, suggesting that different levels of linguistic processing, either phonological or orthographic, did not significantly modulate activation level within either ROI.

#### **DISCUSSION**

The current study took advantage of the unique characteristics of the Chinese writing system to examine the functional properties of the VWFA. Current findings showed that there were two regions in the left ventral occipitotemporal cortex showing selective activation for Chinese characters. One region was the LOTS and the other was the LITG. However, they responded differently to inputs depending on modality. The LOTS responded exclusively to visual inputs, while the LITG showed comparable responses to both visual and auditory inputs. Accordingly, the LOTS may serve as a modality-specific region and can be regarded as the VWFA for Chinese reading, while the LITG may serve as a multimodal region analogous to the LIMA/BTLA.

This activation pattern for Chinese processing coincides with findings from previous studies on alphabetic languages (Jobard et al., 2003, 2007; Cohen et al., 2004). However, the loci of the modality-specific and multimodal regions were anatomically distinct from the locations reported in previous studies on alphabetic languages. In the current study, LOTS activation (Talairach coordinates, TC -32, -47, -9) was slightly mesial and superior to the VWFA identified in previous studies on alphabetic languages (Cohen et al., 2000, 2002, TC -42, -57, -15; Jobard et al., 2007, TC -48, -56, -12). The multimodal region (LITG, TC -45, -54, -1) was also slightly mesial and superior to the corresponding region found in previous studies on alphabetic languages (Cohen et al., 2004, TC -58, -56, -8; Jobard et al., 2007, TC -50, -44, -10).
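The spatial relations described above can be verified from the coordinates themselves. The sketch below computes per-axis offsets and Euclidean distances between the reported peaks. It relies on the Talairach convention that x runs left to right (so a less negative x in the left hemisphere is more mesial), y runs posterior to anterior, and z runs inferior to superior. The dictionary keys and function names are illustrative.

```python
import math

# Peak Talairach coordinates (mm) as quoted in the text.
peaks = {
    "LOTS, current study": (-32, -47, -9),
    "VWFA, Cohen et al. (2000, 2002)": (-42, -57, -15),
    "VWFA, Jobard et al. (2007)": (-48, -56, -12),
    "LITG, current study": (-45, -54, -1),
    "multimodal, Cohen et al. (2004)": (-58, -56, -8),
    "multimodal, Jobard et al. (2007)": (-50, -44, -10),
}


def compare(a, b):
    """Describe where left-hemisphere peak a lies relative to peak b."""
    dx, dy, dz = (p - q for p, q in zip(a, b))
    return {
        "offset_mm": (dx, dy, dz),
        "mesial": dx > 0,      # less negative x is closer to the midline
        "anterior": dy > 0,
        "superior": dz > 0,
        "distance_mm": math.dist(a, b),
    }


pairs = [
    ("LOTS, current study", "VWFA, Cohen et al. (2000, 2002)"),
    ("LOTS, current study", "VWFA, Jobard et al. (2007)"),
    ("LITG, current study", "multimodal, Cohen et al. (2004)"),
    ("LITG, current study", "multimodal, Jobard et al. (2007)"),
]
results = {pair: compare(peaks[pair[0]], peaks[pair[1]]) for pair in pairs}

for (ours, theirs), r in results.items():
    print(f"{ours} vs {theirs}: offset {r['offset_mm']}, "
          f"distance {r['distance_mm']:.1f} mm")
```

All four comparisons come out mesial and superior, in line with the description in the text. The offsets along the anterior-posterior axis are less uniform across comparisons.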

Even among studies on Chinese language processing, the location of the VWFA has been inconsistent. Three recent studies employing the same localizer technique as used in the current study reported VWFA locations relatively close to that reported here (TC -38, -49, -12 in Ma et al., 2011; TC -45.4, -51.5, -9.1 in Bai et al., 2011; TC -43.8, -55.6, -8.8 in Xu et al., 2012). In contrast, a meta-analysis concluded that the coordinates of the VWFA for Chinese characters deviated less than 5 mm in each dimension compared to those for English words, suggesting a consistent localization of the VWFA across writing systems (Bolger et al., 2005). Moreover, other studies have also localized the VWFA for Chinese characters closer to that for alphabetic languages (Xue et al., 2006; Liu et al., 2008; Mei et al., 2010; Song et al., 2012). To our knowledge, no similar findings regarding an LITG multimodal region for the processing of Chinese characters have been reported in the literature.

In summary, although the current findings indicate that there is a functional VWFA and a lateral inferior temporal multimodal region for both alphabetic and logographic writing systems (functional reproducibility), these regions may occupy slightly different regions of the cortex (i.e., no anatomical reproducibility). Deviation in VWFA locations could reflect differences in the visual features of different writing systems, the principles of form-to-sound mappings, top-down modulation, and/or task requirements at the linguistic level. Therefore, future studies should directly compare processing of alphabetic and logographic characters in bilingual subjects to explore different organizational patterns in the left ventral temporal cortex and the underlying mechanisms (e.g., whether the VWFA loci differ due to different processing requirements).

In addition, Cohen et al. (2004) have proposed the anterior part of the superior temporal sulcus (STS) as a possible unimodal auditory area. In the current study, auditory-based tasks exclusively activated the left supramarginal gyrus and the right posterior STS, but there was no anterior STS activation in either hemisphere. However, visual and auditory tasks showed overlapping activation in the left angular gyrus, which includes the posterior part of the STS. This finding is in accord with that of Price et al. (2003). This area is generally considered a multimodal region, responsible for integrating visual and auditory inputs (Price, 2000; Booth et al., 2002). Therefore, the proposal of "an auditory equivalent of the VWFA" by Cohen et al. (2004) requires further investigation.

Unexpectedly, the current study showed that different levels of phonological or orthographic retrieval did not influence the activation of word-specific regions (LOTS or LITG), suggesting that the VWFA may be involved in form-to-sound mapping at the syllable-level for Chinese reading. However, there are other possibilities. First, processing level may influence the inter-regional connections at the network level rather than at the individual regional level (Bitan et al., 2005; Deng et al., 2012). However, how task requirements modulate intra-regional activation is still unclear. A recent study demonstrated that task requirements modulated the activation intensity of the VWFA (Wang et al., 2011). In contrast, it has also been reported that the spatial profile of response selectivity in the left inferior temporal cortex is not modulated by attentional levels (Xu et al., 2012) or task requirements (Ma et al., 2011). Second, the difficulty of the current tasks may have influenced the result. On one hand, task difficulty varied across the four tasks as evidenced by differences in accuracy and RT (with the La task being the most difficult). However, it is uncertain if task difficulty affects local activation of language-related regions. In several studies, increased difficulty of a reading task did not increase activation of language-related areas but rather activated additional regions associated with attention, memory, and executive function (Gur et al., 1988; Paus et al., 1998; Drager et al., 2004). On the other hand, a difficult task *per se* may change the subjects' strategies for performing the task (Huber, 1985). Therefore, it is difficult to exclude the possibility that the current tasks, especially the auditory ones, may be completed without substantial orthographic processing. 
As a result, modulation of task requirements may have a major influence on additional aspects of cognitive processing (e.g., discrimination, working memory) dependent on other brain regions, rather than on orthographic analysis. Additional experiments are needed to explore how psycholinguistic variations, especially within the same domain (e.g., phonological retrieval), influence spatial representation and response specialization of the left occipitotemporal cortex for language processing.

#### **ACKNOWLEDGMENTS**

This research was supported by a grant from the Chinese Natural Science Foundation (31271193) to Yuan Deng.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Human\_Neuroscience/10.3389/fnhum.2013.00619/abstract

#### **REFERENCES**


difficulty on regional cerebral blood flow: relationships with anxiety and performance. *Psychophysiology* 25, 392–399. doi: 10.1111/j.1469-8986.1988.tb01874.x


divided attention paradigm," in *Advances in Brain Inspired Cognitive Systems.* Lecture Notes in Computer Science, Vol. 7888, (Heidelberg; Dordrecht; London; New York, NY: Springer), 30–37. doi: 10.1007/978-3-642-38786-9\_4


*Brain 3-Dimensional Proportional System: an Approach to Cerebral Imaging*. New York, NY: Stuttgart.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2013; accepted: 08 September 2013; published online: 27 September 2013.*

*Citation: Deng Y, Wu Q and Weng X (2013) Unimodal and multimodal regions for logographic language processing in left ventral occipitotemporal cortex. Front. Hum. Neurosci. 7:619. doi: 10.3389/fnhum.2013.00619*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Deng, Wu and Weng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Electrophysiological correlates of morphological processing in Chinese compound word recognition

#### *Yingchun Du<sup>1</sup>, Weiping Hu<sup>2</sup>, Zhuo Fang<sup>3</sup> and John X. Zhang<sup>4</sup>\**

*<sup>1</sup> Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China*

*<sup>2</sup> MOE Key Laboratory of Modern Teaching Technology, Shaanxi Normal University, Xi'an, China*

*<sup>3</sup> Department of Psychology, Sun Yat-Sen University, Guangzhou, China*

*<sup>4</sup> Department of Psychology, Fudan University, Shanghai, China*

#### *Edited by:*

*Urs Maurer, University of Zurich, Switzerland*

#### *Reviewed by:*

*Anthony T. Herdman, University of British Columbia, Canada*

*Kana Okano, Massachusetts Institute of Technology, USA*

#### *\*Correspondence:*

*John X. Zhang, Department of Psychology, Fudan University, Shanghai 200433, China e-mail: zhangxuexinjohn@gmail.com*

The present study investigated the electrophysiological correlates of morphological processing in Chinese compound word reading using a delayed repetition priming paradigm. Participants were asked to passively view lists of two-character compound words containing prime-target pairs separated by a few items. In a *Whole Word* repetition condition, the prime and target were the same real words (e.g., *, manager-manager*). In a *Constituent* repetition condition, the prime and target were swapped in terms of their constituent position (e.g., , the former is a pseudo-word and the latter means *nurse*). Two ERP components, the N200 and the N400, showed repetition effects. The N200 showed a negative shift upon repetition in the *Whole Word* condition but this effect was delayed for the *Constituent* condition. The N400 showed comparable amplitude reduction across the two priming conditions. The results reveal different aspects of morphological processing, with an early stage associated with the N200 and a late stage with the N400. It is also possible that the N200 effect reflects general cognitive processing, i.e., the detection of low-probability stimuli.

**Keywords: morphological processing, compound word, delayed repetition, morpheme, Chinese**

# **INTRODUCTION**

Understanding the nature of morphological processing is critical to the investigation of how written words are represented in the brain. Over the past three decades, morphological processing has been central to the study of the mental lexicon (Libben and Jarema, 2004). One key issue about morphological processing is whether morphemically complex words are represented in the mental lexicon as a whole, or by their constituent morphemes (Chen and Chen, 2006). In the early literature, a strong view referred to as the full-listing model proposed that complex words, including compound words, are processed as single units with no reference to their constituents (e.g., Butterworth, 1983; Bybee, 1995). However, much evidence has accumulated supporting a decompositional view where complex words are processed by access to and combination of their constituent morpheme representations (e.g., Taft and Forster, 1975; Libben et al., 1999; McKinnon et al., 2003). As a compromise, many recent models of morphological processing combine features from both views (Caramazza et al., 1988; Schreuder and Baayen, 1995; Baayen et al., 1997; Isel et al., 2003). There is also evidence that interplays between factors such as frequency (Alegre and Gordon, 1999), morphological type (Miceli and Caramazza, 1988), and language background of a speaker (Portin et al., 2008) determine whether a multimorphemic word is stored and recognized as a full form or via decomposition during lexical-semantic processing.

As one major way to produce morphologically complex words, compounding has been widely attested in many languages. It provides a unique opportunity for assessing the combinatorial mechanisms inherent to language production and comprehension (Badecker, 2007). It is particularly important for some languages such as Chinese, where more than 70% of the vocabulary is made up of multiple-character compound words. Compared with the other two types of morphologically complex words, i.e., inflectional words (e.g., *depart-departing*) and derivational words (e.g., *agree-agreement*), which are prevalent in most alphabetic languages, compound word recognition has received relatively little research.

Morphological processing is clearly important for both inflectional and derivational words, especially irregular words. However, whether this is also the case for compound words is less clear, as morphemes in compound words are relatively separable and easy to identify (Chung et al., 2010). On the one hand, morphological decomposition seems to offer a very effective route for compound word comprehension. On the other hand, compounds are sensitive to semantic drift and thus frequently show high degrees of semantic opacity that would thwart a routine morphological decomposition (Libben, 1998).

Early evidence for morphological decomposition of compound words mainly came from behavioral studies showing that constituent frequencies affected lexical decision times of compound words (e.g., Zhang and Peng, 1992; Zhou and Marslen-Wilson, 1994), or from the examination of morphological priming (Zhang and Peng, 1992; Liu and Peng, 1997). Nowadays, electrophysiological techniques with high temporal resolution have been used to investigate the neural dynamics of morphological processing. Due to differences in word types and tasks in individual studies, the evidence about the neural correlates of morphological processing is far from consistent. For example, with auditory stimuli, Koester et al. (2004) found that gender violations of initial constituents resulted in a left-anterior negativity (LAN) for both opaque and transparent compounds, which was interpreted as an index of morphosyntactic decomposition. Vergara-Martínez et al. (2009) revealed that high-frequency first constituents elicited larger negativities starting in the 100–300 ms time window, while low-frequency second constituents elicited larger N400 amplitudes than high-frequency second constituents. Chiarelli et al. (2007) found larger LAN and N400 for compound than non-compound words.

In a series of experiments using an immediate repetition paradigm, we found that both full (e.g., *plan-plan*) and partial (e.g., *honor-nabobism*) morphological overlap between two-constituent prime-target pairs (both being real words) elicited an enhanced N200 response and a reduced N400 response (Zhang et al., 2012; Jia et al., 2013). Further, the N200 enhancement effect was larger when the prime and target words overlapped fully than when they overlapped partially.

Although these results indicate that both N200 and N400 are modulated by morphological similarity, it is unclear whether they are truly the electrophysiological correlates of morphological processing. This is because words sharing morphological elements are usually also related in form, phonology, and meaning, any of which could produce priming effects in N200 and N400. In addition, participants may adopt specific response strategies in the immediate repetition paradigm, producing priming effects that would otherwise be absent in normal reading (Bozic et al., 2007).

To avoid these problems, the present study turned to a delayed repetition priming paradigm, which has been well-established to be a powerful tool to investigate the nature of internal stimulus representation (see Henson and Rugg, 2003; Schacter et al., 2007 for reviews). In a delayed repetition task, members of a word pair are separated by varying numbers of intervening words, which reduces effects of task strategies. Previous studies using this task have shown that morphological effects are preserved over long lags, whereas semantic and form effects drop away sharply as the number of intervening items increases (Napps, 1989; Bentin and Feldman, 1990; Zwitserlood et al., 2002). More importantly, an implicit task was used to minimize strategic processes due to task demands and to probe reading processes in a more natural way (Vartiainen et al., 2009).

Based on our previous studies (Zhang et al., 2012; Jia et al., 2013), priming effects in the N200 and N400 components would be expected. The N400 has been extensively studied and used as an effective dependent variable for examining many aspects of language processing (for review, see Kutas and Federmeier, 2011). In comparison, the functional significance of the N200 is far from clear. The N200, also called N2 in some literature, refers to the second negative wave peaking between 200 and 350 ms after stimulus onset. It is suggested that the N200 elicited by visual stimuli should be divided into at least three subcomponents: a fronto-central (anterior) component related to the detection of novelty or mismatch from a perceptual template when the eliciting stimuli are attended, a second fronto-central component related to cognitive control (encompassing response inhibition, response conflict, and error monitoring), and one or two posterior N2s related to some aspects of visual attention (for review, see Folstein and Van Petten, 2008). The specific pattern of priming effects may help to reveal whether and how these N2 components are related to the N200 identified in our previous word recognition studies (Zhang et al., 2012; Jia et al., 2013).

#### **METHODS**

#### **PARTICIPANTS**

Twenty college students (10 males, age range from 19 to 30 years, mean ± *SD* = 23.3 ± 3.0 years) participated in this experiment with monetary compensation. All were right-handed native speakers of Mandarin Chinese with normal or corrected-to-normal vision. Their handedness scores, assessed with the Edinburgh Handedness Inventory (Oldfield, 1971), were between 54 and 100. None of them reported any neurological or psychiatric diseases. Informed consent was obtained in accordance with guidelines from the Institute of Psychology, Chinese Academy of Sciences, Beijing, China.

#### **STIMULI**

There were two experimental conditions, both involving the presentation of a prime-target word pair separated by several intervening items (**Figure 1**). In the *Whole Word* repetition condition, the prime and target were the same real word. In the *Constituent* repetition condition, the prime and target were swapped in terms of their constituent position. In modern Chinese, a compound word is always read from left to right, and swapping the character positions of a two-character word would usually produce a meaningless pseudo-word without pre-existing linguistic representations as a whole [except for a small set of reversible words as studied in Zhang et al. (2004)]. For example, after the swap, the word meaning *nurse* became a character string that was no longer a real word. Also, the pseudo-words were not homophonic to any real word. So, for the *Constituent* repetition condition, the prime was a pseudo-word while the target was a real word, but they shared the same two constituents. It should be noted that the pseudo-words are unfamiliar novel items, so novelty may be confounded with the repetition effect in the *Constituent* condition.

Character combinability was controlled across the two priming conditions, as each critical word was assigned to the *Whole Word* condition for half of the participants but to the *Constituent* condition for the other half. The chance of appearing in the first or second position of a compound word varies from character to character. However, high frequency characters such as those used in the present study can mostly appear in both the first and the second positions. Therefore, it was very hard or almost impossible for participants to make use of the frequency information to infer whether an item was a real word or not. A pseudo-word was very much like a real word except that its specific character combination as a whole unit was not associated with any pre-existing linguistic representations.
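The counterbalancing described above can be illustrated with a minimal Python sketch. The function names, the even/odd participant split, and the two-letter placeholder items are assumptions made for illustration only. The text states just that each critical word served in the *Whole Word* condition for half of the participants and in the *Constituent* condition for the other half, and that a *Constituent* prime was the target with its two characters swapped.

```python
from itertools import product


def make_constituent_prime(word):
    """Swap the two characters of a two-character item (hypothetical helper)."""
    if len(word) != 2:
        raise ValueError("expected a two-character item")
    return word[1] + word[0]


def assign_conditions(words, participant_index):
    """Split the critical items into two halves and swap the halves' condition
    between even- and odd-numbered participants (an assumed scheme)."""
    half = len(words) // 2
    first, second = list(words[:half]), list(words[half:])
    if participant_index % 2 == 0:
        return {"whole_word": first, "constituent": second}
    return {"whole_word": second, "constituent": first}


# 144 placeholder items standing in for the critical words (not real stimuli).
words = [a + b for a, b in product("ABCDEFGHIJKL", "abcdefghijkl")]

group_a = assign_conditions(words, participant_index=0)
group_b = assign_conditions(words, participant_index=1)

# Prime-target pairs for the Constituent condition of one participant.
constituent_pairs = [(make_constituent_prime(w), w) for w in group_a["constituent"]]
```

Under this scheme every item contributes equally to both conditions across participants, so item-specific properties such as character combinability cannot differ systematically between conditions.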

The critical stimuli contained a list of 144 two-character words selected from an online corpus based on a research project of Middle Tennessee State University (http://lingua.mtsu.edu/chinese-computing/introduction.html). The mean stroke number of the words (sum of the two characters) was 15.3 (*SD* = 3.9), and the mean word frequency was 815 occurrences per million (*SD* = 783, frequency range = 55–3476, median = 598). To reduce explicit attention to stimulus repetitions of the critical words, there were 144 two-character filler items including 60 real words and 84 pseudo-words. The pseudo-words consisted of two Chinese characters matching the critical words in visual complexity (i.e., stroke number). Each pseudo-word as a whole unit was neither a real word nor homophonic to any real word. There were also 60 two-character real words serving as the target items. All the real filler words and the target words were selected from the same online corpus, matching the critical words in stimulus characteristics wherever applicable.

#### **PROCEDURE**

Participants were seated in a dimly lit and sound-attenuated room. All visual stimuli were presented on a computer monitor that was about 1 m away from participants' eyes. All word stimuli were displayed at high contrast as black words on a white background, subtending visual angles of 4.3° × 2.3°. Participants were instructed to remain relaxed and to refrain from moving throughout the experiment.

Each participant completed 6 blocks, each with 82 trials. The first block served as practice and was not analyzed. As shown in **Figure 1**, each trial started with a fixation at the screen center for a period jittered between 1700 and 1900 ms. The fixation was then replaced by the central presentation of a visual item that was turned off 800 ms later, followed immediately by the fixation of the next trial. The item could be a critical word, a filler or a target item.

In each block, there were 12 critical words used for *Whole Word* condition, 12 for *Constituent* condition, 10 real filler words, 14 filler pseudo-words, and 10 target words. The different types of items were pseudo-randomly intermixed with the following constraints. Each critical item was presented in two trials separated on average by 4 intervening trials (ranging from 1 to 7 trials or 5.1 to 20.4 s).
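As a consistency check on the design, the per-block counts above can be tallied in a short Python sketch. The variable and function names are illustrative; the counts are those given in the text, with each critical word counted twice because it appears once as prime and once as target of the repetition.

```python
# Items per block, as listed in the Procedure.
PER_BLOCK = {
    "whole_word_critical": 12,
    "constituent_critical": 12,
    "real_filler": 10,
    "pseudo_filler": 14,
    "target": 10,
}
# Critical words are presented twice (first and second presentation).
PRESENTATIONS = {"whole_word_critical": 2, "constituent_critical": 2}
N_BLOCKS = 6


def trials_per_block(per_block, presentations):
    """Total trials in one block, counting repeated presentations."""
    return sum(n * presentations.get(name, 1) for name, n in per_block.items())


def totals_across_blocks(per_block, n_blocks):
    """Number of distinct items of each type across all blocks."""
    return {name: n * n_blocks for name, n in per_block.items()}


n_trials = trials_per_block(PER_BLOCK, PRESENTATIONS)
totals = totals_across_blocks(PER_BLOCK, N_BLOCKS)
n_critical = totals["whole_word_critical"] + totals["constituent_critical"]
print(n_trials, n_critical, totals["real_filler"], totals["pseudo_filler"], totals["target"])
```

The tally gives 82 trials per block, and six blocks account for the 144 critical words, 60 real filler words, 84 filler pseudo-words, and 60 target words described under Stimuli.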

Participants responded to the target items by pressing a button as quickly as possible with their right index finger. For non-target items, they were to refrain from making any response. The detection task served only to ensure attention to the critical non-target items, for which no response was required in order to minimize motor-related contamination of the ERPs. Such tasks have been widely used in alphabetic reading studies. For example, in the studies of Price et al. (1996) and Jamal et al. (2011), participants were instructed to perform a non-linguistic visual feature detection task, i.e., to detect the presence or absence of ascenders within a stimulus.

#### **ELECTROENCEPHALOGRAM RECORDING AND DATA ANALYSIS**

The electroencephalogram (EEG) was recorded from the scalp through 64 non-polarizable Ag/AgCl sintered electrodes in a pre-configured cap. The position of electrodes followed the extended 10–20 convention. The EEG was continuously sampled at a rate of 1000 Hz and bandpass filtered (0.05–100 Hz) using the Neuroscan EEG system (NeuroScan Inc., Compumedics, Australia). Electrode impedance was maintained below 5 kΩ. In addition to the scalp sites, the horizontal EOG was recorded at the outer canthi of both eyes and the vertical EOG was recorded between supraorbit and suborbit of the left eye. The left mastoid was used as the recording reference. Reference was changed offline to the average of the two mastoids.
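The offline change of reference can be written out as a short derivation. This is the standard linear re-referencing identity rather than a procedure spelled out in the text, and it assumes that the right mastoid was recorded as a channel against the left-mastoid reference. The symbols $V$ (potential at a site), $x$ (recorded signal), $ch$ (any scalp channel), $M1$ (left mastoid) and $M2$ (right mastoid) are introduced here for illustration.

$$
x_{ch} = V_{ch} - V_{M1}, \qquad x_{M2} = V_{M2} - V_{M1}
$$

$$
V_{ch} - \tfrac{1}{2}\left(V_{M1} + V_{M2}\right)
= \left(V_{ch} - V_{M1}\right) - \tfrac{1}{2}\left(V_{M2} - V_{M1}\right)
= x_{ch} - \tfrac{1}{2}\,x_{M2}
$$

Under these assumptions, re-referencing to the average of the two mastoids amounts to subtracting half of the recorded right-mastoid signal from every channel.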

Eye movement artifacts were removed using regression-based weighting coefficients (Semlitsch et al., 1986). This method subtracted a fraction of the EOG from the EEG channels on a sweep-by-sweep, point-by-point basis. EEG segments were extracted from 150 ms before stimulus onset to 850 ms after stimulus onset. The 150 ms pre-stimulus period was used as the baseline. The segments were baseline corrected and bandpass filtered (0.5–30 Hz). Segments with amplitude exceeding ±50 µV in any scalp channel were excluded from analysis (less than 2% of trials were rejected). Averaged ERPs were computed separately for the first and second presentations of the critical items in the two repetition conditions. The filler items and the targets were not analyzed.
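The segmentation, baseline correction, and amplitude-based rejection described above might be sketched as follows in Python with NumPy. This is an illustrative sketch, not the authors' Neuroscan pipeline: the function name and array layout are assumptions, the EOG regression and the 0.5–30 Hz filter are omitted, and the demonstration data are synthetic.

```python
import numpy as np


def epoch_baseline_reject(eeg_uv, onsets, sfreq=1000, pre_ms=150, post_ms=850,
                          reject_uv=50.0):
    """Cut epochs around stimulus onsets, baseline-correct, drop noisy ones.

    eeg_uv : array (n_channels, n_samples), continuous EEG in microvolts
    onsets : iterable of stimulus-onset sample indices
    Returns an array (n_kept_epochs, n_channels, n_epoch_samples).
    """
    pre = int(round(pre_ms * sfreq / 1000))
    post = int(round(post_ms * sfreq / 1000))
    kept = []
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > eeg_uv.shape[1]:
            continue  # epoch would extend beyond the recording
        segment = eeg_uv[:, start:stop].astype(float)
        # Baseline correction: subtract each channel's pre-stimulus mean.
        segment = segment - segment[:, :pre].mean(axis=1, keepdims=True)
        # Reject the epoch if any channel exceeds the amplitude criterion.
        if np.abs(segment).max() > reject_uv:
            continue
        kept.append(segment)
    if not kept:
        return np.empty((0, eeg_uv.shape[0], pre + post))
    return np.stack(kept)


# Synthetic demonstration: a constant 10 µV offset with one large artifact.
eeg = np.full((2, 5000), 10.0)
eeg[0, 3100] = 300.0
epochs = epoch_baseline_reject(eeg, onsets=[1000, 3000, 4900])
print(epochs.shape)
```

In the synthetic example, the first epoch is kept (the constant offset is removed by the baseline), the second is rejected because of the artifact, and the third is skipped because it would run past the end of the data.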

#### **RESULTS**

Data from two participants were excluded because of exceedingly low target detection accuracy (50 and 62%). For the remaining 18 participants, target detection was both fast and highly accurate, with a mean response time of 561 ms (*SD* = 42) and a mean accuracy of 92.8% (*SD* = 7.0). The false alarm rate was below 1.5% for all types of non-target items, with a mean of 0.5% (*SD* = 1.2). These behavioral results indicate that the included 18 participants followed the instructions and were attentive to the critical items in the detection task. The grand-average waveforms (based on 18 participants) for all experimental conditions are plotted in **Figure 2** for 15 representative electrodes. Waveforms at two electrodes, Cz and CPz, are highlighted in **Figure 3** for clarity.

As shown in the figures, both repetition conditions elicited similar N1-P2 complexes regardless of whether the stimuli were in their first presentation or second presentation. A negative deflection was elicited between 200 and 230 ms, with similar latency and distribution as the N200 component reported in our previous studies (Zhang et al., 2012; Jia et al., 2013). Following the N200, there was a positive shift peaking around 260 ms and a broad negativity peaking around 400 ms (N400).

To further characterize the 260 ms positive shift, difference waves for the two experimental conditions were computed by subtracting the mean voltage for the first presentation trials from those for the second presentation trials (see **Figure 4**). The difference waves highlight the components of interest more clearly by removing variations in voltage that were common to all conditions. By visual inspection, the difference wave around the 260 ms period under the *Constituent* condition was comparable to that in the N200 interval under the *Whole Word* condition. This indicates that the 260 ms positive shift in the *Constituent* condition was likely a delayed N200 effect.
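The difference-wave computation, together with the mean-amplitude measure used in the analyses that follow, can be sketched in Python with NumPy. The function names, the half-open treatment of window boundaries, and the synthetic waveforms are assumptions for illustration; the window limits are the ones reported for the N200, the delayed N200 effect, and the N400.

```python
import numpy as np

# Analysis windows in ms, as reported in the Results.
WINDOWS_MS = {"N200": (185, 235), "delayed_N200": (235, 285), "N400": (330, 450)}


def difference_wave(first, second):
    """Second-presentation ERP minus first-presentation ERP."""
    return np.asarray(second, dtype=float) - np.asarray(first, dtype=float)


def mean_amplitude(wave, times_ms, window):
    """Mean voltage within [start, end) ms along the last (time) axis."""
    start, end = window
    mask = (times_ms >= start) & (times_ms < end)
    return wave[..., mask].mean(axis=-1)


# Synthetic waveforms sampled at 1 ms steps from -150 to 849 ms (not real data):
# the second presentation is 0.8 µV more negative in the N200 window only.
times = np.arange(-150, 850)
first = np.zeros(times.size)
second = np.zeros(times.size)
second[(times >= 185) & (times < 235)] = -0.8

diff = difference_wave(first, second)
effects = {name: float(mean_amplitude(diff, times, win))
           for name, win in WINDOWS_MS.items()}
print(effects)
```

A negative value in a window indicates that the second presentation was more negative-going than the first in that interval, which is how a repetition enhancement of a negative component appears in the difference wave.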

By visual inspection, the N200 effect and the delayed N200 effect were mainly distributed in the fronto-central region, and the N400 was mainly distributed in the centro-parietal region. Preliminary analysis did not reveal any significant effect of repetition or condition (*Whole Word* vs. *Constituent*) in the N1 and P2 time windows. Four-way repeated-measures ANOVAs with Geisser-Greenhouse correction were conducted on the averaged amplitudes of the N200 and the delayed N200 effects, with experimental condition, time of presentation (first vs. second), laterality (midline, left hemisphere, and right hemisphere), and electrode position (frontal: FZ, F1/2; fronto-central: FCZ, FC1/2; central: CZ, C1/2) as factors. The same ANOVA was performed on the N400 component except that the electrode positions were different (fronto-central: FCZ, FC1/2; central: CZ, C1/2; centro-parietal: CPZ, CP1/2).

The mean amplitude of N200 was computed for the 185–235 ms time window. The main effects were significant for time of presentation [*F*(1, 17) = 11.1, *p* < 0.01], and for electrode position [*F*(2, 34) = 5.9, *p* < 0.01]. The mean amplitude of N200 was significantly increased (more negative) from the first to the second presentation (3.2 vs. 2.7 µV). The interaction between time of presentation and condition was also significant [*F*(1, 17) = 7.7, *p* < 0.05]. For the *Whole Word* condition, compared with the first presentation, the words' second presentation elicited a more negative-going N200 [3.4 vs. 2.6 µV, *t*(17) = 5.1, *p* < 0.001]. For the *Constituent* condition, the first and second presentations elicited comparable N200 effects (3.0 vs. 2.9 µV, *p* > 0.5). No other interaction was significant.

Legends are the same as in **Figure 2**.


The mean amplitude of the delayed N200 effect was computed for the 235–285 ms time window. The main effects were significant for time of presentation [*F*(1, 17) = 5.4, *p* < 0.05], and for laterality [*F*(2, 34) = 3.3, *p* < 0.05]. The waveform was more positive going for the first presentation than for the second presentation (3.3 vs. 2.8 µV), and was more positive going on the midline and right hemisphere than on the left hemisphere (3.1 vs. 2.9 µV, 3.2 vs. 2.9 µV). The interaction between time of presentation and electrode position was significant [*F*(2, 34) = 7.6, *p* < 0.01].

In addition, to help visualize the distribution of the repetition effect, topographical voltage maps were constructed based on the mean amplitudes of difference waves measured within the N200 and the delayed N200 effect time windows (see **Figure 5**). A within-subject ANOVA was conducted on the mean amplitude of the difference waves with effect type (the N200 effect vs. the delayed N200 effect) and brain area (anterior, central, and centro-parietal) as factors. The results showed a significant interaction effect [*F*(2, 34) = 5.4, *p* < 0.01]. The delayed N200 effect showed a more anterior distribution [*F*(2, 34) = 8.0, *p* < 0.001], while the N200 effect showed no significant differences across brain regions (*p* > 0.1).

The mean amplitude of N400 was computed for the 330–450 ms time window. There were significant main effects for condition [*F*(1, 17) = 7.8, *p* < 0.05], time of presentation [*F*(1, 17) = 18.1, *p* < 0.001], and electrode position [*F*(2, 34) = 4.0, *p* < 0.05]. The N400 was less negative going in the second presentation than in the first presentation (0.8 vs. 0.1 µV), and in the *Whole Word* condition than in the *Constituent* condition (0.7 vs. 0.3 µV). The amplitude reduction across the two presentations was larger for the *Constituent* condition (from 0.7 to −0.2 µV) than for the *Whole Word* condition (from 0.9 to 0.4 µV), although the difference was not significant as the interaction between time of presentation and condition was not significant [*F*(1, 17) = 2.2, *p* > 0.1]. The interaction between time of presentation and laterality was significant [*F*(2, 34) = 6.2, *p* < 0.01], with the difference between first and second presentation at the right hemisphere being smaller than that at the midline and left hemisphere (*p*s < 0.05). The interaction between condition and electrode position was also significant [*F*(2, 34) = 3.7, *p* < 0.05], with the difference between conditions at the fronto-central area being smaller than that at the central (*p* < 0.05) and centro-parietal area (although not significant, *p* = 0.06).

#### **DISCUSSION**

In the present study, we combined the recording of ERPs with a delayed repetition priming paradigm to investigate the neural dynamics of morphological processing in Chinese compound word reading. According to previous studies, only morphological priming can be preserved over long lags, whereas semantic, phonological, and form priming drop away sharply as the number of intervening items increases (Napps, 1989; Bentin and Feldman, 1990; Zwitserlood et al., 2002). In the current study, priming effects over long lags were observed in two components including N200 and N400, suggesting their association with morphological processing.

The earliest ERP response sensitive to the manipulation of stimulus repetition was the N200 component. In this time window, the *Whole Word* condition showed an amplitude enhancement (or a negative shift) upon repetition, whereas this effect was delayed for the *Constituent* repetition condition. In the *Whole Word* repetition condition, all word information, including morphological, orthographic, and semantic information, was repeated in the second presentation. In comparison, in the *Constituent* repetition condition, only the orthographic and semantic information of the constituent characters was repeated since the primes were not real words. The time difference between the two repetition conditions may indicate facilitated lexical processing at the whole word level as opposed to the constituent level.

One major difference between the N200 effect in the present study and the electrophysiological correlates of morphological processing in the earlier literature is the relatively early latency of the N200. In previous EEG and MEG data on Finnish, English, German, Spanish, and Catalan (Penke et al., 1997; Rodriguez-Fornells et al., 2001; Domínguez et al., 2004; Fiorentino and Poeppel, 2007; Vartiainen et al., 2009), no effect of morphology within the first 200 ms following word onset was found. The lack of an early morphology effect in alphabetic language studies may arise because their stimuli were usually inflectional or derivational words rather than compound words. There has been evidence that different types of morphologically complex words involve different processing mechanisms. For example, compound words were processed more quickly than matched monomorphemic words (Ji et al., 2011), whereas inflected words were harder to recognize than matched morphologically simple words (Vartiainen et al., 2009).

Another major difference between the present N200 effect and the usually reported repetition effect is that the N200 here showed repetition enhancement instead of repetition suppression. Most studies adopting the neural priming paradigms showed reduced neural responses upon repetition, referred to as neural suppression (e.g., Gauthier, 2000). Furthermore, repetition suppression is generally considered as an effect of stimulus repetition *per se*, occurring independent of other psychological or neurophysiological variables. In contrast, cognitive variables including stimulus recognition, learning and explicit memory can bias repetition effects in BOLD response toward enhancement instead of suppression (for review, see Segaert et al., 2013).

In the electrophysiological literature, the N2 component has been tightly related with the brain function of cognitive control. Studies using the oddball paradigm demonstrated that when targets (go) and non-targets (no-go) were presented with equal probabilities, no-go trials would elicit a larger N2 at frontal scalp site (Ford et al., 1979; Kok, 1986). Czigler et al. (1996) extended this result by showing that low-probability stimuli elicited a larger N2 and the probability effect on the N2 was much larger in the no-go trial than the go trial at the fronto-central scalp site. Bruin and Wijers (2002) further showed that as the probability of no-go stimuli decreased, the N2 elicited by these events increased in amplitude. In the present study, repeated word stimuli (the second presentation) were low-probability non-targets. The enhanced N200 effect elicited by these repeated words may possibly be identified with the larger no-go N2 effect in the above-mentioned oddball paradigm. If this is the case, the N200 may reflect a more general cognitive mechanism as opposed to specific morphological processing. According to the topographical voltage maps, the delayed N200 effect showed a more anterior distribution compared with the N200 effect, possibly reflecting some processes related to novelty detection as the pseudo-words in the *Constituent* condition were unfamiliar items or novel to some degree.

Although pseudo-words are not associated with pre-existing semantic representations, word-like pseudo-words may partially activate the semantic representations of their real-word lexical neighbors (Holcomb et al., 2002). Therefore, the pseudo-word primes in the *Constituent* condition may activate phonologically/orthographically related words and consequently the semantic network, producing the repetition priming effect in N400. Statistical analysis of N400 showed that the priming effect size was comparable across the *Constituent* repetition condition and the *Whole Word* repetition condition, suggesting that the N400 effects reflected primarily morpheme-related processing. Note that the *Constituent* condition seems to involve more extensive semantic processing as shown by the prolonged N400 in this condition, compared with the *Whole Word* condition. In the literature, N400 has been shown to be modulated by morphologically related pairs and the effect size (amount of amplitude reduction) is dependent on the strength of the relation (Domínguez et al., 2004; Lavric et al., 2007). The present finding that N400 is associated with morphological processing is consistent with such previous results.

One more general question about morphology is whether it is a discrete and independent element of lexical structure or it simply emerges from the convergence of form and meaning. In an fMRI study of morphological processing in English, brain regions sensitive to morphological structure overlapped almost entirely with regions sensitive to orthographic (left occipito-temporal cortex) and semantic relatedness (left middle temporal gyrus), suggesting that morphology emerges from the convergence of form and meaning (Devlin et al., 2004). Therefore, morphology may not be limited to morphemes as "minimal meaning bearing units" and morphological structure may also exist within the orthographic level of representation (Longtin et al., 2003; Rastle et al., 2004). In light of this view, the N200 effect likely reflects morphological processing at the orthographic level, given that it occurs early and it can be modulated by orthographic overlap (Zhang et al., 2012).

# **CONCLUSION**

In this study, the electrophysiological correlates of morphological processing in Chinese compound reading were investigated with a delayed repetition priming paradigm. Two ERP components, the N200 and the N400, showed repetition effects that survived across long lags and were interpreted to index morphological processing. There is a possibility that the N200 and its delayed effects reflect general cognitive processing (i.e., low-probability stimulus detection), which needs further testing by manipulating stimulus probability.

#### **ACKNOWLEDGMENTS**

This research was supported by a grant from the National Natural Science Foundation of China (NSFC #31100815).

# **REFERENCES**

and Semenza, C. (2007). The electrophysiological correlates of Noun-Noun compounds. *Brain Lang.* 103, 248–249. doi: 10.1016/j.bandl.2007.07.010

*Neuropsychologia* 41, 263–270. doi: 10.1016/S0028-3932(02)00159-8

in cortical activity during priming. *Curr. Opin. Neurobiol.* 17, 171–176. doi: 10.1016/j.conb.2007.02.001

interplay with naming pictures? *Brain Lang.* 81, 358–367. doi: 10.1006/brln.2001.2530

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2013; accepted: 04 September 2013; published online: 24 September 2013.*

*Citation: Du Y, Hu W, Fang Z and Zhang JX (2013) Electrophysiological correlates of morphological processing in Chinese compound word recognition. Front. Hum. Neurosci. 7:601. doi: 10.3389/fnhum.2013.00601*

*This article was submitted to the journal Frontiers in Human Neuroscience. Copyright © 2013 Du, Hu, Fang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The time course of reading processes in children with and without dyslexia: an ERP study

*Sandra Hasko\*, Katarina Groth, Jennifer Bruder, Jürgen Bartling and Gerd Schulte-Körne*

*Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital Munich, Munich, Germany*

#### *Edited by:*

*Urs Maurer, University of Zurich, Switzerland*

#### *Reviewed by:*

*Su Li, Chinese Academy of Sciences, China; Susana M. Araujo, University of Algarve, Portugal*

#### *\*Correspondence:*

*Sandra Hasko, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital Munich, Pettenkoferstr. 8a, 80336 Munich, Germany e-mail: sandra.hasko@med.uni-muenchen.de*

The main diagnostic criterion for developmental dyslexia (DD) in transparent orthographies is a remarkable reading speed deficit, which is often accompanied by spelling difficulties. These deficits have been traced back to deficits in both orthographic and phonological processing. For a better understanding of the reading speed deficit in DD it is necessary to clarify which processing steps are degraded in children with DD during reading. To address this question, the present study used EEG to investigate three reading-related ERPs: the N170, N400 and LPC. Twenty-nine children without DD and 52 children with DD performed a phonological lexical decision (PLD) task, which tapped both orthographic and phonological processing. Children were presented with words, pseudohomophones, pseudowords and false fonts and had to decide whether the presented stimulus sounded like an existing German word or not. Compared to control children, children with DD showed deficits in all the investigated ERPs. Firstly, a diminished mean area under the curve for the word material-false font contrasts in the time window of the N170 was observed, indicating a reduced degree of print sensitivity; secondly, N400 amplitudes, suggested to reflect access to the orthographic lexicon and grapheme-phoneme conversion, were attenuated; and lastly, phonological access as indexed by the LPC was degraded in children with DD. Processing differences dependent on the linguistic material in children without DD were observed only in the LPC, suggesting that similar reading processes were adopted independent of orthographic familiarity. The results of this study suggest that effective treatment should include both orthographic and phonological training. Furthermore, more longitudinal studies utilizing the same task and stimuli are needed to clarify how these processing steps and their time course change during reading development.

**Keywords: developmental dyslexia, phonological lexical decisions, orthography, phonology, dual route model of reading, N170, N400, LPC**

#### **INTRODUCTION**

Reading and writing are fundamental skills for daily life that allow us to integrate properly into a community and they are crucial for acquiring knowledge and transmitting information. Although reading and spelling require highly complex processes (Massaro and Cohen, 1994), most children acquire these skills without any serious problems. However, despite adequate teaching some children fail to develop age appropriate reading and spelling skills. These children suffer from developmental dyslexia (DD), which is one of the most common specific developmental disorders affecting around 4–9% of school-aged children (Shaywitz et al., 1990; Katusic et al., 2001; Esser et al., 2002). DD is characterized by severe problems in learning to read properly and is often accompanied by a comorbid spelling disorder. These difficulties are not the direct result of below-average general intelligence, inadequate schooling and neurological or sensory deficits (Dilling, 2006). DD accompanies the individuals throughout their lifespan and interferes with academic achievement, professional success and mental health (Esser et al., 2002).

Efforts to pinpoint the underlying mechanisms of DD have resulted in a substantial body of evidence that points toward a phonological core deficit (Snowling, 2001; Ramus et al., 2003; Vellutino et al., 2004). According to the phonological deficit hypothesis it is assumed that subjects with DD have difficulties in applying grapheme-phoneme correspondence rules due to an underspecification of phonological representations, an impaired access to these phonological representations (Ramus and Szenkovits, 2008) or a deficient association of letters and speech sounds (Blau et al., 2010; Froyen et al., 2011; for review see Blomert, 2011).

Orthographic consistency of a language influences the nature of reading difficulties. DD in regular orthographies, such as German, is mainly characterized by a remarkable reading speed deficit or rather an impaired acquisition of automatic reading (Wimmer, 1993, 1996; Landerl et al., 1997; Landerl, 2001; Bergmann and Wimmer, 2008; for review see Wimmer and Schurz, 2010) as well as faulty spelling (Klicpera and Gasteiger-Klicpera, 1998; Schulte-Körne, 2002). Spelling difficulties in transparent orthographies point to an orthographic core deficit (e.g., Bergmann and Wimmer, 2008; Bekebrede et al., 2009; van der Mark et al., 2009). A growing body of evidence suggests that subjects with DD are marked by poorer and less specified orthographic representations and delayed or impaired access to available orthographic representations (Bergmann and Wimmer, 2008; Bekebrede et al., 2009; van der Mark et al., 2009; Marinus and de Jong, 2010).

The phonological lexical decision (PLD) task seems especially appropriate to investigate orthographic and phonological processing during reading. In the PLD task used in the present study, subjects are presented with real words (W), pseudohomophones (PH), pseudowords (PW) and false fonts (FF) and indicate whether the visually presented stimulus sounds like a real word or not (Kronbichler et al., 2007; van der Mark et al., 2009, 2011; Schurz et al., 2010; Wimmer et al., 2010). The PLD task taps orthographic processing (i.e., the processing of orthographic material) on two levels. Firstly, comparing the letter string material (W; PH; PW) to the visual control stimuli (FF) allows print sensitivity to be examined. Secondly, the contrast between orthographically familiar (W) and unfamiliar (PH; PW) word material provides information about the subjects' familiarity with orthographic representations. Furthermore, according to dual route models of reading (e.g., Coltheart et al., 1993, 2001) contrasting unfamiliar (PH; PW) with familiar (W) word material also taps phonological processing, because grapheme-phoneme correspondence rules need to be applied in order to sound out the orthographically unfamiliar word material. Because PH and PW were derived from real W it is possible that they were read by mapping larger units, such as bigrams and trigrams, to phonology. However, the reading process remains sublexical.

The PLD task has been employed in a number of fMRI studies (Kronbichler et al., 2007; Bruno et al., 2008; van der Mark et al., 2009, 2011; Wimmer et al., 2010). In subjects with DD, results point to reduced print sensitivity, as indicated by a lack of higher activity for linguistic material (W; PH; PW) in contrast to FF. Results also indicate an absence of orthographic familiarity effects, as indexed by a lack of decreased activation for orthographically familiar (W) in contrast to orthographically unfamiliar (PH; PW) letter strings in the visual word form area (VWFA; van der Mark et al., 2009; Wimmer et al., 2010). Furthermore, results indicate deficits in phonological processing, as suggested by a hemodynamic hypoactivation in response to PH and PW compared to subjects without DD in the left inferior frontal gyrus (Wimmer et al., 2010). On the behavioral level prolonged reaction times for W, PH, and PW were found in subjects with DD (Bergmann and Wimmer, 2008; van der Mark et al., 2009, 2011; Wimmer et al., 2010). Although reaction times were prolonged, the response pattern (W < PH < PW) was similar to that of control subjects, suggesting that subjects with DD relied on comparable reading processes. Thus, these findings seem to highlight an impairment in the speed of access to orthographic and phonological representations in DD (Bergmann and Wimmer, 2008; van der Mark et al., 2009, 2011; Wimmer et al., 2010).

Keeping in mind that the reading speed deficit is the main diagnostic criterion for DD in transparent orthographies, it is necessary to understand how the temporal course of reading might differ in DD, thus clarifying whether any steps in the reading process are degraded in children with DD. Identifying impaired processing steps as well as their dependencies during the time course of reading is essential for effective intervention, as this knowledge might help in choosing appropriate treatment methods. Because of their high temporal resolution, which provides a real-time measure of neural processes, event-related potentials (ERPs) are well suited to disentangling single processing steps. The aim of the present study was to investigate the time course of orthographic and phonological processing in order to provide a temporal model of reading processes in normally developing children and to further identify whether any steps in the reading process are degraded in children with DD. In order to cover different processes associated with reading we decided to investigate three reading-related ERPs using the PLD task: the N170, N400, and LPC.

The N170 is the first ERP component thought to reflect orthographic processes (e.g., Bentin et al., 1999; Maurer et al., 2005a,b). It is recorded over left occipito-temporal brain regions and peaks around 170 ms after stimulus onset in skilled adult readers (Bentin et al., 1999; Maurer et al., 2005a,b). The N170 distinguishes letter strings from low-level visual control stimuli (e.g., symbol strings: Tarkiainen et al., 1999; Maurer et al., 2005a,b; forms: Bentin et al., 1999; alphanumeric symbols: Bentin et al., 1999; shapes: Eulitz et al., 2000 and dots: Eulitz et al., 2000). Amplitudes were higher for letter strings, thus implicating that the left lateralized N170 is sensitive to print. Whether the N170 is sensitive to familiar orthographic material is not clear. Some studies described larger amplitudes in response to consonant strings (McCandliss et al., 1997) and pseudowords (Compton et al., 1991) compared to familiar words, as well as larger amplitudes for low frequency words compared to high frequency words (Sereno et al., 1998; Hauk and Pulvermüller, 2004). However, some research did not report amplitude differences between words, pseudowords and consonant strings (Nobre et al., 1994; Salmelin et al., 1996; Bentin et al., 1999; Cornelissen et al., 2003; Maurer et al., 2005b). Varying task requirements might lead to the contrasting results (Maurer and McCandliss, 2008).

The print sensitivity of the N170 develops together with reading acquisition, as children learn to integrate orthographic and phonological information of words. In preschool children N170 amplitudes do not differ between words and symbol strings (Maurer et al., 2005b, 2006). At the end of second grade, however, peak amplitudes are higher for words compared to symbol strings. Furthermore, in contrast to adults where a left lateralization is observed, in children the N170 is symmetrically distributed over occipito-temporal regions (Maurer et al., 2006) with a delay of 50 ms (Maurer et al., 2005b, 2006, 2007; Brem et al., 2009).

N170 amplitudes were found to be reduced in 8-year-old second graders with DD (Maurer et al., 2007), but not in fifth grade children with DD (Maurer et al., 2011; Hasko et al., 2012), suggesting that reduced print sensitivity plays a role especially in the early stage of reading acquisition and neurophysiological deficits related to DD change during development (Maurer et al., 2011). These results point to a delayed specialization for processing letter strings in DD. However, there is also evidence that print sensitivity is still reduced in pre-adolescents (age 9–13, mean age 10.7; Araújo et al., 2012) and adults (Helenius et al., 1999; Mahé et al., 2012) with DD, thus contradicting the hypothesis of a delayed specialization for processing letter strings in DD. Interestingly, studies reporting on N170 impairments in adults with DD (Helenius et al., 1999; Mahé et al., 2012) included subjects with more severe reading deficits (at least two standard deviations below the mean) compared to studies which did not report on N170 impairments (Maurer et al., 2011; Hasko et al., 2012). This suggests that the N170 impairment might be also influenced by the degree of reading and spelling impairments (Mahé et al., 2012).

The N400 is recorded over centro-parietal electrodes during written and spoken language processing (Deacon et al., 2004; for review see Lau et al., 2008; Kutas and Federmeier, 2011). This component was investigated in a large number of studies employing different tasks. It was found to be elicited by semantic incongruity (e.g., Kutas and Hillyard, 1980; Brandeis et al., 1994; Schulz et al., 2008), orthographic and phonological manipulations (e.g., Rugg and Barrett, 1987; Praamstra and Stegeman, 1993; Dumay et al., 2001; Bonte and Blomert, 2004; Rüsseler et al., 2007) as well as by orthographically and phonologically legal pseudowords, which do not possess an entry in the mental lexicon (Holcomb and Neville, 1990; Doyle et al., 1996; Deacon et al., 2004; for review see Kutas and Federmeier, 2011). As being sensitive to all of these properties it is still unclear whether the N400 might reflect lexical or post-lexical processing or even both. N400 effects have been reported in children as young as 12 months (Friedrich and Friederici, 2010) and N400 amplitudes and latencies decrease across development (Holcomb et al., 1992; Juottonen et al., 1996; Hahne et al., 2004; Atchley et al., 2006). In visual lexical decision tasks N400 amplitudes were found to be smaller to orthographic familiar compared to orthographic unfamiliar word forms in adults (e.g., Braun et al., 2006; Briesemeister et al., 2009). Therefore, it could be interpreted that the N400 amplitudes elicited in visual lexical decision tasks reflect lexical processing, rather than post-lexical processing, because in the latter case one would have expected comparable N400 amplitudes for W and PH, which share phonology and meaning. In 7-year-old children the N400 amplitude was not modulated by orthographic familiarity (Coch and Holcomb, 2003).

With respect to DD, results regarding N400 effects are rather inconsistent. In a variety of studies reduced N400 amplitudes are reported across different language tasks in children (visual rhyme matching task: Ackerman et al., 1994; reading of correct and incorrect sentence endings: Brandeis et al., 1994; Schulz et al., 2008) and adults with DD (visual semantic, rhyme, and definite article judgment task: Rüsseler et al., 2007; visual word recognition task: Johannes et al., 1995) in contrast to control subjects. Other authors, however, did not confirm the abnormal N400 activation in children (listening to sentences with semantic violation: Sabisch et al., 2006; word categorization task: Silva-Pereyra et al., 2003; auditory lexical decision task: Bonte and Blomert, 2004) and adults with DD (word recognition task: Rüsseler et al., 2003). Neville et al. (1993) found even higher N400 amplitudes during reading of incongruent sentence endings in 8- to 10-year-old children with DD and language impairments, suggesting maturational changes during development influencing the N400. Inconsistencies between studies in the N400 response could be attributed to a number of factors including task and stimulus type, presentation modality, severity of reading impairment and age.

The N400 is followed by a late positive complex (LPC), which occurs in a time window between 500 and 800 ms and is distributed over the left centro-parietal scalp in adults (Friedman and Johnson, 2000; Finnigan et al., 2002; Rüsseler et al., 2003; Yonelinas et al., 2005; Balass et al., 2010; for review see: Rugg and Curran, 2007), adolescents (Schulte-Körne et al., 2004) and children (Hepworth et al., 2001; van Strien et al., 2009). The LPC might be involved in word recognition memory as LPC amplitudes are higher to correctly recognized old words compared to new words (for review see Rugg and Curran, 2007). This effect is not dependent on intentional retrieval (Curran, 1999).

LPC amplitudes were reduced in adolescents (Schulte-Körne et al., 2004) and adults with DD (Rüsseler et al., 2003) or low reading skills (Perfetti et al., 2005; Balass et al., 2010). For example, Schulte-Körne et al. (2004) investigated tenth graders with and without a history of DD. In a learning phase participants had to study a list of simple pseudowords and graphic symbols. In the recognition phase the learned items were presented together with new items and participants decided whether the presented item was new or learned. Interestingly, all subjects performed the task equally well; however, the LPC was attenuated in response to learned pseudowords in adolescents with DD compared to adolescents without DD. No group differences were found for graphic symbols. These results were interpreted as reflecting a specific word recognition memory deficit (Schulte-Körne et al., 2004). In the present study we did not use a word recognition task but a PLD task; thus the LPC elicited in the present study might reflect access to the phonological lexicon and the recognition of a phonological entry of an existing German word.

Taken together, a large body of evidence points to deficits in different processing steps during reading in subjects with DD. As reviewed, these studies often focused their investigation on one single process or used different tasks in order to explore different processing steps, thus also leading to inconsistent results. To the best of our knowledge this is the first study investigating the PLD task in children without and with DD using ERPs. One major advantage of the PLD task is that it is a continuous reading task, which makes it possible to study both orthographic and phonological processing in one experiment, thus avoiding confounding effects due to varying attention, motivation or arousal levels or due to different task demands and stimulus properties.

We predicted processing differences between the stimuli and groups on both the neurophysiological and the behavioral level. On the neurophysiological level we expected to find higher amplitudes for letter strings (W; PH; PW) compared to FF in the time window of the N170 over occipito-temporal electrodes in children without DD, as an index of print sensitivity. If the N170 is also sensitive to orthographic familiarity in children, we hypothesized that amplitudes would be decreased for orthographically familiar (W) in contrast to orthographically unfamiliar (PH; PW) word material. For children with DD we expected to find neither a print sensitivity nor an orthographic familiarity effect on the N170 component. Furthermore, we expected to find an N400 over centro-parietal electrodes in normally developing children reflecting lexical or post-lexical processing. If the N400 indexes lexical processing stages, we expected to find lower amplitudes for orthographically familiar (W) compared to orthographically unfamiliar (PH; PW) word material. If the N400 indicates post-lexical processing, we hypothesized that amplitudes would differ between phonologically familiar (W; PH) and phonologically unfamiliar word material (PW). Previous findings on whether the processing steps related to the N400 are deficient in children with DD are inconsistent. If processing steps related to the N400 are degraded in children with DD, we would expect them to show attenuated N400 amplitudes compared to normally developing children. Finally, we hypothesized that LPC amplitudes over left centro-parietal electrodes would be higher for W and PH in control children, indicating successful access to the phonological lexicon. However, this pattern of activation was not expected for the children with DD. Against this background we anticipated delayed reaction times and reduced accuracy rates for W, PH, and PW in children with DD in contrast to control subjects. Further, we expected to replicate the reaction time pattern observed in former studies (W < PH < PW).

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

As part of a longitudinal study of our research group (see Groth et al., 2013) contact details of all children born in Munich between January 2000 and December 2003 were requested from the Department of Public Order of Munich. Approximately 10,000 randomly selected families were contacted by letter and asked to participate in the present study. Additionally, study information was sent to schools, pediatricians, child psychiatrists and psychologists and socio-pediatric facilities.

The recruitment procedure had two stages. In a first step, families who expressed their interest in the present study underwent a telephone interview. Potential participants were excluded from the next stage of recruitment if one of the parents indicated that his or her child had a history of specific language disorder, had been treated for any neurological or psychiatric disorder or was currently under medication. To ensure that the children did not suffer from symptoms of Attention-Deficit Hyperactivity Disorder (ADHD) the parents were asked to rate their children on the subscale "Attention Problems" of the Child-Behavior-Checklist (CBCL/1–4; Achenbach, 1991). Children were excluded if they scored above average in the parent questionnaire (CBCL-score > 7 for girls and CBCL-score > 8 for boys), indicating a risk of ADHD. Furthermore, participants had to be German native speakers, had to attend the second grade, their hearing had to be normal and their vision had to be normal or corrected-to-normal. We decided to recruit children at the end of second grade, because at this point in time there is a high level of certainty regarding the stability of the DD diagnosis.

In the second recruitment step 250 second graders were invited and screened regarding their reading and spelling performance as well as their non-verbal intelligence. Inclusion criteria for all children were an IQ score within the normal range (≥85 IQ points) as measured with the Culture Fair Intelligence Test (CFT 1; Cattell et al., 1997). Furthermore, common word reading fluency and spelling were used as inclusion criteria. Common word and pseudoword reading fluency was assessed by using a German standardized one-minute-fluent reading-test (German: Ein-Minuten-Leseflüssigkeitstest [SLRT-II]; Moll and Landerl, 2010). In this measure, children are presented with a list of common words and pseudowords and are given one minute to read as many items as possible. Spelling was assessed with a German standardized basic vocabulary spelling test for grades 2–3 (German: Weingartener Grundwortschatz Rechtschreib-Test für zweite und dritte Klassen [WRT2+]; Birkel, 1994). In addition, reading comprehension measured with a German standardized reading comprehension test for grades 1–6 (German: Ein Leseverständnistest für Erst- bis Sechstklässler [ELFE 1–6]; Lenhard and Schneider, 2006) was assessed.

In order to ensure inclusion of only truly average (or above average) readers and spellers in our control sample, children in the control group were required to score no more than 0.70 standard deviations below the norm mean, expressed in *T*-values (mean = 50; *SD* = 10; the cutoff was therefore set to a *T*-value of 43). In order to be included in the group of children with DD, participants had to fulfill the diagnosis of DD according to the International Classification of Diseases (ICD-10: F 81.0; Dilling, 2006). Their reading and spelling scores had to fall at least one standard deviation below the mean *T*-value (1 SD; the cutoff was therefore set to a *T*-value of 40) and at least 1 SD below the level expected from the IQ according to the regression criterion (Schulte-Körne et al., 2001). Thus, a discrepancy of reading and spelling abilities both from the class or age level and from the level expected on the basis of the child's intelligence is required for diagnosing DD. As the correlation of reading and spelling performance with IQ is not 1, but medium-high, the use of a simple discrepancy criterion distorts the diagnostic results for children with low or high intelligence (Schulte-Körne et al., 2001). The application of the regression criterion avoids distortions in extreme ranges by considering the correlation between IQ and reading and spelling abilities. Thus, a higher discrepancy is necessary for children with high intelligence and a lower discrepancy is necessary for children with low intelligence in order to meet the diagnostic criterion of DD (Schulte-Körne et al., 2001). Overall 29 children were included in the control group and 58 children were included in the group with DD. The sample of children with DD was larger than the sample of control children because, as mentioned above, children were recruited as part of a longitudinal study. For the purpose of this longitudinal study children with DD were assigned to three groups. One group received an intensive reading training, a second group received an intensive spelling training and the third group acted as a wait-list control group and received training only after a six-month waiting period (see Groth et al., 2013 for more information). Here the results of the first point in time, prior to the intervention, are reported. We therefore decided to compare the control children to the whole group of children with DD. A total of six children from the DD sample were excluded from further analyses due to excessive EEG artifacts, resulting in a sample size of 52 children with DD. All data reported exclude these participants.
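The two-part inclusion rule for the DD group can be illustrated with a short sketch. The exact formula of the regression criterion is not given in the text; the version below is one common formulation (expected score regressed on IQ, minus one standard error of estimate), and the correlation `r = 0.4` as well as all function names are assumptions made for illustration only.

```python
import math

T_MEAN, T_SD = 50.0, 10.0      # T-scale used for the reading and spelling scores
IQ_MEAN, IQ_SD = 100.0, 15.0   # conventional IQ scale

def iq_to_t(iq):
    """Convert an IQ score to the T-scale."""
    return T_MEAN + T_SD * (iq - IQ_MEAN) / IQ_SD

def regression_cutoff(iq, r, k=1.0):
    """Highest T-value that still counts as discrepant from the IQ.

    Assumed formulation: the score expected from the IQ (regression toward
    the mean with correlation r) minus k standard errors of estimate.
    """
    expected_t = T_MEAN + r * (iq_to_t(iq) - T_MEAN)
    return expected_t - k * T_SD * math.sqrt(1.0 - r * r)

def meets_dd_criteria(score_t, iq, r=0.4, population_cutoff_t=40.0):
    """Apply both criteria to one reading or spelling T-value.

    r = 0.4 is a placeholder, not a value reported in the study.
    """
    below_norm = score_t <= population_cutoff_t
    below_expectation = score_t <= regression_cutoff(iq, r)
    return below_norm and below_expectation
```

Under these assumptions a child with an IQ of 100 and a *T*-value of 38 meets both criteria, whereas the same *T*-value does not suffice for a child with an IQ of 85, because the score expected from the lower IQ is itself lower.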

Both groups had an average age of about eight years (control group: *M* = 8.15, *SD* = 0.27; group with DD: *M* = 8.30, *SD* = 0.37) and an IQ-score within the normal range. The IQ of control children was significantly higher compared to the IQ of children with DD (see **Table 1**). In order to control for a confounding influence of the IQ on the ERP results the groups were matched according to their IQ. The Analyses of Variance (ANOVAs) presented below were also run with IQ matched groups and did reveal the same pattern of results. Gender was distributed similarly in both groups (control group: 13 females; group with DD: 21 females). In all reading and spelling tests children with DD performed significantly worse than control children (see **Table 1**). Apart from one control child and one child with DD all subjects were right-handed.

Parents and children were informed about the aim, purpose and procedure of the study and gave their written consent prior to inclusion in the study. Children received a present as acknowledgement for their participation. Experimental procedures were approved by the Ethical Committee of the Faculty of Medicine at the University of Munich, Germany.

#### **ERP PARADIGM AND PROCEDURE**

During ERP acquisition children performed a PLD task (Kronbichler et al., 2007; Bergmann and Wimmer, 2008; van der Mark et al., 2009, 2011). In this task participants had to decide whether a visually presented stimulus sounded like a real word or not ("Does ... sound like a real word?", see **Figure 1**).



*Table 1 notes: CON, control group; DD, group with DD; n, sample size; M, mean; SD, standard deviation. b, SLRT-II; d, WRT 2+. Average reading and spelling scores are delineated by T-values; T-values have a mean of 50 (SD = 10). \*t-test for independent samples.*

**FIGURE 1 | Phonological lexical decision task.** Words (W; e.g., Mund /mʊnt/, engl.: mouth), pseudohomophones (PH; e.g., Munt /mʊnt/), pseudowords (PW; e.g., Munk /mʊŋk/) and false fonts (FF) were presented individually in white on black background in the center of a 17-inch screen. Participants were instructed to decide via button press whether a presented stimulus sounded like a real word or not.

Children were presented with words (W; orthographically and phonologically familiar forms of German nouns), pseudohomophones (PH; phonologically correct but orthographically unfamiliar forms of the same words) or pseudowords (PW; phonologically and orthographically unfamiliar forms). W and PH required a "yes" response and PW required a "no" response. For each item type (W; PH; PW) 60 stimuli were taken with minor adaptations from the letter strings used in the studies of Bergmann and Wimmer (2008) and van der Mark et al. (2009, 2011). Every item was presented only once. In order to avoid a response bias toward "yes" responses we included a fourth condition, consisting of 60 false fonts (FF; van der Mark et al., 2009, 2011) and requiring a "no" response. FF were created by assigning a false-font character to each upper and lower case letter (van der Mark et al., 2009, 2011, see Appendix for a complete list of all stimuli used in the PLD task). Furthermore, FF also served as non-lexical control stimuli in order to examine the print sensitivity of the N170 (see Introduction).
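The response mapping described above (two conditions requiring "yes", two requiring "no") can be written down compactly. The following is a minimal sketch for scoring behavioral accuracy per condition; the data layout and the function name are illustrative assumptions and do not describe the software used in the study.

```python
# Correct response for each stimulus condition, as described in the text.
CORRECT_RESPONSE = {"W": "yes", "PH": "yes", "PW": "no", "FF": "no"}

def accuracy_by_condition(trials):
    """Return the proportion of correct responses per condition.

    trials: iterable of (condition, response) pairs, e.g. ("PH", "yes").
    """
    counts = {}
    for condition, response in trials:
        n_correct, n_total = counts.get(condition, (0, 0))
        is_correct = response == CORRECT_RESPONSE[condition]
        counts[condition] = (n_correct + int(is_correct), n_total + 1)
    return {c: k / n for c, (k, n) in counts.items()}
```

With 60 items per condition, the design yields 120 trials requiring "yes" and 120 requiring "no", which is what adding the FF condition was intended to achieve.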

According to the "corpus-based word basic form list" (Korpusbasierte Wortgrundformenliste; DeReWo, 2013) compiled on the basis of the Mannheim German Reference Corpus (Das Deutsche Referenzkorpus; Kupietz and Keibel, 2009; DeReKo, 2012) nouns used in the present study had a high frequency range, i.e., frequency classes 8–16 (Keibel, 2008). Item length and bigram frequency have a confounding effect on the ERPs of cognitive processes (Johannes et al., 1995; Assadollahi and Pulvermüller, 2003; Hauk and Pulvermüller, 2004; Penolazzi et al., 2007; Proverbio et al., 2008). To avoid effects due to item length and complexity all stimuli were matched for number of characters (3–7 characters). In addition W, PH, and PW were controlled for bigram frequency. Bigram frequencies were also determined based on the Mannheim German Reference Corpus. As can be seen in **Table 2**, the number of characters for all conditions and the bigram frequencies for the letter string conditions were equally distributed.

All stimuli were presented in white font on black background in the center of a 17-inch screen using E-Prime® 2.0 software (Psychology Software Tools, Inc.). The computer screen was placed 70 cm in front of the children resulting in a vertical visual angle of 1.23° and in an average horizontal angle of 3.44°.
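The relation between stimulus size, viewing distance and visual angle follows the standard formula θ = 2·arctan(s / 2d). The sketch below applies it in both directions; the physical stimulus size is not reported in the text, so the sizes it yields are derived values, and the function names are illustrative.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of a given size."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def size_from_angle_cm(angle_deg, distance_cm):
    """Stimulus size (cm) that subtends a given visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# Sizes implied by the reported angles at the reported 70 cm distance.
height_cm = size_from_angle_cm(1.23, 70.0)  # roughly 1.5 cm
width_cm = size_from_angle_cm(3.44, 70.0)   # roughly 4.2 cm
```

At a viewing distance of 70 cm, the reported angles thus correspond to letter strings about 1.5 cm high and, on average, about 4.2 cm wide.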

The 240 stimuli were presented in two pseudorandomized lists. The order of W and corresponding PH was counterbalanced. In List 1 the W was presented before the corresponding PH in half of the cases and the opposite for the other half. In List 2 the order was reversed (Bergmann and Wimmer, 2008). In addition, a W and its corresponding PH did not appear in close proximity (Bergmann and Wimmer, 2008) and no more than three trials requiring the same response were presented in succession. Half of the children performed List 1, whereas the other half was presented with List 2. Both lists were divided into four blocks, each with 60 stimuli. After each block there was a short break. To ensure that the subjects fully understood the task, the experiment was preceded by a short practice-block (24 trials). Trials utilized in the practice-block did not occur in the experiment.
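One of the ordering constraints, that no more than three trials requiring the same response follow each other, can be sketched as follows. This is an illustrative construction, not the procedure used to build the two lists: it handles only the run-length constraint (not the spacing of a W and its corresponding PH), and the function names, the restart strategy and the seed are assumptions.

```python
import random

def max_run_length(responses):
    """Length of the longest run of identical consecutive responses."""
    longest = run = 0
    previous = None
    for response in responses:
        run = run + 1 if response == previous else 1
        longest = max(longest, run)
        previous = response
    return longest

def constrained_order(n_yes, n_no, max_run=3, seed=None, max_attempts=1000):
    """Draw a random response sequence with no run longer than max_run.

    Responses are drawn in proportion to how many of each type remain;
    a draw that would create a run longer than max_run is disallowed.
    If the sequence reaches a dead end, it is rebuilt from scratch.
    """
    rng = random.Random(seed)
    for _ in range(max_attempts):
        remaining = {"yes": n_yes, "no": n_no}
        order = []
        while remaining["yes"] + remaining["no"] > 0:
            allowed = [r for r in remaining
                       if remaining[r] > 0 and order[-max_run:] != [r] * max_run]
            if not allowed:
                break  # dead end: restart with a fresh sequence
            weights = [remaining[r] for r in allowed]
            choice = rng.choices(allowed, weights=weights)[0]
            order.append(choice)
            remaining[choice] -= 1
        else:
            return order
    raise ValueError("no valid order found; constraints may be unsatisfiable")
```

Calling `constrained_order(120, 120)` gives a 240-trial response sequence to which the 60 items of each condition could then be assigned.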

To make sure that even the poorest reader had enough time to read the letter string stimuli the task was self-paced. However, all children were presented with the stimuli for a minimum of 700 ms in order to guarantee that all participants saw the same in the first milliseconds, which is important for ERP analysis. Participants had to decide by button press whether the presented stimulus sounded like a real word or not. Half of the children used their right hand for giving a "yes" response and the left hand for giving a "no" response, the other half used the left hand for "yes" and the right hand for "no" responses. Depending on correct or incorrect response children were provided with a feedback in form of a happy or sad face (1500 ms). The next trial appeared automatically after a blank screen of 500 ms (see **Figure 1**).

*Table 1 notes (continued): a, CFT 1; c, ELFE 1–6.*

#### **Table 2 | Item characteristics for each condition.**

*W, words; PH, pseudohomophones; PW, pseudowords; FF, false fonts.*

#### **ERP RECORDING AND ANALYSIS**

EEG was recorded during the stimulus presentation with an Electrical Geodesic Inc. 128-channel-system (see **Figure 2** for a schematic illustration of the electrode net). The impedance was kept below 50 kΩ. EEG-data was recorded continuously with Cz as the reference electrode and sampled at 500 Hz. Further analysis steps were performed with Brainvision Analyzer (Brain Products GmbH).

After filtering (low cutoff: 0.5 Hz, time constant 0.3, 12 dB/octave; high cutoff: 40 Hz, 24 dB/octave; notch filter: 50 Hz; filters were applied to the continuous raw data to avoid discontinuities and transient phenomena), removing EOG-artifacts with Independent Component Analysis (Zhou et al., 2005; Hoffmann and Falkenstein, 2008) and exclusion of other artifacts (gradient criteria: more than 50 μV difference between two successive data points or more than 150 μV in a 200 ms window; absolute amplitude criterion: more than ±150 μV; low activity: less than 0.5 μV in a 100 ms window), the EEG was re-referenced to the average reference.
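The four amplitude-based exclusion criteria can be expressed as a simple check on a single-channel data segment. The sketch below is not the Brainvision Analyzer implementation; in particular, reading the 200 ms and 100 ms criteria as the difference between the maximum and minimum within a sliding window is an assumption, as is the function name.

```python
def has_artifact(samples_uv, sfreq_hz=500.0):
    """Return True if a single-channel segment (in microvolts) violates
    any of the four criteria listed in the text."""
    n = len(samples_uv)
    if n == 0:
        return False

    # Gradient: more than 50 uV between two successive data points.
    if any(abs(b - a) > 50.0 for a, b in zip(samples_uv, samples_uv[1:])):
        return True

    # Absolute amplitude: any sample outside +/-150 uV.
    if any(abs(v) > 150.0 for v in samples_uv):
        return True

    def window_ranges(window_ms):
        """Max-minus-min within each sliding window (assumed definition)."""
        width = max(2, int(round(window_ms / 1000.0 * sfreq_hz)))
        for start in range(0, max(1, n - width + 1)):
            chunk = samples_uv[start:start + width]
            yield max(chunk) - min(chunk)

    # More than 150 uV within any 200 ms window.
    if any(r > 150.0 for r in window_ranges(200.0)):
        return True

    # Low activity: less than 0.5 uV within any 100 ms window.
    if any(r < 0.5 for r in window_ranges(100.0)):
        return True

    return False
```

At the 500 Hz sampling rate reported above, the 200 ms and 100 ms windows correspond to 100 and 50 samples, respectively.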

The data was then segmented into 1100 ms epochs including 100 ms pre-stimulus baseline and the ERP data was baseline corrected. For inclusion in the statistical analysis a minimum of 20 artifact free trials was necessary. Only correct trials were analyzed. The averages (*M* [*SD*]) for the accepted trials for control children were: W 53.79 [3.10], PH 50.45 [3.74], PW 51.86 [5.03] and FF 56.45 [2.34]. For children with DD an average of 47.23 [4.61], 43.56 [6.64], 40.50 [9.20] and 56.00 [2.31] trials were obtained for the W, PH, PW and FF, respectively. Individual ERPs were averaged per condition (W; PH; PW; FF). Grand averages of all four conditions were computed by averaging separately for each subject group (control group; group with DD).
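Baseline correction and per-condition averaging with the minimum-trial rule can be sketched as follows. The sketch assumes that each epoch is a list of 550 samples (1100 ms at 500 Hz) whose first 50 samples form the 100 ms pre-stimulus baseline; the function names are illustrative and do not refer to the software used in the study.

```python
def baseline_correct(epoch_uv, sfreq_hz=500.0, baseline_ms=100.0):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    n_base = int(round(baseline_ms / 1000.0 * sfreq_hz))
    baseline = sum(epoch_uv[:n_base]) / n_base
    return [v - baseline for v in epoch_uv]

def average_erp(epochs_uv, min_trials=20):
    """Average artifact-free, correct trials of one condition.

    Returns None when fewer than min_trials epochs are available, which
    mirrors the inclusion rule stated in the text.
    """
    if len(epochs_uv) < min_trials:
        return None
    n = len(epochs_uv)
    return [sum(samples) / n for samples in zip(*epochs_uv)]
```

A child's ERP for a condition would then be `average_erp([baseline_correct(e) for e in epochs])`, and the grand average is the mean of these individual ERPs within each group.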

Based on the electrophysiological activity to W for control children time windows and regions of interest (ROIs) for the N170, N400 and the LPC were determined using running *t*-tests against zero (*p* < 0.05) at each electrode. According to this analysis the time window was set to 170–290 ms for the N170, 330–460 ms for the N400 and 600–900 ms for the LPC. These time windows were applied to all conditions and both groups.
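A running *t*-test against zero amounts to a one-sample *t*-test across subjects at every time point of one electrode. The sketch below flags time points whose |*t*| exceeds a critical value; for 29 control children (28 degrees of freedom) the two-tailed critical value at *p* < 0.05 is approximately 2.048. No correction for multiple comparisons is shown because none is described in the text, and the function names are illustrative.

```python
import math

def one_sample_t(values):
    """t statistic of a one-sample t-test against zero."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    if variance == 0.0:
        # Degenerate case: identical values across subjects.
        return math.copysign(math.inf, mean) if mean else 0.0
    return mean / math.sqrt(variance / n)

def running_t_mask(subject_erps, t_crit):
    """Flag each time point where the group mean differs from zero.

    subject_erps: one list of amplitudes per subject for a single
    electrode; all lists must have the same length.
    """
    return [abs(one_sample_t(list(samples))) > t_crit
            for samples in zip(*subject_erps)]

T_CRIT_DF28 = 2.048  # two-tailed, p < 0.05, df = 28
```

Contiguous runs of flagged time points at neighboring electrodes would then suggest candidate time windows and ROIs.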

In line with previous studies (e.g., Maurer et al., 2006; Kast et al., 2010; Hasko et al., 2012) the most significant activation of the N170 in the present study was also found bilaterally over occipito-temporal electrodes using the running *t*-tests against zero (*p* < 0.05) for W in control children. According to this activation we defined left and right hemispheric ROIs (LH and RH ROIs). The LH ROI included electrodes 58, 59, 64, 65, 66, 69, 70 and the RH ROI included electrodes 83, 84, 89, 90, 91, 95, 96 (see **Figure 2** for exact electrode positions over occipito-temporal sites).

In order to examine the degree of N170 print sensitivity, difference waves were additionally calculated between the linguistic material and the non-lexical control stimuli FF for the time window of the N170. ERP difference waves were calculated by subtracting FF from the linguistic material (i.e., W minus FF, PH minus FF, PW minus FF). Furthermore, in order to examine the degree of orthographic familiarity, difference waves were calculated between the orthographically familiar (W) and orthographically unfamiliar material (PH; PW). ERP difference waves were calculated by subtracting orthographically unfamiliar material from orthographically familiar material (i.e., W minus PH, W minus PW). Even though both PH and PW are orthographically unfamiliar, difference waves contrasting W and PW might be confounded with phonological and semantic processes, because W and PW differ not only with respect to orthographic familiarity but also with respect to phonology and semantics. Difference waves were calculated for each child separately and grand averages of all five difference waves (W minus FF, PH minus FF, PW minus FF, W minus PH, W minus PW) were computed by averaging separately for each group (control group; group with DD). Based on the electrophysiological activity to the W minus FF contrast for control children ROIs were determined using running *t*-tests against zero (*p* < 0.05) at each electrode. According to this activation electrodes 51, 52, 58, 59, 60, 64, 65, 66, 67, 69, 70, 71 were included in the LH ROI and electrodes 76, 77, 83, 84, 85, 89, 90, 91, 92, 95, 96, 97 were included in the RH ROI (see **Figure 2** for exact electrode positions over occipito-temporal sites).
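The five difference waves and their grand averages reduce to pointwise subtraction and averaging. The following sketch assumes each child's ERPs are stored as a dictionary mapping the condition label to a list of amplitudes of equal length; this data layout and the function names are illustrative.

```python
# The five contrasts named in the text, as (minuend, subtrahend) pairs.
CONTRASTS = [("W", "FF"), ("PH", "FF"), ("PW", "FF"), ("W", "PH"), ("W", "PW")]

def difference_wave(erp_a, erp_b):
    """Pointwise difference erp_a minus erp_b."""
    return [a - b for a, b in zip(erp_a, erp_b)]

def all_difference_waves(erps_by_condition):
    """Compute all five difference waves for one child."""
    return {f"{a} minus {b}": difference_wave(erps_by_condition[a],
                                              erps_by_condition[b])
            for a, b in CONTRASTS}

def grand_average(waves):
    """Average a list of equally long waves (one per child) pointwise."""
    n = len(waves)
    return [sum(samples) / n for samples in zip(*waves)]
```

Computing the difference per child before averaging, as described above, keeps the individual difference waves available for the later ROI-based measures.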

For the N400, running *t*-tests against zero (*p* < 0.05) for W in control children indicated a centro-parietal distribution (see **Figure 2**; electrodes included in the ROI: 31, 37, 42, 53, 54, 55, 61, 62, 78, 79, 80, 86, 87, 93, 129 (VREF); e.g., Deacon et al., 2004; for review see Lau et al., 2008; Kutas and Federmeier, 2011).

For the LPC, a left centro-parietal ROI was defined according to the running *t*-tests against zero (*p* < 0.05) for W in control children (see **Figure 2**; electrodes included in the ROI: 31, 36, 37, 41, 42, 47, 52, 53, 54, 55, 60, 61, 62, 67, 72, 77, 78, 79; Friedman and Johnson, 2000; Hepworth et al., 2001; Finnigan et al., 2002; Rüsseler et al., 2003; Schulte-Körne et al., 2004; Yonelinas et al., 2005; van Strien et al., 2009; Balass et al., 2010; for review see: Rugg and Curran, 2007).

Mean peak amplitudes (averaged from 20 ms before to 20 ms after the individual peak) and peak latencies were exported for each electrode of the N170 and N400 ROIs using the defined time windows. As no clear peak could be observed in the N170 difference waves or in the LPC, we exported the area under the curve for each electrode included in the ROIs of the N170 difference waves and of the LPC using the defined time windows. After export, the individual mean peak amplitudes, latencies, and areas under the curve were averaged within each ROI.
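The two export measures can be illustrated with a short sketch. It assumes a negative-going peak (as for the N170 and N400), a signed trapezoidal area, and hypothetical function names and toy samples; the search window in the example is also an assumption, since only the 600 to 900 ms LPC window is stated in this section.

```python
def mean_peak_amplitude(times_ms, amplitudes, window, half_width_ms=20, negative=True):
    """Locate the peak inside `window`, then average all samples within
    half_width_ms before and after it. Returns (mean amplitude, latency)."""
    start, end = window
    candidates = [i for i, t in enumerate(times_ms) if start <= t <= end]
    pick = min if negative else max  # most negative or most positive sample
    peak_index = pick(candidates, key=lambda i: amplitudes[i])
    peak_time = times_ms[peak_index]
    around = [a for t, a in zip(times_ms, amplitudes)
              if abs(t - peak_time) <= half_width_ms]
    return sum(around) / len(around), peak_time


def area_under_curve(times_ms, amplitudes, window):
    """Signed trapezoidal area (microvolt x ms) over samples inside `window`."""
    start, end = window
    points = [(t, a) for t, a in zip(times_ms, amplitudes) if start <= t <= end]
    return sum((t1 - t0) * (a0 + a1) / 2
               for (t0, a0), (t1, a1) in zip(points, points[1:]))


# Toy waveform (hypothetical): one electrode, 10 ms sampling steps.
times = [0, 10, 20, 30, 40, 50, 60]
volts = [0, -1, -2, -5, -2, -1, 0]
amplitude, latency = mean_peak_amplitude(times, volts, window=(10, 50))
area = area_under_curve(times, volts, window=(0, 60))
print(amplitude, latency, area)
```

A signed area yields negative values for negative-going deflections, which matches the later remark that the N170 difference-wave areas are negative; whether the authors used a signed or rectified area is not stated.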

#### **STATISTICAL ANALYSIS**

To test for group differences in N170 mean peak amplitudes and latencies, we computed repeated-measures ANOVAs. The ANOVAs included the within-subject factors *condition* (W; PH; PW; FF) and *hemisphere* (LH; RH) and the between-subject factor *group* (control group; group with DD). Similar repeated-measures ANOVAs were run for the mean area under the curve of the N170 difference waves in order to examine the degree of print sensitivity and the degree of orthographic familiarity. For print sensitivity, the ANOVA included the within-subject factors *condition* (W minus FF; PH minus FF; PW minus FF) and *hemisphere* (LH; RH) and the between-subject factor *group* (control group; group with DD). For orthographic familiarity, the ANOVA included the within-subject factors *condition* (W minus PH; W minus PW) and *hemisphere* (LH; RH) and the between-subject factor *group* (control group; group with DD). N400 mean peak amplitudes and latencies and the LPC mean area under the curve were tested for group differences using repeated-measures ANOVAs with the within-subject factor *condition* (W; PH; PW) and the between-subject factor *group* (control group; group with DD). *Post-hoc* analyses were performed with *t*-tests for independent and dependent samples.

The behavioral data (reaction times and accuracy on the PLD task) were analyzed using repeated-measures ANOVAs including the within-subject factor *condition* (W; PH; PW; FF) and the between-subject factor *group* (control group; group with DD). Trials with response times below 200 ms and trials deviating by more than 2.5 SD from the individual group mean within a condition type were excluded from analysis. This procedure resulted in a loss of 2.76% of the trials. Furthermore, only correct trials were included in the reaction time analysis.
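The trial-exclusion rule can be sketched as below. This is an illustration under assumptions: the mean and SD are computed over all trials of one cell (one group and condition) before any trial is dropped, and both criteria are applied independently. The text does not specify the order of the two steps or the exact reference set for the mean, and the function name and toy reaction times are hypothetical.

```python
from statistics import mean, stdev


def keep_trials(rts_ms, min_rt=200.0, n_sd=2.5):
    """Return the reaction times that survive both exclusion criteria.

    rts_ms: reaction times of one cell (assumed: one group x condition).
    """
    m, sd = mean(rts_ms), stdev(rts_ms)
    return [rt for rt in rts_ms
            if rt >= min_rt and abs(rt - m) <= n_sd * sd]


# Toy reaction times in ms (hypothetical): one anticipation, one very slow trial.
rts = [150, 600, 620, 640, 660, 680, 700, 720, 740, 3000]
kept = keep_trials(rts)
print(len(rts) - len(kept), "of", len(rts), "trials excluded")
```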

If sample sizes are equal, ANOVAs are robust against violations of homogeneity of variances. Given that the sample of children with DD was almost twice as large as the control sample, the *F*<sub>max</sub>-test was applied in case of violations of the homogeneity of variances (Bühner and Ziegler, 2009). According to the *F*<sub>max</sub>-test, an adjustment of the alpha level is necessary if the critical value of *F*<sub>max</sub> > 10 is exceeded (Bühner and Ziegler, 2009). The critical value was not exceeded for any variable. Where necessary, the Greenhouse-Geisser correction was applied to correct for violations of the sphericity assumption. The alpha level for all analyses was 0.05. To avoid alpha-error inflation due to multiple comparisons, the alpha level was corrected using the Bonferroni-Holm correction (Bühner and Ziegler, 2009). In addition to the *p*-values, effect sizes are reported for significant results: η<sup>2</sup><sub>p</sub> for repeated-measures ANOVAs and Cohen's *d* for *post-hoc t*-tests (Cohen, 1988; Bühner and Ziegler, 2009).
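Three of the quantities named in this paragraph can be written out in a few lines. The sketch below assumes that *F*<sub>max</sub> is the ratio of the largest to the smallest group variance, uses the standard step-down form of the Bonferroni-Holm procedure, and recovers partial eta squared from an *F* value and its degrees of freedom via the usual identity; the function names and toy inputs are hypothetical, and nothing here is taken from the authors' own scripts.

```python
def f_max(variances):
    """Ratio of the largest to the smallest group variance."""
    return max(variances) / min(variances)


def holm_reject(p_values, alpha=0.05):
    """Bonferroni-Holm step-down procedure.

    Returns one reject flag per p value, in the original input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p values are retained as well
    return reject


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


print(f_max([4.0, 1.0]))                      # toy variances (hypothetical)
print(holm_reject([0.005, 0.03, 0.04, 0.5]))  # toy p values (hypothetical)
print(round(partial_eta_squared(15.27, 3, 237), 2))
```

As a usage note, the last line feeds in the *F* value and degrees of freedom of the N170 amplitude main effect of *condition* reported in the Results; rounding the result to two decimals gives the reported effect size of 0.16.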

Furthermore, partial correlations controlling for the factor *group* were computed between the ERP data (N170 mean area under the curve for difference waves; N400 mean peak amplitudes; LPC mean area under the curve) and the behavioral data (common word and pseudoword reading fluency; reading comprehension; spelling). As we did not observe differences between W, PH and PW in the N170 difference waves and in the N400, we used mean values calculated across the three letter string types for the partial correlation analysis. Because the correlational analysis was exploratory, the Bonferroni-Holm correction was not applied. Only significant results (*p* < 0.05) are reported.
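A first-order partial correlation with group as the control variable can be computed from the three pairwise Pearson correlations. The following sketch uses that textbook formula with group coded as 0 or 1; the variable names and toy values are hypothetical and only show how a correlation driven entirely by a group difference disappears once group is partialled out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y with z partialled out of both."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data (hypothetical): both measures differ between groups,
# but are unrelated to each other within each group.
group = [0, 0, 0, 0, 1, 1, 1, 1]
erp = [1, 2, 1, 2, 5, 6, 5, 6]
spelling = [1, 1, 2, 2, 5, 5, 6, 6]
print(pearson_r(erp, spelling), partial_r(erp, spelling, group))
```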

#### **RESULTS**

#### **ERP DATA OF THE PLD TASK**

#### *N170*

*Mean peak amplitudes.* In both groups N170 mean peak amplitudes were enhanced for the linguistic material compared to FF (main effect *condition*, *F*(3, 237) = 15.27, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.16; dependent *post-hoc t*-tests across both groups: FF vs. PW, *t*(80) = 4.14, *p* < 0.001, *d* = 0.46; FF vs. PH, *t*(80) = 5.21, *p* < 0.001, *d* = 0.58; FF vs. W, *t*(80) = 5.59, *p* < 0.001, *d* = 0.63; PW vs. PH, *t*(80) = 0.71, *p* = 0.48; PW vs. W, *t*(80) = 1.48, *p* = 0.14; PH vs. W, *t*(80) = 0.72, *p* = 0.47; see **Table 3** and **Figure 3**). N170 mean peak amplitudes were comparable between *groups*, *F*(1, 79) = 0.08, *p* = 0.78, and distributed symmetrically across both *hemispheres*, *F*(1, 79) = 0.94, *p* = 0.34. No significant interaction between *group* and *condition*, *F*(3, 237) = 1.50, *p* = 0.22, or *group* and *hemisphere*, *F*(1, 79) = 0.12, *p* = 0.74, could be observed.

*Peak latencies.* A significant main effect of *condition* occurred, *F*(2.59, 204.84) = 3.65, *p* = 0.018, η<sup>2</sup><sub>p</sub> = 0.04. Dependent *post-hoc t*-tests revealed significantly shorter peak latencies only for PH compared to W (FF vs. PW, *t*(80) = −0.64, *p* = 0.53; FF vs. PH, *t*(80) = −0.18, *p* = 0.86; FF vs. W, *t*(80) = −2.62, *p* = 0.01; PW vs. PH, *t*(80) = 0.64, *p* = 0.53; PW vs. W, *t*(80) = −1.98, *p* = 0.05; PH vs. W, *t*(80) = −2.89, *p* = 0.005, *d* = 0.32; see **Table 3** and **Figure 3**). N170 peak latencies were comparable between *groups*, *F*(1, 79) = 0.03, *p* = 0.87, and equal across both *hemispheres*, *F*(1, 79) = 0.32, *p* = 0.57. No significant interaction between *group* and *condition*, *F*(2.59, 204.84) = 2.19, *p* = 0.10, or *group* and *hemisphere*, *F*(1, 79) = 1.43, *p* = 0.24, could be observed.

*Print sensitivity; area under the curve.* Mean area under the curve was greater for the control group than for the group with DD for all difference waves contrasting the linguistic material with FF (W minus FF; PH minus FF; PW minus FF; main effect *group*, *F*(1, 79) = 9.36, *p* = 0.003, η<sup>2</sup><sub>p</sub> = 0.11; see **Figure 4A**). Furthermore, the activation was greater over the left hemisphere than over the right hemisphere (main effect *hemisphere*, *F*(1, 79) = 5.08, *p* = 0.027, η<sup>2</sup><sub>p</sub> = 0.06; see **Figure 4A**). Mean area under the curve was similarly high for all three difference waves, *F*(2, 158) = 0.77, *p* = 0.46. No significant interaction between *group* and *condition*, *F*(2, 158) = 1.27, *p* = 0.28, or *group* and *hemisphere*, *F*(1, 79) = 0.04, *p* = 0.84, could be observed.

*Orthographic familiarity; area under the curve.* Mean area under the curve was comparably high for both *groups*, *F*(1, 79) = 0.29, *p* = 0.59, and *hemispheres*, *F*(1, 79) = 0.03, *p* = 0.85. Furthermore, mean area under the curve was similar for W minus PH and W minus PW, *F*(1, 79) = 0.56, *p* = 0.46 (see **Figure 4B**). No significant interaction between *group* and *condition*, *F*(1, 79) = 2.05, *p* = 0.16, or *group* and *hemisphere*, *F*(1, 79) = 0.66, *p* = 0.42, could be observed.

#### *N400*

*Mean peak amplitudes.* N400 mean peak amplitudes were more negative in the control group than in the group with DD (main effect *group*, *F*(1, 79) = 5.34, *p* = 0.023, η<sup>2</sup><sub>p</sub> = 0.06; see **Table 4** and **Figure 5**). N400 mean peak amplitudes were comparably high for all *conditions*, *F*(2, 158) = 0.28, *p* = 0.75, and no significant interaction between *group* and *condition*, *F*(2, 158) = 0.68, *p* = 0.51, could be observed.

**FIGURE 3 | Illustration of the averages across occipito-temporal (OT) electrodes included in the left hemispheric (LH) and right hemispheric (RH) ROIs of the N170 for control children (CON) and children with DD (DD).** W, words; PH, pseudohomophones; PW, pseudowords; FF, false fonts. Negativity is depicted upwards.


*W, words, PH, pseudohomophones, PW, pseudowords, FF, false fonts; CON, control children, DD, children with DD.*

*Peak latencies.* N400 peak latencies did not differ between *groups*, *F*(1, 79) = 1.49, *p* = 0.23, or *conditions*, *F*(2, 158) = 1.53, *p* = 0.22, and no significant interaction between *group* and *condition*, *F*(2, 158) = 2.76, *p* = 0.07, could be observed (see **Table 4** and **Figure 5**).

#### *LPC*

*Area under the curve.* A main effect of *condition* occurred, *F*(2, 158) = 4.41, *p* = 0.014, η<sup>2</sup><sub>p</sub> = 0.05. Furthermore, a significant two-way interaction between the factors *condition* and *group* was observed, *F*(2, 158) = 4.05, *p* = 0.019, η<sup>2</sup><sub>p</sub> = 0.05. Independent *post-hoc t*-tests revealed no significant differences between the groups (W, *t*(79) = 1.32, *p* = 0.19; PH, *t*(79) = 1.69, *p* = 0.09; PW, *t*(79) = −1.14, *p* = 0.26).

As can be seen in **Figure 6**, more activation for both W and PH compared to PW was found only in the control group (dependent *post-hoc t*-tests: W vs. PW, *t*(28) = 3.57, *p* = 0.001, *d* = 0.66; PH vs. PW, *t*(28) = 2.63, *p* = 0.014, *d* = 0.49). The activation for W and PH was comparably high in control children (dependent *post-hoc t*-test: W vs. PH, *t*(28) = 0.91, *p* = 0.37). Conditions did not differ in the group with DD (dependent *post-hoc t*-tests: W vs. PH, *t*(51) = 1.25, *p* = 0.22; W vs. PW, *t*(51) = 0.37, *p* = 0.71; PH vs. PW, *t*(51) = −0.78, *p* = 0.44; see **Figure 6**).

*W, words; PH, pseudohomophones; PW, pseudowords; CON, control children; DD, children with DD.*

#### **BEHAVIORAL DATA OF THE PLD TASK**

#### *Reaction times*

Performance on the PLD task revealed a reaction time difference between *conditions*, *F*(1.77, 139.63) = 323.85, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.80, and *groups*, *F*(1, 79) = 80.84, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.51. Furthermore, a significant two-way interaction between the factors *condition* and *group* occurred, *F*(1.77, 139.63) = 68.38, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.46. Control children had shorter reaction times to W, *t*(71.81) = −10.90, *p* < 0.001, *d* = 2.68, PH, *t*(75.70) = −9.99, *p* < 0.001, *d* = 2.40, and PW, *t*(72.86) = −11.46, *p* < 0.001, *d* = 2.80, than children with DD. There was no difference between groups regarding reaction times to FF, *t*(79) = −0.49, *p* = 0.63 (see **Figure 7**).

*Post-hoc t*-tests within each group revealed the same pattern of reaction times for both groups. Both control children and children with DD had longer reaction times for all linguistic stimuli compared to FF (CON: W vs. FF, *t*(28) = 9.45, *p* < 0.001, *d* = 1.75; PH vs. FF, *t*(28) = 16.31, *p* < 0.001, *d* = 3.03; PW vs. FF, *t*(28) = 15.83, *p* < 0.001, *d* = 2.94; DD: W vs. FF, *t*(51) = 16.78, *p* < 0.001, *d* = 2.33; PH vs. FF, *t*(51) = 20.05, *p* < 0.001, *d* = 2.78; PW vs. FF, *t*(51) = 21.24, *p* < 0.001, *d* = 2.95). In both groups reaction times were shorter for W compared to PH (CON, *t*(28) = −12.70, *p* < 0.001, *d* = 2.36; DD, *t*(51) = −7.81, *p* < 0.001, *d* = 1.08) and PW (CON, *t*(28) = −15.12, *p* < 0.001, *d* = 2.81; DD, *t*(51) = −14.24, *p* < 0.001, *d* = 1.97). Both groups also responded more slowly to PW than to PH (CON, *t*(28) = 7.60, *p* < 0.001, *d* = 1.41; DD, *t*(51) = 12.54, *p* < 0.001, *d* = 1.74).

**FIGURE 6 | Illustration of the averages across centro-parietal (CP) electrodes included in the ROI of the LPC for control children (CON) and children with DD (DD) and illustration of the mean area under the curve.** W, words; PH, pseudohomophones; PW, pseudowords. The time window selected for the LPC is highlighted in gray (600–900 ms). Negativity is depicted upwards. Error bars illustrate standard deviations.

**FIGURE 7 | Behavioral results for the PLD task for control children (CON) and children with DD (DD).** ACC, accuracy; RT, reaction time; FF, false fonts; W, words; PH, pseudohomophones; PW, pseudowords. Error bars illustrate standard deviations.

#### *Accuracy*

Performance on the PLD task revealed an accuracy difference between *conditions*, *F*(2.23, 175.83) = 96.92, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.55, and *groups*, *F*(1, 79) = 50.83, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.39. Furthermore, a significant two-way interaction between the factors *condition* and *group* occurred, *F*(2.23, 175.83) = 21.67, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.22. Independent *post-hoc t*-tests revealed that control children's performance was significantly better in all linguistic conditions than that of the group with DD (W, *t*(78.42) = 8.32, *p* < 0.001, *d* = 1.96; PH, *t*(78.27) = 5.80, *p* < 0.001, *d* = 1.37; PW, *t*(77.39) = 7.65, *p* < 0.001, *d* = 1.81; see **Figure 7**). No group differences were found for the FF condition, *t*(79) = 0.36, *p* = 0.72.

Dependent *post-hoc t*-tests within each group revealed that control children gave more correct answers to FF than to all linguistic stimuli (W vs. FF, *t*(28) = −3.92, *p* = 0.001, *d* = 0.73; PH vs. FF, *t*(28) = −8.55, *p* < 0.001, *d* = 1.59; PW vs. FF, *t*(28) = −7.71, *p* < 0.001, *d* = 1.43). Furthermore, control children's accuracy was higher for W than for PH and PW (W vs. PH, *t*(28) = 9.09, *p* < 0.001, *d* = 1.69; W vs. PW, *t*(28) = 4.54, *p* < 0.001, *d* = 0.84), and accuracy rates did not differ between PH and PW for control children, *t*(28) = −0.41, *p* = 0.69. Similarly, dependent *post-hoc t*-tests revealed that children with DD gave more correct answers to FF than to all linguistic stimuli (W vs. FF, *t*(51) = −13.08, *p* < 0.001, *d* = 2.43; PH vs. FF, *t*(51) = −13.16, *p* < 0.001, *d* = 2.44; PW vs. FF, *t*(51) = −13.87, *p* < 0.001, *d* = 2.58). Furthermore, in the group with DD accuracy rates were higher for W than for PH and PW (W vs. PH, *t*(51) = 5.59, *p* < 0.001, *d* = 0.78; W vs. PW, *t*(51) = 7.95, *p* < 0.001, *d* = 1.10) and higher for PH than for PW, *t*(51) = 3.99, *p* < 0.001, *d* = 0.55.

#### **CORRELATIONAL RESULTS**

When interpreting the correlation results, note that the mean area under the curve for the N170 difference waves and the N400 mean peak amplitudes have negative values. No correlation was found between the mean area under the curve for the N170 difference waves and performance in reading and spelling. N400 mean peak amplitudes were correlated with spelling (*r* = −0.25, *p* = 0.025), indicating that better spelling was related to enhanced N400 mean peak amplitudes. Furthermore, a smaller LPC mean area under the curve for PH was correlated with better spelling (*r* = −0.22, *p* = 0.048).

#### **DISCUSSION**

The present study was designed to investigate the single processing steps underlying PLDs in order to provide a temporal model of reading processes in normally developing children and to further clarify which processing steps are degraded in children with DD during reading. We therefore employed a PLD task in children with and without DD while recording their neurophysiological activity via EEG. Children were presented with W, PH, PW and FF and had to decide via button press whether the presented stimulus sounded like a real German word or not. In the following sections we relate our ERP findings to the single processing steps suggested by dual route models of reading, thus providing a temporal model of reading processes for children. Furthermore, deficits related to single processing steps in DD are discussed, and clinical implications for intervention derived from our findings are offered.

#### **TEMPORAL MODEL OF READING PROCESSES IN NORMALLY DEVELOPING CHILDREN AND DEFICITS IN DD**

Dual route models of reading (Coltheart et al., 1993, 2001) suggest that reading proceeds in a hierarchical manner. After the completion of visual and orthographic processing steps, the phonology of a letter string can be accessed in different ways depending on the orthographic familiarity of the letter string. Familiar words are read by first accessing the orthographic representations in the orthographic lexicon and then retrieving the corresponding phonological representations from the phonological lexicon. Unfamiliar word forms, such as pseudohomophones and pseudowords, or familiar words for which the reader does not possess an entry in the orthographic lexicon, are read by applying grapheme-phoneme correspondence rules in order to access the phonological representation.
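The two routes can be made concrete with a deliberately simplified toy sketch. Everything in it is hypothetical: the two-word lexicon, the letter-to-sound rules (which are neither real German nor real English rules), and the sequential control flow, which tries the lexical route first purely for readability. It is not an implementation of the Coltheart et al. model.

```python
# Toy orthographic lexicon: spelling -> phonological form (hypothetical entries).
ORTHOGRAPHIC_LEXICON = {"fox": "foks", "cat": "kat"}
PHONOLOGICAL_LEXICON = set(ORTHOGRAPHIC_LEXICON.values())

# Toy grapheme-phoneme correspondence rules (hypothetical).
GPC_RULES = {"ph": "f", "ck": "k", "f": "f", "o": "o", "x": "ks",
             "c": "k", "k": "k", "a": "a", "t": "t", "p": "p"}


def apply_gpc(letters):
    """Non-lexical route: greedy longest-match grapheme-to-phoneme conversion."""
    phonemes, i = [], 0
    while i < len(letters):
        for size in (2, 1):  # try two-letter graphemes before single letters
            chunk = letters[i:i + size]
            if len(chunk) == size and chunk in GPC_RULES:
                phonemes.append(GPC_RULES[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {letters[i]!r}")
    return "".join(phonemes)


def read_aloud(letters):
    """Return (phonological form, route used) for a letter string."""
    if letters in ORTHOGRAPHIC_LEXICON:  # lexical route for familiar spellings
        return ORTHOGRAPHIC_LEXICON[letters], "lexical"
    return apply_gpc(letters), "non-lexical"


def sounds_like_word(letters):
    """Phonological lexical decision: does the output match a known word?"""
    phonology, _ = read_aloud(letters)
    return phonology in PHONOLOGICAL_LEXICON


for item in ("fox", "phox", "pox"):  # word, pseudohomophone, pseudoword
    print(item, read_aloud(item), sounds_like_word(item))
```

In this toy, the pseudohomophone "phox" reaches the same phonological form as the word "fox" only through the rule-based route, which mirrors why W and PH both call for a "sounds like a word" response in a phonological lexical decision while PW does not.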

Although reading models assume different processing steps, they do not provide information about the time course of these steps. However, knowledge about when single processing steps occur is important, especially for a better understanding of which processing steps during reading are degraded in subjects with DD and how these deficits might lead to the reading speed deficit, which is suggested to be the main criterion for diagnosing DD in transparent orthographies.

#### *N170 indexes orthographic processing and is deficient in DD*

At about 220 ms the child's brain differentiates between orthographic (W, PH, PW) and non-orthographic control stimuli (FF), as indicated by higher mean peak amplitudes for orthographic stimuli compared to FF. This effect of print sensitivity can be allocated to the first processing step of reading models, namely the visual-orthographic processing step. In accordance with previous studies, the N170 was distributed equally over the left and the right occipito-temporal scalp (see **Figure 3**) and delayed by about 50 ms (Maurer et al., 2006; Spironelli and Angrilli, 2009; Kast et al., 2010; Hasko et al., 2012) compared to the left-lateralized N170 of adults (Bentin et al., 1999; Maurer et al., 2005a,b), indicating that this first processing step is not yet fully automated in children. According to the phonological mapping hypothesis (McCandliss and Noble, 2003; Maurer et al., 2010), processing of written language becomes left lateralized with increasing reading experience during development because phonological processes, which mediate grapheme-phoneme conversion, are typically left lateralized. This hypothesis is supported by a longitudinal study by Maurer and colleagues (2006), who showed that print sensitivity develops with reading instruction. While the N170 amplitudes were comparably high for words and symbol strings in preschool children, children at the end of second grade showed an effect of print sensitivity, and the N170 was distributed equally across hemispheres for words (Maurer et al., 2005b, 2006). Furthermore, Maurer and colleagues observed a clear shift of N170 amplitudes to the left hemisphere in adults (Maurer et al., 2005b, 2006). Thus, our results further indicate that the development of print sensitivity is not completed in third graders and suggest that the underlying system for fast visual word recognition is not yet entirely automated.

In contrast to a previous study by Maurer et al. (2007), children with DD in the present study, like control children, had higher mean peak amplitudes to orthographic than to non-orthographic control stimuli. Children had a mean age of eight years in both studies. However, in the present study children were at the beginning of grade three, whereas children in the Maurer et al. (2007) study attended grade two, which emphasizes the influence of increasing reading experience on the N170 as well as the plasticity of the N170 in DD. Although children with DD in the present study showed an effect of print sensitivity in the mean peak amplitudes, the degree of print sensitivity was reduced, as indicated by a significantly diminished mean area under the curve for the difference waves compared to children without DD (see **Figure 4A**). This finding also corresponds to fMRI studies investigating the PLD task in subjects with DD and showing a lack of print sensitivity in the VWFA (Wimmer et al., 2010), in addition to a general hemodynamic hypoactivation of the VWFA (van der Mark et al., 2009; Wimmer et al., 2010), which is thought to generate the N170 (Allison et al., 1994; Tarkiainen et al., 1999; Salmelin et al., 2000). Overall, reduced VWFA activity has repeatedly been reported for tasks requiring visual processing of words in subjects with DD (e.g., Démonet et al., 2004; Shaywitz and Shaywitz, 2008; Richlan et al., 2009).

Whereas the N170 was distributed equally across hemispheres, the degree of print sensitivity was more pronounced over the left hemisphere in both groups, as indicated by greater mean areas under the curve over the left hemisphere compared to the right hemisphere (see **Figure 4A**). The greater left hemispheric activation of the difference waves is probably due to slightly higher activations for FF in the right hemisphere (see **Table 3**; amplitude means are about 1 μV higher in the right hemisphere), which is in line with previous studies showing a tendency toward right hemispheric processing for non-orthographic material (e.g., Bentin et al., 1999; Maurer et al., 2008). To compute the difference waves, the activation to FF was subtracted from that to the orthographic material, which resulted in a greater difference between orthographic material (W; PH; PW) and FF in the left hemisphere than in the right hemisphere.

Thus far, fMRI studies examining the PLD task have reported an orthographic familiarity effect. Orthographic familiarity refers to a higher activation for unfamiliar letter strings (PH and PW) compared to familiar letter strings (W) in the VWFA in normally developing subjects. This effect was absent in subjects with DD (van der Mark et al., 2009; Wimmer et al., 2010). Furthermore, some electrophysiological studies also reported an orthographic familiarity effect for the N170, i.e., lower N170 amplitudes for words with higher orthographic familiarity (Compton et al., 1991; McCandliss et al., 1997; Sereno et al., 1998; Hauk and Pulvermüller, 2004). These findings suggest that at this point in time the orthographic lexicon is accessed, at least in adult readers. However, other studies did not replicate amplitude differences between words, pseudowords or consonant strings in children (Maurer et al., 2005b; Kast et al., 2010) and adults (Nobre et al., 1994; Salmelin et al., 1996; Bentin et al., 1999; Cornelissen et al., 2003). The orthographic familiarity effect also seems to depend on task demands, as shown by Bentin et al. (1999), who found differences between consonant strings and words during explicit lexical and semantic tasks but not during implicit reading. Although the children in the present study had to explicitly read the word in order to solve the task, they did not show an orthographic familiarity effect. N170 mean peak amplitudes were comparably high for W, PH and PW (see **Table 3** and **Figure 3**). Furthermore, the mean area under the curve for the difference waves measuring the degree of orthographic familiarity was negligible not only for children with DD but also for control children (see **Figure 4B**).

There are two possible explanations for the lack of orthographic familiarity in control children in the present study. Firstly, Barber and Kutas (2007) proposed that the sensitivity to orthographic familiarity might be dependent on the stimulus material included in the experiment. Studies including only orthographic material, which varied in orthographic familiarity, found an orthographic familiarity effect in the N170 time window, whereas studies additionally investigating non-orthographic material reported an influence of orthographic familiarity in a later time window, namely the N400. Barber and Kutas (2007) suggested that by presenting only orthographic material the human brain might prepare to process the presented stimuli as orthographic, thus accelerating reading processes. Accordingly, the lack of orthographic familiarity in the N170 in the present study might be explained by the investigation of both orthographic material (W, PH, PW) and FF. However, as we investigated 8-year-old children, it might be more likely that the lack of orthographic familiarity could be ascribed to the lower level of reading experience. This assumption is supported by a study of Kast and colleagues (2010), who explored a visual lexical decision task in 10-year-old children. Children were presented with words and pseudowords and had to decide whether the presented stimulus was a word or not. Although Kast et al. (2010) only investigated orthographic material they did not find an orthographic familiarity effect in the N170 concluding that this might be the result of lower reading experience in children and a less established reading system (Kast et al., 2010).

To summarize, the control children's brain differentiates orthographic (W, PH, PW) from non-orthographic control stimuli (FF) at about 220 ms. However, there was no effect of orthographic familiarity in this early time window, suggesting that reading processes at this point in time might be comparable for orthographically familiar and unfamiliar word forms in young children and that the orthographic lexicon has not yet been accessed. In children with DD, the degree of print sensitivity was reduced, which points to deficits at this early stage of reading processes at this age.

#### *N400 indexes comparable reading processes for W, PH and PW and points to deficits in DD*

According to hierarchical reading models, the next processing step comprises access to the orthographic lexicon in the case of familiar word forms (W) and the application of grapheme-phoneme correspondence rules in the case of unfamiliar word forms (PH; PW), in order to access phonology in a last step of the reading process. Dual route models of reading (Coltheart et al., 1993, 2001) suggest that the search for an orthographic representation in the orthographic lexicon and the application of grapheme-phoneme correspondence rules occur in parallel. In adults, N400 amplitudes have been found to be smaller for orthographically familiar word forms than for unfamiliar word forms (e.g., Braun et al., 2006; Briesemeister et al., 2009). These results suggest that less effort was needed to find a fitting orthographic representation for familiar words in the orthographic lexicon, whereas the search was prolonged and grapheme-phoneme correspondence rules had to be applied in the case of unfamiliar word forms, resulting in enhanced N400 amplitudes.

In line with previous studies, the N400 was distributed over centro-parietal electrodes in the present study (Deacon et al., 2004; for review see Lau et al., 2008; Kutas and Federmeier, 2011). In contrast to the N400 orthographic familiarity effect reported in adults, N400 mean peak amplitudes were comparably high for W, PH and PW in the present study. Our findings are in accordance with results of previous studies investigating children (e.g., Coch et al., 2002; Coch and Holcomb, 2003). For example, in the study of Coch and Holcomb (2003), 7-year-old children were required to read word lists consisting of stimuli which varied with respect to orthographic familiarity (i.e., real words with differing degrees of difficulty for 7-year-old children and nonpronounceable letter strings) and had to respond via button press whenever an animal name was presented. The authors did not report a modulation of the N400 by orthographic familiarity (Coch and Holcomb, 2003). These results, together with the findings of the present study, suggest that children rely on comparable reading processes for all letter strings independent of orthographic familiarity. Furthermore, as we did not find an effect of phonological familiarity in the time window of the N400, i.e., amplitude differences between phonologically familiar (W; PH) and unfamiliar word material (PW), the present study contradicts the assumption that the N400 might reflect post-lexical processing, at least in young children (Kutas and Hillyard, 1980; Brandeis et al., 1994; Schulz et al., 2008; for review see Lau et al., 2008; Kutas and Federmeier, 2011).

In children with DD the N400 was nearly absent in the present study (see **Table 4** and **Figure 5**). Reduced N400 activation in subjects with DD has been reported previously (Ackerman et al., 1994; Brandeis et al., 1994; Johannes et al., 1995; Rüsseler et al., 2007; Schulz et al., 2008). The assumption that the N400 might index the search for an orthographic representation in the orthographic lexicon and the application of grapheme-phoneme correspondence rules is further strengthened by the partial correlation results. Better spelling performance was correlated with higher N400 mean peak amplitudes irrespective of a DD diagnosis. A prerequisite for correct spelling is knowledge of both grapheme-phoneme correspondence rules and orthographic rules (Klicpera et al., 2007). Thus, the correlation between correct spelling and N400 mean peak amplitudes suggests that children at this point in time might be engaged in applying grapheme-phoneme correspondence rules or in searching for an orthographic representation in the orthographic lexicon. Diminished N400 amplitudes in children with DD point to deficits in these processes. This conclusion is in line with both the phonological (Snowling, 2001; Ramus et al., 2003; Vellutino et al., 2004) and the orthographic core deficit (e.g., Bergmann and Wimmer, 2008; Bekebrede et al., 2009; van der Mark et al., 2009) reported for children with DD in transparent orthographies. According to the phonological deficit hypothesis, subjects with DD have difficulties in manipulating and applying grapheme-phoneme correspondence rules. According to the orthographic core deficit, access to available orthographic representations is impaired or delayed, or orthographic representations are poorer and less specified (Bergmann and Wimmer, 2008; Bekebrede et al., 2009; van der Mark et al., 2009; Marinus and de Jong, 2010).

To summarize, control children's N400 mean peak amplitudes suggest that children at the age of eight years rely on comparable reading processes for W, PH and PW, as there was no effect of orthographic familiarity in the N400 time window. With respect to children with DD, N400 amplitudes were significantly reduced indicating less specified orthographic representations or impairments in accessing the orthographic lexicon and in applying grapheme-phoneme correspondence rules.

#### *LPC indexes phonological lexical access in control children and is degraded in DD*

According to hierarchical reading models the last processing step includes the access to the phonological lexicon. In the present study the phonological lexicon was accessed between 600 and 900 ms after stimulus onset in control children as indicated by a phonological familiarity effect for the LPC. That is, the mean area under the curve of the LPC did not differ between W and PH, which share the same phonological representation, but was significantly reduced for PW, which do not have an entry in the phonological lexicon. Interestingly, a small correlation between the LPC mean area under the curve for PH and spelling was found, indicating that independent of group better spelling is correlated to smaller activation for PH. The correlation suggests that orthography has an influence even in this late time window. When inspecting the grand average of the LPC (see **Figure 6**) the activation for PH sharing the same phonological representations with W, but violating the orthographic representation seems to lie between the activation for W and PW, although this does not reach significance. It is possible that children at this stage of reading acquisition are not aware of all orthographic violations posed by PH and might accept PH to be orthographically correct. Thus, it might be speculated that children with more reading and spelling experience might show a decreasing activation pattern from W over PH to PW. In line with previous studies the LPC was distributed over left centro-parietal electrodes (Friedman and Johnson, 2000; Hepworth et al., 2001; Finnigan et al., 2002; Rüsseler et al., 2003; Schulte-Körne et al., 2004; Yonelinas et al., 2005; van Strien et al., 2009; Balass et al., 2010; for review see: Rugg and Curran, 2007). 
The allocation of the LPC to the left hemisphere is not surprising, as left hemispheric activation has been repeatedly reported for tasks requiring phonological processing (e.g., Price et al., 1997; Rumsey et al., 1997; Shaywitz et al., 2002; Shaywitz and Shaywitz, 2008).

In children with DD the LPC did not differentiate between phonologically familiar and phonologically unfamiliar word forms (see **Figure 6**). Because previous studies investigated word recognition tasks, a direct comparison with our results is not possible. Nevertheless, deficient activation of the LPC has also been reported in adolescents (Schulte-Körne et al., 2004) and adults (Rüsseler et al., 2003) with DD in word recognition tasks. For example, in the experiment by Schulte-Körne et al. (2004) participants with and without DD were required to learn pseudoword lists in a first phase and had to indicate in a second phase whether the presented pseudoword was a learned pseudoword or not. The LPC was found to be higher to learned compared to new pseudowords in control children only. This was interpreted as reflecting a specific word recognition memory deficit in DD. In the present study, however, we did not examine word recognition, and the phonological familiarity effect on the LPC in control children was interpreted as indicating access to the phonological lexicon. Therefore, the absence of a modulation of the LPC by phonological familiarity might indicate impaired access to phonological representations or an underspecification of phonological representations (Ramus and Szenkovits, 2008).

To summarize, control children's LPC suggests that at this point in time the phonological lexicon is accessed. With respect to children with DD, the lack of a phonological familiarity effect on the LPC indicates impaired access to phonological representations or an underspecification of phonological representations.

#### **BEHAVIORAL DATA MIRRORS THE CORE DEFICITS OF YOUNG CHILDREN WITH DD**

Overall our results on the behavioral level mirror the main characteristics of DD in transparent orthographies, namely a rather high reading accuracy, which is accompanied by severe deficits in reading speed. Children with DD in the present study displayed rather high accuracy rates (between 70 and 85%), however they were substantially delayed in their reaction times for all lexical conditions compared to control children. There is evidence that the reading speed deficit observed in subjects with DD in transparent orthographies can be traced back to a persistent reliance on the non-lexical route (e.g. De Luca et al., 2002; Zoccolotti et al., 1999, 2005). However, it has been shown that the reading speed deficit in DD can be ascribed to both non-lexical and lexical route reading (Bergmann and Wimmer, 2008). Children (Moll and Landerl, 2009) and adolescents (Bergmann and Wimmer, 2008) with DD do indeed engage in visual whole word processing and read via the lexical route for orthographically known words, but their reading speed is impaired. Thus, the prolonged reaction times in the present study and the response pattern, which was similar to control subjects (FF *<* W *<* PH *<* PW) suggest that subjects with DD might rely on comparable reading processes as control children at least for some items. Overall the behavioral results in the present study replicate findings of former studies (Bergmann and Wimmer, 2008; van der Mark et al., 2009, 2011; Wimmer et al., 2010). Compared to the children examined in the study of van der Mark et al. (2009, 2011) reaction times were longer in both control children and children with DD in the present study. This is probably due to age differences. Children in the study of van der Mark et al. (2009, 2011) were three years older and had more reading experience. Proportionally, however, the speed impairment of subjects with DD compared to control subjects remained stable across both studies. 
This is in line with longitudinal studies, showing that the gap between skilled and less skilled readers in reading performance still remains over time although both high and poor performers develop in word reading (e.g., Klicpera et al., 1993; Shaywitz et al., 1999).

# **LIMITATIONS OF THE STUDY**

One limitation of the present study is that the behavioral data do not match the ERP data. Whereas the reaction time results suggest that children might use orthographic representations for reading orthographically familiar word material (W) and might rely on grapheme-phoneme correspondence rules for orthographically unfamiliar word material (PH; PW), we were not able to detect different reading processes depending on orthographic familiarity in the ERP data. Neither the N170 nor the N400 showed an orthographic familiarity effect in the form of lower amplitudes for orthographically familiar compared to orthographically unfamiliar word forms. In contrast, fMRI studies did find support for engaging both routes in children, adolescents and adults (Kronbichler et al., 2007; Bruno et al., 2008; van der Mark et al., 2009; Wimmer et al., 2010).

We would like to offer three explanations for the discrepancy observed between our behavioral and ERP data. Although van der Mark et al. (2009) did report an orthographic familiarity effect in children, the children in the present study were three years younger and less experienced readers, suggesting that one possible explanation for the lack of an orthographic familiarity effect in the ERP data might be the younger age of the children investigated in the present study. It has been proposed that the orthographic familiarity effect is the result of reading experience (Reicher, 1969). It might be that the effect of orthographic familiarity is only partly developed in 8-year-old children, as it has been observed on the behavioral level but not in the ERP data. Another possible explanation might be that children differ with respect to their reading development, as indicated by the great variance of the ERP measures, even though they were very similar with respect to reading performance, IQ and age on the behavioral level, thus masking an effect of orthographic familiarity. Enlarging the sample size might have reduced the variance observed in the ERP data and may have led to an effect of orthographic familiarity. A third explanation for the absence of an orthographic familiarity effect in the ERP data might be that children rely on comparable reading processes for W, PH, and PW; however, after having accessed the phonological representation they need more time to decide whether the presented word exists or not. This is supported by the long reaction times, especially for PW. Whereas the LPC indicates that the phonological lexicon has been accessed between 600 and 900 ms after stimulus onset, most children responded to PW more than one second later, suggesting that they might have been unsure whether the presented stimulus was a real word or not.
Longitudinal studies are necessary in order to better understand the discrepancies between the behavioral and ERP findings and in order to clarify at which age and reading level an orthographic familiarity effect can be also observed in the ERP data.

# **CONCLUSION**

In the present study we attempted to provide a temporal model of reading processes in normally developing children by relating our ERP findings to single processing steps suggested by dual route models of reading, in order to clarify which processing steps are degraded in children with DD during reading. ERPs provide evidence for deficient processes from the very first processing stage until the last processing stage. To summarize, a reduced mean area under the curve for the word material-false font contrasts in the time window of the N170 suggested a reduced degree of print sensitivity. Furthermore, diminished N400 amplitudes pointed to less specified orthographic representations or to deficits in accessing the orthographic lexicon and in applying grapheme-phoneme correspondence rules. Lastly, the lack of a phonological familiarity effect on the LPC indicated impaired access to phonological representations or an underspecification of phonological representations. These deficits are in line with the orthographic and phonological core deficits reported for subjects with DD in transparent orthographies. The results of our study suggest that effective treatment should include both orthographic and phonological training. In general, more longitudinal studies and studies investigating adults with the same task and stimuli are needed to clarify how the observed processing steps and their time course change during reading development and how they differ from mature reading processes, which in turn has major implications for reading instruction in school and in therapeutic settings for children with DD.

#### **ACKNOWLEDGMENTS**

This research was supported by a grant from the Bundesministerium für Bildung und Forschung (Grant Number 01GJ1001). Special thanks to all of the children and their parents, who were so kind and willing to participate in this study and who continue to take part in many important studies.

# **REFERENCES**


correlates of anomalous phonological processing during spoken word recognition. *Cogn. Brain Res*. 21, 360–376. doi: 10.1016/j.cogbrainres.2004.06.010


*1; 5. revidierte Auflage)*. Göttingen: Hogrefe.


771–785. doi: 10.1016/S0028-3932(98)00133-X


*Psych*. 31, 235–242. doi: 10.1026//1616-3443.31.4.235


Available online at: http://www.ids-mannheim.de/kl/dokumente/freqMeasures.html


for semantics: (De)constructing the N400. *Nat. Rev. Neurosci*. 9, 920–933. doi: 10.1038/nrn2532


*Front. Hum. Neurosci*. 2:18. doi: 10.3389/neuro.09.018.2008


1, 73–86. doi: 10.1016/0926-6410(93)90013-U


memory for high- and low-frequency words in adult normal and dyslexic readers: An event-related brain potential study. *J. Clin. Exp. Neuropsychol*. 25, 815–829. doi: 10.1076/jcen.25.6.815.16469


*Neuroimage* 41, 153–168. doi: 10.1016/j.neuroimage.2008.02.012


of automatic word processing: Language lateralization of early ERP components in children, young adults and middle-aged subjects. *Biol. Psychol*. 80, 35–45. doi: 10.1016/j.biopsycho.2008.01.012


doi: 10.1046/j.0021-9630.2003.00305.x


doi: 10.1016/j.bandl.2004.10.010

Zoccolotti, P., De Luca, M., Di Pace, E., Judica, A., Orlandi, M., and Spinelli, D. (1999). Markers of developmental surface dyslexia in a language (Italian) with high grapheme-phoneme correspondence. *Appl. Psycholinguist*. 20, 191–216. doi: 10.1017/S0142716499002027

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 May 2013; accepted: 26 August 2013; published online: 07 October 2013.*

*Citation: Hasko S, Groth K, Bruder J, Bartling J and Schulte-Körne G (2013) The time course of reading processes in children with and without dyslexia: an ERP study. Front. Hum. Neurosci. 7:570. doi: 10.3389/fnhum.2013.00570*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Hasko, Groth, Bruder, Bartling and Schulte-Körne. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **APPENDIX**

Stimuli of the phonological lexical decision task; W, words; PH, pseudohomophones; PW, pseudowords; FF, false fonts and English translation.




# Paying attention to orthography: a visual evoked potential study

# *Anthony T. Herdman\* and Osamu Takai*

*BRANE Lab, School of Audiology and Speech Sciences, University of British Columbia, Vancouver, BC, Canada*

#### *Edited by:*

*Urs Maurer, University of Zurich, Switzerland*

#### *Reviewed by:*

*Alan C.-N. Wong, The Chinese University of Hong Kong, Hong Kong; Roberta Adorni, University of Milano-Bicocca, Italy*

#### *\*Correspondence:*

*Anthony T. Herdman, Faculty of Medicine, School of Audiology and Speech Sciences, University of British Columbia, 2177 Wesbrook Mall, Vancouver, BC V6T 1Z3, Canada. e-mail: aherdman@audiospeech.ubc.ca*

In adult readers, letters and words are rapidly identified within visual networks to allow for efficient reading abilities. Neuroimaging studies of orthography have mostly used words and letter strings that recruit many hierarchical levels in reading. Understanding how single letters are processed could provide further insight into orthographic processing. The present study investigated orthographic processing using single letters and pseudoletters when adults were encouraged to pay attention to or away from orthographic features. We measured evoked potentials (EPs) to single letters and pseudoletters from adults while they performed an orthographic-discrimination task (letters vs. pseudoletters), a color-discrimination task (red vs. blue), and a target-detection task (respond to #1 and #2). Larger and later peaking N1 responses (∼170 ms) and larger P2 responses (∼250 ms) occurred to pseudoletters as compared to letters. This reflected greater visual processing for pseudoletters. Dipole analyses localized this effect to bilateral fusiform and inferior temporal cortices. Moreover, this letter-pseudoletter difference was not modulated by task and thus indicates that directing attention to or away from orthographic features did not affect early visual processing of single letters or pseudoletters within extrastriate regions. Paying attention to orthography or color as compared to disregarding the stimuli (target-detection task) elicited selection negativities at about 175 ms, which were followed by a classical N2-P3 complex. This indicated that the tasks sufficiently drew participants' attention to and away from the stimuli. Together these findings revealed that visual processing of single letters and pseudoletters, in adults, appeared to be sensory-contingent and independent of paying attention to stimulus features (e.g., orthography or color).

#### **Keywords: orthography, visual evoked potential (VEP), attention, reading, dipole modeling**

# **INTRODUCTION**

Single-letter perception is a prerequisite to word perception and research is starting to unravel the mystery of how the brain processes such basic building blocks of literacy. Reaction times to letters are faster than to symbols or pseudoletters, indicating that somewhere along the visual processing stream familiar letters are processed faster (LaBerge, 1973; Herdman, 2011). This might be caused by increased neural activity to letters or faster responding neural ensembles. Evidence for increased neural activity comes from previous neuroimaging research that showed visual evoked responses between 140 and 190 ms were larger to letters as compared to symbols or pseudoletters (Miller and Wood, 1995; Eulitz et al., 1996; Tarkiainen et al., 1999; Pernet et al., 2003, 2005; Maurer et al., 2005, 2008; Wong et al., 2005; Appelbaum et al., 2009). A negative response recorded from left inferior temporal cortices, termed the N200, has also been shown to be larger for words than for faces or objects (Nobre et al., 1994). However, later responses between 200 and 400 ms were shown to be greater for pseudoletters than letters (Miller and Wood, 1995; Wong et al., 2005; Herdman, 2011). Such processing advantages for letters have been suggested to be a result of language-dominant networks within the left inferior temporal cortices used for word reading (Miller and Wood, 1995; Eulitz et al., 1996; Tarkiainen et al., 1999; McCandliss et al., 2003; Pernet et al., 2003, 2005; Cohen and Dehaene, 2004; Flowers et al., 2004; James et al., 2005; Maurer et al., 2005, 2008; Wong et al., 2005; Joseph et al., 2006). Conversely, a few other studies showed consistently early visual processing differences between letters and pseudoletters across bilateral visual cortices with a possible right-hemispheric dominance (Appelbaum et al., 2009; Herdman, 2011). This provides evidence that orthographic processing is recruiting more bilateral networks, as has been previously proposed (Tagamets et al., 2000). Correspondingly, an fMRI study contrasting false-font strings with words or word-like characters showed a greater signal change in the left inferior temporal regions to words than false-font strings but conversely greater signal change in the right hemisphere to false-font strings than words (Vinckier et al., 2007). The authors suggested that false-font strings might capture greater attention because they are unfamiliar objects and thus recruit more resources within extrastriate regions. This is in line with our previous proposal that pseudoletters elicit prolonged processing within the right extrastriate regions (Herdman, 2011). Furthermore, modulation of neural activity associated with orthographic processing is consistent with findings from Ruz and Nobre (2008) showing that attention to orthography modulated early N200 to words more so than attention to phonology or semantics. However, the attention-related modulation of ERP differences between words and false-font strings was not reported in that study and thus it is difficult to interpret how attention might modulate processing differences between letters and pseudoletters. The current study addressed this issue by manipulating attention toward or away from orthographic features of single letters and pseudoletters.

As compared to the neuroimaging literature on word processing (for reviews see Price, 2000; McCandliss et al., 2003; Price and Devlin, 2003; Cohen and Dehaene, 2004; Dehaene et al., 2005; Maurer et al., 2005, 2008; Grainger et al., 2008), the literature on single-letter processing is less well-developed (e.g., Miller and Wood, 1995; Tarkiainen et al., 1999; James et al., 2005; Wong et al., 2005; Grainger et al., 2008; Appelbaum et al., 2009; Herdman, 2011). Initial stages of reading acquisition are dependent on single-letter recognition (e.g., grapheme-to-phoneme encoding) and thus it is important to understand how the human brain processes individual letters. Interpretations of low-level orthographic processing have mainly been inferred from studies investigating orthography in tasks involving word and letter-string recognition (Grainger et al., 2008). These tasks likely prime neural networks associated with word recognition, such as the visual word form system, that could potentially recruit additional processes beyond low-level orthographic processes. For instance, participants are faster at identifying letters in words than when presented alone, commonly known as the word superiority effect (Reicher, 1969; McClelland and Rabinovitch, 1981). Thus, tasks that compare words to letter strings might be recruiting hierarchical processes beyond that of single-letter processing. Evidence for extra processing can be seen in ERP recordings to words or letter strings as compared to single letters in that character strings elicited broader N1 responses as compared to single characters (Wong et al., 2005). Measuring neural responses to single letters would provide further information about the underpinnings of low-level orthographic processing.

The inconsistent findings for orthographic-related processing within the literature might be due to differences in attention demands on stimulus features as driven by task set or stimulus familiarity (letters vs. pseudoletters). For instance, target-detection tasks that asked participants only to respond after a target (e.g., Appelbaum et al., 2009) might have minimally activated the networks responsible for orthographic processing as compared to tasks that asked participants to discriminate between letters and pseudoletters on a trial-by-trial basis (e.g., Herdman, 2011). Attention is likely less focused on the orthographic stimuli during target-detection tasks than orthographic-discrimination tasks. Reduced attention to a stimulus feature, such as color, is known to modulate early visual processing as evidenced by an early selection negativity (SN) between 140 and 180 ms when attending to stimulus color (Hillyard and Anllo-Vento, 1998). Whether such attention to stimulus features modulates early orthographic processing differences needs further research. Thus, we investigated the hypothesis that tasks encouraging participants to directly pay attention to orthographic features would enhance early orthographic processing differences between letters and pseudoletters (Herdman, 2011), as compared to tasks that did not encourage recruitment of orthographic networks, such as a color-discrimination task or a non-orthographic target-detection task. Alternatively, letters become highly consolidated and relevant for adults who have gained a large amount of experience with these familiar visual objects. Thus, early orthographic processing within the lower-visual centers might be automatic and not task dependent. If this alternative hypothesis is correct then there will be little, if any, change in the early orthographic processing differences between letters and pseudoletters due to directing attention to or away from orthographic features.
We used evidence from visual evoked potentials among three tasks (orthography discrimination, color discrimination, and target detection) to determine whether early visual processing of letters and pseudoletters is modulated by paying attention to orthographic features.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Fifteen right-handed participants (age 18–28 years; 8 female) volunteered for this study. Participants' handedness was determined with the Edinburgh Handedness Inventory (Oldfield, 1971). Datasets from four participants were excluded from this study because too few ERP trials (<40) remained after artifact rejection. All participants disclosed that they had no known sensory or cognitive impairments. Participants were screened for normal 20/20 visual acuity (with corrective lenses) and for color blindness. Informed consent was signed by all participants. This study was approved by the Research Ethics Board at Simon Fraser University, Canada. The experiment lasted for approximately 50 min, consisting of 15–20 min for electrode set-up and 30 min for ERP recording. Participants received a \$10 honorarium.

#### **STIMULI AND TASK**

Visual stimuli were upper-case, roman-alphabetic letters (A, B, D, E, G, H, J, N, P, R, T, U, and Y), pseudoletters (mixed line forms of the letters: A, B, D, E, G, H, J, N, P, R, T, U, and Y), and numbers (1 and 2) presented as red or blue characters on a gray background (**Figure 1**). Stimuli covered 60 × 60 pixels at the centre of a 19-inch VGA monitor with a resolution of 600 × 800 pixels situated approximately 70 cm in front of the participant's eyes. Stimuli were randomly presented for a duration of 500 ms in the central visual field. Stimuli were followed by a black fixation dot on the gray background shown for a random duration between 1500 and 2000 ms. Presentation software (NeuroBehavioral Systems Inc., Albany, CA) was synchronized to the VGA monitor's refresh rate in order to accurately synchronize the stimulus onset with the trigger pulse that was sent to the EEG recording computer.

Participants performed three tasks in separate, randomly assigned blocks. A participant was asked to press one of two buttons with his/her right hand to discriminate between letters and pseudoletters (Orthography task), to discriminate between red and blue stimuli (Color task), and to detect target numbers 1 and 2 (Target task). For the Orthography and Color tasks, 200 letters and 200 pseudoletters were randomly presented across three blocks of 133, 133, and 134 trials with each block lasting about 5 min. Participants were given approximately 30 s of rest between blocks. For the Target task, 200 letters, 200 pseudoletters, and 50 targets (25 number "1" and 25 number "2") were randomly presented across three blocks of 150 trials with each block lasting about 5 min. Participants were given approximately 30 s of rest between blocks. For the Target task, participants were asked to detect when a number 1 or 2 appeared on the screen by pressing only one button and to ignore the other stimuli (i.e., letters and pseudoletters). Participants were asked to press buttons as accurately and as fast as possible. This allowed us to collect behavioral response accuracy and reaction times to stimuli when button presses were required.

#### **DATA ACQUISITION**

EEG was collected using a 136-channel BIOSEMI system (BIOSEMI, www.biosemi.com). Scalp electrodes (128 channels) were situated within a cap in a modified 10–5 configuration with two additional mastoid electrodes (M1 and M2), two inferior occipital electrodes (SI3 and SI4), and four electrooculogram electrodes (SO1, IO1, LO1, and LO2). EEG was amplified and sampled at a rate of 1024 Hz with a band-pass filter of 0.16–256 Hz. For online collection, the 136 electrodes were referenced to a common electrode placed between CPz and CP2. For offline analyses, the 132 scalp electrodes (excluding electrooculogram channels) were re-referenced to their average.
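
The offline re-referencing step can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the function name `rereference_to_average`, the dictionary layout, and the toy voltages are assumptions made for the example. Only the idea of subtracting the mean of the non-electrooculogram channels at each sample comes from the text.

```python
def rereference_to_average(sample, exclude=()):
    """Re-reference one time sample to the mean of the retained channels.

    sample  : dict mapping channel name -> voltage in microvolts
    exclude : channel names left out of the average (e.g. EOG channels)
    """
    kept = {ch: v for ch, v in sample.items() if ch not in exclude}
    mean = sum(kept.values()) / len(kept)
    return {ch: v - mean for ch, v in kept.items()}


# Toy sample: three scalp channels and one EOG channel (values are made up).
EOG_CHANNELS = {"SO1", "IO1", "LO1", "LO2"}
sample = {"Pz": 4.0, "P7": -2.0, "P8": 1.0, "LO1": 50.0}
reref = rereference_to_average(sample, exclude=EOG_CHANNELS)
```

After re-referencing, the retained channels sum to zero at every sample, which is the defining property of an average reference.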

#### **DATA ANALYSES**

#### *Behavioral*

Behavioral accuracy and reaction times were determined from the participants' button presses for each task. Trials with correct button presses within the post-stimulus interval of 100–1500 ms were used to calculate accuracy and reaction times. Correct responses (hits) were correct button presses to the corresponding stimulus type (letters and pseudoletters) for the Orthography task, correct button presses to stimulus color (red and blue) for the Color task, and correct button presses to numbers (1 or 2) for the Target task. False alarms were defined as incorrect button responses, and misses were defined as the absence of a button response when participants should have pressed a button. We performed one-way analyses of variance (ANOVAs) on accuracy (hits, false alarms, and misses) and reaction times among stimulus types (letter, pseudoletter, red, blue, target). Tukey-Kramer *post-hoc* tests were performed on significant ANOVA effects. Statistical results were considered significant at *p* < 0.05.
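
The scoring rules above could be expressed roughly as follows. This is a hedged sketch: the names `score_trial` and `summarize`, the tuple layout of a trial, and the decision to count a press outside the 100–1500 ms window as a miss are assumptions for illustration, since the text does not say how out-of-window presses were handled.

```python
def score_trial(expected, pressed, rt_ms, window=(100, 1500)):
    """Classify one trial as 'hit', 'false_alarm', or 'miss'.

    expected : the button that should have been pressed
    pressed  : the button actually pressed, or None if there was no press
    rt_ms    : reaction time in ms after stimulus onset (None if no press)
    """
    if pressed is None or rt_ms is None:
        return "miss"
    if not (window[0] <= rt_ms <= window[1]):
        return "miss"  # assumption: out-of-window presses count as no response
    return "hit" if pressed == expected else "false_alarm"


def summarize(trials):
    """Return outcome rates and the mean reaction time of hits."""
    outcomes = [score_trial(*t) for t in trials]
    hit_rts = [t[2] for t, o in zip(trials, outcomes) if o == "hit"]
    n = len(trials)
    return {
        "hit_rate": outcomes.count("hit") / n,
        "false_alarm_rate": outcomes.count("false_alarm") / n,
        "miss_rate": outcomes.count("miss") / n,
        "mean_hit_rt_ms": sum(hit_rts) / len(hit_rts) if hit_rts else None,
    }


# Made-up Orthography-task trials: (expected, pressed, rt_ms)
trials = [
    ("letter", "letter", 520),
    ("pseudoletter", "letter", 610),
    ("letter", None, None),
    ("pseudoletter", "pseudoletter", 580),
]
summary = summarize(trials)
```

Per-participant summaries of this kind would then be the input to the one-way ANOVAs.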

#### *Event-related potentials (ERPs)*

ERPs were time locked to each stimulus onset and epoched to yield trials of −500 to 1500 ms. Trials with ERPs exceeding ±100 µV between −350 and 850 ms were rejected from further analyses. We subsequently performed a principal component artifact reduction procedure with a principal component threshold of ±100 µV between −500 and 1500 ms in order to reduce the rising and falling edges of artifacts that might remain within the −350 to 850 ms window (Picton et al., 2000). This ensured that the artifacts did not contaminate the prestimulus interval during baseline correction between −200 and 0 ms. The mean, standard deviation, and range (in parentheses) of artifact-free trials for each Task-Stimulus type are as follows: Orthography-Letters = 125 ± 36 (42–172); Orthography-Pseudoletters = 125 ± 35 (44–159); Color-Letters = 117 ± 41 (45–158); Color-Pseudoletters = 130 ± 49 (43–182); Target-Letters = 122 ± 26 (42–153); Target-Pseudoletters = 125 ± 17 (87–145); and Target-Targets = 47 ± 13 (20–69). Artifact-free trials were averaged across trials and filtered using a 30-Hz low-pass filter to obtain evoked potentials (EPs) for each stimulus type (letters and pseudoletters) within each task condition (Orthography, Color, and Target). For the purpose of this study, we only investigated the EPs to letters and pseudoletters among tasks. Target stimuli (numbers 1 and 2) were excluded from our analyses and results. We also calculated the global field power (GFP) as the root-mean-squared values of the EPs averaged across the scalp electrodes (excluding the electrooculogram electrodes) for each sample.
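
Three of the steps described above (amplitude-based trial rejection, baseline correction, and global field power) are simple enough to sketch directly. The code below is an illustrative assumption rather than the software actually used: the function names, the list-of-lists data layout, and the half-open baseline interval are choices made for the example, and the principal component artifact reduction and 30-Hz low-pass filter are omitted.

```python
import math


def has_artifact(epoch, times_ms, threshold_uv=100.0, window=(-350, 850)):
    """True if any channel exceeds +/- threshold inside the window.

    epoch    : list of channels, each a list of samples in microvolts
    times_ms : sample times in ms relative to stimulus onset
    """
    return any(
        abs(v) > threshold_uv
        for channel in epoch
        for v, t in zip(channel, times_ms)
        if window[0] <= t <= window[1]
    )


def baseline_correct(channel, times_ms, baseline=(-200, 0)):
    """Subtract the mean of the prestimulus baseline from one channel."""
    base = [v for v, t in zip(channel, times_ms) if baseline[0] <= t < baseline[1]]
    mean = sum(base) / len(base)
    return [v - mean for v in channel]


def global_field_power(evoked):
    """Root mean square across channels at each sample."""
    n_ch = len(evoked)
    return [
        math.sqrt(sum(ch[i] ** 2 for ch in evoked) / n_ch)
        for i in range(len(evoked[0]))
    ]


# Toy data (made-up values, coarse 100 ms sampling for readability).
times = [-400, -200, -100, 0, 100]
clean = [[5.0, 1.0, 3.0, 5.0, 7.0]]
noisy = [[5.0, 1.0, 150.0, 5.0, 7.0]]      # 150 uV at -100 ms: rejected
outside = [[150.0, 1.0, 3.0, 5.0, 7.0]]    # 150 uV at -400 ms: outside window

corrected = baseline_correct(clean[0], times)
gfp = global_field_power([[3.0, 1.0], [-3.0, -1.0]])
```

Because GFP is computed across electrodes at each sample, it gives one reference-independent waveform per condition that can be tested in the same way as single-electrode EPs.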

We performed two-way ANOVAs on the EP and GFP waveforms averaged over 25 ms intervals spanning from −100 to 600 ms across Tasks (Orthography, Color, and Target) and Stimulus type (letter and pseudoletter). Main effects and interactions were considered significant at *p* < 0.05. Tukey-Kramer *post-hoc* tests were performed on significant ANOVA main effects of Task. *Post-hoc* results were considered significant at *p* < 0.05. We also evaluated ANOVA and *post-hoc* results at significance levels of *p* < 0.01 and *p* < 0.001.
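
The windowing that precedes these ANOVAs amounts to 28 consecutive 25 ms bins between −100 and 600 ms. A minimal sketch follows, assuming half-open bins and a hypothetical helper name `window_means`; the toy waveform is sampled every 5 ms for readability, whereas the study sampled at 1024 Hz.

```python
def window_means(waveform, times_ms, start=-100, stop=600, width=25):
    """Mean amplitude in consecutive [t, t + width) windows.

    Assumes every window contains at least one sample.
    """
    means = []
    for lo in range(start, stop, width):
        vals = [v for v, t in zip(waveform, times_ms) if lo <= t < lo + width]
        means.append(sum(vals) / len(vals))
    return means


# Toy waveform whose amplitude equals its latency, so the means are easy to check.
times = list(range(-100, 600, 5))
wave = [float(t) for t in times]
means = window_means(wave, times)
```

Each participant then contributes one value per window, task, and stimulus type, and the two-way ANOVA is run separately for each window.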

In addition to statistical testing across samples, we performed statistical analyses on the P1, N1, and P2 peak amplitudes and latencies at electrodes PO9h, PO10h, P7, and P8. These electrode sites were chosen because they had significant Stimulus effects from the Two-Way ANOVAs described above. An experienced rater manually identified peak responses with a maximum between 50–100 ms as P1, a first minimum between 50–250 ms as N1, and a maximum between 150–300 ms as P2 for electrodes PO9h, PO10h, P7, and P8. In addition, P3 peaks were identified in electrode Pz as a maximum between 200 and 600 ms. Three-Way ANOVAs were performed for peak amplitudes and latencies for the P1, N1, P2, and P3 peaks across stimulus type (letter and pseudoletter), tasks (Orthography, Color, and Target) and hemisphere (left hemisphere = averaged PO9h and P7; right hemisphere = averaged PO10h and P8).
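
Peaks in the study were identified manually by an experienced rater. Purely as an illustration of the latency windows given above, an automated stand-in might look like the sketch below. The function `find_peak` is hypothetical, the waveform values are made up, and taking the global minimum in the N1 window is a simplification of the "first minimum" criterion.

```python
def find_peak(waveform, times_ms, window, polarity):
    """Return (latency_ms, amplitude) of the extreme value inside window.

    polarity : 'max' for positive peaks (P1, P2, P3), 'min' for negative (N1)
    """
    candidates = [
        (v, t) for v, t in zip(waveform, times_ms) if window[0] <= t <= window[1]
    ]
    pick = max if polarity == "max" else min
    amplitude, latency = pick(candidates)
    return latency, amplitude


# Made-up waveform sampled every 25 ms at one posterior electrode.
times = [0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300]
wave = [0.0, 0.5, 1.0, 2.0, 1.0, -1.0, -3.0, -4.0, -2.0, 1.0, 3.0, 2.0, 1.0]

p1 = find_peak(wave, times, (50, 100), "max")
n1 = find_peak(wave, times, (50, 250), "min")
p2 = find_peak(wave, times, (150, 300), "max")
```

For the hemisphere factor, the peak measures from PO9h and P7 (left) and from PO10h and P8 (right) would be averaged before entering the three-way ANOVA.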

#### *Dipole modeling*

Dipole modeling using BESA software (BESA GmbH; www.besa.de) was performed *post-hoc* on EP difference waveforms for significant main effects of Task (Orthography, Color, Target) and Stimulus (letter vs. pseudoletter). This was done to determine the possible source locations of processing differences between Tasks and Stimulus types. For the Task-effects model, a pair of symmetrically constrained dipoles was fitted to significant differences that occurred between 175 and 200 ms for the Orthography vs. Target and Color vs. Target contrasts (i.e., a selection negativity component). A third dipole was fitted to the significant differences between 225 and 250 ms for the Color vs. Target contrast (i.e., an N2 component). A fourth dipole was fitted to the significant differences between 300 and 500 ms for the Orthography vs. Target and Color vs. Target contrasts (i.e., a P3 component). Residual variances for the source modeling of the difference waves were less than 10% for all intervals. Talairach locations for these dipoles were *x* = ±45.5, *y* = −56.0, *z* = −17.2 mm (left/right fusiform gyri); *x* = 4.1, *y* = 2.9, *z* = 49.9 mm (medial frontal gyrus); and *x* = −3.6, *y* = −61.0, *z* = 5.3 mm (lingual gyrus). For the Stimulus-effects model, two pairs of symmetrically constrained dipoles were used to model the significant differences occurring between 150–200 ms (around the N1 peak) and between 225–300 ms (around the P2 peak). Residual variances for the source modeling of the difference waves (letter minus pseudoletter) were less than 10% for both intervals. Talairach locations for these dipoles were *x* = ±42.6, *y* = −72.4, and *z* = −14.4 mm (left/right fusiform gyri); and *x* = ±41.4, *y* = −62.1, and *z* = −0.6 mm (left/right inferior temporal gyri).
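For readers unfamiliar with the goodness-of-fit criterion, the sketch below shows one common definition of residual variance: the percentage of the measured signal energy that the source model leaves unexplained. This is an illustrative assumption, not a description of BESA's internal computation, and the function names are hypothetical. The second function illustrates the symmetry constraint by mirroring a dipole location across the midline.

```python
def residual_variance_percent(measured, modeled):
    """Percentage of measured signal energy not explained by the model.

    measured, modeled: lists of channels, each a list of samples over the
    fitting interval. Uses sum((d - m)^2) / sum(d^2) * 100.
    """
    unexplained = 0.0
    total = 0.0
    for ch_measured, ch_modeled in zip(measured, modeled):
        for d, m in zip(ch_measured, ch_modeled):
            unexplained += (d - m) ** 2
            total += d ** 2
    return 100.0 * unexplained / total

def mirror_dipole(location_mm):
    """Mirror a location across the midsagittal plane (x -> -x)."""
    x, y, z = location_mm
    return (-x, y, z)
```

Under this definition, a residual variance below 10% means that the fitted dipoles account for more than 90% of the energy in the difference waveform over the fitting interval.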

Similar to the statistical analyses used for the EP waveforms, we performed Two-Way ANOVAs on the dipole waveforms averaged over 25 ms intervals spanning from −100 to 600 ms across Tasks (Orthography, Color, and Target) and Stimulus type (letter and pseudoletter). This was done for both the dipole models of EP difference waveforms for the Task and Stimulus effects. Tukey-Kramer *post-hoc* tests were performed on the significant ANOVA main effects of Task. ANOVA and *post-hoc* test results were considered significant at *p* < 0.05. We also evaluated ANOVA and *post-hoc* results at significance levels of *p* < 0.01 and *p* < 0.001.

# **RESULTS**

#### **BEHAVIORAL RESPONSES**

Behavioral responses showed participants were highly accurate at discriminating among stimuli and detecting targets (see **Table 1**). However, ANOVA and Tukey-Kramer *post-hoc* testing revealed that participants were less accurate at pressing the correct button to red stimuli in the Color task than to any other stimuli across tasks (see **Table 1** for means; *F* = 7.2; *df* = 4, 50; *p* = 0.0001). This was a result of making more false alarms to red stimuli as compared to other stimuli (see **Table 1** for means; *F* = 14.1; *df* = 4, 50; *p* < 0.0001) and not misses (*F* = 0.56; *df* = 4, 50; *p* = 0.6897). ANOVA results for RTs did not support significant differences in RTs among stimulus types (letter, pseudoletter, red, blue, target) (see **Table 1** for means; *F* = 2.52; *df* = 4, 50; *p* = 0.0526). Although the ANOVA results for RTs were close to significance, this was driven by reaction times to targets being most delayed as compared to the other stimulus types (see **Table 1**).

#### **GFP AND EP WAVEFORMS**

GFP waveforms showed typical response patterns of P1, N1, P2, and P3 peaks to visual stimuli (**Figure 2**, top graph). Comparisons across Task (Orthography, Color, and Target) revealed that GFPs between 175 and 200 ms were significantly (*p* < 0.05) greater for the Color vs. Target task and close to being significantly greater (*p* = 0.089) for the Orthography vs. Target task. GFPs between 375 and 600 ms were significantly greater for the Orthography task as compared to the Color and Target tasks. GFPs between 450 and 525 ms were significantly greater for the Color task as compared to the Target task. For the Stimulus effects, GFPs between 150–200 ms, 225–275 ms, and 450–500 ms were significantly greater for pseudoletter than letter stimuli (**Figure 2**, middle graph). There were no significant interactions of Task by Stimulus on GFP (**Figure 2**, bottom graph).

EP waveforms showed typical P1-N1-P2 responses to the letter and pseudoletter stimuli (**Figures 3** and **4**). Because participants were asked to attend to and press buttons to letter and pseudoletter stimuli in the Orthography and Color tasks, additional attention-related EP responses (N2 and P3) occurred as compared to the Target task, in which participants disregarded the letter and pseudoletter stimuli. In addition, the Orthography and Color tasks evoked a significantly greater negative response between 175 and 200 ms (around the N1) as compared to the Target task at POz (**Figure 3**, top graph). Topographies of the differences among Tasks revealed that the greater negativity had a posterior scalp distribution for the Orthography vs. Target and Color vs. Target contrasts. This scalp distribution and timing are similar to those of a previously reported SN response (Hillyard and Anllo-Vento, 1998). At central electrode sites (e.g., FCCh1), EPs were significantly greater between 225 and 250 ms for the Color vs. Target task (**Figure 3**, middle graph). Scalp topography for this contrast revealed a central distribution of this negativity, stereotypical of an N2b component. Although the Orthography vs. Target contrast did not reach statistical significance at *p* < 0.05, the *p*-value for this contrast between 225 and 250 ms was 0.09 and its topography was strikingly similar to the Color vs. Target topography. Significant EP differences among Tasks were evident at Pz between 300 and 550 ms (**Figure 3**, bottom graph). Similar to the GFP results, EPs at Pz in this interval were greatest for the Orthography task, next for the Color task, and then for the Target task. The topographies between 425 and 450 ms among the Task contrasts showed typical P3 scalp distributions with peak responses occurring over parietal regions (**Figure 3**, bottom topographies).

Stimulus comparison results showed that pseudoletters evoked greater and later peaking N1 waves between 100 and 200 ms than did letters (**Figure 4**). The significant difference in the 100–125 ms interval appeared to result from a later N1 onset to pseudoletters than to letters. In addition to these differences in the N1 interval, P2 responses peaking around 250 ms were greater to pseudoletters than to letters over parietal sites (e.g., P6), with a right hemispheric dominance. Topographies revealed that the significant N1 and P2 differences were mainly recorded over the parieto-occipital scalp.

Contrary to our hypothesis that the N1 and P2 response differences between letters and pseudoletters would be reduced when attention was drawn away from categorizing stimuli, we found no statistical support for interactions of Task by Stimulus at electrode sites (PO10h, PO9h, and POz) that clearly showed significant main effects of Task or Stimulus (**Figure 5**). All tasks showed the same difference waves between letters and pseudoletters. Additionally, none of the other scalp recordings revealed significant interactions (data not shown). To further support these findings we calculated peak amplitudes and latencies for the P1, N1, P2, and P3 responses. These are shown in **Tables 2** and **3** and presented below with ANOVA results.

#### **Table 1 | Behavioral results.**

#### *P1 peak responses*

Peak P1 amplitudes averaged across tasks and stimulus types were significantly larger in the right hemisphere (averaged across P8 and PO10h electrodes; 3.61 ± 2.11 µV) than the left hemisphere (averaged across P7 and PO9h electrodes; 2.11 ± 1.83 µV) (*F* = 16.96; *df* = 1, 112; *p* < 0.0001). No other ANOVA effects or interactions for P1 amplitudes were found to be significant (*p* > 0.20). A significant ANOVA hemispheric effect for P1 latencies revealed that P1 peaked earlier in the right (96 ± 10 ms) than left hemisphere (100 ± 9 ms) (*F* = 4.59; *df* = 1, 112; *p* = 0.0343). No other ANOVA effects or interactions for P1 latencies were found to be significant (*p* > 0.17).

#### *N1 peak responses*

Peak N1 responses were significantly larger to pseudoletters (−6.66 ± 2.92 µV) than to letters (−5.47 ± 3.07 µV) (*F* = 5.213; *df* = 1, 112; *p* = 0.0243). ANOVA results also revealed a significant hemispheric effect whereby N1 amplitudes were larger in the left (−7.06 ± 2.83 µV) than right hemisphere (−5.08 ± 2.95 µV) (*F* = 14.475; *df* = 1, 112; *p* = 0.00023). No other ANOVA effects or interactions for N1 amplitudes were found to be significant (*p* > 0.15). N1 responses peaked significantly earlier to letters (150 ± 17 ms) than pseudoletters (165 ± 13 ms) (*F* = 29.419; *df* = 1, 112; *p* < 0.00001).

#### *P2 peak responses*

Peak P2 responses were significantly larger to pseudoletters (5.75 ± 3.64 µV) than to letters (4.21 ± 3.38 µV) (*F* = 5.801; *df* = 1, 112; *p* = 0.0177). No other ANOVA effects or interactions for P2 amplitudes were found to be significant (*p* > 0.2). Peak P2 latencies were not found to show any significant effects or interactions among task, stimulus type, and hemisphere (*p* > 0.06).

#### *P3 peak responses*

Peak P3 responses were significantly larger for the Orthography (6.41 ± 3.19 µV) and Color (6.06 ± 3.54 µV) tasks, each compared with the Target task (3.1 ± 2.66 µV) (*F* = 5.801; *df* = 1, 112; *p* = 0.0177). No other ANOVA effects or interactions for P3 amplitudes were found to be significant (*p* > 0.60). ANOVA and *post-hoc* testing revealed that P3 responses peaked significantly later for the Orthography task (394 ± 50 ms) than for either the Color (333 ± 38 ms) or Target (344 ± 34 ms) task (*F* = 12.447; *df* = 2, 56; *p* < 0.0001). No other ANOVA effects or interactions for P3 latencies were found to be significant (*p* > 0.87).

#### **DIPOLE WAVEFORMS**

Dipole-source waveforms showed significant effects in the Task-effects and Stimulus-effects models (**Figures 6**–**9**) similar to those seen in the EP waveforms (**Figures 3**–**5**). The Task-effects source model (**Figure 6**) had significantly larger N1 responses in the left fusiform gyrus (dipole 1L) for the Orthography and Color tasks as compared to the Target task. Although this effect was not significant (*p* > 0.15) in the right fusiform gyrus (dipole 1R), the waveforms showed the same larger N1 responses, as seen in the left fusiform gyrus, for the Orthography and Color tasks as compared to the Target task. The N2 effect was localized to the medial frontal gyrus (dipole 2), which showed significant N2 differences among all task contrasts. This source had a large and prolonged N2 response for the Orthography task, a smaller and narrower N2 for the Color task, and a minimally evident N2 for the Target task. Because of the prolonged nature of the N2 for the Orthography task, it was significantly larger than the N2 for the Color task. The P3 effect was localized to the midline of the lingual gyrus (dipole 3). This dipole had large responses for the Orthography and Color tasks and minimal responses for the Target task. Task contrasts revealed that the P3 response was significantly prolonged, extending out to about 500 ms, for the Orthography task as compared to the P3 response for the Color task, which peaked around 330 ms. Source waveforms for the differences between letters and pseudoletters for the Task-effects dipole model showed little, if any, disparity among tasks (**Figure 7**). Moreover, the statistical interaction of Task by Stimulus revealed no evidence that tasks modulated the response differences between letters and pseudoletters (**Figure 7**).

The Stimulus-effects dipole model localized the EP differences between letters and pseudoletters to bilateral fusiform gyri (**Figure 8**). Source waveforms showed that bilateral fusiform gyri generated significantly larger N1 responses (between 150 and 200 ms) to pseudoletters than to letters. This is consistent with the EP results shown in **Figure 4**. This model further revealed that the right inferior temporal region (dipole 2R) had significantly larger P2 responses (225–325 ms) to pseudoletters than to letters. This result is consistent with the Stimulus effect shown at the P6 electrode (see **Figure 4**). We found no statistical evidence to support significant stimulus type differences in P2 responses in the left hemispheric source (dipole 2L). In addition, dipole 2R had significantly larger responses to pseudoletters than to letters between 350 and 475 ms. Tests of the Task by Stimulus interaction again indicated that the difference waveforms (letters minus pseudoletters) differed little, if at all, among tasks. We found no statistical evidence (i.e., no interaction of Task by Stimulus) to support the hypothesis that task modulated the differences between letters and pseudoletters (**Figure 9**).

# **DISCUSSION**

A main finding from this study was that the early response differences between letters and pseudoletters occurring around 170 ms were not affected by task demands that encouraged attention to be directed toward (Orthography task) or away from (Color and Target tasks) orthographic stimulus features. This provides evidence that early orthographic processing of single letters is not largely influenced by selective attention to stimulus features, at least with respect to the task demands used within this study. In addition, attention did not affect the P2 differences seen in the right hemisphere. Thus, our results are in opposition to previous findings that showed attention to orthography of word stimuli enhanced early (N200) responses as compared to attention to phonology and semantics of words, which modulated later EP components (Ruz and Nobre, 2008). One explanation for our discrepant findings is that we used single-character stimuli, whereas Ruz and Nobre (2008) used words and character strings. Thus, stimulus complexity and lexical retrieval might recruit higher levels of visual processes that might be influenced by top-down attention. Another difference between studies is that we used a block design for task manipulation that could have resulted in participants paying attention to letters and pseudoletters to the same degree for all tasks. However, we attempted to control for such order effects by randomly assigning task-block order across participants. Moreover, participants' attention appeared to be successfully manipulated across tasks as expected because selection negativities (SN) and N2 responses were apparent for the Orthography and Color tasks but not for the Target task (see **Figures 3** and **6**). The selection negativities associated with paying attention to a stimulus feature (Orthography or Color) that occurred between 175 and 200 ms had a scalp topography and source locations similar to those shown previously (Hillyard and Anllo-Vento, 1998). In addition, the N2 following the SN had a typical topography of an attention-related N2b response, also referred to as the anterior N2 (Folstein and Van Petten, 2008). Further indication that this study's tasks modulated participants' attention was that P3 responses increased in amplitude with increasing task demands on directing attention to orthography and color (Orthography-task P3 > Color-task P3 > Target-task P3). In contrast to our study, Ruz and Nobre (2008) used a trial-to-trial cueing paradigm for drawing participants' attention to orthographic, phonologic, or semantic stimulus features. Thus, task procedures and sensory-to-motor mapping were required to be maintained throughout the block and could have recruited networks associated with perceptual and motor processes in which attention could modulate activity. Furthermore, attention effects in their study were only provided for the word stimuli and thus differences in orthographic processing between words and false-font strings are not available for comparison to the present study's results.

#### **Table 2 | Peak EP amplitudes.**

#### **Table 3 | Peak EP latencies.**

**FIGURE 7 | Grand-mean source waveforms for the Interaction of Task by Stimulus for the Task-effects model (inset) with bilateral dipoles in the fusiform gyri (dipoles 1L and 1R), medial frontal gyrus (dipole 2), and medial lingual gyrus (dipole 3).** Waveforms are plotted as the differences between letters and pseudoletters for each task (Orthography, Color, and Target). No statistical evidence of significant interactions was found in these source waveforms at *p* < 0.05, *p* < 0.01, and *p* < 0.001. Vertical axis scale for waveform plots is in nAmp.

Another main result from this study is that we further replicated the findings that the N1 peaked earlier to letters than pseudoletters and that P2 responses are greater to pseudoletters than letters (Appelbaum et al., 2009; Herdman, 2011). These findings add support to the notion that letters are processed faster and to a lesser degree than pseudoletters. This makes sense because adult participants had many years of consolidating visual templates for familiar letters as compared to unfamiliar pseudoletters; thus template matching for letter recognition should be fairly automatic and require minimal processing. This is in line with many models of reading (e.g., McClelland and Rumelhart, 1981; Price, 2000; Grainger et al., 2008). Contrary to our original hypothesis, task demands appeared not to affect either the early or later stages of letter and pseudoletter processing. Thus, these processes appear to be resistant to the attention demands we placed on the participants in this study, which suggests that letter-pseudoletter effects are most likely sensory-contingent processes, at least in adults.

Interestingly, the N1 responses and difference waveforms between letters and pseudoletters were largest in the left as compared to the right visual cortices. This is consistent with a left-lateralized language model for reading (Price et al., 2003; Cohen and Dehaene, 2004; Dehaene et al., 2005) and could be akin to the N200 effects (Nobre et al., 1994; Ruz and Nobre, 2008). However, this laterality is in opposition to a right-dominant effect showing greater processing for pseudoletters that we and others previously reported (Appelbaum et al., 2009; Herdman, 2011). The similarities in timing, topography, and source locations across studies for the N1 letter-pseudoletter effect indicate that these are likely analogous processing effects. However, at this point we cannot explain the discrepant findings among these studies. Task differences among studies are an unlikely explanation because the current experiment found no evidence for task effects for similar tasks and stimuli to those previously used in the literature. More research is thus warranted to determine the laterality of these early visual processing differences between letters and pseudoletters.

Possible explanations for the larger and later peaking N1 and the larger P2 to pseudoletters than letters are that extra processing of unfamiliar objects occurs in order to identify and categorize the unfamiliar pseudoletters (Appelbaum et al., 2009; Herdman, 2011) or that pseudoletters capture attention to a greater extent and thus modulate early visual processing (Vinckier et al., 2007; Ruz and Nobre, 2008). However, this latter possibility is less likely because we found no change in letter-pseudoletter processing differences among the tasks that manipulated attention to or away from orthographic stimulus features. It appears that the different levels of attention paid to stimulus features did not alter the broader N1 and larger P2 responses to pseudoletters. Thus, the results indicate that the greater responses to pseudoletters appear to be sensory-contingent and are not under the control of attentional focus. This further leads us to believe that the N1 and P2 enhancements are likely related to the initial processing stages that are molded by experience to become more rapid and efficient at identifying letters than pseudoletters. In this case, bigger or broader is not better. Bigger responses here reveal more processing of the stimulus attributes, which requires more energy and implies poorer efficiency. The finding that EPs to letters peak earlier and with reduced neural responses points toward consolidation of letter templates within neural ensembles to allow for rapid and accurate identification of these highly familiar letters. The finding that the behavioral reaction times are faster to letters than pseudoletters (LaBerge, 1973; Herdman, 2011; also in the present study but not significant) further supports a more efficient system for processing familiar letters than unfamiliar pseudoletters.

EPs can peak later because of destructive addition upon averaging. Two possible reasons for this destructive addition are that there is greater variability in the timing by which neural populations are synchronously evoked by stimuli (i.e., less overlapping components of the N1) or that there is greater trial-to-trial latency jitter of the EP. Both would also reduce the EP amplitudes. However, we found that the N1 was larger, as well as later, to pseudoletters than to letters. Thus, a more likely alternative explanation for this later and larger N1 is a greater recruitment of neural ensembles. Because pseudoletters are less familiar and participants had very limited time to create well-formed templates for them within the visual networks, the brain likely attempts to first match the pseudoletters to letter templates. This could take a few template-matching iterations within the network and thus cause greater neural discharges over time as compared to the more automatic template matching that would occur for letters. Such a notion fits with many reading models describing the early stages of orthographic processing (e.g., Dehaene et al., 2005; Grainger et al., 2008).
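The argument that latency jitter alone would produce a later but smaller averaged peak can be illustrated with a small simulation. This is a toy sketch, not an analysis of the study's data: the Gaussian component shape, its 20 ms width, the −6 µV amplitude, and the assumption that jitter only delays single-trial responses (by 0 to 60 ms) are all hypothetical choices.

```python
import math

def component(t, latency, amplitude=-6.0, width=20.0):
    """Single-trial N1-like component modeled as a negative Gaussian (µV)."""
    return amplitude * math.exp(-((t - latency) ** 2) / (2.0 * width ** 2))

def average_ep(latencies, times_ms):
    """Average single-trial components that differ only in peak latency."""
    n_trials = len(latencies)
    return [
        sum(component(t, latency) for latency in latencies) / n_trials
        for t in times_ms
    ]

times = list(range(0, 401, 2))

# 21 trials with identical latency versus 21 trials delayed by 0 to 60 ms.
no_jitter = average_ep([160] * 21, times)
jittered = average_ep([160 + delay for delay in range(0, 61, 3)], times)

# Latency and amplitude of the most negative point in each average.
peak_no_jitter = (times[no_jitter.index(min(no_jitter))], min(no_jitter))
peak_jittered = (times[jittered.index(min(jittered))], min(jittered))
```

In this simulation the jittered average peaks later than the unjittered one but with a smaller magnitude, which is the opposite of the larger N1 observed for pseudoletters and is why jitter alone is an unlikely account of the data.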

Our behavioral results were largely unremarkable. They showed that participants were fairly engaged in performing all tasks (>90% accuracy). Interestingly, though, we did not find statistical evidence for faster reaction times to letters than pseudoletters as previously reported; however, the difference was in the expected direction, about 8 ms faster to letters than pseudoletters (LaBerge, 1973; Herdman, 2011). This might have been due to limited statistical power resulting from the small number of participants. We did, however, find an unexpected result in that participants made more false alarms to red than blue stimuli. This could be a result of an ecological effect in that red stimuli are commonly associated with the concept of "stop" and possibly this association interacted with participants' ability to discriminate and press the buttons (Elliot et al., 2007). Reaction times were similar between red and blue stimuli; thus, motor-response inhibition is unlikely. In hindsight, we should have used color stimuli that are not commonly associated with motor commands. We did not include false-alarm trials within the EP analyses, so this unexpected result likely had little or no effect on our EP differences between letters and pseudoletters.

In conclusion, the present study's results provided further evidence that single letters are processed faster and with less neural activity than pseudoletters. Tasks encouraging participants to direct attention toward and away from orthographic stimulus features did not change the early (N1 at ∼170 ms) and late (P2 at ∼250 ms) processing differences between letters and pseudoletters. Thus, visual processing of single orthographic or non-orthographic characters appeared to be sensory-contingent and independent of top-down control of directing attention toward or away from orthographic stimulus features.

#### **ACKNOWLEDGMENTS**

Support for this research was provided through a Michael Smith Foundation for Health Research Scholar award and a Natural Sciences and Engineering Research Council of Canada Discovery grant awarded to Anthony T. Herdman. Doctoral Scholarships from the Natural Sciences and Engineering Research Council of Canada, the Izaak Walton Killam Memorial Fund for Advanced Studies, and the UBC 4-Year Fellowship supported Osamu Takai during his contributions to this research. We thank Dr. John MacDonald for use of his lab and equipment and Mr. John Gaspar for his assistance in collecting the data.

#### **REFERENCES**


of basic findings. *Psychol. Rev.* 88, 375–407.


Cohen, L. (2007). Hierarchical coding of letter strings in the ventral stream: dissecting the inner organization of the visual word-form system. *Neuron* 55, 143–156.

Wong, A. C., Gauthier, I., Woroch, B., DeBuse, C., and Curran, T. (2005). An early electrophysiological response associated with expertise in letter perception. *Cogn. Affect. Behav. Neurosci.* 5, 306–318.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 December 2012; accepted: 29 April 2013; published online: 21 May 2013.*

*Citation: Herdman AT and Takai O (2013) Paying attention to orthography: a visual evoked potential study. Front. Hum. Neurosci. 7:199. doi: 10.3389/fnhum.2013.00199*

*Copyright © 2013 Herdman and Takai. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Identifying brain systems for gaze orienting during reading: fMRI investigation of the Landolt paradigm

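### *Rebekka Hillen<sup>1,2</sup>, Thomas Günther<sup>3</sup>, Claudia Kohlen<sup>2,3</sup>, Cornelia Eckers<sup>4</sup>, Muna van Ermingen-Marbach<sup>1,5</sup>, Katharina Sass<sup>1,6</sup>, Wolfgang Scharke<sup>3</sup>, Josefine Vollmar<sup>3</sup>, Ralph Radach<sup>7,8</sup> and Stefan Heim<sup>1,2,5,9</sup>\**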

*<sup>1</sup> Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany*

*<sup>2</sup> Section Neurological Cognition Research, Department of Neurology, Medical School, RWTH Aachen University, Aachen, Germany*

*<sup>3</sup> Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany*

#### *Edited by:*

*Mohamed L. Seghier, University College London, UK*

#### *Reviewed by:*

*Fabio Richlan, University of Salzburg, Austria; Zoe Woodhead, University College London, UK*

#### *\*Correspondence:*

*Stefan Heim, Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Pauwelsstrasse 30, 52074 Aachen, Germany e-mail: sheim@ukaachen.de*

The Landolt reading paradigm was created in order to dissociate effects of eye movements and attention from lexical, syntactic, and sub-lexical processing. While previous eye-tracking and behavioral findings support the usefulness of the paradigm, it remains to be shown that the paradigm actually relies on the brain networks for oculomotor control and attention, but not on systems for lexical/syntactic/orthographic processing. Here, 20 healthy volunteers underwent fMRI scanning while reading sentences (with syntax) or unconnected lists of written stimuli (no syntax) consisting of words (with semantics) or pseudowords (no semantics). In an additional "Landolt reading" condition, all letters were replaced by closed circles, which were to be scanned for targets (Landolt's rings) in a reading-like fashion from left to right. A conjunction analysis of all five conditions revealed the visual scanning network, which involved bilateral visual cortex, premotor cortex, and superior parietal cortex, but which did not include regions for semantics, syntax, or orthography. Contrasting the Landolt reading condition with all other conditions revealed additional involvement of the right superior parietal cortex (areas 7A/7P/7PC) and postcentral gyrus (area 2) involved in deliberate gaze shifting. These neuroimaging findings demonstrate for the first time that the linguistic and orthographic brain network can be dissociated from a pure gaze-orienting network with the Landolt paradigm. Consequently, the Landolt paradigm may provide novel insights into the contributions of linguistic and non-linguistic factors to reading failure, e.g., in developmental dyslexia.

**Keywords: reading, dyslexia, semantics, syntax, phonology, orthography, gaze, attention**

#### **INTRODUCTION**

In reading, eye movements of children with developmental dyslexia differ from those of normally reading children (De Luca et al., 1999), e.g., more and longer fixations (Rayner, 1998; Hutzler and Wimmer, 2003). Although this fact has been known for more than 20 years, little is known about the causalities here: do these abnormal gaze patterns lead to dyslexic reading, or are they a consequence of reading difficulties potentially reflecting compensatory mechanisms? In order to address this question, a non-lexical and non-orthographical reading paradigm was developed (Corbic et al., 2007; Günther et al., 2012b; for the earlier work using the "Z-reading paradigm" see Ferretti et al., 2008). This "Landolt" paradigm allows investigating eye movements during reading without any influence of lexical information such as lexical frequency, phonotactics, or lexical status (Zschornak and Zeschmann, 2008; available at http://www.tguenthert.de/thesis/files/archive-2008.html; Zschornak et al., 2012; available at https://www.thieme-connect.de/ejournals/html/10.1055/s-0032-1304900). This non-lexical reading task only maintains the visual structure of written language, i.e., number of "letters" and "words." This is achieved by replacing letters by non-orthographic circle-like symbols, the so-called Landolt rings, thus removing all lexical, syntactic, or orthographic-phonological information. The visual structure is maintained by adopting all space characters. The size of the typefaces is matched exactly so that the length of a Landolt sentence is virtually identical to that of the matched (basic) sentence (see **Figure 1**). Hence, reading without words is simulated, allowing us to test whether the gaze patterns of the reader move over the Landolt sentences in a reading-like fashion.

Preliminary behavioral and eye-tracking data (e.g., Günther et al., 2012a,b; Radach et al., 2012 available at http://www.triplesr.org/conference/archive/2012/12conf-Abstracts.php)<sup>1</sup> suggest that the Landolt paradigm mimics the spatial characteristics of eye movements during reading but without additional influence of lexical, syntactic, or orthographic-phonological sources. However, there is as yet no neurophysiological evidence as to whether reading in the Landolt paradigm relies on the same brain regions relevant for orienting gaze during real reading or whether the seemingly comparable behavioral patterns of real reading and Landolt reading emerge from distinct neural mechanisms. Consequently, the aim of the present study was to characterize the neurofunctional network recruited for Landolt reading. We investigated whether or not the Landolt paradigm activates particular brain areas in addition to those supporting orthographic reading. Furthermore, in order to be able to dissociate these findings from brain areas involved in lexical, syntactic, and orthographic-phonological processing during reading (e.g., Vigneau et al., 2006, 2011; Price, 2010), we included additional conditions that would serve as functional localizers for each of these dimensions (Friederici et al., 2000a). Hence, we might be able to compare directly brain regions related to these dimensions to those involved in the Landolt paradigm, which supposedly relate to gaze orienting.

#### **METHODS**

#### **PARTICIPANTS**

Twenty individuals (age range 20–30 years; mean 25;8 years; 10 women) participated in the study. All were right-handed according to the Edinburgh Inventory (Oldfield, 1971). They had normal or corrected-to-normal vision, normal reading skills (average percentile = 68) according to the Salzburger Lese- und Rechtschreibtest (SLRT-II; Moll and Landerl, 2010), a standard German reading test, and normal non-verbal intelligence (average IQ = 112; range = 90–130) in the revised version of the Cattell Culture Fair Test (CFT 20 R; Weiss, 2006).

#### **MATERIALS AND TASK**

The study comprised five conditions (cf. **Figure 1**). Four of them contained orthographic stimuli with or without meaning (SEM+ vs. SEM−, i.e., words or pseudowords), which were arranged either as syntactically correct sentences (SYN+) or as non-syntactic rows presented in a sentence-like fashion (SYN−). The first condition (sentences, S) was composed of 41 syntactically and semantically complete sentences. The material was taken from a study by Huestegge (2005). The next three conditions (pseudoword sentences, PWS; nouns, N; pseudowords, PW) were created according to the logic of the study by Friederici et al. (2000a). Pseudowords were created in accordance with the phonotactic and graphotactic rules of German. All stimuli were matched to the S condition with respect to the number of words, word length, number of syllables, syllable frequency, and German orthography. To ensure that the initial landing position in each condition was not influenced by experimental manipulations, all sentences began with an item from the same word class, namely a definite determiner (der/die/das, "the"). Finally, for the fifth condition, all letters were replaced by Landolt rings, which do not represent any orthographic, lexical, or syntactic information. These stimuli were constructed such that the "words" in the Landolt sentence (LS) were matched to all other conditions with respect to the number of "letters," i.e., rings. Preliminary work showed that the eye movements for "reading" LS in search of targets were comparable to those in all other conditions (Kohlen, 2012; see also Hillen et al., 2012, available at https://www.thieme-connect.de/ejournals/html/10.1055/s-0032-1304899?update=true).

All stimuli were created such that they contained a varying number of open Landolt rings as targets. The number of targets was zero, one, or two. The positions of words or Landolt "words" containing targets were randomly distributed over the left, centre, and right part of the entire stimulus in order to prevent the subjects from engaging in processing strategies. The participants were asked to scan each stimulus in a reading-like fashion from left to right and to press the response button each time they detected a target. In analogy to the previous eye tracking studies (e.g., Zschornak and Zeschmann, 2008; Günther et al., 2013), each condition consisted of 18 sentences without targets, 14 sentences with one, and nine sentences with two targets. The total length of each stimulus varied from 55 to 68 characters, including all spaces, corresponding to a length of nine to twelve words per sentence. Landolt sentences and orthographic conditions were matched for visual angle covered and for number of characters.

In earlier studies using the Landolt paradigm (e.g., Zschornak and Zeschmann, 2008; Günther et al., 2013), each letter was the size of 12 × 12 pixels, corresponding to the size of one Landolt ring. For the present study, each stimulus had to be scaled down by about two percent in order to fit the size of the screen used in the MRI scanner.

<sup>1</sup>Note that this preliminary work at present consists to a large extent of published abstracts and academic theses.

#### **PROCEDURE**

Before scanning, participants were screened for MRI suitability and were tested for reading ability, non-verbal intelligence, and handedness as outlined above. Right-handers with average or better reading ability and non-verbal IQ were included in the study. In preparation for scanning, the participants were informed about the procedures and the task. Next, they were familiarized with the upcoming stimuli. Finally, informed consent was obtained from all participants.

A very important aspect is the instruction the researcher gives to participants. Using similar instructions for all conditions prevents subjects from engaging in different cognitive (and neurofunctional) processing modes. Kaakinen and Hyönä (2010) investigated how the gaze patterns of adult participants change according to the task instructions. They found that gaze patterns differ markedly between proofreading and reading comprehension tasks, and suspected that these differences are caused by different cognitive strategies. Therefore, in order to ensure comparable reading-related eye movements in all conditions, the material of the present study contained targets in all reading conditions. As in the previous studies that used the Landolt paradigm, the participants were asked "to read the materials" and to react whenever they detected a target. These targets looked like left-opened "c"s, i.e., they were not real orthographic signs but were nonetheless highly comparable to them with respect to their visual features (see **Figure 1**).

In the scanner, stimuli were presented using the software Presentation 0.70 (Neurobehavioral Systems, San Francisco, CA, 2003) and an MR-compatible goggle system (Resonance-Technologies) with a resolution of 800 × 600 pixels. Stimuli subtended a visual angle of 30 degrees. For each condition, 41 stimuli and seven null events were presented to the participants. Null events were included to improve modeling of the hemodynamic response function and to provide periods for a resting baseline. Each condition was divided into eight sub-blocks, each consisting of six trials (null events or stimulus trials). In general, the presentation sequence of the single stimuli and of the sub-blocks was randomized. The distribution of the null events was pseudorandomized for each block to ensure that no sub-block within a condition included more than two null events and that these events were separated by at least one real trial.
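The pseudorandomization constraints described above can be illustrated with a minimal sketch. It is not the authors' Presentation script: the function names, the rejection-sampling strategy, and the reading of "separated by at least one real trial" as applying within a sub-block are all assumptions made for illustration.

```python
import random


def is_valid(blocks, max_null_per_subblock):
    """Check the two constraints stated in the text for every sub-block."""
    for block in blocks:
        # Constraint 1: no more than two null events per sub-block.
        if block.count("null") > max_null_per_subblock:
            return False
        # Constraint 2 (assumed to hold within a sub-block):
        # two null events never follow each other directly.
        if any(a == "null" and b == "null" for a, b in zip(block, block[1:])):
            return False
    return True


def place_null_events(n_subblocks=8, trials_per_subblock=6, n_null=7,
                      max_null_per_subblock=2, rng=None, max_tries=10_000):
    """Draw random null-event positions until both constraints are met."""
    rng = rng or random.Random()
    n_trials = n_subblocks * trials_per_subblock  # 48 = 41 stimuli + 7 nulls
    for _ in range(max_tries):
        null_positions = set(rng.sample(range(n_trials), n_null))
        labels = ["null" if i in null_positions else "stim"
                  for i in range(n_trials)]
        blocks = [labels[i:i + trials_per_subblock]
                  for i in range(0, n_trials, trials_per_subblock)]
        if is_valid(blocks, max_null_per_subblock):
            return blocks
    raise RuntimeError("no valid arrangement found")


# Hypothetical usage: one condition block with a fixed seed.
example_blocks = place_null_events(rng=random.Random(0))
for block in example_blocks:
    print(block)
```

Rejection sampling is chosen here only because the constraints are loose enough that a valid arrangement is found after a few draws; a constructive algorithm would serve equally well.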

In accordance with data from a behavioral pilot study with 10 participants (5 males, aged 20–30 years), each experimental trial lasted for 4.5 s, including a 1-s blank screen between two stimuli. Sub-blocks were separated by periods of 4.5 s showing a blank screen. A total of 205 stimuli were presented, with a duration of 27 min for the fMRI session.

#### **FUNCTIONAL MAGNETIC RESONANCE IMAGING (fMRI)**

Participants lay in a 3 Tesla magnetic resonance tomograph (Siemens TRIO) with standard head fixation using cushions. A total of 655 echo planar imaging (EPI) measurements were recorded from each subject. Whole-brain coverage was achieved by recording from 40 axial slices (interleaved acquisition) of 3 mm thickness (each with a distance of 1 mm), with an echo time (TE) of 30 ms and a repetition time (TR) of 2500 ms. Thus, each voxel had a size of 4.0 × 4.0 × 3.0 mm<sup>3</sup>. The flip angle was 90°, the matrix 64 × 64 mm<sup>2</sup>, and the field of view (FOV) 200 mm. No subject showed head movements exceeding the size of one voxel, so all participants remained in the analysis.
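As a plausibility check, the acquisition parameters can be related to the reported session length. The short sketch below multiplies the number of EPI volumes by the TR, which gives roughly 27 min and agrees with the session duration stated in the Procedure section. The slab-coverage calculation assumes that the "distance of 1 mm" is an inter-slice gap, which the text does not state explicitly.

```python
# Acquisition parameters as reported in the text.
n_volumes = 655          # EPI measurements per subject
tr_seconds = 2.5         # repetition time (2500 ms)
n_slices = 40
slice_thickness_mm = 3
slice_gap_mm = 1         # assumption: the "distance of 1 mm" is an inter-slice gap

# Total functional scan time follows directly from volumes x TR.
scan_seconds = n_volumes * tr_seconds
scan_minutes = scan_seconds / 60

# Extent of the slab along the slice direction: 40 slices plus 39 gaps.
coverage_mm = n_slices * slice_thickness_mm + (n_slices - 1) * slice_gap_mm

print(f"functional scan time: {scan_seconds:.1f} s = {scan_minutes:.2f} min")
print(f"coverage along slice direction: {coverage_mm} mm")
```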

After the fMRI scans, an anatomical T1-weighted MP-RAGE sequence was run with a duration of 9 min (*TR* = 2500 ms; *TE* = 2.98 ms; 176 axial slices; FOV = 256 mm; flip angle = 9°).

#### **DATA ANALYSIS**

#### *Behavioral data*

Behavioral data were obtained from the individual Presentation log-files and analysed with SPSS 19.0 (SPSS Inc., 2011). By means of ANOVAs and subsequent pair-wise Bonferroni tests, we analysed whether the participants responded equally accurately to stimuli containing targets in all conditions.

#### *fMRI data*

*Preprocessing.* fMRI data were analysed with the computer software MATLAB 7.10.0 (The MathWorks, 2010) and SPM8 (Wellcome Trust Centre for Neuroimaging, 2009; http://www.fil.ion.ucl.ac.uk/spm/). The pre-processing took place in six steps. First, data were corrected for spatial movements during fMRI measurement (*realign*), followed by a temporal correction for acquisition times (*slice time correction*). The normalization to the MNI reference space was accomplished using a *unified segmentation procedure* based on the individual anatomical scans. Prior to normalization, fMRI data were therefore aligned to this anatomical image (*coregister*). Finally, the data were smoothed with a Gaussian kernel of FWHM = 8 mm.
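For readers unfamiliar with the FWHM convention, the following sketch converts the 8 mm kernel width into the standard deviation of the corresponding Gaussian, using the general identity FWHM = 2·sqrt(2·ln 2)·σ. It illustrates the relationship only and is not a re-implementation of the SPM8 smoothing routine; the function names are chosen here for illustration.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(distance_mm, sigma_mm):
    """Unnormalized Gaussian weight at a given distance from the kernel centre."""
    return math.exp(-distance_mm ** 2 / (2.0 * sigma_mm ** 2))


sigma_mm = fwhm_to_sigma(8.0)
print(f"sigma for FWHM = 8 mm: {sigma_mm:.3f} mm")

# By definition, the weight at half the FWHM (4 mm from the centre)
# is half of the weight at the centre.
half_height = gaussian_weight(4.0, sigma_mm) / gaussian_weight(0.0, sigma_mm)
print(f"relative weight at 4 mm: {half_height:.3f}")
```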

*Statistical analysis.* At the first level, data were analysed in a block design. The hemodynamic response function was modeled separately for the five conditions using the canonical HRF with its first temporal derivative. For each subject, five contrast images were generated by contrasting each condition against the implicit baseline condition.
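The shape of a canonical double-gamma HRF and its temporal derivative can be sketched as follows. The shape parameters used (a response peaking at about 5 s, an undershoot gamma with shape 16, and an amplitude ratio of 1/6) are the values commonly quoted as SPM defaults and are an assumption here, as the text does not report them; the derivative is approximated by a 1-s shifted difference. All function names are illustrative.

```python
import math


def gamma_pdf(t, shape, rate=1.0):
    """Gamma probability density; zero for non-positive t."""
    if t <= 0:
        return 0.0
    return rate ** shape * t ** (shape - 1) * math.exp(-rate * t) / math.gamma(shape)


def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=6.0):
    """Double-gamma HRF: a positive response minus a scaled, delayed undershoot."""
    return gamma_pdf(t, peak_shape) - gamma_pdf(t, undershoot_shape) / undershoot_ratio


def temporal_derivative(t, shift=1.0):
    """Finite-difference approximation: HRF minus the HRF delayed by `shift` seconds."""
    return (canonical_hrf(t) - canonical_hrf(t - shift)) / shift


# Sample the HRF on a 0.1-s grid over 32 s and locate its peak.
times = [i * 0.1 for i in range(321)]
hrf_values = [canonical_hrf(t) for t in times]
peak_time = times[hrf_values.index(max(hrf_values))]
print(f"peak of the modeled response at about {peak_time:.1f} s")
```

Including the temporal derivative as a second regressor allows the model to absorb small shifts in response latency between conditions or subjects.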

During the second-level analysis, contrast images of all 20 participants were entered into a flexible factorial repeated-measures design with the factors SUBJECT (as repetition factor) and CONDITION. The following contrasts were computed. First, activation for each condition was assessed relative to the implicit resting baseline. Next, a conjunction of these five contrasts was used to identify the network for gaze orientation involved in each condition. In order to determine brain regions specifically involved in the Landolt paradigm, the conjunction of the difference images of the Landolt condition minus each other condition [(LS − S) ∩ (LS − N) ∩ (LS − PWS) ∩ (LS − PW)] was computed.

In order to test whether processing in the Landolt paradigm involved lexical, syntactic, or orthographic processing, the following contrasts were calculated based on the study design. Regions involved in orthographic processing during reading were identified by the conjunction of the contrasts of all orthographic conditions against the Landolt condition [(S − LS) ∩ (N − LS) ∩ (PWS − LS) ∩ (PW − LS)]. Lexical processing was operationalized by contrasting conditions including words against those involving pseudowords (S + N − PWS − PW). Similarly, regions for syntactic processing were identified by contrasting sentences and pseudoword sentences against lists of words or pseudowords (S − N + PWS − PW).
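The contrasts defined above can be written as weight vectors over the five conditions. The sketch below does so and evaluates a conjunction as the minimum over its component contrasts, which is one common way to compute a conjunction. Whether the authors tested against the conjunction null or the global null is not stated, so this choice is an assumption. The beta values are invented for a single hypothetical voxel and serve only to show the arithmetic.

```python
CONDITIONS = ["S", "N", "PWS", "PW", "LS"]


def contrast(**weights):
    """Build a weight vector in the fixed condition order; unnamed conditions get 0."""
    return [weights.get(name, 0) for name in CONDITIONS]


def apply_contrast(weights, betas):
    """Weighted sum of the condition estimates."""
    return sum(w * betas[name] for w, name in zip(weights, CONDITIONS))


# Factorial contrasts from the text.
lexical = contrast(S=1, N=1, PWS=-1, PW=-1)      # S + N - PWS - PW
syntactic = contrast(S=1, N=-1, PWS=1, PW=-1)    # S - N + PWS - PW

# Component contrasts of the two conjunctions.
orthographic_conditions = ["S", "N", "PWS", "PW"]
landolt_specific = [contrast(LS=1, **{c: -1}) for c in orthographic_conditions]
orthographic = [contrast(LS=-1, **{c: 1}) for c in orthographic_conditions]


def conjunction(contrasts, betas):
    """Minimum-statistic conjunction: the effect must be present in every component."""
    return min(apply_contrast(w, betas) for w in contrasts)


# Invented beta estimates for one hypothetical voxel (not data from the study).
example_betas = {"S": 0.2, "N": 0.1, "PWS": 0.3, "PW": 0.25, "LS": 1.0}

print("lexical effect:", apply_contrast(lexical, example_betas))
print("syntactic effect:", apply_contrast(syntactic, example_betas))
print("Landolt-specific conjunction:", conjunction(landolt_specific, example_betas))
print("orthographic conjunction:", conjunction(orthographic, example_betas))
```

The lexical and syntactic weight vectors are orthogonal, which reflects the 2 × 2 factorial arrangement (SEM± × SYN±) of the four orthographic conditions.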

All contrasts were assessed at a significance level of *p* < 0.05, FWE corrected at cluster level, obtained by using an uncorrected threshold of *p* ≤ 0.001 with a minimal cluster size (*k*) of 200 voxels.

*Neuroanatomical localization.* For the precise neuroanatomical localization of the effects, we used the Jülich-Düsseldorf probabilistic atlas, which is based on an observer-independent analysis of cytoarchitectonic borders in a sample of ten postmortem brains (Zilles et al., 2002; Schleicher et al., 2005). The atlas provides information about the position and variability of cortical regions in standard MNI reference space. For the assignment to cytoarchitectonically defined regions we used the SPM Anatomy Toolbox (Eickhoff et al., 2005) available at http://www.fz-juelich.de/inm/inm-1/DE/Forschung/docs/SPMAnatomyToolbox/SPMAnatomyToolboxnode.html.

#### **RESULTS**

#### **BEHAVIORAL DATA**

Participants pressed a response button whenever they thought there was a target within the stimulus. The data were obtained from all 205 items, divided into five conditions (S, N, PWS, PW, LS). Each condition block comprised 41 stimuli with either no, one, or two targets. Because of technical difficulties, one data set was incomplete, lacking responses for 12 items (six items each in conditions S and N). The descriptive data are presented in **Table 1**.

#### *Accuracy per condition*

The influence of experimental condition on the number of hits was tested in a one-way ANOVA with CONDITION (S, PWS, N, PW, LS) as factor. There was a main effect of CONDITION [*F*(4, 19) = 7.664; *p* < 0.001]. *Post-hoc* Bonferroni tests revealed that condition S showed significantly more correct responses than condition PWS (*p* = 0.004), condition PW (*p* < 0.001), and LS (*p* < 0.001). Moreover, there was a trend toward more correct responses compared to condition N (*p* = 0.054). The conditions N, PWS, and PW did not differ with respect to correct responses (each *p* > 0.999). Likewise, during Landolt reading, performance was comparable to that in these other conditions (N: *p* = 0.362; PWS: *p* > 0.999; PW: *p* > 0.999).

*The maximum of hits was 41 per condition. For each condition, the number of hits, misses, and false alarms is indicated together with the standard deviation. S, sentences; N, nouns; PWS, pseudoword sentences; PWR, pseudoword rows; LS, Landolt sentences.*

#### *Detailed error analysis: misses vs. false alarms*

Overall, participants performed correctly on 78% of all trials. The remaining 22% of trials were incorrect responses: participants missed a target in 15% of all trials and produced false alarms in 7%.

Analysing these response patterns separately for each condition in a series of one-way repeated-measures ANOVAs revealed a main effect of RESPONSE TYPE (hit/miss/false alarm) in all conditions [S: *F*(2, 19) = 1861.957; *p* < 0.001; N: *F*(2, 19) = 372.177; *p* < 0.001; PWS: *F*(2, 19) = 396.794; *p* < 0.001; PW: *F*(2, 19) = 213.197; *p* < 0.001; LS: *F*(2, 19) = 186.079; *p* < 0.001]. The *post-hoc* Bonferroni tests revealed no differences between misses and false alarms for S blocks (*p* = 0.453), N blocks (*p* = 0.128), and PWS blocks (*p* = 0.274). In contrast, for PW and LS blocks, there were differences between the types of incorrect responses (PW: *p* = 0.001; LS: *p* < 0.001). For both conditions, participants made more misses than false alarms.

#### **fMRI DATA**

In this section, the fMRI data are reported with respect to the macroanatomical structures, which were activated in each condition; for detailed information about the cytoarchitectonic localizations please refer to the figures and tables. Results are reported separately for each condition contrasted against the resting baseline and for contrasts representing the gaze orientation network, orthographic processing, processing in the Landolt paradigm, semantic processing, and syntactic processing.

**Figure 2** gives an overview of the brain activation for each condition compared to the resting baseline. In all conditions, there was comparable activation in the visual cortex, precentral gyrus, and right and left parietal lobe. Moreover, all conditions involving real letters (S, PWS, N, PW) showed activation in the left fusiform gyrus. In contrast, this region was not involved when reading the Landolt sentences.

#### *Common gazing network*

The conjunction analysis across all reading conditions identified a common gaze network, i.e., a brain activation pattern related to gaze control that is independent of activation related to semantic, syntactic, or orthographic processing (see **Figure 3** and **Table 2**). Both hemispheres showed comparable patterns of activation. The largest cluster was located in the occipital lobe, with its local maximum in the cerebellum. The most anterior border of that cluster in the left ventral occipito-temporal cortex was in the fusiform and inferior temporal gyri (*y* = −56); the cluster extended posteriorly into the visual cortex (*y* = −105). Further clusters were found in the right and left middle frontal gyrus (MFG) extending into the superior frontal (SFG) and precentral gyri (PrCG), in the right inferior parietal lobule (IPL) extending into the right intraparietal sulcus (IPS) and right superior parietal lobule (SPL), and in the left caudate nucleus and putamen.

#### **Table 2 | Common gazing network.**

#### **Conjunction (S > 0) ∩ (N > 0) ∩ (PWS > 0) ∩ (PW > 0) ∩ (LS > 0)**

*t-test, p < 0.001, uncorrected, k = 200.*

*R, right; L, left; LG, lingual gyrus; IOG, inferior occipital gyrus; pFG, posterior fusiform gyrus; CS, calcarine sulcus; SFG, superior frontal gyrus; PrCG, precentral gyrus; SPL, superior parietal lobule; IPS, intraparietal sulcus; MFG, middle frontal gyrus; SMA, supplementary motor area.*

*<sup>1</sup>Amunts et al. (2000); <sup>2</sup>Geyer (2003); <sup>3</sup>Scheperjans et al. (2008).*

#### *Orthographic processing*

**Figure 4** and **Table 3** show the brain activation patterns for orthographic processing. All conditions except the Landolt condition required orthographic processing. Consequently, the conjunction analysis of the differences between each single orthographic condition (S, N, PWS, PW) and the Landolt condition reveals brain areas relevant for the processing of orthographic information. This conjunction analysis shows a clear hemispheric difference, with activation predominantly in the left hemisphere. The biggest cluster was in the left fusiform gyrus, starting anteriorly at *y* = −40 and extending posteriorly into the ventral visual cortex (*y* = −98). Another cluster was located in the left PrCG, reaching further anterior into the left inferior frontal gyrus (IFG). In the right hemisphere, activation was found in the cerebellum, PrCG, and middle temporal gyrus (MTG).

#### **Table 3 | Orthographic processing.**

*t-test, p < 0.001, uncorrected, k = 200.*

*R, right; L, left; MOG, middle occipital gyrus; IOG, inferior occipital gyrus; IFG, inferior frontal gyrus.*

*<sup>4</sup>Rottschy et al. (2007); <sup>5</sup>Amunts et al. (1999); <sup>6</sup>Diedrichsen et al. (2009).*

#### *Processing in the Landolt paradigm*

The reverse analysis shows brain areas with greater activation in the Landolt paradigm than in all other conditions, which contained orthographic stimuli (**Figure 5** and **Table 4**). Two clusters were observed. The maximum of the first cluster was localized in the right IPL, extending into the precuneus and the SPL. The local maximum of the second cluster was localized in the right postcentral gyrus (PoCG). Plotting the beta estimates of the activation strength revealed uniformly strong effects for LS, whereas the other conditions showed a lower positive signal in the IPL and a negative signal in the PoCG (**Figure 8**).

#### **Table 4 | Processing in the Landolt paradigm.**

*t-test, p < 0.001, uncorrected, k = 200.*

*R, right; L, left; SPL, superior parietal lobule; IPL, inferior parietal lobule.*

*<sup>3</sup>Scheperjans et al. (2008); <sup>7</sup>Grefkes et al. (2001); <sup>8</sup>Caspers et al. (2006); <sup>9</sup>Geyer et al. (1999, 2000).*

#### **Table 5 | Semantic processing contrast.**

*t-test, p < 0.001, uncorrected, k = 200.*

*R, right; L, left; MCC, middle cingulate cortex; PrCG, precentral gyrus; GPoC, postcentral gyrus; SMA, supplementary motor area; SPL, superior parietal lobe; IPL, inferior parietal lobe; GA, angular gyrus; GTS, superior temporal gyrus; STS, superior temporal sulcus; GOS, superior occipital gyrus.*

*<sup>1</sup>Amunts et al. (2000); <sup>3</sup>Scheperjans et al. (2008); <sup>8</sup>Caspers et al. (2008); <sup>11</sup>Amunts et al. (2005).*

#### *Semantic processing*

Subtracting activation for pseudoword lists and sentences from activation for reading real-word sentences and nouns identified regions involved in semantic processing during reading. This semantic contrast showed activation predominantly in temporal and parietal areas of both hemispheres. Bilaterally, the angular gyri of the IPL, PrCG, MTG, and precuneus were activated. Moreover, activation extended into the right cuneus (**Figure 6** and **Table 5**).

#### *Syntactic processing*

Syntactic processing was reflected in the contrasts of sentences (containing real words, S, or pseudowords, PWS) minus lists of nouns or pseudowords (N, PW). These findings are reported in **Figure 7** and **Table 6**. The results show an explicit hemispheric distinction: effects in the left hemisphere and in particular in the left frontal lobe were stronger than in the right hemisphere. The biggest cluster was localized in the left fronto-temporal region, covering the superior temporal sulcus (STS), superior temporal gyrus (STG) and MTG as well as left IFG and insula. The second cluster was found in the left SMA, extending into left SFG. In the right hemisphere, a smaller fronto-temporal cluster covered the temporal pole, IFG, and insula. Further effects were found in the right PrCG and MFG, SPL, and cerebellum. In order to compare these data to earlier findings from the auditory modality by Friederici et al. (2000a), we checked whether the effect in Broca's region was due to higher activation for both real sentences and pseudoword sentences. This was indeed the case (cf. **Figure 8**).

#### **DISCUSSION**

The present study aimed to identify the neural systems that support reading-like behavior in the novel Landolt reading task, a paradigm developed to study eye movements during reading without influences of lexical, syntactic, or phonological-orthographic processing. The results of the present study allow a description of the neuronal processing during reading in the Landolt paradigm and dissociate the regions involved from those relevant for the linguistic dimensions. The main finding was that the Landolt paradigm and the other reading conditions activated a common network relevant for gaze orienting.

#### **COMMON GAZING NETWORK**

The conjunction analysis of all five conditions showed brain activation of a common gazing network, which is necessary for reading independent of language processing. Besides prominent activity in the left and right visual cortex (including cytoarchitectonic areas 17 and 18; Amunts et al., 2000), there were significant activations around the middle frontal gyrus in both hemispheres. These clusters extended posteriorly into area 6 in the SMA and PrCG (Geyer, 2003), a region regulating gaze orienting during saccadic movements (Grosbras et al., 1999; Haller et al., 2008).

In the parietal lobe, the gazing network included an area around the right inferior parietal lobule extending into cytoarchitectonic area hIP3 in the intraparietal sulcus (Scheperjans et al., 2008). Fan et al. (2005) discuss both IPL and IPS in relation to attention regulation networks, especially the alerting and the orienting network. Barthélémy and Boulinguez (2002) and Culham et al. (2006) reported that the right inferior parietal lobule is involved in attention processes, in the neuronal planning of movements, and in visuo-motor conversion. Related to this, Thiel et al. (2003) described that this area was involved in visual selection tasks. Taken together, these findings suggest that gaze re-orienting has an attentional and in particular a visuo-spatial component. Thiel et al. (2003) and Fan et al. (2005) define the alerting network as a network that is activated while waiting for and expecting an upcoming target. It is possible that this state of expectancy resembles the situation of the participants in the present study. Both aspects, i.e., attention (Thiel et al., 2003) and the planning and conversion of gaze movements (Barthélémy and Boulinguez, 2002), are likely to contribute to the activation in this area in the present study. This point will become relevant later when discussing the activation patterns specific for the Landolt paradigm.

#### **Table 6 | Syntactic processing.**

*t-test, p < 0.001, uncorrected, k = 200.*

*R, right; L, left; MTG, middle temporal gyrus; PT, temporal pole; IFG, inferior frontal gyrus; SPL, superior parietal lobe; PrCG, precentral gyrus; MCC, middle cingulate cortex; SFG, superior frontal gyrus.*

*<sup>5</sup>Amunts et al. (1999); <sup>6</sup>Diedrichsen et al. (2009).*

#### **ORTHOGRAPHIC PROCESSING**

The presence or absence of orthographic/phonological information is the most important difference between the orthographic (S, N, PWS, and PW) and the Landolt (LS) reading conditions. Consequently, contrasting orthographic and non-orthographic conditions yields a focused neurophysiological description of the Landolt paradigm. The conjunction analysis for orthographic processing shows a clear hemispheric difference. Right-hemispheric activations were localized in the cerebellum, while left-hemispheric clusters were found in the fusiform, precentral, and middle temporal gyri. The fusiform gyrus as part of the ventral stream has repeatedly been shown to support the processing of written stimuli. This observation has led to the idea of the visual word form area (e.g., Dehaene et al., 2002, 2010), which is, however, not undisputed (see also Price and Devlin, 2003, 2011; Price, 2012; for comprehensive accounts). The fact that reading orthographic stimuli involved the fusiform gyrus to a significantly higher degree than the Landolt sentences might thus be taken to suggest that the Landolt stimuli were indeed not processed in an orthographic way (e.g., modulated by top-down expectancy).

Further left temporal activations for orthographic processing were found in the left middle temporal gyrus in the vicinity of Wernicke's area. Jobard et al. (2003) reported that this area is involved while reading words and pseudowords. They proposed that grapheme-to-phoneme conversion is represented here and in the superiorly adjacent superior temporal gyrus and sulcus.

Finally, there was also cerebellar activation for orthographic processing. Originally, activations in the cerebellum had been expected for the control of gaze movements (Kheradmand and Zee, 2011) rather than orthographic processing. Interestingly, however, there were only few activations at all in the common gazing network, but rather more prominent effects for orthographic processing. These data are interesting in the context of the cerebellar hypothesis of dyslexia (e.g., Nicolson et al., 1999) that assumes that general problems of automaticity affect successful reading in developmental reading disorders. These, in turn, relate to reduced cerebellar activation and likewise to reduced cerebellar volumes (Pernet et al., 2009). The present data suggest that this failure might not be localized (exclusively) at a general procedural level but might pertain in particular to fine-grained visual information such as letters.

#### **SPECIFICS OF THE LANDOLT PARADIGM**

The conjunction analysis to detect the neuronal activation for reading Landolt sentences in comparison to orthographic material revealed two activation clusters in the right hemisphere, in the inferior parietal lobule and the postcentral gyrus. This finding contradicts the initial hypothesis that Landolt reading does not activate any regions to a higher degree than "real" orthographic reading.

This pattern of activations resembles, at least partly, the pattern found for gaze orienting discussed above. As for gaze orienting, the right inferior parietal lobule was involved in Landolt reading. This finding suggests that Landolt reading in fact requires visual-spatial attention to an even higher degree than normal reading, probably because parafoveal vision does not help in identifying appropriate landing positions for subsequent saccades (but note that the landing positions *are* indeed appropriately identified, albeit at the expense of longer re-fixation times; cf. Günther et al., 2012a,b). The present neuroimaging results indicate that this performance was achieved by stronger involvement of the right parietal attention and gaze orienting network. The mechanism behind this recruitment might be found in the automaticity of the normal reading process in adult readers, which is not fully available in the new context of "reading" lines of circles and deliberately looking for targets, perhaps even more so in the Landolt condition, where all stimuli look almost alike.

The second cluster was located in the somatosensory cortex in the postcentral gyrus (Geyer et al., 1999; Grefkes et al., 2001). Earlier studies on visual processing (e.g., Donner et al., 2000) have reported networks in healthy controls that contain exactly these two regions, showing that this network is indeed relevant for covert visual selection, a process necessary in the Landolt task to plan the next saccade (please also see the "Limitations" section below). More recent work by Balslev et al. (2012) refined these data, showing that under specifically demanding conditions eye proprioception may be used to guide gaze behavior, whereas normally no such feedback information is necessary. In the Landolt paradigm, the fact that the "letters," i.e., the rings, all look alike may have led the brain to rely additionally on proprioceptive information to perform a visual gaze pattern that is comparable to that in normal reading. We say "additionally" because it is likely that the task of detecting a target in *all* conditions would involve a certain degree of visual search also in the orthographic conditions (even though the beta estimates of the activation were negative for the other four conditions: note that in fMRI, unlike in PET, the zero line does not distinguish absolutely positive from negative effects). In fact, the right IPL had also been found in the conjunction analysis of all five conditions, which can be taken as an indication that this was indeed the case. Yet, this effect was most pronounced in the Landolt paradigm. It might be that the nature of the stimuli had an influence here: in the Landolt condition, subjects made more misses than false alarms. A similar pattern was observed for the PW condition, i.e., the one of the four orthographic conditions in which neither semantic nor syntactic information would influence or guide reading behavior. For the Landolt condition, the similarity between targets and standard rings might have additionally affected the brain systems involved in the task.

#### **SEMANTIC AND SYNTACTIC PROCESSING**

The lexical-semantic and syntactic contrasts were computed as functional localizers in order to see whether Landolt reading involved any of these processes. Left-hemispheric brain activation was expected for the semantic and syntactic contrasts, especially in the frontal and temporal lobes. Vigneau et al. (2006) ascribe the analysis of both semantic and syntactic information to these anatomical structures. Moreover, Whitney et al. (2011) describe a semantic network that is anchored in the left inferior frontal gyrus and in the posterior part of the middle temporal gyrus, an area that seems to be very important in semantic processing. For syntactic processing, many studies have identified the left inferior frontal cortex as a pivotal region (e.g., Friederici et al., 2000a,b; Santi and Grodzinsky, 2010; Makuuchi et al., 2013) that may interact closely with the left anterior temporal lobe.

The present results are very much in line with this pattern of data. There was a clear effect for the syntax contrast in the left inferior frontal and anterior temporal cortex, which was distinct from the semantic effect in the pMTG. For semantic processing, no left inferior frontal component was observed, probably because no selection was required in the present paradigm (cf. Thompson-Schill et al., 1997; Heim et al., 2009; for the discussion about the role of pMTG in semantic representation vs. selection cf. Whitney et al., 2011, 2012). Most importantly, as predicted, none of these regions were implicated in Landolt reading, either in the baseline contrast or in the conjunction identifying the unique activation for Landolt reading over all other conditions.

#### **REFERENCES**

Balslev, D., Himmelbach, M., Karnath, H. O., Borchers, S., and Odoj, B. (2012). Eye proprioception used for visual localization only if in conflict with the oculomotor plan. *J. Neurosci.* 20, 8569–8573. doi: 10.1523/JNEUROSCI.1488-12.2012

#### **CONCLUSIONS**

The present study showed that "reading without words" does not lead to significant activation in language processing areas but rather recruits right-hemispheric areas in the parietal lobe to a higher degree than the four orthographic conditions. These right-hemispheric areas are involved in both attentional processes and gaze orienting. In combination with earlier behavioral and eye tracking studies, the present data indicate that the Landolt paradigm might be used in future studies to investigate reading in the absence of lexical, syntactic, or phonological-orthographic influences. One application could be in dyslexic reading, which can result from different kinds of underlying deficits. In addition to (or instead of) frequently encountered phonological difficulties (e.g., Ramus et al., 2003), some dyslexic readers show profound difficulties in visual attention (Valdois et al., 2003; Bosse et al., 2007; Heim et al., 2008). Initial investigations of whether, and if so to what extent, dyslexic children show atypical gaze patterns while reading non-lexical material suggest that a subgroup of children with dyslexia has difficulties with gaze patterns in the Landolt paradigm (Günther et al., 2013; for earlier results on a discrepancy between eye movements during visual search vs. reading in dyslexia cf. Prado et al., 2007). The Landolt paradigm might thus prove to be a novel tool for isolating the visuo-attentional processes relevant for successful reading, and their underlying neurofunctional components, without interference from pre-lexical information.

#### **ACKNOWLEDGMENTS**

This research was funded by the National German Research Agency (Deutsche Forschungsgemeinschaft) grant GU-1177/1-1.

481–495. doi: 10.1007/s00429-008-0195-z


(1999). Eye movement patterns in linguistic and non-linguistic tasks in developmental surface dyslexia. *Neuropsychologia* 37, 1407–1420. doi: 10.1016/S0028-3932(99)00038-X


Grosbras, M.-H., Lobel, E., Van de Mortele, P.-F., LeBihan, D., and Berthoz, A. (1999). An anatomical landmark for the supplementary eye field in human revealed with functional magnetic resonance imaging. *Cereb. Cortex* 9, 705–711.


Rheinisch Westfälische Technische Hochschule.


studies published in 2009. *Ann. N.Y. Acad. Sci.* 1191, 62–88. doi: 10.1111/j.1749-6632.2010.05444.x


alerting, orienting and reorienting of visuospatial attention: an event-related fMRI study. *Neuroimage* 21, 318–328. doi: 10.1016/j.neuroimage.2003.08.044


temporal gyrus. *Cereb. Cortex* 21, 1066–1075.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 April 2013; accepted: 03 July 2013; published online: 29 July 2013. Citation: Hillen R, Günther T, Kohlen C, Eckers C, van Ermingen-Marbach M, Sass K, Scharke W, Vollmar J, Radach R and Heim S. (2013) Identifying brain systems for gaze orienting during reading: fMRI investigation of the Landolt paradigm. Front. Hum. Neurosci. 7:384. doi: 10.3389/fnhum.2013.00384*

*Copyright © 2013 Hillen, Günther, Kohlen, Eckers, van Ermingen-Marbach, Sass, Scharke, Vollmar, Radach and Heim. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Attention shifts the language network reflecting paradigm presentation

#### *Kathrin Kollndorfer<sup>1</sup>, Julia Furtner<sup>1</sup>, Jacqueline Krajnik<sup>1,2</sup>, Daniela Prayer<sup>1</sup> and Veronika Schöpf<sup>1</sup>\**

*<sup>1</sup> Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria <sup>2</sup> Department of Neurosurgery, Medical University of Vienna, Vienna, Austria*

*Edited by:*

*Mohamed L. Seghier, UCL, UK*

#### *Reviewed by:*

*Prasanna Karunanayaka, Penn State University, USA Ferath Kherif, CHUV, Switzerland*

#### *\*Correspondence:*

*Veronika Schöpf, Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Waehringer Guertel 18-20, 1090 Vienna, Austria e-mail: veronika.schoepf@meduniwien.ac.at*

**Objectives:** Functional magnetic resonance imaging (fMRI) is a reliable and non-invasive method with which to localize language function in pre-surgical planning. In clinical practice, visual stimulus presentation is often difficult or impossible, due to the patient's restricted language or attention abilities. Therefore, our aim was to investigate modality-specific differences in visual and auditory stimulus presentation.

**Methods:** Ten healthy subjects participated in an fMRI study comprising two experiments, one with visual and one with auditory stimulus presentation. In both experiments, two language paradigms (one for language comprehension and one for language production) used in clinical practice were investigated. In addition to standard data analysis by means of the general linear model (GLM), independent component analysis (ICA) was performed to obtain more detailed information on language processing networks.

**Results:** GLM analysis revealed modality-specific brain activation for both language paradigms for the contrast visual > auditory in the area of the intraparietal sulcus and the hippocampus, two areas related to attention and working memory. Using group ICA, a language network was detected for both paradigms independent of stimulus presentation modality. The investigation of language lateralization revealed no significant variations. Visually presented stimuli further activated an attention-shift network, which could not be identified for auditorily presented language.

**Conclusion:** The results of this study indicate that visually presented language stimuli additionally activate an attention-shift network. These findings will provide important information for pre-surgical planning in order to preserve reading abilities after brain surgery, significantly improving surgical outcomes. Our findings suggest that the presentation modality for language paradigms should be adapted to the individual clinical indication.

**Keywords: fMRI, language, attention-shift network, functional mapping, visual, auditory**

#### **INTRODUCTION**

Brain surgery that involves eloquent cortical areas, particularly in brain tumor or epilepsy patients, has remained a challenging task (Spena et al., 2010). Preservation of neuronal functions after surgery is one of the most important goals for neurosurgeons. An accurate mapping of eloquent cortical areas ensures a sufficiently extensive and safe resection of brain parenchyma. Functional magnetic resonance imaging (fMRI) has been established as a reliable and noninvasive tool in mapping of cognitive and executive functions prior to brain surgery [for review see Dimou et al. (2013)]. Reliable localization of language abilities is of huge importance in pre-surgical planning, as language is an essential quality of life factor. The gold standard for intraoperative language localization and neuronavigation is direct electrocortical stimulation (ECS; Sunaert, 2006). However, this method is time-consuming during surgery and is not applicable in all cases, as compliance of the awake patient during surgery is mandatory, and not all patients are capable of this.

Patients who undergo fMRI examination prior to neurosurgery often suffer from disease-driven restricted language abilities or have difficulties in focusing their attention on the task for the entire measurement period. Reading is especially challenging for patients undergoing pre-surgical planning, and therefore, stimuli are often presented auditorily to map language abilities (Dimou et al., 2013). However, the manner in which stimuli are presented might influence the spatial representation of processing networks, as already hypothesized by Carpentier et al. (2001), who investigated differences between auditory and visual stimulus presentation in language-related areas.

In clinical practice, two different language paradigms, one for language perception and one for language production, are usually presented visually to map language-related areas. The present study aimed to investigate the different processing networks related to presentation modalities of these exact paradigms by testing auditory and visual stimulus conditions. Based on previous findings and clinical observations that visually presented fMRI stimuli are particularly challenging for patients, we were interested in the specific characteristics of networks that process written language. Therefore, we hypothesized that visually presented language stimuli would require an attention-shift network (Corbetta et al., 1998; Corbetta and Shulman, 2002) in the brain.

To achieve a conclusive comparison of both presentation modalities, two different analysis approaches were used to account for temporal and spatial network patterns: a data-driven analysis, using independent component analysis (ICA), was performed to test for functionally connected processing networks; and a hypothesis-driven method, using a general linear model (GLM), was used to account for purely stimulus-driven activity. Combining these two analysis methods offers complementary information about the precise processing and representation of language-related areas. As the shift of attention induced by different stimulus modalities is not yet well understood, ICA is an appropriate method to investigate data without assuming an *a priori* model, as this method discriminates activation based on spatial independence rather than temporal correlation to a predefined stimulus.

#### **MATERIALS AND METHODS**

#### **SUBJECTS**

Ten healthy right-handed subjects (four male, six female; mean age 22 years) participated in this study. All participants completed two fMRI experiments (Experiments 1 and 2), each comprising two scanning sessions. All subjects had normal or corrected-to-normal vision and no history of psychiatric or neurologic diseases. All participants were native speakers of the German language and had a comparable educational background. Prior to inclusion, all participants were informed about the aim of the study and gave their written, informed consent. The study was approved by the Ethics Committee of the Medical University of Vienna.

#### **BEHAVIORAL DATA**

To avoid an influence of language abilities on neural activation within the language network, two language tasks were performed prior to fMRI measurements. The first task was a sentence completion task, a subtest of the Intelligence Structure Test (IST-2000-R; Liepmann et al., 2007), which tests for semantic decision-making. This subtest consists of 20 sentences that are missing the last word. The participant is instructed to choose one of five given words to complete the sentence correctly. Furthermore, all participants completed the Regensburg Word Fluency Test (RWT; Aschenbrenner et al., 2000), which tests for verbal fluency and reflects semantic memory. Subjects had to name as many words as possible referring to a given category. This category can be semantic, such as fruits or animals, or phonemic, such as words beginning with the letter *M (e.g., mother, man, mouse)*.

#### **EXPERIMENT 1**

In Experiment 1, subjects were visually presented with two different language paradigms using an MR-compatible visual stimulation system (NordicNeuroLab, Bergen, NO).

1. Verb generation task: The first language paradigm was a covert verb generation task. Frequent German nouns were visually presented in white letters on a black screen. A block design with 30 s blocks was used. During active blocks, 15 nouns were presented for 1 s each (e.g., door, book, ball). The subjects were instructed to think of all verbs they associated with the presented noun until the next word appeared (Petersen et al., 1988; Holland et al., 2001). During baseline blocks, the participants were asked to fixate on hash signs presented on the screen.

2. Phrases task: In the second language paradigm, syntactically simple and correct sentences in canonical German word order (subject–verb–object) were presented in white letters on a black screen. During active blocks, sentences were presented every 2 s, half of the sentences containing a semantically inappropriate object *(e.g., semantically appropriate: Das Mädchen spielt Klavier. Engl.: 'The girl plays the piano.'; semantically inappropriate: Der Dichter dichtet ein Auto. Engl.: 'The poet composes a car.')*. During baseline, subjects were instructed to look at white hash signs presented on the black screen [stimuli modified from Foki et al. (2008)].

#### **EXPERIMENT 2**

Experiment 2 consisted of the same two language paradigms. Rather than visual presentation, words and sentences were presented auditorily using MR-compatible headphones. Block-design presentation times equaled those of Experiment 1. During active blocks, 15 nouns or sentences were presented. During baseline, participants were presented with a tone every 2 s.

#### **IMAGING METHODS**

Measurements were performed on a 3T TIM Trio System (Siemens Medical Solutions, Erlangen, Germany) using a 12-channel head coil. FMRI data were acquired using single-shot, gradient-recalled, echo-planar imaging (EPI). Twenty slices (1 mm gap, 4 mm thickness) with an FOV of 210 × 210 mm and a TE/TR of 42/2000 ms were acquired. Slices were aligned parallel to the connection between the anterior and posterior commissure. All subjects participating in this study underwent four scanning sessions, two with visually presented language paradigms (Experiment 1) and two with auditory language presentation (Experiment 2), lasting 5 min each.
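The acquisition parameters above imply a few derived quantities that are useful when reading the analysis. The following Python sketch computes them; it assumes that no dummy scans were discarded and that the 1 mm gap applies only between adjacent slices, neither of which is stated in the text. The function names are illustrative.

```python
# Acquisition parameters as stated in the text.
TR_S = 2.0                # repetition time in seconds
SESSION_MIN = 5           # duration of one scanning session in minutes
BLOCK_S = 30              # block length from the paradigm description
N_SLICES = 20
SLICE_THICKNESS_MM = 4.0
SLICE_GAP_MM = 1.0

def volumes_per_session(session_min, tr_s):
    """Number of volumes acquired in one session (assumes no discarded dummy scans)."""
    return int(session_min * 60 / tr_s)

def slab_coverage_mm(n_slices, thickness_mm, gap_mm):
    """Extent along the slice axis: all slices plus the gaps between neighbours."""
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm

n_volumes = volumes_per_session(SESSION_MIN, TR_S)
volumes_per_block = int(BLOCK_S / TR_S)
coverage_mm = slab_coverage_mm(N_SLICES, SLICE_THICKNESS_MM, SLICE_GAP_MM)
print(n_volumes, volumes_per_block, coverage_mm)
```

Under these assumptions each session yields 150 volumes, each 30 s block spans 15 volumes, and the slice stack covers 99 mm.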

Stimulus fixation and eye movements were recorded using an MR-compatible eye-tracker (ViewPoint EyeTracker, Arrington Research, Scottsdale, AZ) throughout all the measurements of Experiment 1.

#### **DATA ANALYSIS**

Preprocessing of fMRI data was performed using SPM8 (http://www.fil.ion.ucl.ac.uk/spm/) implemented in MATLAB (Matlab 7.14.0, Release 2012a, Mathworks Inc., Sherborn, MA, USA) and included motion correction, spatial normalization to an MNI template, and spatial smoothing. First-level analysis was performed for each paradigm separately, by constructing a GLM using block onsets as regressors. Head movement effects were modeled by including six motion parameters as additional regressors. The contrast active > baseline was generated for both paradigms for Experiments 1 and 2. For comparison of visually vs. auditorily presented language effects, the two contrasts visual > auditory and auditory > visual were calculated at the group level.
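The logic of the first-level block-design GLM can be illustrated with a deliberately reduced Python sketch. It is not the SPM8 implementation: convolution with the hemodynamic response function, high-pass filtering, and the six motion regressors are omitted, and the model contains only an intercept and one boxcar regressor. The block order (baseline first) and the voxel values are hypothetical.

```python
def boxcar(n_volumes, block_len):
    """Alternating blocks, baseline first: 0 = baseline volume, 1 = active volume."""
    return [(i // block_len) % 2 for i in range(n_volumes)]

def fit_glm(y, x):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# 150 volumes with 15 volumes per block follow from TR = 2 s, 5 min sessions, 30 s blocks.
design = boxcar(n_volumes=150, block_len=15)
# Hypothetical noise-free voxel: baseline signal 100, raised by 2 during active blocks.
signal = [100.0 + 2.0 * d for d in design]
b0, b1 = fit_glm(signal, design)
print(b0, b1)
```

In this reduced model the slope `b1` plays the role of the contrast active > baseline for a single voxel; a full analysis would estimate it jointly with the nuisance regressors.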

Additional second-level group analysis was carried out for both paradigms (phrases and verb generation) and experiments (auditory and visual presentation) using probabilistic ICA, as implemented in MELODIC (Multivariate Exploratory Linear Decomposition into Independent Components) version 3.10, a part of FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl), using FastICA (Beckmann and Smith, 2004). Non-brain voxels were masked, and voxel-wise de-meaning of the data and normalization of the voxel-wise variance were carried out. Pre-processed data sets were whitened and projected into an n-dimensional subspace using probabilistic principal component analysis, in which the number of dimensions was estimated using the Laplace approximation to the Bayesian evidence of the model order (Minka, 2000; Beckmann and Smith, 2004). Dimensions for n were 18 for both visually presented paradigms, 24 for the auditory phrases task, and 25 for the auditory verb generation task. For the optimization of the non-Gaussian sources, contrast function and convergence thresholds, as suggested by Hyvärinen et al. (2001), were used. Estimated component maps were divided by the standard deviation of the residual noise and thresholded by fitting a mixture model to the intensity values histogram (Beckmann and Smith, 2004). All group ICA network components were assessed by visual inspection, based on the spatial distribution patterns.
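The voxel-wise de-meaning and variance normalization mentioned above can be sketched in a few lines of Python. This is an illustration of the general idea only, not MELODIC's code: scaling by the population standard deviation of each time series is an assumption, and the toy data are hypothetical.

```python
import statistics

def normalize_voxel(ts):
    """Remove the mean of one voxel's time series and scale it to unit variance."""
    mean = statistics.fmean(ts)
    sd = statistics.pstdev(ts)
    if sd == 0:
        return [0.0 for _ in ts]  # a constant voxel carries no temporal signal
    return [(v - mean) / sd for v in ts]

def normalize_data(data):
    """Apply the normalization to every voxel; data is a list of voxel time series."""
    return [normalize_voxel(ts) for ts in data]

# Two hypothetical voxels: one oscillating, one constant.
data = [[100.0, 102.0, 100.0, 102.0], [10.0, 10.0, 10.0, 10.0]]
normalized = normalize_data(data)
print(normalized)
```

After this step every non-constant voxel contributes with the same variance, so the subsequent decomposition is not dominated by voxels with a large raw signal amplitude.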

Additional group ICAs were carried out by submitting the visual and auditory data sets of both conditions to be evaluated as a group. After group ICA, as described above, the set of spatial maps from the group-average analysis was used to generate subject-specific versions of the spatial maps, and associated time series, using the dual regression approach version v0.5, a part of FSL (Beckmann et al., 2009; Filippini et al., 2009). First, for each subject, the group-average set of spatial maps was regressed (as spatial regressors in a multiple regression) into the subject's 4D space-time dataset. This resulted in a set of subject-specific time series, one per group-level spatial map. Next, those time series were regressed (as temporal regressors, again in a multiple regression) into the same 4D dataset, resulting in a set of subject-specific spatial maps, one per group-level spatial map. Corresponding spatial IC maps for every subject and both conditions were then exported to SPM8 for statistical testing. For second-level analysis, two separate *t*-tests were performed for both conditions (*p* < 0.05, FWE corrected). Common language-related areas, independent of presentation modality, were investigated by performing two conjunction analyses (Friston et al., 1999), one for visual and auditory presentation of the phrases task and a second one for the two modalities of the verb generation task (*p* < 0.05, FWE corrected).
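The two regression stages of the dual regression approach can be made concrete with a small Python sketch. It is not the FSL script: de-meaning and variance normalization are left out, least squares is solved through the normal equations (which fails if the regressors are collinear), and the maps, time courses, and function names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(columns, y):
    """Coefficients c minimizing ||y - sum_k c[k] * columns[k]||^2."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(gram, rhs)

def dual_regression(data, group_maps):
    """data: T x V (time by voxel); group_maps: K x V.
    Returns (time_series as T x K, subject_maps as K x V)."""
    # Stage 1: spatial regression, one fit per time point.
    time_series = [least_squares(group_maps, volume) for volume in data]
    # Stage 2: temporal regression, one fit per voxel.
    regressors = [list(col) for col in zip(*time_series)]  # K x T
    n_voxels = len(data[0])
    per_voxel = [least_squares(regressors, [vol[v] for vol in data])
                 for v in range(n_voxels)]
    subject_maps = [list(col) for col in zip(*per_voxel)]  # K x V
    return time_series, subject_maps

# Hypothetical example: two non-overlapping group maps over four voxels.
group_maps = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
true_ts = [[1.0, 1.0], [2.0, 0.0], [3.0, -1.0]]  # three time points, two components
data = [[sum(true_ts[t][k] * group_maps[k][v] for k in range(2)) for v in range(4)]
        for t in range(3)]
time_series, subject_maps = dual_regression(data, group_maps)
```

Because the toy data are an exact mixture of the two group maps, stage 1 recovers the generating time courses and stage 2 returns maps equal to the group maps, up to rounding. With real data the subject-specific maps differ from the group maps, and these differences are what enter the second-level tests.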

To investigate language lateralization, voxel-wise laterality maps were created for the subject-specific spatial IC maps resulting from the dual regression step. The laterality maps were computed using the LUI toolbox (http://mialab.mrn.org/software/; Swanson et al., 2011) by subtracting from every image its own copy flipped in the left/right direction (Stevens et al., 2005). A voxel-wise laterality map overcomes the problem of a laterality index, which is based on voxel counting and is therefore sensitive to the definition of the threshold. Two-sample *t*-tests were calculated across the two presentation modalities separately for both tasks (*p* < 0.05, FWE corrected).
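The flip-and-subtract operation behind a voxel-wise laterality map is simple enough to sketch directly. The Python example below works on a single 2D slice stored as nested lists and assumes that the image is centred on the midline, as it would be after normalization to a symmetric template; it is not the LUI toolbox code, and the values are hypothetical.

```python
def laterality_map(image):
    """image: list of rows, each row running from one hemisphere across the midline
    to the other. Returns the image minus its left/right mirror, voxel by voxel."""
    return [[v - m for v, m in zip(row, reversed(row))] for row in image]

# Hypothetical slice: the first row is stronger on one side, the second is symmetric.
image = [[5.0, 3.0, 1.0, 0.0],
         [2.0, 2.0, 2.0, 2.0]]
lat = laterality_map(image)
print(lat)
```

The result is antisymmetric about the midline: a positive value at one voxel is matched by the same negative value at its mirror position, and symmetric activation yields zero. No threshold enters the computation, which is the advantage over a voxel-counting laterality index noted above.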

#### **RESULTS**

#### **BEHAVIORAL DATA**

The results of the language tasks performed prior to fMRI measurements revealed average language performance for all investigated subjects. For the sentence completion subtest, the participants' number of correct items ranged from 10 to 18 (mean 14), corresponding to an average performance compared to normative data for this age group. Results of the RWT revealed a mean number of 18 words beginning with the letter *M* and a mean number of listed words referring to the category of animals of 35, both reflecting average verbal fluency performance.

#### **HYPOTHESIS-DRIVEN ANALYSIS (GLM)**

To map the modality-specific effects of language processing on brain activity, a two-sample *t*-test was performed at the group level. These analyses comprise the t-contrasts visual > auditory (see **Figure 1**) and auditory > visual (see **Figure 2**), computed at the group level for both investigated language paradigms. All resulting statistical parametric maps were thresholded at *p* < 0.001 (uncorrected), using a cluster extent threshold of 10 contiguous voxels.

**FIGURE 1 | Axial mean anatomical images overlaid with brain activation resulting from second-level GLM analysis, revealing higher brain activity for visual presentation compared to auditory presentation (*p* < 0.001, uncorrected) induced by (A) the phrases task and (B) the verb generation task.**
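A cluster extent threshold such as the one described above (voxel-wise thresholding followed by removal of clusters smaller than 10 contiguous voxels) can be sketched as a connected-component search. The Python example below operates on a 2D boolean mask and treats voxels as contiguous when they share an edge; both the 2D simplification and this connectivity rule are assumptions of the sketch, not a description of the SPM8 procedure. For readability the toy example uses a minimum extent of 3 rather than 10.

```python
from collections import deque

def cluster_threshold(mask, min_extent):
    """Keep only suprathreshold voxels that belong to a cluster of at least
    min_extent edge-connected voxels. mask: 2D list of booleans."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected cluster.
            cluster, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cluster) >= min_extent:
                for y, x in cluster:
                    out[y][x] = True
    return out

# Hypothetical mask: one cluster of three voxels and one isolated voxel.
mask = [[True, True, False, False, True],
        [True, False, False, False, False],
        [False, False, False, False, False]]
kept = cluster_threshold(mask, min_extent=3)
```

The isolated voxel in the upper right corner is discarded, while the three-voxel cluster survives.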

Results for the contrast visual > auditory revealed significantly higher brain activation in the superior and inferior parietal lobule, the middle occipital gyrus, the postcentral gyrus, and in the hippocampus. For the phrases task, additional increased brain activity was obtained in the middle frontal gyrus, the precuneus, the cuneus, the precentral gyrus, and in the pallidum (see **Table 1** and **Figure 1B**). For the verb generation task, the contrast visual > auditory evoked additional increased brain activity in the inferior temporal gyrus (see **Table 1** and **Figure 1A**).

Auditory presentation (contrast auditory > visual) induced significantly increased brain activation in the superior temporal gyrus bilaterally (see **Table 2** and **Figure 2**) for both tasks. The auditory presentation of the verb generation task revealed an additional cluster in the middle frontal gyrus (see **Table 2** and **Figure 2A**).

Auditory presentation of the phrases task evoked brain activation bilaterally in the superior temporal gyrus, the insula, the medial frontal gyrus, the inferior frontal gyrus, and the left precentral gyrus (see **Figure 3A**). In contrast, visual presentation of the same paradigm induced clusters of increased neuronal activation bilaterally in the superior and inferior parietal gyrus, the precentral gyrus, the lingual gyrus, the cuneus, the middle occipital gyrus, the inferior frontal gyrus, the superior temporal gyrus, the left medial frontal gyrus as well as the right middle frontal gyrus (see **Figure 3B**).

The auditorily presented verb generation task induced brain activation bilaterally in the medial frontal gyrus, the cingulate gyrus, the insula, the superior temporal gyrus, the left inferior frontal gyrus, the left inferior parietal lobule, and the left precentral gyrus (see **Figure 3C**). Visually presented stimuli also evoked neuronal activation bilaterally in the medial frontal gyrus, the cingulate gyrus and the insula. Activation of the superior temporal gyrus was lateralized to the left hemisphere. Furthermore, left-sided clusters in the inferior frontal gyrus, the left inferior parietal lobule, and the precentral gyrus were larger for the visual than for the auditory condition (see **Figure 3D**).

#### **Table 1 | Significantly higher activated brain areas by visual compared to auditory presentation of the two language paradigms.**

*<sup>a</sup>Significantly activated clusters with 10 or more voxels.*

*<sup>b</sup>Clusters were automatically labeled using AAL toolbox (Tzourio-Mazoyer et al., 2002).*

*<sup>c</sup>p < 0.001 uncorrected.*

#### **Table 2 | Significantly higher activated brain areas by auditory compared to visual presentation of the two language paradigms.**

*<sup>a</sup>Significantly activated clusters with 10 or more voxels.*

*<sup>b</sup>Clusters were automatically labeled using AAL toolbox (Tzourio-Mazoyer et al., 2002).*

*<sup>c</sup>p < 0.001 uncorrected.*

#### **DATA-DRIVEN ANALYSIS (ICA)**

Separate group ICA for both paradigms (phrases and verb generation) and both experiments (visual and auditory) obtained 18 components for both visually presented paradigms. For auditory presentation, group ICA revealed 24 components for the phrases task and 25 components for the verb generation task. Reported activated network components only include within-brain activations.

A group language network was determined for both paradigms, independent of the presentation modality, and involved brain areas such as the inferior frontal gyrus (Broca's area), the superior temporal gyrus (Wernicke's area), the insula, the middle occipital gyrus, the precentral gyrus, and the superior frontal gyrus (see **Figures 3E–H**).

The combined group ICA of both modalities revealed a language network and an attention network for both the phrases task and the verb generation task. The phrases task evoked a modality-independent language network, detected by performing a conjunction analysis (*p* < 0.05, FWE corrected), involving clusters in the left and right superior frontal gyrus, the left inferior frontal gyrus, the left and right angular gyrus, the left posterior cingulate cortex, the left middle temporal gyrus, and the left supplementary motor area (see **Figures 4A–D**). No significant differences between visual and auditory presentation were found. The verb generation task revealed a language-related network including significant clusters of neuronal activation in the left and right inferior parietal lobule, the left and right inferior frontal gyrus, the left supplementary motor area, the left inferior and middle temporal gyrus, the right cerebellum, and the left and right precentral gyrus (see **Figures 4E–H**). Similar to the phrases task, no significant differences were detected between visual and auditory language presentation.

**FIGURE 3 | Mean anatomical images overlaid with brain activation resulting from second-level GLM analysis and group ICA.** Results of second-level GLM analysis **(A–D)** were reported for the contrast active > baseline condition (*p* < 0.001, uncorrected) for **(A)** auditory presentation of the phrases task, **(B)** visual presentation of the phrases task, **(C)** auditory presentation of the verb generation task and **(D)** visual presentation of the verb generation task. Group ICA revealed a left lateralized language network independent from presentation modality and language paradigm. Determined networks were reported for **(E)** auditory presentation of the phrases task, **(F)** visual presentation of the phrases task, **(G)** auditory presentation of the verb generation task and **(H)** visual presentation of the verb generation task.

The combined group ICA of visual and auditory presentation of the phrases task obtained an attention network involving the left and right inferior, middle and superior occipital lobule as well as the left putamen, detected by the conjunction analysis (*p* < 0.05, FWE corrected). In addition, visual stimulus presentation revealed significant brain activation in attention-related areas, involving the left superior and medial frontal gyrus, the left and right precentral gyrus, the left and right middle frontal gyrus as well as the left and right superior parietal lobule (see **Figures 5A–D**). No additional brain activation was obtained for auditory stimulus presentation. Based on the conjunction analysis of the two modalities for the verb generation task, an attention network involving neuronal activation bilaterally in the lingual gyrus, the calcarine gyrus, and the fusiform gyrus was detected. Visual presentation evoked additional activation in the left posterior cingulate gyrus, the left and right superior parietal lobule, and the left precentral gyrus (see **Figures 5E–H**). Similar to the phrases task, no additional brain activation was found for auditory presentation.

#### **LATERALIZATION**

Thresholded laterality maps (*p* < 0.05, FWE corrected) resulting from group ICA of the phrases task revealed significant left-sided lateralization in the inferior, middle and superior temporal gyrus, the middle and inferior frontal gyrus, the angular gyrus, and the inferior parietal lobule (see **Figures 6A,B**). The laterality maps resulting from the verb generation task showed significant left-lateralized brain activation in the inferior, middle and superior frontal gyrus, the inferior and superior parietal lobule, the supramarginal gyrus, the angular gyrus, the middle temporal gyrus, and the fusiform gyrus (see **Figures 6C,D**). For both paradigms, no significant differences were determined between visual and auditory language presentation.

#### **DISCUSSION**

The aim of this study was to examine modality-specific differences of language processing by comparing visually and auditorily presented language paradigms (one for language production and one for language comprehension) used in clinical practice. A combined group ICA for visual and auditory presentation revealed modality-dependent differences in identified networks. For visually presented language, an attention-shift network (Corbetta et al., 1998) was found for both paradigms. In contrast, this network was not detected for auditory presentation. These results are largely consistent with our hypothesis that visual stimulus presentation of language paradigms requires an additional attention network.

Investigating modality-dependent differences in language localization is of great importance with respect to pre-surgical planning, for which fMRI has become part of the routine procedure (Genetti et al., 2013). FMRI has been proven to be a reliable tool to determine language lateralization (Arora et al., 2009; Jones et al., 2011), and has been increasingly validated for the precise localization of language cortices (Genetti et al., 2013). The reliability of language lateralization is of particular interest in patients with left hemisphere temporal lobe epilepsy, as they have an increased likelihood of atypical right hemisphere lateralization of language processing areas (Hamberger and Cole, 2011). It is assumed that chronic epileptic activity induces a shift of language processing areas from the left to the right hemisphere (Liégeois et al., 2004; Janszky et al., 2006). Since patients prior to neurosurgery often suffer from restricted language and attention abilities, the required compliance of the patient is often lacking, which hampers the determination of brain areas involved in language processing. Reading, in particular, may present an insurmountable challenge to patients, and therefore, paradigms for detecting language abilities are often presented auditorily [for review see Dimou et al. (2013)]. Because neuronal patterns resulting from fMRI experiments provide essential information for neuronavigation during brain surgery, differences between auditory and visual language presentation need to be investigated in detail.

Independent of presentation modality, a language component was identified for the verb generation and for the phrases task. In clinical practice, usually both paradigms are used, as they cover different aspects of language processing. This assumption has been supported by the results of this study, showing differences in the language network between the two tasks. With regard to presentation modality, no significant differences between auditory and visual stimulation were obtained. The involved areas of the modality-independent language network are in line with previous functional imaging results of language processing [for review see Price (2010, 2012)]. In contrast to Carpentier et al. (2001), who found higher lateralization scores for visual stimuli, the results of our study revealed no significant difference between visual and auditory stimulus presentation. Thus our findings suggest that auditory language presentation in functional imaging is an appropriate tool for lateralization, providing essential information for pre-surgical planning. However, visually presented language additionally activated an attention-shift network (Corbetta et al., 1998), which appears to be a necessary prerequisite for written language processing. A comparison of the detected attention network has shown that visual stimulus presentation evoked increased brain activation in the left superior and medial frontal gyrus, the left and right precentral gyrus, the left and right middle frontal gyrus as well as the left and right superior parietal lobule, areas related to attention (Corbetta and Shulman, 2002; Daselaar et al., 2013) and short term memory (Makuuchi and Friederici, 2013). For auditory stimulus presentation no comparable network was found. Our finding indicates that the investigator has to be aware of the individual clinical indication of functional language mapping and to select the appropriate stimulus presentation method with respect to tumor location or reorganization of networks.

**FIGURE 4 | […] stimulus presentation (*p* < 0.05, FWE-corrected).** A network involving language related areas was detected for **(A)** the phrases task and **(E)** the verb […] For both paradigms the conjunction analysis of both modalities **(D,H)** revealed similar activation patterns compared to modality-specific networks.

**FIGURE 5 | Axial mean anatomical images overlaid with the attention-shift network, resulting from combined group ICA including auditory and visual stimulus presentation (*p* < 0.05, FWE-corrected).** This network was determined for **(A)** the phrases task and **(E)** the verb generation task. The comparison of modality specific differences shows […] language paradigms. Whereas visually presented stimuli evoked activity in the attention-shift network, no comparable activation pattern was detected for auditory stimuli. In the conjunction analysis **(D,H)** only activation in occipital parts was found.

Beyond modality-dependent differences, the change in spatial processing patterns induced by language and attention shifts was investigated in this study using group ICA, an established analysis tool for language network detection (Kim et al., 2011). The evaluation of the network components that resulted from ICA for visual stimulus presentation revealed a network similar to the eye movement and attention-shift network described by Corbetta et al. (1998). For auditory stimulus presentation, this network was not detected. It is assumed that this network is responsible for covert shifts of attention, reflected by overt rapid eye movements (saccades). Moreover, these two processes appear not only to be functionally related but also to share the same pathways in the brain. Although it has been shown that saccadic eye movements combined with short fixations are necessary for reading words (Reichle et al., 2003; Rayner and Reichle, 2010), the impact of saccades on word processing is still unknown (Temereanca et al., 2012).

The results of this study suggest that the performed language task as well as the presentation modality influence the detected networks. In addition, our findings indicate that not only the task itself and the mode of stimulus presentation, but also the analysis method, may affect the detected language processing areas. A comparison of hypothesis-driven GLM analysis and data-driven ICA showed substantial differences in the resulting network patterns. Standard GLM analysis is based on the canonical hemodynamic response function (HRF) and also relies on restrictive time-modeling of the stimuli. In contrast, ICA revealed highly consistent language networks independent of the language task and the modality of stimulus presentation. It is assumed that ICA is able to detect separate time course related networks such as attention or motor patterns (Robinson et al., 2013), and it has already been shown to provide additional information on processing networks (Tie et al., 2008; Schöpf et al., 2011; Frasnelli et al., 2012; Xu et al., 2013). The results of previous studies revealed that language processing areas show interindividual variability in network patterns (Amunts et al., 2000; Rademacher et al., 2001). These individual variations, in conjunction with additional stimulus-related functions such as attention or eye movements, may produce imprecise language localization based on GLM analysis, especially in group studies. Furthermore, a recently published study (Stoppelman et al., 2013) found a significant influence of different baseline conditions on the resulting language related areas using GLM analysis. Especially for the analysis of language paradigms, a purely data-driven method such as ICA may not only serve as an additional technique but might be the analysis method of choice, as we were able to show that a time-locked analysis tool, such as the GLM, was not able to reflect the spatial patterns involved in the processing of visually presented language paradigms.

Although the mapping of language processing areas using fMRI has been investigated in various studies (Carpentier et al., 2001; Arora et al., 2009; Jones et al., 2011; Genetti et al., 2013), conducting fMRI is sometimes problematic in clinical practice. Usually, two different language paradigms, one for language perception and another for language production, have to be performed to cover a wide range of language processing. These tasks require focused attention on the stimuli throughout the whole experiment, which is often challenging and hard to accomplish for the patient. Recently, an fMRI paradigm was presented that claims to localize functional activation in areas for language perception and production in a single paradigm (Fedorenko et al., 2010, 2012). The validation of this paradigm in clinical practice and its effect on patient compliance should be part of further investigations.

Even though new language paradigms are developed to facilitate tasks during fMRI measurements, performing the task is still challenging for the patient, due to the disabilities mentioned above. A promising method to overcome the substantial challenge of the patient's active participation is resting-state fMRI, a method without active task performance. Previous studies have successfully determined language networks using resting-state fMRI (Tomasi and Volkow, 2012; Kollndorfer et al., 2013; Tie et al., 2013). Although the application to pre-surgical planning has already achieved promising initial results in epilepsy surgery (Negishi et al., 2011; Morgan et al., 2012), it is still a long way from becoming part of the clinical routine (Böttger et al., 2011). In clinical practice, the development of a standardized imaging protocol for mapping language abilities, as demanded by Sunaert (2006), will be an inevitable step, as it has been shown that different resting-state conditions may influence the detected networks (Kollndorfer et al., 2013).

A potential limitation of this study is the small sample size. The influence of sample size in fMRI studies has recently been the subject of controversial discussion. Friston (2012) pointed out that statistically significant results from studies with small sample sizes are statistically valid, indicating a stronger effect than the equivalent result in a larger sample. In contrast, other authors (Ingre, 2013; Lindquist et al., 2013) highlight the potential pitfalls of statistical testing using small sample sizes, such as less accurate parameter estimation or fewer possibilities to control for confounding variables. To avoid an excessive influence of confounding factors, we investigated a very homogeneous sample: young, healthy, right-handed subjects with comparable educational background. In addition, behavioral language data were collected to control for language ability parameters.

#### **CONCLUSION**

We were able to show that the neural processing of visually presented paradigms (language perception and language production) requires an attention-shift network in addition to the commonly known language processing areas in the brain. These activation patterns were not detected for auditory stimulus presentation of the same tasks. Therefore, the mode of stimulus presentation should be adjusted with respect to the individual indication for functional language mapping. As the attention-shift network was restricted to visual stimuli, it is assumed that it is a basic prerequisite for reading abilities. This additional attention mechanism accompanying visual language testing may provide important information for neurosurgeons, so as to preserve language function and writing abilities and thereby improve quality of life after surgery.

# **ACKNOWLEDGMENTS**

The authors thank the subjects for their participation and Thomas Schlegl for his support in analyzing eye-tracking data. Special thanks are dedicated to Daniel Grailach (Tonkombüse, Vienna) for recording the voice samples. Veronika Schöpf, Kathrin Kollndorfer, and Jacqueline Krajnik are supported by the FWF (Veronika Schöpf and Kathrin Kollndorfer: P23205-B09; Jacqueline Krajnik: KLI252).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 27 June 2013; accepted: 07 November 2013; published online: 25 November 2013.*

*Citation: Kollndorfer K, Furtner J, Krajnik J, Prayer D and Schöpf V (2013) Attention shifts the language network reflecting paradigm presentation. Front. Hum. Neurosci. 7:809. doi: 10.3389/fnhum.2013.00809*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Kollndorfer, Furtner, Krajnik, Prayer and Schöpf. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A cross-linguistic evaluation of script-specific effects on fMRI lateralization in late second language readers

# *Maki S. Koyama<sup>1,2</sup>\*, John F. Stein<sup>1</sup>, Catherine J. Stoodley<sup>3</sup> and Peter C. Hansen<sup>4</sup>*

*<sup>1</sup> Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford, UK*

*<sup>2</sup> Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA*

*<sup>3</sup> Department of Psychology, American University, Washington, DC, USA*

*<sup>4</sup> School of Psychology, University of Birmingham, Birmingham, UK*

#### *Edited by:*

*Mohamed L. Seghier, UCL, UK*

*Reviewed by: Ana Sanjuan, Wellcome Trust Centre for Neuroimaging, Spain; Qing Cai, East China Normal University, China*

#### *\*Correspondence:*

*Maki S. Koyama, Sherrington Building, Sherrington Rd., Oxford, OX1 3PT, UK e-mail: makisophiakoyama@gmail.com*

Behavioral and neuroimaging studies have provided evidence that reading is strongly left lateralized, and the degree of this pattern of functional lateralization can be indicative of reading competence. However, it remains unclear whether functional lateralization differs between the first (L1) and second (L2) languages in bilingual L2 readers. This question is particularly important when the particular script, or orthography, learned by the L2 readers is markedly different from their L1 script. In this study, we quantified functional lateralization in brain regions involved in visual word recognition for participants' L1 and L2 scripts, with a particular focus on the effects of L1–L2 script differences in the visual complexity and orthographic depth of the script. Two different groups of late L2 learners participated in an fMRI experiment using a visual one-back matching task: L1 readers of Japanese who learnt to read alphabetic English and L1 readers of English who learnt to read both Japanese syllabic Kana and logographic Kanji. The results showed weaker leftward lateralization in the posterior lateral occipital complex (pLOC) for logographic Kanji compared with syllabic and alphabetic scripts in both L1 and L2 readers of Kanji. When both L1 and L2 scripts were non-logographic, where symbols are mapped onto sounds, functional lateralization did not significantly differ between L1 and L2 scripts in any region, in any group. Our findings indicate that weaker leftward lateralization for logographic reading reflects greater requirement of the right hemisphere for processing visually complex logographic Kanji symbols, irrespective of whether Kanji is the readers' L1 or L2, rather than characterizing additional cognitive efforts of L2 readers. Finally, brain-behavior analysis revealed that functional lateralization for L2 visual word processing predicted L2 reading competency.

**Keywords: visual complexity, orthographic depth, second language reading, logographic, functional lateralization**

# **INTRODUCTION**

The left cerebral hemisphere plays the dominant role in language functioning for most right-handed individuals. Differences in functional lateralization can be clinically significant, because weaker leftward lateralization may be an indicator of inefficient or impaired language ability (Chiarello et al., 2012; Bishop, 2013; Gotts et al., 2013). The neuroimaging literature has provided strong evidence for leftward lateralization for language processing (e.g., Xiong et al., 1998; Frost et al., 1999; Tzourio-Mazoyer et al., 2004), including written language processing (e.g., Xue et al., 2004; Seghier and Price, 2011). In particular, the occipitotemporal cortex has been characterized as strongly left lateralized for written words across alphabetic, syllabic, and logographic scripts (Nakamura et al., 2005b; Xue et al., 2006; Seghier and Price, 2011). This contrasts with the longstanding view that the right hemisphere is more specialized for visually complex logographic scripts (e.g., Chinese, Japanese Kanji) (see Hasuike et al., 1986 for a review).

Given this strong leftward cortical lateralization as a signature of language processing, an intriguing question arises as to whether the degree of functional lateralization differs between the first language (L1) and second language (L2) in bilinguals, particularly late bilinguals (i.e., late L2 learners) whose L2s are typically less proficient than their L1s. Nelson et al. (2009) addressed this question by examining two types of late L2 reader groups—L2 logographic Chinese readers with alphabetic English as their L1 and L2 English readers with Chinese as their L1. Notably, these two scripts are markedly different in visual complexity, with logographic symbols being visually more complex than alphabetic symbols. Additional activation in the right occipito-temporal cortex was observed only during L2 logographic Chinese reading, but not during L2 alphabetic English reading. This result, indicating weaker leftward functional lateralization for L2 logographic reading in the ventral visual pathway, likely reflects the late L2 readers' extra efforts to cope with the greater visual demands (and thus greater right-hemispheric demand) of processing the L2 logographic symbols. However, considering that logographic reading elicits additional right occipito-temporal activation relative to syllabic reading even in L1 readers (Nakamura et al., 2005a; Koyama et al., 2011), it may be that weaker leftward lateralization for logographic reading in the occipito-temporal cortex reflects the generally increased visual demands when reading logographic symbols, rather than L2 readers' extra efforts to learn the new scripts.

Written languages can also differ in their level of orthographic depth. For example, Japanese Kana has an extremely regular orthography, whereas English has irregular orthography where letters can represent very different sounds in different words. It appears that orthographic depth has little effect on functional lateralization for L2 reading, at least in bilinguals of L1 Spanish (regular orthography) and L2 English (irregular orthography) (Jamal et al., 2012). However, our previous fMRI study indicated that a difference in orthographic depth has a significant impact on L2 functional lateralization in late L2 readers (Koyama et al., 2013). More specifically, stronger leftward lateralization for L2 reading was observed in a phonological region (i.e., the left supramarginal gyrus) only when readers' L2 had a more irregular orthography (i.e., English) than their L1 (i.e., Japanese Kana). This stronger leftward lateralization can be interpreted as L2 readers' extra efforts to cope with the greater phonological demands (and thus greater left-hemispheric demands) during L2 reading (Koyama et al., 2013).

To address questions regarding the effects of visual complexity and orthographic depth on L2 reading cross-linguistically, the current study utilized L1 readers of Japanese, where both syllabic Kana and logographic Kanji are equally used in the writing system, who were also late L2 readers of alphabetic English. We also examined another late L2 group that consisted of L1 readers of alphabetic English who learned to read both Japanese scripts. Visually, Kanji symbols are more complex than Kana and English symbols (**Figure 1**). Orthographically, English is characterized as having an irregular orthography, whereas Japanese Kana has an extremely regular orthography (logographic Kanji symbols are primarily mapped onto meanings). Importantly, even with the notable visual differences between Kana and Kanji, a word written in these two Japanese scripts can represent the same sound and meaning, minimizing the influence of confounding factors in language-related experiments.

**FIGURE 1 | Schematics of visual one-back matching task.** The paradigm used was a block design with alternating 24-s task blocks and 15-s rest blocks. In each task block, a fixation cross appeared at the center of the visual display, and then 24 words were presented at a rate of 1 per second. The stimulus duration was 250 ms followed by 750 ms blank period, during which participants were asked to press a button when stimuli presented in succession were identical visually. The Kana and Kanji words from the top mean "friend," "socks," "fruits," and "fruits."

Such a cross-linguistic evaluation of functional lateralization in two types of late L2 groups allows us to make the following two predictions: (1) irrespective of whether the script is an individual's L1 or L2, logographic reading, relative to non-logographic reading, will be associated with weaker leftward lateralization due to the increased visual demands of processing logographic symbols; and (2) only in L2 readers of English (whose L1 has a more regular orthography) will English reading be associated with stronger leftward lateralization due to the increased phonological demands. Additionally, we assessed the quantitative relationships between the degree of lateralization and the level of reading ability. Xue et al. (2006) demonstrated that the laterality index of the occipito-temporal cortex while viewing novel symbols predicted subsequent visual word recognition performance for the same symbols. Hence, we hypothesized that functional lateralization in the occipito-temporal cortex would be associated with L2 reading performance, particularly for logographic L2 reading. Although the occipito-temporal cortex is a core region of interest in the current study, we extended our analysis to other brain regions based on the functional maps resulting from our analyses. This approach provides a more comprehensive understanding of the functional lateralization for reading.

# **METHODS**

# **PARTICIPANTS**

Three groups of skilled adult readers (13 males and 32 females) participated in this study: (1) 15 native Japanese readers who had learnt English as L2 (Japanese-L1/English-L2, the "J1/E2" group; mean age ± *SD* = 29.3 ± 6.4 years), (2) 14 native English readers who had learnt Japanese Kana as L2 (the "E1/J2" group; mean age ± *SD* = 26.2 ± 5.7 years), and (3) 16 native English readers who had no experience of learning either Japanese or logographic Chinese as L2 ("Control" monolingual E1 group; mean age ± *SD* = 25.7 ± 5.3 years). Individuals in the J1/E2 and E1/J2 groups had participated in a previous study (Koyama et al., 2013), and were considered late L2 readers because none of them had early experience (before the age of 12 years) of learning English or Japanese, respectively. In the J1/E2 group, all participants started learning English as L2 at the age of 12 years, reflecting the official system of English language education in Japan, whereas the mean age of L2-Japanese acquisition in the E1/J2 group was 15.21 (*SD* ± 6.4) years. To estimate which language was the dominant one for the late L2 readers, we asked them to count aloud the number of beans presented to them, in the language with which they felt most comfortable. Counting is usually carried out in the dominant language, in which bilinguals first learned to count (typically L1 in late L2 learners) (Grosjean, 1996). All participants in both the J1/E2 and E1/J2 groups counted in their respective L1s, confirming that their dominant language was indeed the L1. Controls were not fluent in any language other than their L1 (English), as confirmed by a questionnaire and a brief interview. All participants were strongly right-handed, as measured by the Annett Handedness Questionnaire (Annett, 1970). They reported no history of psychiatric disorders or learning disabilities (including dyslexia). Participants in the J1/E2 group were either full-time students or exchange students at universities in the UK, whereas those in the E1/J2 group were full-time students who were studying Japanese at universities in Oxford. Participants in the Control group were recruited from the University of Oxford and were of similar age and educational level to the L2 groups. The study was approved by the Oxfordshire Research Ethics Committee, and all participants provided written, informed consent to take part in the study.

#### **COGNITIVE MEASURES (OUTSIDE THE SCANNER)**

Single word reading competency in English was assessed by the Wide Range Achievement Test III (WRAT; Wilkinson, 1993). For Japanese Kana and Kanji, we used the Kana Word Reading Test (Koyama et al., 2008) and the Kanji Word Reading Test, respectively. For the Kanji Word Reading Test, we selected 60 Kanji words that children are expected to master by the age of 15 in Japan. Unlike the WRAT-III, neither the Kana Word Reading nor Kanji Word Reading tests were standardized, because there was no standardized word reading test in Japanese appropriate for the age range of our participants. Hence, we used the raw scores or percent accuracy for each cognitive test, to perform further statistical analysis. Non-verbal IQ was measured using the Raven's Advanced Progressive Matrices (Raven et al., 1998).

#### **TASKS PERFORMED IN THE SCANNER**

All participants performed a visual one-back matching task for four types of script: (1) syllabic Kana (Japanese), (2) logographic Kanji (Japanese), (3) alphabetic English, and (4) Tibetan letter-strings, which were visually unfamiliar and unpronounceable to all the participants. Tibetan was chosen as an ecologically valid orthography but with characters equally unfamiliar to all participant groups. Individual Tibetan characters that resembled English or Kana characters were excluded. **Figure 1** illustrates the task paradigm and the script conditions. Linguistically, syllabic Kana and alphabetic English are categorized as phonographic scripts where symbols are mapped onto sounds, whereas logographic Kanji symbols can be directly mapped onto meanings. Participants were instructed to press a button with their right index finger as quickly and accurately as possible if successively presented words were visually identical ("Press the button whenever you see two words in succession that are visually identical").

All the words in the Kana, English, or Tibetan conditions were four characters long, whereas Kanji words were two characters long. As the visual word length of the same word is typically longer when printed in Kana than in Kanji, an asterisk (∗) was placed at the beginning and the end of each Kanji word. This allowed us to equate the retinal image size of the stimuli between word categories. Regarding word frequency, words in all script conditions (with the exception of Tibetan) represented high frequency nouns. Japanese and English words were chosen based on the frequency norms by Amano and Kondo (1999) and those by Kucera and Francis (1967), respectively. Importantly, the Kana and Kanji words were matched in terms of their phonological and semantic features, so that only the visual representations differed from each other. For more details, see Koyama et al. (2011, 2013). The Tibetan letter-strings, which were unfamiliar and unpronounceable stimuli to our participants, were included as a control condition in order to verify that there were no systematic differences in basic visual processing abilities (e.g., recognition, working memory) between the three groups.

The paradigm was a block design with alternating 24 s task blocks and 15 s rest blocks. In the rest block, a small red fixation point was visible at the center of the visual display. In the task block, 24 words were presented at a rate of 1 per second, with an onscreen duration of 250 ms and a blank period of 750 ms between words. Within each task block, 3–5 of the 24 words were visually identical and required a button response. Participants underwent a total of 16 task blocks (4 blocks for each script condition) with 12 rest blocks. Prior to the scan session, participants performed a computerized practice run outside the scanner to ensure task familiarity. In order to prevent word-specific practice effects, the word stimuli used in the practice run were different from the words used in the in-scanner task.
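The one-back rule and the stimulus timing described above can be made concrete with a minimal sketch. The function name `one_back_targets` and the placeholder word list are hypothetical and are not taken from the study; only the presentation rate (one item per second) comes from the text.

```python
from typing import List, Sequence


def one_back_targets(stimuli: Sequence[str]) -> List[int]:
    """Return the indices of items identical to the immediately preceding item."""
    return [i for i in range(1, len(stimuli)) if stimuli[i] == stimuli[i - 1]]


# Hypothetical 8-item excerpt of a task block (placeholder strings, not the study's stimuli).
block = ["desk", "desk", "lamp", "book", "book", "tree", "road", "road"]
targets = one_back_targets(block)

# Onset of each item in seconds: one item per second (250 ms on screen, 750 ms blank).
onsets = [float(i) for i in range(len(block))]
target_onsets = [onsets[i] for i in targets]
```

A button press would be expected at each index in `targets`, that is, whenever an item repeats the one shown immediately before it.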

The choice of the visual one-back matching of letter-strings brings both advantages and disadvantages. On the one hand, it allows side-by-side comparison of the different types of orthographic script in all three participant groups studied. On the other hand, the nature of the task is such that it can be performed using purely visual matching without knowledge of the underlying orthography. However, where the orthography has been learned to the high degree needed for efficient reading, we would expect the automaticity of the reading process to be invoked and to reveal language-specific effects. For all the script conditions except for the Tibetan script, each L2 group was likely to yield implicit fMRI activation associated with phonological and/or semantic processes, even in the absence of overt word reading (for alphabetic English, Turkeltaub et al., 2003; for logographic Chinese, Kuo et al., 2004). In the current study, such implicit activation should reflect patterns of functional lateralization for word reading.

#### **MRI DATA ACQUISITION**

Functional and structural images were acquired with a Varian Siemens 3T scanner at the Center for the Functional Magnetic Resonance Imaging of the Brain in Oxford (FMRIB). Prior to data acquisition, an automated shimming algorithm was applied to reduce magnetic field inhomogeneities (Wilson et al., 2002). For whole brain functional imaging, a T2\*-weighted gradient-echo EPI sequence was employed with the following parameters: *TR* = 3000 ms, *TE* = 30 ms, flip angle = 90°, *FOV* = 192 mm<sup>2</sup>, voxel size = 3 × 3 × 3 mm, with 43 slices acquired in axial orientation. The visual one-back matching fMRI protocol consisted of 368 volumes. For structural images, a high-resolution T1-weighted scan was acquired (3D TurboFLASH sequence, *TR* = 13 ms, *TE* = 5 ms, *TI* = 200 ms, flip angle = 8°, *FOV* = 265 mm<sup>2</sup>, voxel size = 1 × 1 × 1 mm).

#### **MRI DATA ANALYSIS**

Data were analyzed using the FMRIB Software Library (FSL, www.fmrib.ox.ac.uk/fsl). The initial four dummy volumes were discarded from the functional MRI data to eliminate non-equilibrium effects of magnetization. The following pre-processing procedures were applied: a high-pass filter cut-off of 40 s, motion correction using MCFLIRT, regular-up slice-timing correction, and spatial smoothing using a Gaussian spatial filter with kernel size 5 mm full width half maximum. The registration of functional images for each participant into standard space was carried out using the FMRIB Non-Linear Image Registration Tool (FNIRT).

After the pre-processing, statistical analysis at the individual level was performed for all the conditions using a general linear model (GLM) with local autocorrelation correction (FILM prewhitening; Woolrich et al., 2001). At the single subject level, contrast images were generated for all participants for each word condition vs. rest (baseline). Rather than using the unpronounceable Tibetan condition as the baseline condition, the rest period was used instead, for the following reasons. Firstly, showing the degree (or lack thereof) of functional lateralization for the Tibetan condition, which can be processed only visually, allows us to highlight the pattern of functional lateralization associated with language processing involved in pronounceable script conditions (see Seghier and Price, 2012). Secondly, the selection of the baseline condition differentially affects observed brain activation (Newman et al., 2001), and thus the use of the Tibetan condition, which dominantly involves visual processing, as a baseline (i.e., subtracting low-level visual processing) can be problematic in delineating brain activation for our other script conditions, among which the level of visual processing demand is likely to differ.

To help correct for motion-related artifacts, the six motion correction parameters estimated with MCFLIRT were included in the model as regressors of no interest. Script conditions were modeled using a Gaussian hemodynamic response function. In addition, in order for the model to best fit the time course of the actual data acquisition, temporal derivatives of the main conditions were added as separate regressors and temporal filtering was applied. Group analysis was performed with random effects analysis using FLAME. Gaussian Random Field theory was used for thresholding (voxel-level *Z* > 2.3, cluster-level *p* < 0.05, corrected for multiple comparisons). This group analysis produced resultant *Z*-value activation maps for each script condition for each group.
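As a rough illustration of how a block regressor and its temporal derivative might be built, the sketch below convolves a 24 s on / 15 s off boxcar with a Gaussian kernel and samples it every TR. This is not the FSL implementation: the kernel width `sigma`, the peak `lag`, the fine time step `DT`, and the number of cycles are illustrative assumptions that do not come from the text, and motion regressors and temporal filtering are omitted.

```python
import numpy as np

TR = 3.0                      # repetition time in seconds (from the acquisition parameters)
TASK_S, REST_S = 24.0, 15.0   # block durations in seconds (from the task description)
DT = 0.1                      # fine time grid in seconds (assumption)


def boxcar(n_cycles: int) -> np.ndarray:
    """Alternating task (1) and rest (0) time course on a DT-second grid."""
    cycle = np.concatenate([
        np.ones(int(round(TASK_S / DT))),
        np.zeros(int(round(REST_S / DT))),
    ])
    return np.tile(cycle, n_cycles)


def gaussian_hrf(sigma: float = 2.8, lag: float = 5.0, length: float = 20.0) -> np.ndarray:
    """Unit-sum Gaussian kernel; sigma and lag are illustrative values (assumption)."""
    t = np.arange(0.0, length, DT)
    h = np.exp(-0.5 * ((t - lag) / sigma) ** 2)
    return h / h.sum()


def design(n_cycles: int = 4):
    """Return the convolved regressor and its temporal derivative, sampled once per TR."""
    box = boxcar(n_cycles)
    reg = np.convolve(box, gaussian_hrf())[: box.size]
    deriv = np.gradient(reg, DT)
    step = int(round(TR / DT))
    return reg[::step], deriv[::step]


regressor, derivative = design(n_cycles=4)
```

Including the derivative as a separate column lets the fit absorb small shifts in response latency, which is the role the text assigns to the temporal derivatives.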

#### *Regions Of Interest (ROIs)*

To investigate functional lateralization in regions involved in word reading, we created regions of interest (ROIs) based on the functional activation patterns during the visual one-back matching task. To create the ROIs, we first combined the three group-level activation maps thresholded at *p* < 0.05 (corrected) for the three pronounceable script conditions (Kana, Kanji, and English), each obtained from the respective L1 group (i.e., the Kana and Kanji conditions from the J1/E2 group; the English condition from the E1/J2 group). The resultant combination map for group-level activation was then binarized to obtain clusters of overlap that contained voxels activated by the three script conditions (**Figure 2**). Consequently, we identified 11 clusters (6 in the left hemisphere and 5 in the right hemisphere) as being activated by all 3 conditions: the bilateral occipital pole (OP), bilateral posterior lateral occipital complex (pLOC), bilateral intraparietal sulcus (IPS), bilateral precentral gyrus/inferior frontal gyrus (PCG/IFG), bilateral insula, and left temporo-parietal junction (TPJ).
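The overlap step can be sketched as a voxel-wise logical AND across thresholded maps. The sketch assumes each input map has already been thresholded so that non-significant voxels are zero; the function name `overlap_mask` and the toy 2 × 2 arrays are hypothetical stand-ins for the group-level *Z*-maps.

```python
import numpy as np


def overlap_mask(thresholded_maps) -> np.ndarray:
    """Binary mask of voxels that are non-zero (active) in every supplied map."""
    stacked = np.stack([np.asarray(m) > 0 for m in thresholded_maps])
    return stacked.all(axis=0)


# Toy thresholded maps (zeros mark voxels that did not survive thresholding).
kana = np.array([[3.0, 0.0], [2.5, 4.0]])
kanji = np.array([[2.4, 3.0], [2.6, 0.0]])
english = np.array([[5.0, 5.0], [0.0, 5.0]])

mask = overlap_mask([kana, kanji, english])
```

Only the top-left voxel is active in all three toy maps, so it is the only voxel retained in `mask`.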

Based on the well-demonstrated word-specific activation in the left fusiform gyrus (FFG), which is known as the Visual Word Form Area (Cohen et al., 2002; Dehaene et al., 2002), we also examined two FFG sub-clusters which were located within the pLOC clusters. For each cluster (and sub-cluster), we created a 6 mm-radius spherical seed centered on the peak MNI coordinates (note: in order to detect the peak MNI coordinates, the combination map for the group-level activation was masked by the binary map and then averaged across the time-series). In addition, we created a sphere centered on the right hemisphere homolog of the left temporo-parietal junction (note: the right temporo-parietal junction was not commonly activated by the three script conditions). These procedures resulted in the creation of 14 seeds for 7 regions of interest (**Table 1** and **Figure 2**).

To confirm that two seeds, which appeared to be homologs in the left and right hemispheres, were actually located in homologous regions, we first visually inspected the clusters superimposed on the Harvard-Oxford Cortical Structural Atlas. We then calculated the Euclidean distance between pairs of homologous seeds from the peak MNI coordinates, having mirrored the left hemisphere coordinates into the right hemisphere. For example, with the pLOC seeds, we computed the separation between MNI coordinates (Left pLOC: −46, −70, −8) and (Right pLOC: 40, −78, −10), where the sign of the left-hemisphere x-coordinate was flipped (to yield 46, −70, −8 for the left pLOC). No pair of homologous seeds had a peak separation greater than 13 mm (2 mm for OP, 10.2 mm for pLOC, 7.2 mm for FFG, 0 mm for TPJ, 12.9 mm for IPS, 4.5 mm for PCG/IFG, and 4.7 mm for the insula). These results verify that all seed ROIs, except for the temporo-parietal junction, exhibited significant bilateral activation for all the word conditions.
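The pLOC example can be checked with a few lines of arithmetic. The helper name `mirrored_distance` is hypothetical; the coordinates are the ones quoted in the text.

```python
import math


def mirrored_distance(left_mni, right_mni) -> float:
    """Euclidean distance (mm) after flipping the sign of the left seed's x-coordinate."""
    lx, ly, lz = left_mni
    mirrored = (-lx, ly, lz)
    return math.dist(mirrored, right_mni)


# pLOC peak coordinates quoted in the text.
d_ploc = mirrored_distance((-46, -70, -8), (40, -78, -10))
print(round(d_ploc, 1))  # 10.2
```

The differences after mirroring are 6, 8, and 2 mm along x, y, and z, so the separation is the square root of 104, which rounds to the 10.2 mm reported for pLOC.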

To obtain the BOLD signal changes in each seed, we extracted the contrast of parameter estimates (COPE) values using FSL Featquery for each script condition for each individual participant. The maximum value within the respective seed was taken, rather than the mean value, to avoid the misinterpretation of resultant laterality index values that can be caused by negative BOLD signal changes (Jansen et al., 2006; Seghier, 2008 for review). Although it is relatively common to use mean values for ROI analyses, recent studies have indicated that the maximum and 90th percentile measures of percent BOLD signal change can be considered to be more representative measures of a typically active voxel within the ROI (Arthurs and Boniface, 2003; Buck et al., 2008).

**Table 1 | MNI coordinates of seeds.**

*ROIs, Regions of Interest; OP, Occipital Pole; pLOC, posterior Lateral Occipital Complex; FFG, Fusiform Gyrus; TPJ, Temporo-Parietal Junction; IPS, IntraParietal Sulcus; PCG/IFG, PreCentral Gyrus/Inferior Frontal Gyrus. \*Derived from a subcluster within the pLOC cluster.*

#### **LATERALITY INDEX (LI)**

For each ROI, a laterality index (LI) was calculated based on the magnitude of BOLD signal changes for each script condition for each individual. In general, the degree of activation is considered to be a more robust LI measure than the number of activated voxels (Jansen et al., 2006). We used the formula: LI = (RH − LH)/(RH + LH), where LH and RH represent the percent signal change for the left and right hemispheres, respectively. A negative LI value indicates a tendency to leftward lateralization, whereas a positive LI value indicates rightward lateralization. LI values range from −1 (only active in the LH) to +1 (only active in the RH); participants with LI *<* −0.2 were considered to have left hemisphere dominance, and those with LI *>* 0.2 were categorized as having right hemisphere dominance (see Seghier, 2008 for review). Differences between the L1 and L2 script conditions within each group, as well as those between groups for each condition, were tested using paired and independent-samples *t*-tests, respectively.
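The formula and the ±0.2 criterion can be sketched as follows. The input values are hypothetical percent signal changes, and the label "bilateral" for the intermediate range is an assumption made for illustration; the text names only the left and right dominance categories.

```python
def laterality_index(lh, rh):
    """LI = (RH - LH) / (RH + LH); negative values indicate leftward lateralization."""
    total = rh + lh
    if total == 0:
        raise ValueError("LI is undefined when RH + LH is zero")
    return (rh - lh) / total


def classify(li, threshold=0.2):
    """Apply the +/-0.2 dominance criterion described in the text."""
    if li < -threshold:
        return "left"
    if li > threshold:
        return "right"
    return "bilateral"  # assumed label for the range the text leaves unnamed


# Hypothetical percent signal changes for one participant and one condition.
li = laterality_index(lh=1.2, rh=0.6)
print(round(li, 3), classify(li))  # prints "-0.333 left"

# A negative signal change pushes LI outside the nominal [-1, +1] range.
print(laterality_index(lh=-0.5, rh=1.0))  # prints 3.0
```

The second call shows why negative BOLD signal changes are a problem for this index: the result no longer lies between −1 and +1, which is consistent with the stated rationale for taking the maximum rather than the mean value within each seed.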

#### **RESULTS**

#### **COGNITIVE PERFORMANCE OUTSIDE THE SCANNER**

**Table 2** gives a summary of the demographic and cognitive measures (performed outside the scanner) for each group. Although all participants achieved high performance on English word reading (WRAT), the three groups differed in their WRAT accuracy scores (*F* = 8*.*0, *p <* 0*.*01); the mean accuracy was significantly lower in the English L2 group (J1/E2) than in the two native English groups (for the E1/J2 group *t* = 3*.*2, *p <* 0*.*01; for the control group *t* = 3*.*2, *p <* 0*.*01). For the Kana Word Reading Test, all participants in the Japanese L1 group (J1/E2) achieved 100% accuracy, and there was a ceiling effect (91% accuracy) in the L2 group of Japanese (E1/J2). The ceiling effect observed in the E1/J2 group is in line with a previous finding that even pre-school Japanese children are able to achieve extremely high accuracy in regular Kana word reading (Shimamura and Mikami, 1994). However, the mean reaction time for Kana word reading differed significantly between the two groups: the J1/E2 group read more quickly than the E1/J2 group (*t* = 5*.*3, *p <* 0*.*001). Hence, for further analyses, we used only the reaction time for Kana reading. For the Kanji Word Reading Test, mean accuracy differed significantly between the groups, with the J1/E2 group scoring significantly higher than the E1/J2 group (*t* = 18*.*0, *p <* 0*.*001). These results demonstrate that L2 reading competency had not reached native levels in either L2 group.

#### **IN-SCANNER TASK PERFORMANCE ACCURACY**

**Table 3** summarizes the mean accuracy scores for each script condition in each group. Among the scripts, accuracy was higher for the L1 condition(s) than the L2 condition(s) in each L2 group: the J1/E2 group exhibited significantly higher performance for the two L1 conditions relative to the L2 English condition (relative to Kana, *t* = 5*.*6, *p <* 0*.*01; relative to Kanji, *t* = 6*.*0, *p <* 0*.*01). Similarly, the E1/J2 group exhibited significantly higher performance for the L1 English condition than the two L2 conditions (relative to Kana, *t* = 4*.*1, *p <* 0*.*01; relative to Kanji, *t* = 5*.*4, *p <* 0*.*01). As expected, the control group exhibited significantly higher accuracy for the L1 English condition relative to the other conditions (Kana, Kanji, Tibetan) that were unfamiliar and unpronounceable to this group.

*J1/E2, Japanese L1/English L2 group; E1/J2, English L1/Japanese L2 group; Diff., Difference; SD, Standard Deviation; M, Male; F, Female; Raven, the Raven's Advanced Progressive Matrices (max.* = *36); WRAT, the Wide Range Achievement Test (max.* = *44); AC, Accuracy (number correct out of the respective maximum scores); RT, Response Time; Kana, Kana Word Reading Test (max.* = *20); Kanji, Kanji Word Reading Test (max.* = *60).*

*J1/E2, Japanese L1/English L2 group; E1/J2, English L1/Japanese L2 group; Diff., Difference; SD, Standard Deviation; NS, not significant.*

Among the groups, the mean accuracy differed in the Kana, Kanji and English conditions. For the Kana condition, the J1/E2 group exhibited significantly higher accuracy than the control group (*t* = 3*.*0, *p <* 0*.*01), but not the E1/J2 group. For the Kanji condition, the J1/E2 group exhibited significantly higher accuracy than both the E1/J2 (*t* = 3*.*3, *p <* 0*.*01) and control groups (*t* = 3*.*3, *p <* 0*.*01). For the English condition, the mean accuracy was significantly lower in the J1/E2 group than in the two groups of L1 English readers (the E1/J2, *t* = 4*.*6, *p <* 0*.*01; the control group, *t* = 10*.*1, *p <* 0*.*001). Notably, for the Tibetan condition, the three groups did not differ in mean accuracy scores (*F* = 0.18, *p* = 0*.*83). This indicates that the three groups were well-matched for basic visual processing abilities.

#### **LATERALITY INDEX FOR EACH ROI**

Before addressing our two primary questions, we report the patterns of functional lateralization for each of the L1 and L2 conditions in each group. The mean laterality index values are illustrated in **Figure 3** for the primary 6 ROIs and in **Supplementary Figure 1A** for the FFG. Laterality index values smaller than −0.2 and values larger than 0.2 are considered to be significantly left-lateralized and right-lateralized, respectively (Seghier, 2008). In all the groups, significant leftward lateralization was observed for their respective L1s in pLOC, TPJ, IPS, PCG/IFG, and FFG (but not in either OP or insula), replicating previous findings showing strong leftward lateralization for L1 word reading within the known reading network (e.g., Xue et al., 2004; Seghier and Price, 2011).

For the respective L2s, in the J1/E2 group, significant leftward lateralization for L2 English was observed in the majority of the ROIs, except for OP and insula. Similarly, in the E1/J2 group, significant leftward lateralization for L2 Kana was observed in the majority of the ROIs, except for the OP and insula, whereas L2 Kanji exhibited significant leftward lateralization only in the TPJ. Neither L2 group showed leftward lateralization in the FFG (the sub-ROI within the pLOC) for the L2 conditions.

For the Tibetan condition, during which activation was expected to be associated only with visual processing (e.g., recognition, working memory), no group exhibited significant leftward lateralization in any ROI. Of note, this unpronounceable letter-string condition exhibited weak (though not statistically significant) leftward lateralization in the IPS and pLOC, the latter of which is consistent with a previous study (Seghier and Price, 2011). In the control group, as expected, significant leftward lateralization was observed only for the English condition. These results verify that activation during this fMRI task can serve to assess the functional lateralization of word reading.

#### *Within-group comparisons*

Below, we describe the patterns of functional lateralization for each script condition in each group. We performed pair-wise *t*-tests to compare the L2 script conditions to the L1 script conditions for each ROI in each L2 group (J1/E2 and E1/J2).
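A paired comparison of this kind can be sketched with the standard library alone. The laterality index values below are hypothetical and are not data from this study; the function name `paired_t` is likewise illustrative, and the sketch computes only the *t* statistic and its degrees of freedom, not the *p*-value.

```python
import math
import statistics


def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for two
    equal-length lists of per-participant values."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd_d = statistics.stdev(diffs)
    if sd_d == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = statistics.mean(diffs) / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical LI values for five participants in one ROI.
li_l1 = [-0.40, -0.35, -0.50, -0.30, -0.45]
li_l2 = [-0.20, -0.25, -0.30, -0.10, -0.35]

t, df = paired_t(li_l1, li_l2)
print(f"t({df}) = {t:.2f}")
```

With LI defined so that negative values are leftward, a negative *t* here means the first condition is more left lateralized than the second in these made-up data.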

*Occipital pole (OP).* This region exhibited no significant lateralization in any script condition in any group. No significant differences were observed between L1 and L2 conditions in either the J1/E2 group or the E1/J2 group.

*Posterior lateral occipital complex (pLOC).* In the J1/E2 group, no difference in functional lateralization was observed between the L1 Kana and L2 English conditions (*t* = 0*.*96, *p* = 0*.*35), but the pLOC activation was significantly more left lateralized for the L2 English condition relative to the L1 Kanji condition (*t* = 3*.*07, *p <* 0*.*01). In the E1/J2 group, pLOC was significantly more left lateralized for the L1 English condition relative to the L2 Kanji condition (*t* = 3*.*07, *p <* 0*.*01), but not for L1 English relative to the L2 Kana condition (*t* = 1*.*25, *p* = 0*.*23). When looking at differences between the two Japanese scripts (Kana vs. Kanji), in the J1/E2 group the leftward lateralization of pLOC was significantly stronger for the L1 Kana condition than the L1 Kanji condition (*t* = 2*.*81, *p <* 0*.*05). Though not statistically significant (*t* = 1*.*23, *p* = 0*.*24), the E1/J2 group also exhibited a tendency toward stronger leftward lateralization for the L2 Kana condition than the L2 Kanji condition in this region. That is, irrespective of whether it was participants' L1 or L2, the logographic Kanji condition tended to show weaker leftward lateralization relative to the syllabic Kana and alphabetic English conditions.

*Fusiform gyrus (FFG).* In the J1/E2 group, no difference in functional lateralization was observed either between the L1 Kana and L2 English conditions or between the L1 Kanji and L2 English conditions. In the E1/J2 group, the FFG activation was significantly more left lateralized for the L1 English condition relative to the L2 Kanji condition (*t* = 2*.*3, *p <* 0*.*05).

*Temporo-parietal junction (TPJ).* This region's lateralization patterns were similar to those observed in pLOC. In the J1/E2 group, the TPJ was significantly more left lateralized for the L2 English condition relative to the L1 Kanji condition (*t* = 3*.*05, *p <* 0*.*01), but no difference was observed between the L1 Kana and L2 English conditions (*t* = 0*.*19, *p* = 0*.*85). In the E1/J2 group, there was no significant difference between the L1 English and L2 Kanji conditions (*t* = 0*.*83, *p* = 0*.*41). No significant difference was observed between the L1 English and L2 Kana conditions (*t* = 0*.*10, *p* = 0*.*92). When comparing the two Japanese scripts, in the J1/E2 group leftward lateralization of the TPJ was significantly stronger for the L1 Kana condition than the L1 Kanji condition (*t* = 3*.*39, *p <* 0*.*01). Though not statistically significant (*t* = 0*.*51, *p* = 0*.*62), the E1/J2 group exhibited a tendency toward stronger leftward lateralization for the L2 Kana condition than the L2 Kanji condition. Similar to the pLOC, irrespective of whether it was L1 or L2, the logographic Kanji condition tended to show weaker leftward lateralization in the TPJ relative to the syllabic Kana and alphabetic English conditions.

*Intraparietal sulcus (IPS).* In the J1/E2 group, no significant difference in laterality index was observed between either L1 script (Kana or Kanji) and the L2 English script. However, in the E1/J2 group, the IPS was significantly more left lateralized for the L1 English condition compared to both the L2 Kanji script (*t* = 3*.*08, *p <* 0*.*01) and the L2 Kana script (*t* = 2*.*29, *p <* 0*.*05). No group showed significant differences between the Kana and Kanji conditions in the IPS.

*Precentral/Inferior frontal gyrus (PCG/IFG).* In the J1/E2 group, there were no significant differences between either of the L1 conditions and the L2 English condition (Kana vs. English: *t* = 0*.*41, *p* = 0*.*69; Kanji vs. English: *t* = 0*.*19, *p* = 0*.*85). Similarly, no L1 vs. L2 differences were observed in the E1/J2 group (English vs. Kana: *t* = 0*.*04, *p* = 0*.*97; English vs. Kanji: *t* = 0*.*97, *p* = 0*.*35). No group showed a significant difference between the Kana and Kanji conditions.

*Insula.* Similar to the OP, this region exhibited no significant lateralization in any script condition in any group. When L1 and L2 conditions were compared, in the J1/E2 group the L2 English condition was significantly more left lateralized relative to the L1 Kanji condition (*t* = 2*.*28, *p <* 0*.*05), but not relative to the L1 Kana condition (*t* = 0*.*99, *p* = 0*.*34). The E1/J2 group showed no significant differences between the L1 English condition and the L2 Japanese conditions in the insula.

#### *Confirmatory analysis between groups*

First, there was no group difference for the Tibetan condition in any ROI (**Figure 4** and **Supplementary Figure 1B**), confirming that all the groups were well-matched on basic visual processing abilities. This result excludes the possibility that group differences in basic visual processing skills might have confounded patterns of functional lateralization for the pronounceable word conditions during the visual one-back matching task. Second, for the English condition—which all groups were able to read—there were no group differences in the degree of leftward lateralization in any ROI. For each of the Japanese script conditions, no significant difference between L1 and L2 groups was observed in any ROI except for the insula. These results from the group comparisons suggest that the degree of leftward lateralization for word reading was not always stronger for L1 than L2 readers, even when reading proficiency was higher in L1 readers than in L2 readers.

#### **CONFIRMATORY ANALYSIS IN pLOC**

Seghier and Price (2011) emphasize that a pattern of functional lateralization in the ventral part of the occipito-temporal cortex needs to be considered in light of the relative contribution of right hemisphere activation. Hence, we plotted BOLD signal changes in the left and right hemisphere of pLOC for each script condition relative to the resting condition in the J1/E2 group (**Supplementary Figure 2**). For all the conditions except for the Tibetan condition, the magnitude of BOLD signal changes was significantly higher in the left hemisphere than the right hemisphere, leading to significant leftward lateralization for the pronounceable script conditions. However, the Kanji condition yielded greater activation in the right pLOC relative to the Kana and English conditions (*F* = 8.8, *p <* 0*.*01), but this pattern was absent in the left pLOC. This relatively stronger right pLOC activation contributed to the observed weaker leftward lateralization in pLOC for the logographic Kanji condition. Our result is consistent with a previous observation that right hemisphere activation is disengaged from the left hemisphere activation in this region specifically for processing words (but not for unfamiliar letter-strings) (Seghier and Price, 2011).

#### *Behavioral relevance*

We examined the extent to which L2 word reading competency correlates with the degree of functional lateralization for visual word processing. For L2 English reading performance in the J1/E2 group, the WRAT accuracy correlated inversely with the laterality index values in the TPJ during both L2 English (*R*<sup>2</sup> = 0*.*47, *p <* 0*.*01) and L1 Kana (*R*<sup>2</sup> = 0*.*37, *p <* 0*.*05) conditions (**Figure 5**). That is, L2 readers of English who had better reading performance tended to exhibit stronger leftward functional lateralization in the TPJ not only when processing English words but also when processing Kana words. This correlation was not seen in the L1 English group. No ROIs other than the TPJ showed significant relationships with the WRAT accuracy scores. For L2 Kanji reading performance, accuracy on the Kanji Word Reading Test correlated positively with the laterality index values in pLOC during both L2 Kanji (*R*<sup>2</sup> = 0*.*40, *p <* 0*.*05) and Tibetan (*R*<sup>2</sup> = 0*.*45, *p <* 0*.*01) conditions (**Figure 6**). That is, L2 readers of Japanese who had better reading performance in L2 Kanji tended to exhibit stronger rightward functional lateralization in pLOC not only while processing Kanji but also while processing Tibetan. Like the results for the TPJ, the laterality index in the pLOC showed no significant relationship with Kanji reading scores in the Japanese L1 group. No ROIs other than the pLOC showed significant relationships with the Kanji reading accuracy scores. Neither accuracy nor reaction time for the Kana Word Reading Test was significantly associated with laterality index values in any ROI.
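For readers who prefer correlation coefficients, the reported *R*<sup>2</sup> values can be converted to signed *r* values. The sketch below assumes that each *R*<sup>2</sup> comes from a simple linear relationship with a single predictor, so that *R*<sup>2</sup> = *r*<sup>2</sup>; this is an assumption, not something the text states. The sign is taken from the direction reported in the text (inverse for the TPJ, positive for pLOC), and the function name `signed_r` is hypothetical.

```python
import math


def signed_r(r_squared, direction):
    """Convert R^2 to r, assuming a single-predictor linear fit (R^2 = r^2).
    direction is -1 for an inverse relationship and +1 for a positive one."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    return direction * math.sqrt(r_squared)


# R^2 values and directions as reported in the text.
reported = {
    "TPJ LI (L2 English) vs. WRAT accuracy": (0.47, -1),
    "TPJ LI (L1 Kana) vs. WRAT accuracy": (0.37, -1),
    "pLOC LI (L2 Kanji) vs. Kanji accuracy": (0.40, +1),
    "pLOC LI (Tibetan) vs. Kanji accuracy": (0.45, +1),
}

for label, (r2, direction) in reported.items():
    print(f"{label}: r = {signed_r(r2, direction):+.2f}")
```

Because negative LI values denote leftward lateralization, a negative *r* for the TPJ corresponds to better readers being more left lateralized, and a positive *r* for pLOC corresponds to better Kanji readers being more right lateralized.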

# **DISCUSSION**

In this study, we investigated the functional lateralization for visual word processing in two types of L2 groups, primarily focusing on the effects of visual complexity and orthographic depth on the laterality index for L2 reading. Consistent with our prediction, irrespective of whether it was participants' L1 or L2, visually complex logographic Kanji was associated with weaker leftward lateralization in the pLOC, relative to both alphabetic English and syllabic Kana conditions. This finding indicates that weaker leftward lateralization in pLOC results from the greater visuo-spatial processing demands of visually complex logographic symbols that require more right hemisphere processing (e.g., Lycke et al., 2008; Gotts et al., 2013), rather than specifically reflecting the L2 readers' extra efforts required to learn the new script. However, contrary to our prediction, there was no effect of orthographic depth on functional lateralization for L2 reading in late L2 readers. That is, no stronger leftward lateralization was observed for L2 English than for L1 Kana in the J1/E2 group. With regard to brain-behavior relationships, better L2 reading performance in English was associated with stronger leftward lateralization in the TPJ, a core region for phonological processing (e.g., Jobard et al., 2003), during both L2 English and L1 Kana conditions. For logographic L2 reading, better performance in Kanji was associated with weaker leftward lateralization in the pLOC during both the L2 Kanji and the control Tibetan conditions.

#### **WEAKER LEFTWARD LATERALIZATION IN pLOC FOR LOGOGRAPHIC KANJI**

In the group for whom Japanese was the L2 (E1/J2), the degree of leftward lateralization in the pLOC was weaker for L2 Kanji relative to L1 English. However, this weaker leftward lateralization for Kanji in the pLOC does not specifically indicate L2 readers' extra efforts, because the native Japanese readers (J1/E2) also exhibited weaker leftward lateralization for L1 Kanji relative not only to L1 Kana but also to L2 English. This region is responsive specifically to higher-level letter shape information (Kourtzi and Kanwisher, 2001) and to visual complexity (Xu and Chun, 2006), rather than to lower-level features (e.g., edges) during visual object recognition. Hence, processing Kanji may require more global/holistic visuo-spatial processing due to its greater level of visual complexity, for which the right hemisphere is more specialized. Consequently, the right pLOC is additionally activated during Kanji reading, which contributes to the reduction of leftward lateralization in the pLOC for Kanji, irrespective of whether the script was the L1 or L2. Support for this suggestion comes from a recent fMRI study, which showed that logographic training in artificial symbols (mapping of logographic-like symbols to corresponding sounds) reduced the degree of leftward lateralization in the pLOC (posterior fusiform gyrus in their terminology) (Mei et al., 2013).

Although weaker leftward lateralization for logographic Kanji relative to other pronounceable phonographic scripts was observed in both L1 and L2 readers of Japanese, its relevance to reading competency differed between the two groups. L2 readers of Japanese (E1/J2) who exhibited weaker leftward lateralization in pLOC while processing Kanji or Tibetan tended to have better reading performance in Kanji (i.e., the E1/J2 readers with more rightward lateralization had the highest Kanji reading scores). However, this relationship was not seen in the L1 group of Japanese (J1/E2). Recently, Gotts et al. (2013) have demonstrated a dissociation in the specialized functions of the left and right hemispheres, highlighting that rightward lateralization predicts better visuo-spatial attentional performance. As the Kanji and Tibetan conditions were more visually demanding (the Kanji symbols are visually complex and the Tibetan letter strings could only be processed visually), the brain-behavior relationships in the E1/J2 group indicate that strong rightward lateralization, rather than strong leftward lateralization, may be beneficial for visual processing of logographic words in late L2 readers of Japanese. In other words, reliance on right hemisphere functions may allow for the successful recognition of the visually demanding logographic symbols in less proficient L2 readers of logographic scripts.

Consistent with previous studies (Nakamura et al., 2005b; Seghier and Price, 2011), leftward lateralization was found for both pronounceable and unpronounceable scripts. As suggested by Seghier and Price (2011), acquiring reading expertise can reduce the involvement of the right hemisphere, which can result in an increased degree of leftward lateralization in pLOC for learnt symbols/letters. This is supported by our finding showing stronger leftward lateralization for pronounceable conditions relative to unpronounceable conditions (e.g., the Tibetan for the J1/E2 and E1/J2 groups). That said, the effect of reading expertise on leftward lateralization in the pLOC may be limited for visually demanding logographic symbols, even in L1 logographic readers, as was clearly shown in the weaker leftward lateralization for Kanji than Kana in the Japanese L1 group.

#### **NO STRONGER LEFTWARD LATERALIZATION FOR L2 ALPHABETIC ENGLISH**

Contrary to our prediction, there was no stronger leftward lateralization observed for L2 English reading in the J1/E2 group, indicating a limited effect of orthographic depth on functional lateralization for L2 reading. However, considering our previous findings showing greater activation in a phonological region in the left hemisphere for L2 English reading relative to L1 Kana reading during a phonological task (Koyama et al., 2013), it remains possible that the visual task used may not have been sensitive enough to assess L2 readers' extra efforts to cope with the greater phonological demands of L2 English reading. Mei et al. (2013) demonstrated that phonological training (phonetic mapping) using artificial symbols increased leftward lateralization in the posterior fusiform gyrus (adjacent to pLOC in the current study). This effect of phonetic learning/experience was clearly reflected in the pattern of functional lateralization in the control group: there was stronger leftward lateralization in the majority of ROIs for the English condition relative to the other unpronounceable/unfamiliar conditions. Therefore, it is possible that increased leftward lateralization associated with increased phonological demands for L2 reading might be observed during a phonological task.

Although no difference was observed between the L1 and L2 phonographic scripts, functional lateralization in the TPJ was relevant to reading competency only in the L2 readers of English (J1/E2 group). More specifically, higher L2 reading scores in English were associated with stronger leftward lateralization in the TPJ, which has been shown to be crucial for phonological decoding (e.g., Welcome and Joanisse, 2012), during not only the L2 English condition but also the L1 Kana condition. This result indicates that functional lateralization for written L1 words can predict word reading competency in L2 when both L1 and L2 are phonographic scripts. This is consistent with previous behavioral findings that phonological and reading skills in an alphabetic L1 can predict later L2 reading in another alphabetic script (Sparks et al., 2006, 2012). Of note, this cross-script transfer of reading competency has been observed between L1 logographic and L2 alphabetic reading at the behavioral level (Chuang, 2011), but was absent in the current fMRI study.

In conclusion, the current study provides evidence that weaker leftward lateralization is associated with greater involvement of right hemisphere visuo-spatial processing, rather than specifically reflecting L2 readers' additional efforts. Visually complex logographic symbols rely more on the functions of the right hemisphere, particularly those of the right posterior lateral occipital complex (pLOC), relative to phonographic symbols (alphabetic and syllabic), even after extensive reading experience (evident in L1 readers). For late L2 readers of logographic scripts (Japanese Kanji and probably Chinese), strong rightward lateralization (rather than strong leftward lateralization) in pLOC may be beneficial for L2 word reading, at least for visual word recognition. In contrast, when L1 and L2 are both phonographic scripts where symbols are mapped onto sounds, stronger leftward lateralization in the temporo-parietal junction (a region crucial for phonological processing) during L1 word processing can predict better L2 reading competency. Further research is necessary to investigate functional lateralization in a larger sample during reading tasks that selectively tap either phonological or semantic components, as well as its relevance to L2 reading performance.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00249/abstract

**Supplementary Figure 1 | (A)** Functional lateralization in the fusiform gyrus (FFG) for each script condition, and **(B)** group comparisons of functional lateralization for the Tibetan condition in the FFG. Vertical error bars on data points represent the standard error of the mean. LH, Left Hemisphere; RH, Right Hemisphere; LI, Laterality Index; J1/E2, Japanese L1/English L2 group; E1/J2, English L1/Japanese L2 group; Control, Monolingual English L1 control group. ∗∗*p <* 0*.*01, ∗*p <* 0*.*05.

**Supplementary Figure 2 | The BOLD signal changes in the left and right pLOC for each script condition relative to the resting condition in the J1/E2 group.** LH, Left Hemisphere; RH, Right Hemisphere. ∗*p <* 0*.*01.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 November 2013; accepted: 04 April 2014; published online: 24 April 2014. Citation: Koyama MS, Stein JF, Stoodley CJ and Hansen PC (2014) A cross-linguistic evaluation of script-specific effects on fMRI lateralization in late second language readers. Front. Hum. Neurosci. 8:249. doi: 10.3389/fnhum.2014.00249*

*This article was submitted to the journal Frontiers in Human Neuroscience. Copyright © 2014 Koyama, Stein, Stoodley and Hansen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Brain activation during phonological and semantic processing of Chinese characters in deaf signers

#### *Yanyan Li1, Danling Peng1, Li Liu1, James R. Booth2 \* and Guosheng Ding1,3\**

*<sup>1</sup> State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China*

*<sup>2</sup> Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA*

*<sup>3</sup> Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing, China*

#### *Edited by:*

*Gui Xue, Beijing Normal University, China*

#### *Reviewed by:*

*David Corina, UC Davis, USA; Jie Zhuang, Duke University, USA*

#### *\*Correspondence:*

*James R. Booth, Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Frances Searle Building, 2240 Campus Drive, Room 2-352, Evanston, IL 60208-2952, USA e-mail: j-booth@northwestern.edu; Guosheng Ding, State Key Laboratory of Cognitive Neuroscience and Learning, Brain Imaging Centre, Beijing Normal University, Rm. 206, 19 Xinjiekou Wai Street, Haidian, Beijing 100875, China e-mail: dinggsh@bnu.edu.cn*

Previous studies found altered brain function in deaf individuals reading alphabetic orthographies. However, it is not known whether similar alterations of brain function are characteristic of non-alphabetic writing systems and whether alterations are specific to certain kinds of lexical tasks. Here we examined differences in brain activation between Chinese congenitally deaf individuals (CD) and hearing controls (HC) during character reading tasks requiring phonological and semantic judgments. For both tasks, we found that CD showed less activation than HC in left inferior frontal gyrus, but greater activation in several right hemisphere regions including inferior frontal gyrus, angular gyrus, and inferior temporal gyrus. Although many group differences were similar across tasks, greater activation in right middle frontal gyrus was more pronounced for the rhyming task than for the meaning task. Finally, within the deaf group, better performance on the rhyming task was associated with less activation in right inferior parietal lobule and angular gyrus. Our results in Chinese CD are broadly consistent with previous studies in alphabetic languages suggesting greater engagement of inferior frontal gyrus and inferior parietal cortex for reading that is largely independent of task, with the exception of right middle frontal gyrus for phonological processing. The brain-behavior correlations potentially indicate that CD who use the right hemisphere more efficiently are better readers.

#### **Keywords: congenitally deaf, reading, rhyming, meaning, fMRI**

### **INTRODUCTION**

It is generally believed that there is an intimate connection between language acquisition and subsequent reading development (Perfetti, 1987). Increasing evidence indicates that spoken language skills, especially the child's sensitivity to phonological structures, are fundamental and essential for early and long-term reading acquisition (Dickinson et al., 2006). Correspondingly, one prominent theory argues that reading acquisition builds on the mapping from orthography to phonology, and that a word's meaning will become accessible via the existing phonology-to-semantics link in the speech system (Chall, 1967; Perfetti, 1987). Thus, children's phonological sensitivity plays a pivotal role in reading development (Temple et al., 2003; Vellutino et al., 2004).

Congenitally and profoundly deaf children cannot access speech before learning to read. This makes the process of learning to read in deaf individuals distinct from that in hearing individuals (Perfetti and Sandak, 2000; Geers, 2003). Hence, one may wonder whether deaf individuals can be aware of phonology and how their phonological representations affect their reading development. On the question of whether deaf people can be aware of phonology, previous studies are inconsistent. About half of the studies have found evidence for phonological coding and awareness in deaf individuals, whereas about half have not (Mayberry et al., 2011). Some deaf individuals are aware of phonology, suggesting that they can obtain phonological knowledge from visual and/or articulatory modalities (Dodd and Hermelin, 1977; Hanson and Fowler, 1987). However, they still perform more poorly than hearing individuals on a variety of phonological tasks (Hanson and Fowler, 1987; Campbell and Wright, 1988; Sterne and Goswami, 2000; Aparicio et al., 2007; MacSweeney et al., 2008).

Deaf individuals also seem to have difficulty in the semantic knowledge of written words (Ormel et al., 2010). Because deaf readers may not automatically access phonology, semantic knowledge may provide an important source of reading support (Kyle and Harris, 2006). However, most research has suggested that deaf individuals have semantic processing deficits in alphabetic writing systems (Green and Shepherd, 1975; Marschark et al., 2004; Ormel et al., 2010). For example, both hearing and deaf groups showed significant unmasked priming RT effects and N400 effects, whereas only hearing individuals showed a behavioral effect during masked priming (MacSweeney et al., 2004). Thus, when more automatic word processing is required, the impact of language experience or reading level becomes evident. On the other hand, a similar right visual field advantage was found during a semantic task in deaf and hearing individuals (D'Hondt and Leybaert, 2003), and therefore it is possible that semantics is less likely to be affected in deaf individuals.

Functional magnetic resonance imaging (fMRI) studies in alphabetic languages have also found that deaf individuals show an altered reading network (Neville et al., 1998; Aparicio et al., 2007). Specifically, Neville et al. (1998) tested congenitally deaf individuals during silent reading of written sentences, and found that deaf individuals show robust activation in bilateral prefrontal areas and inferior parietal lobule. Aparicio et al. (2007) investigated pre-lingually deaf individuals with lexical and rhyming decision tasks on written words, and found greater activations in the opercular part of the left inferior frontal gyrus, left inferior parietal lobule and right inferior frontal gyrus. These authors suggested that deaf individuals might preferentially rely on rule-based letter-sound mappings to overcome their poorly specified phonological representations of words.

To the best of our knowledge, there is no fMRI study exploring the neural mechanisms of Chinese character reading in deaf individuals, using either phonological or semantic tasks. Chinese characters represent the phonology and semantics of the spoken language differently from alphabetic words. For example, spoken Chinese is highly homophonic, with a single syllable having many distinct meanings, and the writing system encodes these homophonic syllables in its major graphic units, the characters. Thus, when learning to read, a Chinese child is confronted with the fact that a great number of written characters correspond to the same syllable, and phonological information is insufficient to access the semantics of a printed character. In addition, many Chinese characters encode meaning by including a semantic radical. Furthermore, the relationship between writing skills and Chinese reading is stronger than that between phonological awareness and reading (Tan et al., 2005a).

Previous behavioral studies have found that Chinese deaf individuals have poorer reading ability than hearing individuals (Feng, 2000). As to whether Chinese deaf individuals can be aware of phonology, previous studies have found Chinese deaf individuals have reduced phonological ability. For example, the spelling errors of hearing individuals tend to be substitutions having a similar pronunciation but no visual similarity (homophone errors) to the target character. However, few homophone errors were observed in Chinese deaf individuals (Fok and Bellugi, 1986). In addition, deaf individuals failed to show articulatory suppression effects in a digit span task, suggesting that they were not using a speech-based phonological code (Chincotta and Chincotta, 1996).

Neuroimaging studies have revealed a set of cortical regions shared by alphabetic words and logographic Chinese. The common left hemisphere regions include fusiform gyrus, inferior parietal lobule, middle temporal gyrus, and inferior frontal gyrus (Turkeltaub et al., 2002; Jobard et al., 2003; Price et al., 2003; Tan et al., 2005b). Different nodes of this network are thought to be associated with specific cognitive processes in reading and in oral language more generally. The middle portion of fusiform gyrus (close to the inferior temporal gyrus), labeled by some as the visual word form area (VWFA), has been implicated in the computation of orthographic processes (Cohen et al., 2000; Tan et al., 2001; Vinckier et al., 2007). The left inferior parietal lobule seems to play an important role in mapping of written symbols to the phonological representations (Booth et al., 2002a, 2004, 2006; Eden et al., 2004). The left middle temporal gyrus is thought to be involved in representing semantic information (Booth et al., 2002b, 2006). Finally, the left inferior frontal gyrus is thought to be involved in controlled retrieval and selection, with the dorsal portion (opercular and triangular parts) being involved in phonology (Fiez et al., 1999; Poldrack et al., 1999; Cao et al., 2010) and the ventral portion (orbital parts) being involved in semantics (Poldrack et al., 1999; Friederici et al., 2000; Booth et al., 2006).

Logographic Chinese characters markedly differ from alphabetic words in the nature of their orthography and how they represent the phonology of spoken language. These differences seem to be associated with cross-linguistic differences in their neural basis. The specialized regions for Chinese reading appear to include the right ventral occipito-temporal cortex and left middle frontal gyrus (Bolger et al., 2005; Tan et al., 2005b). The greater involvement of right ventral occipito-temporal cortex presumably reflects the greater spatial analysis required for Chinese character recognition (Cao et al., 2009). The left middle frontal gyrus is assumed to serve as a long-term storage center for addressed phonology (Tan et al., 2005b). On the other hand, alphabetic writing systems seem to rely more on the posterior portion of left superior temporal gyrus, which appears to be responsible for assembled phonology (Tan et al., 2003; Eden et al., 2004).

Cross-linguistic differences and similarities in the neural bases of reading have also been investigated in developmental studies. Neuroimaging studies on reading alphabetic words have found that learning to read is associated with enhanced involvement of left fusiform gyrus involved in visual word form recognition (Booth et al., 2004; Brem et al., 2006), left inferior parietal lobule involved in orthography-phonology mapping (Bitan et al., 2007; Booth et al., 2007), left middle temporal gyrus involved in semantic processing (Turkeltaub et al., 2003; Chou et al., 2006) and left inferior frontal gyrus in a variety of tasks (Booth et al., 2001, 2004; Gaillard et al., 2003; Turkeltaub et al., 2003; Szaflarski et al., 2006). In contrast, developmental differences during Chinese character reading are associated with increased activation in right middle occipital gyrus involved in visual-spatial analysis of characters, left inferior parietal lobule involved in phonological processing and left middle frontal gyrus involved in the integration of orthography and phonology (Cao et al., 2009).

In the present study, we explored the neural mechanisms of Chinese character reading in deaf individuals during both phonological and semantic processing tasks. The primary goal was to investigate the extent to which the brain mechanisms involved in reading Chinese characters are determined by early auditory speech experience, so we compared congenitally profoundly deaf to hearing individuals. We adopted a paradigm used in our previous studies on children (Cao et al., 2009) and adults (Booth et al., 2006) to examine the neural mechanisms underlying phonological and semantic processing in both groups of participants. Based on previous studies, we expected that deaf individuals may show altered recruitment of left-hemisphere language regions and/or increased recruitment of homologs in the right hemisphere. We also wished to determine whether any of these effects were task specific by examining whether group differences were larger for phonological vs. semantic processing.

# **METHODS**

#### **PARTICIPANTS**

We recruited 27 profoundly congenitally deaf signers (CD, 10 males, mean age 21.3 ± 2.51 years, range 19–28) and 20 hearing controls (HC, 10 males, mean age 21.7 ± 2.20 years, range 19–28). All were right-handed undergraduate or graduate students with no history of neurological or psychiatric illness (except sensorineural hearing loss). All deaf individuals exhibited profound hearing loss (better ear: mean 100.7 ± 8.45 dB, range 91–120; left ear: mean 102.5 ± 8.65 dB, range 91–120; right ear: mean 102.8 ± 8.48 dB, range 91–120). The causes of deafness were genetic, pregnancy-related cytomegalovirus, or unknown. All deaf individuals had normal intelligence quotient (IQ) scores as determined by Raven's Standard Progressive Matrices (Raven, 1976), indexed by a score above the 50th percentile on the appropriate norms. This test further ensured that the deaf participants recruited in our study were not individuals with multiple disabilities, such as mental retardation. None of the deaf individuals wore hearing aids before 6 years old or in the past 3 years. Chinese Sign Language was the primary language of all deaf individuals. Only deaf individuals who received a score of 5 on a 6-point scale from each of two experts on Chinese Sign Language were recruited in our study. Thus, all signers in our study were viewed as experienced signers. Institutional Review Board (IRB) approval was obtained from the State Key Laboratory of Cognitive Neuroscience and Learning at Beijing Normal University and informed written consent was obtained from all participants.

We only included individuals who met all of the following criteria: (1) overall accuracy was more than 70% for either the rhyming or meaning task, (2) the proportions of "yes" and "no" responses each fell within the range of 35–65% for both the rhyming and meaning task, to avoid any response bias, and (3) head motion during the fMRI was less than 3 mm in translation or 3° in rotation. Due to these criteria, seven CD were excluded for the rhyming task, and five CD were excluded for the meaning task. After the exclusion, for the rhyming task, there were no significant differences in age [*T*(38) = 0.947, *p* = 0.350] and gender (χ<sup>2</sup> = 0.921, *p* = 0.337) between the CD and HC. For the meaning task, there were no significant differences in age [*T*(40) = 0.495, *p* = 0.623] and gender (χ<sup>2</sup> = 1.437, *p* = 0.231) between the CD and HC.
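The three inclusion criteria can be expressed as a simple per-task filter. The sketch below is illustrative only: the `RunSummary` class and its field names are hypothetical, and it assumes that the criteria were applied separately to each task (consistent with the different exclusion counts for rhyming and meaning) and that the 35–65% bounds are inclusive, neither of which is stated explicitly above.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Per-participant summary for one task (hypothetical field names)."""
    accuracy: float            # proportion correct, 0..1
    yes_ratio: float           # proportion of "yes" responses, 0..1
    max_translation_mm: float  # largest head translation in any plane
    max_rotation_deg: float    # largest head rotation in any plane


def meets_criteria(run: RunSummary) -> bool:
    """Return True if the run passes all three inclusion criteria."""
    accurate_enough = run.accuracy > 0.70
    # If "yes" responses lie in 35-65%, "no" responses do as well.
    unbiased = 0.35 <= run.yes_ratio <= 0.65
    still_enough = run.max_translation_mm < 3.0 and run.max_rotation_deg < 3.0
    return accurate_enough and unbiased and still_enough
```

A participant who fails the filter for one task may still be retained for the other, which is how the two tasks can end up with different group sizes.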

#### **EXPERIMENTAL PROCEDURE**

In both the rhyming and meaning tasks, two Chinese characters were sequentially presented in the visual modality and the participants were instructed to determine whether the two characters rhymed during the rhyming task or were semantically related during the meaning task. Each character was presented for 800 ms followed by a 200 ms blank interval. After the second stimulus was removed, a red fixation cross (+) appeared on the screen, indicating the need to make a response during the subsequent interval that jittered between 2200, 2600, and 3000 ms. The participants were asked to press the yes button with their right index finger for matching pairs (rhyming or semantically related) and the no button with their right middle finger for non-matching pairs. Half of the pairs rhymed (or were related in meaning) and half did not. Semantic association strength between the two characters was assessed by 30 Chinese adults using a 7-point scale. They were instructed to judge to what extent the character pairs were related. If the average score was over 4.5 (*M* = 5.95), we considered the pairs semantically related. If an average score was below 3 (*M* = 2.18), we considered the pairs semantically unrelated.
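The relatedness thresholds amount to a two-sided cut-off on the mean rating. As a minimal sketch (the function name is hypothetical, and the handling of pairs that fall between the two thresholds is an assumption, since the text does not say what happened to them):

```python
from statistics import mean
from typing import Optional, Sequence


def classify_pair(ratings: Sequence[float]) -> Optional[str]:
    """Classify a character pair from its 7-point relatedness ratings.

    Returns "related" if the mean rating is over 4.5, "unrelated" if it is
    below 3, and None for intermediate pairs (assumed here to be unused).
    """
    avg = mean(ratings)
    if avg > 4.5:
        return "related"
    if avg < 3.0:
        return "unrelated"
    return None
```

Leaving a gap between the two cut-offs keeps ambiguous pairs out of both categories.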

For the rhyming task, four lexical conditions (24 pairs in each) independently manipulated the orthographic and phonological relation between the words in the pair. In two non-conflicting conditions, two words in a pair shared an identical phonetic radical and rhymed (R+P+, for example, characters pronounced /gu1/ and /ku1/), or two words in a pair had different phonetic radicals and did not rhyme (R−P−, for example, /liang2/ and /mou2/). In two conflicting conditions, two words in a pair shared an identical phonetic radical but did not rhyme (R+P−, for example, /xing4/ and /sheng4/), or two words in a pair had different phonetic radicals but rhymed (R−P+, for example, /ti1/ and /di1/). This manipulation was included so that the rhyming judgment could not be based on orthography alone.

For the meaning task, four lexical conditions (24 pairs each) independently manipulated the orthographic and semantic relation between the words in the pair. In two non-conflicting conditions, two words in a pair shared an identical semantic radical and were related in meaning (R+S+, for example, characters meaning /cold/ and /cool/), or two words in a pair had different semantic radicals and were not related in meaning (R−S−, for example, /plants/ and /canyon/). In two conflicting conditions, two words in a pair shared an identical semantic radical but were not related in meaning (R+S−, for example, /peach/ and /board/), or two words in a pair had different semantic radicals but were related in meaning (R−S+, for example, /snake/ and /bite/). As with the rhyming task, this manipulation was included so that the relatedness judgment could not be based on orthography alone.
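The logic of the 2 × 2 manipulation can be checked with a short calculation: a participant who answered "yes" whenever the two characters shared a radical would be correct only in the two non-conflicting conditions. The sketch below uses hypothetical names and assumes equal weighting of the four 24-pair conditions, as described for both tasks.

```python
PAIRS_PER_CONDITION = 24

# Each condition is (shares_radical, is_match), where is_match means
# "rhymes" in the rhyming task or "related" in the meaning task.
CONDITIONS = [
    (True, True),    # R+P+ / R+S+  (non-conflicting)
    (False, False),  # R-P- / R-S-  (non-conflicting)
    (True, False),   # R+P- / R+S-  (conflicting)
    (False, True),   # R-P+ / R-S+  (conflicting)
]


def orthography_only_accuracy() -> float:
    """Accuracy of a strategy that answers 'yes' iff the radical is shared."""
    correct = sum(
        PAIRS_PER_CONDITION
        for shares_radical, is_match in CONDITIONS
        if shares_radical == is_match
    )
    total = PAIRS_PER_CONDITION * len(CONDITIONS)
    return correct / total


print(orthography_only_accuracy())  # 0.5, i.e. chance level
```

Because such a strategy yields chance-level accuracy, above-chance performance implies access to phonological or semantic information rather than visual form alone.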

Two types of control trials were used for each task. In the perceptual trials, two unfamiliar Tibetan symbols were presented sequentially in the visual modality. The participants were instructed to press the yes button with their right index finger for identical pairs and the no button with their right middle finger for different pairs. Half of the symbol pairs were identical and half were not. For the fixation trials, participants were instructed to press a button when a black fixation cross turned blue. The timing for the control trials was the same as for the lexical trials. Each task included 24 perceptual trials and 48 fixation trials. The order of lexical and control trials was optimized for an event-related design using OptSeq (http://www.surfer.nmr.mgh.harvard.edu/optseq).
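The trial counts and timing given above imply the following per-task totals. This is a worked sketch under one assumption that the text leaves open: that the jittered response interval begins after the 200 ms blank that follows the second character. The constant and function names are hypothetical.

```python
CHAR_MS = 800                      # presentation time of each character
BLANK_MS = 200                     # blank interval after each character
RESPONSE_MS = (2200, 2600, 3000)   # jittered response interval

TRIALS_PER_TASK = {
    "lexical": 4 * 24,   # four lexical conditions, 24 pairs each
    "perceptual": 24,
    "fixation": 48,
}


def trial_durations_ms() -> list:
    """Possible trial lengths: two stimulus phases plus one response interval."""
    stimulus_phase = 2 * (CHAR_MS + BLANK_MS)
    return [stimulus_phase + r for r in RESPONSE_MS]


def total_trials() -> int:
    """Total number of trials in one task."""
    return sum(TRIALS_PER_TASK.values())


print(trial_durations_ms())  # [4200, 4600, 5000]
print(total_trials())        # 168
```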

#### **STIMULUS CHARACTERISTICS**

Previous studies have found the mean reading age of deaf individuals to be lower than that of hearing controls (Conrad, 1979; Holt, 1993). To ensure all deaf individuals in our study were familiar with the characters, we selected stimuli from Chinese language textbooks from Grade 1 to Grade 6 in primary schools. The characters (both the first and second words in the pairs) were matched for frequency, age of acquisition (the term when a character is first shown in Chinese language textbooks) (Xing et al., 2004) and number of strokes across the rhyming and meaning task. In addition, the two characters in the pairs for the rhyming task shared an identical lexical tone, so that this information could not interfere with the rhyming judgment.

#### **DATA COLLECTION**

Functional and structural images were acquired on a Siemens 3T Tim Trio scanner. Gradient-echo localizer images were acquired to determine the placement of the functional slices. Imaging parameters of reading tasks were: 32 axial slices with an echoplanar imaging (EPI) pulse sequence, repetition time of 2000 ms, echo time of 20 ms, flip angle of 80°, slice thickness of 3 mm, gap of 0.48 mm, FOV = 220 × 206.25 mm, matrix = 128 × 120 × 32; in-plane pixel size = 1.71875 × 1.71875 mm. Imaging parameters of the T1-weighted anatomical image were: sagittal acquisition with a 256 × 256 matrix, repetition time of 2530 ms, echo time of 3.45 ms, flip angle of 7°, number of excitations = 1, 256 mm field of view, 1 mm slice thickness.
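The reported in-plane pixel size follows directly from the field of view and matrix, and the slice parameters give the extent of the imaged slab. A minimal check (function names are hypothetical; the slab extent assumes one gap between each pair of adjacent slices):

```python
def in_plane_voxel_mm(fov_mm, matrix):
    """In-plane voxel size as field of view divided by matrix size, per axis."""
    return tuple(f / m for f, m in zip(fov_mm, matrix))


def slab_extent_mm(n_slices, thickness_mm, gap_mm):
    """Extent covered by n slices with a gap between adjacent slices."""
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm


print(in_plane_voxel_mm((220.0, 206.25), (128, 120)))  # (1.71875, 1.71875)
print(round(slab_extent_mm(32, 3.0, 0.48), 2))         # 110.88
```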

#### **DATA ANALYSIS**

We used SPM5 (http://www.fil.ion.ucl.ac.uk/spm) for preprocessing. The functional images were corrected for differences in slice-acquisition time to the middle slice and were realigned to the first volume in the scanning session. Participants who showed more than 3.0 mm in translation or 3.0° in rotation within a run in any plane were not included. Participants' functional images were co-registered with their own structural MRI images. The co-registered high-resolution structural MRI images were segmented and normalized to the Montreal Neurological Institute (MNI) template image and spatially re-sampled (2 × 2 × 2 mm). Finally, the images were smoothed with a Gaussian filter of 4 × 4 × 8 mm full width at half maximum (FWHM).
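For readers more used to specifying Gaussian kernels by standard deviation, the FWHM values can be converted with the standard relation sigma = FWHM / (2 * sqrt(2 * ln 2)). The sketch below is a general conversion, not part of the SPM5 pipeline described above, and the helper names are hypothetical.

```python
import math

# Standard Gaussian relation: FWHM = 2 * sqrt(2 * ln 2) * sigma
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian kernel's FWHM to its standard deviation (same units)."""
    return fwhm_mm * FWHM_TO_SIGMA


def kernel_sigma_voxels(fwhm_mm, voxel_mm):
    """Per-axis sigma in voxel units for an anisotropic smoothing kernel."""
    return tuple(fwhm_to_sigma(f) / v for f, v in zip(fwhm_mm, voxel_mm))


# The 4 x 4 x 8 mm kernel applied to the 2 x 2 x 2 mm resampled images.
sigmas = kernel_sigma_voxels((4.0, 4.0, 8.0), (2.0, 2.0, 2.0))
print([round(s, 3) for s in sigmas])  # [0.849, 0.849, 1.699]
```

The kernel is therefore twice as wide along the slice direction as within the plane.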

The general linear model was used to estimate condition effects for each participant. There were significant differences in performance between CD and HC in both the rhyming and meaning tasks (see Behavioral Results). To increase the likelihood that the brain differences between the CD and HC were not caused by performance differences, we only included correct items in all analyses. Because this resulted in unequal numbers of items across groups, items were randomly eliminated from HC so that the number of items in each condition was equated across groups. There was an average of 78 and 81 lexical pairs during the rhyming and meaning task, respectively. In addition, the CD took longer than the HC to make their decisions. To reduce the effect of reaction time differences on group differences, we covaried this variable when conducting factorial analyses. Two conditions, "lexical" and "fixation," were modeled using a canonical hemodynamic response function (HRF) and the contrast "lexical-fixation" was computed. One-sample *t*-tests were applied to determine differences between the lexical and fixation condition, separately for each group and separately for each task. Two-sample *t*-tests were computed separately for the rhyming minus fixation, meaning minus fixation, rhyming minus meaning, and meaning minus rhyming contrasts between groups, with reaction time of lexical judgment as a covariate. We then created a mask of the regions showing group differences either in the rhyming or meaning tasks and the regions showing a group difference in the task effect. Using this mask, we examined positive and negative correlations of accuracy (rhyming/meaning) with signal intensity separately for each task in the CD. We did not examine correlations in the HC as their accuracy was near ceiling. All the reported regions of activation were at *p* < 0.05 AlphaSim corrected (*p* < 0.005 voxel-level cut-off).
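The item-equating step can be sketched as random subsampling of the hearing controls' correct items within each condition. The text does not say whether matching was done per participant or on group averages, so the per-condition `target_counts` below, like the function and variable names, are assumptions made purely for illustration.

```python
import random
from typing import Dict, List


def equate_items(
    hc_items: Dict[str, List[str]],
    target_counts: Dict[str, int],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Randomly drop HC items so each condition keeps at most the target count.

    hc_items maps a condition label to the correct items of a hearing control;
    target_counts gives the number of items to retain per condition
    (hypothetically, the matched deaf participants' count).
    """
    rng = random.Random(seed)  # fixed seed makes the elimination reproducible
    equated = {}
    for condition, items in hc_items.items():
        n_keep = min(len(items), target_counts[condition])
        keep_idx = set(rng.sample(range(len(items)), n_keep))
        # Preserve the original presentation order of the retained items.
        equated[condition] = [it for i, it in enumerate(items) if i in keep_idx]
    return equated
```

Fixing the random seed is a design choice that keeps the analysis repeatable; the original study does not report how the random elimination was seeded.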

# **RESULTS**

#### **BEHAVIORAL RESULTS**

In order to determine if there were group differences in the reading and fixation conditions, group (CD vs. HC) by task (rhyming vs. fixation, meaning vs. fixation, or rhyming vs. meaning) ANOVAs were calculated separately on RT and accuracy (**Table 1**). Significant differences between the HC and CD were found for RT and accuracy in the reading and fixation conditions [RT: rhyming vs. fixation, *F*(1, 38) = 11.310, *p* < 0.01; meaning vs. fixation, *F*(1, 40) = 10.862, *p* < 0.01; accuracy: rhyming vs. fixation, *F*(1, 38) = 50.887, *p* < 0.01; meaning vs. fixation, *F*(1, 40) = 11.660, *p* < 0.01]. Significant task effects were found for RT [rhyming vs. fixation, *F*(1, 38) = 139.641, *p* < 0.01; meaning vs. fixation, *F*(1, 40) = 134.749, *p* < 0.01] and accuracy [rhyming vs. fixation, *F*(1, 38) = 139.953, *p* < 0.01; meaning vs. fixation, *F*(1, 40) = 62.637, *p* < 0.01]. Interactions were found only for accuracy [rhyming vs. fixation, *F*(1, 38) = 47.623, *p* < 0.01; meaning vs. fixation, *F*(1, 40) = 21.008, *p* < 0.01], but not for RT [rhyming vs. fixation, *F*(1, 38) = 0.897, *p* = 0.350; meaning vs. fixation, *F*(1, 38) = 0.647, *p* = 0.426]. These results indicated that CD showed similarly poor performance in RT in the reading and fixation conditions. Thus, the differences in RT in the reading tasks between CD and HC are likely due to different motor plans and/or decision criteria. For the reading tasks, significant group effects were found on RT [rhyming vs. meaning, *F*(1, 34) = 5.950, *p* < 0.05] and accuracy [rhyming vs. meaning, *F*(1, 34) = 53.184, *p* < 0.01]. Significant task effects were found for RT [rhyming vs. meaning, *F*(1, 34) = 41.558, *p* < 0.01], but not accuracy [rhyming vs. meaning, *F*(1, 34) = 2.060, *p* = 0.160]. No interaction was found for either RT [rhyming vs. meaning, *F*(1, 34) = 0.098, *p* = 0.756] or accuracy [rhyming vs. meaning, *F*(1, 34) = 3.336, *p* = 0.077]. These results indicated that CD showed similarly poor performance in both phonology and semantics. The group differences in the different lexical conditions are shown in Supplementary Results and Supplementary Table 1.

**Table 1 | Means (M) and standard deviations (SD) for reaction time (RT in ms) and accuracy (%) in the Rhyming and Meaning tasks for congenitally deaf individuals (CD) and hearing controls (HC).**
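The error degrees of freedom in the behavioral tests can be related to the group sizes left after exclusion. The sketch below is a consistency check only: it assumes that all 20 HC were retained in every analysis (only CD exclusions are reported), and the number of CD contributing to the rhyming vs. meaning comparison is inferred from the reported degrees of freedom rather than stated in the text.

```python
N_CD_RECRUITED = 27
N_HC = 20  # assumption: no hearing controls were excluded


def error_df(n_cd: int, n_hc: int) -> int:
    """Error degrees of freedom for a two-group design: N - 2."""
    return n_cd + n_hc - 2


df_rhyming = error_df(N_CD_RECRUITED - 7, N_HC)  # seven CD excluded
df_meaning = error_df(N_CD_RECRUITED - 5, N_HC)  # five CD excluded

# Inference: an error df of 34 for rhyming vs. meaning implies 36 participants
# with usable data in both tasks, hence 16 CD if all 20 HC were included.
n_cd_both_tasks = 34 + 2 - N_HC

print(df_rhyming, df_meaning, n_cd_both_tasks)  # 38 40 16
```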

#### **fMRI ACTIVATION RESULTS**

For both rhyming and meaning tasks, CD and HC showed activation in the reading network including bilateral ventral occipitotemporal cortex (e.g., inferior occipital gyrus and fusiform gyrus), left inferior parietal cortex, left inferior/middle frontal gyrus, and basal ganglia (e.g., putamen and caudate nucleus). These patterns were compatible with previous studies (Tan et al., 2005b; Booth et al., 2006; Chou et al., 2009; Cao et al., 2010). CD appeared to show greater activation in the right inferior parietal cortex and right inferior frontal gyrus. The results are shown in **Figure 1**.

Compared to HC, CD showed less activation in the left inferior frontal gyrus, but greater activation in right hemispheric regions for both the rhyming and meaning tasks, including the triangular part of inferior frontal gyrus, middle frontal gyrus, angular gyrus, inferior temporal gyrus, cingulate gyrus, thalamus, and superior frontal gyrus (**Figure 2**). For the rhyming task, CD also showed less activation in left middle temporal gyrus and right precuneus, but greater activation in right orbital part of inferior frontal gyrus, inferior parietal lobule, and supramarginal gyrus. For the meaning task, CD showed less activation in right inferior occipital gyrus and superior temporal gyrus, but greater activation in right insula (**Tables 2**, **3**, **Figure 2**). We also compared the groups before partialing out RT. The result, shown in Supplementary Figure 1 (see the Supplementary Results), is very similar to that shown in **Figure 2**.

We also found greater activation for CD than for HC on rhyming vs. meaning (**Figure 3**). CD showed greater activation than HC in right middle frontal gyrus (*x* = 38, *y* = 38, *z* = 20) on the rhyming minus meaning contrast. Specifically, CD showed greater activation in the rhyming task compared to the meaning task in the right middle frontal gyrus, whereas there were comparable activations across tasks for HC.

When correlating task accuracy with signal intensity during the reading tasks, we found significant negative correlations between accuracy on the rhyming task and brain activation. CD who had higher accuracy showed less activation in right angular gyrus and inferior parietal lobule (**Table 4**).
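A brain-behavior correlation of this kind can be illustrated with a plain Pearson coefficient between per-participant accuracy and mean signal in a region. This is a minimal sketch, not the voxel-wise analysis used in the study; the function name is hypothetical and the numbers are invented solely to show the direction of the effect.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented example values (not data from the study): higher rhyming accuracy
# paired with lower signal in a right parietal region of interest.
accuracy = [0.72, 0.78, 0.81, 0.88, 0.93]
roi_signal = [1.9, 1.6, 1.5, 1.1, 0.8]
r = pearson_r(accuracy, roi_signal)  # negative for these invented values
```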

# **DISCUSSION**

To investigate the extent to which the brain mechanisms involved in reading Chinese characters are determined by early auditory speech experience and whether alterations are specific to certain kinds of lexical tasks, we examined the neural mechanisms for the rhyming and meaning judgments of written language in congenitally deaf signers (CD) and hearing controls (HC). Both deaf individuals and hearing controls recruited a left lateralized reading network including ventral occipito-temporal cortex, inferior parietal cortex, and inferior/middle frontal gyrus. This pattern is similar to previous research on hearing Chinese participants (Chou et al., 2009; Cao et al., 2010). Our results are also consistent with previous behavioral studies in alphabetic writing systems by showing that the deaf individuals were less accurate than hearing controls during phonological (Hanson and Fowler, 1987; Campbell and Wright, 1988; Sterne and Goswami, 2000; Aparicio et al., 2007; MacSweeney et al., 2008) and semantic processing (Green and Shepherd, 1975; MacSweeney et al., 2004; Marschark et al., 2004; Ormel et al., 2010). For both tasks, we found that CD showed less activation than HC in left inferior frontal gyrus, but greater activation in several right hemisphere regions including inferior frontal gyrus, angular gyrus, and inferior temporal gyrus. Although many group differences were similar across tasks, greater activation in right middle frontal gyrus was more pronounced for the rhyming compared to the meaning task. Finally, within the deaf individuals better performance on the rhyming task was associated with less activation in right inferior parietal lobule and angular gyrus.

Previous studies have found that learning to read is associated with two patterns of change in brain activation: increased activation in classical left-hemisphere language regions and/or decreased activation in homologous areas in the right-hemisphere (Turkeltaub et al., 2003). Because spoken Chinese is highly homophonic, in learning to read, a Chinese child is confronted with the fact that a great number of written characters correspond to the same syllable. Thus, as children learn Chinese characters, they are required to spend a great deal of time repeatedly copying single characters (Tan et al., 2005a,b). By writing, children learn to decode Chinese characters into a unique pattern of strokes. This orthographic knowledge facilitates the formation of connections among orthographic, phonological, and semantic components of the written Chinese characters (Tan et al., 2005a). When entering elementary school, deaf signers also learn by repeatedly copying characters. Thus, the major difference between the deaf signers and the hearing controls is auditory speech input before learning to read. Due to the lack of early speech experience, CD showed less activation in left hemisphere language regions (i.e., inferior frontal gyrus), whereas they showed greater activation in right hemisphere regions including inferior frontal and inferior parietal cortex during both the rhyming and meaning tasks. CD's engagement of homologous regions of the right hemisphere may be a byproduct of their lack of early speech experience that plays a pivotal role for subsequent learning of written Chinese characters in hearing individuals.

**FIGURE 2 | Differential activation in congenitally deaf (CD) compared to hearing control individuals (HC) within the rhyming and meaning tasks. (A)** Reduced activation in CD compared to HC in the rhyming (red) and meaning (green) tasks. **(B)** Greater activation in CD compared to HC in the rhyming (red) and meaning (green) tasks. For both **(A)** and **(B)**, yellow indicates overlap. Compared to HC, CD showed reduced activation in left inferior frontal cortex, but greater activation in right inferior frontal, inferior parietal, and inferior temporal cortex, among other regions, for both tasks. The threshold for the whole brain comparisons was set at *p* < 0.05 AlphaSim corrected (*p* < 0.005 voxel-level cut-off). The number below each map (Z) represents axial coordinates in MNI space.

The deaf individuals recruited in the current research primarily used a different language, i.e., Chinese Sign Language, compared to the hearing controls. Increasing evidence shows that reading in deaf people may rely on access to brain networks involved in sign language processing. Behavioral studies have shown that signs were active during written word processing for deaf individuals (Shand, 1982; Morford et al., 2011), and that the sign language translations of written words were activated even when a task did not explicitly require the use of sign language (Morford et al., 2011). Moreover, deaf readers are more likely to become successful readers when they bring a strong sign language foundation to the reading process (Mayberry et al., 2011). Evidence from functional imaging studies shows that deaf individuals exhibit strong activation not only in left classical language areas but also in right homologous regions including inferior frontal gyrus and/or inferior parietal lobule when processing sign language (Soderfeldt et al., 1997; Bavelier et al., 1998; Neville et al., 1998; Emmorey et al., 2002; Fang and He, 2003; Capek et al., 2008). Taken together, the deaf individuals may rely on sign language mechanisms for skilled reading.

Both CD and HC showed involvement in left triangular/orbital part of inferior frontal gyrus during the rhyming and meaning tasks. Previous studies suggest that the ventral portion of inferior frontal gyrus (orbital and triangular parts) is involved in semantic processing (Poldrack et al., 1999; Friederici et al., 2000; Booth et al., 2006). In addition, compared to hearing individuals, deaf individuals showed decreased activation in this region for both tasks. The reduced activation in ventral inferior frontal gyrus for CD compared to HC may indicate their ineffective retrieval and selection of semantic representations. It is possible that this reduced activation is due to deaf individuals' poorer lexical-semantic skills, as shown in previous studies (Green and Shepherd, 1975; Marschark et al., 2004; Ormel et al., 2010). However, we did not find that the group difference in left ventral inferior frontal gyrus was larger for the meaning task compared to the rhyming task, so future studies are needed to investigate the specific role of ventral inferior frontal gyrus in deaf reading.

We also showed that CD had greater activation than HC in the right triangular part of inferior frontal gyrus for both the rhyming and meaning tasks. The triangular part of inferior frontal gyrus is thought to be involved in controlled retrieval and selection of phonology (Fiez et al., 1999; Cao et al., 2010). Similar patterns were also shown in a previous reading study of deaf individuals (Aparicio et al., 2007). Greater activation in right inferior frontal gyrus may reflect that deaf individuals resort to the right hemisphere for controlled retrieval and selection of phonology. There is another possible interpretation for this compensatory recruitment of right inferior frontal gyrus in deaf individuals. The left inferior frontal gyrus is activated during rhyming judgments, especially for difficult conditions, in hearing individuals (Bitan et al., 2007). Moreover, the activation of inferior frontal gyrus increases with age in hearing individuals, which may be associated with phonological segmentation and covert articulation (Bitan et al., 2007). Therefore, the greater activation in right inferior frontal gyrus during phonological processing in deaf individuals may be due to compensatory recruitment of articulation processes (Aparicio et al., 2007; MacSweeney et al., 2008, 2009). However, we did not show that group differences in the engagement of right inferior frontal gyrus were larger for the rhyming compared to the meaning task, so future studies should examine the specific role of the right inferior frontal cortex in deaf reading.

Additionally, CD showed greater activation than HC in right inferior temporal gyrus for the rhyming and meaning tasks. The left ventral occipitotemporal cortex is involved in the perception of written alphabetic (Cohen et al., 2000; Vinckier et al., 2007) and Chinese words (Bolger et al., 2005; Tan et al., 2005b), while Chinese reading also elicits activation of the right ventral occipitotemporal cortex (Bolger et al., 2005; Tan et al., 2005b). Chinese characters are composed of strokes packed into a square shape, and therefore the character's spatial arrangement requires holistic and visual-spatial processing (Xue et al., 2005), which requires the engagement of right visual cortex (Warschausky et al., 1996). Previous studies have also revealed that deaf individuals show a right hemisphere advantage when judging whether a word corresponds to the sign, whereas hearing controls show a reverse advantage (Ross et al., 1979). Thus, the increased activation in right ventral occipitotemporal cortex may reflect that deaf individuals used more holistic information to accomplish the reading task.

#### **Table 3 | Comparisons between congenitally deaf individuals (CD) and hearing controls (HC) for the meaning task.**

*H, hemisphere; L, left; R, right; B, bilateral; BA, Brodmann's Area. All coordinates p < 0.05 AlphaSim corrected (p < 0.005 voxel-level cut-off).*

CD also showed greater activation than HC in right angular gyrus and inferior parietal lobule for the rhyming and meaning tasks. Further analysis found that CD who had better performance during the rhyming task showed less activation in right inferior parietal lobule. This result is compatible with previous research in alphabetic word reading which found right inferior frontal gyrus was only activated in less-proficient deaf individuals but not in proficient ones (Corina et al., 2013). The left inferior parietal system is activated during phonological processing of Chinese characters (Tan et al., 2003). This inferior parietal system appears to be involved in temporarily storing phonological information in working memory (Ravizza et al., 2004). Thus, this inferior parietal system may maintain phonological information to accomplish the reading tasks (Tan et al., 2005b). The greater activation in right inferior parietal system may reflect that deaf individuals resort to right hemisphere to temporarily store phonological information to accomplish the reading task, and the brain behavior correlations potentially indicate that CD who more efficiently use the right hemisphere to store phonological information are better readers. In addition, CD also showed greater activation than HC in right inferior parietal lobule for the perceptual tasks (Supplementary Table 2). Thus, the CD might use the right inferior parietal system to temporarily store information to accomplish the corresponding task.

Finally, CD showed greater activation than HC in right middle frontal gyrus (*x* = 38, *y* = 38, *z* = 20, BA9) on the rhyming minus meaning contrast. Specifically, CD showed greater activation in the rhyming task compared to the meaning task in the right middle frontal gyrus, whereas there were comparable activations across tasks for HC. The left middle frontal gyrus is thought to be specialized for Chinese reading (Perfetti et al., 2005; Tan et al., 2005b). This region has been consistently activated during Chinese reading in hearing adults in various tasks (Tan et al., 2001; Chee et al., 2004; Kuo et al., 2004; Booth et al., 2006; Cao et al., 2009). It has been argued that this area is responsible for addressed phonology in Chinese reading (Tan et al., 2005b). It is interesting to note that Chinese dyslexics exhibited less activation in BA9 in the left hemisphere compared to controls (Siok et al., 2004). Consistent with the findings in dyslexia, Cao et al. (2009) found that lower accuracy children showed reduced activation in BA9 in the left hemisphere. However, in the current study, CD showed stronger activation than HC in right middle frontal gyrus for the rhyming as compared with the meaning task. This finding suggests deaf readers may resort to alternative cognitive mechanisms to overcome their deficits in phonological processing. Previous studies have found that the engagement of right hemisphere regions is homotopic to the left language network in left-hemispheric brain lesions and callosal agenesis (Staudt et al., 2002; Riecker et al., 2007). Consequently, deaf readers might recruit right BA9 to integrate visual orthographic information with addressed phonology. Moreover, our finding is compatible with a previous alphabetic study, which found that deaf individuals showed increased right inferior frontal gyrus activation, and interpreted this as reflecting greater demands on grapheme-to-phoneme conversion (Aparicio et al., 2007).

In conclusion, the use of both a rhyming and a meaning task in the current study allowed us to find: (1) CD showed less

**Table 4 | Negative correlations between better reading performance and lower signal intensity within the congenitally deaf individuals for the rhyming task.**


*H, hemisphere; R, right; BA, Brodmann's Area. All coordinates p < 0.05 AlphaSim corrected (p < 0.005 voxel-level cut-off).*

CD showed greater activation than HC in right middle frontal gyrus for the rhyming as compared with the meaning task, suggesting greater recruitment of right hemisphere for phonological processing in CD; and (3) CD who had better performance on the rhyming task showed less activation in right inferior parietal cortex, potentially indicating that CD that more efficiently use the right hemisphere for phonological storage are better readers.

# **ACKNOWLEDGMENTS**

This work was supported by the National Key Basic Research Program of China (2014CB846102) and grants from the National Natural Science Foundation of China (NSFC, 31170969, 81171016, 30870757). This work was also supported by grants from the National Institute of Child Health and Human Development (HD042049) to James R. Booth.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00211/abstract

# **REFERENCES**


Xue, G., Dong, Q., Chen, K., Jin, Z., Chen, C., Zeng, Y., et al. (2005). Cerebral asymmetry in children when reading Chinese characters. *Brain Res. Cogn. Brain Res.* 24, 206–214. doi: 10.1016/j.cogbrainres.2005.01.022

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 July 2013; accepted: 26 March 2014; published online: 16 April 2014. Citation: Li Y, Peng D, Liu L, Booth JR and Ding G (2014) Brain activation during phonological and semantic processing of Chinese characters in deaf signers. Front. Hum. Neurosci. 8:211. doi: 10.3389/fnhum.2014.00211*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Li, Peng, Liu, Booth and Ding. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Opposite effects of visual and auditory word-likeness on activity in the visual word form area

#### *Philipp Ludersdorfer<sup>1</sup>\*, Matthias Schurz<sup>1</sup>, Fabio Richlan<sup>1</sup>, Martin Kronbichler<sup>1,2</sup> and Heinz Wimmer<sup>1</sup>*

*<sup>1</sup> Centre for Neurocognitive Research and Department of Psychology, University of Salzburg, Salzburg, Austria <sup>2</sup> Neuroscience Institute, Christian-Doppler-Clinic, Paracelsus Medical University Salzburg, Salzburg, Austria*

#### *Edited by:*

*Urs Maurer, University of Zurich, Switzerland*

#### *Reviewed by:*

*Cheryl Grady, University of Toronto, Canada Jianfeng Yang, Chinese Academy of Sciences, China*

#### *\*Correspondence:*

*Philipp Ludersdorfer, Centre for Neurocognitive Research and Department of Psychology, University of Salzburg, Hellbrunnerstrasse 34/II, 5020 Salzburg, Austria e-mail: philipp.ludersdorfer@sbg.ac.at*

The present fMRI study investigated the effects of word-likeness of visual and auditory stimuli on activity along the ventral visual stream. In the context of a one-back task, we presented visual and auditory words, pseudowords, and artificial stimuli (i.e., false-fonts and reversed-speech, respectively). Main findings were regionally specific effects of word-likeness on activation in a left ventral occipitotemporal region corresponding to the classic localization of the Visual Word Form Area (VWFA). Specifically, we found an inverse word-likeness effect for the visual stimuli in the form of decreased activation for words compared to pseudowords which, in turn, elicited decreased activation compared to the artificial stimuli. For the auditory stimuli, we found positive word-likeness effects as both words and pseudowords elicited more activation than the artificial stimuli. This resulted from a marked deactivation in response to the artificial stimuli and no such deactivation for words and pseudowords. We suggest that the opposite effects of visual and auditory word-likeness on VWFA activation can be explained by assuming the involvement of visual orthographic memory representations. For the visual stimuli, these representations reduce the coding effort as a function of word-likeness. This results in highest activation to the artificial stimuli and least activation to words for which corresponding representations exist. The positive auditory word-likeness effects may result from activation of orthographic information associated with the auditory words and pseudowords. The view that the VWFA has a primarily visual function is supported by our findings of high activation to the visual artificial stimuli (which have no phonological or semantic associations) and deactivation to the auditory artificial stimuli. According to the phenomenon of cross-modal sensory suppression such deactivations during demanding auditory processing are expected in visual regions.

**Keywords: fMRI, neuroimaging, one-back task, word-likeness, word processing, VWFA, orthographic representations**

# **INTRODUCTION**

Since the advent of neuroimaging a multitude of studies has shown that the left ventral occipitotemporal cortex (vOT) plays a crucial role in reading (for reviews see Jobard et al., 2003; Vigneau et al., 2006; Price, 2012). Specific attention has been given to a region at around *y* = −58 (MNI space) in a series of studies by L. Cohen, S. Dehaene and collaborators who reasoned that it hosts abstract representations of letter strings and, hence, referred to it as Visual Word Form Area (VWFA; Cohen et al., 2000, 2002; Dehaene et al., 2001). However, different from its name, the VWFA was assumed to be only involved in sub-lexical processing of legal letter sequences and not in whole-word recognition (Dehaene et al., 2002). This VWFA hypothesis stimulated critical debate (Price and Devlin, 2003, 2011) and a large number of imaging studies investigated the functional properties of this region. Dehaene et al. (2005) extended the original VWFA hypothesis by proposing hierarchical stages of visual word recognition along the left ventral visual stream from detection of letter fragments in V2 to recognition of recurring letter strings and short words in anterior vOT (*y* = −48) with in-between stages of letter and letter sequence coding. Support for the hierarchical model was provided by Vinckier et al. (2007) who presented words and letter strings of decreasing similarity to words (frequent quadrigrams, frequent bigrams, frequent letters, infrequent letters and false-fonts) and found increasing response selectivity along the ventral stream with absent differentiation in posterior regions and a marked differentiation in anterior regions. Specifically, in anterior regions (including the VWFA) activation was highest for words and lowest for strings of infrequent letters and false-fonts.

Our research group contributed to the VWFA debate by proposing that vOT regions may not only be involved in sublexical coding as assumed by the VWFA hypothesis, but also provide visual-orthographic whole-word recognition, similar to other anterior vOT regions which are engaged by recognition of visual objects or faces (see Kronbichler et al., 2004). This proposal was based on our finding that a parametric increase in word frequency from pseudowords to high frequency words was accompanied by a systematic decrease of VWFA activation. Further studies tried to control for the confound between orthographic familiarity and phonological/semantic familiarity. Therefore, we compared familiar to unfamiliar spellings (i.e., pseudohomophones) of the same phonological words (e.g., TAXI vs. TAKSI) in a phonological lexical decision task ("Does XXX sound like a word?" with YES to both familiar and unfamiliar spellings) and found decreased VWFA activation to the familiar spellings (Kronbichler et al., 2007, 2009). Similar inverse effects of orthographic familiarity on VWFA activation (i.e., words *<* pseudohomophones) were found in several other studies (Bruno et al., 2008; van der Mark et al., 2009; Twomey et al., 2011).

An interesting extension of inverse orthographic familiarity effects on vOT activation was recently provided by Wang et al. (2011) using a one-back repetition recognition task. The authors investigated the effects of word-likeness with stimuli ranging from familiar Chinese characters to unfamiliar artificial characters. They found higher vOT activation to artificial Chinese characters compared to pseudo-characters (composed of two semantic and/or phonetic components) which, in turn, elicited higher activation compared to existing Chinese characters. Interestingly, this inverse relation between word-likeness and activation was present from posterior (*y* = −80) to anterior (*y* = −48) regions of the left ventral visual stream.

The inverse effect of word-likeness on brain activation found by Wang et al. stands in contrast to the findings of Vinckier et al. who found the exact opposite pattern. To address this discrepancy, a first goal of the present study was to investigate effects of word-likeness of visual stimuli along the ventral visual stream, especially in left vOT. Therefore, similar to Wang et al., we presented familiar (German) words, pseudowords and unfamiliar artificial stimuli (false-font strings) in the context of a one-back task. In cognitive terms, words can be considered to be more familiar than pseudowords due to the presence of orthographic word representations (with associations to phonology and meaning) and pseudowords are more familiar than false-font strings due to the presence of representations for letter identities and for recurrent letter sequences within words. The question was whether the large extension of the inverse word-likeness effect of Wang et al. would also be found with alphabetic stimuli. Particularly unexpected would be a posterior extension of the decreased activation for familiar words compared to pseudowords. This finding would imply that visual recognition for words is already present in posterior vOT regions. In our previous studies (mentioned above), decreased activation to familiar words compared to unfamiliar letter strings (pseudohomophones and pseudowords) typically only emerged in anterior vOT regions and only these were assumed to be involved in orthographic whole-word recognition.

The main novel aspect of the present study was the extension of our stimulus set to auditory stimuli also varying in word-likeness (i.e., familiar words, pseudowords and unfamiliar reversed-speech stimuli). The inclusion of auditory stimuli is of interest in relation to the controversy around the question whether the VWFA, as implied by its name, is indeed a region primarily (or even exclusively) engaged by visual processes. In support of this assumption, Dehaene et al. (2002) showed that, compared to a rest baseline, the VWFA exhibited marked activation in response to visual but not to auditory words. However, a study by Price et al. (2003) found increased activation to auditory words compared to an auditory control condition. From this and similar findings (e.g., Booth et al., 2002), Price and Devlin (2003) inferred that the VWFA potentially is a "polymodal" region. More recently, a study by Reich et al. (2011), similar to a previous study by Buchel et al. (1998), found that the VWFA was engaged by Braille reading in congenitally blind subjects. Based on these findings, Dehaene and Cohen (2011) recently referred to the VWFA as a "meta-modal" reading area. However, this conclusion may be premature. In congenitally blind people, regions dedicated to visual processing may be taken over by other sensory information such as touch. In sighted readers, the VWFA may still be a region primarily engaged by visual processing.

Of particular interest for the VWFA's modality-specificity is the phenomenon of "cross-modal sensory suppression" (Laurienti et al., 2002; Mozolic et al., 2008). According to this, demanding auditory processing can be expected to result in negative activation (compared to rest baseline) in regions primarily engaged by visual processes. Indeed, a recent study by Yoncheva et al. (2010) provided support for negative activation of the VWFA to auditory stimulation. This study presented a sequence of two tone patterns which had to be matched. This tone-matching task served as control condition for a rhyme-matching task in which two auditory words were presented. In accordance with cross-modal sensory suppression, the tone-matching task resulted in negative activation (compared to rest baseline) in bilateral occipital and OT regions including the VWFA. The auditory words presented for rhyme matching also resulted in deactivation of bilateral OT regions with the exception of the VWFA where activation was increased (close to rest) compared to tone-matching. A possible explanation is that, during the presentation of auditory words, the VWFA was exempt from cross-modal sensory suppression due to activation of visual word representations associated with the auditory words. This interpretation would be in line with behavioral studies which found rhyming judgments to be prone to misleading "orthographic intrusions" (Seidenberg and Tanenhaus, 1979).

The present auditory stimuli differed markedly from those of Yoncheva et al. as we presented only single auditory items (words, pseudowords) and the present one-back task may be less prone to activating associated orthographic word representations compared to the rhyme-matching task of Yoncheva et al. Therefore, it was of interest whether the result pattern of Yoncheva et al. would be replicated. Furthermore, a main advantage of the present study is that the activation of left vOT regions in response to the visual stimuli can be topographically related to those in response to the auditory stimuli. If the expected increased activation to the auditory words (compared to negative activation for the artificial stimuli) in vOT regions is indeed resulting from orthographic information associated with the auditory words, then this activation cluster should roughly coincide with the expected visual orthographic familiarity effect (i.e., visual words *<* pseudowords).

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Twenty-nine (11 female) participants aged 19–48 years (*M* = 24.3 years) took part in the experiment. All participants were German-speaking university students, had normal or corrected-to-normal vision, and had no history of neurological disease or learning disability. The study was approved by the ethical committee of the University of Salzburg. All participants gave written informed consent and were paid for participation.

#### **STIMULI AND PROCEDURE**

Stimuli consisted of visually and auditorily presented words, pseudowords and modality-specific artificial stimuli (false-fonts and reversed-speech, respectively). In total, each participant viewed 40 items (36 and 4 repetitions) of each of the categories. Initially, lists of 72 items were generated for the words and pseudowords. The words were all German nouns, one half were names for tools (e.g., "Hammer"), and the other half names for animals (e.g., "Zebra"). Word length varied from four to eight letters, and from one to three syllables. Mean word frequency was 4.59 per million (based on the CELEX database; Baayen et al., 1993). The pseudowords matched the words in number of letters, number of syllables, bigram frequency, and number of orthographic neighbors. Counterbalanced across participants, one half of the words and pseudowords were presented visually and the other auditorily. Artificial visual stimuli were generated by presenting the visual words in false-fonts. Artificial auditory stimuli were constructed by time-reversing the auditory word recordings.

In the scanner, subjects performed a one-back repetition task and were instructed to press a button with their right index finger whenever they viewed two identical stimuli in succession. Responses were considered as hits when they occurred before presentation of the next stimulus. An absence of response during this period was considered as a miss. All other responses were considered as false alarms.
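The scoring rules described above can be sketched in a few lines. This is a minimal illustration, assuming each trial is summarized by the stimulus identity and a flag indicating whether a button press occurred before the next stimulus; the function name and data layout are hypothetical and not taken from the study.

```python
def score_one_back(stimuli, responded):
    """Classify one-back responses as hits, misses, and false alarms.

    stimuli   -- stimulus identifiers in presentation order
    responded -- booleans: True if a button press occurred before the
                 next stimulus was presented (assumed trial summary)
    """
    hits = misses = false_alarms = 0
    for i, pressed in enumerate(responded):
        # A repetition is a stimulus identical to its direct predecessor.
        is_repeat = i > 0 and stimuli[i] == stimuli[i - 1]
        if is_repeat and pressed:
            hits += 1
        elif is_repeat:
            misses += 1          # repetition without a response
        elif pressed:
            false_alarms += 1    # response to a non-repeated item
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms}
```

Hit and false-alarm rates then follow by dividing the counts by the number of repetitions and non-repetitions, respectively.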

Auditory stimuli were spoken by a male voice and presented via MR-compatible headphones. Visual stimuli were presented in yellow on a dark grey background. The visual display was projected by a video beamer (located outside the scanner room) on a semi-transparent screen, and viewed by the participants via a mirror mounted above their heads. An MR-compatible button box was used for the participants to respond. Projection and timing of the stimuli, as well as the recording of responses, were controlled by Presentation (Neurobehavioral Systems Inc., Albany, CA, USA).

Presentation of the items was divided into two functional imaging runs, with each run containing both visual and auditory stimuli as well as 20 null-events during which a fixation cross was presented in the middle of the screen. A fast event-related design was used to investigate the hemodynamic response to the different types of stimuli. The order of items and null-events within each run was determined by a genetic algorithm (Wager and Nichols, 2003). Stimulus onset asynchrony (SOA) was 3800 ms. Visual stimuli were presented for 820 ms and the average duration of the auditory words was 808 ms, ranging from 600 to 1150 ms. During presentation of auditory stimuli, as well as during the interstimulus intervals, a fixation cross was present. The fact that the SOA is not a multiple of the used TR (2000 ms) enhances the efficiency of sampling the hemodynamic response at different time points. The total duration of the functional session was approximately 20 min (10 min per run).
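The benefit of an SOA that is not a multiple of the TR can be illustrated with a short calculation. The sketch below assumes that every event (including null-events) occupies exactly one SOA slot and that the first onset coincides with the start of a volume; both are simplifying assumptions, and the function name is illustrative.

```python
def sampling_offsets(soa_ms, tr_ms, n_events):
    """Distinct onset times relative to the start of the current volume."""
    return sorted({(k * soa_ms) % tr_ms for k in range(n_events)})


# With SOA = 3800 ms and TR = 2000 ms, onsets fall at ten different
# phases of the TR (every 200 ms), so the response is sampled densely.
jittered = sampling_offsets(3800, 2000, 40)

# With an SOA that is a multiple of the TR, every onset has the same phase.
locked = sampling_offsets(4000, 2000, 40)
```

Under these assumptions the effective sampling grid of the hemodynamic response is 200 ms rather than the nominal 2000 ms.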

#### **IMAGE ACQUISITION AND ANALYSIS**

A 3-Tesla TRIO TIM Scanner (Siemens, Erlangen, Germany) was used for both functional and anatomical MR imaging. For the functional runs, images sensitive to blood oxygenation level dependent (BOLD) contrast were acquired with a T2\*-weighted echo-planar imaging (EPI) sequence using a 32-channel head coil. Each run consisted of 309 functional images (flip angle = 70°, *TR* = 2000 ms, *TE* = 30 ms, FOV = 210 mm, 64 × 64 matrix). Thirty-six descending axial slices (thickness = 3.0 mm; inter-slice gap = 0.3 mm) were acquired. In addition, for each subject a high-resolution structural scan was acquired with a T1-weighted MPRAGE sequence. The resolution of the structural image was 1 × 1 × 1.2 mm.
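As a quick consistency check, the acquisition parameters above imply a run duration, in-plane voxel size, and slab coverage that can be derived directly. The helper below is only an arithmetic sketch; its name and the assumption that the gap applies between each pair of adjacent slices are ours.

```python
def acquisition_summary(n_volumes, tr_ms, fov_mm, matrix,
                        n_slices, thickness_mm, gap_mm):
    """Derive basic quantities from the reported EPI parameters."""
    return {
        # Total acquisition time of one run in minutes.
        "run_duration_min": n_volumes * tr_ms / 1000.0 / 60.0,
        # In-plane voxel edge length (field of view divided by matrix size).
        "in_plane_voxel_mm": fov_mm / matrix,
        # Slab extent: all slices plus the gaps between adjacent slices.
        "slab_coverage_mm": n_slices * thickness_mm + (n_slices - 1) * gap_mm,
    }


summary = acquisition_summary(309, 2000, 210, 64, 36, 3.0, 0.3)
```

The resulting run duration of roughly 10.3 min agrees with the approximately 10 min per run reported for the functional session.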

Preprocessing and statistical data analysis were performed using SPM8 software (http://www.fil.ion.ucl.ac.uk/spm) running in a MATLAB 7.6 environment (Mathworks Inc., Sherborn, MA, USA). Functional images were realigned, unwarped, and then co-registered to the high-resolution structural image. The structural image was normalized to the MNI T1 template image (using SPM's segmentation procedure). The resulting parameters were used for normalization of the functional images, which were resampled to isotropic 3 × 3 × 3 mm voxels and smoothed with a 6 mm full width at half maximum (FWHM) Gaussian kernel.
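For readers more used to thinking of Gaussian kernels in terms of their standard deviation, the FWHM can be converted with the standard relation FWHM = σ·√(8 ln 2). The snippet below is a small worked example of that conversion, not part of the original analysis pipeline.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm / math.sqrt(8.0 * math.log(2.0))


sigma_mm = fwhm_to_sigma(6.0)    # about 2.55 mm
sigma_voxels = sigma_mm / 3.0    # about 0.85 voxels on the 3 mm grid
```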

The functional data were high-pass filtered with a cut-off of 128 s, as removing frequencies below 1/128 Hz reduces low frequency drifts. For correction of temporal autocorrelations an AR(1) model (Friston et al., 2002), as implemented in SPM8, was used. Statistical analysis was performed within a two stage mixed effects model. On the individual level, the parameter estimates reflecting signal change for each item type vs. rest (= null events and ISIs) were calculated. Item repetitions were separately modeled as regressors of no interest. Additionally, six covariates corresponding to the movement parameters (rotations and translations) were included. The subject-specific contrast images were used for the second level (random effects) analysis, which allows generalization to the population. For all statistical comparisons we used a voxel-wise threshold of *p* < 0.001, and a cluster extent threshold of *p* < 0.05, corrected for family-wise error (FWE).
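One common way to realize such a high-pass filter is to regress out a set of slow discrete cosine functions. The sketch below illustrates this idea in NumPy; it is not the SPM8 code, and the rule used for the number of cosine regressors, as well as all function names, are assumptions made for illustration.

```python
import numpy as np


def n_drift_regressors(n_scans, tr_s, cutoff_s):
    """Number of cosine regressors whose period is at least the cut-off.

    Assumed rule: floor(2 * run_length / cutoff + 1) basis functions,
    minus the constant term (left to the model intercept).
    """
    return int(np.floor(2.0 * n_scans * tr_s / cutoff_s + 1.0)) - 1


def dct_highpass(signal, tr_s, cutoff_s):
    """Remove slow drifts by least-squares regression on cosine functions."""
    y = np.asarray(signal, dtype=float)
    n = y.size
    k = n_drift_regressors(n, tr_s, cutoff_s)
    if k < 1:
        return y.copy()
    t = np.arange(n)
    # Column j is a discrete cosine with period 2 * n * TR / j seconds.
    basis = np.column_stack(
        [np.cos(np.pi * (2 * t + 1) * j / (2 * n)) for j in range(1, k + 1)]
    )
    beta, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return y - basis @ beta
```

For a run of 309 volumes at TR = 2 s and a 128 s cut-off, this rule yields nine drift regressors, the slowest spanning half a cosine cycle over the whole run.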

# **RESULTS**

#### **BEHAVIORAL**

Response latencies for the detected repetitions in the one-back task (see **Table 1**) were entered in an ANOVA with the factors modality (visual and auditory) and stimulus type (words, pseudowords, and artificial). **Table 1** shows that there were longer response latencies for the auditory than the visual stimuli, *F*(1, 28) = 42.73, *p* < 0.05. This is, however, not surprising since participants had to wait until the auditory stimulus presentation was completed in order to make a correct decision. Stimulus type had no reliable effect, *F*(2, 56) = 1.48, *p* = 0.24, and did not interact with modality, *F*(2, 56) < 1.

Task accuracy was generally high as shown by the mean hit and false-alarm rates in **Table 1**. Across stimulus types hit rates for the visual stimuli were on average lower than those for the auditory stimuli, *F*(1, 28) = 4.30, *p* < 0.05. There was neither a reliable effect of stimulus type, *F*(2, 56) = 1.07, *p* = 0.35, nor an interaction between the factors, *F*(2, 56) < 1. However, as shown in **Table 1**, for the visual artificial stimuli the mean hit rate was lower than the rate for the other stimuli (but still at about 80%). In addition, false-alarm rates were generally very low, only the words (visual and auditory) had a slightly increased rate.
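The degrees of freedom reported for these tests follow from the design. The sketch below assumes a fully within-subject 2 × 3 ANOVA including all 29 participants and no sphericity correction; the function name is illustrative.

```python
def rm_anova_df(n_subjects, levels_a, levels_b):
    """(effect df, error df) for a two-way repeated-measures ANOVA."""
    s = n_subjects - 1
    a = levels_a - 1
    b = levels_b - 1
    return {
        "A": (a, a * s),              # e.g., modality
        "B": (b, b * s),              # e.g., stimulus type
        "AxB": (a * b, a * b * s),    # interaction
    }


df = rm_anova_df(n_subjects=29, levels_a=2, levels_b=3)
```

Under these assumptions the modality effect is tested on (1, 28) and both the stimulus-type effect and the interaction on (2, 56) degrees of freedom, matching the values given in the text.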

#### **fMRI**

Given our theoretical focus on ventral visual regions, especially left vOT, we limited our main fMRI analyses to anatomical regions involved in ventral visual stream processing: inferior occipital, inferior temporal, and fusiform gyrus of both hemispheres as defined by automatic anatomical labeling (AAL; Tzourio-Mazoyer et al., 2002). Results of additional whole-brain analyses are provided in the Supplementary Materials.

**Table 1 | Means (standard deviations) of response latencies and accuracy measures (hit- and false-alarm rates) in the one-back task.**

#### *Visual stimuli*

Initial contrasts against rest baseline showed that all three types of visual stimuli elicited widespread positive activations throughout ventral occipital and temporal regions (see upper row of **Figure 1A**). Of main interest were the contrasts corresponding to the expected inverse effect of word-likeness. As mentioned in the Introduction, following Wang et al. (2011) we expected reduced activation for pseudowords compared to the artificial stimuli and, following our previous findings of orthographic familiarity effects (Kronbichler et al., 2004, 2007), we expected reduced activation for words compared to pseudowords.

The findings presented in **Table 2** and in the lower row of **Figure 1A** provide support for these expectations. The pseudowords *<* artificial contrast identified large bilateral clusters including occipital and OT regions with the right hemisphere cluster being more extended than the one in the left hemisphere. The maxima of the clusters were both in posterior inferior temporal regions (at around *y* = −60), but the decreased activation to pseudowords compared to artificial strings was already present in posterior occipital regions. Similar results were found for the words *<* artificial contrast. Furthermore, the contrast of words *<* pseudowords, different from the extended bilateral clusters identified by the previous contrasts, only identified a left vOT cluster. As shown in **Table 2**, the maximum of this cluster roughly corresponds to sites of previously found orthographic familiarity effects (Kronbichler et al., 2004, 2007). We did not identify any regions within our mask which exhibited positive word-likeness effects, that is, more activation for words or pseudowords compared to the artificial stimuli or more activation for words than pseudowords.

**FIGURE 1 | Brain activations for visual (A) and auditory (B) contrasts of interest.** The upper row depicts contrasts against rest baseline. Positive and negative activations are depicted in red and blue, respectively. The lower row depicts results of contrasting the stimulus types with each other. Activation maps are shown on the ventral surface of a rendered cortex. All contrasts are thresholded at *p* < 0.001 voxel-wise with a cluster extent threshold of *p* < 0.05 (FWE corrected). LH, left hemisphere; RH, right hemisphere.

**Table 2 | Visual stimuli: brain regions showing inverse word-likeness effects.**

*Peak Region is based on AAL (Tzourio-Mazoyer et al., 2002); k: cluster extent in voxels.*

**Table 3 | Auditory stimuli: brain regions showing positive word-likeness effects.**

*Peak Region is based on AAL (Tzourio-Mazoyer et al., 2002); k: cluster extent in voxels.*

#### *Auditory stimuli*

Following the cross-modal sensory suppression phenomenon (Laurienti et al., 2002), we expected negative activation (compared to rest baseline) in response to auditory stimuli in ventral occipital and temporal regions assumed to be primarily engaged by visual processing. Additionally, following Yoncheva et al. (2010), there should be a release of deactivation for auditory words in a left vOT region corresponding to the VWFA.

Contrasts against rest baseline identified several clusters within ventral visual regions exhibiting negative activation in response to the auditory stimulus types (see upper row of **Figure 1B**). No cluster was identified with positive activation. Roughly similar clusters with deactivation to all auditory stimuli were found bilaterally in posterior occipital (at around *y* = −80) and medial anterior OT regions (at around *y* = −45) as well as in a right middle OT region (at around *y* = −64). In line with our expectations, in left middle OT, only the auditory artificial stimuli resulted in deactivation (at around *y* = −67) whereas no such deactivation was found for words and pseudowords. The contrasts between stimulus types (see **Table 3** and lower row of **Figure 1B**) further confirmed the positive word-likeness effects in left middle OT as words and pseudowords elicited significantly more activation than the artificial stimuli. No clusters were identified with higher activation for the artificial stimuli compared to pseudowords or words and no activation differences were found between words and pseudowords.

#### *Region of interest analysis*

For more information on changes of activation and deactivation patterns along the ventral visual stream, we relied on Region of Interest (ROI) Analysis. We selected five regions along the left ventral stream based on activation maxima of an "effects of interest" contrast (i.e., comparing all stimulus types against rest). The selected maxima approximately matched ROI locations used in Vinckier et al. (2007) and Wang et al. (2011) including one at *y* = −58 that matched classic VWFA coordinates (Cohen et al., 2000, 2002). Furthermore, to examine hemisphere differences, we selected five right hemispheric maxima, which corresponded to the left hemisphere ones on the *y*-axis. Spherical ROIs with a radius of 4 mm were built around the selected maxima (see **Figure 2** for approximate ROI locations). For all ROIs, mean brain activity estimates (given in arbitrary units) were extracted. For statistical analyses, activation levels of stimulus types were compared using paired *t*-tests (*p* < 0.01).
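To give a sense of the size of such ROIs, the sketch below builds a 4 mm sphere on a 3 mm isotropic grid and averages the values inside it. It assumes that the sphere is centred exactly on a voxel centre and that extraction is done on the resampled 3 mm images; the function names are hypothetical and the toolbox actually used is not specified in the text.

```python
import numpy as np


def sphere_offsets(radius_mm, voxel_mm):
    """Voxel index offsets whose centres lie within radius_mm of the centre."""
    r = int(np.floor(radius_mm / voxel_mm))
    grid = np.arange(-r, r + 1)
    i, j, k = np.meshgrid(grid, grid, grid, indexing="ij")
    offsets = np.column_stack([i.ravel(), j.ravel(), k.ravel()])
    dist_mm = np.linalg.norm(offsets * voxel_mm, axis=1)
    return offsets[dist_mm <= radius_mm]


def roi_mean(volume, center_ijk, offsets):
    """Mean of a 3D array over the voxels of a spherical ROI."""
    idx = np.asarray(center_ijk) + offsets
    return float(volume[idx[:, 0], idx[:, 1], idx[:, 2]].mean())


offsets = sphere_offsets(radius_mm=4.0, voxel_mm=3.0)
```

Under these assumptions each ROI comprises seven voxels: the peak voxel and its six face neighbours.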

As evident from **Figure 2**, there were varying activation patterns for the visual stimulus types along the left ventral stream. In the left occipital ROI (at *y* = −94), there was no difference between activation levels for the three visual stimulus types. This changed more anteriorly as in the posterior OT ROI (*y* = −67) words and pseudowords led to decreased activation compared to the artificial stimuli without differing from each other. The middle OT ROI at *y* = −58 (roughly corresponding to the classic VWFA) was the only region exhibiting an inverse visual word-likeness effect on activation levels in the form of words *<* pseudowords *<* artificial stimuli. This inverse word-likeness pattern was no longer observed in the most anterior left hemisphere ROI because activation for the artificial stimuli was much reduced (close to rest baseline). Here, activation for pseudowords was increased compared to both words and the artificial stimuli.

In the posterior ROIs of the right ventral stream, activation levels for the visual stimulus types were similar to those of the corresponding left ROIs, that is, no differences between all three stimulus types in the occipital ROI and reduced activation to words and pseudowords compared to the artificial stimuli in the posterior OT ROI. However, activation patterns differed in the two anterior right ROIs as, in contrast to the left anterior ROIs, there was no differentiation between words and pseudowords and no increased activation in response to pseudowords compared to the artificial stimuli.

**Figure 2** shows that activation levels of the auditory stimuli were less differentiated than those of the visual stimuli. In the posterior ROIs of the left hemisphere, all stimulus types led to similar levels of deactivation. In the ROI corresponding to the VWFA, however, there was a positive word-likeness effect since deactivation was only present for the artificial stimuli but not for words or pseudowords which elicited reliably higher activation. This difference between words/pseudowords and the artificial stimuli was also present in the most anterior left ROI but here even the artificial stimuli did not elicit deactivation. The main finding with respect to hemisphere differences was that in the right hemispheric ROI, homologous to the VWFA, the reduced deactivation to words and pseudowords compared to the artificial stimuli was not found.

#### *Conjunction analysis*

Next, we investigated the hypothesis that the observed positive word-likeness effects for the auditory stimuli (i.e., words and pseudowords *>* artificial) reflect activation of visual orthographic information associated with the auditory words and pseudowords. Therefore, we tested whether the auditory effects coincide with the visual words *<* pseudowords effect. Since there was no activation difference between auditory words and pseudowords in the previous analyses, we collapsed the data and computed an auditory words/pseudowords *>* artificial contrast. The resulting activation cluster had a peak at MNI coordinates [−45 −58 −11] (*t* = 4.78; cluster extent = 105 voxels). A conjunction analysis (Friston et al., 2005; Nichols et al., 2005) showed a substantial overlap between the auditory words/pseudowords *>* artificial and the visual words *<* pseudowords cluster with a peak at [−45 −58 −11] (*t* = 4.53) and a cluster extent of 44 voxels (see **Figure 3**).
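As an illustration of the conjunction logic, the following minimal Python sketch applies a minimum-statistic test to two flattened *t*-maps: a voxel counts as part of the conjunction only if the smaller of its two *t*-values exceeds the threshold. This is a sketch under assumptions, not the procedure of the toolbox used in the study; the function name and the example values are hypothetical, and cluster-level correction is not modeled.

```python
def min_stat_conjunction(t_map_a, t_map_b, threshold):
    """Indices of voxels whose t-values exceed the threshold in both maps.

    t_map_a and t_map_b are equally long sequences of voxel-wise t-values,
    e.g. for an auditory contrast and a visual contrast (hypothetical inputs).
    """
    return [
        index
        for index, (t_a, t_b) in enumerate(zip(t_map_a, t_map_b))
        if min(t_a, t_b) > threshold
    ]


# Hypothetical t-values for four voxels in each contrast.
auditory = [5.0, 2.0, 4.6, 3.9]
visual = [4.5, 6.0, 1.0, 4.0]
overlap = min_stat_conjunction(auditory, visual, threshold=3.5)
```

The length of `overlap` corresponds to the extent of the shared cluster in voxels, before any cluster-extent thresholding.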

**Figure 3 | Overlap between the auditory words/pseudowords > artificial effect and the visual words < pseudowords effect (*p* < 0.001 voxel-wise threshold with a cluster extent threshold of *p* < 0.05, FWE corrected).** The activation cluster is superimposed on the ventral surface of a standard brain template.

#### **DISCUSSION**

#### **INVERSE EFFECT OF WORD-LIKENESS OF VISUAL STIMULI ON LEFT vOT ACTIVATION**

We identified regionally specific differentiations between our three types of visual stimuli (i.e., words, pseudowords and artificial stimuli) along the ventral visual stream (most evident from the ROI analysis). In bilateral occipital regions (*y* = −94), all three stimulus types elicited similarly high activation. A first differentiation was found bilaterally in posterior OT ROIs (*y* = −67) where both words and pseudowords elicited decreased activation compared to the artificial stimuli and did not differ from each other. In the right hemisphere this general pattern (although with decreasing activation levels) ranged to the anterior OT ROI (*y* = −46). A further differentiation was limited to the left hemisphere. The ROI at *y* = −58 exhibited decreased activation to words compared to pseudowords in addition to reduced activation to pseudowords compared to the artificial stimuli. Reduced activation to words compared to pseudowords was also observed in the most anterior left ROI (*y* = −46), but here, unlike in the posterior ROIs, activation to the artificial stimuli was close to baseline.

These findings differ from those of the Chinese-based study of Wang et al. (2011) who found an inverse word-likeness effect (i.e., real characters *<* pseudo-characters *<* artificial characters) in several left hemisphere ROIs from *y* = −80 to −48. Our corresponding finding (i.e., words *<* pseudowords *<* artificial) was regionally specific to middle OT at *y* = −58 corresponding to the VWFA. Possibly, the larger extension of the contrast between Chinese real characters and pseudo-characters compared to the present words and pseudowords is due to a difference in visual familiarity. For this, one may note that the number of alphabetic letters is quite limited compared to the number of phonetic and semantic components of the Chinese script. Consequently, alphabetic words and pseudowords are visually more similar (largely sharing the same letters) compared to the Chinese characters and pseudo-characters. Based on this, Liu et al. (2008) suggested that recognition of the fine-grained Chinese characters potentially not only relies on the VWFA but also recruits posterior occipital regions bilaterally.

Importantly, the finding of reduced activation for words compared to pseudowords in a region corresponding to the VWFA is in line with the position put forward by our group that the VWFA hosts orthographic whole-word recognition units which assimilate whole letter strings (e.g., Kronbichler et al., 2004, 2007) but is also able to code letter strings of pseudowords which require sequential coding. The most direct evidence for these differing modes of processing in the VWFA is the finding by Schurz et al. (2010) who found that word length (i.e., number of letters) had an effect on VWFA activation for pseudowords but not for words. Further strong evidence for whole-word coding in the VWFA is the finding of a priming study by Glezer et al. (2009) who showed that, for real words, the exchange of a single letter from prime to target (e.g., from COAT to BOAT) led to disappearance of the priming effect, whereas the corresponding manipulation for pseudowords (e.g., from SOAT to FOAT) had little effect. More direct support for the hypothesis that the VWFA hosts orthographic representations comes from studies examining brain activation when participants have to retrieve spellings in response to auditory words (Purcell et al., 2011).

# **ACCOUNTING FOR THE OPPOSING EFFECTS OF VISUAL WORD-LIKENESS ON VWFA ACTIVATION IN THE LITERATURE**

The present finding of an inverse effect of visual word-likeness on VWFA activation is just the opposite of the finding of Vinckier et al. (2007) who—as discussed in the Introduction—found a positive word-likeness effect. In the literature, there are several findings consistent with the present results. As already discussed, Wang et al. (2011) had found an inverse word-likeness effect in a more extended region along the left visual stream. Xue et al. (2006) reported higher VWFA activation to unfamiliar Korean characters compared to familiar Chinese characters and Reinke et al. (2008) found higher VWFA activation for unfamiliar Hebrew words compared to known English words. However, similar to Vinckier et al., a recent study by Szwed et al. (2011) also reported decreased VWFA activation for unfamiliar scrambled words compared to intact words, although this effect was already present in early visual areas. Importantly, these patterns were only observed in the left but not in the right visual stream.

This striking discrepancy in results between studies may find an explanation in theoretically interesting procedural differences. In contrast to the present study (and the studies reporting similar results), Vinckier et al. and Szwed et al. relied on short and rapid presentation. While the present study presented the visual stimuli for 802 ms with an SOA of 3800 ms, the presentation time of Vinckier et al. was only 100 ms with an SOA of 300 ms. One may reason that both the positive (e.g., Vinckier et al.) and the inverse word-likeness effects (present study) may reflect the same neurofunctional source, that is, the operation of whole-word recognition units (Kronbichler et al., 2004). Specifically, the fast presentation rates of Vinckier et al. and Szwed et al. may have resulted in the neuronal equivalent of the behavioral "word superiority" effect (e.g., Reicher, 1969), that is, for the letter strings of the words reduced bottom-up stimulation was still sufficient to activate whole-word codes in the VWFA which may have provided backward activation to low-level coding in posterior regions, in the case of Szwed et al. even to V1/V2. Obviously, no whole-word activation was possible for the false-font strings or the scrambled patterns resulting in diminished activation compared to words in anterior regions. The slow stimulus presentation rate of the present study together with the one-back task may have been responsible for the opposite activation pattern, that is, the inverse word-likeness effect. Specifically, the prolonged stimulus presentation allowed rather detailed coding of the artificial strings in order to set up short-term memory representations for repetition recognition. Presumably, much less attention to the visual information was required for the letter strings of words which, in turn, provide access to the phonological words which can be easily retained for repetition recognition.

The discrepancy in the literature regarding VWFA activation for artificial visual strings also deserves a comment. To recapitulate, while Vinckier et al. and Szwed et al. found low activation levels for artificial or scrambled words, the present study, similar to other studies (e.g., Reinke et al., 2008), found high activation. In general, these opposing results speak for the position that VWFA activation strongly depends on task setting and procedural characteristics (e.g., Starrfelt and Gerlach, 2007; Mano et al., 2013). Specifically, the high VWFA activation observed in the present study may reflect the high effort invested in encoding unfamiliar complex visual stimuli and setting up memory representations for subsequent recognition as required by the one-back task.

One may also note that the high engagement of the VWFA for processing of the artificial strings is unexpected from the VWFA hypothesis which assumes that the region is functionally specialized for the coding of orthographic stimuli (Dehaene and Cohen, 2011). However, our findings are in line with the "neuronal recycling" hypothesis of Dehaene and Cohen (2007) which assumes that the functionally specialized VWFA emerges at a cortical location that is optimally suited for the demands of visual word processing as it is specifically tuned to dense line patterns (contours, junctions as in T L K). This expertise originally evolved for visual object recognition and is "recycled" in the course of learning to read for processing script-type line patterns. In this line of reasoning, the high VWFA response to unfamiliar word-like visual patterns reflects neuronal assemblies which are critically engaged in the early phase of learning to read. We suggest that, if required, the VWFA in adult readers with a long reading history can still function as a coding system adequate for encoding and short-term representation of unfamiliar visual configurations similar to words, i.e., false-fonts.

#### **POSITIVE EFFECTS OF WORD-LIKENESS OF AUDITORY STIMULI ON LEFT vOT ACTIVATION**

Another main finding of the present study was the difference in activation patterns for the auditory stimulus types in ventral visual regions. Our expectation of general deactivations was based on the hypothesis of cross-modal sensory suppression (Laurienti et al., 2002) which states that regions primarily engaged by visual processes should exhibit negative activation (compared to rest baseline) in response to demanding auditory stimuli. In line with this expectation, deactivation to all auditory stimuli was found from posterior occipital (*y* = −94) to middle OT regions (*y* = −58) in the right hemisphere and from occipital to posterior OT regions (*y* = −67) in the left hemisphere. However, a regionally specific activation pattern emerged in a left middle OT region (*y* = −58) roughly corresponding to the VWFA. Here, marked deactivation was only present for the auditory artificial stimuli but not for words and pseudowords. This VWFA response pattern is in line with previous findings of Yoncheva et al. (2010) who had found deactivation only in response to pairs of tone-triplets but not for pairs of spoken words. The correspondence of the present finding with Yoncheva et al. is remarkable as the present auditory stimuli and our task differed markedly from Yoncheva et al. A main new finding of the present study is the substantial topographical overlap between the positive word-likeness effect for the auditory stimuli and the visual orthographic familiarity effect (i.e., words *<* pseudowords) in a cluster at *y* = −58. This overlap is expected when auditory words and pseudowords trigger orthographic information.

# **MODALITY-SPECIFICITY**

As mentioned in the Introduction, our findings may also contribute to the question whether the VWFA—as suggested by its name—is primarily engaged by visual processes. This assumption was questioned by Price and Devlin (2003) who characterized the region as "polymodal" and more recently by Reich et al. (2011) who characterized it as "metamodal". For this issue, the activation of the middle OT cluster (at *y* = −58 and corresponding to the VWFA) in response to the visual and auditory artificial stimuli is of specific interest. First, the high activation in response to the visual artificial stimuli cannot be related to ("polymodal") language processes since these stimuli cannot activate phonology and/or meaning. Second, the deactivation in response to the auditory artificial stimuli can be taken to stand for the region's primarily visual role since the phenomenon of cross-modal sensory suppression (Laurienti et al., 2002) predicts that regions primarily engaged by visual processes should exhibit negative activation (compared to rest baseline) in response to demanding auditory stimuli. Taken together, these findings suggest that the VWFA belongs to brain regions primarily engaged by visual processes.

For the modality issue it is of interest that both the visual words *<* pseudowords and the auditory words/pseudowords *>* artificial effect extended to a more anterior region (*y* = −46). However, in this region, the high activation for the visual artificial and the deactivation for the auditory artificial stimuli (both present at *y* = −58) were no longer present with activation levels close to baseline. Apparently, this region is not engaged by visual orthographic processing, but is responsive to phonological processing demands. This area potentially corresponds to the lateral inferior multimodal area (Cohen et al., 2004) or a basal temporal language area (Luders et al., 1991).

# **CONCLUSION**

In the present fMRI study we investigated effects of word-likeness of visual and auditory stimuli on activation in ventral visual brain regions. In the context of a one-back task, we presented visual and auditory words, pseudowords, and artificial stimuli (false-fonts and reversed speech, respectively). The main findings were regionally specific effects of word-likeness in left vOT, in a region closely corresponding to the classic localization of the VWFA. More precisely, we observed an inverse word-likeness effect on activation for the visual stimuli (i.e., words *<* pseudowords *<* artificial stimuli) and positive word-likeness effects for the auditory stimuli (i.e., words and pseudowords *>* artificial stimuli). The latter resulted from a deactivation in response to the auditory artificial stimuli which was absent for words and pseudowords. We reason that the opposite effects of visual and auditory word-likeness on VWFA activation can be explained by the theoretical position that the VWFA hosts visual orthographic memory representations (Kronbichler et al., 2004, 2007). For the visual stimuli, these representations reduce the coding effort as a function of word-likeness. This results in highest activation for unfamiliar visual artificial stimuli, less activation for pseudowords and lowest activation for familiar words for which corresponding orthographic representations exist. For the auditory stimuli, higher activation for words and pseudowords compared to the artificial stimuli may result from activation of orthographic information associated with auditory words and pseudowords. The observed activation levels in response to the visual and auditory artificial stimuli also contribute to the dispute around the modality-specificity of the VWFA. First, we found high VWFA activation for the visual artificial stimuli. This cannot be explained by assuming that the VWFA is involved in general language processing since the artificial stimuli do not have phonological or semantic associations. Second, we found marked deactivation for the auditory artificial stimuli. According to the phenomenon of cross-modal sensory suppression (Laurienti et al., 2002) such deactivations during demanding auditory processing are found in visual regions. Taken together, these results speak for a primarily visual role of the VWFA.

#### **ACKNOWLEDGMENTS**

This work was supported by the Austrian Science Foundation Grant P18832-B02 to Heinz Wimmer; Philipp Ludersdorfer was supported by the Doctoral College "Imaging the Mind" of the Austrian Science Foundation (FWF-W1233).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Human\_Neuroscience/10.3389/fnhum.2013.00491/abstract

# **REFERENCES**

gyrus. *Neuroreport* 13, 321–325. doi: 10.1097/00001756-200203040-00015


T., et al. (2008). The visual word form area: Evidence from an fMRI study of implicit processing of chinese characters. *Neuroimage* 40, 1350–1361. doi: 10.1016/j.neuroimage.2007.10.014


*Brain Lang.* 86, 272–286. doi: 10.1016/S0093-934X(02)00544-8


*Neuroimage* 56, 330–344. doi: 10.1016/j.neuroimage.2011.01.073


likeness in a one-back task. *Neuroimage* 55, 1346–1356. doi: 10.1016/j.neuroimage.2010.12.062


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2013; accepted: 02 August 2013; published online: 29 August 2013.*

*Citation: Ludersdorfer P, Schurz M, Richlan F, Kronbichler M and Wimmer H (2013) Opposite effects of visual and auditory word-likeness on activity in the visual word form area. Front. Hum. Neurosci. 7:491. doi: 10.3389/fnhum.2013.00491*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Ludersdorfer, Schurz, Richlan, Kronbichler and Wimmer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# From regular text to artistic writing and artworks: Fourier statistics of images with low and high aesthetic appeal

#### *Tamara Melmer<sup>1</sup>, Seyed A. Amirshahi<sup>1,2</sup>, Michael Koch<sup>1,2</sup>, Joachim Denzler<sup>2</sup> and Christoph Redies<sup>1</sup>\**

*<sup>1</sup> Experimental Aesthetics Group, Institute of Anatomy I, University of Jena School of Medicine, Jena University Hospital, Jena, Germany*

*<sup>2</sup> Computer Vision Group, Department of Computer Science, Friedrich Schiller University, Jena, Germany*

#### *Edited by:*

*Mohamed L. Seghier, University College London, UK*

#### *Reviewed by:*

*Shozo Tobimatsu, Kyushu University, Japan Arnold J. Wilkins, University of Essex, UK*

#### *\*Correspondence:*

*Christoph Redies, Experimental Aesthetics Group, Institute of Anatomy I, Jena University Hospital, Teichgraben 7, D-07743 Jena, Germany. e-mail: redies@mti.uni-jena.de*

The spatial characteristics of letters and their influence on readability and letter identification have been intensely studied during the last decades. There have been few studies, however, on statistical image properties that reflect more global aspects of text, for example properties that may relate to its aesthetic appeal. It has been shown that natural scenes and a large variety of visual artworks possess a scale-invariant Fourier power spectrum that falls off linearly with increasing frequency in log-log plots. We asked whether images of text share this property. As expected, the Fourier spectrum of images of regular typed or handwritten text is highly anisotropic, i.e., the spectral image properties in vertical, horizontal, and oblique orientations differ. Moreover, the spatial frequency spectra of text images are not scale-invariant in any direction. The decline is shallower in the low-frequency part of the spectrum for text than for aesthetic artworks, whereas, in the high-frequency part, it is steeper. These results indicate that, in general, images of regular text contain less global structure (low spatial frequencies) relative to fine detail (high spatial frequencies) than images of aesthetic artworks. Moreover, we studied images of text with artistic claim (ornate print and calligraphy) and ornamental art. For some measures, these images assume average values intermediate between regular text and aesthetic artworks. Finally, to answer the question of whether the statistical properties measured by us are universal amongst humans or are subject to intercultural differences, we compared images from three different cultural backgrounds (Western, East Asian, and Arabic). Results for different categories (regular text, aesthetic writing, ornamental art, and fine art) were similar across cultures.

#### **Keywords: writing systems, calligraphy, ornamental art, abstract artworks, spatial frequency, scale invariance, experimental aesthetics**

#### **INTRODUCTION**

During the last decades, the spatial characteristics of letters and their influence on readability and letter identification have been studied in considerable detail (Pelli et al., 2006; Chung and Tjan, 2007; Tyler and Likova, 2007; Chung and Tjan, 2009). A particular focus has been on the spatial frequency components and spacing requirements that facilitate letter identification and improve readability and reading comfort (Solomon and Pelli, 1994; Majaj et al., 2002; Wilkins et al., 2007; Nandy and Tjan, 2008; Oruc and Landy, 2009; Jainta et al., 2010). The statistical image properties that relate to more global aspects of text images, for example properties that may relate to aesthetic aspects of writing systems, have received less attention (for an example, see Wilkins, 1995). Readability and aesthetics are two independent aspects of writing. Artistic writing with intricate ornaments or deformed letters may be highly aesthetic but it can sometimes be difficult to read, for example, Chinese cursive script. In contrast, ordinary (non-artistic) typographic writing is easy to read but may not necessarily be aesthetic.

It has recently been shown that a large variety of aesthetic and other visually pleasing images, including art images of Western and Eastern provenance (Graham and Field, 2007; Redies et al., 2007b; Graham and Redies, 2010) and graphic novels (Koch et al., 2010), exhibit specific statistical properties in their Fourier spectra: with increasing frequency, radially averaged (1d) power falls off according to a power law with a slope of around −2 in log-log plots, i.e., the power spectrum displays 1/*f*<sup>2</sup> characteristics. This property implies that subsets of visually pleasing images possess a scale-invariant structure in the Fourier domain. Aesthetic images share this property with complex natural scenes (Burton and Moorhead, 1987; Field, 1987; Tolhurst et al., 1992). Vice versa, images that deviate from natural scene statistics can induce visual discomfort (Fernandez and Wilkins, 2008; Juricevic et al., 2010; O'Hare and Hibbard, 2011). Because the mammalian visual system is adapted to process natural scenes with an efficient and sparse sensory code (Olshausen and Field, 1996; Parraga et al., 2000; Vinje and Gallant, 2000; Hoyer and Hyvärinen, 2002; Simoncelli, 2003), it has been proposed that artists create aesthetic images by adapting their artworks to this type of sensory coding in the human visual system (Redies, 2007; Redies et al., 2007a; Graham and Redies, 2010).
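The slope of a power spectrum in a log-log plot can be estimated by a least-squares line fit of log power against log frequency. The following minimal Python sketch illustrates this on a synthetic 1/*f*<sup>2</sup> spectrum; the function name `loglog_slope` and the frequency range are hypothetical, and the fitting details of the present study are not implied.

```python
import math


def loglog_slope(freqs, power):
    """Least-squares slope of log10(power) against log10(frequency)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


# Synthetic spectrum with exact 1/f^2 characteristics (illustrative values).
freqs = list(range(1, 257))
power = [1.0 / (f * f) for f in freqs]

slope_all = loglog_slope(freqs, power)
slope_low = loglog_slope(freqs[:32], power[:32])
slope_high = loglog_slope(freqs[32:], power[32:])
```

For a scale-invariant spectrum, the slope is the same in the low-frequency and the high-frequency part, so `slope_low` and `slope_high` agree with `slope_all`. A spectrum that is not scale-invariant would yield different slopes in the two ranges.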

It remains unclear, however, whether other types of visual patterns that are produced by humans for viewing in everyday life also possess 1/*f*<sup>2</sup> characteristics. Examples are visual patterns that are created without obvious aesthetic intent, such as regular text. In the present study, we therefore compared the statistical properties of regular text with a set of monochrome graphic art analyzed previously (Redies et al., 2007b). Although regular text is not necessarily aesthetic, it may also be adapted or optimized to particular aspects of visual perception, as suggested previously (Wilkins, 1995; Changizi and Shimojo, 2005; Changizi et al., 2006; Jainta et al., 2010). It is therefore of interest to study the statistical properties of text images, not only with respect to local properties, such as the readability of individual letters and words (see above), but also in terms of the global appearance of text images, as suggested previously in a preliminary study on the Fourier spectrum of two examples of Japanese calligraphy and regular print (Ozawa, 1994).

The action of reading is likely to differ from viewing artworks because reading typographic text encompasses the deciphering of a linear code with semantic content (for example, reading line by line from left to right, or top to bottom). In contrast, viewing artworks is much less constrained and the composition of artworks allows the free exploration of global image structure, as shown by eye tracking studies (Wooding et al., 2002; Quiroga and Pedreira, 2011). The differences in viewing strategies likely correspond to differences in statistical image properties. For example, subsets of aesthetic art images were shown to have Fourier spectral properties that are rather uniformly distributed across image orientations (i.e., low anisotropy; Koch et al., 2010). In contrast, images of text are highly anisotropic due to the oriented structure of the lines of text. Moreover, the spatial frequency amplitude that corresponds to the distance between text lines can be expected to be high compared to other frequencies. In the present study of the Fourier spectra of text images, we therefore studied cardinal (horizontal and vertical) and oblique orientations separately to obtain 1d spectra, rather than radially averaging across all orientations together, as usually done when analyzing natural scenes and aesthetic images (see above).
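The anisotropy expected for text can be made concrete with a small Python sketch. It relies on the fact that the 2d Fourier transform along a cardinal frequency axis equals the 1d transform of the image's projection onto that axis (the column sums or the row sums). The function names and the striped test image, which stands in for evenly spaced lines of text, are hypothetical; oblique orientations would require a full 2d transform (for example `numpy.fft.fft2`) and are not covered here.

```python
import cmath


def dft_power(signal):
    """Power of a naive 1d discrete Fourier transform, frequencies 0..N/2."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            value * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, value in enumerate(signal)
        )
        power.append(abs(coeff) ** 2)
    return power


def cardinal_spectra(image):
    """Power along the horizontal and the vertical frequency axis of an image.

    image is a list of rows of gray values. Column sums give the spectrum
    along the horizontal frequency axis, row sums the vertical one.
    """
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return dft_power(col_sums), dft_power(row_sums)


# 16 x 16 image with horizontal stripes of period 4 pixels (mock text lines).
stripes = [[1 if r % 4 < 2 else 0 for _ in range(16)] for r in range(16)]
horizontal, vertical = cardinal_spectra(stripes)
peak = max(range(1, len(vertical)), key=vertical.__getitem__)
```

In this toy image, all non-zero-frequency power lies on the vertical frequency axis, with the peak at the frequency that corresponds to the spacing of the stripes (16 / 4 = 4 cycles per image), whereas the horizontal axis carries no power beyond the mean.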

To more closely define the differences in statistical properties between text images and aesthetic art images, we included image categories at the transition between regular text and aesthetic art. Specifically, we studied (1) images of artistic or aesthetic writing (ornamental writing, calligraphy), and (2) aesthetic images that are similar to text images in that they are composed of multiple, largely independent pictorial elements placed side-by-side (ornamental art and abstract expressionist art).

Note that in current experimental research on aesthetics, there is no universally accepted or independent measure for the degree to which an image is aesthetic or artistic. By using these terms for different categories of images in the present study, we rely on views that are commonly held by the general public. Our classifications may well be in conflict with the opinions of individual persons. For example, on the one hand, typographers might claim that the creation of typographic letters is a highly artistic endeavor whereas most of the general public will not regard images of regular printed text as artworks. On the other hand, individual viewers may consider highly acclaimed artworks, such as the drip paintings by Jackson Pollock, unaesthetic. This terminological uncertainty is reflected in the many views on art and aesthetics that abound in philosophy and art history. Consequently, the usage of these terms in the present study should be treated with caution.

Moreover, in view of the great variety of writing styles in different cultures, it was unclear whether any of the statistical properties measured in the present study are universal amongst humans (Changizi and Shimojo, 2005) or are subject to intercultural differences. We therefore compared images from three different cultural backgrounds (Western, East Asian, and Arabic).

Results from the present study contribute to the knowledge on the relationship between the statistical properties of text images and their perceptual processing by the human visual system. By comparing text images to other types of images produced by humans, the present findings contribute also to our understanding of what makes text images special to the human brain.

# **MATERIALS AND METHODS**

#### **IMAGE DATA**

The data analyzed in this study include image databases of regular and ornate text of diverse cultures of writing, calligraphy, artistic ornaments of three cultural backgrounds (Western, Arabic, and East Asian), and East Asian and Western fine art (**Table 1**). A total of 1611 images were analyzed.

#### *Regular print, handwriting, ornate print, and calligraphy*

For the analysis of text images, we scanned the largest possible square section comprising eight lines of monochrome original text. Care was taken to select original print samples reproduced at a high quality and at a size that was sufficiently large. Scanning was performed at a high resolution (400 dpi) in 8-bit gray scale with a scanner (Perfection 3200 Photo, Epson, Nagano, Japan) that was calibrated as described previously (Redies et al., 2007a). Subsequently, the resolution was reduced by resizing each image to 1024 × 1024 pixels. For each category of text, different original documents were used. The number of sections taken from one original document ranged from about 1 to 4.

For Latin serif and sans serif fonts, a sample text was set in 77 serif fonts and in 60 sans serif fonts with the Photoshop program (Adobe, Mountain View, CA). For international serif fonts, 119 examples of the same text from different writing systems were generated with the Photoshop program, including samples from Europe (Latin, Georgian, Cyrillic, Greek), the Middle East (Hebrew, Arabic), North America (Cherokee), North India (Devanagari, Gujarati, Nagari, Oriya), South India (Sinhala, Tamil, Telugu), South East Asia (Laotian, Khmer, Thai), Africa (Ethiopian), and the Far East (Chinese, Japanese, Korean). By the same method, examples of ornate print (117 Latin samples, 13 Chinese samples, 80 Arabic samples) were generated.

To assess differences within one font, we generated 30 samples each of different text passages that were set in Times New Roman font (Latin serif), Arial font (Latin sans serif), and a Georgian font (serif font), respectively. Moreover, we analyzed the digital images directly, without printing and scanning. As a control, we compared the digital images with the same images printed on paper and scanned as described above. The differences between the two types of images were small (data not shown).

For Latin and Carolingian handwriting, examples were scanned from two books on these subjects (Menz, 1912; Klemm, 1998). We also gathered 39 different examples of Arabic calligraphy by scanning three different books on the subject. The background of the images was rendered white by subtracting it in the Photoshop program. As a control, we also analyzed the original scans of the same images. Results reveal differences in the low-frequency range, most likely caused by the paper structure. However, these differences have only a minor influence on the slopes (data not shown).

#### **Table 1 | Slope values for art images and text images.**

From reproductions in various textbooks on East Asian calligraphy, 92 examples of Chinese calligraphy representing different styles and periods were scanned. Some of the images showing eight lines of text were not square because the original artworks were smaller in the other direction (fewer columns of text). Before their reduction to 1024 × 1024 pixels, these images were padded to a square format with the Matlab program by adding a uniform border with a gray level that was equal to the mean gray level of the scanned image, as described previously (Redies et al., 2007a).

#### *Ornaments*

For Western grotesque ornaments, 69 different samples were scanned from a textbook (Warncke, 1979), as described above. With a digital camera (Canon, Ixus 400), 78 photographs of East Asian ornaments were taken from samples of 17th and 18th century Chinese and Japanese porcelain (vases, pots, and dishes) that were on display at the Dresden Porcelain Collection in Dresden, Germany. Square details from the objects were analyzed (for an example, see **Figure 13A**). Photographs of Arabic ornaments (253 images) were taken with a digital camera (Canon EOS 500D) from interior and exterior wall reliefs of the Alhambra Palace complex in Granada, Spain. The palace represents an example of 14th century Moorish architecture (for an example, see **Figure 12A**).

#### *Western and East Asian fine art and abstract expressionism*

For Western art, a previously analyzed dataset of 200 examples of monochrome graphic art of Western provenance was used (Redies et al., 2007b). For East Asian (China, Japan, and Korea) fine art, 209 images of monochrome paintings were collected. Both datasets were scanned from diverse high-quality art books on the subject, as described above, and represented a large variety of graphic styles, subject matters, techniques, centuries, and artists. The largest possible square details from the artworks were analyzed. One hundred nineteen examples of monochrome Abstract Expressionist art by four artists (32 paintings by Jackson Pollock, 18 paintings by Jean Dubuffet, 59 paintings by Cy Twombly, and 10 paintings by Christian Dotremont) were scanned from art books. These images were padded to a square format, as described above. To render the images more similar to those of printed text, the background was subtracted from the images. The calculated slopes did not differ substantially between the original scanned images and the images after background subtraction (data not shown). As another control, we compared the largest possible details of the East Asian art images to padded versions of the same images. Again, differences were small (data not shown).

#### **IMAGE ANALYSIS**

#### *Radial averaging of Fourier power*

Image analysis was performed using Matlab. If required, images were resized to a resolution of 1024 × 1024 pixels by bicubic interpolation using the *imresize* function in Matlab. For each image, the power spectrum (amplitude squared) was obtained by using an efficient algorithm for computing the discrete Fourier transform (2d Fast Fourier Transform). The 2d Fourier power spectrum of each image (for example, see **Figure 1B**) was divided into eight equal sectors (**Figure 1C**). For each sector, the 2d spectrum was transformed to a 1d power spectrum by rotational averaging for each frequency (Redies et al., 2007b). Values were combined for (near-)horizontal orientations (sector 1 [0–22.5°] and sector 8 [157.5–180°]; blue in **Figure 1C**), (near-)vertical orientations (sectors 4, 5 [67.5–112.5°]; red in **Figure 1C**), and oblique orientations (sectors 2, 3 [22.5–67.5°] and sectors 6, 7 [112.5–157.5°]; green in **Figure 1C**). Power was then plotted for horizontal orientations (**Figure 1D**), oblique orientations (**Figure 1E**), and vertical orientations (**Figure 1F**) as a function of spatial frequency in the log-log plane.
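The sector-wise averaging described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the Matlab code used in the study: the function name `sector_power_spectra`, the rounding of radial frequency to the nearest integer cpi, and the convention of measuring the sector angle in the Fourier plane from the horizontal frequency axis are choices made for this example only.

```python
import numpy as np

def sector_power_spectra(img, max_freq=256):
    """Radially average the 2d Fourier power of a square gray-scale image,
    separately for horizontal, oblique, and vertical orientation ranges.
    Hypothetical helper; names and conventions are illustrative assumptions."""
    img = np.asarray(img, dtype=float)
    n = img.shape[0]
    assert img.shape == (n, n), "expects a square image"

    # Power spectrum (amplitude squared) with zero frequency at the center.
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2

    # Frequency coordinates in cycles per image (cpi).
    f = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / n))
    fx, fy = np.meshgrid(f, f)
    radius = np.rint(np.hypot(fx, fy)).astype(int)

    # Fold the angle into [0, 180) degrees and split it into eight
    # sectors of 22.5 degrees each (numbered 1 to 8).
    theta = np.degrees(np.arctan2(fy, fx)) % 180.0
    sector = np.minimum((theta // 22.5).astype(int), 7) + 1

    groups = {"horizontal": (1, 8), "oblique": (2, 3, 6, 7), "vertical": (4, 5)}
    freqs = np.arange(1, max_freq + 1)
    spectra = {}
    for name, sectors in groups.items():
        mask = np.isin(sector, sectors) & (radius >= 1) & (radius <= max_freq)
        sums = np.bincount(radius[mask], weights=power[mask],
                           minlength=max_freq + 1)
        counts = np.bincount(radius[mask], minlength=max_freq + 1)
        # Frequencies without any sample point in a sector group become NaN.
        with np.errstate(invalid="ignore", divide="ignore"):
            spectra[name] = (sums / counts)[1:]
    return freqs, spectra
```

Under this angle convention, an image that varies only from top to bottom (such as evenly spaced horizontal stripes) concentrates its power in the vertical group, which is the behavior the text describes for horizontal lines of text.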

For regular print, the resulting plots consisted of two roughly linear parts: a low-frequency part (range 5–40 cycles per image, cpi) with a shallow slope and a high-frequency part (range 40–256 cpi) with a steeper slope. This finding was similar for horizontal, oblique, and vertical orientations. As expected with images of eight horizontal lines of text, a prominent peak at 8 cpi was observed for vertical orientations (**Figure 1F**).

#### *Slope of 1d Fourier plots*

To measure the slopes of the curves in the two parts of the frequency spectrum, data points were binned at regular frequency intervals in the log-log plane and a least-squares fit of a line was performed separately for each of the two parts of the spectrum, as described previously (Redies et al., 2007b). Compared to the fitting to one continuous second-order polynomial function with three parameters, the fitting of two separate lines (with two intercepts and two slopes as parameters) allowed us to relate our present results more directly to previous slope measurements (Burton and Moorhead, 1987; Tolhurst et al., 1992; Graham and Field, 2007; Redies et al., 2007a,b; Koch et al., 2010).

For the high-frequency part, fitting was restricted to frequencies up to 256 cpi to minimize artifacts due to rectangular sampling and raster screen. For the low-frequency part, fitting was restricted to frequencies down to 5 cpi to avoid absent sample points for some orientations and to exclude information that is not of interest (artifacts due to uneven illumination and mean gray level). Moreover, for vertical orientations, values corresponding to the peak at around 8 cpi (7–9 cpi) were not included in the line fitting. As a measure of the goodness of the fit, we determined the mean deviation of the data points from the fitted lines (sigma in **Table 1**). **Table 1** lists the values as means for each image category [±1 standard deviation (SD)].

Because a characteristic difference between images of artworks and text seemed to be the change in the log-log plots of radially averaged Fourier power at around 40 cpi (see Results), we also calculated the difference between the slopes of the low-frequency and high-frequency parts.
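The two-segment line fit and the slope difference described above can be sketched as follows. This is an illustrative Python sketch under assumptions, not the original implementation: the function name `fit_two_segments`, the number of bins, and the use of the mean absolute residual as the deviation measure (sigma) are hypothetical choices. The slope difference is computed as low-frequency slope minus high-frequency slope, a sign convention inferred from the positive values reported for regular print.

```python
import numpy as np

def fit_two_segments(freqs, power, low=(5, 40), high=(40, 256),
                     exclude=None, n_bins=20):
    """Fit separate least-squares lines to the low- and high-frequency
    parts of a 1d power spectrum in the log-log plane.
    Hypothetical helper; bin count and sigma definition are assumptions."""
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    keep = (freqs > 0) & np.isfinite(power) & (power > 0)
    if exclude is not None:
        # For example exclude=(7, 9) to skip the text-line peak near 8 cpi.
        keep &= ~((freqs >= exclude[0]) & (freqs <= exclude[1]))
    f, log_p = freqs[keep], np.log10(power[keep])

    def fit_line(f_min, f_max):
        in_range = (f >= f_min) & (f <= f_max)
        x, y = np.log10(f[in_range]), log_p[in_range]
        # Bin the data at regular intervals along the log-frequency axis.
        edges = np.linspace(np.log10(f_min), np.log10(f_max), n_bins + 1)
        idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
        filled = np.unique(idx)
        xb = np.array([x[idx == b].mean() for b in filled])
        yb = np.array([y[idx == b].mean() for b in filled])
        slope, intercept = np.polyfit(xb, yb, 1)
        sigma = float(np.mean(np.abs(yb - (slope * xb + intercept))))
        return float(slope), float(intercept), sigma

    s_low, i_low, sig_low = fit_line(*low)
    s_high, i_high, sig_high = fit_line(*high)
    return {
        "slope_low": s_low, "intercept_low": i_low, "sigma_low": sig_low,
        "slope_high": s_high, "intercept_high": i_high, "sigma_high": sig_high,
        "slope_difference": s_low - s_high,
    }
```

For a spectrum that follows a single straight line through both ranges, the two fitted slopes coincide and the difference is 0. A shallow low-frequency part followed by a steep high-frequency part gives a positive difference.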

#### *Anisotropy*

As outlined in the Introduction section, the Fourier power spectrum of text images is likely to differ across orientations (anisotropy) because of the horizontal text lines (or vertical text lines in the case of Chinese writing). To analyze this anisotropy, we determined the average absolute difference between the power values for horizontal orientations (sectors 1 and 8 in **Figure 1C**) and vertical orientations (sectors 4 and 5 in **Figure 1C**) for each image. To calculate the difference, data were sampled at equal frequency intervals in the log-log plots of Fourier power for each image (see, e.g., **Figures 1D**–**F**). Differences were normalized to the mean power for vertical and horizontal orientations for each data point.
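One possible reading of this anisotropy measure is sketched below in Python. The text does not fully specify the computation, so several details are assumptions made for illustration: the function name `anisotropy`, the number of sample points, the use of linear interpolation in the log-log plane to obtain equally spaced samples, and the normalization of each absolute difference by the mean of the two (linear) power values.

```python
import numpy as np

def anisotropy(freqs, p_horizontal, p_vertical,
               f_min=5, f_max=40, n_samples=20):
    """Mean normalized absolute difference between horizontal and vertical
    power, sampled at equal intervals on the log-frequency axis.
    Hypothetical helper; expects ascending positive freqs and positive power."""
    log_f = np.log10(np.asarray(freqs, dtype=float))
    sample_log_f = np.linspace(np.log10(f_min), np.log10(f_max), n_samples)

    # Interpolate log power at equally spaced log-frequencies.
    ph = 10.0 ** np.interp(sample_log_f, log_f, np.log10(p_horizontal))
    pv = 10.0 ** np.interp(sample_log_f, log_f, np.log10(p_vertical))

    # Normalize each difference by the mean of the two power values.
    return float(np.mean(np.abs(ph - pv) / (0.5 * (ph + pv))))
```

With this normalization the measure is 0 for identical horizontal and vertical spectra and approaches 2 when one orientation dominates completely. It can be evaluated separately for the low-frequency range (5–40 cpi) and the high-frequency range (40–256 cpi) by changing `f_min` and `f_max`.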

# **RESULTS**

**Figures 1**, **2** illustrate the type of Fourier analysis performed in the present study. As an example, the results for images of regular print (Times New Roman serif font, **Figure 1**; Arno Pro serif font, **Figure 2**) are shown. In the Fourier power spectrum (**Figure 1B**), low spatial frequencies are represented at the center and high frequencies at the periphery. Lighter shades represent more spectral power. For each frequency, power was radially averaged in sectors representing cardinal (horizontal and vertical) and oblique orientations (**Figure 1C**) and plotted as a function of spatial frequency in separate log-log plots (**Figures 1D**–**F**).

In contrast to similar plots for images of natural scenes or artworks (Burton and Moorhead, 1987; Field, 1987; Tolhurst et al., 1992; Graham and Field, 2007; Redies et al., 2007a,b), the plots for regular text can be roughly divided into two parts (**Figure 2A**). In the low-frequency part of the spectrum, the average curves for the cardinal and oblique orientations are shallower; the plot for vertical orientations (red curve in **Figure 2A**) contains a major peak at about 8 cpi that corresponds to the periodicity of text lines in the images, as expected. To visualize this low-frequency part of the spectrum for the reader, a bandpass-filtered (5–40 cpi) representation is displayed in **Figure 2B**; it shows a blurred version of the text image. In the high-frequency part, the curves are steeper and fall off linearly in the log-log plots. In the bandpass-filtered representation of the high-frequency part (40–256 cpi; **Figure 2C**), the outlines of all letters are sharply demarcated. The two parts of the curves form a transition at around 40 cpi for images with eight lines of text. In control images with four lines and 16 lines of Latin printed text, the transition is shifted to about 20 and 80 cpi, respectively (data not shown), suggesting that the transition point is found at a spatial frequency about five times higher than the frequency peak that reflects the number of lines.

**FIGURE 2 | Results for Latin printed text (Arno Pro font).** The plot in **(A)** shows an overlay of the plots for horizontal, oblique, and vertical orientations, as indicated (see **Figure 1C**). Bandpass-filtered images are displayed for 5–40 cpi in **(B)**, and for 40–256 cpi in **(C)**.
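Bandpass-filtered representations such as those in **Figure 2B** and **Figure 2C** can be approximated with a simple mask in the Fourier domain. The sketch below is an assumption-laden illustration in Python: the function name `bandpass_filter` is hypothetical, and the hard-edged annular mask is a choice made here, since the filter shape is not specified in the text.

```python
import numpy as np

def bandpass_filter(img, f_low, f_high):
    """Keep only spatial frequencies between f_low and f_high (in cycles
    per image) using a hard-edged annular mask in the Fourier domain.
    Hypothetical helper; the filter shape is an assumption."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    fy = np.fft.fftfreq(rows, d=1.0 / rows)   # vertical frequencies in cpi
    fx = np.fft.fftfreq(cols, d=1.0 / cols)   # horizontal frequencies in cpi
    radius = np.hypot(*np.meshgrid(fx, fy))   # radial frequency per coefficient
    mask = (radius >= f_low) & (radius <= f_high)
    # The mask is symmetric, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))
```

Calling `bandpass_filter(img, 5, 40)` would retain the coarse structure of the text lines, while `bandpass_filter(img, 40, 256)` would retain the fine detail of the letter outlines. A smooth mask (for example, one with Gaussian edges) would reduce ringing, at the cost of a less sharply defined frequency band.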

#### **ANISOTROPY**

**Figure 3** shows the mean difference between power values for vertical and horizontal orientations (anisotropy) for all image categories. Results are arranged with subjective artistic claim increasing from left to right, from regular print and handwriting to ornate print, ornamental art, calligraphy, and artworks. As a word of caution, however, we note that a concept like artistic claim is difficult to quantify and may be subject to various philosophical and art historical considerations (see Introduction). Results are presented separately for the low-frequency part (**Figure 3A**) and the high-frequency part of the spectrum (**Figure 3B**).

As expected, anisotropy values are high for regular print in both parts of the spectrum. Similarly high values are obtained for ornate print and for Carolingian handwriting that resembles regular print in its uniform stroke width and regular letter alignment. Values are lower for fine art (*p <* 0*.*001), confirming previous results (Koch et al., 2010), and for East Asian ornaments (porcelain decorations) and Arabic ornaments (wall decorations) (*p <* 0*.*001). The significance of the differences between image categories was determined by the Tukey range test throughout this work. For the low-frequency part of the spectrum (**Figure 3A**), values for Latin handwriting, calligraphy, and grotesque ornaments are intermediate between artworks and regular print (*p <* 0*.*001). For the high-frequency part (**Figure 3B**), values for these image categories are about as low as, or higher than, those of fine art.

#### **SLOPE MEASUREMENTS**

To quantify the steepness of the curve in the low-frequency part (5–40 cpi) and in the high-frequency part (40–256 cpi), we calculated the slopes of straight lines that were fitted to the curves in the two ranges for each image (for examples, see **Figures 1D**–**F**). We also measured the differences between the slopes of the high-frequency and low-frequency parts (see Materials and Methods section). For a continuous straight line through both ranges, this difference assumes a value of 0. For each category of images, the two slopes and their difference are listed in **Table 1** for the two cardinal (vertical and horizontal) and the oblique orientations. Moreover, to assess how well these straight lines fitted the curves, the deviation (sigma) of the curves from the fitted lines was also determined. In the following sections, we describe the results for each of the image categories that were analyzed in the present study.

#### *General overview*

As in **Figure 3**, results are arranged with artistic claim increasing from left to right in **Figure 4**. In this direction, the slope in the low-frequency part (5–40 cpi; **Figures 4A**–**C**) becomes more negative and approaches values between around −2 and −2.5. The slope in the high-frequency part (40–256 cpi; **Figures 4D**–**F**) becomes less negative and approaches similar values. As a result, the difference between the two slopes decreases with increasing artistic claim (**Figures 4G**–**I**) and approaches values of about 0. These general tendencies are similar for all three orientation ranges.

#### *Regular print*

To study whether different categories of regular printed text result in similar graphs, we carried out the same type of analysis for multiple Latin fonts (serif and sans serif) as well as examples of regular print of other provenances (Arabic, Chinese, and other international fonts). An example of each type of font is displayed in **Figures 5A**,**D**,**G**. Averaged curves for all fonts within one category (one example for each font) are shown in **Figures 5B**,**E**,**H** and averaged curves for 30 examples of one font are displayed in **Figures 5C**,**F**,**I**. The within-font variance was similar to or less than the between-font variance (data not shown).

For all plots in **Figure 5**, the steepness of the curve changes at around 40 cpi, similar to the example shown in **Figure 1**. The mean slopes for the low-frequency part (−0.78 to 0.19) indicate that power is constant or falls less strongly with increasing frequency than for the high-frequency part (mean slopes between −3.71 and −2.91). The slope difference assumes mean values between 2.23 and 3.81 (**Table 1**; **Figure 4**).

#### *Artworks*

To quantify the expected difference between regular print and artworks, we carried out the same type of analysis for two different datasets of artwork images, namely 200 examples of European graphic fine art (Redies et al., 2007b; Graham and Field, 2008), and 209 examples of East Asian monochrome paintings and prints. **Figures 6A**, **7A** show examples of the original images analyzed, together with their 2d Fourier power spectra (**Figures 6B**, **7B**). With increasing spatial frequency, power falls nearly linearly according to a power law (1/*f*<sup>2</sup> characteristics) for all orientations (**Figures 6C**, **7C**). The bandpass-filtered images that correspond to the image in **Figure 6A** are shown in **Figures 6D**,**E**. Confirming previous studies (Redies et al., 2007b; Graham and Field, 2008), there is no abrupt transition at 40 cpi between the slopes of the low-frequency part and the high-frequency part.

The results listed in **Table 1** confirm that the slopes of art images are more negative (*p <* 0*.*001; Tukey range test) than those of images of regular print and assume average values around −2 for the low-frequency part of the spectrum (range −1.76 to −2.20). In the high-frequency part, the slope is less negative (*p <* 0*.*001) with average values between −1.86 and −2.67, depending on the orientation. The slope difference is around 0 (i.e., close to a straight line; range −0.34 to 0.86) and differs from printed text (*p <* 0*.*001; **Table 1**, **Figures 1**–**5**).

#### *Handwriting*

Next, we asked whether samples of handwritten text share some of the features observed for regular print. We analyzed two types of historical Latin handwriting. **Figure 8** shows results for a Carolingian manuscript (10th century; **Figures 8A**–**C**) and for handwritten Latin text from the Reformation period (dated 1528; **Figures 8D**–**F**), respectively. For Carolingian writing, overall results are similar to those of regular print. Like regular print, Carolingian handwriting is characterized by a uniform stroke width and a regular vertical alignment and spacing of the letters. For Latin handwriting, slopes are more negative in the low-frequency part (range −1.02 to −1.15; *p <* 0*.*001) and the slope difference is smaller for horizontal and oblique orientations than for regular print (2.45 and 2.30, respectively; *p <* 0*.*001; **Table 1**; **Figure 4**).

#### *Ornate print and calligraphy*

We also analyzed writing systems with artistic claim. Firstly, we analyzed ornate fonts of three different cultural provenances (Latin, Arabic, and Chinese). Results for Latin and Arabic ornate print are shown in **Figure 9**. Secondly, we analyzed Arabic and Chinese cursive calligraphy. Typically, the cursive Arabic letters have long ascenders and descenders, while cursive Chinese pictograms are characterized by quick, fluent brushwork. **Figure 10** shows results for Chinese calligraphy.

Similar to the samples of Latin handwriting, slope values for the horizontal direction (range −0.67 to −0.99) and vertical direction (range −0.44 to −1.25) are smaller than for regular print (*p <* 0*.*001) and higher than those for aesthetic artworks (*p <* 0*.*001) in the low-frequency part of the spectrum. An exception is Arabic ornate print with values that are similar to regular print for the slopes of the high-frequency part and the slope difference. Results for the other orientations and parts of the spectrum are similar to those of regular print.


#### *Ornamental art*

The arrangement of letters in regular text without artistic claim can be described as a sequence of largely independent pictorial elements. In contrast, artworks are characterized by a more global composition, in which individual pictorial elements relate to each other throughout the entire image (see Introduction). We next studied images of ornamental art (Western, Arabic, and East Asian), which consist also of repetitive pictorial elements arranged in a global structure (**Figures 11**–**13**).

The mean log-log plots for the three datasets are characterized by a gradual rather than an abrupt transition of Fourier power at around 40 cpi (**Figures 11C**, **12C**, **13C**). Results for Western grotesque ornaments (**Figure 11**) and ornamental Arabic art (wall decorations; **Figure 12**) are similar to those of ornate print and calligraphy in general. For the low-frequency part of the spectrum, the slope values for ornamental paintings on East Asian porcelain (**Figure 13**) are lower than for Western and Arabic ornaments (*p <* 0*.*001) and resemble those of aesthetic artworks (**Figures 4A**–**C**). The opposite tendency is observed for the high-frequency part of the spectrum (**Figures 4D**–**F**; *p <* 0*.*001). Here, slope values for East Asian decorative art are similar to those of regular print and lower than those of Western and Arabic decorative art (*p <* 0*.*001). For all orientation ranges, slope differences (**Figures 4G**–**I**) are equal to or higher than (*p <* 0*.*001) those for images of ornamental writing or calligraphy.


**FIGURE 5 | Results for images of regular print.** Results for Latin serif fonts **(A–C)**, Latin sans serif fonts **(D–F)**, and international fonts **(G–I)** are shown. Exemplary images are displayed in the left column **(A,D,G)**. Radially averaged Fourier power is plotted as a function of spatial frequency in the log-log plane in **(B,C,E,F,H,I)**. The middle column **(B,E,H)** displays mean results for one sample of multiple fonts. The right column **(C,F,I)** displays mean results for 30 samples of a single font. The colors of the plots represent the different orientations (see **Figure 1C**).

#### *Abstract expressionism*

Finally, particular styles of abstract art can be described as an arrangement of similar pictorial elements embedded in a global image structure, similar to ornamental art. In the present work, we analyzed paintings by Abstract Expressionist artists (Jackson Pollock, Jean Dubuffet, Cy Twombly, and Christian Dotremont). Examples of the images cannot be shown for copyright reasons.

The mean log-log plot for the Abstract Expressionist dataset (not shown) is similar to the curves for fine art. The slope values in the low-frequency range are around −1.8 (**Table 1**; **Figures 4A**–**C**), which is only slightly higher than the value for European and Asian fine art and East Asian porcelain decorations (around −2.1, *p <* 0*.*001), but lower (*p <* 0*.*001) than the mean value for artistic Western and Arabic ornaments (around −1.0). In the high-frequency part (**Figures 4D**–**F**), however, the curve for abstract expressionism assumes slope values of around −2.6. This is lower (*p <* 0*.*001) than the value for fine art (−1.9). The slope differences (**Figures 4G**–**I**) are around 0.74–0.86, compared to 1.34–2.04 for artistic ornaments (*p <* 0*.*001) and −0.34 to −0.19 for fine art (*p <* 0*.*001).

# **DISCUSSION**

In the present work, we compared image statistics of ordinary text and different categories of images with artistic claim. Humans create all these images for viewing by humans. As a consequence, the images may exhibit statistical properties that reflect sensory adaptation to the human visual system (Changizi et al., 2006; Graham and Redies, 2010). However, the aesthetic appeal and artistic intent of the image categories differ (see Introduction). Our results reveal that, in general, specific statistical properties vary with the artistic claim of the images.


#### **IMAGES OF REGULAR PRINT ARE NOT SCALE-INVARIANT**

It can be expected that aesthetic artworks and regular text differ in their Fourier power spectra (see Introduction). In the present work, we provide a systematic study of this assumption and quantify the differences by applying a computer-based algorithm for measuring statistical image properties. For aesthetic artworks, radially averaged spectral power falls off roughly linearly according to a power law (1/*f*<sup>*p*</sup> characteristics) with increasing spatial frequency in log-log plots; the mean slope value (−*p*) is about −2 (Graham and Field, 2007; Redies et al., 2007a,b; Graham and Redies, 2010). This result implies that the Fourier spectrum is scale-invariant. Artworks share this property with other types of aesthetically pleasing images (for example, graphic novels and comics; Koch et al., 2010) and with images of complex natural scenes (see Introduction). Fractal-like structure was also found in particular types of music (Voss and Clarke, 1975; Beauvois, 2007), architecture (Joye, 2007), and American Sign Language (Bosworth et al., 2006). Unlike artworks, regular print has a steeper slope in the high-frequency part of the power spectrum (value of about −3.5) while the slope of the low-frequency part is shallower (value of about −1.2, **Table 1**, **Figure 4**). The difference in the slope values between the three orientation ranges (horizontal, oblique, and vertical) is small. Similar findings for regular handwritten text suggest that, in images of regular text, power in the low-frequency part of the spectrum is relatively low when compared to artworks, with the exception of a prominent peak at 8 cpi (see above). Because low spatial frequencies represent coarse structure in an image and high spatial frequencies represent fine detail, this result implies that images of ordinary text tend to contain a lower amount of global image structure than the artworks analyzed. In the artworks, the higher amount of global structure may represent a physical correlate of artistic composition, which relates individual pictorial elements to each other across the image (see Introduction).
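The link between a power-law spectrum and scale invariance can be made explicit with a short derivation. The symbols used here ($S$ for radially averaged power, $A$ for a constant, $s$ for the log-log slope, $a$ for a scaling factor, and $\Delta$ for the slope difference) are notation introduced for this illustration and do not appear in the original analysis.

```latex
% A single power law is a straight line in the log-log plane:
S(f) = A\,f^{s}
\quad\Longrightarrow\quad
\log S(f) = \log A + s \log f , \qquad s \approx -2 \ \text{for artworks.}

% Rescaling the frequency axis by a factor a > 0 changes the spectrum
% only by a constant factor, so its shape is the same at every scale:
S(af) = A\,(af)^{s} = a^{s}\,S(f).

% With two segments of different slope that meet at a transition frequency,
% the rescaling factor depends on the frequency range, so the spectrum is
% not scale-invariant. The slope difference measures this departure:
\Delta = s_{\mathrm{low}} - s_{\mathrm{high}} ,
\qquad
\Delta = 0 \iff \text{one straight line through both ranges.}
```

With the approximate values quoted above for regular print ($s_{\mathrm{low}} \approx -1.2$, $s_{\mathrm{high}} \approx -3.5$), the difference is about 2.3, whereas a spectrum with a slope near −2 throughout gives a difference near 0.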

It has been argued that all images produced by the human hand, including artworks, generally possess scale-invariant properties for reasons related to the nature of hand movements (e.g., see Graham and Field, 2008). The present findings indicate that not all hand-made images are scale-invariant in the Fourier domain. The result that images of handwriting contain a lower amount of global image structure is not surprising because global image structure is not important for text, which is scanned word-by-word in a consecutive manner. In addition, even the high-frequency part of the Fourier power spectra of text is not scale-invariant. We conclude that humans can manufacture images that may or may not be scale-invariant, depending on their purpose.

Studies on artificial images revealed that manipulations of the Fourier spectrum can elicit visual discomfort if a significant deviation from scale invariance is induced (Fernandez and Wilkins, 2008; Juricevic et al., 2010; O'Hare and Hibbard, 2011). In particular, images with a curvilinear amplitude spectrum and an energy upshot at about 3–4 cycles per degree, i.e., close to the position where the visual system is most sensitive, can elicit visual discomfort (Fernandez and Wilkins, 2008). Visual discomfort is often (but not always) negatively associated with artistic merit (Fernandez and Wilkins, 2008). In the present study, curvilinear power spectra are also observed for several of the image categories (for example, images of text, calligraphy, and ornamental art), but the images used by us do not evoke obvious visual discomfort in general. Consequently, it remains unclear how the findings by Fernandez and Wilkins (2008) relate to our present results.

Together, these results suggest that several types of images that lack scale invariance are of relatively low aesthetic value. The opposite notion, however, does not hold because images that possess scale-invariant Fourier spectra are not necessarily aesthetic (for example, some computer-generated images; Lee and Mumford, 1999). It has therefore been suggested that scale invariance is a corollary of some other (unknown) feature of aesthetic images that contributes to aesthetic perception (Redies et al., 2007a).

#### **SLOPE DIFFERENCE AND ANISOTROPY CORRELATE WITH ARTISTIC CLAIM**

Images of text are of particular interest for studying aesthetic perception because there is a continuous transition from regular print to aesthetic writing (ornamental writing and calligraphy) and to visual art (see Introduction). Our results (**Table 1**, **Figures 3**, **4**) illustrate that, with increasing artistic claim, images of text acquire specific statistical properties that are similar to those of visual art. As one such measure, we introduced the difference of the slopes between the low- and high-frequency parts of the radially averaged power spectrum. With increasing artistic claim, this difference decreases to values close to 0 (straight line) for all orientation ranges (**Table 1**, **Figure 4**), which indicates a nearly linear fall-off throughout the entire frequency spectrum, similar to what has been observed for natural scenes (1/*f*<sup>2</sup> characteristics). A similar transition between non-aesthetic images and images with artistic claim is observed for the horizontal/vertical power difference (anisotropy measure; **Figure 3**). For images of text, differences between vertical and horizontal orientations probably reflect the periodicity of the text lines and/or regularities in the width or spacing of the lines that form the letters. Compared to regular text, the Fourier spectrum of fine art represents all orientations at similar strength, as shown previously by Koch et al. (2010), who compared artworks to other categories of images. This result is not trivial because artists could easily produce paintings in which particular orientations predominate. To what extent low anisotropy is necessary or sufficient to induce aesthetic perception, and in which types of fine art, remains to be studied.

In our analysis, we included a special style of art (Abstract Expressionism) that resembles text images in its repetitive arrangement of multiple and simple pictorial elements distributed over a large surface area. Nevertheless, the statistical properties of Abstract Expressionist paintings are more similar to those of other art images than to ordinary text, suggesting that they contain a high amount of global structure, similar to fine art. This result is also not trivial because, conceivably, paintings using the same pictorial elements could be produced with a lower amount of global structure. Our results are compatible with the suggestion by R. Taylor and other researchers that abstract expressionist paintings, like the drip paintings by Jackson Pollock, possess fractal-like structure (Taylor, 2002; Mureika, 2005; Alvarez-Ramirez et al., 2008).

Other examples of repetitive structures arranged over a large surface are ornamental decorations, which are also created to be enjoyed by human observers but may perhaps have lesser artistic claim in general. The slope differences for all three types of decorative art are intermediate between those of fine art and ordinary text. The anisotropy values of decorative art are intermediate or closer to fine art, when compared to regular print (**Figure 3**). Intermediate values are also obtained for calligraphy, a writing style with artistic claim. It remains unclear whether other types of images show a similar relation between statistical regularities and artistic claim.

#### **SIMILAR FINDINGS IN TEXT IMAGES OF DIFFERENT CULTURAL PROVENANCE**

To compare Western, Arabic, and East Asian examples of the different categories of text images (**Table 1**, **Figures 3**, **4**), we chose a horizontal approach and compared contemporary text images that include plain and ornate fonts as well as serif and sans serif fonts of different international alphabets. In addition, we chose a vertical approach and analyzed text of different ages (medieval manuscripts, calligraphy, and Reformation handwriting). Overall, we analyzed 11 datasets, each including between 13 and 253 samples (1598 images in total; **Table 1**).

Results from the Fourier analysis were generally similar for the three cultural backgrounds (**Table 1**, **Figures 3**, **4**). In particular, the slope differences were similar for Latin and international serif fonts of regular print, and also for ornate print, calligraphy, and ornamental art from all three cultures. Moreover,

#### **REFERENCES**


those found in objects in natural scenes. *Am. Nat.* 167, E117–E139.


our results confirm that artworks from both East Asian and Western provenance possess similar scale-invariant properties in the Fourier domain (Redies et al., 2007b; Graham and Field, 2008). Together, these results suggest that specific perceptual mechanisms for reading and aesthetic judgment, respectively, may be common amongst humans across different cultural backgrounds. It has been speculated that such common principles may have emerged due to selective pressures imposed by the adaptation of the human visual system to specific perceptual and motor tasks during the evolution of mankind (Changizi and Shimojo, 2005; Changizi et al., 2006; Redies, 2007; Graham and Redies, 2010).

Last but not least, physical features of the visual inputs have been shown to strongly modulate the functional responses in some core regions of the reading network, including for instance the influence of spatial frequency on the activation of the left ventral occipitotemporal cortex (Seghier and Price, 2011; Woodhead et al., 2011; Horie et al., 2012). The kind of image statistics studied here may thus shed some light on how the human brain processes written word stimuli in comparison to other types of stimuli. This topic warrants future studies.

# **ACKNOWLEDGMENTS**

The authors thank Mrs. Lisa Redies for producing and scanning printed text, Mrs. Julia Braun for assistance with the statistical analysis, and members of the Denzler and Redies groups for constructive suggestions, discussion, and comments on the manuscript. They are grateful to Prof. Dr. Ulrich Pietsch for permission to take photographs at the Dresden Porcelain Collection, Dresden, Germany.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 February 2013; accepted: 13 March 2013; published online: 01 April 2013.*

*Citation: Melmer T, Amirshahi SA, Koch M, Denzler J and Redies C (2013) From regular text to artistic writing and artworks: Fourier statistics of images with low and high aesthetic appeal. Front. Hum. Neurosci. 7:106. doi: 10.3389/fnhum.2013.00106*

*Copyright © 2013 Melmer, Amirshahi, Koch, Denzler and Redies. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Cross-modal integration in the brain is related to phonological awareness only in typical readers, not in those with reading difficulty

# *Chris McNorgan\*, Melissa Randazzo-Wagner and James R. Booth*

*Developmental Cognitive Neuroscience Laboratory, Department of Communication Studies and Disorders, Northwestern University, Evanston, IL, USA*

#### *Edited by:*

*Urs Maurer, University of Zurich, Switzerland*

#### *Reviewed by:*

*Nicolas Langer, University Zurich, Switzerland; Fabio Richlan, University of Salzburg, Austria*

#### *\*Correspondence:*

*Chris McNorgan, Department of Communication Sciences and Disorders, Northwestern University, 2240 Campus Drive, Evanston, IL 60208, USA e-mail: chris.mcnorgan@alumni.uwo.ca*

Fluent reading requires successfully mapping between visual orthographic and auditory phonological representations and is thus an intrinsically cross-modal process, though reading difficulty has often been characterized as a phonological deficit. However, recent evidence suggests that orthographic information influences phonological processing in typically developing (TD) readers, but that this effect may be blunted in those with reading difficulty (RD), suggesting that the core deficit underlying reading difficulties may be a failure to integrate orthographic and phonological information. Twenty-six (13 TD and 13 RD) children between 8 and 13 years of age participated in a functional magnetic resonance imaging (fMRI) experiment designed to assess the role of phonemic awareness in cross-modal processing. Participants completed a rhyme judgment task for word pairs presented unimodally (auditory only) and cross-modally (auditory followed by visual). For typically developing children, correlations between elision and neural activation were found for the cross-modal but not unimodal task, whereas in children with RD, no correlation was found. The results suggest that elision taps both phonemic awareness and cross-modal integration in typically developing readers, and that these processes are decoupled in children with reading difficulty.

**Keywords: dyslexia, functional MRI, audiovisual integration, reading development, developmental disorder, learning disability**

Multisensory, or audiovisual, integration of letters and speech sounds is considered a prerequisite to reading development (Share, 1995). Though processing orthographic or phonological linguistic representations clearly involves a wide cortical network (e.g., attention, semantic processing), a sub-network of cortical regions is strongly associated with processing and integrating orthographic and phonologic representations. This network includes the fusiform gyrus (FG), which is implicated in the processing of orthographic representations (Shaywitz et al., 2002; McCandliss et al., 2003; Dehaene and Cohen, 2011), posterior superior temporal gyrus (pSTG), which is implicated in processing phonologic representations (Demonet et al., 1992; Booth et al., 2002a), and the posterior superior temporal sulcus (pSTS), which is implicated in audiovisual integration across a wide range of domains (Calvert, 2001; Van Atteveldt et al., 2009; Blau et al., 2010). Because reading entails integrating information from these two representational systems, understanding how cross-modal integration operates in normal and disordered reading may provide insight into the root causes underlying reading difficulty.

Converging evidence from event-related potential (ERP) and functional magnetic resonance imaging (fMRI) studies shows that children with dyslexia demonstrate weaker audiovisual integration of letters and speech sounds, suggesting that reading dysfluency may be partly attributable to difficulties in audiovisual integration. For example, ERP studies of letter-sound integration found that deviant letter-sound pairs produced a mismatch negativity effect in dyslexic readers only given a longer time window, similar to younger reading skill-matched but not age-matched control children, indicating a slow maturational component of audiovisual integration (Froyen et al., 2011). A series of pediatric fMRI studies by Blau et al. (2010) also demonstrated enhanced letter-sound integration in audiovisual conditions for fluent readers compared to dyslexic readers. These studies collectively identified an audiovisual integration network for reading, including the planum temporale (PT) in the STG and the pSTS, in which cross-modal activation differentiated typically developing and dyslexic children. Taken together, these neurophysiological and imaging studies suggest that children with dyslexia demonstrate reduced audiovisual integration of letters and speech sounds.

Previous studies have explored the mechanisms of audiovisual integration at the level of grapheme to phoneme correspondence, or at a small grain size. Given the inconsistency of the English orthography at the smaller grain sizes, large grain sizes play a greater role in early reading development because they provide greater consistency (Ziegler and Goswami, 2005). Fluent reading in English necessitates processing of larger grain sizes (e.g., words, syllables or rimes) because processing of smaller grain sizes utilizing a letter-by-letter decoding strategy will only be successful with words that have consistent grapheme to phoneme correspondences. Few studies, however, have explored audiovisual integration for whole word reading. Snowling (1980) compared dyslexic and reading-age matched readers' nonword recognition ability in unimodal (auditory only and visual only) and cross-modal (auditory-visual, visual-auditory) conditions, and noted the greatest difference in discrimination sensitivity between the two groups was in the visual-auditory condition. In the neuroimaging domain, Kast et al. (2011) compared fMRI activations of dyslexic and non-dyslexic adults during a lexical decision task presented in unimodal (auditory only and visual only), and cross-modal (auditory-visual) conditions. They found that the dyslexic group showed altered brain activation for the cross-modal condition in right STS and left supramarginal gyrus, both of which are implicated in cross-modal conversion (Calvert et al., 2000; Booth et al., 2002b). We stress, however, that Kast et al. assessed overall group differences in adult readers. Though potentially diagnostically useful for an older population, their study does not address whether these differences are present during early childhood and thus potentially impact the acquisition of reading skill, nor relate these differences to an independent cognitive measure. 
Thus, there is some evidence to suggest that impaired audiovisual integration at larger grain sizes underlies the reading difficulties experienced by dyslexic individuals, though the mechanism through which these impairments influence the acquisition of reading skill remains unclear.

As suggested earlier, however, audiovisual integration is a complex process, as it involves two very different representational systems. Thus, failure to properly integrate phonological and orthographic representations could be attributable to a failure of the phonological, orthographic or integration processes in isolation or in combination. A large body of behavioral and neuroimaging literature argues that reading fluency depends critically on phonological awareness skills. For example, dyslexic adults, relative to controls, display reduced phonological awareness, despite having intact phonological representations, and this reduced awareness is predictive of deficits in two measures of phonological decoding: nonword reading and nonsense passage reading (Mundy and Carroll, 2012). Phonological awareness skills have been shown to predict reading success in several alphabetic languages (Ziegler and Goswami, 2005). These skills appear to develop hierarchically from larger units at the word/syllable level to an intermediate rime/onset level, and ultimately to the smallest, phonemic level (Anthony and Francis, 2005).

Some have argued that phonemic-level awareness is a result of increased sensitivity to phonemes by exposure to orthography (Ziegler and Goswami, 2005), consistent with the argument that this skill is an experience-based developmental consequence of reading in typically developing readers—a position supported by numerous studies showing the influence of orthography on phonological processing (Stuart, 1990; Castles et al., 2003; Desroches et al., 2010). Several behavioral studies have shown that orthographic knowledge impinges on phoneme judgments. For example, given words like *pitch* and *rich* with the same number of sounds but not letters, typically developing readers perceived *pitch* as having a greater number of phonemes (Ehri and Wilce, 1980). This influence of orthography on phoneme judgment has been shown to emerge in preschool-aged children (Castles et al., 2011), suggesting that this cross-modal influence may accompany learning of the alphabetic principle, but continue as a child learns to read.

Phonological awareness tasks involving manipulation of smaller grain sizes (e.g., more reliant on knowledge of the alphabetic principle) have been more specifically referred to in the literature as phonemic awareness tasks. A recent meta-analysis showed that early phonemic awareness is closely related to growth of word reading and is more highly correlated with reading skill than both rime-level awareness and verbal short-term memory (Melby-Lervåg et al., 2012). Elision is a phonemic awareness task, typically measured in English by standardized assessment in which increasingly smaller segments must be removed from the stimulus at increasingly higher levels of linguistic complexity, from words down to phonemes within clusters (Wagner et al., 1999). In this task, participants repeat a verbally presented word (e.g., *CUP*, or /kʌp/) and then verbally produce a novel word after a particular phoneme has been deleted (e.g., /kʌp/ without the /k/ sound produces /ʌp/, or *UP*). Elision has been noted as a sensitive measure of phonological skill that discriminates between high and low ability readers better than rhyming and phoneme identification (Vloedgraven and Verhoeven, 2007).

Because elision places large demands on phonemic awareness, it may tap into the processes that are critical for orthographic-phonologic integration, and thus predict cross-modal integration performance. A cross-modal influence of orthographic knowledge on a phonemic awareness task would support the view that phonemic awareness is a byproduct of increased reading ability (Ziegler and Goswami, 2005). Several behavioral studies showing that orthographic knowledge impinges on phoneme judgments support this position. Take, for example, the word *BIND* (/bajnd/) when the instruction is to omit the /n/ phoneme. The deletion of /n/ produces BIDE (/bajd/). Although orthography is irrelevant to the task, deleting the grapheme corresponding to /n/ from BIND yields BID (/bId/). Stuart (1990) showed that children often produced the result of the grapheme deletion rather than the phoneme deletion (e.g., producing /bId/), suggesting they enlisted orthographic knowledge during the task. Another study used an elision task with an orthographic transparency manipulation, contrasting transparent words, in which the sound to be deleted had a one-to-one phoneme-grapheme correspondence (delete /f/ from *rafter*), with opaque words, in which the sound to be deleted was a silent letter or a biphonemic grapheme (delete /n/ from *knuckle*). Results indicated that children found it more difficult to delete phonemes from opaque items, indicating orthographic influence on phonemic awareness (Castles et al., 2003). Collectively, these results support the notion that orthographic knowledge changes phonological awareness at the phonemic level.

Despite the strong ties between elision and audiovisual integration during reading, only one neuroimaging study to date has examined the relationship between elision and modality-related performance. Frost et al. (2009) examined whether elision skill was correlated with functional activation for unimodal (print vs. speech) tasks in typically developing children. They found correlations between elision and activity in left superior temporal cortex close to the PT and STS and in left occipitotemporal regions including the FG. In the left superior temporal cortex, higher phonemic awareness skill was associated with greater activation when processing print, equivalent to when processing speech. In left occipitotemporal cortex, higher phonemic awareness skill was correlated with less activation when processing speech. These results suggest that higher elision skill is associated with greater specialization of the orthography-phonology sub-network for print, but that the effects of increased audiovisual integration are most pronounced in phonological regions when processing print representations. This is consistent with the finding that elision is positively correlated with print-related activations in left STG, left FG, and left thalamus (Pugh et al., 2013). In summary, although elision is considered a measure of phonemic skill, it seems to be influenced by orthographic knowledge in developing readers, and therefore, may be sensitive to the audiovisual nature of literacy acquisition.

Because previous studies have not examined the role of phonemic awareness (i.e., elision) in unimodal vs. cross-modal tasks, there is no direct evidence relating this skill to audiovisual integration. Moreover, though elision skill has been shown to be diagnostic of reading difficulty, it remains unclear whether the specificity and sensitivity of this measure is a result of it tapping into processes underlying audiovisual integration. Finally, previous audiovisual studies have examined letter-speech congruency, so it is not known whether developmental or disability differences in audiovisual integration apply to larger grain sizes, despite these grain sizes being fundamental to English. To address these issues, the current study examined unimodal (auditory only) and cross-modal (auditory-visual) processing of words in typically developing (TD) readers and those with reading difficulty (RD). We focused our analyses on three regions in a left hemisphere sub-network implicated in orthographic (FG) and phonologic (PT) processing, and audiovisual integration (pSTS), consistent with a model of audiovisual integration in reading (Van Atteveldt et al., 2009; Blau et al., 2010). In this model, the pSTS is believed to have reciprocal interconnections with the PT and FG, permitting top-down influence of orthography on phonological processing in PT and the converse top-down influence on orthographic processing in the FG.

Stimulus congruency (i.e., whether two items match along a critical dimension) is often used in the investigation of cross-modal interaction, as it demonstrates that the processing of one item influences the processing of the other. Following other studies investigating reading-related cross-modal development (e.g., Froyen et al., 2008; McNorgan et al., 2013), we assessed the neural response to inter-stimulus congruency. Our question concerned whether elision is primarily sensitive to phonological awareness (manipulation of sounds in spoken language only) or is sensitive to access of orthography from spoken words. Consequently, it was most appropriate to assess these congruency effects both under conditions in which participants were presented with spoken words only (unimodal auditory) and under conditions requiring audiovisual integration (cross-modal). Because of the central role that the pSTS is assumed to play in audiovisual integration, we hypothesized that elision would be positively correlated with pSTS activity in the cross-modal condition. Because the pSTS should directly influence both phonological and orthographic processing areas, skill-dependent audiovisual integration effects were additionally hypothesized for the FG and PT, suggesting interaction between neural systems involved in processing speech and print. Finally, we investigated whether a differential relationship of phonemic skill with audiovisual integration would be present in TD compared to RD children.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

A group of 13 typically developing (TD) (7 males; mean age = 11 years, 0 months; range = 8 years, 0 months to 13 years, 7 months) and 13 children with reading difficulty (RD) (7 males; mean age = 11 years, 0 months; range = 9 years, 5 months to 12 years, 6 months) participated in the present study. All participants were native English speakers, right handed, had normal or corrected-to-normal vision, and had no history of psychiatric illness, neurological disease, learning disability or attention deficit hyperactivity disorder (ADHD). Participants were recruited from the Chicago metropolitan area. Informed consent was obtained from participants and their parents, and all procedures were approved by the Institutional Review Board at Northwestern University.

Prior to admission to the study, we evaluated children's nonverbal IQ using the Wechsler Abbreviated Scale of Intelligence and reading-related skill using the Word Identification, Word Attack and Reading Fluency subtests of the Woodcock Johnson Tests of Achievement—III (WJ III) and the Sight Word Efficiency and Phonetic Decoding Efficiency subtests of the Test of Word Reading Efficiency (TOWRE). Participants in the TD group had no subtest standardized score less than 95, and an average across the 5 reading subtests exceeding 100. Participants in the RD group had to have at least one subtest standardized score less than or equal to 85 and an average across the 5 reading subtests of less than 100. Other demographic and non-reading variables were matched as closely as possible. The minimum performance IQ cutoff for participants in both groups was 79 in all performance IQ subtests, and experimental task performance for all participants had to be better than chance for all experimental conditions of interest. Group mean and standard deviations of scaled scores across these standardized measures for the TD and RD participants are presented in **Table 1**, which shows that the TD and RD groups significantly differed across all standardized measures of reading skill, but not for performance (i.e., non-verbal) IQ.
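The inclusion rules above can be written as a short sketch. The function name and the example scores are hypothetical and serve only to illustrate the stated cutoffs; the sketch covers the five reading subtests and leaves out the separate performance IQ and task accuracy requirements.

```python
from statistics import mean

def classify_reader(subtest_scores):
    """Assign a group label from five standardized reading subtest scores.

    Hypothetical helper: returns "TD", "RD", or None when neither rule is met.
    """
    if len(subtest_scores) != 5:
        raise ValueError("expected five reading subtest scores")
    average = mean(subtest_scores)
    # TD: no subtest score below 95 and an average exceeding 100.
    if min(subtest_scores) >= 95 and average > 100:
        return "TD"
    # RD: at least one subtest score of 85 or lower and an average below 100.
    if min(subtest_scores) <= 85 and average < 100:
        return "RD"
    return None
```

Under these rules a child with scores such as 90, 100, 105, 110 and 100 would fall into neither group, which reflects the gap the criteria leave between the TD and RD definitions.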

We measured each participant's phonemic awareness using the elision subtest of the Comprehensive Test of Phonological Processing (CTOPP). Briefly, in this task, participants are instructed to repeat a verbally presented word (e.g., "Say /tajgr/") and then instructed to verbally produce the word with the specified phoneme omitted (e.g., "Now say /tajgr/ without the /g/ sound"), wherein the product of the elision is a valid English word (e.g., removing the /g/ from TIGER -/tajgr/ produces TIRE -/tajr/). Elision scores reflect the number of correct elision transformations on a set of 20 progressively difficult target items.

# **EXPERIMENTAL PROCEDURE**

#### *Rhyme judgment task*

**Table 1 | Mean scaled scores and standard deviations (in parentheses) for standardized tests of achievement for typically developing (TD) and reading difficulty (RD) groups.**

*t(24), t-score; p, independent-samples t-test significance with 24 degrees of freedom; WJ-III, Woodcock Johnson Tests of Achievement-III; TOWRE, Test of Word Reading Efficiency; WASI, Wechsler Abbreviated Scale of Intelligence.*

On each trial, participants were presented with paired stimuli, the order of which was counterbalanced across participants. For
each scanning session, stimuli were presented in one of two modality conditions: In the cross-modal auditory/visual (AV) condition, the first item was presented auditorily and the second was presented visually. In the unimodal auditory/auditory (AA) condition, both items were presented in the auditory modality. Previous investigations of cross-modal lexical processing (e.g., Van Atteveldt et al., 2004; Froyen et al., 2008) similarly employed auditory-then-visual presentations, motivating the task design for that modality condition. Half the pairs of stimuli rhymed and half did not, and participants were asked to make a rhyme judgment response by pressing one of two keys on a handheld keypad. Participants were asked to respond as quickly and as accurately as possible, using their right index finger for a yes (rhyme) response and their right middle finger for a no (non-rhyme) response. Participants completed two runs for each modality condition, each lasting approximately 7 min. Participants generally saw the AV condition followed by the AA condition, though this varied across participants as factors such as task accuracy and movement necessitated reacquiring data. An independent-samples *t*-test on the time interval between the AV and AA tasks, however, failed to show a difference between the groups in these time intervals, *t*(24) = 1.24, *p* > 0.23. Thus, the two groups did not systematically differ with respect to the order in which they performed the tasks. Each stimulus item was presented for 800 ms, separated by a 200 ms interstimulus interval. Participants were free to respond as soon as the second stimulus item was presented. A red cross appeared for 2200 ms following the presentation of the second word, signaling to the participant to respond if they had not already done so. Responses made after the red cross disappeared from the screen were not recorded and counted as errors.
A jittered response interval duration of between 2200 and 2800 ms was used to allow for deconvolution of the signal associated with each condition. The sequence and timings of lexical trial events are illustrated for each modality in **Figure 1**. Stimulus pairs varied in terms of their orthographic and phonological similarity, and were presented in one of four similarity conditions (24 pairs per condition). There were two phonologically similar (i.e., rhyming) conditions, one
with orthographically similar pairs (O+P+; e.g., CAGE-RAGE) and another with orthographically dissimilar pairs (O−P+; e.g., GRADE-LAID). There were also two phonologically dissimilar (i.e., nonrhyming) conditions, one with orthographically similar pairs (O+P−; e.g., SMART-WART) and one with orthographically dissimilar pairs (O−P−; e.g., TRIAL-FALL). All words were monosyllabic, having neither homophones nor homographs, and were matched across conditions for written word frequency in children (Zeno, 1995) and the sum of their written bigram frequency (English Lexicon Project, http://elexicon.wustl.edu). We restricted our analyses to the two rhyming conditions (i.e., those associated with "yes" responses) to avoid introducing response-related confounds related to making "yes" vs. "no" judgments. Fixation trials (24 for each run) were included as a baseline and required the participant to press the "yes" button when a fixation-cross at the center of the screen turned from red to blue. Perceptual trials (12 trials for each run) were included for a related study (McNorgan et al., 2013). Perceptual trials comprised two sequences containing tones (AA), or tones followed by glyphs (AV). These stimuli were presented as increasing, decreasing or steady in pitch (for auditory stimuli) or height (for visual stimuli). Participants were required to determine whether the sequences matched (e.g., two rising sequences) or mismatched (e.g., a falling sequence followed by a steady sequence) by pressing the "yes" button to indicate a match, and the "no" button otherwise. The timing for the fixation and perceptual trials was the same as for the lexical trials.

#### *Functional MRI data acquisition*

Participants were positioned in the MRI scanner with their head secured using foam pads. An optical response box was placed in the participant's right hand to log responses. Visual stimuli were projected onto a screen, which participants viewed via a mirror attached to the inside of the head coil. Participants wore sound attenuating headphones to minimize the effects of the ambient scanner noise. Images were acquired using a 3.0 Tesla Siemens Trio scanner. The BOLD (blood oxygen level dependent) signal was measured using a susceptibility weighted single-shot EPI (echo planar imaging) method. Functional images were interleaved from bottom to top in a whole brain acquisition. The following parameters were used: TE = 20 ms, flip angle = 80°, matrix size = 128 × 120, field of view = 220 × 206.25 mm, slice thickness = 3 mm (0.48 mm gap), number of slices = 32, TR = 2000 ms, voxel size = 1.72 mm × 1.72 mm. Before functional image acquisition, a high resolution T1-weighted 3D structural image was acquired for each subject (TR = 1570 ms, TE = 3.36 ms, matrix size = 256 × 256, field of view = 240 mm, slice thickness = 1 mm, number of slices = 160, voxel size = 1 mm × 1 mm).

#### *Functional MRI data preprocessing*

fMRI data were analyzed using SPM8 (Statistical Parametric Mapping, http://www.fil.ion.ac.uk/spm). ArtRepair software (http://cibsr.stanford.edu/tools/human-brain-project/artrepair-software.html) was used during image preprocessing to correct for participant movement. Images were realigned in ArtRepair, which identified and replaced outlier volumes, associated with excessive movement or spikes in the global signal, using interpolated values from the two adjacent non-outlier scans. Outlier scans were defined as those for which a signal change of more than 1.5% from the mean or movement of 4 mm or more along any axis was detected. No more than 10% of the volumes from each run and no more than 4 consecutive volumes were interpolated in this way. For each participant, a single attempt was made to reacquire runs requiring replacement of more than 10% of the volumes or more than 4 consecutive volumes. Slice timing was applied to minimize timing-errors between slices. Functional images were co-registered with the anatomical image, and normalized to the Montreal Neurological Institute (MNI) ICBM152 T1 template. This template is well-defined with respect to a number of brain atlas tools and the MNI coordinate system, and stereotactic space for children within the age range included in our study has been shown to be comparable to that of adults (Burgund et al., 2002; Kang et al., 2003). Images were smoothed using a 2 × 2 × 4 non-isotropic Gaussian kernel.
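The outlier and run-acceptance rules described above can be illustrated with a minimal sketch. This is not the ArtRepair implementation: the function names are hypothetical, the per-volume global signal and per-axis displacement (in mm) are assumed to be available as plain lists, and the mean is taken over all volumes of the run.

```python
def flag_outlier_volumes(global_signal, motion_mm, signal_pct=1.5, motion_limit=4.0):
    """Flag volumes whose global signal deviates by more than signal_pct percent
    from the run mean, or whose displacement reaches motion_limit mm on any axis."""
    mean_signal = sum(global_signal) / len(global_signal)
    flags = []
    for signal, motion in zip(global_signal, motion_mm):
        pct_change = 100.0 * abs(signal - mean_signal) / mean_signal
        moved = any(abs(axis) >= motion_limit for axis in motion)
        flags.append(pct_change > signal_pct or moved)
    return flags

def longest_flagged_run(flags):
    """Length of the longest stretch of consecutive flagged volumes."""
    longest = current = 0
    for flagged in flags:
        current = current + 1 if flagged else 0
        longest = max(longest, current)
    return longest

def run_is_usable(flags, max_fraction=0.10, max_consecutive=4):
    """A run is kept if at most 10% of its volumes and at most 4 consecutive
    volumes would need to be interpolated; otherwise it would be reacquired."""
    fraction = sum(flags) / len(flags)
    return fraction <= max_fraction and longest_flagged_run(flags) <= max_consecutive
```

A run in which five consecutive volumes are flagged would be rejected under this sketch even if those volumes make up less than 10% of the run, since both limits must hold.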

# **BEHAVIORAL ANALYSES**

We restricted our analyses to the rhyming conditions (i.e., those with a "yes" response), and thus within the context of our experiment, congruency referred to whether the spelling of rhyming pairs matched (i.e., congruent or O+P+, as in CAGE-RAGE) or mismatched (i.e., incongruent or O−P+, as in GRADE-LAID). The congruency effect was thus a measure of the difference between responses, whether in terms of behavior or brain activity, between these two conditions. Because stimulus pair congruency was assumed to influence behavioral performance and BOLD activity for the task (Bitan et al., 2007), a 2 group × 2 task modality analysis of variance (ANOVA) was conducted on the congruency effect (i.e., the difference between congruent and incongruent conditions) to parallel the fMRI congruency effect analysis, with modality as a within-subjects independent variable and group as a between-subjects variable. The dependent variables were the congruency effects for accuracy rates and decision latencies of correct responses.
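As a minimal sketch of the dependent measure, the congruency effect can be computed per participant and modality as the congruent (O+P+) score minus the incongruent (O−P+) score, and then averaged within each cell of the 2 group × 2 modality design. The data layout, function names and values below are hypothetical; the ANOVA itself is not shown.

```python
from statistics import mean

def congruency_effects(scores):
    """scores: {participant: {modality: {"O+P+": value, "O-P+": value}}}.
    Returns {participant: {modality: congruent minus incongruent}}."""
    return {
        pid: {mod: cond["O+P+"] - cond["O-P+"] for mod, cond in mods.items()}
        for pid, mods in scores.items()
    }

def cell_means(effects, groups):
    """Average the congruency effect within each (group, modality) cell."""
    cells = {}
    for pid, mods in effects.items():
        for mod, effect in mods.items():
            cells.setdefault((groups[pid], mod), []).append(effect)
    return {cell: mean(values) for cell, values in cells.items()}

# Hypothetical accuracy rates in percent, for illustration only.
example_scores = {
    "p01": {"AA": {"O+P+": 90, "O-P+": 80}, "AV": {"O+P+": 95, "O-P+": 75}},
    "p02": {"AA": {"O+P+": 85, "O-P+": 65}, "AV": {"O+P+": 90, "O-P+": 80}},
}
example_groups = {"p01": "TD", "p02": "TD"}
example_cells = cell_means(congruency_effects(example_scores), example_groups)
```

The same functions apply unchanged to decision latencies, where a positive value would indicate slower responses to congruent pairs.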

#### *fMRI analyses*

Statistical analyses were calculated at the first-level using an event-related design with all four lexical conditions (O+P+, O−P+, O−P−, O+P−), the fixation condition, and the perceptual condition included as conditions of interest. Interpolated volumes were deweighted, and the first 6 volumes of each run, during which a fixation cross was presented, were dropped from the analyses. A high pass filter with a cut off of 128 s was applied. Lexical, fixation and perceptual pairs were treated as individual events for analysis and modeled using a canonical hemodynamic response function (HRF). Voxel-wise *t*-statistic maps were generated for each participant contrasting the balanced rhyme (O+P+, O−P+) vs. fixation (rhyme − fixation) and congruent vs. incongruent rhyme (O+P+ − O−P+) within each modality condition (6 contrasts). Group-level results were obtained using random-effects analyses by combining subject-specific summary statistics across the group as implemented in SPM8. We were primarily interested in the relationship between elision skill and cross-modal integration in TD and RD children, rather than absolute differences between groups or task modality. Thus, these maps were calculated for the purpose of identifying voxels that were reliably activated for the lexical task for constraining our region of interest definitions (see below) and were not analyzed further.
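The two first-level contrasts can be sketched as weight vectors over the six modeled conditions. The condition ordering, the helper name, and the reading of "balanced" as equal weights of 0.5 on the two rhyming conditions are assumptions made for illustration; they are not taken from the SPM8 batch used in the study.

```python
# Assumed column order of the condition regressors (hypothetical).
CONDITIONS = ["O+P+", "O-P+", "O-P-", "O+P-", "fixation", "perceptual"]

def contrast_vector(weights):
    """Build a contrast over CONDITIONS from a {condition: weight} mapping,
    filling unspecified conditions with zero and checking that weights sum to zero."""
    unknown = set(weights) - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unknown conditions: {sorted(unknown)}")
    vector = [weights.get(name, 0.0) for name in CONDITIONS]
    if abs(sum(vector)) > 1e-9:
        raise ValueError("contrast weights should sum to zero")
    return vector

# Rhyme vs. fixation: the two rhyming conditions weighted equally against baseline.
rhyme_vs_fixation = contrast_vector({"O+P+": 0.5, "O-P+": 0.5, "fixation": -1.0})

# Congruency: congruent rhyme minus incongruent rhyme.
congruency = contrast_vector({"O+P+": 1.0, "O-P+": -1.0})
```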

# *Region of interest definitions*

We focused on the neural responses to orthographic congruency in the PT, FG and pSTS—three anatomical regions associated with phonological, orthographic and cross-modal processing, respectively. Because it was plausible that RD participants would show weaker overall BOLD responses, we defined these regions anatomically and functionally in two steps. This procedure ensured that group differences could not be attributed to a comparison between robust vs. noisy data. In the first step, an atlas-based anatomical definition of left PT was taken from the Harvard-Oxford Cortical Structure Atlas. This atlas is probabilistic, meaning that one or more anatomical labels are assigned to each voxel with an associated probability reflecting the likelihood that the voxel is found in that anatomical region. We selected those voxels for which the PT was the most probable label. That is, if a voxel had been assigned the PT label (with any probability), and if that the probability for belonging to the PT was greater than the probability associated with any other single region, that voxel was included in the PT definition. An atlas-based definition of left FG was taken from the automated anatomic labeling (AAL) atlas included with SPM 8. An atlas-based definition of pSTS was created by intersecting the AAL definitions of left superior temporal gyrus and middle temporal gyrus, each dilated by 4 mm along each axis. The overlapping region defines the sulcus because it follows the line that delineates these immediately adjacent atlas definitions. Posterior STS was defined as those voxels posterior to y = −40, or roughly the posterior third of the volume. The use of two anatomical atlases was necessitated by the fact that not all regions were defined in a single atlas. 
STS is not defined in either the AAL or Harvard-Oxford probabilistic atlases; however, it was relatively straightforward to define STS as described above using the WFU PickAtlas SPM toolbox (http://fmri.wfubmc.edu/), which interfaces with the AAL atlas. Unfortunately, the AAL atlas does not include a definition for PT, necessitating the use of the Harvard-Oxford atlas. Finally, though FG is defined in both atlases, there was no a priori justification to choose one atlas over the other, and so we selected the AAL template FG definition to provide consistency between this and another related study for which the FG definition had previously been taken from the AAL atlas.
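The two anatomical selection rules described above can be illustrated with a minimal sketch. It is not the authors' SPM/PickAtlas procedure: the toy atlas, the label names, the voxel coordinates and the function names `most_probable_label_mask` and `posterior_overlap` are all hypothetical, and the masks passed to `posterior_overlap` are assumed to have been dilated already.

```python
def most_probable_label_mask(atlas, target):
    """Return voxels whose single most probable label is `target`.

    `atlas` maps a voxel coordinate to a dict of {label: probability}.
    A voxel is kept only if it carries the target label and that
    probability is strictly greater than every other label's probability.
    """
    selected = []
    for voxel, probs in atlas.items():
        if target not in probs:
            continue
        p_target = probs[target]
        if all(p_target > p for label, p in probs.items() if label != target):
            selected.append(voxel)
    return sorted(selected)


def posterior_overlap(mask_a, mask_b, y_cut=-40):
    """Return voxels present in both masks and posterior to y_cut (y < y_cut)."""
    return sorted(v for v in set(mask_a) & set(mask_b) if v[1] < y_cut)


# Hypothetical toy data (coordinates in mm, x/y/z order).
toy_atlas = {
    (-52, -30, 10): {"PT": 0.55, "Heschl": 0.30},
    (-50, -28, 12): {"PT": 0.20, "Heschl": 0.45},
    (-56, -34, 14): {"PT": 0.35, "STG": 0.35},  # tie, so not included
    (-60, -20, 4): {"STG": 0.70},
}
pt_voxels = most_probable_label_mask(toy_atlas, "PT")

stg_dilated = {(-54, -44, 8), (-54, -36, 8), (-58, -50, 12)}
mtg_dilated = {(-54, -44, 8), (-54, -36, 8), (-60, -52, 0)}
psts_voxels = posterior_overlap(stg_dilated, mtg_dilated)

print(pt_voxels)    # [(-52, -30, 10)]
print(psts_voxels)  # [(-54, -44, 8)]
```

The strict inequality mirrors the requirement that the PT probability exceed that of any other single region, so tied voxels are left out.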

In a second step, we intersected the atlas-based definitions with statistically thresholded contrast maps in order to constrain our analyses to voxels that were sensitive to congruency for both groups. This was because purely atlas-based ROIs might plausibly be biased even if overall group differences in congruency-related activity were below statistical threshold. Within each of these atlas-based definitions, we selected for each individual the 30 voxels with the highest positive *t*-statistic in the congruent vs. incongruent first-level contrast for each of the AA and AV tasks, thresholded at a liberal alpha of 0.1 (uncorrected). This threshold was a compromise between the need to select a sufficient number of voxels for both TD and RD participants, and the need to ensure that ROIs contained voxels that were reasonably sensitive to congruency, and 30 voxels was the largest multiple of 5 for which an ROI could be created for all participants, regions and tasks. This procedure thereby selected voxels demonstrating a congruency effect in both task modalities and in both groups, and produced ROIs of comparable extent across individuals. Note that because we selected the top thirty voxels in each individual's *t*-statistic map, there was no common threshold across participants (i.e., the 30th highest *t*-statistic in each map varied by individual), apart from reaching the minimum uncorrected statistical threshold of 0.1. **Figure 2** depicts the voxels included in each ROI collapsed across participants within each group. For pSTS and FG, a large proportion of voxels were common to many participants from both groups, whereas within PT, there were more individual differences with respect to the voxels showing a congruency effect in both task modalities. The ROIs contained an average of 56, 51 and 58 voxels for the FG, pSTS and PT, respectively, and none had fewer than 44 voxels.
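The functional selection step can be sketched as follows. The sketch assumes that each individual ROI is the union of the two per-task selections, which is one reading consistent with the reported ROI sizes (between 44 and 60 voxels). The critical *t* value corresponding to an uncorrected alpha of 0.1 depends on the first-level degrees of freedom, which are not given here, so `t_min` is a caller-supplied placeholder. The function names and the toy *t* maps are hypothetical.

```python
import heapq


def top_voxels(t_map, roi, n=30, t_min=0.0):
    """Pick up to n voxels in `roi` with the highest t values above t_min."""
    candidates = [(t_map[v], v) for v in roi if v in t_map and t_map[v] > t_min]
    return {v for _, v in heapq.nlargest(n, candidates)}


def individual_roi(t_aa, t_av, roi, n=30, t_min=0.0):
    """Union of the top-n voxels from the AA and AV congruency contrasts."""
    return top_voxels(t_aa, roi, n, t_min) | top_voxels(t_av, roi, n, t_min)


# Hypothetical atlas region of 100 voxels, indexed 0..99.
atlas_roi = range(100)
# Toy t maps: the AA map peaks at voxels 70..99, the AV map at voxels 50..79.
t_aa = {v: v * 0.05 for v in atlas_roi}
t_av = {v: ((v + 20) % 100) * 0.05 for v in atlas_roi}

roi = individual_roi(t_aa, t_av, atlas_roi, n=30, t_min=1.0)
print(len(roi))  # 50: two sets of 30 voxels sharing 10 voxels
```

Because the cut is made by rank within each participant's map, the 30th-highest *t* value differs between participants, and only `t_min` acts as a common floor.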

The preceding two steps served to create the ROI definitions. Congruency effects were calculated within each participant by finding the difference between the mean signal among voxels in each ROI for the congruent vs. incongruent rhyming conditions. The congruency effect was calculated separately for the AA and the AV task modalities. We calculated the Pearson correlation between these congruency effects and elision for each group to assess whether elision skill was related to the sensitivity of the BOLD response to inter-item congruency for TD and RD participants, and used the Fisher *Z* test to directly compare the TD and RD correlations.
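As a minimal sketch of this computation, the snippet below derives a per-participant congruency effect and correlates it with elision scores. The sign convention (congruent minus incongruent) is assumed from the first-level contrast and may not match how the ROI-level difference was signed. All signal values and elision scores are invented for illustration.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def congruency_effect(congruent_signal, incongruent_signal):
    """Mean ROI signal for congruent items minus that for incongruent items."""
    return mean(congruent_signal) - mean(incongruent_signal)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-participant voxel signals within one ROI.
participants = [
    {"congruent": [2.0, 4.0], "incongruent": [1.0, 1.0], "elision": 8},
    {"congruent": [3.0, 5.0], "incongruent": [1.0, 1.0], "elision": 10},
    {"congruent": [5.0, 5.0], "incongruent": [1.0, 1.0], "elision": 11},
    {"congruent": [6.0, 8.0], "incongruent": [2.0, 2.0], "elision": 14},
]
effects = [congruency_effect(p["congruent"], p["incongruent"]) for p in participants]
elision = [p["elision"] for p in participants]

r = pearson_r(effects, elision)
print(effects)
print(round(r, 3))
```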

#### **RESULTS**

#### **BEHAVIORAL ANALYSIS**

Behavioral task performance is presented in **Table 2**. No overall difference was observed between the accuracy congruency effect (i.e., the difference between O+P+ and O−P+ accuracy) for the AV (*M* = −0.05, *SE* = 0.03) and AA (*M* = −0.06, *SE* = 0.02) tasks, *F*(1, 24) = 0.01, *p* > 0.90. The TD (*M* = −0.04, *SE* = 0.02) and RD accuracy congruency effects (*M* = −0.03, *SE* = 0.02) were equivalent, *F*(1, 24) = 0.01, *p* > 0.90, and there was no group by task modality interaction, *F*(1, 24) = 0.16, *p* > 0.60.

There was similarly no overall difference between decision latency congruency effects for the AV (*M* = 33 ms, *SE* = 0.29 ms) and AA (*M* = 32 ms, *SE* = 20 ms) tasks, *F*(1, 24) = 0.00, *p* > 0.90. The TD (*M* = 42 ms, *SE* = 20 ms) and RD decision latency congruency effects (*M* = 24 ms, *SE* = 20 ms) were equivalent, *F*(1, 24) = 0.43, *p* > 0.50, and there was no group by task modality interaction, *F*(1, 24) = 0.27, *p* > 0.60.



*Decision Latency indicated in milliseconds.*

Thus, though the TD participants clearly outperformed the RD participants in terms of both accuracy and decision latency, the behavioral congruency effects were equivalent across the modality conditions and between the TD and RD groups.

#### **REGION OF INTEREST ANALYSIS**

A mixed model analysis of variance (ANOVA) was carried out on the neural congruency effects calculated for each condition and each ROI, using region (FG vs. pSTS vs. PT) and modality (AA vs. AV) as within-subjects variables and group (TD vs. RD) as a between-subjects variable. There was a significant main effect of region [*F*(2, 48) = 9.14, *p* < 0.001], driven by a significantly greater congruency effect in the PT (*M* = 8.81, *SE* = 0.86) compared to FG (*M* = 5.75, *SE* = 1.20) and pSTS (*M* = 4.63, *SE* = 0.97). The congruency effect was also greater for the AA condition (*M* = 9.06, *SE* = 1.01) compared to the AV condition (*M* = 3.74, *SE* = 1.11). **Figure 3** presents the BOLD signal for each rhyming condition compared to baseline for each task modality and each ROI to aid in interpreting these results. Both groups showed similar relationships between congruent and incongruent signal for both modalities, with the RD group tending to exhibit weaker signal overall, but also showing greater variance. These main effects should be interpreted with caution, however, for two important reasons. First, there was additionally a three-way interaction between region, modality and group [*F*(2, 24) = 3.39, *p* = 0.04]. Second, and more importantly, as the remaining analyses show, and as we hypothesized, these congruency effects had a group and regional dependency on elision skill.

Pearson correlations between elision performance and the neural congruency effect within each ROI were calculated over the set of ROIs for the two groups. Across all ROIs, there was a significant correlation between elision and the neural congruency effect for the cross-modal task condition for the TD group, *r*(11) = 0.68, *p* = 0.005, but not for the RD group, *r*(11) = −0.12, *p* = 0.69, and the correlations differed significantly between the two groups, Fisher *Z* = 2.12, *p* = 0.03. Though none of the TD participants were statistical outliers with respect to Elision, we calculated the Spearman correlation between the neural congruency effect and the rank-order transformation of the Elision scores to ensure that the effects were not primarily driven by the two lowest-scoring TD participants. The results were similar, *r*<sub>s</sub>(11) = 0.60, *p* = 0.015. The neural congruency effect for the unimodal task within this network was not significantly correlated with elision score for either the TD group, *r*(11) = 0.28, *p* = 0.35, or the RD group, *r*(11) = −0.08, *p* = 0.80, and these correlations did not significantly differ, Fisher *Z* = 0.82, *p* = 0.41. Thus, elision predicts the sensitivity of this sub-network to spelling-sound congruency, but only for typically developing readers and only when engaged in a task requiring the integration of cross-modal information (**Figure 4**). A follow-up analysis showed the BOLD response to cross-modal congruency was not correlated with any of the standardized measures used as selection criteria, nor was elision skill significantly correlated with any of these measures. Thus, the neural response to cross-modal congruency was tied only to elision, which was itself relatively independent of the reading skill measures we used to discriminate TD and RD participants.
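The comparison of two independent correlations can be sketched with the standard Fisher r-to-z transformation. This is the textbook form of the test and is assumed, not confirmed, to be the variant the authors used. Group sizes of 13 are inferred from the reported degrees of freedom, *r*(11), taking df = n − 2.

```python
from math import atanh, erfc, sqrt


def fisher_z_difference(r1, n1, r2, n2):
    """Compare two independent Pearson correlations.

    Each r is transformed with z = atanh(r); the difference is divided by
    its standard error sqrt(1/(n1-3) + 1/(n2-3)). Returns the Z statistic
    and a two-tailed p value from the standard normal distribution.
    """
    z1, z2 = atanh(r1), atanh(r2)
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p_two_tailed = erfc(abs(z) / sqrt(2.0))
    return z, p_two_tailed


# Cross-modal condition across all ROIs: TD r = 0.68, RD r = -0.12, n = 13 each.
z, p = fisher_z_difference(0.68, 13, -0.12, 13)
print(round(z, 2), round(p, 2))  # 2.12 0.03
```

Under these assumptions the sketch reproduces the reported cross-modal comparison (*Z* = 2.12, *p* = 0.03).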

By-region analyses of the neural congruency effect for the cross-modal task condition showed that there was a significant correlation between the congruency effect in FG and Elision for the TD group, *r*(11) = 0.80, *p* = 0.001, but not for the RD group, *r*(11) = −0.08, *p* = 0.80, and that these two correlations differed, Fisher *Z* = 2.64, *p* = 0.004 (**Figure 5A**). Within the pSTS, there was a significant correlation between the congruency effect and Elision for the TD group, *r*(11) = 0.60, *p* = 0.03, but not for the RD group, *r*(11) = −0.12, *p* = 0.70, and these two correlations differed, Fisher *Z* = 1.82, *p* < 0.03 (**Figure 5B**). Within the PT, the congruency effect was not correlated with Elision for either the TD group, *r*(11) = 0.17, *p* = 0.57, or the RD group, *r*(11) = −0.1, *p* = 0.75 (**Figure 5C**), and these correlations did not differ, Fisher *Z* = 0.82, *p* > 0.20. Thus, the pattern of correlations seen across the network between elision and the neural congruency effect appears to be driven by a significant relationship in the FG and pSTS for TD participants in the cross-modal task.

As indicated above, the neural congruency effect was calculated as the difference between signal for congruent vs. incongruent rhyming items. Thus, it was unclear whether these correlations were primarily driven by either congruency condition in the two regions showing a clear relationship between congruency effect and elision. Within the FG, the signal strength associated with congruent items was not significantly correlated with elision, *r*(11) = 0.1, *p* = 0.75, whereas that associated with incongruent items was strongly positively correlated with elision, *r*(11) = 0.57, *p* = 0.04, though the correlation coefficients did not significantly differ (**Figure 6A**). Within pSTS, the correlation between activation and elision was significant neither for congruent items, *r*(11) = −0.21, *p* = 0.49, nor for incongruent items, *r*(11) = 0.21, *p* = 0.49 (**Figure 6B**).

Significant Fisher *Z* tests of differences between correlation coefficients are indicated by brackets.

#### **DISCUSSION**

This study investigated whether children with reading difficulty (RD) had altered audiovisual integration effects as compared to typically developing (TD) children. TD, but not RD, children showed significant correlations between a measure of phonemic awareness (i.e., elision) and the congruency effect for the cross-modal task. The cross-modal task involved an auditory followed by a visual word that was either orthographically congruent (e.g., lake-cake) or incongruent (e.g., grade-laid). Moreover, there was no significant correlation for elision and the unimodal congruency effect for either group, indicating that elision predicts sensitivity to orthographic congruency, but only for cross-modal processing in the TD children. This pattern suggests two things: first, that elision and cross-modal processing skill are tightly bound in typically developing readers; and second, that a breakdown in this relationship is associated with difficulties in reading. We note that, at first glance, our failure to find a correlation between elision and activity in the auditory-only condition may appear counter-intuitive, given both the presumed reliance of elision on phonemic awareness, and the presumed reliance of our rhyming judgment task on phonological processing. This pattern, however, reinforces the argument that elision is particularly sensitive to the presence of orthographic input in TD children. That is, in the auditory-only condition, orthographic conflict in the O− conditions is not relevant to the task. Though this conflict may influence phonological and automatic orthographic processing of auditorily-presented stimuli, elision skill does not predict how it influences performance. Rhyming judgments appear to be made largely on the basis of phonological similarity between stimuli in this task. For the cross-modal condition, however, elision predicts the degree to which orthographic conflict influences the network, and consequently produces a cross-modal congruency effect.

Correlations across the network between elision and cross-modal congruency were driven by a significant relationship in fusiform gyrus (FG) and posterior superior temporal sulcus (pSTS) for the TD children. Though there is little disagreement that, as part of the visual processing stream, FG is involved in orthographic processing, evidence for a role of FG in cross-modal processing is ambiguous, with some studies indicating that the region is predominantly specialized for unimodal orthographic processing (e.g., Booth et al., 2002b; Blau et al., 2010), and others suggesting that the function of the region is dynamically determined by interactions with other areas involved in language processing (e.g., Price et al., 2003; Price and Devlin, 2011). The coupling between elision and cross-modal activity in this region for TD children suggests that FG is sensitive to phonological information, but that this sensitivity is dependent on factors such as reader fluency. This is consistent with evidence for processing of orthography and phonology in the left FG in TD readers and a failure to do so in RD in studies by Schurz et al. (2010) and Richlan et al. (2010), reviewed by Richlan (2012). It is also consistent with our recent finding that TD children activate left FG during auditory rhyme judgment tasks, in contrast to children with RD (Desroches et al., 2010). This automatic activation of the orthographic area during spoken language processing suggests the left FG is involved in integration of orthographic and phonological information.

Studies examining audiovisual integration for orthography and phonology at small grain sizes have identified planum temporale (PT) and pSTS as cross-modal integration areas (Blau et al., 2008, 2010). Posterior STS, in particular, is often implicated as an audiovisual convergence zone for both speech and reading (Calvert, 2001; Van Atteveldt et al., 2004; Nath et al., 2011). These studies have shown that audiovisual integration is confined to a relatively narrow time window in which the two stimuli are presented near synchronously. Because pSTS sensitivity to cross-modal congruency distinguished between the reading groups, our results suggest that this region additionally integrates audiovisual information at the whole-word level over a wider time window in which words are presented 1000 ms apart. Our results extend studies examining audiovisual integration in adults and children with dyslexia showing decreased effects of cross-modal congruency in STS for near synchronous presentations as compared to typical readers (Blau et al., 2008, 2010).

Numerous studies on audiovisual integration of letters and speech sounds found congruency effects in PT, suggesting that they are due to feedback originating in pSTS (see Van Atteveldt et al., 2009). Moreover, we recently demonstrated that sensitivity of PT (but not pSTS) to cross-modal congruency in TD children engaged in a rhyming task is correlated with reading age (McNorgan et al., 2013). The failure to find a correlation between a measure of phonemic awareness and the cross-modal congruency effect in the PT was thus surprising. This apparent inconsistency may be attributable, however, to the interaction between task demand and region. Reading requires mapping from orthography to phonology and involves the blending of sounds. The orthographic intrusions seen with skilled readers in phonological awareness tasks (Stuart, 1990; Castles et al., 2003), in contrast, suggest that the explicit separation of sounds encourages these readers to map from phonology to orthography. McNorgan et al. argued that, as part of the phonological loop, PT should be more sensitive to large grain-size representations, as word representations unfold over time, whereas pSTS should be more sensitive to smaller grain-size representations. Because elision requires analysis of words at smaller grain-sizes, these results are consistent with McNorgan et al., and suggest that cross-modal processes during reading engage the pSTS for online small-grain integration and PT for large-grain integration over longer time windows.

Impaired phonological awareness is commonly cited as the source of reading impairment (Ramus, 2003), though numerous studies show that a failure to automatize letter-sound associations contributes greatly to reading failure (Vellutino et al., 2004; Ehri, 2005; Blau et al., 2010). Our finding that elision skill is significantly correlated with cross-modal lexical processing may reconcile both theories. As a measure of phonological awareness at the phonemic level, elision has been shown to be a strong predictor of reading performance that differentiates good and poor readers (Vloedgraven and Verhoeven, 2007). Vaessen et al. (2009) suggest that decreased performance on phonological awareness and rapid naming may reflect not only phonological processing, but also impaired automatic integration of orthography and phonology in dyslexic children. We found the strongest correlations between elision and cross-modal congruency in the FG, with greater activation for incongruent orthography. Incongruent orthographic representations have been shown to influence behavioral performance on phonemic awareness tasks such as segmentation, but this effect is more pronounced in TD compared to RD readers (Landerl et al., 1996). These behavioral findings are consistent with neuroimaging data showing that children with RD have reduced sensitivity to grapheme-phoneme consistency compared to TD children (Bolger et al., 2008), and that children with RD lack effects of orthographic familiarity and print specificity in the FG (Van Der Mark et al., 2009). Frost et al. (2009) also found that elision performance was associated with increasing specialization of FG for print. This study, however, did not include a cross-modal condition, instead examining conjunctions and disjunctions between unimodal spoken or print conditions. Consequently, their findings provide indirect evidence to suggest that elision is critically related to cross-modal processes underlying reading.
Thus, our results are consistent with the body of literature showing a cross-modal influence of orthographic knowledge on phonological processing, and importantly extend it by showing that this influence is reflected in elision skill for normal readers, but not those with reading difficulty. The capacity to carry out audiovisual integration during reading should depend on connectivity between regions mediating phonological and orthographic representations. It is reasonable, therefore, to suppose that the group differences we observe between TD and RD children might be driven by differential development of this connectivity, and this remains a subject for future investigations, which might examine task-related connectivity within these groups.

It is interesting to note that both TD and RD children spanned a range of elision skill, and these distributions were largely overlapping. Thus, there were RD children who performed comparably at elision but did not integrate cross-modal information in the same way as the TD children. The greater ability to access and manipulate phonemic knowledge in higher-elision RD children did not translate to improved cross-modal mapping, consistent with the finding that orthographic information is less likely to intrude on phonological tasks for RD readers (Landerl et al., 1996). RD interventions often involve extensive phonological awareness training (Hulme et al., 2012; Youman and Mather, 2012), but the emphasis on phonological awareness may be at the expense of learning to map between modalities. If the critical deficit in dyslexia is in the mapping from orthography to phonology, orthographic knowledge will be less likely to facilitate the development of phonological awareness, and the two skills will be decoupled. This would be reflected by an overall improvement in phonological awareness without a corresponding improvement in cross-modal integration in RD children undergoing phonological awareness training. In a quantitative meta-analysis, Bus and Van Ijzendoorn (1999) showed that phonological awareness training is most effective when paired with alphabetic training (i.e., learning to associate individual letters and corresponding phonemes), suggesting that intervention should focus on the cross-modal mapping between orthographic and phonological representations.

#### **CONCLUSION**

Our findings suggest that phonemic-level awareness, as measured by elision, arises from increased orthographic fluency (Morais et al., 1979; Ziegler and Goswami, 2005). Thus, elision does not measure phonemic skill alone, and is therefore not a pure measure of meta-phonological skill. Rather, it is a composite skill that taps both phonemic awareness and cross-modal integration. Elision may thus depend on the degree to which cross-modal integration in the FG and pSTS influences phonemic awareness, and a breakdown of this relationship may be indicative of reading difficulty. Behavioral literature examining the changing role of phonological awareness during literacy acquisition shows that it is not an especially strong predictor of reading outcomes in beginning readers (Schatschneider et al., 2004), and that there is a developmental increase in the ability of phonological awareness to predict reading skills (Anthony et al., 2007). Elision may thus not be a strong predictor of phonological processing in pre- or early readers, but rather a marker of facility with integrating orthography and phonology following formal reading instruction.

#### **ACKNOWLEDGMENTS**

This research was supported by grants from the National Institute of Child Health and Human Development (HD042049) to James R. Booth.

#### **REFERENCES**


*Ann. Dyslexia* 59, 78–97. doi: 10.1007/s11881-009-0024-y


of phones arise spontaneously? *Cognition* 7, 323–331. doi: 10.1016/0010-0277(79)90020-9


G., and Wimmer, H. (2010). A dual-route perspective on brain activation in response to visual words: evidence for a length by lexicality interaction in the visual word form area (VWFA). *Neuroimage* 49, 2649–2661. doi: 10.1016/j.neuroimage.2009.10.082


*Neuroimage* 47, 1940–1949. doi: 10.1016/j.neuroimage.2009.05.021


languages: a psycholinguistic grain size theory. *Psychol. Bull.* 131, 3–29. doi: 10.1037/0033-2909.131.1.3

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 April 2013; accepted: 04 July 2013; published online: 23 July 2013.*

*Citation: McNorgan C, Randazzo-Wagner M and Booth JR (2013) Cross-modal integration in the brain is related to phonological awareness only in typical readers, not in those with reading difficulty. Front. Hum. Neurosci. 7:388. doi: 10.3389/fnhum.2013.00388*

*Copyright © 2013 McNorgan, Randazzo-Wagner and Booth. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Comprehending expository texts: the dynamic neurobiological correlates of building a coherent text representation

*Katherine Swett<sup>1†</sup>, Amanda C. Miller<sup>2†</sup>, Scott Burns<sup>1</sup>, Fumiko Hoeft<sup>3</sup>, Nicole Davis<sup>1</sup>, Stephen A. Petrill<sup>4</sup> and Laurie E. Cutting<sup>1</sup>\**

*<sup>1</sup> Education and Brain Sciences Research Lab, Peabody College of Education and Human Development, Vanderbilt University, Nashville, TN, USA*

*<sup>2</sup> Department of Psychology and Neuroscience, Regis University, Denver, CO, USA*

*<sup>3</sup> Department of Psychiatry, University of California San Francisco, San Francisco, CA, USA*

*<sup>4</sup> Department of Psychology, Ohio State University, Columbus, OH, USA*

#### *Edited by:*

*Gui Xue, Beijing Normal University, China*

#### *Reviewed by:*

*Michiru Makuuchi, National Rehabilitation Center for Persons with Disabilities, Japan*

*Jianfeng Yang, Shaanxi Normal University, China*

#### *\*Correspondence:*

*Laurie E. Cutting, Education and Brain Sciences Research Lab, Peabody College of Education and Human Development, Vanderbilt University, 230 Appleton Place, PMB 328 Nashville, Tennessee, TN 37203, USA*

*e-mail: laurie.cutting@vanderbilt.edu*

*†These authors have contributed equally to this work.*

Little is known about the neural correlates of expository text comprehension. In this study, we sought to identify neural networks underlying expository text comprehension, how those networks change over the course of comprehension, and whether information central to the overall meaning of the text is functionally distinct from peripheral information. Seventeen adult subjects read expository passages while being scanned using functional magnetic resonance imaging (fMRI). By convolving phrase onsets with the hemodynamic response function (HRF), we were able to identify regions that increase and decrease in activation over the course of passage comprehension. We found that expository text comprehension relies on the co-activation of the semantic control network and regions in the posterior midline previously associated with mental model updating and integration [posterior cingulate cortex (PCC) and precuneus (PCU)]. When compared to single word comprehension, left PCC and left Angular Gyrus (AG) were activated only for discourse-level comprehension. Over the course of comprehension, reliance on the same regions in the semantic control network increased, while a parietal region associated with attention [intraparietal sulcus (IPS)] decreased. These results parallel previous findings in narrative comprehension that the initial stages of mental model building require greater visuospatial attention processes, while maintenance of the model increasingly relies on semantic integration regions. Additionally, we used an event-related analysis to examine phrases central to the text's overall meaning vs. peripheral phrases. It was found that central ideas are functionally distinct from peripheral ideas, showing greater activation in the PCC and PCU, while over the course of passage comprehension, central and peripheral ideas increasingly recruit different parts of the semantic control network. 
The finding that central information elicits greater response in mental model updating regions than peripheral ideas supports previous behavioral models on the cognitive importance of distinguishing textual centrality.

**Keywords: discourse processing, expository text comprehension, situation model building, temporal analysis of text comprehension, central vs. peripheral information, fMRI BOLD, semantic control network**

#### **INTRODUCTION**

Reading comprehension is a complex process that requires the coordination and integration of a number of component cognitive skills. The ability to read single words in isolation is widely accepted as one skill critical to comprehension, but successful reading comprehension does not always directly stem from adequate word identification skills. Some individuals who are skilled word readers are not skilled passage comprehenders (e.g., Cain and Oakhill, 2006; Catts et al., 2006; Cutting et al., 2009), supporting the idea that reading comprehension requires processes above and beyond single word reading.

Theoretical models of reading comprehension propose that successful comprehension requires a reader to draw on both text-based information and prior knowledge in order to build a coherent and meaningful mental representation of the text (Kintsch, 1974; van den Broek, 1988; Gernsbacher, 1990; Graesser et al., 1994; Zwaan and Singer, 2003). This mental representation is the reader's understanding of the text's deeper meaning; it consists of ideas from the text, relevant background knowledge, and inferences the reader makes about things not explicitly stated in the text (McNamara and Magliano, 2009). Building this mental representation is a dynamic process because cognitive demands change over time. For example, readers are known to spend more time processing words and sentences at the beginning of a text relative to later points (Glanzer et al., 1984). This could be due to the fact that, without context or relevant background knowledge activated to facilitate comprehension, comprehension necessitates more effortful attention to the initial construction of a mental representation (Yarkoni et al., 2008). Conversely, later stages of comprehension processes are facilitated by an increasing semantic contextualization (McNamara and Kintsch, 1996; McNamara and Magliano, 2009).

A number of imaging studies have examined the neurobiological correlates of reading comprehension (e.g., Fletcher et al., 1995; Maguire et al., 1999; Xu et al., 2005; Speer et al., 2009). Patterns of activation emerge when processing discourse that cannot be predicted from models of reading single words, or even single sentences, in isolation (Xu et al., 2005). Areas that consistently appear to be unique to processing narrative texts include the dorsal medial prefrontal cortex and bilateral temporal parietal junction, often attributed to social cognition required in story comprehension, bilateral temporal poles (TP, see **Table 1** for all abbreviations), which play a role in generating specific semantic associations in connected text, and posterior medial structures, including posterior cingulate cortex (PCC) and precuneus (PCU), which have been associated with updates in and integration of the reader's mental model (e.g., St. George et al., 1999; Robertson, 2000; Gernsbacher and Kaschak, 2003; Yarkoni et al., 2008; Speer et al., 2009; Whitney et al., 2009; Price, 2012). This demonstrates that reading connected text involves additional processes beyond the phonological, orthographic, semantic, and syntactic processes seen at the word and sentence level. Still, many questions regarding how readers form a coherent text representation remain unanswered.

Only a handful of studies have examined how the neural correlates of discourse processing change over the temporal progression of the discourse (Xu et al., 2005; Yarkoni et al., 2008; Speer et al., 2009). Of the few, Xu et al. (2005) used fMRI to compare the activation associated with reading the beginning of a story (setting and initial events) with the activation associated with reading the end of the story (outcome and final events). They found that processing the story's setting and initiating events resulted in strongly left lateralized activation, while processing the story's outcome resulted in increased activation in right hemisphere perisylvian and extrasylvian regions thought to contribute to inference and contextualization of narrative (Xu et al., 2005).



These right hemisphere regions have since been related to social cognition processes that may be narrative-specific (Mar, 2011). This study provides evidence that reading comprehension not only involves processes distinct from those required in single word reading, but also that comprehension demands can vary from point to point within a given text.

Similarly, by modifying the cohesiveness of text (stories vs. scrambled sentences) Yarkoni et al. (2008) identified neural regions that showed linear increases in activation as a function of reading time. More specifically, they compared *construction* processes (i.e., those involved at the text outset as the reader lays a foundation for the mental representation) with *maintenance* processes (i.e., those involved in integrating new ideas onto previously read, related ones). They found that regions in the posterior parietal cortex associated with visuospatial updating and attention are involved in the construction of a reader's mental model, while perisylvian language areas were more involved in its maintenance. These studies support theoretical models that suggest that building a mental representation of text is a dynamic process in which the cognitive demands shift from one point in the text to the next.

Nevertheless, it is important to note that all of the aforementioned fMRI studies on discourse processing have exclusively examined narrative texts; none to date have examined expository texts (i.e., texts written to convey factual information on a topic). However, event-related potential (ERP) and behavioral studies suggest such genre distinctions are important. For example, Baretta et al. (2009) used ERP to distinguish between narrative and expository texts. They found that reading the final sentence of expository texts relative to narrative texts elicited a greater increase in N400 amplitude, and they concluded that expository texts required more demanding semantic processing. Eason et al. (2012) also reported differences between genres, showing that expository texts placed higher demands on executive function (EF) than narrative texts, particularly inferencing and planning/organizing information. EF is thought to be essential to the process of building a coherent text representation because it enables readers to store previously read text ideas as they simultaneously read new ideas and integrate them into their mental representation (Kintsch and Rawson, 2005). While behavioral data certainly support the theoretical significance of EF to reading comprehension in general (e.g., Carretti et al., 2005; Cain, 2006; Swanson et al., 2007; Cutting et al., 2009; Sesma et al., 2009; Locascio et al., 2010; Christopher et al., 2012), Eason et al.'s (2012) findings of higher demands on EF for expository text suggest that for this particular text genre, which is critical for acquiring new information, EF is particularly salient.

#### **SENSITIVITY TO STRUCTURAL CENTRALITY**

One hallmark of successful reading comprehension is that the reader can distinguish between ideas that are important, or central, to the overall meaning of the text, and those that are less important, or peripheral. Skilled readers form connections among a text's semantically related ideas as they read. The ideas and their connections form a network in the reader's mind (van den Broek and Espin, 2010). Some ideas are causally or logically connected to a great number of other ideas and as a result emerge as being important, or central, to the overall meaning of the text, while others have relatively few connections and fall out as being peripheral, or unimportant (Trabasso and van den Broek, 1985; van den Broek, 1988).

A robust finding in the comprehension literature is that skilled readers are more likely to recognize and recall an idea the more central it is to the overall meaning of the text (Kintsch et al., 1975; Kintsch and van Dijk, 1978; Britton et al., 1980; Cirilo and Foss, 1980; van den Broek, 1988). This finding holds for both narrative and expository texts (Miller and Keenan, 2011). van den Broek et al. (2013) propose that a reader's ability to distinguish a text's central and peripheral ideas, or their *sensitivity to structural centrality*, is an important indicator of their comprehension ability. For example, adults show greater sensitivity to structural centrality than do children (Brown and Smiley, 1977); typically developing children show greater sensitivity to centrality than do children with reading disability (Miller and Keenan, 2009) as well as those with Attention Deficit Hyperactivity Disorder (ADHD) (Miller et al., 2013); and readers show greater sensitivity to centrality when reading in their native language than in a foreign language (Miller and Keenan, 2011).

Importantly, studies suggest that centrality tends to emerge as a feature of the developing text representation. van den Broek (2012) used eye-tracking equipment to show that skilled readers fixate more frequently and spend more time reading central ideas than they do peripheral ones. This suggests that centrality is a dynamic construct that emerges as the reader processes a text, consistent with the idea that readers form connections among semantically related text ideas as they read. In theory, the ideas that are most important stand out because they are the ones that have the most connections and are consequently the ones most likely to be recalled. To date, although sensitivity to centrality has been investigated behaviorally, the neural mechanisms remain unknown. Understanding the neural mechanisms underlying sensitivity to centrality may allow for a more specific understanding of normal and disrupted comprehension processes.

#### **CURRENT STUDY**

The current study sought first to identify the neural correlates specific to expository text comprehension, looking both at regions which overlap with single word processing and those which are specific to discourse-level processing. Once the systems for expository text comprehension were identified, we employed temporal analysis techniques to examine how these systems change over the course of building and maintaining a coherent mental representation of the text.

We hypothesized that when isolating discourse-level comprehension from word-level comprehension, we would see regions that have previously been implicated in sentence and narrative comprehension, particularly those associated with discourse-level language processing separate from social cognition [bilateral TP, angular gyrus (AG), and PCC] (Price, 2012; Chow et al., 2013). We predicted that the other traditional language regions, such as left-lateralized inferior frontal gyrus (IFG), middle temporal gyrus (MTG), and anterior superior temporal sulcus (STS), would most likely be shared by both word and passage tasks, but that these multi-functional regions would behave differently over the temporal course of passage comprehension compared to single-word comprehension (Chow et al., 2013). Additionally, due to the prevalence of information organization in expository comprehension, we expected to see activations in the dorsal attention network [dorsolateral prefrontal cortex (dlPFC), intraparietal sulcus (IPS), inferior parietal lobule (IPL)], which has been associated with the kind of updating, integrating, and immediate planning of information that has been behaviorally described in previous studies on expository comprehension (Eason et al., 2012; Ptak, 2012).

We further hypothesized that semantic control areas shared by words and passages in the mean-level analysis would become increasingly responsive over time during passage comprehension alone, owing to the increased semantic demands associated with integrating and maintaining new information in a global text representation. Yarkoni et al. (2008) showed that parietal visuospatial attention regions are involved in the *construction* of text (and in building the necessary visuospatial representation), while classic perisylvian language areas reflect the *maintenance* of readers' mental models. We therefore hypothesized that, in passages, a growing reliance on perisylvian regions over time would be accompanied by a decrease over time in the activation of posterior parietal regions.

The second goal of the current study was to examine the patterns of neural activation that are uniquely associated with processing central vs. peripheral ideas. While behavioral measures clearly indicate that readers distinguish central from peripheral ideas, both online and when recalling the text, the neurobiological processes that support this fundamental aspect of text comprehension have yet to be explored. Gaining insight into processes that promote a reader's sensitivity to centrality advances current comprehension models. More focally, it allows for isolating the underlying neural mechanisms supporting processes that may be disrupted in individuals with poor sensitivity to centrality. Such knowledge may eventually inform ways to individualize intervention for problematic reading comprehension. Given previous behavioral findings, we expected there to be unique semantic and integrative regions that differentiate central ideas from peripheral ideas. Finally, we predicted that with the temporal progression of the text, there are changes in the cognitive demands required in differentiating central and peripheral information and integrating those units into the mental model, resulting in temporally dynamic neural systems for different types of textual information.

To accomplish the goals of the study, an fMRI passage comprehension task was designed in which participants viewed three types of stimuli: (a) coherent expository passages (*Passages* condition), (b) scrambled words (*Words* condition), and (c) non-alphanumeric symbols (*Baseline* condition). Within the Passages condition, we delineated the text's central and peripheral ideas. To examine differences between central vs. peripheral ideas, as well as overall patterns of activation associated with text, we employed a typical general linear model (GLM). To examine the emergence of a mental representation of the text, or dynamic changes taking place over time, we took an approach sensitive to temporal features (Grill-Spector et al., 2006), which identified neural activation that emerged or diminished over time in each condition.

#### **METHODS**

#### **PARTICIPANTS**

Seventeen adults (mean 24.7 years ± 3.3 years; 9 male) participated in the study. All participants met the following inclusion criteria: (1) native English speakers; (2) normal hearing and vision; (3) no history of major psychiatric illness; (4) no traumatic brain injury/epilepsy; (5) no history of a developmental disability; and (6) no contraindication to MRI. Each participant gave written consent at the beginning of the study, with procedures carried out in accordance with Vanderbilt University's Institutional Review Board. All participants had a standard score within the average range (85–115) on a composite of standardized reading tests (Sight Word Efficiency and Phonemic Decoding Efficiency subtests of Test of Word Reading Efficiency; Word Identification and Word Attack subtests of the Woodcock Reading Mastery Test-Revised) or had no history of difficulty with reading. Participants received \$25 as compensation for a 2-h testing session.

#### **fMRI TASKS**

#### *Passages condition (see Figure 1A)*

Coh-Metrix 2.0 (Graesser et al., 2004) was used to create 8 passages that were equivalent across measures of word concreteness, syntactic simplicity, referential cohesion, causal cohesion, and narrativity. Passages were matched on descriptive factors, including number of words and average sentence length, and all passages had a Flesch-Kincaid grade level between 4.0 and 4.9. To ensure that passages were equivalent in difficulty, each of the 8 passages was isolated and compared to the mean of the remaining 7 passages. Passages were considered equivalent when measures were within a 90% confidence interval of the mean of the group of remaining passages. At the end of this process, the passages were equal across 23 measures of descriptive statistics, vocabulary frequency, word concreteness, syntactic simplicity, referential cohesion, causal cohesion, and narrativity (i.e., the degree to which the text uses everyday oral conversation and tells a story with familiar characters, events, places, and things). Four of these passages were used for the Passages condition and four were used for the Words condition (see below), which included words from the passages in randomized order.
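The leave-one-out equivalence check described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the example values, and the use of a Student's *t* interval (critical value 1.943 for 6 degrees of freedom, i.e., 7 remaining passages) are assumptions, since the text does not state how the 90% confidence interval was computed.

```python
from math import sqrt
from statistics import mean, stdev

# Assumption: two-sided 90% critical value of Student's t with 6 degrees of
# freedom (7 remaining passages). The source does not name the interval type.
T_CRIT_90_DF6 = 1.943


def leave_one_out_equivalent(values, t_crit=T_CRIT_90_DF6):
    """For each passage, test whether its measure lies inside the 90% CI
    of the mean of the remaining passages."""
    flags = []
    for i, held_out in enumerate(values):
        rest = values[:i] + values[i + 1:]
        half_width = t_crit * stdev(rest) / sqrt(len(rest))
        flags.append(abs(held_out - mean(rest)) <= half_width)
    return flags


# Hypothetical grade-level scores for 8 candidate passages (not study data).
grade_levels = [4.2, 4.5, 4.4, 4.3, 4.6, 4.4, 4.5, 4.4]
flags = leave_one_out_equivalent(grade_levels)
print(flags)
```

A passage flagged `False` on any measure would be revised and the check repeated for that measure; the same routine would be run for each of the 23 measures.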

All passages were 150 words in length. Each sentence was no longer than 13 words. The passages were all expository and included the following topics: Hang Gliding, Wrasses, Velvet Worms, and Hydroponics. Each passage consisted of two paragraphs, the first of which served to introduce the topic while the second elaborated on a particular detail of the subject matter.

#### *Defining centrality*

Consistent with established procedures for determining central vs. peripheral ideas (Albrecht and O'Brien, 1991; Miller and Keenan, 2009, 2011), centrality of the passages' idea units was defined using importance ratings obtained from a sample of 14 adult volunteer raters using the following procedure. First, the raters read one of the four passages related to Hydroponics, Wrasses, Hang Gliding, and Velvet Worms. Next, we presented that passage to the rater, this time formatted as a checklist of all the ideas in the passage. Each idea on the checklist corresponded with a phrase presented in-scanner. Raters used this checklist to identify how important each idea was to the overall meaning of the passage using a 0–7 Likert scale that ranged from the idea being "unimportant to the passage" to "very important to the passage." We calculated a mean rating for each idea unit. The ratings formed a normal distribution and had high reliability estimates (ICCs: Hydroponics = 0.90; Wrasses = 0.88; Hang Gliding = 0.89; Velvet Worms = 0.91). Consistent with previous work (Miller and Keenan, 2009, 2011), we used a median-split to divide this distribution of idea units into two classes; we identified ideas below the median rating as "peripheral" and those above the median as "central."
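The rating-and-split procedure can be illustrated with a short sketch. The idea labels and ratings below are hypothetical, and the handling of an idea whose mean rating equals the median is an assumption, because the text only specifies "below" and "above" the median.

```python
from statistics import mean, median


def classify_idea_units(ratings_by_idea):
    """Median-split idea units into central and peripheral classes
    based on each unit's mean importance rating."""
    means = {idea: mean(r) for idea, r in ratings_by_idea.items()}
    cutoff = median(means.values())
    labels = {}
    for idea, m in means.items():
        if m > cutoff:
            labels[idea] = "central"
        elif m < cutoff:
            labels[idea] = "peripheral"
        else:
            # Assumption: ties are flagged rather than assigned to a class.
            labels[idea] = "at median"
    return labels


# Hypothetical 0-7 importance ratings from three raters (not study data).
ratings = {
    "idea_1": [6, 7, 5],
    "idea_2": [2, 1, 3],
    "idea_3": [5, 5, 5],
    "idea_4": [3, 4, 2],
}
labels = classify_idea_units(ratings)
print(labels)
```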

Previous studies suggest that the neural correlates of reading words can vary according to grammatical class (e.g., nouns vs. verbs; see Vigliocco et al., 2011, for a review) or conceptual concreteness (Kiehl et al., 1999; Wang et al., 2010). To rule out such potential confounds, we examined whether central and peripheral ideas were comparable in the number and types of nouns and verbs they contained. Within each of the four passages, we used *t*-tests to assess whether central and peripheral ideas significantly differed on the number of action verbs, non-action verbs, abstract nouns, concrete nouns, or pronouns. Central and peripheral ideas did not significantly differ on any of these classifications [Hydroponics *t*(55) = −1.04, *p* = 0.30; Wrasses *t*(50) = −0.96, *p* = 0.34; Hang gliding *t*(43) = −0.47, *p* = 0.64; Velvet worms *t*(48) = −1.34, *p* = 0.19], abstract nouns [Hydroponics *t*(55) = −0.87, *p* = 0.39; Wrasses *t*(50) = −1.61, *p* = 0.11; Hang gliding *t*(43) = −1.36, *p* = 0.18; Velvet worms *t*(48) = 0.00, *p* = 1.00], action verbs [Hydroponics *t*(55) = −1.00, *p* = 0.32; Wrasses *t*(50) = 1.31, *p* = 0.20; Hang gliding *t*(43) = −0.38, *p* = 0.71; Velvet worms *t*(48) = −1.26, *p* = 0.22], or non-action verbs [Hydroponics *t*(55) = 1.79, *p* = 0.08; Wrasses *t*(50) = −1.41, *p* = 0.17; Hang gliding *t*(43) = 1.59, *p* = 0.12; Velvet worms *t*(48) = −1.22, *p* = 0.23].

#### *Words baseline*

The words baseline condition consisted of scrambled words presented in "phrases," which were exactly matched in length, word type, and presentation time to the phrases in the passages (see **Figure 1B**).

#### *Baseline*

The baseline condition included three non-alphanumeric symbols displayed horizontally on a slide (see **Figure 1C**).

#### **PROCEDURE**

Using imaging technology to explore the neural correlates of reading comprehension is a challenging task due to the temporal nature of discourse processing. Previous studies have presented the entire paragraph on one screen (Fletcher et al., 1995; Vogeley et al., 2001; Moss et al., 2011), but this procedure prohibits comparing how readers process specific aspects of the passage, such as central vs. peripheral ideas, because the block contains both types of information.

The most temporally precise presentation method is to present the story one word at a time, and several studies have employed this procedure (Xu et al., 2005; Yarkoni et al., 2008; Speer et al., 2009). When piloting passages using this approach, participants reported that it created an uncomfortable, artificial reading experience, likely in part because readers typically process words up to 14–15 letters to the right of their fixation (Rayner, 1986), and using a single word-by-word presentation prevents this. The moving window procedure is an alternative method that allows examination of the processing associated with single words. The advantage of this procedure is that the word(s) immediately preceding and following the word under fixation are also visible. Although this allows for a more naturalistic reading experience, the approach was undesirable for this study because it requires a self-paced design, and temporal consistency in the presentation of stimuli is required for group comparisons.

To avoid both of the above confounds, we presented our passages one meaningful phrase at a time. This procedure enabled us to compare activation related to processing central and peripheral ideas, yet decreased the artificial demands imposed by a word-by-word presentation. Each phrase was presented on a separate trial. The phrases included noun phrases, verb phrases, and prepositional phrases, and they ranged from 1 to 6 words in length. The number and type of words presented together determined the phrases' presentation duration. We allowed 550 ms for each content word and 275 ms for each function word. For timing purposes, we presented no more than three content words per slide and randomized the time between phrases to allow comparison across phrases. The Words condition followed the same presentation format as the Passages condition. The baseline condition was presented between paragraph 1 and paragraph 2 of both the Passages and Words conditions. The purpose of this design was to allow participants' activation to return to baseline after reading each block (paragraph). The presentation sequence was: (1) Passages condition, Paragraph 1; (2) Baseline condition; (3) Passages condition, Paragraph 2; (4) Baseline condition; (5) Words condition; (6) Baseline condition. The mean time for the passages block was 78.54 (*SD* = 22.94); Baseline mean = 47.69 (*SD* = 1.48); and Words mean = 82.45 (*SD* = 3.29).
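As a worked example of the timing rule, the sketch below computes a phrase's presentation duration from its word counts. The function name and the example phrase are hypothetical; the per-word durations and the limits on phrase composition are taken from the description above.

```python
CONTENT_WORD_MS = 550   # duration allotted per content word
FUNCTION_WORD_MS = 275  # duration allotted per function word
MAX_CONTENT_WORDS = 3   # no more than three content words per slide
MAX_WORDS = 6           # phrases ranged from 1 to 6 words


def phrase_duration_ms(n_content, n_function):
    """Return the presentation duration (ms) for one phrase."""
    total = n_content + n_function
    if not 1 <= total <= MAX_WORDS:
        raise ValueError("a phrase must contain 1 to 6 words")
    if n_content > MAX_CONTENT_WORDS:
        raise ValueError("a phrase may contain at most 3 content words")
    return n_content * CONTENT_WORD_MS + n_function * FUNCTION_WORD_MS


# Hypothetical phrase "in the water": 1 content word, 2 function words.
duration = phrase_duration_ms(n_content=1, n_function=2)
print(duration)  # 1100
```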

In all three conditions, 8% of the stimuli were repeated on two consecutive screens. To monitor whether participants attended to the stimuli, participants pressed a button with their right thumb when they detected a phrase repetition or a symbol configuration repetition. Mean percentage correct response was very high (95.06 ± 5.36%).

#### **fMRI DATA ACQUISITION, PREPROCESSING, AND FIRST-LEVEL ANALYSES**

Imaging was performed on a research-dedicated Philips Achieva 3T MR scanner with a 32-channel head coil. Functional images were acquired in 4 runs of approximately 7 min each (190 dynamics per run) using a gradient echo planar imaging sequence with 40 slices (3 mm thick, no gap). Other relevant imaging parameters for the functional images were *TE* = 30 ms (for optimal BOLD contrast at 3T), *TR* = 2200 ms, flip angle = 75°, FOV = 240 × 240 mm, and matrix size = 80 × 80 (interpolated), yielding 3 mm isotropic voxels.
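The run length and voxel dimensions follow directly from the stated parameters. The short calculation below is only an arithmetic check; the variable names are illustrative.

```python
TR_MS = 2200              # repetition time
DYNAMICS_PER_RUN = 190    # volumes per run
FOV_MM = 240              # in-plane field of view (square)
MATRIX = 80               # in-plane matrix size (square)
SLICE_THICKNESS_MM = 3    # no gap between slices
N_SLICES = 40

run_seconds = DYNAMICS_PER_RUN * TR_MS / 1000          # 418.0 s
run_minutes = run_seconds / 60                         # just under 7 min
in_plane_mm = FOV_MM / MATRIX                          # 3.0 mm
voxel_volume_mm3 = in_plane_mm ** 2 * SLICE_THICKNESS_MM  # 27.0 mm^3
slab_coverage_mm = N_SLICES * SLICE_THICKNESS_MM       # 120 mm

print(run_seconds, round(run_minutes, 2), in_plane_mm,
      voxel_volume_mm3, slab_coverage_mm)
```

Each voxel therefore measures 3 × 3 × 3 mm (27 mm<sup>3</sup>), and each run lasts 418 s.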

All functional data were analyzed using MATLAB (Mathworks, Natick, MA) and SPM8 (Frackowiak et al., 1997). The functional data for each participant were slice-timing corrected, aligned to the mean functional image, normalized to MNI space, and spatially smoothed with an 8 mm FWHM Gaussian filter. Participants whose data in any run exceeded motion thresholds (>3 mm translational displacement, 3° rotation) were excluded from the analysis. First-level analysis was performed by creating a standard regression model with estimated HRF for each condition, with the six motion parameters (x, y, z translational; x, y, z rotational) and outlying volumes as determined by ART (Whitfield-Gabrieli; http://www.nitrc.org/projects/artifactdetect/) added to the design matrix as regressors of no interest.

For the standard GLM analyses (i.e., those examining mean group-level activation, hereafter referred to as "mean group-level analyses"), three sets of contrasts for each participant were created. First, we compared total activation for the Passages against the Baseline condition as well as the Central and Peripheral conditions against the Baseline condition. Then, Central – Baseline and Peripheral – Baseline contrasts were directly compared. Finally, to further understand the potential overlap and specificity of passages as related to scrambled words, we examined the Boolean conjunction of Passages – Baseline and Words – Baseline (see Supplementary data).
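A Boolean conjunction of two contrasts amounts to a voxel-wise logical AND of the thresholded maps. The sketch below illustrates the idea on flat lists of *t*-values; the function names, the threshold, and the values are hypothetical, and the cluster-extent criterion used in the study is not modeled here.

```python
def conjunction_mask(t_map_a, t_map_b, t_threshold):
    """Voxels that exceed the threshold in BOTH contrasts (logical AND)."""
    return [a > t_threshold and b > t_threshold
            for a, b in zip(t_map_a, t_map_b)]


def first_only_mask(t_map_a, t_map_b, t_threshold):
    """Voxels that exceed the threshold in the first contrast only."""
    return [a > t_threshold and not b > t_threshold
            for a, b in zip(t_map_a, t_map_b)]


# Hypothetical t-values for four voxels (not study data).
passages_vs_baseline = [5.1, 4.2, 0.3, 6.0]
words_vs_baseline = [4.8, 0.9, 0.2, 5.5]
threshold = 3.7  # hypothetical voxel-wise t threshold

shared = conjunction_mask(passages_vs_baseline, words_vs_baseline, threshold)
passage_only = first_only_mask(passages_vs_baseline, words_vs_baseline,
                               threshold)
print(shared)
print(passage_only)
```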

To investigate the dynamic processes involved in building a coherent text representation, brain regions were examined that demonstrated increased or decreased activation as a function of time as the participant progressed through the Passages, relative to the Baseline.

These temporal analyses determined whether the dynamic process of building a text representation was associated with increased or decreased activation in specific areas. To accomplish these analyses, for each run we modeled each phrase onset as a stick function with a height equal to the difference between phrase onset and the initial phrase of Paragraph 1. For instance, to examine increased activation over time for the Central phrases, if Central phrases were presented at the 3rd, 7th, and 9th TR, the resulting vector would be [0 0 1 0 0 0 5 0 7]. The "1" in the vector represents the onset at which the first event of interest (i.e., central or peripheral phrase) occurs. Any subsequent event is weighted in proportion to the time elapsed between that event and the original event of interest. The resulting vectors were then convolved with the HRF to create conditions of interest and were inserted into the first-level GLM. This formulation allowed us to model a linear relationship, thus representing temporal/dynamic changes associated with each condition. To temporally model the Passages against the baseline, we collapsed the Central and Peripheral onset vectors and built the condition of interest using the same formulation above.
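The weighting scheme can be sketched as follows, reproducing the example vector from the text. This is an illustration rather than the SPM8 implementation: the function names are hypothetical, the weights are referenced to the first event of interest as in the worked example, and the double-gamma HRF (peak shape 6, undershoot shape 16, ratio 1/6) is an assumed approximation of a canonical response.

```python
import math


def temporal_weights(event_trs, n_trs):
    """Build a stick-function vector whose heights grow linearly with the
    number of TRs elapsed since the first event (1-based TR indices)."""
    vector = [0] * n_trs
    first = min(event_trs)
    for tr in event_trs:
        vector[tr - 1] = tr - first + 1
    return vector


def double_gamma_hrf(tr_s, duration_s=32.0):
    """Assumed double-gamma HRF sampled once per TR (not SPM's exact code)."""
    hrf = []
    for i in range(int(duration_s / tr_s)):
        t = i * tr_s
        peak = t ** 5 * math.exp(-t) / math.gamma(6)
        undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
        hrf.append(peak - undershoot / 6.0)
    return hrf


def convolve(signal, kernel):
    """Causal discrete convolution truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            if s and i + j < len(out):
                out[i + j] += s * k
    return out


weights = temporal_weights([3, 7, 9], n_trs=9)
regressor = convolve(weights, double_gamma_hrf(tr_s=2.2))
print(weights)  # [0, 0, 1, 0, 0, 0, 5, 0, 7]
```

The convolved `regressor` would then enter the first-level design matrix as a condition of interest; a positive group-level effect indicates activation that increases over the course of the passage, and a negative effect indicates activation that decreases.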

#### **GROUP-LEVEL IMAGING ANALYSIS**

SPM8 and MATLAB (Mathworks, Natick, MA) were used to create whole brain activation maps. Individual contrast maps were entered into group-level one-sample *t*-tests to analyze the Passages, Peripheral phrases, and Central phrases relative to Baseline, and the Central and Peripheral phrases relative to each other. MNI coordinates were converted to Talairach (Talairach and Tournoux, 1988) using formulas by Matthew Brett (http://www.mrc-cbu.cam.ac.uk/) and locations were determined by querying the Talairach Daemon (Lancaster et al., 2000). The group-level analyses were subjected to an uncorrected statistical threshold of *p* < 0.001 and a cluster size of 90 voxels, which was determined by 3dClustSim to be equivalent to *p* < 0.05.
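For readers who want to reproduce the coordinate conversion, the sketch below implements the piecewise-linear MNI-to-Talairach transform as it is commonly reproduced from Brett's `mni2tal` routine. The coefficients are an assumption in the sense that the text does not list them; they should be verified against the original source before use.

```python
def mni_to_talairach(x, y, z):
    """Approximate MNI -> Talairach conversion (piecewise-linear form
    commonly attributed to Brett's mni2tal; coefficients assumed)."""
    tal_x = 0.9900 * x
    if z >= 0:
        # Transform applied at and above the AC-PC plane
        tal_y = 0.9688 * y + 0.0460 * z
        tal_z = -0.0485 * y + 0.9189 * z
    else:
        # Transform applied below the AC-PC plane
        tal_y = 0.9688 * y + 0.0420 * z
        tal_z = -0.0485 * y + 0.8390 * z
    return tal_x, tal_y, tal_z


# Hypothetical MNI coordinate (not a reported peak).
tal = mni_to_talairach(10, 20, 30)
print(tuple(round(v, 3) for v in tal))
```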

#### **RESULTS**

#### **PASSAGES vs. BASELINE: MEAN GROUP-LEVEL ANALYSIS**

Passages relative to baseline showed robust activation of traditional language regions, specifically left IFG (BA 45/46), left MTG (BA 21/22) extending to AG (BA 39), left TP, and bilateral anterior STS. Additionally, PCC and ventral PCU were active, along with visual and word-processing regions, including bilateral occipital, fusiform, lingual gyri, and cuneus clusters (see **Table 2** and **Figure 2**).

#### **PASSAGES vs. BASELINE: TEMPORAL GROUP-LEVEL ANALYSES**

#### *Temporal – increasing*

Comparison of the passages condition to baseline elicited increasing activation over time predominantly in left hemisphere language related regions, including left IFG (BA 44/45), left MTG (BA 21/22), and left anterior STS. Bilateral lingual gyrus (BA 18) extending to cuneus and left occipital also showed increasing activation (see **Table 3** and **Figure 3**).

**Table 2 | Passages vs. Baseline mean analysis.**

*Cluster size in mm<sup>3</sup>. BA, Brodmann Area. All T-values are significant at p = 0.05. \*Indicates region outside of Brodmann areas.*

*For large clusters, brackets indicate sub-cluster peaks in BA regions distinct from primary peak, extracted using a decreased peak search space of 4 mm within the main cluster.*

#### *Temporal – decreasing*

Comparison of the passages condition to baseline indicated prominent decrease in activation over time in right dorsal PCU extending to IPS (BA 7) (see **Table 3** and **Figure 3**).

#### **CENTRAL AND PERIPHERAL COMPARISONS: MEAN GROUP-LEVEL ANALYSES**

#### *Central and peripheral ideas compared to baseline*

As expected, both the Central and Peripheral ideas from the Passages condition showed robust bilateral occipital activity, and greater left- than right-hemisphere activity in language regions (left IFG, left MTG, left AG, bilateral anterior STS, left TP) and PCC, when compared to the baseline (see **Table 4**).

#### *Comparison of central vs. peripheral ideas*

Directly comparing Central to Peripheral, the Central condition showed more activation in posterior mid-line structures, including retrosplenial cortex (RSA) (BA 29), PCC (BA 31), and ventral and dorsal PCU clusters (BA 31 and BA 7), along with a large cuneus cluster that extended into lingual gyrus (BA 18/19). Laterally, Central showed greater activation in left TP and left anterior STS (BA 38 and 21) (see **Table 4** and **Figure 3**). Peripheral compared to Central phrases did not show significantly greater activation in any region (see **Table 4** and **Figure 4**).

#### **PERIPHERAL AND CENTRAL PHRASE COMPARISONS: TEMPORAL GROUP-LEVEL ANALYSES**

#### *Central and peripheral phrases compared to baseline: increasing*

Central ideas as compared to baseline showed significantly greater increase in activation over time in many of the same areas that were recruited to a greater extent over time during the passages vs. baseline conditions: left IFG (BA 44/45), left cuneus (BA 17/18), and left lingual gyrus (BA 18/19). Similarly, temporal group-level analyses examining increases in activation for peripheral ideas also elicited activation in left MTG (BA 21), left cuneus (BA 19/18), bilateral lingual gyrus (BA 18), and left superior temporal gyrus (STG) (BA 21/22) (see **Table 5** and **Figure 5A**).

#### **Table 3 | Passages vs. Baseline temporal analysis.**

*Cluster size in mm<sup>3</sup>. BA, Brodmann Area. All T-values are significant at p = 0.05.*

*For large clusters, brackets indicate sub-cluster peaks in BA regions distinct from primary peak, extracted using a decreased peak search space of 4 mm within the main cluster.*

#### **Table 4 | Central and peripheral mean analysis.**

*Cluster size in mm<sup>3</sup>. BA, Brodmann Area. All T-values are significant at p = 0.05. \*Indicates region outside of Brodmann areas.*

*For large clusters, brackets indicate sub-cluster peaks in BA regions distinct from primary peak, extracted using a decreased peak search space of 4 mm within the main cluster.*

#### *Central and peripheral phrases compared with baseline: decreasing*

Central phrases as compared to baseline in the decreasing temporal analyses elicited activation in right PCU and superior parietal lobule (SPL) (BA 7) and right middle occipital gyrus (BA 18/19). Temporal analyses for peripheral phrases also showed significant decreasing activation in right PCU and SPL (BA 7) (see **Table 5** and **Figure 5B**).

**Table 5 | Central and peripheral temporal analysis.**

*Cluster size in mm<sup>3</sup>. BA, Brodmann Area. All T-values are significant at p = 0.05.*

**FIGURE 5 | Regions that show (A) increasing and (B) decreasing activations over time of central ideas (red), peripheral ideas (yellow), and both central and peripheral ideas (purple) at (uncorrected) *p* < 0.001, *k* = 90.**

#### *Comparison of central vs. peripheral phrases: increasing and decreasing*

No statistically significant differences were found for either the increasing or decreasing temporal group-level analyses.

#### **DISCUSSION**

The neural correlates of expository text comprehension have not previously been examined with fMRI. This study not only sought to identify a network of regions specifically activated for discourse-level processing of expository text but, because cognitive demands fluctuate within the comprehension of a single text, also examined the neural systems that underlie comprehending the text over time. Finally, to further identify core processes of expository text comprehension, this study aimed to define the functional underpinnings of comprehending textual centrality, which is one key indicator that a reader has formed a coherent mental representation (van den Broek et al., 2013).

#### **NEURAL CORRELATES OF DISCOURSE PROCESSING IN EXPOSITORY TEXTS**

In expository text comprehension, we see co-activation of left-lateralized language regions (IFG, posterior and anterior MTG) and two heteromodal association areas (left AG and PCC/PCU) commonly associated with higher-order cognitive processes (Price, 2012; Chow et al., 2013). Specifically, the observed language regions have been identified as part of an executive semantic control network (Whitney et al., 2011), with left IFG and left posterior MTG thought to direct semantic connections to fit the current context, while regions in the anterior temporal lobe are thought to store and integrate specific semantic associations (Binder et al., 2011; Whitney et al., 2011; Price, 2012). These regions have been observed to activate for different levels of comprehension (Price, 2012; Chow et al., 2013), and an examination of the scrambled words condition compared to expository comprehension in our own study shows that regions in this executive semantic control network overlapped for both conditions (see Supplementary Figure 1). From the perspective of hierarchical comprehension, in which reading consists of discourse comprehension built on top of single word comprehension, these shared regions could be interpreted as contributing to word-level processes only. However, given previous findings indicating these regions are active when processing semantic associations for words and sentences, hierarchical assumptions of functionality may overlook these regions' complex contributions to reading (Xu et al., 2005; Binder et al., 2011; Price, 2012; Chow et al., 2013). This complexity is supported by the temporal analyses discussed below, which show activations of the semantic control network over time that are unique to expository text.

The heteromodal regions that we see co-activated with the semantic control network (left AG and PCC/PCU) were activated for expository text, not words (see Supplementary Figure 1). The distinction of these regions as discourse-specific is unsurprising. Both regions have been previously identified as multi-function, cognitive "hubs," which perform higher-order cognitive processes (Chow et al., 2013; Seghier, 2013). In the context of language, left AG is primarily associated with semantic memory, incorporation of semantic information into a coherent whole, and making top-down semantic predictions (Price, 2012; Seghier, 2013), while PCC has been noted for its activation at updates in readers' mental representation of narrative texts (i.e., where readers are required to integrate information that conflicts with the present situation model) (Maguire et al., 1999; Speer et al., 2009; Whitney et al., 2009). This co-activation of left AG and PCC, along with language regions suggests that expository text comprehension involves a core semantic-processing network which integrates semantic information both at the word- and sentence-level, along with activation of heteromodal regions that more globally update the situation model into a coherent whole. Further discussion of the PCC and PCU roles in the context of centrality can be found in the following section.

Our findings of posterior midline and left AG activation in expository text when compared to single word reading are similar to what is reported for narrative, and further support the possibility that these regions are involved in global comprehension processes that are not necessarily dependent on discourse type (Mar, 2011). Unlike previous findings on narrative comprehension, however, it is important to note that apart from these left-lateralized activations, our findings suggest that expository text comprehension does not rely on additional regions within the theory of mind network, a network associated with social inference processes and contextualization of narrative text within world knowledge (Xu et al., 2005; Ferstl et al., 2008; Mar, 2011). The absence of other primary hubs of the theory of mind network, particularly the medial prefrontal cortex, emphasizes that narrative and expository texts may have critically different cognitive requirements, stressing the need to examine both text types in order to isolate specific comprehension processes susceptible to dysfunction. A direct comparison of narrative and expository texts is needed to further explore these differences.

Contrary to our hypothesis, expository text did not show activation of the dorsal attention network. This could be a result of the fact that our participants were skilled adult readers, and our passages were written at a fourth grade reading level. We created the passages to be highly cohesive, easily decodable, and thus easy to comprehend. However, it is likely that these relatively undemanding passages decreased the overall EF load.

#### **TEMPORAL DYNAMICS OF EXPOSITORY TEXT COMPREHENSION**

Interestingly, regions in the executive semantic control network progressively activate over time in passages alone, despite being activated in both passages and words in the mean analysis (See Supplementary Figure 2). This shows that these semantic regions have a unique activation pattern in expository text comprehension, further supporting findings that they play multifunctional roles interacting with different comprehension levels (Xu et al., 2005; Ferstl et al., 2008; Mar, 2011). Observations of the BOLD signal in these regions and its correlation to the HRF for central or peripheral events suggest that these increases are specifically due to language processes (see Supplementary Figures 5–7). During discourse comprehension, in order to maintain the reader's situation model, these semantic networks would necessarily be increasingly relied on over the course of the passage. As the amount of required semantic connections increases, both a greater store of semantic associations (eliciting activation in anterior MTG) and increased executive direction of those associations (left IFG and left posterior MTG) are required to ensure that new information aligns with and is integrated into the current situation model (Yarkoni et al., 2008; Whitney et al., 2011). It has also been suggested that left IFG and left posterior MTG play a role in integrating modality-specific knowledge (i.e., perceptual, motor, and affective) into the reader's situation model, which could also contribute to its increasing activation through comprehension (Chow et al., 2013).

When looking at regions that decrease over the course of comprehension, we see decreased activation of right IPS in both word and passage conditions (See Supplementary Figure 3). However, compared to words and baseline, the BOLD signal in right IPS shows a marked decrease in activation at central and peripheral events, suggesting that IPS could have a unique relationship with discourse-specific processes (see Supplementary Figure 7). Interestingly, this region has been previously implicated in discourse-level narrative comprehension. Ferstl et al. (2005) suggested that right IPS is involved in attentional shifts from local to global aspects of the mental representation of the text. In narrative comprehension compared to scrambled sentences, Yarkoni et al. (2008) saw an initial spike of activation in the same region, followed by a linear decrease over the time course of comprehension, attributing the activation pattern to visuospatial updates involved in initial situation model construction. Similarly, when contrasting the first paragraph of expository text to the second paragraph (see Supplementary Figure 4), we see activation in the same region, suggesting that right IPS is more prevalent in the beginning of comprehension than the end. Consequently, decreased activation of right posterior parietal cortex could be indicative of the region's role in construction of the situation model. The overlapping temporal decrease in scrambled words could reflect readers' initial attempts to build a situation model despite incoherence, particularly since task types were not identified to readers ahead of the stimuli. However, higher-order interpretations of IPS activations in texts should be treated carefully, as activations could reflect subtle, visual attention differences between tasks.

These findings closely reflect Yarkoni et al.'s (2008) narrative findings, and support a cross-genre reading model in which visuospatial updating and attention regions are involved in the initial construction of a reader's mental representation of a text, and executive semantic control areas are increasingly necessary for its maintenance. The similarities between studies suggest not only that there are distinct cognitive stages during text comprehension, but that some of the neural structures underlying these stages may be shared across text genre.

#### **NEURAL CORRELATES OF CENTRAL AND PERIPHERAL TEXT**

Our second aim was to examine the neural correlates associated with the ability to distinguish between a text's central and peripheral ideas, or readers' sensitivity to centrality (van den Broek et al., 2013). Skilled readers demonstrate sensitivity to centrality by recognizing and recalling a greater proportion of central than peripheral ideas (Kintsch et al., 1975; Kintsch and van Dijk, 1978; Britton et al., 1980; Cirilo and Foss, 1980; van den Broek, 1988); however, identifying central information is a skill known to be particularly vulnerable to disruption among individuals who experience comprehension difficulties (Miller and Keenan, 2009, 2011; Miller et al., 2013). Because sensitivity to centrality is both a critical component of comprehension and one that is vulnerable to disruption, we aimed to explore the neural underpinnings of this process.

A direct comparison of mean group-level activation indicates that central text ideas are cognitively distinct from peripheral ideas, eliciting greater activation in textual integration regions when compared to peripheral. Specifically, reading central relative to peripheral ideas was associated with posterior midline structures, namely PCC and PCU (BA 29/31), as well as anterior temporal regions. These findings relate to previous studies of discourse processing that have found PCC and PCU to be associated with forming connections among text ideas (Fletcher et al., 1995; Maguire et al., 1999; Robertson, 2000), updating story representations (Whitney et al., 2009), and connecting text-based information to prior knowledge (Fletcher et al., 1995; Maguire et al., 1999). Additionally, Speer et al. (2009) found greater PCC activation when readers processed the points in the text that required the greatest degree of mental model updating. Activation of STG/MTG (BA 38) has also been associated with linking semantic ideas to form a connected narrative (Fletcher et al., 1995; Maguire et al., 1999). These findings confirm that in addition to readers' ability to behaviorally distinguish between central and peripheral information, the degree of textual relevancy is associated with a distinct neural network of textual/extra-textual integration and mental representation regions in the comprehender.

#### **TEMPORAL DYNAMICS OF CENTRAL AND PERIPHERAL TEXT**

Comparing central and peripheral activations over time shows that as the text progresses, central ideas recruit different parts of the language network than peripheral ideas. Specifically, regions within the executive semantic control network differentiate central and peripheral processing over time, with central ideas increasingly relying on the left IFG, and peripheral ideas activating left anterior MTG independently from and posterior MTG to a greater extent than central ideas. This centrality-driven division between frontal and temporal semantic processing regions can be seen in the BOLD signal, with left IFG and left anterior MTG initially responding generally to the switch from non-word to word stimulus, before demonstrating clear correlation with central and peripheral HRF prediction peaks, respectively (see Supplementary Figures 5, 6). While both temporal and frontal regions are implicated in semantic cognition, it has been suggested that left posterior MTG acts as a general interface between lexical and conceptual knowledge, anterior MTG is involved in specific semantic associations, while left IFG is more context-specific, activating for conceptual knowledge that is cued by the preceding text (Price, 2012; Chow et al., 2013). Consequently, for central textual ideas, which are more semantically dependent on previous ideas, the IFG is increasingly involved in making appropriate semantic connections to the established context. On the other hand, processing peripheral ideas, or ideas which have looser semantic connections to the preceding text, would rely more heavily on regions that support general semantic knowledge to contextualize the present text. This suggests that within the fronto-temporal semantic control network, there is a functional divide between frontal and temporal contributions related to perception of textual centrality.

Decreased activation over time for both central and peripheral ideas was similar to the patterns of temporal activations associated with passages—as language regions increased over time, activation of the visuospatial attention system decreased. This pattern is also apparent in the BOLD signal, and appears to be anti-correlated with both central and peripheral phrases (see Supplementary Figure 7). However, the extent and strength of the right IPS cluster in central ideas was significantly greater than peripheral. This difference can be explained by right IPS involvement in situation model construction—because central ideas contribute more to the situation model, they would consequently be more sensitive to the decreasing need of construction regions (Kintsch, 1988).

#### **LIMITATIONS AND FUTURE DIRECTIONS**

Our temporal analyses assumed a linear relationship between time and neural activation of text processing; however, nonlinear temporal relationships may exist, and future studies should explore such non-linear changes. A second limitation is that our models assume that neural activation builds not only as the reader progresses through the paragraphs, but also during the baseline condition between the two paragraphs. Future work should compare whether removing this baseline assessment changes the patterns of temporal activation change.

One methodological consideration is that our participants were skilled adult readers, and our passages were written at a fourth grade reading level. Future studies should manipulate the reading level of the passages and examine how this manipulation influences the neural correlates of expository comprehension, particularly regions associated with EF.

Future studies should also consider the important interaction between text and reader by considering the background knowledge that the readers hold about each passage topic. Background knowledge plays an important role in building a coherent representation of the text (Spilich et al., 1979; Miller and Keenan, 2009, 2011) and allows the reader to form a more meaningful representation that goes beyond the text-based ideas (Kintsch, 1988; Albrecht and O'Brien, 1991). A reader's existing knowledge base is especially important to consider with respect to expository texts because they often use topic-relevant vocabulary that builds upon the reader's assumed knowledge base.

Finally, future work should examine the neural correlates of building a coherent text representation among groups of readers known to be less sensitive to structural centrality, such as individuals with reading disability, individuals with ADHD, and foreign language learners. Comparing the patterns of activation associated with skilled and less skilled comprehension could help identify the comprehension processes that are disrupted and the underlying source of their comprehension difficulties. This insight could perhaps be employed to inform and improve reading comprehension instruction and interventions.

# **CONCLUSION**

Successful expository text comprehension is critical in school learning environments and requires different cognitive processes than narrative comprehension, including a greater ability to organize and plan information to develop a cohesive mental representation (Eason et al., 2012). Expository text consequently offers a unique environment to study reading ability and disability. However, while there is an increasing number of imaging studies examining discourse processing through narrative comprehension, expository text comprehension has largely been overlooked. This study not only identifies the neural correlates of expository text comprehension as a whole, but also isolates those regions involved in the dynamic construction of mental representations of the text over time, as well as those associated with textual centrality. By better understanding these dynamic, within-text processes of reading comprehension, we can begin to identify the key cognitive stages of comprehension that are particularly prone to dysfunction in populations with discourse-level reading disabilities.

#### **ACKNOWLEDGMENTS**

This work was supported by NICHD 5K08HD60850-2, NICHD R01HD067254, NICHD R01HD044073, Vanderbilt Kennedy Center grant 5P30HD015052-31, VICTR Resources grant UL1TR000445 from NCATS/NIH, Vanderbilt CTSA grant UL1RR024975 from NCRR/NIH, and Grant # R24HD075460.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2013.00853/abstract

# **REFERENCES**


brain mapping. *Hum. Brain Mapp.* 10, 120–131. doi: 10.1002/1097-0193(200007)10:3<120::AID-HBM30>3.0.CO;2-8


van den Broek, P. W., and Espin, C. A. (2010). Improving reading comprehension: connecting cognitive science and education. *Cogn. Crit.* 2, 1–25.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 August 2013; accepted: 21 November 2013; published online: 12 December 2013.*

*Citation: Swett K, Miller AC, Burns S, Hoeft F, Davis N, Petrill SA and Cutting LE (2013) Comprehending expository texts: the dynamic neurobiological correlates of building a coherent text representation. Front. Hum. Neurosci. 7:853. doi: 10.3389/fnhum.2013.00853*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Swett, Miller, Burns, Hoeft, Davis, Petrill and Cutting. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Spatial attention in written word perception

# *Veronica Montani<sup>1</sup>, Andrea Facoetti<sup>1,2</sup> and Marco Zorzi<sup>1,3,4</sup>\**

*<sup>1</sup> Department of General Psychology, University of Padua, Padua, Italy*

*<sup>2</sup> Neuropsychology Unit, "E. Medea" Scientific Institute, Bosisio Parini, LC, Italy*

*<sup>3</sup> IRCCS San Camillo Neurorehabilitation Hospital, Venice-Lido, Italy*

*<sup>4</sup> Center for Cognitive Neuroscience, University of Padua, Padua, Italy*

#### *Edited by:*

*Gui Xue, Beijing Normal University, China*

#### *Reviewed by:*

*Xiaolin Zhou, Peking University, China*

*Daniel J. Schad, Charité – Universitätsmedizin Berlin, Germany*

#### *\*Correspondence:*

*Marco Zorzi, Department of General Psychology, University of Padua, via Venezia 8, 35131 Padua, Italy e-mail: marco.zorzi@unipd.it*

The role of attention in visual word recognition and reading aloud is a long-debated issue. Studies of both developmental and acquired reading disorders provide growing evidence that spatial attention is critically involved in word reading, in particular for the phonological decoding of unfamiliar letter strings. However, studies on healthy participants have produced contrasting results. The aim of this study was to investigate how the allocation of spatial attention may influence the perception of letter strings in skilled readers. High frequency words (HFWs), low frequency words and pseudowords were briefly and parafoveally presented either in the left or the right visual field. Attentional allocation was modulated by the presentation of a spatial cue before the target string. Accuracy in reporting the target string was modulated by the spatial cue but this effect varied with the type of string. For unfamiliar strings, processing was facilitated when attention was focused on the string location and hindered when it was diverted from the target. This finding is consistent with the assumptions of the CDP+ model of reading aloud, as well as with familiarity sensitivity models that argue for a flexible use of attention according to the specific requirements of the string. Moreover, we found that processing of HFWs was facilitated by an extra-large focus of attention. The latter result is consistent with the hypothesis that a broad distribution of attention is the default mode during reading of familiar words because it might optimally engage the broad receptive fields of the highest detectors in the hierarchical system for visual word recognition.

**Keywords: visual word recognition, visuo-spatial attention, spatial cueing, reading aloud, phonological decoding, attentional focusing**

#### **INTRODUCTION**

Visuo-spatial attention is likely to be engaged at many levels of the process of recognizing printed words (McCandliss et al., 2003), but despite many studies investigating this issue, the literature does not offer a clear and uncontroversial picture. Several different manipulations of attention have been used to investigate whether word processing is automatic or whether it requires some engagement of attention and, in the latter circumstance, what kind of reading sub-processes consume attentional resources. We review below the previous literature and then present a new study examining the involvement of spatial attention in visual word perception, more specifically how the latter is modulated by focusing attention on the target stimulus.

The most cited evidence for the automaticity of word reading is the Stroop effect (for a review see MacLeod, 1991). Longer reaction times (RTs) in naming the ink color of words that convey incongruent color names are usually taken as a demonstration of automatic processing up to the word meaning, thereby suggesting that suppression of word reading is difficult or even impossible (e.g., Neely and Kahan, 2001; Brown et al., 2002). Nevertheless, automatic word processing in Stroop tasks can be moderated by attentional manipulations, as shown by the finding that focusing spatial attention on a single letter of the word can reduce the magnitude of the Stroop effect (e.g., Stolz and Besner, 1999; see also Lachter et al., 2004, 2008).

Another way to investigate the automaticity of word reading is to assess whether it can proceed in parallel with another task. To this aim, some studies have used the psychological refractory period (PRP) paradigm (Pashler, 1994; Johnston et al., 1995), which requires participants to perform two tasks in rapid succession. When the time interval between the two tasks is long, the two tasks are performed without interference, while RTs for the second task increase sharply when the time interval is short (i.e., PRP effect). McCann et al. (2000) concluded that orthographic-lexical processing needs central attention, whereas Cleland et al. (2006) found exactly the opposite. Other studies, using the locus-of-slack logic, provide evidence that phonological recoding requires central attention while earlier visual-orthographic processing can automatically proceed (Reynolds and Besner, 2006; O'Malley et al., 2008). Lien et al. (2008) used the PRP paradigm in combination with the recording of event-related potentials (ERPs). They assessed the amplitude and latency of the N400 wave elicited by words that were semantically related or unrelated to the context, as well as the amplitude and latency of the P300 wave elicited by high or low frequency words (LFWs). Overall, their conclusion was that neither semantic nor lexical processing can proceed without attention (but see Rabovsky et al., 2008). Converging evidence regarding the role of attention in word reading is also provided by studies on mindless reading (e.g., Reichle et al., 2010; Schad et al., 2012).

Posner's spatial cuing paradigm (Posner, 1980) makes it possible to direct attention to a particular position in visual space and to assess the consequences of processing a target stimulus at the attended vs. unattended location. In the context of written word perception, orienting spatial attention away from the target should be detrimental if word processing requires attention. However, the studies using variants of this paradigm have produced inconsistent results. Some studies have reported that biasing spatial attention with a cue either at the beginning or at the end of a letter string has a stronger influence on pseudoword (PW) than on word reading (Sieroff and Posner, 1988; Givon et al., 1990; Auclair and Siéroff, 2002), thereby suggesting that the lexical status of the string can influence the distribution of attention. Other studies, however, reported that the cuing effect was not modulated by the type of string. For example, McCann et al. (1992) found faster lexical decision latencies at the cued position for both words and PWs that were presented above or below the fixation point. Similar results were found using left or right parafoveal presentation (Nicholls and Wood, 1998; Ortells et al., 1998; Lindell and Nicholls, 2003). Finally, a lack of cueing effects was reported by Ducrot and Grainger (2007) using a perceptual identification paradigm with target words appearing left or right of a central fixation point and using a string of hash marks as spatial cue. In valid trials the cue matched the target both in location and spatial extent, while in the neutral condition the hash marks covered both possible locations of the target. When the target was presented in central vision, with fixation either on the first or on the last letter, little or no effects of spatial cueing were found. However, it is important to note that the absence of invalid trials might have influenced the latter results.

Familiarity of the stimulus is typically manipulated through the frequency or the lexicality of the string (e.g., Monsell et al., 1989; Allen et al., 2005). A different approach was adopted by Risko et al. (2011), who used repetition to manipulate familiarity and combined it with spatial cueing in the context of a word naming task. They found that in the repetition condition (i.e., when the word was repeated numerous times throughout the experiment) the cueing effect was smaller than in the no repetition condition (i.e., when the word was presented a single time). This finding is in line with the idea that familiar items place fewer demands on spatial attention. Moreover, the study of Risko et al. (2011) offers an explanation of the inconsistent findings on the automaticity of reading, because the findings using the Stroop task may reflect the fact that stimulus repetition reduces spatial attentional requirements.

In summary, the studies reviewed above suggest that attention is flexibly used in visual word processing. This is also consistent with the finding of individual differences in the automaticity of visual word recognition that largely depend on reading skills (Ruthruff et al., 2008) and presumably on reading experience (Siéroff and Riva, 2011). In contrast to the idea of fully automatic processing that is highlighted by the Stroop task, the engagement of attention seems a necessary requirement in order to process visually presented words.

#### **SPATIAL ATTENTION IN MODELS OF READING ALOUD**

Beginning readers need to learn a system for mapping between visual symbols and sounds (Ziegler and Goswami, 2005). Simple visual features are combined to form detectors of letter shapes (Dehaene et al., 2005; Zorzi et al., 2013) and letters are then organized into higher-order units that map onto sounds (Perry et al., 2007, 2013). Indeed, phonological decoding is thought of as the *sine qua non* for reading acquisition (Share, 1995). Repeated exposure to the printed material and the ability to recognize words through phonological decoding progressively lead to the development of orthographic representations of whole words (see Ziegler et al., 2014, and Di Bono and Zorzi, 2013, for computational models of orthographic learning), with a neural substrate in the occipito-temporal area (i.e., the visual word form area, McCandliss et al., 2003; Glezer et al., 2009; Dehaene and Cohen, 2011). The distinction between phonological decoding (which involves small grain-size units) and recognition of whole words is a prominent feature of dual-route models of reading aloud (e.g., Coltheart et al., 2001; Perry et al., 2007, 2010). Nevertheless, the assumption that reading involves the interaction between two different pathways, one phonological and the other lexical-semantic, is shared by virtually all computational models (e.g., Plaut et al., 1996; Harm and Seidenberg, 2004; for a review see Zorzi, 2005).

In line with the seminal proposal of LaBerge and Samuels (1974), some of these models make specific assumptions on how attention is engaged in the two different pathways. In the CDP+ model (Perry et al., 2007), spatial attention is assumed to be engaged by the phonological pathway during the parsing of letter strings into the constituent graphemes that provide the input to the phonological decoding network (see also Perry et al., 2013). Other models assume a parsing mechanism that can operate on units of different sizes (e.g., letters vs. syllables; Ans et al., 1998) depending on the context. Regardless of the specific details, parsing in all models is thought to rely on focused spatial attention that moves from left to right across the letter string. That is, a top-down search mechanism is used to sweep the spotlight of attention serially over the sub-word units (Vidyasagar, 1999; Vidyasagar and Pammer, 2010).

Several lines of evidence support the hypothesis that the phonological route, rather than the lexical route, requires efficient focusing of visual-spatial attention. Patients with severe neglect dyslexia show preserved lexical-semantic access in reading (Ladavas et al., 1997a,b), suggesting an interaction between the attentional system and the different reading routes. Moreover, several studies have linked developmental reading difficulties to impaired visual-attentional processing mechanisms. Impaired visual-spatial attention has been repeatedly described in dyslexic children (e.g., Facoetti et al., 2005) and adults (Laasonen et al., 2012), in particular for those showing poor non-word reading ability (Cestnick and Coltheart, 1999; Buchholz and McKone, 2004; Facoetti et al., 2006, 2010; Roach and Hogben, 2007; Jones et al., 2008). Non-word reading performance taps the functioning of the phonological route and its impairment is a hallmark of dyslexia across different languages (Ziegler et al., 2003). Dyslexic children perform worse on visual-attention span tasks (i.e., tasks measuring the number of distinct visual elements that can be simultaneously processed at a glance) than normally reading children (Bosse et al., 2007). Moreover, the reading performance of dyslexic children can substantially improve after training visuospatial attention (Geiger and Lettvin, 1999; Facoetti et al., 2003; Franceschini et al., 2013) or through a simple manipulation of the physical appearance of the text (i.e., extra-large spacing of the letters) that reduces the demands on focused spatial attention (Zorzi et al., 2012; Schneps et al., 2013). Finally, visual-spatial attention skills in pre-schoolers are predictive of future reading performance (Franceschini et al., 2012).

The aim of this study was to further investigate how visual word processing is modulated by the allocation of spatial attention. Following Ducrot and Grainger (2007), we assessed the effect of a spatial cuing manipulation within a perceptual identification task. Importantly, and in contrast to the study of Ducrot and Grainger (2007), we included an invalid spatial cue condition and we manipulated the lexicality of the stimuli (by including PWs) in addition to familiarity (i.e., word frequency). We predicted that high frequency words (HFWs) should be less influenced by the distribution of attention than LFWs, whereas PWs should be the most influenced by the attentional modulation because phonological decoding places particular demands on the orienting of focused visuo-spatial attention (Perry et al., 2007).

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Twenty undergraduate students from the University of Padua participated in the study. Their mean age was 22.85 years, with a range of 18 to 28 years. They were all native Italian speakers and had normal or corrected-to-normal vision.

#### **APPARATUS AND STIMULI**

Stimuli were presented on a 17" CRT monitor connected to a Pentium IV computer running E-Prime 1.1 software (Schneider et al., 2002). Strings were presented in uppercase white letters against a black background in 12-point Courier New font. Participants were seated at a distance of 60 cm from the screen. Each string subtended a visual angle of 4.25°. Two hundred and sixteen eight-letter strings were used as targets. Seventy-two strings were HFWs (mean printed frequency greater than 33 occurrences per million; Bertinetto et al., 2005), whereas seventy-two strings were LFWs (mean printed frequency less than 3 occurrences per million). Finally, seventy-two strings were PWs obtained by replacing two letters in a set of HFWs (different from those used as targets). In each frequency set, words were 88% nouns, 8% verbs, and 4% adjectives. The target strings were presented in the left or right visual field such that either the last letter or the first letter was aligned with the central fixation point.
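As an illustration of the viewing geometry, the standard visual angle formula can be used to estimate the physical width of a string on the screen from the reported angle (4.25°) and viewing distance (60 cm). The following Python sketch is not part of the original methods; the function names are hypothetical and the computed width is an estimate derived from the reported values rather than a measurement reported by the authors.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of a given width
    viewed frontally at a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse of visual_angle_deg: width (cm) needed to subtend an angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Values taken from the text: 4.25 degrees at a 60 cm viewing distance.
# The resulting width is an estimate, not a value reported in the article.
string_width_cm = size_for_angle_cm(4.25, 60.0)
print(f"Estimated string width: {string_width_cm:.2f} cm")
```

Under these assumptions the eight-letter string would span roughly 4.5 cm on the screen.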

In the valid condition, the spatial cue consisted of a string of eight hash marks (########) presented either in the right or left visual field according to the location of the target string. In the invalid condition, the same spatial cue was presented either in the right or left visual field, opposite to the target string. In the neutral condition, the spatial cue consisted of a string of fifteen hash marks, presented centrally and covering both the right and left positions. The central fixation consisted of two vertically aligned central lines with a gap between them (as in Experiment 3 of Ducrot and Grainger, 2007) in order to avoid masking effects.

#### **DESIGN AND PROCEDURE**

Participants had their head positioned on a headrest and they were instructed to avoid eye movements. At the beginning of each trial, the fixation was displayed in the middle of the screen and participants were instructed to fixate the gap. After a delay of 1000 ms, the spatial cue appeared for 50 ms. After 30 ms of delay (i.e., cue-target interval was 80 ms), the target string was presented for 80 ms (**Figure 1**). Then, a window appeared on the screen inviting the participant to type the corresponding string using the computer keyboard.
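The timing of a single trial can be summarized as a sequence of events with onsets and durations. The Python sketch below is only an illustration built from the durations given in the text (1000 ms pre-cue delay, 50 ms cue, 30 ms delay, 80 ms target); the event names, the `Event` class, and both functions are hypothetical, and whether the fixation display remained visible during the cue is an assumption that the code does not settle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    name: str
    onset_ms: int
    duration_ms: int


def trial_timeline(pre_cue_ms: int = 1000, cue_ms: int = 50,
                   gap_ms: int = 30, target_ms: int = 80) -> List[Event]:
    """Build the ordered list of events for one trial.

    Default durations are the ones stated in the text; the event names
    are illustrative labels, not terms used by the authors."""
    events = []
    t = 0
    for name, duration in [("fixation_only", pre_cue_ms), ("cue", cue_ms),
                           ("cue_target_gap", gap_ms), ("target", target_ms)]:
        events.append(Event(name, t, duration))
        t += duration
    return events


def cue_target_soa(events: List[Event]) -> int:
    """Interval between cue onset and target onset, in milliseconds."""
    onsets = {e.name: e.onset_ms for e in events}
    return onsets["target"] - onsets["cue"]


timeline = trial_timeline()
for event in timeline:
    print(f"{event.name:>15}: onset {event.onset_ms} ms, duration {event.duration_ms} ms")
print("Cue-target interval:", cue_target_soa(timeline), "ms")
```

With the default values, the cue-to-target onset interval comes out at 80 ms, matching the cue-target interval stated above.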

Every experimental session was divided into two blocks with a short break between them. During the experiment, target strings were randomly presented such that every string was presented once, and their position in the visual field (left vs. right) was randomly chosen such that half of the stimuli were assigned to the left presentation and the other half to the right presentation. The spatial cue condition (valid, invalid, and neutral) was randomly chosen such that each condition had a probability of one third. Therefore, the experiment consisted of three within-subject manipulations: type of string (HFW, LFW, and PW), spatial cue (valid, invalid, and neutral) and visual field (left and right).
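The randomization scheme described above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the E-Prime script used in the study: the item labels are placeholders, the function name is invented, and balancing the visual field across the whole set (rather than within each string type) is an assumption, since the text does not specify the level at which the half/half split was enforced.

```python
import random
from collections import Counter

STRING_TYPES = ("HFW", "LFW", "PW")
CUE_CONDITIONS = ("valid", "invalid", "neutral")


def build_trial_list(n_per_type=72, seed=None):
    """Return a randomized trial list in which every string appears once.

    Assumptions (not stated in the article): visual field is balanced over
    the whole item set, and the cue condition is drawn independently for
    each trial with probability 1/3."""
    rng = random.Random(seed)
    # Placeholder item labels standing in for the actual stimuli.
    items = [(stype, f"{stype}_{i:02d}")
             for stype in STRING_TYPES for i in range(n_per_type)]
    # Half of the items go to the left visual field, half to the right.
    fields = ["left", "right"] * (len(items) // 2)
    rng.shuffle(fields)
    trials = [
        {"type": stype, "item": item, "field": field,
         "cue": rng.choice(CUE_CONDITIONS)}
        for (stype, item), field in zip(items, fields)
    ]
    rng.shuffle(trials)  # random presentation order
    return trials


trials = build_trial_list(seed=1)
print(len(trials), Counter(t["field"] for t in trials))
```

Because the cue condition is sampled per trial rather than counterbalanced, the number of trials per cue condition will vary somewhat from participant to participant under this scheme.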

# **RESULTS**

Data were analyzed with mixed-effects multiple regression models (Baayen et al., 2008) using the lme4 package (Bates et al., 2013) and the afex package (Singmann, 2013) in the R environment (R Core Team, 2013). Mixed-effects models offer a flexible framework for modeling the sources of variation and correlation that arise from grouped data. In particular, the model fitting procedure takes into account the covariance structure of the data, including random effects (for an exhaustive discussion of fixed and random effects, see Gelman, 2005). A great advantage of mixed models, as compared to more conventional methods, is that they do not assume independence amongst observations, allowing a wide variety of correlation patterns to be explicitly modeled (Pinheiro and Bates, 2000). Another advantage is that mixed models can deal with the problem of the language-as-fixed-effect fallacy (Clark, 1973). Since it is not possible to use systematic sampling procedures for both subjects and items, entering them as random effects into the model allows better control of the unexplained by-subject and by-item variances. Overall, mixed models provide insight into the full structure of the data, they have slightly superior power (Baayen, 2008) and, finally, they can also be extended to non-normal outcomes.

**Table 1 | Mean accuracy (in percentage of correctly reported letters) and standard deviation (in parentheses) for all conditions in the experiment.**

Response accuracy was computed by counting, for each item, the number of letters correctly reported by the participant. Each letter had to be reported in the correct position in the string to be counted as correct. Nevertheless, the results were virtually identical using a more lenient criterion that did not consider letter position. Note that string-level accuracy was too low for PWs to allow for meaningful analyses. We applied a multiple regression model with a logarithmic link function (Jaeger, 2008) and a Poisson variance distribution, which is appropriate for counts of events in a fixed time window (e.g., Agresti, 2007; Baayen, 2008). Mean accuracies in the different conditions are reported in **Table 1**.
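The two scoring criteria can be illustrated with a short sketch. The strict score follows the description in the text (a letter counts only in its correct position). The lenient score is implemented here as a multiset overlap between target and response letters, which is one plausible reading of a position-free criterion and should be taken as an assumption; the function names and the example strings are hypothetical.

```python
from collections import Counter


def strict_score(target: str, response: str) -> int:
    """Number of letters reported in the correct position."""
    return sum(t == r for t, r in zip(target, response))


def lenient_score(target: str, response: str) -> int:
    """Number of target letters reported, regardless of position
    (multiset overlap, so repeated letters are not counted twice)."""
    overlap = Counter(target) & Counter(response)
    return sum(overlap.values())


# Hypothetical example: two adjacent letters transposed in the response.
target, response = "MONTAGNA", "MONTANGA"
print(strict_score(target, response))   # 6
print(lenient_score(target, response))  # 8
```

The resulting per-item counts (0 to 8 letters) are the dependent variable entered into the Poisson regression.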

Barr et al. (2013) suggested that linear mixed-effects models generalize best when they include the maximal random effects structure justified by the design. In our study, this implies the exclusion of the by-item random slopes for the type factor, because our manipulation of string type implies different items for each level of the type factor. Subsequently, overfitted models (i.e., models with a random structure that caused the model to break) or random effects with no explanatory power (with variance parameters driven to zero or correlations driven to +1 or −1) were excluded. Therefore, the final random structure included both by-subject and by-item random intercepts, random variation (random slopes) for the cue factor at the subject level, and random variation (random slopes) for the visual field factor at the item level.
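As an illustrative formalization, the final model described above can be written as a Poisson regression with a log link and crossed random effects. The notation below is an assumption introduced here for exposition and does not come from the original analysis scripts: $y_{si}$ is the number of letters correctly reported by subject $s$ for item $i$, $\mathbf{x}_{si}$ collects the dummy-coded fixed effects (type of string, spatial cue, visual field, and their interactions), $\mathbf{c}_{si}$ contains the two cue dummies (valid, invalid), and $v_{si}$ is the dummy for left visual field presentation.

```latex
\begin{aligned}
y_{si} &\sim \mathrm{Poisson}(\mu_{si}),\\
\log \mu_{si} &= \beta_0 + \mathbf{x}_{si}^{\top}\boldsymbol{\beta}
  + \underbrace{u_{0s} + \mathbf{c}_{si}^{\top}\mathbf{u}_{s}}_{\text{by-subject intercept and cue slopes}}
  + \underbrace{w_{0i} + v_{si}\,w_{1i}}_{\text{by-item intercept and visual field slope}},\\
(u_{0s}, \mathbf{u}_{s}) &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\text{subject}}),\qquad
(w_{0i}, w_{1i}) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\text{item}}).
\end{aligned}
```

In this formulation, the by-subject intercept captures variability in the reference (neutral) cue condition, and the off-diagonal elements of the two covariance matrices correspond to the correlations among random effects.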

The model included three fixed effects and their interactions: *type of string*, *spatial cue*, *visual field*, the two-way interactions *type of string* by *spatial cue*, *type of string* by *visual field*, and *spatial cue* by *visual field*, and the three-way interaction *type of string* by *visual field* by *spatial cue*. **Table 2** reports the random effects of the final model. There was inter-subject variability, and it was moderately modulated by the spatial cue effect. Furthermore, the variability in the neutral condition was correlated with the variability in the valid condition (0.79) and negatively correlated with the variability in the invalid condition (−0.66). There was inter-stimulus variability modulated by the visual field effect. Importantly, taking into account both these sources of variability, all predictors (fixed effects) considered were significant. **Table 3** reports the fixed effect coefficients of the final model (factors were dummy coded with HFW, neutral cue and right visual field as reference levels). Note that the *b* coefficient represents the adjustment with respect to the reference level.

#### **Table 2 | Random effects of the final model.**

*Visual field (VF): L* = *left. Sub: subjects. Cue: V* = *valid, I* = *invalid.*

In order to assess the significance of the main effects and interactions, we performed Type III tests (which are based on sum contrast coding rather than dummy coding), comparing a model in which only the corresponding effect is missing with the model containing the effect. The *p*-values were calculated via likelihood ratio tests. The type of string main effect was significant, χ<sup>2</sup>(2) = 80.42, *p* < 0.0001, indicating that accuracy was different for the three types of string. The spatial cue main effect was significant, χ<sup>2</sup>(2) = 6.83, *p* < 0.05, indicating that accuracy was modulated by the spatial cue. The visual field main effect was significant, χ<sup>2</sup>(1) = 353.86, *p* < 0.0001, indicating that accuracy was better in the right visual field than in the left visual field. The interaction type of string by spatial cue was significant, χ<sup>2</sup>(4) = 16.51, *p* < 0.01, indicating that the effect of the spatial cue was different for the three types of string. The interaction visual field by spatial cue was not significant, χ<sup>2</sup>(2) = 3.30, *p* = 0.19, indicating that the effect of the spatial cue was similar in the two hemifields. The interaction type of string by visual field was not significant, χ<sup>2</sup>(2) = 4.61, *p* = 0.09, indicating that the effect of the type of string was similar in the two hemifields. However, the three-way interaction just missed significance, χ<sup>2</sup>(4) = 9.12, *p* = 0.05, suggesting that the effect of the spatial cue on the types of string was different in the two hemifields for at least one of the three types.
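To make the likelihood ratio procedure concrete, the sketch below converts a χ<sup>2</sup> statistic into a *p*-value using only the Python standard library. It relies on two closed forms of the χ<sup>2</sup> upper-tail probability (df = 1 and even df), which cover the degrees of freedom reported above; the function name is hypothetical, and this is an illustration rather than the routine used by the afex package.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable.

    Closed forms only:
      df == 1  -> erfc(sqrt(x / 2))
      even df  -> exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df > 0 and df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i   # (x/2)^i / i!
            total += term
        return math.exp(-half) * total
    raise ValueError("only df == 1 or even df are supported in this sketch")


# Likelihood ratio statistics reported in the text (statistic, df).
for label, stat, df in [
    ("spatial cue", 6.83, 2),
    ("type x cue", 16.51, 4),
    ("visual field x cue", 3.30, 2),
]:
    print(f"{label}: chi2({df}) = {stat}, p = {chi2_upper_tail(stat, df):.4f}")
```

In a likelihood ratio test, the statistic is twice the difference in log-likelihood between the full and the reduced model, and the degrees of freedom equal the number of parameters removed.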

The interaction between type of string and spatial cue, which is crucial for the purpose of the present study, is shown in **Figure 2**. The nature of this interaction was inspected by conducting separate multilevel models on each level of the *type of string* factor. Hence, for this analysis the main effect and the interaction terms involving the type of string were excluded. In addition, since in the full model the interaction type of string × spatial cue × visual field just missed significance, we first assessed for each model (i.e., type of string) whether inclusion of the *visual field* by *cue* interaction would improve the model fit according to likelihood ratio tests. This was the case only for PWs (HFW: χ<sup>2</sup>(2) = 2.46, *p* = 0.29; LFW: χ<sup>2</sup>(2) = 0.39, *p* = 0.82; PW: χ<sup>2</sup>(2) = 7.20, *p* < 0.05). Therefore, for HFWs and LFWs the visual field factor was excluded. Factors were dummy coded with valid or neutral cue as reference levels. We report regression coefficients (*b*), *z* and *p* values. **Figure 3** shows how accuracy for each type of string changed as a function of cue condition and hemifield, using the neutral cue as baseline.

*Type: LFW* = *low frequency words, PW* = *pseudowords. Cue: V* = *valid, I* = *invalid. Visual Field (VF): L* = *left.*

For PWs in the right visual field, accuracy did not significantly differ across cue conditions (valid vs. invalid: *b* = −0.05, *z* = −1.04, *p* = 0.30; valid vs. neutral: *b* = −0.01, *z* = −0.16, *p* = 0.87; invalid vs. neutral: *b* = −0.06, *z* = −1.16, *p* = 0.25). For PWs in the left visual field, accuracy was significantly higher in the valid condition in comparison to both the invalid and the neutral condition (respectively *b* = −0.16, *z* = −2.30, *p* < 0.05 and *b* = −0.23, *z* = −3.15, *p* < 0.01). The difference between the neutral and the invalid conditions was not significant (*b* = 0.07, *z* = 0.91, *p* = 0.36). For LFWs, none of the effects reached significance (valid vs. invalid: *b* = −0.07, *z* = −1.64, *p* = 0.10; valid vs. neutral: *b* = −0.02, *z* = −0.43, *p* = 0.66; neutral vs. invalid: *b* = −0.05, *z* = −1.35, *p* = 0.18). Finally, for HFWs, there was no difference between valid and invalid conditions (*b* = −0.04, *z* = −1.14, *p* = 0.25). However, the neutral condition showed higher accuracy than both the valid condition (*b* = −0.09, *z* = −2.62, *p* < 0.01) and the invalid condition (*b* = −0.13, *z* = −4.20, *p* < 0.001).
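Because the model uses a logarithmic link, each *b* coefficient is on the log scale, and its exponential can be read as a multiplicative change in the expected number of correctly reported letters relative to the reference level. The sketch below illustrates this reading, together with the two-sided *p*-value implied by a Wald *z* statistic under a standard normal reference distribution. The function names are hypothetical, and the interpretation as a rate ratio is offered as an aid to reading the coefficients rather than as part of the original analysis.

```python
import math


def rate_ratio(b: float) -> float:
    """Multiplicative change in expected count implied by a log-link coefficient."""
    return math.exp(b)


def wald_two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# Coefficients reported in the text for PWs in the left visual field
# (valid cue as reference level).
for label, b, z in [("invalid vs. valid", -0.16, -2.30),
                    ("neutral vs. valid", -0.23, -3.15)]:
    print(f"{label}: rate ratio = {rate_ratio(b):.2f}, "
          f"p = {wald_two_sided_p(z):.3f}")
```

On this reading, a coefficient of −0.23 corresponds to roughly 20% fewer correctly reported letters than in the reference condition.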

#### **DISCUSSION**

The central question addressed in the present study is how spatial attention affects the processing of visual words. To this end, in the context of a perceptual identification paradigm, we manipulated the focus of attention concurrently with the type of string. HFWs, LFWs, and PWs were presented in parafoveal view, either in the left or in the right visual field. Target strings were preceded by a spatial cue that oriented attention to the target location (valid condition) or away from it (invalid condition). In the neutral condition, the cue broadened the focus of attention by directing it to both possible locations. The results of previous studies using different variants of the cueing paradigm do not offer a clear and uncontroversial picture. A novel aspect of our study was the control of random variability at both the subject and the item level by exploiting mixed-effects models (Baayen et al., 2008), thereby increasing the sensitivity of the analyses and eliminating confounding factors that might affect the results.

Performance was markedly better in the right visual field than in the left visual field, in agreement with previous studies that found a right visual field advantage for briefly presented parafoveal words (e.g., Mishkin and Forgays, 1952; Ducrot and Grainger, 2007; Siéroff and Riva, 2011). Direct access to the left hemisphere for right-presented words, reading-related scanning habits, and attentional effects are the factors most likely involved in the emergence of a right visual field superiority effect (see Siéroff et al., 2012, for further discussion).

Performance was also significantly affected by the spatial cue, but crucially this effect varied with the type of string (see **Figure 2**). In addition, for PWs only, the cueing effect was modulated by the visual field (see **Figure 3**). In particular, PW identification was affected by the spatial cue when the string was presented in the left visual field, in agreement with previous studies that found a larger cueing effect in the left visual field (Nicholls and Wood, 1998; Gatheron and Siéroff, 1999). PWs were better identified in the valid condition, that is, when attention was focused on the target location. For LFWs, the spatial cue effect was not significant but the mean accuracies showed a similar trend. These results are consistent with those of Siéroff and Posner (1988) and Auclair and Siéroff (2002), as well as with the assumption of the CDP+ model (Perry et al., 2007) that the phonological route implies parsing of the string into sub-lexical units by sweeping the attentional focus from left to right across letters. Therefore, the pre-allocation of spatial attention to the target position following a valid cue meets the processing demands of phonological decoding and PW processing in particular, in line with previous studies that have linked spatial attention to phonological decoding (e.g., Facoetti et al., 2006, 2010; Ruffino et al., 2010). This explanation is also supported by the significant interaction between spatial cue and visual field for PWs. The attentional bias theory (Kinsbourne, 1970) assumes that more attentional resources are allocated to the right visual field. Accordingly, a valid cue will be more effective for the location where the least amount of attention is already allocated (Siéroff et al., 2012). This implies that the processing of stimuli that require more attention will exhibit a greater advantage.

A completely different pattern emerged for HFWs. Strikingly, word identification was best in the neutral cue condition, that is, when attention was directed to both possible locations. The neutral condition showed an advantage with respect to both the valid and the invalid conditions. Given that the lateralized cues were uninformative of target location, it could be argued that the unexpected advantage of neutral trials might reflect a form of inhibition of return (Klein, 2000) that follows the exogenous shift to the lateral locations. However, this interpretation falls short in explaining why the advantage of neutral trials would be limited to the HFWs. Indeed, the classic time course of inhibition of return leads to the prediction that the effect would be maximal for the more difficult stimuli, that is, the PWs. A more plausible interpretation of this finding can be found by carefully examining the nature of the neutral cue. The neutral cue consisted of a string of hash marks that was approximately twice as long as the target, because it was designed to cover both possible target locations. This implies that the cue also modulated the size of the focus of attention, as suggested by studies showing that the size of the attentional focus is automatically adjusted to the size of the cue (e.g., Eriksen and St. James, 1986; Turatto et al., 2000; Ronconi et al., 2014). Thus, in the neutral condition, attention was spread out over a portion of the visual field that was approximately twice the target string length. What is the consequence of this broader focus for the processing of visual words? *Processing gradient* models of eye movements, such as SWIFT (Engbert et al., 2002, 2005; Schad and Engbert, 2012), assume that the allocation of attention can extend beyond the fixated word to support parallel processing of several words at a time. When the orthographic stimulus is not familiar, as in LFW processing or PW decoding, the foveal load is high and the perceptual span (i.e., the visual region of effectively processed information) includes only the fixated word. In contrast, processing familiar stimuli like HFWs implies low foveal load and therefore a wider perceptual span that extends over several neighboring words (e.g., LaBerge and Brown, 1989; Henderson and Ferreira, 1990). The notion that the size of the attentional window during visual word processing might be broader than the length of target words is also supported by the eye movement literature (e.g., Kennedy and Pynte, 2005; Kliegl et al., 2006, 2007; Wang et al., 2009; Dare and Shillcock, 2013; Kennedy et al., 2013) and by the finding that lateral information can affect the processing of a centrally presented target as a function of its familiarity (e.g., Lee and Kim, 2009; Waechter et al., 2011; Khelifi et al., 2012).

Therefore, HFWs, due to their overlearned representation, can provide a strong feedback signal toward lower areas of the visual system allowing fast identification of the string. The low perceptual load, due to this stronger top-down support, allows the distribution of attentional resources on a broader region of space (e.g., Brand-D'Abrescia and Lavie, 2007; see for a review Lavie, 2005). Notably, this top-down support might also compensate for the slower bottom-up processing implied by a broader focus of attention (as assumed in the zoom-lens model).

Given that a broad distribution of attention appears to be the default mode during processing of HFWs (e.g., Kliegl et al., 2006; Brand-D'Abrescia and Lavie, 2007; Schad and Engbert, 2012; Ghahghaei et al., 2013), it is conceivable that the identification of HFWs in the present study was better in the neutral condition because the cue triggered a broader attentional focus. Indeed, the attention literature shows that optimal performance in perceptual identification is obtained with an adequate allocation of attentional resources and that excessively focused attention may not be beneficial (Yeshurun and Carrasco, 1998). Focused spatial attention is necessary to obtain spatial detail (e.g., Yeshurun and Carrasco, 1999; Ho et al., 2002; Hochstein and Ahissar, 2002; see Anton-Erxleben and Carrasco, 2013 for a review), whereas recognition of HFWs might be facilitated by more global processing. Dehaene et al. (2005) suggested a neuronal model of word recognition that, in order to solve the problem of location and size invariance, postulates increasingly broader and more abstract local combination detectors (LCD model). Written words are encoded by a hierarchy of neurons with increasingly larger receptive fields, successively tuned to increasingly complex word fragments (McCandliss et al., 2003; Dehaene et al., 2005; Dehaene and Cohen, 2011). At the highest levels of this hierarchy, detectors are presumably responsive to whole words, and their broad receptive fields allow them to respond with spatial invariance across a large part of the visual field (also see Di Bono and Zorzi, 2013). HFWs have an overlearned orthographic representation, probably located in the left ventral occipito-temporal cortex, the "visual word form area" (e.g., Glezer et al., 2009; see Dehaene and Cohen, 2011).

Although the previous data of Ducrot and Grainger (2007) led to different conclusions, three main differences between their study and ours might explain the discordant findings. First, target duration in their study was 30 ms shorter (i.e., 50 vs. 80 ms). The deployment of attention along the whole letter string is a process that takes time (Ghahghaei et al., 2013). Therefore, it is possible that a target duration of 50 ms is not enough to detect fine modulations of the attentional focus. Benso et al. (1998) studied the time course of attentional focusing with a standard spatial cue-size paradigm. While they showed that the focus of attention requires 33–66 ms to adjust to object size in the fovea, they found that the control of the attentional focus in the periphery took place only when the interval between the cue and the stimulus was between 300 and 400 ms. Summing together cue duration, delay time and target duration in our paradigm results in an overall time of 160 ms during which the size of the focus might be modulated, an intermediate value that seems suitable for our parafoveal stimuli. A second difference is that the stimuli of Ducrot and Grainger (2007) did not include PWs. There is growing evidence that reading is context dependent even at the single word level (e.g., Reynolds and Besner, 2005; O'Malley and Besner, 2008; Reynolds et al., 2010). For example, O'Malley and Besner (2008) showed that the presence of PWs in the stimulus list changed the way stimulus degradation modulated the frequency effect. In the same vein, it seems likely that the presence of PWs in our study promoted a more flexible shaping of the attentional focus. Finally, the cueing paradigm of Ducrot and Grainger (2007) did not include the invalid condition. The presence of an invalid condition in our study is likely to have induced stronger cueing effects and, in turn, a more effective modulation of the deployment of spatial attention.

It could be argued that the lateralized spatial cues in Ducrot and Grainger's (2007) study were highly informative because they perfectly predicted the location of the letter string (unlike our study, in which they were uninformative). However, it is unlikely that this discrepancy implies a different type of attentional orienting, because cue-target stimulus onset asynchrony (SOA) in their study was too short (i.e., 80 ms) to allow voluntary deployment. That is, attention orienting was stimulus-driven both in their study and in ours.

In conclusion, we found that the manipulation of spatial attention affects string processing and that this influence was modulated by the type of string, as predicted by the CDP+ model of reading (Perry et al., 2007) as well as by processing gradient models (e.g., LaBerge and Brown, 1989; Henderson and Ferreira, 1990; Schad and Engbert, 2012). Processing of unfamiliar strings, such as LFWs and PWs, is affected by directing attention to a different location and is facilitated by attentional focusing. Conversely, identification of HFWs was enhanced in a condition promoting distributed attention, an attentional set that appears to be the default mode during reading of familiar words and is likely to optimally engage the broad receptive fields of the highest detectors in the hierarchical system for visual word recognition. However, the explanation of this novel finding is speculative and therefore warrants further investigation.

#### **ACKNOWLEDGMENTS**

This study was supported by the European Research Council (grant no. 210922 to Marco Zorzi), the University of Padua (Strategic Grant "NEURAT" to Marco Zorzi) and by Cariparo Foundation (Excellence Grants 2012 to Andrea Facoetti). We are grateful to Massimiliano Pastore, Gianmarco Altoè and Giorgio Arcara for statistical advice.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 June 2013; accepted: 20 January 2014; published online: 10 February 2014.*

*Citation: Montani V, Facoetti A and Zorzi M (2014) Spatial attention in written word perception. Front. Hum. Neurosci. 8:42. doi: 10.3389/fnhum.2014.00042*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Montani, Facoetti and Zorzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Brain and behavioral correlates of action semantic deficits in autism

# *Rachel L. Moseley<sup>1</sup>\*, Bettina Mohr<sup>2</sup>, Michael V. Lombardo<sup>3</sup>, Simon Baron-Cohen<sup>3</sup>, Olaf Hauk<sup>1</sup> and Friedemann Pulvermüller<sup>1,4</sup>*

*<sup>1</sup> MRC Cognition and Brain Sciences Unit, Cambridge, UK*

*<sup>2</sup> Department of Psychiatry, Charite Universitätsmedizin, Berlin, Germany*

*<sup>3</sup> Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, UK*

*<sup>4</sup> Brain Language Laboratory, Department of Philosophy and Humanities, Freie Universität Berlin, Berlin, Germany*

#### *Edited by:*

*Gui Xue, Beijing Normal University, China*

#### *Reviewed by:*

*Donald Rojas, University of Colorado Denver, USA*

*Diane L. Williams, Duquesne University, USA*

#### *\*Correspondence:*

*Rachel L. Moseley, MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 7EF, UK; e-mail: rachel.moseley@cantab.net*

Action-perception circuits containing neurons in the motor system have been proposed as the building blocks of higher cognition; accordingly, motor dysfunction should entail cognitive deficits. Autism spectrum conditions (ASC) are marked by motor impairments but the implications of such motor dysfunction for higher cognition remain unclear. We here used word reading and semantic judgment tasks to investigate action-related motor cognition and its corresponding fMRI brain activation in high-functioning adults with ASC. These participants exhibited hypoactivity of motor cortex in language processing relative to typically developing controls. Crucially, we also found a deficit in semantic processing of action-related words, which, intriguingly, significantly correlated with this underactivation of motor cortex to these items. Furthermore, the word-induced hypoactivity in the motor system also predicted the severity of ASC as expressed by the number of autistic symptoms measured by the Autism-Spectrum Quotient (Baron-Cohen et al., 2001). These significant correlations between word-induced activation of the motor system and a newly discovered semantic deficit in a condition known to be characterized by motor impairments, along with the correlation of such activation with general autistic traits, confirm critical predictions of causal theories linking cognitive and semantic deficits in ASC, in part, to dysfunctional action-perception circuits and resultant reduction of motor system activation.

**Keywords: Autism, semantics, motor systems, action**

#### **INTRODUCTION**

A surprising finding in contemporary neuroscience concerns the motor system's function as a vehicle for higher cognitive processes which, at first glance, appear to be entirely unrelated to basic motor function. Theoretical bridges between supposedly "lower-order" sensorimotor functions and cognitive processes, such as developing semantic concepts or language, have been proposed in the psychological literature (Piaget, 1950). Recent research indicates that this link may lie in action-perception circuits, neuronal ensembles connecting neurons in sensory and motor areas via brain systems intertwining the two (Pulvermüller, 1999; Fuster, 2003; Rizzolatti and Craighero, 2004; Jeannerod, 2006; Pulvermüller and Fadiga, 2010). Correlated activation in motor and sensory areas of the cortex is proposed to lead to the development of neuronal assemblies that represent motor acts. These action-perception circuits become the basis of mirroring (i.e., repeating visually perceived actions performed by others), of repeating verbal utterances, and of working memory (Fuster, 2003; Rizzolatti and Craighero, 2004; Jeannerod, 2006).

Critically, however, by interlinking with each other, such action-perception circuits can themselves become substrates for a range of additional higher cognitive processes, such as language and representation of conceptual meaning. Words denoting concrete, visible concepts, such as actions or visible objects, tend to be learnt in the context of interacting with or experiencing that concept in the world (Pulvermüller, 1999). In accordance with Hebbian principles, simultaneous activation across numerous brain regions results in the formation of connected circuits. Specifically, the sensorimotor patterns for hearing and articulating a word (represented in core perisylvian language areas, **Figure 1A**) become linked to the different areas activated by experiencing or interacting with actions or objects, thus forming conceptual circuits for words. Action words such as "grasp," which semantically relate to the concepts of actions represented by action schemas stored in cortical motor systems, therefore draw upon motor systems, whilst object words relate to visual objects and are thus processed in the temporo-occipital visual processing stream. This is depicted in **Figure 1B** (but see Garagnani et al., 2009, for elaboration on this model and the linkage of regions through Hebbian processes). Though this relates to concrete items, there may be a critical role for action-perception circuits in the representation of meaning for abstract words, too (Moseley et al., 2012). In addition, elementary action schemas can be linked into action chains and may multiply into action hierarchies, thus embedding actions and object representations into plans and frames (Pulvermüller and Fadiga, 2010). Research in social neuroscience has also repeatedly shown that action-perception systems involved in mirroring can interact synergistically with mentalizing systems (Zaki et al., 2009; Lombardo et al., 2010a; Schippers et al., 2010; Spunt and Lieberman, 2012). This mutual link suggests that motor problems may lead developmentally to a whole host of downstream deficits in higher cognitive functions (e.g., language, communication, social cognition, understanding action concepts, and the meaning of action words; Pulvermüller, 2005; see **Figure 1**).

If action-perception circuits form one basis of higher cognition, a motor deficit could hinder their formation and thus compromise, or alter, higher processes normally built upon them. Critical test cases here are patients with motor impairments, such as in stroke, Parkinson's disease, or Motor Neuron disease, who indeed show specific deficits in understanding action-related concepts and in processing action-related words (Neininger and Pulvermüller, 2003; Boulenger et al., 2008; Bak and Chandran, 2011; Kemmerer et al., 2012). On this basis, however, it is difficult to confidently attribute a critical role in cognition to the motor systems: lesions typically involve multiple regions, which limits the linkage of cognitive deficits to this focal area. A syndrome marked by more subtle motor deficits and a broad range of social and cognitive difficulties may allow testing of new predictions regarding the role of sensorimotor systems in higher cognitive processing and corroborate the evidence provided by other neuropsychological patient groups, especially if behavioral experiments are combined with spatially precise neuroimaging.

Autism spectrum conditions (ASC) are a case in point: these neurodevelopmental conditions are primarily diagnosed by the "triad" of social-communication deficits, stereotyped/repetitive behaviors, and unusually restricted/narrow interests. Interestingly, ASC are also commonly characterized by subtle motor impairments in gait, posture, fine and sometimes gross coordination (Jansiewicz et al., 2006; Dewey et al., 2007; Dziuk et al., 2007). Perhaps due to the somewhat hidden nature of any potential link between social-communicative deficits and non-social motor abnormalities, the latter have traditionally been seen as minor, secondary phenomena, especially as the social-communicative deficits of autism are far more disabling. Emerging evidence, however, supports the idea that sensorimotor abnormalities *precede* the emergence of core social-communicative problems in infants at risk for developing autism (Teitelbaum et al., 1998; Rogers, 2009) and that later deficits in social interaction, imitation, and social cognition may emerge downstream from the atypical development of sensorimotor systems (Mostofsky et al., 2006). A further hint that socio-cognitive deficits and motor processes are linked comes from the mirror neuron literature, where inferior frontal and premotor "mirror neuron systems" (MNS) are hypoactive in ASC [Iacoboni and Dapretto, 2006; Cattaneo et al., 2007; Williams, 2008; Rizzolatti and Fabbri-Destro, 2010; though the role of the mirror system in autism is debated by other authors (Marsh and Hamilton, 2011)]. If a motor deficit precedes or is intertwined with higher cognitive deficits in ASC (Leary and Hill, 1996), the investigation of motor brain mechanisms, conceptual action understanding and severity of ASC may be fruitful in understanding the mechanisms of these complex conditions.

Dysfunction of action-perception circuits in ASC implies that these individuals should fail to activate action representations in speech comprehension, especially during understanding of action-related meanings (**Figure 1**). In contrast, when typically developing (TD) participants read action words or sentences, motor activation reflects somatotopic aspects of word meaning (Hauk et al., 2004; Pulvermüller and Fadiga, 2010). If action-perception circuits are indeed a basis for understanding action-related language, a further prediction is that people with ASC should exhibit a specific deficit in semantically understanding action words. Crucially, behavioral and motor-cognitive brain activation deficits should correlate with and should predict autistic traits.

To test these predictions of the action-perception model, we used event-related fMRI to assess patterns of cortical activation during passive reading and comprehension of action and object words in individuals with ASC and TD controls, and carried out a behavioral experiment to assess their ability to semantically classify these words. To elucidate links across different levels of brain and behavior, we then investigated correlations between motor system activation and cognitive-semantic ability, and also the correlation between activation and the number of autistic traits as measured by the Autism Spectrum Quotient (AQ; Baron-Cohen et al., 2001). Employing a data-driven regions of interest (ROI) approach, we were able to investigate activation evoked by action- and object-related words in frontal and temporal areas typically involved in language (Pulvermüller, 1999; Cohen et al., 2002; Kronbichler et al., 2004) and regions in the motor system which typically respond to action words (Pulvermüller and Fadiga, 2010). Given the movement impairments and structural abnormalities of motor systems in ASC, we hypothesised that these individuals would show a category-specific abnormality during the processing of words with action meaning. If motor systems play an important role in retrieving the meaning of action words, a processing deficit for these words manifested in psycholinguistic semantic tasks should be predicted by abnormal activity for action words in this motor region.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Nineteen participants with ASC were initially recruited for the study. One fMRI dataset was lost due to excessive movement, and so brain activity of 18 participants with ASC [mean age: 30.4 (SD: 10, range: 39); mean IQ: 113.5 (SD: 23)] was compared with that of 18 typically-developed controls [mean age: 28.6 (SD: 11.7, range: 44); mean IQ: 110.2 (SD: 12.3)], with all participants being right-handed monolingual native speakers. No significant differences appeared between the groups in age [*t*(34) = 0.490, *p* > 0.6] or IQ [*t*(34) = 0.411, *p* > 0.6], and they were roughly balanced for gender (9 men in the ASC group, 12 men in the control group). All ASC participants (17 with Asperger Syndrome, 1 with PDD-NOS) were recruited from the participant panel of the Autism Research Centre (ARC) in Cambridge, where they were registered after having been clinically diagnosed using DSM-IV criteria. The ASC group scored significantly higher than the control group [*t*(32) = 6.857, *p* < 0.001] on the Autism Spectrum Quotient (AQ: Baron-Cohen et al., 2001), with a mean score of 34 (SD: 10) in comparison to 13 (SD: 5). All but 4 of the ASC group scored above 26 on this test, a cut-off point believed to capture the majority of adults with autism (Woodbury-Smith et al., 2005). The same authors found the AQ to reliably discriminate between individuals with and without autism (correctly classifying 83% of individuals). The study was approved by the NRES Cambridgeshire 3 Ethics Committee.
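
The group comparison on age can be approximately re-derived from the summary statistics reported above. The following is a minimal sketch, assuming an equal-variance (pooled) independent-samples *t*-test, which is consistent with the reported 34 degrees of freedom; the function name `pooled_t_from_summary` is illustrative and not taken from the study. Because the inputs are rounded means and SDs, the result (about 0.50) is only expected to be close to the reported value of 0.490.

```python
from math import sqrt


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t (equal variances assumed) from group summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Age summaries as reported in the text (ASC vs. TD), 18 participants per group.
t_age, df_age = pooled_t_from_summary(30.4, 10.0, 18, 28.6, 11.7, 18)
print(f"t({df_age}) = {t_age:.3f}")
```

The same function could be applied to the IQ summaries, although missing values or unrounded data in the original analysis may lead to larger deviations from the published statistic.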

#### **STIMULI**

Critical stimuli employed in the study included 120 action-related (e.g., "grasp," "walk," "chew") and 120 object-related (e.g., "cheese," "shark," "flute") words without inflections. Prior to the fMRI experiment, a semantic rating study was carried out (please see Hauk et al., 2004 for full details of procedure) on a large corpus of words to ascertain semantic features including imageability, concreteness, visual-relatedness, form-relatedness, color-relatedness, arousal, valence, and action-relatedness. Words were matched for psycholinguistic factors including word frequency, letter bigram and trigram frequency, number of orthographic neighbors, and number of meanings. Please see **Table 1** for psycholinguistic and semantic features of critical stimuli. In order to distract participants from the study's focus on action- and object-language, they were interspersed with 120 filler words (e.g., "fluke," "ail," "cite," which were matched to experimental words in length, bigram and trigram frequency, and number of neighbors) and 120 hash-mark strings (###), which, also matched for length, acted as a low-level visual baseline.

The full-display of the monitor presenting the stimuli had a visual angle of 16.7° (width display 25.16°, height display 14.31°). The stimuli were presented subtending a visual angle of 2.3°.

#### **PROCEDURE**

For this task of silent reading, subjects were scanned in a 3-T Tim-Trio scanner with a 12-channel head-coil attached. Functional scans consisted of 32 slices covering the whole brain in descending order (slice thickness: 3 mm, in-plane resolution: 3 mm × 3 mm, inter-slice gaps: 0.75 mm), and echo-planar sequence parameters were TR = 2000 ms, TE = 30 ms, and flip angle = 78°. The silent reading task was split into three EPI blocks of approximately 7 min and 210 32-slice volumes each, with five dummy scans used at the beginning of each block to achieve a T1-steady state but discarded in the analysis.

Brain activity was compared between groups during passive reading of action- and object-related words. These stimuli, interspersed with filler words and hash-mark strings, were projected onto a screen and presented for 150 ms in a randomized order, with a 2.5-s stimulus onset asynchrony, and participants were requested to keep as still as possible, attend to the stimuli, and read them silently. This task was split into three parts of approximately 7 min each (21 min overall), allowing participants breaks in between if needed. Following the scan and without prior warning, they performed a word recognition test (involving rating a list to indicate their recognition of novel words and some of those previously seen in the experiment). The data confirmed that they had been attentive during scanning: both groups performed above chance [average hit rate: controls = 76.2% (SD = 18.1%), ASC = 76.2% (SD = 19.1%)], with no significant difference appearing between them in the number of correct answers [*t*(34) = −0.018, *p* > 0.9].
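
As a quick consistency check, the timing figures given in the Stimuli and Procedure sections can be combined arithmetically. The sketch below uses only numbers stated in the text (four stimulus categories of 120 items, a 2.5-s stimulus onset asynchrony, three blocks, TR = 2 s, 210 volumes per block); the variable names are illustrative, and whether the 210 volumes include the five dummy scans is not specified, so the comparison is approximate.

```python
# Values taken from the Stimuli and Procedure sections.
N_PER_CATEGORY = 120      # action, object, filler, and hash-mark items
N_CATEGORIES = 4
SOA_S = 2.5               # stimulus onset asynchrony in seconds
N_BLOCKS = 3
TR_S = 2.0                # repetition time in seconds
VOLUMES_PER_BLOCK = 210

n_trials = N_PER_CATEGORY * N_CATEGORIES   # total number of stimuli
total_s = n_trials * SOA_S                 # total stimulation time
block_s = total_s / N_BLOCKS               # stimulation time per block
scan_s = VOLUMES_PER_BLOCK * TR_S          # acquisition time per block

print(f"{n_trials} trials, {total_s / 60:.1f} min of stimulation overall")
print(f"{block_s / 60:.2f} min of stimulation vs. {scan_s / 60:.1f} min of scanning per block")
```

Under these assumptions, each block contains about 6.7 min of stimulation within 7 min of scanning, which agrees with the "approximately 7 min" per block stated above.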

Participants returned 4–10 weeks later (average: 8 weeks) to perform a semantic decision experiment on the action- and object-related words previously used in the fMRI experiment. Their task was to indicate as quickly as possible, within an interval of 2.5 s, whether the meaning of tachistoscopically presented words (150 ms) related to actions or objects by button presses with the left or right thumb (counterbalanced over participants). Words were presented in light gray font on a black monitor; the order was pseudo-randomized between participants. After completing the semantic decision task, participants completed the Autism Spectrum Quotient (Baron-Cohen et al., 2001).


**Table 1 | Psycholinguistic and semantic features of word stimuli.**

*Statistical tests between action and object words are displayed in t values. The psycholinguistic properties of the filler words are included in the rightmost column but not in the statistical tests reported, as they were included to detract from the nature of the task and were not compared with the experimental word categories.*

To compensate for drop-outs, two new ASC and seven TD controls were recruited. Altogether, 19 ASC and 18 TD subjects took part in the behavioral experiment, as the one ASC individual whose fMRI dataset was excluded was included in this analysis. Age and IQ differences between groups remained non-significant.

#### **DATA ANALYSIS**

SPM5 (Wellcome Department of Imaging Neuroscience, London, UK) was employed for all processing stages, including slice-timing and re-aligning using sinc interpolation, co-registration of images to structural T1 images and normalization of the previous to the 152 subject T1 template of the Montreal Neurological Institute (MNI). Transformation parameters were applied to co-registered EPI images, which were also resampled with a spatial resolution of 2 mm × 2 mm × 2 mm and spatially smoothed with an 8-mm full-width half-maximum Gaussian kernel.

Single-subject statistical contrasts were computed using the canonical hemodynamic response function (HRF) of the general linear model. Low-frequency noise was removed by applying a high-pass filter of 128 s. Onset times for each stimulus were extracted from E-Prime output files and integrated into a model for each block in which each stimulus category was modeled as a separate event. Group data were then analyzed with a random-effects analysis and second level group analysis performed. Activation to each of the experimental word categories in each group was compared statistically against baseline (the hash-mark condition) and voxel coordinates reported in MNI standard space.

In addition to whole-brain analysis, a ROI investigation was undertaken using the MarsBar function of SPM5. As the left hemisphere is the major site of language processing, four 2 mm-radius regions located in left-hemispheric key areas of theoretical interest from previous literature (inferior frontal gyrus, superior temporal sulcus, precentral and fusiform gyrus) were extracted from the contrast of all words against baseline (###) in typical controls, and four right-hemispheric homologs were chosen to match these as closely as possible. Three of the four ROIs were also confirmed by the activation patterns seen when both groups were pooled (a highly significant peak in the superior temporal sulcus had marginally different coordinates). Note that the all-words vs. baseline contrast is orthogonal to the contrasts relevant for hypothesis testing (ASC vs. TD), which rules out the risk of double dipping (see also Kriegeskorte et al., 2009). Activation for action- and object-word categories was compared between groups in these regions. Because voxels were resampled with a spatial resolution of 2 mm × 2 mm × 2 mm and smoothed with an 8 mm kernel, the half maximum width of each 2 mm-radius ROI was 12 mm, thus allowing us to keep ROIs overlap-free while at the same time compensating for some of the spatial variance caused by the projection of individual brains to the averaged MNI template. Statistical analysis of ROIs was executed in both SPSS and Statistica. Bonferroni corrections were applied to the data where appropriate and are indicated in the text. For the main 4-way ANOVA, correction was for the full 15 significance tests.
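
The figure of 15 significance tests matches the number of main effects and interactions in a design with four factors (2^4 − 1 = 15). The sketch below enumerates them and derives the corresponding Bonferroni-adjusted threshold. It rests on two assumptions that are not stated explicitly in the text: that the 15 tests are exactly these effects, and that the uncorrected significance level was 0.05. The factor labels follow those used for the four-way ANOVA in the Results.

```python
from itertools import combinations

# Factor labels as used for the four-way ANOVA reported in the Results.
factors = ["hemisphere", "PES", "FT", "group"]

# Every main effect (order 1) and every interaction (orders 2 to 4).
effects = [
    " x ".join(combo)
    for order in range(1, len(factors) + 1)
    for combo in combinations(factors, order)
]

alpha = 0.05                              # assumed uncorrected threshold
bonferroni_alpha = alpha / len(effects)   # per-test threshold after correction

print(len(effects), "effects; corrected alpha =", round(bonferroni_alpha, 5))
```

With 15 effects, an individual test would need to reach roughly *p* < 0.0033 to survive correction under these assumptions.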

# **RESULTS**

#### **fMRI RESULTS: FRONTAL-MOTOR HYPOACTIVITY IN ASC**

In both groups (18 ASC vs. 18 TD participants), the contrast of all words against baseline [strings of repeated familiar symbols (hash marks)] revealed similar activation patterns in posterior temporal regions, which are typically activated by written word stimuli (Cohen et al., 2002). In contrast, inferior-frontal and precentral cortex were strongly active in TD controls but not in people with ASC. This finding was revealed by a low level contrast (words vs. baseline) and is displayed at a lenient threshold (uncorrected *p* < 0.005) in **Figure 2A** to show the full activation range for both groups. Following stringent whole-brain correction for multiple comparisons (*p* < 0.05, FWE corrected), direct statistical comparison of word-elicited activations between both groups still confirmed that ASC subjects showed reduced inferior-frontal and precentral activation compared with TD controls (**Figure 2B**). The opposite contrast (ASC > TD) failed to reach significance anywhere in the brain.

**FIGURE 2 | (A)** Activity for words (*p* < 0.005, uncorr.) viewed in a passive reading task contrasted against a hash mark (###) baseline, in TD controls (blue) and participants with ASC (red). **(B)** Direct statistical contrast between groups for words viewed in the passive reading task (FWE *p* < 0.05). Light blue foci show areas of stronger activation for TD controls as compared to ASC (TD > ASC) participants: the opposite contrast, ASC > TD, was non-significant. **(C)** Latencies (ms) of TD controls (blue) and ASC participants (red) who made semantic classifications for action- (diagonal stripes) and object-related (crosshatch) words. Bars show average response times (and standard errors) taken to make semantic decisions. The significant difference between word categories in ASC is reflected by an asterisk (∗). **(D)** Significant correlations for ASC participants between activity in the motor system for action words and behavioral difference scores in the semantic decision task. Behavioral underperformance in classifying action words was quantified by subtracting response times for matched object-words from those to action-related words, and is significantly correlated with lower activity in the motor system. Motor activation was measured in precentral cortex, at coordinate (−50, −10, 44), where maximal activation was seen in action word reading in TD participants. **(E)** Significant correlations for ASC participants between activity in the motor system [see **(D)**] for action words and AQ scores. Higher numbers reflect increasing number of autistic traits. In **(C,D)**, values along the *x*-axis reflect parameter estimates (arbitrary units) reflecting the difference in activation between action words and the baseline condition (hash-marks).

To explore whether between-group differences were significantly more pronounced in frontal cortex compared with other sites, a ROI analysis and ANOVA were conducted on data from the two frontal and two temporal regions which emerged from the contrast of all words against baseline (###). These were included in a four-way ANOVA, including the two-level factors "hemisphere," "peri- vs. extrasylvian" ("PES"), "frontal vs. temporal" ("FT") and the group variable. A significant interaction of factors PES, FT, and Group [*F*(1, 34) = 9.234, *p* < 0.01] revealed significant differences in word-related activity between groups, with generally lower activity for autistic subjects but particularly strongly reduced activity in frontal cortex. Further exploration of the language-dominant left hemisphere revealed a significant interaction of the factors FT and Group [*F*(1, 34) = 4.210, *p* < 0.05], which further confirmed specificity of hypoactivity to inferior-frontal and precentral sites in the ASC group. Following Bonferroni-correction, significant between-group differences were only found in the deep-inferior frontal [*t*(34) = 4.229, *p* < 0.001] and precentral gyrus [*t*(34) = 3.514, *p* < 0.002], but not in temporal areas.

Since motor systems are activated during passive speech perception and language comprehension (Wilson et al., 2004; D'Ausilio et al., 2009; Pulvermüller and Fadiga, 2010), this group difference in word-elicited motor activation could reflect an ASC-specific processing difficulty in mapping language to articulatory motor programs. However, separation of hemodynamic response by word type in the previously defined ROIs confirmed ASC-specific frontal hypoactivity for action words but only partially for object-related words. Although inferior-frontal cortex showed between-group differences for both word types, precentral cortex revealed a significant group difference, with reduced activity in the ASC group, for the contrast of action words against baseline [*t*(34) = 2.917, *p* < 0.01], but not for object-related words against baseline (**Figures 3A,B**). Whilst the interaction of group and word category was non-significant (*p* < 0.1), these results do suggest that reduced motor system activation in ASC relates to semantic-conceptual processing. Though this finding is consistent with the study hypotheses, the lack of a group difference for object words in the motor system could potentially reflect a failure of statistical power and this might be further investigated in future experiments. At present, however, only behavioral data can clarify whether such activation is *necessary* for semantic processing: if precentral/premotor hypoactivity in ASC reflects a genuine semantic processing deficit in motor cognition, this should be apparent during processing of action-related words.

#### **BEHAVIORAL RESULTS: LINKING BRAIN WITH BEHAVIOR**

To address this question experimentally, participants from the fMRI experiment returned to perform a semantic decision experiment on the same action- and object-related words. Due to drop-outs and the loss of some datasets, this analysis also included three ASC participants who were not included in the fMRI study (see Procedure, above, for details). A factorial two-way ANOVA (Word category × Group) showed that performance was generally high throughout the task, without revealing a difference between groups or word kinds [TD: mean = 87% correct, SD = 5%; ASC: mean = 87% correct, SD = 5.5%; *F*(1, 35) = 0.128, *p* > 0.7]. However, an ANOVA performed on reaction times revealed a significant interaction between Word category and Group [*F*(1, 35) = 4.291, *p* < 0.05]. ASC participants were significantly slower in semantically judging action-related words compared with their speed at semantically classifying object words [*t*(18) = 3.116, *p* < 0.01]; TD individuals showed no evidence of a similar contrast [*t*(17) = 0.429, *p* > 0.6; **Figure 2C**]. These results show that, in a speeded semantic decision task, ASC participants are significantly slower in processing action-related words (mean: 815.3 ms, SD: 204.5) compared with matched object words (mean: 760.6 ms, SD: 191.5).

**FIGURE 3 | (A)** Activity for action words in control (blue) and ASC (red) participants (*p* < 0.005, uncorr.). Bar charts depict action-word activity for both groups in key inferior frontal and precentral ROIs. **(B)** Activity for object words in control (blue) and ASC (red) participants (*p* < 0.005, uncorr.). For further explanation, see legend for **(A)**. In **(A,B)**, values along the *y*-axis reflect parameter estimates (arbitrary units) reflecting the difference in activation between action or object words respectively and the baseline condition (hash-marks).

The strongest *a priori* prediction action-perception theory makes about ASC concerns the relationship between semantic processing deficits and reduced motor system activation in cognitive processing, so correlations between behavioral response times and cortical activation in the left-premotor ROI (−50, −10, 44) during action word reading were examined in datasets from the 16 ASC participants who participated in both experiments. To obtain a specific behavioral measure of action semantics, we used the object word response times for normalizing the action word latencies in semantic decisions. A significant correlation between reaction time and precentral activation to action words (*r* = 0.497, *p* < 0.05) was observed in the ASC group, whereby relative underperformance on the semantic task for action verbs, but not object nouns, was linearly related to decreasing brain activation elicited by action words (**Figure 2D**). Further exploration of the behavioral-BOLD correlation in the previous ROIs revealed a similar correlation for inferior frontal cortex with action words, but not for other parts of the brain. No comparable correlations with brain activity were observed for object words.
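
The brain-behavior analysis described above combines two steps: a per-participant difference score (action-word latency minus object-word latency, as described in the legend of **Figure 2D**) and a Pearson correlation of that score with ROI activation. The following is a minimal sketch of these two steps. All numerical values are invented for illustration and are not data from the study, and the function names are hypothetical. The sign of the resulting coefficient depends on the direction in which the difference score is defined.

```python
from math import sqrt


def difference_scores(action_rt, object_rt):
    """Action-word latency minus object-word latency, per participant."""
    return [a - o for a, o in zip(action_rt, object_rt)]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented example values (ms and arbitrary units); NOT data from the study.
action_rt = [820.0, 790.0, 905.0, 760.0, 850.0]
object_rt = [770.0, 765.0, 800.0, 750.0, 790.0]
roi_activation = [0.2, 0.6, -0.3, 0.9, 0.1]

slowing = difference_scores(action_rt, object_rt)
r = pearson_r(slowing, roi_activation)
print("difference scores:", slowing)
print("r =", round(r, 3))
```

In this toy example, participants with larger action-word slowing have lower ROI activation, so the coefficient is negative under this definition of the difference score.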

In order to explore the link between semantic motor system activity and the wider spectrum of autistic symptoms, we studied the correlation between precentral semantic activity and autistic symptoms as assessed by the AQ (Baron-Cohen et al., 2001). Higher scores on the AQ (greater number of autistic traits) were significantly correlated with hypoactivity in the same precentral ROI for words generally (*r* = −0.556, *p* < 0.02). Following removal of one marked outlier with a family history of left-handedness (seen to the far left in **Figure 2E**), this negative correlation remained significant. The correlation was especially pronounced when considering brain activity elicited by action words alone (*r* = −0.654, *p* < 0.005; **Figure 2E**). All other analyses remained significant with removal of this individual and the ASC group were still significantly slower to process action words [*t*(17) = 2.797, *p* < 0.02].

# **DISCUSSION**

Here we report a novel investigation of semantic action word processing in ASC and its relationship to motor cortex activation and general ASC symptomatology. Using fMRI, we found hypoactivation of inferior-frontal and premotor cortex during word reading in ASC relative to matched control participants. This reduction in activity was most clearly apparent for words semantically related to actions. Corresponding to the significantly reduced activation of motor systems in action word processing seen in ASC, we found increased reaction times for processing these words in this group, thus indicating a category-specific abnormality in semantic processing. Furthermore, linking the two results together, a significant correlation emerged between hypoactivity in the motor system and slowed reaction time for processing action words. Critically, a similar correlation also appeared between semantic hypoactivity and autistic symptomatology in our ASC group. These results all support the prediction of an action-perception theory of ASC, whereby the reported category-specific semantic processing disadvantage, and possibly a wider range of ASC symptoms, may stem from atypical information exchange between the motor cortex and other brain regions.

This newly observed language-related hypoactivation of motor systems in ASC and their correlated deficit in semantically processing action-related words refute an interpretation of semantic motor systems activation as "ancillary" or "epiphenomenal" in the general population. Rather, it appears that an intact and well-connected motor system brings about motor system activation during action word reading and is necessary for optimal processing of these items, whereas, in a condition where this activation is absent, specific abnormalities in semantic information processing are manifest for words with action-related meaning. "Disembodied" theories of conceptual representation (Mahon and Caramazza, 2008), assuming meaning processing divorced from sensorimotor systems, cannot account for this finding. Nor is there currently evidence to suggest the activation of motor systems by another region or system (see Kiefer and Pulvermüller, 2012, for review and discussion of the literature). Although a range of cortical areas take their share in meaning processing (Pulvermüller, 2013), the present study failed to reveal brain activation outside the motor system that predicted the ASC-specific deficit in semantic processing of action-related words.

To account for the observed correlations between motor system hypoactivity, action-semantic deficit and ASC symptomatology, parsimony demands that a causal link be postulated. That socio-communicative or semantic deficits in autism might give rise to motor impairments seems unlikely, given that such a proposition would fail to explain why semantic deficits are specific to action words or why premotor cortex is hypoactive in language processing; furthermore, this position seems difficult to reconcile with the early emergence of motor dysfunction in ASC, long before semantic or social deficits become manifest or at least evident to current means of measurement at this age (Teitelbaum et al., 1998; Rogers, 2009). As there is currently no evidence for a third process acting on both motor and semantic systems, the remaining possibility is offered by action-perception theory. A functional deficit in motor areas and/or in the interaction between motor and other brain systems (see **Figure 1**) leads to atypical development of action-perception circuits required for language processing, motor cognition and action semantics. Thus the ASC motor deficit, which emerges early in ontogenesis, would cause hypoactivity in the precentral cortex in action-semantic processing and the observed slowing of action-semantic classification in ASC. On the basis of the existing literature on ASC and the specificity of the present results, such an account is highly plausible.

The observed action-semantic deficits in autism are paralleled in patients suffering from lesions in the motor system (Neininger and Pulvermüller, 2003; Boulenger et al., 2008; Bak and Chandran, 2011). The significant advance of the present study is the specificity of the relationship between cognitive-semantic deficits and the functionality of focal precentral cortex activation. The functional importance of motor systems for higher cognition demonstrated by earlier cognitive and brain research (Buccino et al., 2005; Pulvermüller et al., 2005; Fischer and Zwaan, 2008; Shebani and Pulvermüller, 2013) fits with a potential causal role of autistic motor dysfunction, as suggested by the developmental *primacy* of motor symptoms relative to core autistic symptomatology (Teitelbaum et al., 1998; Rogers, 2009). Indeed, impairments in the motor system in ASC and its connectivity predict not only movement problems but difficulty establishing typical action-perception circuits and therefore representations of complex actions (Mostofsky and Ewen, 2011). In turn, such representational alteration may lead to higher-order deficits in action understanding (Blake et al., 2003; Williams, 2008), gesture and imitation (Williams et al., 2001; Dewey et al., 2007), as well as language and communication. Although presently unascertained, it appears plausible that the aberrant structural connectivity of cortico-cortical tracts and reduced functional connectivity in ASC (Courchesne and Pierce, 2005; Sundaram et al., 2008; Jones et al., 2010) contribute to their motor and cognitive impairments. Atypical connectivity between frontal action-systems and posterior perception-related neural systems in the arcuate fascicles, which connect anterior and posterior language regions (Keller et al., 2007; Fletcher et al., 2010; Lai et al., 2012), may be of special relevance here, considering the key role these pathways play in connecting action-perception circuits, thus merging information about actions and perceptions in linguistic and semantic neural systems (Pulvermüller and Fadiga, 2010). The atypical development of language circuits would explain why, during reading words of different semantic categories, participants with ASC exhibit atypical patterns of local cortical activity (Moseley et al., 2013). We were unable, in the present study, to ascertain the degree of cortical motor abnormality in our participants, and this, alongside perhaps the coarse-grained measure of semantic processing in our behavioral task, might explain why inaccuracy in action word processing was not revealed alongside longer reaction times. The "tipping point" at which brain abnormalities manifest in semantic errors is likely to be influenced by task demands. It is clear that this study convincingly supports the contribution of motor regions to optimal action word processing, but much remains to be elucidated regarding the contribution of these and other brain areas to semantic processing in different contexts (Hauk and Tschentscher, 2013; Pulvermüller, 2013).

Action-perception circuits provide a functional link between sensory and motor neurons and become active when the individual performs an action and when they perceive the action visually or hear its characteristic action sounds (see **Figure 1B**). This mechanism explains the response patterns of mirror neurons, which play a key role in these circuits. Over and above providing a mechanism of mirror neuron activity and behavioral imitation, these action-perception circuits can support a range of higher mental processes necessary for language, semantics, and social cognition (Pulvermüller and Fadiga, 2010). The lack of "embodied" action-related semantic processing in ASC demonstrated for the first time in this study is therefore in agreement with the well-known inactivity of the mirror neuron system seen in ASC subjects during tasks unrelated to language (Iacoboni and Dapretto, 2006; Cattaneo et al., 2007; Williams, 2008; Rizzolatti and Fabbri-Destro, 2010). In this context, the correlation between language-evoked motor activation and autistic traits (AQ scores) bolsters the previous suggestion that a range of typical ASC symptoms relate to motor systems abnormalities (Mostofsky et al., 2006; Mostofsky and Ewen, 2011). Although our present data are consistent with the prediction that an impairment of mirror mechanisms relying on action-perception circuits would entail ASC deficits in semantic motor system activation, semantic processing of action words, and even general traits of ASC, we hasten to emphasize that our data are correlational and therefore cannot provide proof of causality. Still, we offer some related considerations in the following paragraph.

Atypical grounding of semantics in action-perception circuits, such as would result in abnormal linguistic/communicative processing, might derail further development in domains that depend on input from motor systems, such as mentalizing. In particular, several researchers have hypothesized that "embodied" premotor cortical systems involved in mirroring also interact with systems for mentalizing (Zaki et al., 2009; Lombardo et al., 2010a; Schippers et al., 2010; Spunt and Lieberman, 2012). For example, reduced functional connectivity in ASC has been observed between a key mentalizing/self-representation region, ventromedial prefrontal cortex, and ventral premotor cortex and somatosensory cortex (Lombardo et al., 2010b). This would suggest that atypical development of (premotor-prefrontal links in) action-perception circuits underlying higher cognition could impact on the way in which individuals with ASC interact with others (Lombardo and Baron-Cohen, 2010, 2011), and how such motor problems, preceding higher-level socio-communicative difficulties, might set children on atypical trajectories that lead to increased risk for autism. Though implications beyond the semantic processes studied in this present work may appear speculative, our results clearly demonstrate that motor problems in ASC cannot be regarded as separate from, or secondary to, higher cognitive and socio-communicative difficulties. Instead, atypical development of action-perception circuits carrying higher cognitive processes may derail aspects of language and conceptual processing, which may entail further difficulties in communication, social interaction, and thought.

Further investigation is clearly necessary as to the relationship between motor system dysfunction and the development of other symptoms of ASC. One limitation of the present study is the lack of rigorous in-study diagnosis of ASC and of an overt behavioral measure of motor dysfunction. The inclusion criteria of our experiment strictly excluded those with "suspected" ASC and all individuals who had not received a previous formal diagnosis, and as such we were confident of the diagnostic status of our ASC participants. Here, the lack of motor systems response to word and action word processing was taken to reflect abnormality in these underlying systems, but future work might look in parallel at surface motor symptoms of such abnormalities, and relate them in greater detail to autistic symptoms as captured by gold-standard diagnostic instruments. Though movement abnormalities have been neglected in autism research, a causal dependence of cognitive and semantic capacities on motor systems, if demonstrated empirically, could have substantial implications for conceptualization of and interventions for ASC.

Whilst the majority of our discussion has focused on the deficit specific for action words in this population, a final note for consideration concerns the processing of visual object words. It has been suggested that individuals with ASC depend more on perceptual, perhaps more surface-level strategies of processing, rather than deep semantic analysis (Kamio and Toichi, 2000; Toichi and Kamio, 2001, 2002, 2003; Harris et al., 2006; Kana et al., 2006; Mottron et al., 2006; Gaffrey et al., 2007). Despite showing substantial activity in inferior temporal cortex, ASC participants did not activate this region significantly more than typically developed controls whilst reading, though the task in the present study is not directly comparable to previous findings as it involved passive reading and therefore minimal processing demands. Strength in visual or perceptual processing might, however, be supported by the relative sparing of visual object words in ASC participants, who were slower than controls at semantically judging action but not object words. In the typical population, object words also evoke activity in motor systems which relates in a somatotopic manner to the primary affordances of the concept denoted: tool words evoke activity in dorsal motor system (hand area) and the left cerebellar hemisphere which controls the right hand of the body, and food words activate dorsal portions of the motor system related to the face and mouth (Carota et al., 2012). Such motor activity, reflecting action semantic knowledge related to object affordances, appeared to be preserved here in ASC, a finding which sits parallel to the lack of a group difference in object word-induced brain activation. 
It appears that the direct linkage between an action word and the action it denotes is degraded in ASC, whilst the more indirect relationship between an object word and the affordances of the concept remains intact; but further investigation is required to assess this possibility and why, furthermore, object words might hold a privileged place in processing in ASC (at least compared with action words). The bias towards visual processing in ASC (Mottron et al., 2006) might provide a protective factor for words with primarily visual semantic associations. Another possibility is that action words, alongside their particular dependence on motor schemas, are additionally jeopardized by the social-pragmatic information intrinsic to their nature. Unlike object words, all action words imply an actor and several of the action words in this experiment had social associations (e.g., "speak," "smile," and "kiss"), and might therefore be especially problematic given the fundamental social handicap in ASC (Baron-Cohen, 2009). A third possibility is that the deficit seen here reflects a generic abnormality for processing the lexical verb class rather than an abnormality for words with action meaning *per se*. Strong neuropsychological and neurophysiological data suggest that the organization of meaning in the brain is driven by semantic rather than lexical differences (Vigliocco et al., 2011; Cappa and Pulvermüller, 2012; Kemmerer et al., 2012; Kiefer and Pulvermüller, 2012), but the present study cannot speak to this in ASC, so at present we cannot refute with certainty the possibility that other morphosyntactic differences between nouns and verbs might result in the difference seen here between action and object words. Future research might choose to study different types of nouns and verbs (for example, abstract nouns and verbs such as "beauty" and "contemplate") in ASC, to investigate whether the action word deficit observed in the present work relates to lexical category or to the action semantic content of these words.

#### **CONCLUSION**

In contrast to TD control subjects, ASC participants do not significantly activate cortical motor-executive systems during language processing and show corresponding difficulties processing the action-related meaning of words. Crucially, motor hypoactivation predicted, and significantly correlated with, these semantic processing difficulties, consistent with a causal role of motor-executive systems in processing action-related meaning. Motor hypoactivity also predicted the severity of autistic traits, thus suggesting a further relationship between dysfunction of motor systems and wider traits typical in ASC. More research is needed to elucidate the putative role of neural motor systems in ASC and, more generally, in social cognition and theory of mind.

#### **ACKNOWLEDGMENTS**

The authors would like to thank the following: Clare Cook, Yury Shtyrov, and Francesca Carota for help and input at various stages of theoretical discussion and imaging analysis; Amanda Ludlow for early help with ASC participant recruitment; and staff at the ARC, particularly Bonnie Aeyeung and Carrie Allison, who assisted with participant recruitment. This work was supported by the Medical Research Council (MC\_US\_A060\_0034, U1055.04.003.00001.01 to Friedemann Pulvermüller), the Freie Universität Berlin (startup grant to Friedemann Pulvermüller), the Deutsche Forschungsgemeinschaft (Excellence Cluster Languages of Emotion), and the Engineering and Physical Sciences Research Council (UK) (BABEL grant).

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 May 2013; accepted: 11 October 2013; published online: 08 November 2013.*

*Citation: Moseley RL, Mohr B, Lombardo MV, Baron-Cohen S, Hauk O and Pulvermüller F (2013) Brain and behavioral correlates of action semantic deficits in autism. Front. Hum. Neurosci. 7:725. doi: 10.3389/fnhum.2013.00725*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Moseley, Mohr, Lombardo, Baron-Cohen, Hauk and Pulvermüller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Functionally distinct contributions of the anterior and posterior putamen during sublexical and lexical reading

# *Marion Oberhuber<sup>1</sup>, 'Ōiwi Parker Jones<sup>1,2</sup>, Thomas M. H. Hope<sup>1</sup>, Susan Prejawa<sup>1</sup>, Mohamed L. Seghier<sup>1</sup>, David W. Green<sup>3</sup> and Cathy J. Price<sup>1</sup>\**

*<sup>1</sup> Wellcome Trust Centre for Neuroimaging, University College London, London, UK*

*<sup>2</sup> Wolfson College, University of Oxford, Oxford, UK*

*<sup>3</sup> Cognitive, Perceptual and Brain Sciences, University College London, London, UK*

#### *Edited by:*

*Gui Xue, Beijing Normal University, China*

#### *Reviewed by:*

*Wouter Braet, University of Kaiserslautern, Germany*

*Xuchu Weng, Hangzhou Normal University, China*

#### *\*Correspondence:*

*Cathy J. Price, Wellcome Trust Centre for Neuroimaging, 12 Queen Square, London WC1N 3BG, UK; e-mail: c.j.price@ucl.ac.uk*

Previous studies have investigated orthographic-to-phonological mapping during reading by comparing brain activation for (1) reading words to object naming, or (2) reading pseudowords (e.g., "phume") to words (e.g., "plume"). Here we combined both approaches to provide new insights into the underlying neural mechanisms. In fMRI data from 25 healthy adult readers, we first identified activation that was greater for reading words and pseudowords relative to picture and color naming. The most significant effect was observed in the left putamen, extending to both anterior and posterior borders. Second, consistent with previous studies, we show that both the anterior and posterior putamen are involved in articulating speech, with greater activation during our overt speech production tasks (reading, repetition, object naming, and color naming) than during silent one-back-matching on the same stimuli. Third, we compared putamen activation for words versus pseudowords during overt reading and auditory repetition. This revealed that the anterior putamen was most activated by reading pseudowords, whereas the posterior putamen was most activated by words irrespective of whether the task was reading words or auditory word repetition. The pseudoword effect in the anterior putamen is consistent with prior studies that associated this region with the initiation of novel sequences of movements. In contrast, the heightened word response in the posterior putamen is consistent with other studies that associated this region with "memory guided movement." Our results illustrate how the functional dissociation between the anterior and posterior putamen supports sublexical and lexical processing during reading.

**Keywords: fMRI, reading, word production, putamen, orthography, phonology**

#### **INTRODUCTION**

Reading involves the mapping of visual features (orthography) to meaning (semantics) and to articulatory codes (phonology) that will generate the corresponding speech sounds (phonetics). The non-semantic mapping from orthography-to-phonology can theoretically proceed lexically or sublexically (i.e., "champion" versus "cham"-"pi"-"on"), with sublexical processing enabling new or low frequency words (e.g., "jentacular") to be read. The aim of our paper was to identify the brain areas associated with non-semantic orthographic-to-phonological mapping. We start by considering the cognitive processing that might be needed to support this function. We then review previous functional imaging approaches for identifying the associated brain regions, prior to introducing a novel experimental design that allows us to dissect different types of processing that explain the observed activation.

In cognitive terms, there are multiple levels at which orthography can be mapped to phonology within a single word. For example, in alphabetic scripts, phonology can be generated from a single letter (e.g., "s"), a letter pair (e.g., "sh"), single syllables (e.g., "cham"), multiple syllables (e.g., "cham-pi"), and the whole word (e.g., "champion"). Critically, although the different levels are more or less consistent (e.g., the letter "m" has a similar sound by itself as in the word "champ"), there will also be multiple levels of inconsistencies, particularly in non-transparent languages like English (e.g., "c" has a different sound by itself than in "ch"). When lexical and sublexical outputs are consistent, sublexical processing can facilitate the production of the intended word (e.g., sublexical processing helps to distinguish the low-frequency word "animate" from the higher-frequency word "animal"), but when lexical and sublexical outputs are inconsistent, sublexical processing can interfere with word production, particularly for words with low lexical frequency (e.g., "yacht"). Accurate reading therefore requires the selection of articulatory codes that will support the intended pronunciation, and the inhibition of articulatory codes that are inconsistent with the intended pronunciation (e.g., "co-", "count," and "try" need to be suppressed when reading "country"). Finally, sublexical phonological codes need to be assembled in the right order with the correct prosody prior to speech production.

Previous functional neuroimaging approaches for identifying the brain regions associated with the non-semantic mapping of orthography to phonology have primarily involved the comparison of activation for reading pseudowords relative to reading familiar words (Binder et al., 2003; Jobard et al., 2003; Mechelli et al., 2003, 2005; Ischebeck et al., 2004; Dietz et al., 2005; Vigneau et al., 2005; Borowsky et al., 2006; Levy et al., 2008; Woollams et al., 2011). The rationale here is that pseudowords are more reliant on non-semantic orthographic to phonological mapping than words because the latter benefit from semantics. The trouble with this approach is that the higher activation for reading pseudowords than words could also arise because the visual inputs and articulatory sequences are less familiar. Therefore, more activation for pseudoword than word reading could reflect more "difficulty" at many levels of processing, not just the sublexical mapping of orthography-to-phonology.

"fnhum-07-00787" — 2013/11/17 — 15:53 — page 1 — #1

An alternative approach is to include a further comparison in which we contrast reading aloud to picture naming (Bookheimer et al., 1995; Moore and Price, 1999; Price et al., 2006; Yoon et al., 2006; Borowsky et al., 2007; Mechelli et al., 2007; Seghier and Price, 2010; Kherif et al., 2011; Parker Jones et al., 2011; Vogel et al., 2013; Wheat et al., 2013). The rationale here is that word reading involves non-semantic mapping between visual inputs and phonology but object naming does not because (a) object parts provide semantic cues but not phonological cues to the object's identity while (b) word parts (i.e., letters) provide phonological cues but not semantic cues to the word's identity. Unlike the comparison of reading pseudowords to reading words, the comparison of reading words to naming pictures can control for the demands on articulation and semantic content by using the same object names for both conditions (e.g., read the word "banana" versus name the picture of a banana). However, activation for reading relative to object naming does not control for the visual processing of orthographic inputs. Notably, the confounds associated with the contrast (reading words > picture naming) are different to those associated with (reading pseudowords > reading words). We can therefore minimize both sets of confounds by looking at what is commonly activated by (reading words > picture naming) and (reading pseudowords > words). This should isolate areas associated with the non-semantic mapping of orthography to phonology from (a) visual processing differences, which are controlled in the comparison of pseudowords to words; and (b) general task demands/attention, because reading is easier than object naming. To date, we are not aware of any neuroimaging study that has investigated such commonalities. We aimed to do so here.

The logic of our experimental design was as follows: to identify areas associated with non-semantic orthographic-to-phonological mapping, we compared activation for (reading words + reading pseudowords) to activation for (naming pictures of objects + naming the colors of meaningless, scrambled shapes). Activation that is higher for reading words than picture naming cannot be explained by word frequency differences or semantic content because the words were the written names of the same objects presented in the picture condition (i.e., they had the same semantics and word frequency). The influence of visual familiarity on our effects of interest was minimized because familiar and unfamiliar stimuli were balanced in the activation and baseline conditions (familiar words and unfamiliar pseudowords compared to familiar pictures of objects and unfamiliar pictures of scrambled objects). Any residual influence of visual familiarity could be tested by directly comparing the familiar to unfamiliar stimuli (i.e., familiar words and pictures of objects relative to unfamiliar pseudowords and scrambled objects).

Within the identified areas of interest, we compared activation for pseudowords and words. Our expectation was that pseudoword reading would increase the demands on sublexical processing because it is not supported by lexical or semantic processing. On the other hand, we hypothesized that greater activation for words than pseudowords might occur at the level of selecting articulatory codes from competing possibilities because of greater inconsistency between sublexical and lexical phonological codes ("country" versus "coun" and "try") that will increase the demands on the suppression of mismatching codes.

Our experimental design also included four auditory conditions that corresponded to the four visual conditions, namely (i) repetition of words, (ii) repetition of pseudowords, (iii) naming objects and animals from sounds (e.g., "cat" in response to "meow"), and (iv) naming the gender of a humming voice. This allowed us to isolate which of the areas that were more activated for reading than visual naming were also more activated by reading than auditory repetition. Greater activation for reading would indicate the influence of orthographic processing, whereas similar activation for auditory repetition and reading would indicate processing at the phonological/articulation level. More specifically, in areas activated by reading more than naming that were not activated by auditory repetition, we associated more activation for (i) pseudowords than words with sublexical orthographic-to-phonological conversion; and more activation for (ii) words compared to pseudowords with lexical influences on orthographic-to-phonological conversion. In contrast, in areas activated by reading more than naming that were also activated for auditory repetition, we associated more activation for (i) pseudowords than words with the demands on novel sequences of sublexical articulatory codes; and more activation for (ii) words compared to pseudowords with lexical influences on articulation (i.e., well-rehearsed motor outputs); see **Figure 1A**.

Finally, our experimental design was expanded to 16 conditions, by including a one-back-matching task for each of the eight types of stimuli (four visual, four auditory) used in the speech production tasks. The one-back-matching task involves viewing or listening to a series of stimuli and pressing a button when a stimulus is repeated. These conditions allowed us to identify (i) stimulus effects that were dependent or independent of task; (ii) areas that were involved in overt articulation, by comparing the speech production tasks to the one-back-matching tasks on the same stimuli. Effects of stimuli (e.g., pseudowords relative to words) that were independent of task must be arising at the subvocal level, whereas those that were greater for overt speech production are more likely to be related to articulation.

#### **MATERIALS AND METHODS**

#### **SUBJECTS**

Our sample initially included 26 healthy adults with no history of neurological conditions. One subject was subsequently excluded due to missing data in one of the conditions. The remaining 25 participants included 12 females and 13 males, aged 20–45 years (mean = 31.4, SD = 5.9 years). All were right handed (assessed with the Edinburgh Handedness Inventory; Oldfield, 1971), native English speakers with normal or corrected-to-normal vision. They each gave written informed consent prior to the scanning and received financial compensation for their participation. The study was approved by the London Queen Square Research Ethics Committee (Study number NO32).

**FIGURE 1 | Rationale and summary of results. (A)** Rationale: in Step 1, we identified areas that were involved in orthographic-to-phonological mapping as those that were more activated by reading words and pseudowords than naming objects in pictures and the color of scrambled pictures. Subsequent analyses were restricted to these regions of interest. In Step 2, we identified activation increases for sublexical reading where activation was greater for reading pseudowords than reading words. In Step 3, we identified activation increases for lexical reading where activation was greater for reading words than pseudowords. In Step 2 and Step 3, we distinguish activation that is specific to orthographic processing and activation arising at the level of articulation by testing whether the difference between pseudoword and word production was also observed during auditory repetition or not. **(B)** Results: the only area significant in Step 1 was the left putamen. The anterior putamen was more activated by reading pseudowords than any other condition, consistent with the influence of sublexical orthographic processing. The posterior putamen was more activated by words than pseudowords during reading and repetition, consistent with lexical influences at the level of articulation rather than orthography.

#### **EXPERIMENTAL DESIGN**

The experiment comprised a 4 × 2 × 2 factorial design. Factor 1 compared stimuli with sublexical phonological properties (i.e., words and pseudowords) to stimuli without sublexical phonological properties (pictures of objects and meaningless shapes). Familiarity was controlled because half the stimuli were familiar (words, pictures of objects) and the others were unfamiliar (pseudowords, meaningless shapes). Factor 2 manipulated stimulus modality (visual or auditory). The four auditory stimuli were familiar words, unfamiliar pseudowords, environmental sounds associated with familiar objects or animals, and unfamiliar humming sounds. Factor 3 manipulated task (speech production or one-back-matching requiring a finger press response). The inclusion of the one-back-matching task allowed us to test whether activation (in areas activated by reading more than naming) was related to stimulus differences (e.g., written words versus pictures of objects) that were independent of task; or task effects (i.e., speech production versus one-back-matching) that were independent of stimulus.

#### **PARTICIPANT INSTRUCTIONS**

In the speech production conditions, participants were instructed to (1) "Read words," (2) "Read pseudowords," (3) "Name pictures of objects," (4) "Name colors of meaningless shapes" (visual baseline condition), (5) "Repeat heard words," (6) "Repeat heard pseudowords," (7) "Name the source of environmental sounds" (e.g., CAMERA in response to the clicking noise of a camera), and (8) "Name the gender of a humming voice" (MALE or FEMALE; auditory baseline condition).

The one-back-matching task required a finger press response to indicate if the current stimulus was the same as the previous stimulus. To fully control for stimulus-effects, subjects were presented with exactly the same stimuli in both the speaking conditions and the one-back-matching conditions.
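The crossing of the three factors can be made concrete with a short enumeration. The following is a minimal Python sketch, not part of the original study materials; the level labels (for example `baseline` for the color-naming and gender-naming conditions) are illustrative shorthand chosen here.

```python
from itertools import product

# Labels are illustrative shorthand for the levels described in the text.
STIMULUS_TYPES = ["word", "pseudoword", "object", "baseline"]  # factor 1 (4 levels)
MODALITIES = ["visual", "auditory"]                            # factor 2 (2 levels)
TASKS = ["speech_production", "one_back_matching"]             # factor 3 (2 levels)


def build_conditions():
    """Return every cell of the 4 x 2 x 2 design as a (stimulus, modality, task) tuple."""
    return list(product(STIMULUS_TYPES, MODALITIES, TASKS))


conditions = build_conditions()
for stimulus, modality, task in conditions:
    print(f"{task:18s} {modality:9s} {stimulus}")
```

Each of the eight stimulus-by-modality cells appears once under each task, which is what allows stimulus effects and task effects to be separated.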

#### **STIMULUS SELECTION/CREATION**

Stimulus selection started by generating 128 pictures of easily recognizable animals and objects (e.g., cow, bus, elephant, plate) with one to four syllables (mean = 1.59; SD = 0.73). Visual word stimuli were the written names of the 128 objects, with 3–12 letters (mean = 5 letters; SD = 1.8). Auditory word stimuli were the spoken names of the 128 objects (mean duration = 0.64 s; SD = 0.1), recorded by a native speaker of English with a Southern British accent approximating Received Pronunciation. Pseudowords were created using a non-word generator (Duyck et al., 2004) and matched to the real words for bigram frequency, number of orthographic neighbors, and word length. The same male speaker recorded the auditory words and pseudowords.

The non-verbal sounds associated with objects were available and easily recognizable for a quarter (32) of the stimuli, and taken from the NESSTI sound library (http://www.imaging.org.au/Nessti; Hocking et al., 2013). The duration of the environmental sounds needed to be significantly longer (mean length = 1.47 s, SD = 0.13) than the duration of the words [*t*(158) = 40.28; *p* < 0.001] because shorter sounds were not recognizable. The auditory baseline stimuli were recorded by both a male and female voice humming novel pseudowords, thereby removing any phonological or semantic content (mean duration = 1.04 s, SD = 0.43). Half of these stimuli were matched to the length of the auditory words (0.64 s); the other half, to the length of the environmental sounds (1.47 s). The visual baseline stimuli were meaningless object pictures, created by scrambling both global and local features, and then manually edited to accentuate one of 8 colors (brown, blue, orange, red, yellow, pink, purple, and green). Consistent speech production responses were ensured for all stimuli in a pilot study conducted on 19 participants.

#### **STIMULUS AND TASK COUNTERBALANCING**

The 128 object stimuli were divided into four sets of 32 stimuli (A, B, C, and D). Set D was always presented as environmental non-verbal sounds. Sets A, B, and C were rotated across pictures, visual words, and auditory words in different participants. All items were therefore novel on first presentation of each stimulus type (for task 1); and the same items were repeated for task 2 but in a different condition. Half the subjects performed all eight speech production tasks first (task 1) followed by all eight one-back-matching tasks (task 2). The other half performed all eight one-back-matching tasks first (task 1) followed by all eight speech production tasks (task 2). Within each task, half the subjects were presented auditory stimuli first, followed by visual stimuli; and the other half were presented visual stimuli first, followed by auditory stimuli. The order of the four stimulus types was fully counterbalanced across subjects, and full counterbalancing was achieved with 24 participants.

Each set of 32 items was split into 4 blocks of 8 stimuli, with one of the 8 stimuli repeated in each block to make a total of 9 stimuli per block (8 novel, one repeat). The stimulus repeat only needed to be detected and responded to (with a finger press) during the one-back-matching task.
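The block structure and set rotation described above can be sketched as follows. This is a hedged illustration only: the function names `make_blocks` and `rotate_sets` are hypothetical, the item names are placeholders, and two details are assumptions rather than facts from the text, namely that the repeated item immediately follows its first presentation and that the rotation of sets A, B, and C is cyclic.

```python
import random


def make_blocks(items, n_blocks=4, seed=0):
    """Split one 32-item set into blocks of 8 novel items plus one repeat."""
    rng = random.Random(seed)
    per_block = len(items) // n_blocks
    blocks = []
    for b in range(n_blocks):
        block = list(items[b * per_block:(b + 1) * per_block])
        # Assumption: the repeated item is chosen at random and shown twice
        # in a row, which is what a one-back match requires.
        pos = rng.randrange(len(block))
        block.insert(pos + 1, block[pos])
        blocks.append(block)
    return blocks


def rotate_sets(participant_index):
    """Assign sets A, B, C to stimulus types; set D always goes to sounds."""
    sets = ["A", "B", "C"]
    shift = participant_index % 3  # assumed cyclic rotation
    rotated = sets[shift:] + sets[:shift]
    assignment = dict(zip(["pictures", "visual_words", "auditory_words"], rotated))
    assignment["environmental_sounds"] = "D"
    return assignment


items = [f"item_{i:02d}" for i in range(32)]  # placeholder names, not real stimuli
blocks = make_blocks(items)
for block in blocks:
    print(len(block), block)
```

With four blocks of nine presentations, each run contains 36 stimuli, of which 32 are novel.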

#### **fMRI DATA ACQUISITION**

Functional and anatomical data were collected on a 3-T scanner (Trio, Siemens, Erlangen, Germany) using a 12-channel head coil. To minimize movement during acquisition, a careful head fixation procedure was used when positioning each participant's head. This ensured that none of the speech sessions were excluded after checking the realignment parameters. Functional images were acquired with a gradient-echo EPI sequence with 3 × 3 mm in-plane resolution (TR/TE/flip angle = 3080 ms/30 ms/90°, EFOV = 192 mm, matrix size = 64 × 64, 44 slices, slice thickness = 2 mm, interslice gap = 1 mm, 62 image volumes per time series, including five "dummies" to allow for T1 equilibration effects). The TR was chosen to maximize whole brain coverage (44 slices) and to ensure that slice acquisition onset was offset synchronized with stimulus onset, which allowed for distributed sampling of slice acquisition across the study (Veltman et al., 2002).

For anatomical reference, a high-resolution T1-weighted structural image was acquired after completing the tasks using a three-dimensional modified driven equilibrium Fourier transform (MDEFT) sequence (TR/TE/TI = 7.92/2.48/910 ms, flip angle = 16°, 176 slices, voxel size = 1 × 1 × 1 mm). The total scanning time was approximately 1 h and 20 min per subject, including set-up and the acquisition of the structural scan.

#### **PROCEDURE**

Prior to scanning, each participant was trained on all tasks using different stimulus material, except for the environmental sounds which remained the same throughout both training and experiment. All speaking tasks required the subject to produce a single verbal response after each stimulus presentation. For the one-back-matching task, participants had to use two fingers of the same hand (right hand for half of the subjects, left hand for the other half) to press one of two buttons on an fMRI-compatible button box to indicate whether the stimulus was the same as the one preceding it (left button for "same," right button for "different"). The participants were instructed to keep their body and head as still as possible and to keep their eyes open throughout the experiment and attend to a fixation cross on the screen while listening to the auditory stimuli. Each of the 16 tasks was presented in a separate scan run, all of which were identical in structure. The script was written with COGENT and run in MATLAB 2010a (MathWorks, Sherborn, MA, USA).

Scanning started with the instructions "Get Ready" written on the in-scanner screen while five dummy scans were collected. This was followed by four blocks of stimuli (nine stimuli per block, 2.52 s inter-stimulus-interval, 16 s fixation between blocks, total run length = 3.2 min). Every stimulus block was preceded by a written instruction slide (e.g., "Repeat"), lasting 3.08 s each, which indicated the start of a new block and reminded subjects of the task. Visual stimuli were each displayed for 1.5 s. Each image was scaled to 350 × 350 pixels and subtended a visual angle of 7.4°, with a screen resolution of 1024 × 768. Words and pseudowords were presented in lower case Helvetica. Their visual angle ranged from 1.47 to 4.41° with the majority of words (with five letters) extending 1.84–2.2°.
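The timing values reported here and in the acquisition section can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it works in integer milliseconds to avoid rounding error, and it assumes, purely for illustration, that the first stimulus of a block coincides with a volume onset. Under that assumption it shows how a 2.52 s inter-stimulus interval against a 3.08 s TR shifts each stimulus to a different point within the TR, which is the idea behind distributed sampling.

```python
TR_MS = 3080            # repetition time from the acquisition section
ISI_MS = 2520           # inter-stimulus interval
N_VOLUMES = 62          # volumes per run, including five dummies
STIMULI_PER_BLOCK = 9

# Run length implied by the number of volumes and the TR.
run_length_min = N_VOLUMES * TR_MS / 1000 / 60

# Offset of each stimulus onset from the most recent volume onset, assuming
# (for illustration) that the first stimulus coincides with a volume onset.
offsets_ms = [(k * ISI_MS) % TR_MS for k in range(STIMULI_PER_BLOCK)]

print(f"run length: {run_length_min:.2f} min")
print("stimulus offsets within the TR (ms):", offsets_ms)
```

The run length computed from 62 volumes agrees with the reported total of roughly 3.2 min, and no two stimuli within a block share the same offset relative to volume acquisition.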

The length of sound files varied across stimuli and tasks, ranging from 0.64 to 1.69 s (see stimulus creation above). Auditory stimuli were presented via MRI compatible headphones (MR Confon, Magdeburg, Germany), which filtered ambient in-scanner noise. Volume levels were adjusted for each subject before scanning. Each subject's spoken responses were recorded via a noise-canceling MRI microphone (FOMRI III™, Optoacoustics, Or-Yehuda, Israel), and transcribed manually for off-line analysis. We used eye-tracking to ensure participants were keeping their eyes open throughout the experiment.

#### **fMRI DATA PRE-PROCESSING**

We performed fMRI data preprocessing and statistical analysis in SPM12 (Wellcome Trust Centre for Neuroimaging, London, UK), running on MATLAB 2012a (MathWorks, Sherborn, MA, USA). Functional volumes were (a) spatially realigned to the first EPI volume and (b) unwarped to compensate for nonlinear distortions caused by head movement or magnetic field inhomogeneity. We used the unwarping procedure in preference to including the realignment parameters as linear regressors in the first-level analysis because unwarping accounts for nonlinear movement effects by modeling the interaction between movement and any inhomogeneity in the *T*2\* signal. After realignment and unwarping, we checked the realignment parameters to ensure that participants moved less than one voxel (3 mm) within each scanning run. The anatomical T1 image was (c) co-registered to the mean EPI image which had been generated during the realignment step and then spatially normalized to the Montreal Neurological Institute (MNI) space using the new unified normalization-segmentation tool of SPM12. To spatially normalize all EPI scans to MNI space, (d) we applied the deformation field parameters that were obtained during the normalization of the anatomical T1 image. The original resolution of the different images was maintained during normalization (voxel size 1 mm × 1 mm × 1 mm for structural T1 and 3 mm × 3 mm × 3 mm for EPI images). After the normalization procedure, (e) functional images were spatially smoothed with a 6-mm full-width-half-maximum isotropic Gaussian kernel to compensate for residual anatomical variability and to permit application of Gaussian random-field theory for statistical inference (Friston et al., 1995).
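To give a sense of scale for the smoothing step, the kernel width can be converted from full width at half maximum (FWHM) to the standard deviation of the Gaussian using the standard relation sigma = FWHM / (2 * sqrt(2 * ln 2)). The snippet below is a small illustrative calculation, not code from the study; the function name `fwhm_to_sigma` is hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


FWHM_MM = 6.0    # smoothing kernel reported in the text
VOXEL_MM = 3.0   # EPI voxel size reported in the text

sigma_mm = fwhm_to_sigma(FWHM_MM)
sigma_vox = sigma_mm / VOXEL_MM
print(f"sigma = {sigma_mm:.2f} mm = {sigma_vox:.2f} voxels")
```

A 6-mm kernel therefore spans two EPI voxels at half maximum, with a standard deviation a little under one voxel.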

#### **FIRST-LEVEL ANALYSES**

In the first-level statistical analyses, each pre-processed functional volume was entered into a subject-specific, fixed-effect analysis using the general linear model (Friston et al., 1995). All stimulus onset times were modeled as single events, with two regressors per run, one modeling instructions, and the other modeling the stimuli of interest. Stimulus functions were then convolved with a canonical hemodynamic response function. To exclude low-frequency confounds, the data were high-pass filtered using a set of discrete cosine basis functions with a cut-off period of 128 s. The contrasts of interest were generated for each of the 16 conditions (relative to fixation). The results of each individual were visually inspected to ensure that there were no visible artifacts (edge effects, activation in ventricles, etc.) that might have been caused by within scan head movements.
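The high-pass filter mentioned above can be illustrated by constructing a discrete cosine basis directly. The following sketch is written in plain Python rather than taken from SPM; the function name `dct_highpass_basis` is hypothetical, the rule used for the number of components is an assumed convention, and the scan count of 57 assumes that the five dummy volumes were discarded before modeling, which the text does not state.

```python
import math


def dct_highpass_basis(n_scans, tr_s, cutoff_s=128.0):
    """Low-frequency cosine regressors for one run (constant term excluded)."""
    # Assumed rule for the number of components, including the constant:
    # the integer part of 2 * run_duration / cutoff + 1.
    order = int(2.0 * n_scans * tr_s / cutoff_s + 1.0)
    scale = math.sqrt(2.0 / n_scans)
    basis = []
    for k in range(1, order):
        column = [scale * math.cos(math.pi * (2 * t + 1) * k / (2.0 * n_scans))
                  for t in range(n_scans)]
        basis.append(column)
    return basis


# 62 volumes minus five dummies at TR = 3.08 s (dummy removal is assumed here).
basis = dct_highpass_basis(n_scans=57, tr_s=3.08)
print(f"{len(basis)} drift regressors of length {len(basis[0])}")
```

Regressing these slowly varying columns out of each voxel's time series removes signal drifts with periods longer than the cut-off while leaving the faster task-related fluctuations in place.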

#### **EFFECTS OF INTEREST**

At the second level, the 16 contrasts for each subject were entered into a within-subjects one-way ANOVA in SPM12. Statistical comparisons between different sets of conditions aimed to identify areas activated by reading more than naming and to dissect these according to different levels of processing, as described below and illustrated in **Figure 1A**.

First, we identified areas that were activated for reading words and pseudowords relative to picture and color naming (*p* < 0.05 FWE corrected for multiple comparisons across the whole brain). Second, within the identified areas of interest, we used an uncorrected statistical threshold to test whether activation was greater for pseudowords than words, distinguishing areas where (pseudowords > words) was greater during reading than auditory repetition [i.e., the interaction of (pseudowords > words) and (reading > auditory repetition)] from areas commonly activated for (pseudowords > words) during both reading and auditory repetition (i.e., a main effect of pseudowords > words where there was no interaction with stimulus modality). Third, we repeated this process to test whether activation was greater for words than pseudowords, distinguishing areas where (words > pseudowords) was greater for reading than auditory repetition [i.e., the interaction of (words > pseudowords) and (reading > auditory repetition)] from areas commonly activated for (words > pseudowords) during both reading and auditory repetition (i.e., a main effect of words > pseudowords where there was no interaction with stimulus modality). The rationale for this three-step approach is illustrated in **Figure 1A**. Fourth, in each region of interest, we examined the pattern of response across all 16 conditions to determine the type of processing that was being influenced by sublexical reading (e.g., articulation, visual processing).
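The comparisons described above can be written as contrast weights over the 16 conditions. The sketch below is illustrative only: the condition labels and their ordering are hypothetical, and these are not the authors' actual contrast vectors. It shows how the main contrast, the stimulus-by-modality interaction, and the main effect of stimulus type differ in their weights.

```python
from itertools import product

MODALITIES = ("visual", "auditory")
STIMULI = ("word", "pseudoword", "object", "baseline")  # baseline: color (visual) or gender (auditory)
TASKS = ("speech", "one_back")
CONDITIONS = list(product(MODALITIES, STIMULI, TASKS))  # 16 conditions; this ordering is an assumption

def contrast(weights: dict) -> list:
    """Expand a sparse {condition: weight} mapping into a 16-element contrast vector."""
    return [weights.get(cond, 0) for cond in CONDITIONS]

# Step 1: reading words and pseudowords > naming pictures and colors.
reading_gt_naming = contrast({
    ("visual", "word", "speech"): 1, ("visual", "pseudoword", "speech"): 1,
    ("visual", "object", "speech"): -1, ("visual", "baseline", "speech"): -1,
})

# Step 2, interaction: (pseudowords > words) x (reading > auditory repetition).
pw_gt_w_interaction = contrast({
    ("visual", "pseudoword", "speech"): 1, ("visual", "word", "speech"): -1,
    ("auditory", "pseudoword", "speech"): -1, ("auditory", "word", "speech"): 1,
})

# Step 2, main effect: pseudowords > words across reading and auditory repetition.
pw_gt_w_main = contrast({
    ("visual", "pseudoword", "speech"): 1, ("visual", "word", "speech"): -1,
    ("auditory", "pseudoword", "speech"): 1, ("auditory", "word", "speech"): -1,
})
```

The third step (words > pseudowords) uses the same vectors with the signs reversed. The interaction and main-effect vectors are orthogonal, which is why an area can show one effect without the other.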

#### **BEHAVIORAL ANALYSIS OF IN-SCANNER ACCURACY AND RESPONSE TIMES**

Statistical analyses involved 2 × 4 ANOVAs in SPSS manipulating stimulus modality (visual versus auditory) with stimulus type (word, pseudoword, sound/picture, and gender/color). All ANOVAs were corrected for potential violations of sphericity, adjusting their degrees of freedom using the Greenhouse–Geisser correction (Greenhouse and Geisser, 1959). These corrections result in more conservative statistical tests (i.e., decreasing the risk of false positives while increasing the risk of false negatives), and account for the non-integer degrees of freedom below. Data from all 25 subjects were included for the speech production tasks (measuring accuracy in both visual and auditory modalities), while data from only 22 subjects were included for the one-back-matching tasks (measuring accuracy and response times (RTs) in both visual and auditory modalities). Three subjects' data were excluded because their button press responses were not consistently detected (due to technical failure) in one of the following one-back-matching conditions (written pseudowords, environmental sounds, and spoken words).
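For readers unfamiliar with the correction, the Greenhouse–Geisser estimate of sphericity (epsilon) can be computed from the covariance of the repeated measures. The sketch below is a simplified illustration for a single within-subject factor with simulated scores; it is not the SPSS routine used in the study, and the function name is hypothetical.

```python
import numpy as np

def greenhouse_geisser_epsilon(data: np.ndarray) -> float:
    """Greenhouse-Geisser epsilon for a subjects x conditions array (one within-subject factor)."""
    k = data.shape[1]
    cov = np.cov(data, rowvar=False)               # k x k covariance across conditions
    centering = np.eye(k) - np.ones((k, k)) / k
    cov_c = centering @ cov @ centering            # double-centered covariance
    return float(np.trace(cov_c) ** 2 / ((k - 1) * np.sum(cov_c ** 2)))

# Simulated accuracy scores for 25 subjects and 4 stimulus types (not the study data).
rng = np.random.default_rng(0)
scores = rng.normal(loc=0.9, scale=0.05, size=(25, 4))
epsilon = greenhouse_geisser_epsilon(scores)
```

Epsilon lies between 1/(k − 1) and 1 for k conditions; multiplying both degrees of freedom of the *F* test by epsilon yields the non-integer values reported in the Results.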

#### **RESULTS**

#### **fMRI RESULTS**

#### *Areas activated by reading more than naming*

Our areas of interest were defined as those that were more activated for reading words and pseudowords compared to object and color naming. Only one region reached a corrected level of significance (*p* < 0.05 FWE-corrected). This was a large area of the left putamen, reaching from the most anterior to the most posterior borders (see **Figure 2**; **Table 1**). The many other areas activated by either reading pseudowords relative to words or reading words relative to picture naming are summarized below.

#### *Differential activation for pseudowords and words*

Within the left putamen, greater activation for reading pseudowords than reading words was observed in the most anterior segment and an interaction between task and stimulus type indicated that the difference between pseudowords and words was greater during reading than during repetition (see **Table 1** for details). In contrast, greater activation for words than pseudowords was observed in the posterior putamen with no significant interaction between stimulus type (words versus pseudowords) and task (reading versus auditory repetition); see **Table 1**.

**FIGURE 2 | Activation in the anterior and posterior putamen.** Top: activation in the putamen for reading words and pseudowords compared to naming objects and colors. A close up of the left putamen labels anterior putamen (A) and posterior putamen (P). Plots show relative activation for each of the 16 conditions in (A) at MNI co-ordinates *x* = −21, *y* = +9, *z* = +3 and (P) at MNI co-ordinates *x* = −30, *y* = −9, *z* = 0. Values on *y*-axis show activation relative to fixation (the baseline condition). Bars on each column indicate confidence intervals. Black represents speech tasks, gray represents one-back-matching task. The first eight columns represent the visual tasks with WPPC = word reading, pseudoword reading, picture naming, and color naming stimuli. The second eight columns represent the auditory tasks with WPSH = auditory repetition of words, auditory repetition of pseudowords, naming objects from non-verbal sounds and identifying gender of humming voice. Dotted frames highlight greater activation for pseudowords than words in anterior putamen and greater activation for words than pseudowords in posterior putamen. Details of the relevant statistics are provided in the text and **Table 1**.

#### **Table 1 | Location and effects in the left and right putamen.**


*The location (in MNI space) and significance of the effects in the left and right putamen for reading words (W) and pseudowords (P) relative to picture naming and color naming. L* = *left hemisphere, R* = *right hemisphere. R* − *A* = *stimulus effects that are greater for reading* > *auditory repetition (i.e., an interaction of pseudowords versus words with stimulus modality). R&A* = *an effect of stimulus type that does not interact with stimulus modality (i.e., common to reading aloud and auditory repetition). Zsc* = *Z score, where negative Z scores indicate greater activation for words than pseudowords, and ns* = *not significant at p* < *0.05 uncorrected. The co-ordinates at the peak and posterior putamen locations were identified in the main contrast (reading words and pseudowords relative to naming objects and colors). The peak Z score for the left putamen (highlighted in bold), and the number of voxels (k) that surpassed an uncorrected threshold of p* < *0.001 were both significant after family wise error correction for multiple comparisons across the whole brain. In the right putamen, the corresponding Z scores and k are based on a statistical threshold of p* < *0.001. The co-ordinates for the anterior putamen come from the direct comparison of pseudoword to word reading which was highly significant in the left hemisphere (Z score* = *5.1, p* < *0.05 after correction for multiple comparisons across the whole brain; k* = *69 at p* < *0.001 uncorrected) and significant at p* < *0.001 at the corresponding location in the right hemisphere (Z score* = *4.7; k* = *29). The latter effects of stimuli in the reading task are not shown in the table that distinguishes the Z scores for the main effect of pseudowords relative to words summed over the reading and auditory repetition conditions (R&A) from the interaction of stimuli and task (R* − *A) at the same voxel.*

The contrasting responses to reading words and pseudowords in the anterior and posterior putamen were confirmed by a significant region by condition interaction [*F*(1, 24) = 25.4; *p* < 0.001, with Greenhouse–Geisser correction for non-sphericity], with greater activation for pseudoword than word reading in the anterior putamen but greater activation for word than pseudoword reading in the posterior putamen. This analysis was based on effect sizes from the peak voxel for pseudoword reading compared to word reading in the anterior putamen, and the peak voxel for word reading compared to pseudoword reading in the posterior putamen. The pattern of effects across all 16 conditions at these peak voxels is illustrated in **Figure 2**.
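When both factors have two levels, a region by condition interaction of this kind is equivalent to a one-sample test on a difference of differences, with *F*(1, *n* − 1) equal to the squared *t* statistic. The sketch below illustrates this equivalence with simulated effect sizes; it is not the analysis pipeline used in the study, and all names and values are hypothetical.

```python
import numpy as np

def interaction_f(ant_pseudo, ant_word, post_pseudo, post_word):
    """F statistic and degrees of freedom for a 2 x 2 within-subject interaction."""
    # Difference of differences per subject: (P - W) anterior minus (P - W) posterior.
    d = (np.asarray(ant_pseudo, dtype=float) - np.asarray(ant_word, dtype=float)) \
        - (np.asarray(post_pseudo, dtype=float) - np.asarray(post_word, dtype=float))
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))    # one-sample t against zero
    return float(t ** 2), (1, n - 1)

# Simulated effect sizes for 25 subjects (not the study data).
rng = np.random.default_rng(1)
n_subjects = 25
f_stat, df = interaction_f(
    rng.normal(1.2, 0.5, n_subjects),   # anterior putamen, pseudowords
    rng.normal(0.8, 0.5, n_subjects),   # anterior putamen, words
    rng.normal(0.8, 0.5, n_subjects),   # posterior putamen, pseudowords
    rng.normal(1.2, 0.5, n_subjects),   # posterior putamen, words
)
```

With 25 subjects the degrees of freedom are (1, 24), matching the form of the statistic reported above.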

#### *The response profile in the left anterior and posterior putamen across conditions*

Although the left anterior putamen was more activated by reading pseudowords than by any other condition (see **Figure 2**), activation was not specific to orthographic input. On the contrary, the left anterior putamen was activated by all speech conditions relative to one-back-matching on the same stimuli. We therefore describe the enhanced activation for pseudoword reading in the left anterior putamen as "the influence of sublexical orthographic processing on an articulation response." Likewise, we observed that the increased activation for words over pseudowords in the left posterior putamen was not specific to reading, with comparable effects during auditory repetition (see **Figure 2**). An influence of sublexical phonology on the left posterior putamen was indicated by greater activation for words and pseudowords relative to picture and sound naming. Even higher activation for words than pseudowords indicates that the left posterior putamen was most responsive when there was input from both lexical and sublexical phonology.

#### *Response in anterior and posterior putamen in the right hemisphere*

Our focus has been on the left putamen because this was the only area to be significantly more activated by reading words and pseudowords relative to naming pictures of objects and colors, when the statistical threshold was set at *p* < 0.05 FWE-corrected for multiple comparisons across the whole brain. However, *post hoc* analyses revealed that the pattern of effects observed in the left putamen was mirrored in the right putamen (see **Table 1**), albeit less significantly.

#### *Greater activation for pseudowords than words, outside the putamen*

Consistent with previous studies, we found many regions that were more activated for reading pseudowords than words, even though they were not more activated when pseudoword reading was compared to object naming. Greater activation for pseudowords than words that was common to reading and auditory repetition was observed in the left dorsal premotor cortex (MNI: −48, 0, +48), SMA/PreSMA (−6, +3, +66/0, +12, +51), bilateral posterior inferior frontal gyri (−45, +6, +24/+45, +9, +24), bilateral frontal operculum (−30, +21, 0/+33, +21, 0), left dorsal supramarginal gyrus (−42, −42, +45), and right cerebellum (+27, −63, −27). Greater activation for pseudowords than words that was dependent on task (reading > auditory repetition) was distributed bilaterally in occipital, occipito-temporal, and intraparietal cortices. All the above regions were identified after family wise error correction for multiple comparisons across the whole brain.

#### *Greater activation for words than objects, outside the putamen*

For completeness, we also looked for regions that were more activated for reading words than object naming even though they were not more activated for pseudoword relative to word reading. There was nothing significant in a whole brain search. Using regions of interest from Price et al. (2006), we found that reading relative to object naming, when common to the visual and auditory modalities (i.e., reading and repeating words relative to picture and sound naming), increased activation in the left premotor cortex (−54, −6, +33/−54, −6, +18) and the precuneus (−9, −53, +27/−3, −66, +36), with a non-significant trend in the left posterior superior temporal sulcus (−57, −42, +2). There were no regions that were more activated by words than object naming in the visual modality more than the auditory modality.

#### *In-scanner behavior*

Details of the in-scanner speech production accuracy are provided in **Figure 3**. There was no significant effect of stimulus modality [*F*(1.00, 24.00) = 0.04; *p* = 0.84, Greenhouse–Geisser] but there was an effect of stimulus type [*F*(1.38, 33.11) = 29.14; *p* < 0.001, Greenhouse–Geisser] which interacted with stimulus modality [*F*(1.52, 36.41) = 3.82; *p* = 0.042, Greenhouse–Geisser]. In the visual domain, accuracy was higher for words and colors than pictures and pseudowords. In the auditory domain, accuracy was higher for words and gender than sounds or pseudowords. RT data were not available for the speech production tasks.
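The non-integer degrees of freedom in these statistics follow from the Greenhouse–Geisser correction, which multiplies both uncorrected degrees of freedom by an estimate of sphericity, ε. As a worked check for the four-level stimulus type factor with 25 subjects (ε itself is not reported in the text; the value below is inferred from the reported degrees of freedom):

$$
df_1 = 4 - 1 = 3, \qquad df_2 = (4 - 1)(25 - 1) = 72, \qquad \hat{\varepsilon} \approx \frac{1.38}{3} \approx 0.46, \qquad \hat{\varepsilon}\, df_2 \approx 0.46 \times 72 \approx 33.1
$$

This agrees with the reported *F*(1.38, 33.11) to within rounding. The stimulus modality factor has only two levels, so no correction applies and its degrees of freedom remain (1, 24).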

For accuracy in the one-back-matching task (with partially missing data for three subjects), we found a main effect of stimulus type [*F*(2.25, 47.32) = 29.94; *p* < 0.001, Greenhouse–Geisser], a main effect of stimulus modality [*F*(1.00, 21.00) = 4.89; *p* = 0.038, Greenhouse–Geisser] and a stimulus modality by condition interaction [*F*(2.08, 43.65) = 6.54; *p* = 0.003, Greenhouse–Geisser]. In the visual domain, accuracy was higher for pictures, pseudowords, and words relative to colors. Likewise, in the auditory domain, accuracy was higher for words, pseudowords, and sounds than gender. The lower accuracy for color and gender arose because some participants attempted to match these stimuli on their visual or auditory forms, rather than their color or pitch.

For RTs in the one-back-matching task, we found a main effect of stimulus type [*F*(1.62, 34.07) = 21.17; *p* < 0.001, Greenhouse–Geisser], a main effect of stimulus modality [*F*(1.00, 21.00) = 150.51; *p* < 0.001, Greenhouse–Geisser] and a stimulus modality by condition interaction [*F*(1.81, 38.00) = 6.68; *p* = 0.004, Greenhouse–Geisser]. For all conditions, participants were slower in the auditory modality than the visual modality. Within both stimulus modalities, RTs mirrored the accuracy on the one-back-matching task with faster RTs and higher accuracy for words and pseudowords compared to the baseline conditions (gender and color).

#### **DISCUSSION**

Using a multi-factorial experimental design, we aimed to identify the brain areas where activation increases during the non-semantic mapping of orthography to phonology. Previous studies have addressed this question by looking at activation that is either greater for reading pseudowords than words or greater for word reading than object naming. In contrast, to avoid the confounds associated with each of these approaches, we identified areas associated with the non-semantic mapping of orthography to phonology as those where activation was greater for reading aloud words and pseudowords than for object or color naming. We also compared the effect of stimulus type during reading and auditory repetition to dissociate sublexical effects that were related to orthographic processing or articulation. Our logic was that effects that arose at the level of mapping orthography-to-phonology would be greater for reading than auditory repetition whereas effects that arose at the level of articulation would be common to reading and auditory repetition.

Our most significant finding was observed in the left putamen where we found that activation extending from the most anterior border to the most posterior border was greater for: (i) reading than picture naming; (ii) auditory repetition than sound naming; and (iii) producing speech than one-back-matching on the same stimuli. Within the putamen, the comparison of words to pseudowords revealed a striking and unexpected dissociation between the anterior and posterior territories. In the anterior putamen, activation was greater for reading pseudowords than any other condition, suggesting an influence of early orthographic processing on an articulatory area. In the posterior putamen, activation was greater for words than pseudowords during auditory repetition as well as during reading, which demonstrates greater activation for more familiar speech output. The same pattern of response was observed in the right anterior and posterior putamen, albeit less significantly than in the left anterior and posterior putamen (see **Table 1**; **Figure 2**).

According to our task analysis (see **Figure 1**), increased demands on orthographic-to-phonological mapping will result in (i) greater activation for pseudowords than words, in areas (ii) more activated for words than object naming, but (iii) not more activated for auditory repetition of pseudowords than words. In contrast, greater activation for words than pseudowords was expected at the level of selecting articulatory codes from competing possibilities because words have greater inconsistency between sublexical and lexical phonological codes ("country" versus "coun" and "try") that increases the demands on the suppression of mismatching codes. However, a role for the anterior putamen in orthographic-to-phonological mapping does not explain why the anterior putamen is also activated by object naming and other speech production tasks that do not involve orthographic-to-phonological mapping (see **Figure 2**). Moreover, a role for the posterior putamen in suppressing conflict between lexical and sublexical phonological codes does not explain why the posterior putamen is also more activated for words than pseudowords during auditory repetition (where there is no conflict between lexical and sublexical codes). We therefore turn to prior studies of the putamen to provide alternative interpretations of our findings.

Although we did not predict a double dissociation between the response of the anterior and posterior putamen to pseudowords and words, a *post hoc* literature search revealed that our findings are consistent with prior observations. For example, greater activation in the anterior putamen for reading pseudowords than words is consistent with prior studies that associated the anterior putamen with "the initiation of unskilled difficult movements" (Okuma and Yanagisawa, 2008; Aramaki et al., 2011), and the binding of sequential motor elements (Wymbs et al., 2012) as occurs during sublexical reading. Greater activation for pseudowords than words in the left anterior putamen during reading compared to auditory repetition can therefore be explained in terms of the demands on initiating or sequencing novel combinations of movements. Such demands will be less when the intended output is known (i.e., during auditory repetition) than when it can only be derived from visual cues (i.e., during reading). We are therefore not proposing that the anterior putamen is involved in orthographic-to-phonological mapping; instead, we are proposing that the output from orthographic-to-phonological mapping provides a more challenging trigger to the initiation of movements than the output of phonological processing during auditory repetition. Likewise, anterior putamen activation has been reported when the demands on articulation increase for non-native more than native language processing (Abutalebi et al., 2013) and late more than early bilingual speech (Frenck-Mestre et al., 2005).

Plausibly, sub-articulatory processing might also explain why Kotz et al. (2002) found more activation in the anterior putamen when participants made lexical decisions on auditory words (confirming the stimuli were real words) relative to lexical decisions on pseudowords (rejecting stimuli as real words), while ignoring auditory primes, presented 100 ms before target onset, that induced different types of semantic interference for word targets than for pseudoword targets. Clearly, further investigation is required to confirm this and explain all the different effects that have been reported. Moreover, there may be multiple variables that influence activity in the anterior putamen. For example, the anterior putamen has been associated with eye movements (Petit et al., 2009; Neggers et al., 2012) which can explain our findings if we argue that eye movements increase when reading novel relative to familiar letter strings but does not readily explain greater activation for auditory words than pseudowords in Kotz et al. (2002). Note that Kotz et al. (2002) also report more activation for auditory pseudowords than auditory words in the "posterior putamen." We do not discuss this result because the co-ordinates they report (Talairach: −32, +6, +9; MNI: −33, +9, +5) are far from those that we associate with the posterior putamen (MNI: −30, −9, 0).
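To make the separation between these two sets of co-ordinates concrete, the Euclidean distance between them can be computed directly. The sketch below uses only the MNI co-ordinates quoted in the text; the variable names are hypothetical, and the 3-mm voxel size is the EPI resolution given in the Methods.

```python
import math

KOTZ_POSTERIOR_MNI = (-33, 9, 5)       # "posterior putamen" of Kotz et al. (2002), MNI values quoted in the text
POSTERIOR_PUTAMEN_MNI = (-30, -9, 0)   # posterior putamen in the present study
VOXEL_MM = 3.0                         # EPI voxel size from the Methods

separation_mm = math.dist(KOTZ_POSTERIOR_MNI, POSTERIOR_PUTAMEN_MNI)
separation_vox = separation_mm / VOXEL_MM
print(f"separation = {separation_mm:.1f} mm ({separation_vox:.1f} voxels)")
```

The two locations are almost 19 mm apart, which is more than six voxels at the functional resolution used here and three times the width of the 6-mm smoothing kernel.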

With respect to our finding that activation in the posterior putamen was greater for words than pseudowords during auditory repetition and reading, we note that prior studies have associated the posterior putamen with "memory guided movement" (Menon et al., 2000; Tricomi et al., 2009). Activation in the posterior putamen may therefore be higher for word than pseudoword production because of familiarity with the required motor sequences. The preference of the posterior putamen for well-learnt movements and the anterior putamen for novel movements has been replicated in many other studies (Jueptner et al., 1997; Lehericy et al., 2005; Bapi et al., 2006; Fernandez-Seara et al., 2009) and concords with animal studies showing that injections of muscimol (GABA agonist) in the anterior putamen impair learning of new sequences whereas injections into the middle-posterior putamen impair the execution of well-learned sequences (Miyachi et al., 1997). More broadly, these differential responses in anterior and posterior putamen have been linked to the operation of different cortico-striatal loops. The anterior putamen interacts with anterior cortical areas (premotor cortex and anterior cingulate) as well as Broca's area (Ford et al., 2013), whereas the posterior putamen interacts directly with the sensorimotor cortex and cerebellum (Alexander et al., 1986; Jueptner et al., 1997; Hikosaka et al., 2002; Fernandez-Seara et al., 2009).

We have focused this paper on only one brain region (the putamen) because the pattern of response that we were looking for (greater activation for reading than picture naming; and reading pseudowords than words) did not reach significance in any other brain region. This was not due to a lack of sensitivity in all the other regions that have previously been associated with pseudoword reading more than word reading (Binder et al., 2003; Jobard et al., 2003; Mechelli et al., 2003, 2005; Ischebeck et al., 2004; Binder et al., 2005; Dietz et al., 2005; Vigneau et al., 2005; Borowsky et al., 2006; Levy et al., 2008; Woollams et al., 2011). On the contrary, we identified all the usual candidates for this contrast (pars opercularis, occipito-temporal gyrus, supramarginal gyrus etc.) but did not associate them with sublexical reading because they were not more activated for reading pseudowords than object naming. They may therefore represent levels of processing that are shared by pseudoword reading and object naming, such as enhanced demands on phonological retrieval and articulation relative to word reading. In addition, because our experimental design included both reading and auditory repetition tasks (see Hartwigsen et al., 2013), we are able to segregate areas that were more activated for pseudoword than word reading into (i) those that were also more activated by pseudowords than words during the auditory repetition task (i.e., at a post-orthographic level of processing), and (ii) those where the difference between pseudowords and words was greater for reading than auditory repetition (i.e., those related to orthographic processing). This revealed that all the anterior brain areas that were more activated for pseudoword than word reading (e.g., premotor, parietal, and SMA) were also more activated for auditory repetition of pseudowords than words. The commonality here is therefore assumed to arise in post-orthographic processing. In contrast, all the posterior brain areas that were more activated for reading pseudowords than words were not more activated by repetition of pseudowords than words. They are therefore likely to be associated with the visual processing that supports written word and object recognition.

To summarize, our findings are consistent with previous studies but offer a novel interpretation of activations that have previously been associated with pseudoword reading. By including multiple conditions we have dissociated the functions of the anterior and posterior putamen from areas that are involved in the visual processing that supports word and object recognition, or the articulatory processing that is common to reading and repetition.

#### **CONCLUSION**

We have shown a functional dissociation between the anterior and posterior putamen. The response in the anterior putamen is consistent with prior studies that associated this region with "the initiation of unskilled difficult movements," prior to motor output (Aramaki et al., 2011). In contrast, the response in the posterior putamen is consistent with prior studies that associated this region with "memory guided movement" (Tricomi et al., 2009). Prior studies have also noted a transition of activity from anterior to posterior putamen during visuo-motor sequence learning (Bapi et al., 2006). Here we show how the anterior and posterior putamen are differentially involved in lexical and sublexical reading.

#### **ACKNOWLEDGMENTS**

This work was funded by the Wellcome Trust. We thank Eldad Druks for providing the picture stimuli and Julia Hocking for providing the environmental sound stimuli.

#### **AUTHOR CONTRIBUTIONS**

Cathy J. Price, David W. Green, 'Ōiwi Parker Jones, and Mohamed L. Seghier were responsible for the study design; Thomas M. H. Hope, 'Ōiwi Parker Jones, and Susan Prejawa created the paradigm; Marion Oberhuber, Susan Prejawa, Thomas M. H. Hope, and 'Ōiwi Parker Jones were involved in data acquisition; and Marion Oberhuber and Susan Prejawa analyzed the data. Marion Oberhuber created the figures and searched the literature. All authors contributed to and approved the final manuscript.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 July 2013; accepted: 30 October 2013; published online: 19 November 2013.*

*Citation: Oberhuber M, Parker Jones 'Ō, Hope TMH, Prejawa S, Seghier ML, Green DW and Price CJ (2013) Functionally distinct contributions of the anterior and posterior putamen during sublexical and lexical reading. Front. Hum. Neurosci. 7:787. doi: 10.3389/fnhum.2013.00787*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Oberhuber, Parker Jones, Hope, Prejawa, Seghier, Green and Price. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Behavioral evidence for inter-hemispheric cooperation during a lexical decision task: a divided visual field experiment

# *Marcela Perrone-Bertolotti<sup>1†</sup>, Sophie Lemonnier<sup>2†</sup> and Monica Baciu<sup>3,4</sup>\**

*<sup>1</sup> INSERM U1028, CNRS UMR5292, Lyon Neuroscience Research Center, Brain Dynamics and Cognition Team, Université Claude Bernard Lyon 1, Lyon, France*

*<sup>2</sup> LUTIN Userlab - CHArt Cognitions Humaine et Artificielle, Département de Psychologie, Université Paris 8, Paris, France*

*<sup>3</sup> Laboratoire de Psychologie et Neurocognition CNRS – UMR 5105, Département de Psychologie, Université Pierre Mendès-France, Grenoble, France*

*<sup>4</sup> Institut Universitaire de France, IUF, Paris, France*

#### *Edited by:*

*Mohamed L. Seghier, University College London, UK*

#### *Reviewed by:*

*William W. Graves, Rutgers University, USA*

*Lise Van Der Haegen, Ghent University, Belgium*

*'Oiwi P. Jones, Wellcome Trust Centre for Neuroimaging, UK*

#### *\*Correspondence:*

*Monica Baciu, Laboratoire de Psychologie et Neurocognition, UMR CNRS 5105, BP 47, 38040 Grenoble 09, France e-mail: mbaciu@upmf-grenoble.fr*

*†These authors have contributed equally to this work.*

#### **HIGHLIGHTS**


This study explores inter-hemispheric interaction (IHI) during a lexical decision task by using a behavioral approach, the bilateral presentation of stimuli within a divided visual field experiment. Previous studies have shown that compared to unilateral presentation, the bilateral redundant (BR) presentation decreases the inter-hemispheric asymmetry and facilitates the cooperation between hemispheres. However, it is still poorly understood which type of information facilitates this cooperation. In the present study, verbal stimuli were presented unilaterally (left or right visual hemi-field successively) and bilaterally (left and right visual hemi-field simultaneously). Moreover, during the bilateral presentation of stimuli, we manipulated the relationship between target and distractors in order to specify the type of information which modulates the IHI. Thus, three types of information were manipulated: perceptual, semantic, and decisional, respectively named pre-lexical, lexical and post-lexical processing. Our results revealed left hemisphere (LH) lateralization during the lexical decision task. In terms of inter-hemispheric interaction, the perceptual and decision-making information increased the inter-hemispheric asymmetry, suggesting the inhibition of one hemisphere upon the other. In contrast, semantic information decreased the inter-hemispheric asymmetry, suggesting cooperation between the hemispheres. We discussed our results according to current models of IHI and concluded that cerebral hemispheres interact and communicate according to various excitatory and inhibitory mechanisms, all of which depend on specific processes and various levels of word processing.

#### **Keywords: asymmetry, cooperation, inhibition, divided visual field, redundant, bilateral, lexical decision**

#### **INTRODUCTION**

The majority of individuals show a left hemisphere (LH) predominance for language processing (Josse and Tzourio-Mazoyer, 2004). Nevertheless, both hemispheres are involved to some degree during language processing and they are in constant interaction. The mechanisms underlying the inter-hemispheric interaction (IHI) are still a topic of debate (for a review, see van der Knaap and van der Ham, 2011). In the present study, by manipulating the information conveyed between hemispheres through divided visual field (DVF) presentation of verbal material, we evaluated both the hemispheric specialization and the inter-hemispheric interaction.

DVF is based on the anatomo-functional properties of partially crossed visual pathways (Chiarello et al., 2004; Bourne, 2006). Consequently, a briefly presented stimulus (flashed) in one's visual hemi-field is processed first by the opposite hemisphere (LH for the right visual hemi-field presentation; right hemisphere-RH for left visual hemi-field presentation). The logic behind this procedure is that visual verbal stimuli are processed faster and more efficiently if they are presented first to the specialized hemisphere to process language, generally the left one (Bourne, 2006). Studies performed with DVF procedure provide convergent evidence with those using brain lesion-deficit approach and neuroimaging studies. Indeed, DVF studies suggest that (1) LH is predominant for processing language and (2) RH has several language abilities. The degree of hemispheric specialization (LH *>* RH) varies according to the language task and the psycholinguistic features of the stimuli (Chiarello et al., 2005; Cousin et al., 2006; Perrone et al., 2009). These studies are suggesting a continuum of hemispheres involvement, rather than absolute unilateral hemispheric specialization (Pulvermüller, 1996; Jung-Beeman, 2005). Both hemispheres communicate continuously during language processing and show dynamic interaction (Banich, 1998). This IHI may be explored by using a specific procedure derived from DVF, the bilateral and simultaneous presentation of stimuli in both left (LVF) and right (RVF) visual hemi-fields (see Bourne, 2006 for a review). Compared to unilateral, the bilateral presentation shows higher performances for language processing. This is particularly true if bilateral presented stimuli are redundant (identical) rather than different (Banich and Karol, 1992; Hellige, 1993; Mohr et al., 2000, 2002). 
The gain in performance (bilateral > unilateral) is called "bilateral gain" (BG) and represents behavioral evidence for inter-hemispheric cooperation (Zaidel and Rayman, 1994; Mohr et al., 1996; Hasbrooke and Chiarello, 1998; Weissman and Banich, 2000). Cooperation between hemispheres relies on anatomical structures connecting both hemispheres, such as the corpus callosum (Weems and Reggia, 2004; Stephan et al., 2005). Indeed, the BG is not obtained in split-brain patients (Mohr et al., 1994). To be measured, the BG requires that "hemispheres are not independent processors" (Weems and Reggia, 2004).
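As a rough illustration of how a bilateral gain could be quantified, the sketch below subtracts a unilateral baseline from bilateral accuracy. The text only states that bilateral performance exceeds unilateral performance; the function name, the choice of baseline (best unilateral field or the mean of both fields) and the numbers are assumptions made for illustration.

```python
def bilateral_gain(acc_bilateral, acc_lvf, acc_rvf, baseline="best"):
    """Bilateral accuracy minus a unilateral baseline, in percentage points.

    baseline="best" uses the better unilateral field; baseline="mean" uses
    the average of both fields. Both choices are illustrative assumptions.
    """
    if baseline == "best":
        unilateral = max(acc_lvf, acc_rvf)
    elif baseline == "mean":
        unilateral = (acc_lvf + acc_rvf) / 2.0
    else:
        raise ValueError("baseline must be 'best' or 'mean'")
    return acc_bilateral - unilateral


# Hypothetical accuracies (% correct), invented for illustration only.
gain_best = bilateral_gain(80.0, acc_lvf=60.0, acc_rvf=72.0)
gain_mean = bilateral_gain(80.0, acc_lvf=60.0, acc_rvf=72.0, baseline="mean")
print(gain_best, gain_mean)
```

A positive value would indicate a gain under the chosen baseline; the stricter "best" baseline guards against a gain that merely reflects the stronger hemisphere.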

Nevertheless, in healthy subjects, BG has only been observed under certain conditions, such as the performance of complex tasks and the use of familiar stimuli (Banich and Belger, 1990; Mohr et al., 1996). The facilitation for familiar stimuli is currently explained by a neurocognitive model based on Hebbian learning mechanisms (Pulvermüller and Mohr, 1996). This model suggests that previously learned items are stored as memory representations of words (see Pulvermüller, 1996; Pulvermüller and Mohr, 1996; Mohr et al., 2007). These representations correspond to large networks of inter-connected neurons and constitute functional units distributed across hemispheres. A functional unit emerges from the frequent co-activation of an inter-hemispheric ensemble of neurons by the repeated presentation of stimuli. Thus, a unilateral visual hemi-field presentation may activate the corresponding functional unit distributed across hemispheres. If the same item is presented simultaneously in both visual hemi-fields, the activation of the cortical representation across hemispheres could double in strength through additive mechanisms. In terms of performance, these additive mechanisms may be reflected by the BG effect and by the decrease of inter-hemispheric asymmetry (Mohr et al., 1996). In other words, the bilateral redundant (BR) presentation of stimuli increases the cooperation between hemispheres.

Even though BG has been reported following BR presentation, it remains unclear what specific type of information facilitates inter-hemispheric cooperation during word processing. The BR presentation involves identical perceptual, semantic and decisional processing (response making, i.e., the same response is expected for the two stimuli presented). Each type of processing may facilitate cooperation: perceptual processing (see Banich and Karol, 1992), such as physical resemblance of stimuli (Fernandino et al., 2007; Baird and Burton, 2008); semantic relationship (Koivisto, 2000; Baird and Burton, 2008); and decision-making for providing responses (Banich and Karol, 1992; Iacoboni and Zaidel, 1996; Fernandino et al., 2007). In line with this proposal, Baird and Burton (2008) evaluated the nature of inter-hemispheric cooperation during a non-verbal task. Specifically, they evaluated the effect of low-level sensory and high-level abstract information transferred between hemispheres. They presented faces (familiar vs. unfamiliar) under unilateral and bilateral visual field presentation. The bilateral presentation comprised two conditions, redundant (i.e., same face image projected simultaneously to both hemispheres) and non-redundant (NR) (i.e., different face images, each one presented to one hemisphere). In the NR condition, the two faces represented the same person (semantic similarity) photographed in different positions (perceptual difference). Results revealed BG for familiar faces in both bilateral conditions, redundant and NR. The authors suggest that the BG is not restricted to identical stimuli (perceptual information) but also concerns stimuli designating the same concept or the same identity (semantic information). Nevertheless, although reduced, perceptual information was still present during the bilateral NR presentation (i.e., same face). Consequently, the BG observed in this condition may be related to both semantic and perceptual information shared between hemispheres.

Furthermore, Fernandino et al. (2007) investigated the IHI during a lexical decision task using a DVF experiment. Participants were asked to judge whether a target string of letters was a word (manual response "yes") or a pseudo-word (manual response "no"). Target items (word or pseudo-word) were always underlined so that they could be easily differentiated from the distractors. The authors used two types of bilateral presentation to evaluate the effect of "lexical redundancy" on the IHI. In the first, the distractor had the same lexical nature as the target item (i.e., if the target was a word, the distractor was a word too; congruent distractor condition). In the second, the distractor was lexically different from the target item (i.e., if the target was a word, the distractor was a pseudo-word; incongruent distractor condition). Thus, in the bilateral congruent distractor condition, target and distractor induced the same response decision in each hemisphere. Conversely, in the bilateral incongruent distractor condition, target and distractor induced a different response decision in each hemisphere. Based on this experimental configuration, the authors investigated how the lexical nature of the distractor modulates target processing by each hemisphere (visual hemi-field of presentation) at different levels (before or after response decision). Their results suggest that IHI takes place before the programming of the motor response. Specifically, they found that incongruent distractors delayed the lexical decision compared to perceptual distractors (i.e., a string of XXXX in the opposite visual field).

Here we used an original paradigm based on previous studies (Banich and Karol, 1992; Fernandino et al., 2007), which manipulates the type of information shared between hemispheres during a lexical decision task (i.e., deciding whether the stimulus presented is a real word or not; Chiarello, 1988). By manipulating the relationship between target and distractor, our paradigm allows us to focus specifically on perceptual characteristics (physical resemblance) of the information at the pre-lexical level, on semantic characteristics (knowledge and meaning) during lexical access, and on decisional information (response planning) during post-lexical processing. Our major aim was to determine how the type of information modulates inter-hemispheric cooperation; we focused on the degree of hemispheric involvement, namely, the increase or decrease of the degree of inter-hemispheric asymmetry.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Forty native French speakers (15 males, Mean = 22.33 years, *SD* = 4.89 years) participated in the experiment. They had normal or corrected-to-normal vision and were right-handed (Mean = 98, *SD* = 15) as determined by the Edinburgh Handedness Inventory (Oldfield, 1971). All participants were undergraduate students and received course credits for their participation. They all gave informed consent to participate in the experiment.

#### **STIMULI**

Stimuli were built to meet the five modes of presentation (see **Figure 1** and Experimental Conditions section). We used 128 French words and 128 pseudo-words during a lexical decision task (see **Table A1**). Stimuli were controlled for number of letters<sup>1</sup> (4–7), French lexical frequency (Lexique.org; New et al., 2004) and semantic relationship (Alario, 1999) according to each of the five experimental conditions (see below). The pseudo-words were built by changing three or four letters in words.

<sup>1</sup>To ensure that our results are not attributable to the number of letters (NL) composing the items, a complementary analysis was performed. Specifically, we performed an ANOVA (analysis of variance) by item in terms of accuracy (% CR) and latency (RT, ms). The ANOVA included the NL (4–7) and the visual field of presentation (left, right). The results indicated no main effect of the NL, either in terms of % CR [*F*2(3, 60) = 1.82, *p* = 0.15] or in terms of RT [*F*2(3, 60) = 1.64, *p* = 0.18]. Also, the NL did not interact with the visual field of presentation, either in terms of % CR [*F*2(3, 60) = 0.88, *p* = 0.45] or in terms of RT [*F*2(3, 60) = 1.73, *p* = 0.17].


**FIGURE 1 | Experimental conditions (see description in Material and Methods section).**

Further, 32 pairs of words were selected from the 100 most strongly semantically related (SR) word pairs in Alario's database (Alario, 1999). The semantic association could be taxonomic (apple–banana) or contextual (apple–fruit). Alario's database is based on free verbal associations between names of concrete objects, produced by 89 French participants. Our assumption was that free semantic association of items favors an ecological approach to neural connectivity (neural functional units), without any prerequisite that could distort the results.

All target items were presented during the five experimental conditions. The other word of each pair was considered as the distractor and presented during the bilateral semantically related (SR) condition. In addition, we selected two other words matching the distractor in terms of frequency of occurrence<sup>2</sup>, number of letters, lexical status and gender (see **Table A1**). One of these two words was presented during the bilateral semantically unrelated (SU) condition, and the other one during the bilateral NR condition. Pseudo-words were constructed from words and were presented in each experimental condition according to their original words (e.g., the pseudo-word "rechai" was built from the real word "cahier," thus "rechai" was presented in the same experimental condition as "cahier"). To summarize, for the target words and according to each condition of presentation, we associated 32 semantically related words (SR condition), 32 SU words (SU condition) and 32 pseudo-words (NR condition). Similarly, for the target pseudo-words we associated 64 control pseudo-words (SR and SU conditions) and 32 words (NR condition).

E-Prime software (E-Prime Psychology Software Tools Inc., Pittsburgh, USA) was used for the experimentation. Stimuli were written in black "Courier New" font size 24 and displayed on the white screen of a computer monitor (screen resolution 1024 × 768 pixels) located at a distance of 57 cm from the participant's eyes.

#### **EXPERIMENTAL CONDITIONS**

We considered the following five experimental conditions according to the relationship between target (presented in one visual hemi-field) and distractor (presented in the other visual hemifield):


<sup>2</sup>We also verified that distractors were not significantly different in terms of lexical frequency, *F*2(2, 62) = 0.11, *p* = 0.89.

selected from Alario (1999) database. The information shared between target and distractor was semantic and decisional.


For conditions (4) and (5), only word trials were considered for analysis; the pseudo-words were used as a control.

#### **PARADIGM**

The whole experiment was divided into four blocks, each of them including the same number of words and pseudo-words (i.e., 32 items). Furthermore, each block was performed under two modes of presentation, unilateral (UVF) and bilateral (BR or BNR\_SR or BNR\_SU or BNR). Specifically, each block was composed of 32 unilateral and 32 bilateral trials for one of the four bilateral conditions. Thus, there were four blocks, one for each bilateral condition. Trials were presented randomly within each block. This presentation allowed a left and a right visual hemi-field presentation of each target item. The whole experiment included 256 trials.
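The block structure described above can be sketched as follows, mainly to check the trial arithmetic (4 blocks × 64 trials = 256). The even split of target side within each mode, the fixed random seed and all identifiers are assumptions for illustration; the original experiment was run in E-Prime, not Python.

```python
import random

BILATERAL_CONDITIONS = ["BR", "BNR_SR", "BNR_SU", "BNR"]
TRIALS_PER_MODE = 32  # 32 unilateral + 32 bilateral trials per block


def build_block(bilateral_condition, rng):
    """Return a shuffled list of (mode, target_visual_field) trials for one block."""
    trials = []
    for mode in ("UVF", bilateral_condition):
        for i in range(TRIALS_PER_MODE):
            # Assumption: the target side alternates, giving 16 LVF and 16 RVF per mode.
            visual_field = "LVF" if i % 2 == 0 else "RVF"
            trials.append((mode, visual_field))
    rng.shuffle(trials)  # trials are randomized within each block
    return trials


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
blocks = {cond: build_block(cond, rng) for cond in BILATERAL_CONDITIONS}
total_trials = sum(len(trials) for trials in blocks.values())
print(total_trials)  # 256
```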

Each trial (**Figure 2**) began with a 500 ms fixation cross (in order to keep the gaze direction at the center of the screen) followed by a stimulus displayed for 180 ms, either in UVF (RVF or LVF) or simultaneously in both visual hemi-fields (RVF and LVF). The short duration of stimulus presentation ensured monohemispheric presentation (Belger and Banich, 1992; Afraz et al., 2003). Stimulus presentation was followed by a 30 ms visual mask composed of a sequence of eight stars. The inner and outer edges of the lateralized stimuli were located at 2° and 6° from fixation, respectively. The trial ended with a 1500 ms fixation cross. The target stimulus was underlined.
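For readers who wish to relate these parameters to screen coordinates, the sketch below converts the 2° and 6° eccentricities into horizontal offsets at the 57 cm viewing distance given in the Stimuli section, and sums the nominal phase durations of one trial. It assumes a flat screen perpendicular to the line of sight and phases that run back to back; the names are illustrative.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # viewing distance reported in the Stimuli section


def eccentricity_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Horizontal offset from fixation (cm) corresponding to a visual angle."""
    return distance_cm * math.tan(math.radians(angle_deg))


inner_cm = eccentricity_to_cm(2.0)  # inner edge of the lateralized stimulus
outer_cm = eccentricity_to_cm(6.0)  # outer edge of the lateralized stimulus

# Nominal phase durations of one trial, taken from the paradigm description.
TRIAL_PHASES_MS = {"fixation": 500, "stimulus": 180, "mask": 30, "final_fixation": 1500}
trial_duration_ms = sum(TRIAL_PHASES_MS.values())

print(f"inner edge: {inner_cm:.2f} cm, outer edge: {outer_cm:.2f} cm")
print(f"nominal trial duration: {trial_duration_ms} ms")
```

At 57 cm, one degree of visual angle corresponds to roughly 1 cm on the screen, which is why this distance is a common choice in DVF setups.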

Participants were instructed to perform a lexical decision task, deciding whether or not the underlined item (target) was a real French word. The task was the same for the five experimental conditions. Participants provided manual responses with their index and middle fingers. The responding hand was controlled (Eviatar et al., 1997; Provins, 1997): each participant responded with the right hand for half of the experimental blocks and switched to the left hand for the other half. Before the experiment, participants went through a short training session which included various items not shown during the experiment. Reaction time (RT) and accuracy (% Correct Responses, CR) were recorded for each participant and condition.

#### **DATA PROCESSING**

A two-step ANOVA (analysis of variance) was performed. First, we evaluated the degree of hemispheric specialization by considering all target items, independently of their lexical nature and for all conditions. To achieve this, we compared the performances for stimuli presented in the left vs. the right visual hemi-field. If a right visual hemi-field advantage was observed, suggesting LH predominance, a second-level analysis was then performed to evaluate the effect of the variables of interest.

Only words were considered for the second analysis. Specifically, the second level analysis was an ANOVA including all five experimental conditions according to the visual hemi-field of presentation. Four statistical contrasts were calculated according to the hypotheses: (1) BG: UVF vs. BR; (2) Effect of perceptual information: BR vs. BNR\_SR; (3) Effect of semantic information: BNR\_SR vs. BNR\_SU; (4) Effect of decisional information: BNR\_SU vs. BNR.
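Each of the four contrasts amounts to asking whether the RVF minus LVF asymmetry differs between two conditions. A minimal sketch of that logic is given below: for a 2 × 2 within-subject design, the interaction F(1, n − 1) equals the squared paired t statistic computed on per-participant asymmetry differences. The accuracy values are hypothetical and the function name is an assumption; this is not the authors' analysis script.

```python
import math
import statistics


def interaction_f(rvf_a, lvf_a, rvf_b, lvf_b):
    """F(1, n-1) for a condition (A, B) x visual field (RVF, LVF) interaction.

    Each argument is a list of per-participant scores. The statistic is the
    squared paired t on the asymmetry difference (RVF - LVF in A) - (RVF - LVF in B).
    """
    diffs = [(ra - la) - (rb - lb) for ra, la, rb, lb in zip(rvf_a, lvf_a, rvf_b, lvf_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical % CR for four participants (invented, not the study's data).
rvf_uvf, lvf_uvf = [75, 70, 80, 72], [55, 52, 60, 50]
rvf_br, lvf_br = [78, 74, 82, 75], [74, 70, 76, 73]

f_value, df = interaction_f(rvf_uvf, lvf_uvf, rvf_br, lvf_br)
print(f"F{df} = {f_value:.2f}")
```

The same function could be applied to each contrast in turn (UVF vs. BR, BR vs. BNR\_SR, BNR\_SR vs. BNR\_SU, BNR\_SU vs. BNR) by passing the corresponding pairs of conditions.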

# **RESULTS**

Accuracy (% CR) and latency (mean RT) values were included in two ANOVAs, one by participant (F1) and another one by item (F2).
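The difference between the two analyses lies in the unit over which trials are averaged before the ANOVA: participants for F1, items for F2. The sketch below illustrates this aggregation on a few hypothetical trial records; the record layout, the item "pomme" and the values are invented for illustration, and RT is averaged over correct responses only, as indicated in the caption of Table 1.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records: (participant, item, visual_field, correct, rt_ms)
trials = [
    ("p1", "cahier", "RVF", 1, 700),
    ("p1", "cahier", "LVF", 0, 820),
    ("p1", "pomme", "RVF", 1, 680),
    ("p2", "cahier", "RVF", 1, 740),
    ("p2", "pomme", "LVF", 1, 800),
    ("p2", "pomme", "RVF", 0, 900),
]


def aggregate(records, unit_index):
    """Mean accuracy (%) and mean correct RT per (unit, visual field) cell."""
    cells = defaultdict(list)
    for record in records:
        cells[(record[unit_index], record[2])].append(record)
    summary = {}
    for key, cell in cells.items():
        correct_rts = [r[4] for r in cell if r[3] == 1]
        summary[key] = {
            "pct_correct": 100.0 * mean(r[3] for r in cell),
            "mean_rt": mean(correct_rts) if correct_rts else None,
        }
    return summary


by_participant = aggregate(trials, unit_index=0)  # cells entering the F1 analysis
by_item = aggregate(trials, unit_index=1)         # cells entering the F2 analysis
print(by_participant[("p1", "RVF")], by_item[("cahier", "RVF")])
```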

#### **FIRST STEP ANOVA: HEMISPHERIC SPECIALIZATION**

Performance was collapsed across all conditions (unilateral and bilateral) and all target stimuli (words and pseudo-words). We considered visual hemi-field of presentation (RVF-LH; LVF-RH) as a within-subject factor.

#### *Latency (mean RT)*

In terms of RT, the results revealed a main effect of visual hemi-field [*F*1(1, 39) = 65.24; *PRE*<sup>3</sup> = 0.62; *p* < 0.05; *F*2(1, 63) = 20.25; *PRE* = 0.24; *p* < 0.05], with faster responses for RVF-LH (*M* = 754.81 ms, *SD* = 20.90 ms) than for LVF-RH (*M* = 785.36 ms, *SD* = 26.12 ms), suggesting LH specialization (**Figure 3**).

#### *Accuracy (% CR)*

In terms of accuracy, the results revealed a main effect of visual hemi-field [*F*1(1, 39) = 65.24; *PRE* = 0.62; *p* < 0.05; *F*2(1, 63) = 32.96; *PRE* = 0.34; *p* < 0.05], with more accurate responses for RVF-LH (*M* = 71.05%, *SD* = 1.35%) than for LVF-RH (*M* = 61.85%, *SD* = 1.11%). This result suggests that lexical decision was performed more accurately when stimuli were presented first to the LH (**Figure 3**).
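Because the PRE is described as the sample value of partial eta squared, it can be recovered from each F statistic and its degrees of freedom with the standard identity PRE = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the first-step statistics reported above; the function name is illustrative, and the recomputed values agree with the reported ones to within about 0.01, a discrepancy that may simply reflect rounding.

```python
def pre_from_f(f_value, df_effect, df_error):
    """Proportional reduction in error (sample partial eta squared) from an F test."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, PRE as reported in the text)
reported = [
    (65.24, 1, 39, 0.62),  # F1, visual hemi-field, RT
    (20.25, 1, 63, 0.24),  # F2, visual hemi-field, RT
    (32.96, 1, 63, 0.34),  # F2, visual hemi-field, accuracy
]

for f_value, df_effect, df_error, pre_reported in reported:
    pre = pre_from_f(f_value, df_effect, df_error)
    print(f"F({df_effect}, {df_error}) = {f_value}: PRE = {pre:.3f} (reported {pre_reported})")
```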

#### **SECOND STEP ANOVA**

We considered the visual hemi-field of presentation (RVF-LH, LVF-RH) and the experimental condition (UVF, BR, BNR\_SR, BNR\_SU, BNR) as within-subject factors for word items. We first present the omnibus ANOVA results for both dependent variables, and then the planned comparisons corresponding to each of our hypotheses. Significant results were obtained only in terms of accuracy. For latency, the omnibus ANOVA did not reveal a significant interaction, and none of the planned comparisons was significant.

#### *Latency (mean RT)*

In terms of RT, the results revealed only a significant main effect of visual hemi-field [*F*1(1, 39) = 15.71; *PRE* = 0.28; *p* < 0.05; *F*2(1, 31) = 25.66; *PRE* = 0.45; *p* < 0.05], with faster responses for RVF (*M* = 728.29 ms, *SD* = 19.7 ms) than for LVF (*M* = 777.92 ms, *SD* = 24.39 ms), suggesting LH specialization. The main effect of experimental condition was not significant [*F*1(4, 156) = 0.88; *PRE* = 0.022; *p* = 0.47; *F*2(4, 124) = 1.74; *PRE* = 0.05; *p* = 0.14]. The interaction between the experimental condition and the visual hemi-field of presentation was not significant either [*F*1(4, 156) = 0.49; *PRE* = 0.01; *p* = 0.73; *F*2(4, 124) = 0.78; *PRE* = 0.02; *p* = 0.53].

#### *Accuracy (% CR)*

In terms of accuracy, the results revealed a significant effect of visual hemi-field of presentation [*F*1(1, 39) = 49.05; *PRE* = 0.55; *p* < 0.05; *F*2(1, 31) = 62.82; *PRE* = 0.66; *p* < 0.05], with more accurate responses for RVF-LH (*M* = 69.40%, *SD* = 2.09%) than for LVF-RH (*M* = 52.10%, *SD* = 2.04%). Furthermore, our results reveal a significant effect of experimental condition [*F*1(4, 156) = 10.69; *PRE* = 0.22; *p* < 0.05; *F*2(4, 124) = 15.14; *PRE* = 0.33; *p* < 0.05], see **Table 1**. More interestingly, our results reveal a significant interaction between the experimental condition and the visual hemi-field of presentation [*F*1(4, 156) = 7.66; *PRE* = 0.16; *p* < 0.05; *F*2(4, 124) = 8.74; *PRE* = 0.21; *p* < 0.05], see **Figure 4**. Consequently, we present below (**Table 1**) the results for each statistical contrast, according to our hypotheses.

#### **MODULATION OF THE IHI**

#### *By the type of visual presentation (UVF vs. BR)*

As shown in **Figure 4**, we obtained a significant interaction between experimental conditions (UVF, BR) and the visual hemi-field of presentation (RVF-LH, LVF-RH) [*F*1(1, 39) = 4.85; *p* < 0.05; *F*2(1, 31) = 6.08; *p* < 0.05], with a higher degree of inter-hemispheric asymmetry during UVF [*F*1(1, 39) = 22.55; *p* < 0.05; *F*2(1, 31) = 31.99; *p* < 0.05] than BR [*F*1(1, 39) = 0.96; *p* = 0.33; *F*2(1, 31) = 0.94; *p* = 0.33]. This result suggests supplementary recruitment of the RH with BG, reflecting increased hemispheric cooperation during bilateral presentation of linguistic stimuli.

#### *By the perceptual information (BR vs. BNR\_SR)*

As illustrated in **Figure 4**, our results do not reveal a significant interaction between experimental conditions (BR, BNR\_SR) and the visual hemi-field of presentation (RVF-LH, LVF-RH) [*F*1(1, 39) = 1.69; *p* = 0.20; *F*2(1, 31) = 2.64; *p* = 0.11]. This result suggests a lack of significant differences between the two bilateral presentations. The perceptual information does not modulate the IHI.

<sup>3</sup>Instead of presenting Eta squared (used to illustrate the effect size) we present the percentage of reduction of error (PRE). Indeed, Greek letters (e.g., Eta squared) correspond to the mathematical formalization for population values (that we cannot know) and not the values for the sample. For this reason, we used the PRE, which corresponds to the sample value of the partial Eta squared (Judd and McClelland, 1989, see Muller and Butera, 2007, Note 5).

#### *By the semantic information (BNR\_SR vs. BNR\_SU)*

As shown in **Figure 4**, we obtained a significant interaction between experimental conditions (BNR\_SR, BNR\_SU) and the visual hemi-field of presentation (RVF-LH, LVF-RH) [*F*1(1, 39) = 10.69; *p* < 0.05; *F*2(1, 31) = 7.99; *p* < 0.05], with a higher degree of inter-hemispheric asymmetry during BNR\_SU [*F*1(1, 39) = 40.09; *p* < 0.05; *F*2(1, 31) = 54.96; *p* < 0.05] than BNR\_SR [*F*1(1, 39) = 7.87; *p* < 0.05; *F*2(1, 31) = 9.18; *p* < 0.05]. This result suggests that the semantic information decreases the degree of inter-hemispheric asymmetry and modulates the IHI.

#### *By the decisional information (BNR\_SU vs. BNR)*

As illustrated in **Figure 4**, our results do not reveal a significant interaction between experimental conditions (BNR\_SU, BNR) and the visual hemi-field of presentation (RVF-LH, LVF-RH) [*F*1(1, 39) = 0.47; *p* = 0.49; *F*2(1, 31) = 0.70; *p* = 0.41], suggesting that BNR\_SU and BNR induce a similar degree of inter-hemispheric asymmetry. The decisional information does not modulate the IHI.

# **DISCUSSION**

The aim of the present study was to explore the modulation of inter-hemispheric cooperation during a lexical decision task according to the type of visual presentation (unilateral, bilateral) and three types of information processing (perceptual, semantic and decisional). A DVF experiment was used to compare performances for unilateral vs. bilateral simultaneous presentation of stimuli.

Our results replicated previous DVF studies reporting shorter RT and increased accuracy when the target was presented in the RVF-LH rather than in the LVF-RH. This is consistent with the LH advantage for language processing (Iacoboni and Zaidel, 1996; Waldie and Mosley, 2000; Barnea et al., 2005; Vigneau et al., 2011). The results showed LH predominance for both unilateral and bilateral presentations and validated the DVF experimental paradigm.

Furthermore, we obtained a BG effect for the BR compared to the unilateral presentation in RVF-LH. The BG suggests a reduction of inter-hemispheric asymmetry and increased inter-hemispheric cooperation through a supplementary involvement of the right hemisphere (see Lindell, 2006). Indeed, no significant difference was observed between the visual hemi-fields of presentation during redundant presentation. This pattern may be explained in terms of the facilitatory mechanism of information processing during identical simultaneous stimulus presentation (Mohr et al., 1996). Further, our results suggest that the BR presentation facilitates the cooperative work of the hemispheres, which increases behavioral performance. These results are in agreement with the neurocognitive model proposed by Pulvermüller (1996) and Pulvermüller and Mohr (1996) suggesting neural additive mechanisms (Mohr et al., 1994, 1996; Pulvermüller and Mohr, 1996).

We were also interested in identifying which type of information and processes modulates inter-hemispheric cooperation during the bilateral redundant (BR) presentation. During the BR presentation, the information addressed to both hemispheres was identical (i.e., perceptual, semantic and decisional). Indeed, several studies suggest that this information is involved at different levels of visual word recognition and involves several types of IHI (Fernandino et al., 2007; Shipp, 2011; Doron et al., 2012). Results illustrated in **Figure 4** suggest that perceptual and decision-making processes are not sufficient to explain the BG observed during BR. Indeed, we did not obtain a significant interaction between the BR and the bilateral non-redundant semantic related (BNR\_SR) conditions, suggesting that low-level perceptual information at the pre-lexical level cannot alone explain the BG observed during BR presentation. Thus, information processing at the pre-lexical stage does not induce inter-hemispheric cooperation (see Chiarello, 1988).

Similarly, the lack of a significant interaction between the NR semantic unrelated (BNR\_SU) and lexical incongruent (BNR) presentations suggests that the decision-making requirement at the post-lexical level cannot alone explain the BG observed during BR presentation. Accordingly, information processing at the post-lexical level during lexical decision does not induce inter-hemispheric cooperation (Weems and Zaidel, 2005).

Our results suggest that only semantic information induces a significant difference in the degree of inter-hemispheric asymmetry during BR. Indeed, SR words (BNR\_SR) induce a lower degree of inter-hemispheric asymmetry than semantically unrelated words (BNR\_SU). This effect may be explained by supplementary involvement of the right hemisphere, known to be equipped with semantic abilities (Hutchinson et al., 2003; Graves et al., 2010; Borovsky et al., 2013). In fact, both hemispheres show semantic abilities and complementary mechanisms (Beeman and Chiarello, 1998; Chiarello, 1998; Faust and Lavidor, 2003; Jung-Beeman, 2005). The time course of semantic processing varies across hemispheres (Koivisto, 1997): it starts earlier and gains more speed in the LH than in the RH (Koivisto and Laine, 1995). This observation explains not only the supplementary right hemisphere involvement during semantic processing, but also why this effect was obtained only in terms of % CR and not in terms of RT, as the LH is always faster in all types of visual presentation conditions. Nevertheless, it is important to note that understanding IHI mechanisms requires that behavioral measures be coupled with electrophysiological recording (Doron et al., 2012). Indeed, inter-individual variability in the duration of language processing, of inter-hemispheric transfer and of response programming may constitute an important factor. All these dimensions are difficult to distinguish using the behavioral approach alone.

These results, which reflect cooperation between hemispheres for semantic information, may be explained according to Pulvermüller and collaborators (Pulvermüller, 1996; Pulvermüller and Mohr, 1996). The word pairs used in our study were derived from a database of words that are semantically related by free association (Alario, 1999). The semantic relationship between two words may activate a specific cortical representation (functional unit) since this semantic association is frequently used (Borovsky et al., 2013). Consequently, strong connections between cortical representations of word-pairs may be created and reinforced based on a large number of multimodal associations (Pulvermüller, 2012).

**Table 1 | Mean values and standard deviations (*italic values*) of the dependent variables: percent of correct responses (% CR) and mean of correct response time (mean RT, ms), according to the type of presentation and the visual hemi-field/the hemisphere of presentation.**

*Abbreviations: UVF, unilateral visual hemi-field; BR, bilateral redundant; BNR\_SR, bilateral non-redundant semantic related; BNR\_SU, bilateral non-redundant semantic unrelated; BNR, bilateral non-redundant; RVF, right visual hemi-field; LVF, left visual hemi-field; LH, left hemisphere; RH, right hemisphere.*

Interestingly, for semantically unrelated words (BNR\_SU) and also for lexically incongruent stimuli (BNR), increased RVF-LH performance was observed. Both the BNR\_SU and BNR conditions revealed greater inter-hemispheric asymmetry, suggesting that the enhancement of LH predominance may be detrimental to right hemisphere performance. Surprisingly, the LVF-RH performance was above the chance level (<50%) during these conditions (BNR, BNR\_SU) compared to other experimental conditions (**Table 1**). This pattern of hemispheric involvement may reveal that two different words (lexically congruent) or two lexically dissimilar items may induce IHI. However, this interaction is mainly inhibitory (for a review, see Bloom and Hynd, 2005). Indeed, Fernandino et al. (2007) established that only lexical congruent and incongruent distractors would slow down target processing, but not perceptual distractors. According to the hemispheric independence model (Iacoboni and Zaidel, 1996; Fernandino et al., 2007; see also Weems and Reggia, 2004), the performance of a lateralized task induces inhibition of the contralateral non-predominant hemisphere in order to reduce interference and to increase performance. Indeed, it is possible that the LH starts to process the incoming information early, leading to the inhibition of other incoming information from the RH.

# **CONCLUSION**

This study explored the modulation of IHI during a lexical decision task by using a DVF procedure with BR presentation of stimuli. Our results confirmed two well-documented findings: (1) LH lateralization for lexical decision and (2) increased inter-hemispheric cooperation (BG) during BR presentation. Specifically, we investigated the effects of the type of information (perceptual, semantic and decisional) on the BG. Our results suggested that perceptual and decision-making information were not sufficient to explain the IH cooperation. They show that IH cooperation is less likely to emerge during pre-lexical (perceptual) and/or post-lexical (decision-making) processing, and emerges mainly during lexical-semantic processing, when semantic information is shared between hemispheres. During lexical processing, these results can be explained in terms of facilitatory mechanisms of cooperation and supplementary right hemisphere recruitment. Overall, our results indicate that the interaction between hemispheres may follow various mechanisms, some inhibitory and others facilitatory. Additional experiments will be needed to confirm the robustness of these results.

#### **REFERENCES**

in lateralization: effects of gender and handedness. *Neuropsychology* 11, 562. doi: 10.1037/0894-4105.11.4.562


natural language. *Trends Cogn. Sci.* 9, 512–518.


of referential, interactive, and combinatorial knowledge. *J. Neurolinguist.* 25, 423–459. doi: 10.1016/j.jneuroling.2011.03.004


phonological, lexico-semantic, and sentence processing?: Insights from a meta-analysis. *Neuroimage* 54, 577–593. doi: 10.1016/j.neuroimage.2010.07.036


normal brain: evidence from redundant bilateral presentations. *Atten. Perform.* 15, 477–504.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 March 2013; accepted: 10 June 2013; published online: 27 June 2013.*

*Citation: Perrone-Bertolotti M, Lemonnier S and Baciu M (2013) Behavioral evidence for interhemispheric cooperation during a lexical decision task: a divided visual field experiment. Front. Hum. Neurosci. 7:316. doi: 10.3389/fnhum.2013.00316*

*Copyright © 2013 Perrone-Bertolotti, Lemonnier and Baciu. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*




**Frontiers in Human Neuroscience www.frontiersin.org** June 2013 | Volume 7 | Article 316 | **225**


*the target and the distractor in BNR\_SR condition (Alario, 1999).*

# Don't words come easy? A psychophysical exploration of word superiority

# *Randi Starrfelt\*, Anders Petersen and Signe Vangkilde*

*Department of Psychology, Center for Visual Cognition, University of Copenhagen, Copenhagen, Denmark*

#### *Edited by:*

*Mohamed L. Seghier, University College London, UK*

#### *Reviewed by:*

*Jonathan Grainger, Centre National de la Recherche Scientifique, France Stefan Heim, RWTH Aachen University, Germany*

#### *\*Correspondence:*

*Randi Starrfelt, Department of Psychology, Center for Visual Cognition, University of Copenhagen, O. Farimagsgade 2A, DK-1352 Copenhagen, Denmark e-mail: randi.starrfelt@psy.ku.dk*

Words are made of letters, and yet sometimes it is easier to identify a word than a single letter. This *word superiority effect* (WSE) has been observed when written stimuli are presented very briefly or degraded by visual noise. We compare performance with letters and words in three experiments, to explore the extents and limits of the WSE. Using a carefully controlled list of three-letter words, we show that a WSE can be revealed in vocal reaction times even to undegraded stimuli. With a novel combination of psychophysics and mathematical modeling, we further show that the typical WSE is specifically reflected in perceptual processing speed: single words are simply processed faster than single letters. Intriguingly, when multiple stimuli are presented simultaneously, letters are perceived more easily than words, and this is reflected both in perceptual processing speed and visual short term memory (VSTM) capacity. So, even if single words come easy, there is a limit to the WSE.

**Keywords: reading, word processing, Theory of Visual Attention (TVA), word superiority effect, visual processing speed, visual short term memory**

# **INTRODUCTION**

The popular notion that we see words as images or objects is reflected in the widely held belief (aided by an email epidemic some years back) that as long as the first and last letters are correctly positioned it "deosn't mttaer in waht oredr the ltteers in a wrod are (*...*) bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe." Considering the time it takes even to read this misspelt sentence, its claim is obviously not entirely correct (see Grainger and Whitney, 2004). Single letter processing has been shown to be of utmost importance for word reading (e.g., Pelli et al., 2003; Grainger and Dufau, 2012), but the relationship between letter and word processing is complex and as yet underspecified.

The *word superiority effect* (WSE) refers to the observation that when written stimuli are degraded by noise or brief presentation, letters in words are reported more accurately than single letters and letters embedded in non-words. This effect has been studied using different tasks, stimuli, and masking conditions (see e.g., Johnston, 1981). In the classical Reicher-Wheeler paradigm, words, non-words, and/or single letters are presented for a single, brief exposure duration and then masked, followed by a forced choice decision about which of two letters was present (Reicher, 1969; Wheeler, 1970). The finding of a superior performance with words in such experiments was one of the driving forces in the development of the Interactive Activation Model of visual word processing (IAM; McClelland and Rumelhart, 1981). In this model, word recognition is achieved through processing on three interactive levels, where activation on higher levels (i.e., word representations) may strengthen or inhibit activations on the letter level. These feedback connections were suggested to be important in explaining the WSE, as this top–down activation of letters renders them more active than does bottom–up activation alone (which is more likely to be the case when the stimulus is a single letter or a string of unrelated letters).

Visual word processing, and its cerebral substrate, has been intensively investigated in both brain injured and normal participants following the suggestion that a region in the left occipito-temporal cortex, the Visual Word Form Area (VWFA), may be specialized for the processing of written letter strings (Cohen et al., 2000, 2002). The relationship between the fast, parallel processing of words in canonical format (central presentation of same case words), and effects of word length on processing when stimuli are in some way degraded or distorted (cAsE MiXinG, s p a c i n g, or vertically tilted) has also been studied within this framework. It has been suggested that the VWFA and the ventral visual stream contribute significantly to processing of words in a canonical format, while attentional mechanisms, relying more on the dorsal stream, may play a role when words in distorted formats are processed (Vinckier et al., 2007; Cohen et al., 2008). Although the precise role of the VWFA is highly debated (Dehaene and Cohen, 2011; Price and Devlin, 2011), there is general agreement that processing in the ventral visual stream is important for fast and efficient visual word processing. How single letters are treated by these systems has been less thoroughly investigated, but it may rely on processing in slightly different regions than words and letter strings (Flowers et al., 2004; James et al., 2005). Of particular interest for the current study is the point that "the processing of non-pronounceable letter strings cannot be assumed to be equivalent to single-letter perception" (James et al., 2005, p. 452).

This relates to a conceptual distinction made in the cognitive research on word, non-word, and letter processing, between the classical WSE (defined as superior report of letters in words over non-words) and the *word-letter phenomenon* (defined as superior report of letters in words compared to single letters, Jordan and Bevan, 1994). Both phenomena obviously reflect a "word superiority" in processing, and although the word-nonword effect has received the most attention in the experimental literature, the word-letter effect may be the most thought-provoking one: Even if words consist of single letters, and even if there are strong indications that individual letters must be processed for a word to be recognized, words enjoy a processing advantage compared with single letters. Following the IAM (McClelland and Rumelhart, 1981), most will agree that the word advantage is due to top–down effects on word recognition that are absent or smaller for single letters. It is not clear, however, whether this processing advantage affects the threshold for visual processing of words and letters, or whether it is mainly reflected in the perceptual processing speed. It is also not known how word and letter processing may differ at the level of visual short term memory (VSTM). Can words be encoded as units or wholes in the sense that they are treated like entities in VSTM?

In the current study, we investigate these questions using classical psychophysical paradigms with words and letters as stimuli, and methods based on a Theory of Visual Attention (TVA; Bundesen, 1990). TVA is a theoretical framework for understanding and investigating attentional effects at the behavioral (e.g., Peers et al., 2005; Starrfelt et al., 2009; Vangkilde et al., 2011) and neurophysiological level (Bundesen et al., 2005). TVA-based experiments employ unspeeded, accuracy-based measures of perception and attention, and use computational modeling to derive several attentional parameters unconfounded by response times, from one single task. We focus on three of these parameters in the experiments reported here: (1) *t*<sub>0</sub>, the threshold of conscious perception measured in milliseconds; (2) *C*, the speed of visual processing measured in items processed per second; and (3) *K*, the capacity of VSTM measured in number of items. The parameters are illustrated in **Figure 1**, right panel. Parameters *C* and *t*<sub>0</sub> can be estimated both in tasks presenting a single stimulus, and in paradigms with multiple stimuli (whole report).

This study contains three experiments, each including both letter and word stimuli. The first, a computerized naming task, was used to familiarize subjects with the stimuli. In the second experiment, we compared performance with single words and letters at a range of exposure durations. This allowed us to investigate whether the WSE was present in a task where stimuli were to be reported (in contrast to the traditional forced choice tasks), and if so, whether the WSE is reflected in the threshold for conscious processing (*t*<sub>0</sub>), the perceptual processing speed (*C*), or both. In the third experiment we used a classical whole report paradigm with multiple stimuli, to estimate the capacity of VSTM (i.e., the *K*-value) for words and single letters. The speed *C* and threshold *t*<sub>0</sub> were also estimated in the whole report paradigm.

# **MATERIALS AND METHODS**

All experiments were conducted in a semi-darkened room, and subjects were seated ∼100 cm from a 19-inch CRT monitor running at 160 Hz.

# **SUBJECTS**

Twenty-one bachelor students (six male; mean age 23, range 19–36) at the University of Copenhagen participated in this study for course credits. All provided written, informed consent.

# **STIMULI AND MASKS**

The stimuli were the same in all three experiments and were presented in lower case Arial font (point size 40) in white on a black background. The order of tasks and stimulus conditions was counterbalanced across subjects. The letter-condition featured 25 letters of the alphabet (*w* excluded) with the average letter subtending 0.52° (width; range 0.11–0.92) by 0.83° (height; range 0.69–0.97) of visual angle. For the word-condition, 25 high-frequency, three-letter words were chosen so they could not be predicted by identifying only one letter of the word (see Appendix for a list of stimuli). A printed list of the stimuli was present during all experiments, for easy reference. The average word subtended 1.92° (width; range 1.32–2.41) by 0.99° (height; range 0.69–1.20) of visual angle. Masks were rectangular white-on-black pattern masks (2.46° by 2.12° of visual angle) constructed of letter fragments, thus covering both letters and words completely.
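To give a feel for these stimulus sizes, the following minimal Python sketch converts visual angle to physical extent on the screen. It assumes the standard relation size = 2·d·tan(θ/2) and a viewing distance of exactly 100 cm (the text only states approximately 100 cm); the function names are illustrative and not taken from the study.

```python
import math

def visual_angle_to_size_cm(angle_deg: float, distance_cm: float = 100.0) -> float:
    """Physical extent (cm) that subtends `angle_deg` at `distance_cm`.

    Assumes the stimulus is centred on the line of sight.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

def size_to_visual_angle_deg(size_cm: float, distance_cm: float = 100.0) -> float:
    """Inverse of `visual_angle_to_size_cm`."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

# Widths reported in the text (degrees of visual angle).
for label, angle in [("average letter width", 0.52),
                     ("average word width", 1.92),
                     ("mask width", 2.46)]:
    print(f"{label}: {angle} deg is about {visual_angle_to_size_cm(angle):.2f} cm at 100 cm")
```

Under these assumptions the mask (2.46° wide) is wider than the widest word (2.41°), which is consistent with the statement that the masks covered all stimuli.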

#### **MATHEMATICAL MODELING**

The results from Experiments 2 and 3 were analyzed using Bundesen's theory of visual attention (TVA; Bundesen, 1990). According to TVA, stimuli in the visual field compete in a race for access to a limited visual short-term store of *K* items. Specifically, the speed at which a stimulus *x* in the visual field races for access to VSTM is given by,

$$v_x = C \frac{w_x}{\sum_{z \in S} w_z} \tag{1}$$

where *C* is the overall speed of visual processing and *w*<sub>x</sub> is the attentional weight of stimulus *x* which is divided by the sum of attentional weights across all stimuli in the visual field, *S*. In other words, the competition for access to VSTM is represented by the attribution of attentional weights such that a stimulus with a high weight will be processed faster (i.e., have a higher probability of being represented in VSTM) than a stimulus with a low weight.

In the special case in which only a single stimulus is presented in the visual field, *v*<sub>x</sub> = *C* (i.e., no competition) and the probability that the stimulus gets represented in VSTM is given by

$$p = 1 - e^{-v_x(\tau - t_0)} \text{ for } \tau > t_0 \tag{2}$$

where τ is the exposure duration of the stimulus and *t*<sub>0</sub> is the threshold of conscious perception. That is, if the exposure duration of the stimulus is shorter than *t*<sub>0</sub> the probability that the stimulus will be represented in VSTM is zero. However, if the exposure duration is longer than *t*<sub>0</sub> the probability will follow an exponential function (see **Figure 1**, right panel, for two examples).
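Equations 1 and 2 can be illustrated with a short Python sketch. This is not the fitting code used in the study; it simply evaluates the two formulas. The function names are illustrative, and the example values are the group mean estimates reported for Experiment 2, used here only to show the shape of the curves (individual subjects will differ). Because *C* is expressed in items per second and exposure durations in milliseconds, the sketch converts milliseconds to seconds.

```python
import math

def processing_rate(C: float, weights: list, index: int) -> float:
    """Equation 1: v_x = C * w_x / sum of all attentional weights."""
    return C * weights[index] / sum(weights)

def p_encoded(v: float, tau_ms: float, t0_ms: float) -> float:
    """Equation 2: probability that a single stimulus is encoded in VSTM.

    v is in items per second; tau_ms and t0_ms are in milliseconds.
    """
    if tau_ms <= t0_ms:
        return 0.0  # below the perceptual threshold nothing is encoded
    return 1.0 - math.exp(-v * (tau_ms - t0_ms) / 1000.0)

# Single stimulus: one weight, so v_x equals C.
v_word = processing_rate(114.0, [1.0], 0)    # mean C for words, Experiment 2
v_letter = processing_rate(68.0, [1.0], 0)   # mean C for letters, Experiment 2

for tau in (10, 25, 50, 80):
    print(tau, "ms:",
          f"word p = {p_encoded(v_word, tau, 11.8):.2f},",
          f"letter p = {p_encoded(v_letter, tau, 14.2):.2f}")
```

With equal weights across six stimuli, Equation 1 gives each item one sixth of *C*, which is why processing is slower per item in multi-stimulus displays even before any other factor is considered.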

In a single stimulus experiment attentional weights and *K* cannot be estimated, resulting in a simple model with only two free parameters, *C* and *t*<sub>0</sub>. However, with larger display sizes the complexity of the model increases as does the number of free parameters (see Dyrholm et al., 2011, for a full specification of the model). In Experiment 3, we used a display size of six stimuli resulting in a model with 13 free parameters. Five parameters were used to characterize a probability distribution of the storage capacity of VSTM. Hence the *K*-value reported in the Results section is the expected *K* given a particular probability distribution for each individual participant. Another five parameters were used to estimate the attentional weights (*w*-values) at each of the six stimulus locations (one attentional weight was fixed at a value of 1). The remaining three free parameters were used to estimate the threshold of conscious perception, *t*<sub>0</sub>; the speed of visual processing, *C*; and the sensory decay in the unmasked trials. In both Experiments 2 and 3, the individual data were fitted by an improved maximum likelihood fitting procedure using the LibTVA toolbox for MATLAB (Dyrholm et al., 2011).
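The two-parameter single-stimulus model can be fitted in many ways. The sketch below is a deliberately simple stand-in, not the LibTVA procedure: it assumes a binomial likelihood for the number of correct reports at each exposure duration, ignores guessing, and finds *C* and *t*<sub>0</sub> by a coarse grid search. The exposure durations between 6 and 80 ms and the response counts are hypothetical values invented for illustration.

```python
import math

def p_correct(C: float, t0: float, tau: float) -> float:
    """Equation 2 with C in items/s and t0, tau in ms."""
    if tau <= t0:
        return 0.0
    return 1.0 - math.exp(-C * (tau - t0) / 1000.0)

def log_likelihood(C, t0, durations, n_correct, n_trials):
    """Binomial log-likelihood (up to a constant) of the observed counts."""
    eps = 1e-9  # keeps log() finite when the model predicts exactly 0 or 1
    total = 0.0
    for tau, k, n in zip(durations, n_correct, n_trials):
        p = min(max(p_correct(C, t0, tau), eps), 1.0 - eps)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def fit_grid(durations, n_correct, n_trials):
    """Coarse grid search: C in 20..200 items/s, t0 in 0..20 ms (0.25 ms steps)."""
    best_ll, best_C, best_t0 = -math.inf, None, None
    for C in range(20, 201):
        for step in range(0, 81):
            t0 = step * 0.25
            ll = log_likelihood(float(C), t0, durations, n_correct, n_trials)
            if ll > best_ll:
                best_ll, best_C, best_t0 = ll, float(C), t0
    return best_C, best_t0

# Hypothetical data for one subject: 40 trials at each of eight durations.
durations = [6, 12, 19, 25, 31, 37, 50, 80]   # ms, intermediate values assumed
n_correct = [0, 0, 20, 29, 34, 37, 39, 40]    # invented counts
n_trials = [40] * 8

C_hat, t0_hat = fit_grid(durations, n_correct, n_trials)
print(f"C = {C_hat:.0f} items/s, t0 = {t0_hat:.2f} ms")
```

A grid search is slow and coarse compared with a gradient-based optimizer, but it makes the logic of maximum likelihood estimation easy to inspect.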

#### **EXPERIMENT 1. STIMULUS FAMILIARISATION**

Experiment 1 was a computerized naming task, used to familiarize subjects with the stimuli employed in Experiments 2 and 3. Half the subjects (*n* = 11) performed the letter task first. Stimuli were randomly selected and presented at the center of the screen with an inter-trial interval of 1 s from response to the next stimulus. Subjects were instructed to name the stimuli as quickly as possible, without making errors, and reaction times (RTs) were measured using a voice key. The letter and word conditions included 50 and 100 trials, respectively, and 10 practice trials were included in each condition. RTs below 200 ms and above 900 ms were considered voice key errors and were removed from the data. On average 5.6% (*SD* = 5) of the letter trials and 2.4% (*SD* = 2.7) of the word trials were removed.
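The trimming rule described above can be sketched as follows. This is an illustration only: the function name and the example RTs are hypothetical, and the sketch assumes that values of exactly 200 or 900 ms are retained, since the text excludes RTs "below" and "above" these limits.

```python
def trim_rts(rts_ms, lower=200.0, upper=900.0):
    """Drop RTs outside [lower, upper] and report the percentage removed."""
    if not rts_ms:
        raise ValueError("no reaction times supplied")
    kept = [rt for rt in rts_ms if lower <= rt <= upper]
    removed_pct = 100.0 * (len(rts_ms) - len(kept)) / len(rts_ms)
    return kept, removed_pct

# Hypothetical naming RTs (ms) from one subject.
example_rts = [150, 432, 467, 512, 905, 448, 471]
kept, removed_pct = trim_rts(example_rts)
print(f"kept {len(kept)} trials, removed {removed_pct:.1f}%")
```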

#### **EXPERIMENT 2. SINGLE ITEM REPORT**

Experiment 2 tested identification of single stimuli flashed briefly at the center of the screen. Letters and words were presented in separate blocks of 160 trials. In total, subjects ran 320 trials per condition, and the first and second blocks for each stimulus type were preceded by 30 and 15 practice trials, respectively. In each trial, a single stimulus was chosen randomly and presented for one of eight exposure durations (6–80 ms, randomly intermixed). The stimulus was terminated by a pattern mask shown for 500 ms. Participants were instructed to make an unspeeded report of the stimulus, if they were "fairly certain" of its identity. Responses were recorded by the experimenter. To ensure foveal presentation, participants were required to focus on a centrally placed cross and then initiate the trial by pressing the right mouse button.

The analysis first compared the proportions of correct responses for the different exposure durations for the two stimulus conditions. Then, participants' performance was modeled individually by TVA (see section Mathematical Modeling for details). This resulted in separate parameter estimates for visual processing speed (*C*) and threshold of conscious perception (*t*<sub>0</sub>) for all participants. Parameter estimates for letters and words were compared in paired-samples *t*-tests (see **Table 1**).
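For readers unfamiliar with the paired-samples *t*-test, the statistic can be computed in a few lines of Python. The sketch below uses only the standard library and returns the *t* value and degrees of freedom (not the *p*-value). The per-subject estimates are hypothetical numbers invented for illustration; they are not data from the study.

```python
import math
import statistics

def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for two matched lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD of the differences
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical C estimates (items/s) for five subjects.
c_words = [120.0, 98.0, 131.0, 110.0, 105.0]
c_letters = [70.0, 66.0, 75.0, 61.0, 72.0]
t_value, df = paired_t(c_words, c_letters)
print(f"t({df}) = {t_value:.2f}")
```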

#### **EXPERIMENT 3. WHOLE REPORT**

Experiment 3 was designed to measure the participants' ability to perceive multiple independent stimuli simultaneously. Words and letters were presented in different blocks of 120 trials. There were four blocks in all. In every trial, six stimuli were chosen randomly without replacement from the stimulus sets described above. Stimuli were presented for one of six exposure durations (30–200 ms, randomly intermixed), and followed by either six pattern masks (500 ms), or a blank screen prolonging the effective exposure duration by a visual afterimage. Stimuli were shown at six locations on the circumference of an imaginary circle with a radius of 4.6° of visual angle centered on fixation (given this radius and the size of the words and letters used, crowding effects between stimuli are minimal; see Kyllingsbæk et al., 2007). Again, the instruction was to make unspeeded reports of the items which the subject was "fairly certain" of having seen, and responses were recorded by the experimenter. The first and second blocks for each stimulus type were preceded by 36 and 12 practice trials, respectively.

In the analysis, the raw scores (items correctly reported) for the different exposure durations were compared for the two stimulus conditions. Then, the performance of individual subjects was modeled by TVA (see section Mathematical Modeling for details) resulting in parameter estimates for speed of visual processing (*C*), threshold of conscious perception (*t*<sub>0</sub>), and capacity of VSTM (*K*), and attentional weights for each of the six stimulus positions. These weights were used to characterize any bias of attention toward the left or right visual hemifield by calculating a laterality index, *w*<sub>index</sub>, given as the ratio between the sum of the three weights in the left visual hemifield and the sum of all six attentional weights. This index ranges from zero (absolute right-sided bias) to one (absolute left-sided bias) with 0.5 indicating perfectly unbiased attentional weighting between the hemifields. An additional parameter was included to estimate the sensory decay in the unmasked trials; see Bundesen (1990). The mean estimates of *C*, *t*<sub>0</sub>, *K*, and *w*<sub>index</sub> across subjects were compared for the letter and word conditions using paired-samples *t*-tests (see **Table 1**).
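The laterality index defined above is a simple ratio, sketched here in Python. The ordering of the six positions and the choice of which indices lie in the left hemifield are assumptions made for the example, as are the weight values.

```python
def laterality_index(weights, left_positions):
    """Sum of left-hemifield weights divided by the sum of all weights.

    Returns 0.5 for unbiased weighting, values near 1 for a left-sided bias
    and values near 0 for a right-sided bias.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attentional weights must sum to a positive value")
    return sum(weights[i] for i in left_positions) / total

# Hypothetical weights for six positions; indices 0-2 are assumed to be on the left.
weights = [1.0, 1.2, 0.9, 0.8, 1.1, 1.0]
w_index = laterality_index(weights, left_positions=(0, 1, 2))
print(f"w_index = {w_index:.3f}")
```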

# **RESULTS**

A summary of performance in the word and letter conditions in each experiment can be found in **Table 1**.

#### **EXPERIMENT 1. STIMULUS FAMILIARISATION**

Mean RTs (SDs) were significantly longer for single letters, *M*<sub>LetterRT</sub> = 476 ms (37), than for words, *M*<sub>WordRT</sub> = 441 ms (45); see **Table 1** for statistics. This difference was significant in 15/21 individual subjects. To be certain this was not attributable to the fact that there were more trials in the word condition, we also made this comparison with only the first 50 word trials. The RT advantage for naming words was slightly smaller when looking only at the first 50 word trials, *M*<sub>50WordRT</sub> = 447 ms (48), but the difference was still highly significant, *t*(20) = 3.75, *p* = 0.001.

#### **EXPERIMENT 2. SINGLE ITEM REPORT**

**Figure 2**, left panel, displays the raw scores (mean proportion of correct reports) for the two stimulus conditions at each exposure duration. Overall, words were identified significantly better than letters at all exposures from 19 to 37 ms. Participants were generally better at identifying words than letters, and significantly so in all conditions where floor effects (performance at exposures below the perceptual threshold) or ceiling effects (where performance was close to 100% for both stimulus types) were not present. This difference was further qualified by the TVA-based parameter estimates.


*Units for individual parameters: Reaction time (ms), t<sub>0</sub> (ms), C (items/s), K (items), and w<sub>index</sub> (unitless).*

Number of items correctly reported for words/letters at the different exposure durations. \*\**p* < 0.01, \*\*\**p* < 0.001.

A comparison of the mean TVA-estimates (across subjects) of *t*<sub>0</sub> and *C* in the two conditions (see **Table 1**) revealed that the mean *t*<sub>0</sub> values for letters (14.2 ms) and words (11.8 ms) were not significantly different. In contrast, the perceptual processing speed, the *C*-value, was significantly higher for words (114 items/s) than letters (68 items/s). This performance pattern is illustrated for a single, representative subject in **Figure 1**, left panel.

#### **EXPERIMENT 3. WHOLE REPORT**

A comparison of the raw scores (items correctly reported, see **Figure 2**, right panel) showed that significantly more letters than words were reported at all exposure durations except for the shortest (30 ms), where performance in both conditions was close to zero. Indeed, the TVA-based modeling revealed that *t*<sub>0</sub> was above 30 ms for both stimulus types, and not significantly different between letters and words (see **Table 1**). In contrast, processing speed (*C*) was significantly higher for letters (33.0 items/s) than words (14.4 items/s) in this experiment. In addition, the analysis revealed that significantly more letters (3.9 letters) than words (2.5 words) were retained in VSTM (*K*). See **Figure 1**, right panel, for an illustration of a single subject's performance and parameter estimates for the whole report of letters.

#### **GENERAL DISCUSSION**

We investigated normal performance with single letters and short simple words in three experiments, aiming to explore the extents and limits of the WSE. In a naming task, we found that mean RTs were significantly shorter for words than letters. In our second experiment, single item report, we replicated the classical effect that words are identified better than letters with brief, masked presentation. Testing a range of stimulus durations, we found significantly better performance with single words than single letters at several exposures between the perceptual threshold and ceiling performance.

In Experiments 2 and 3, we have adopted a novel approach to the investigation of the WSE by taking advantage of the TVA framework (Bundesen, 1990). This provides us with a more detailed picture of the factors underlying this effect, as we can derive several measures from one and the same task, and thus disentangle the contribution of e.g., perceptual processing speed and the threshold for perception. The combination of single item and whole report experiments further enables us to map out the perceptual process from the beginning of encoding the first word or letter, to the level where multiple word or letter representations are encoded in VSTM.

TVA-based modeling of data from Experiment 2 revealed that single words are processed significantly faster than letters, whereas the perceptual threshold did not differ between the two types of stimuli. In the third experiment, a classic whole report with multiple stimuli, a different pattern of performance emerged: Processing speed was faster for letters than words. Also, the capacity of VSTM, *K*, was significantly higher for letters than words.

#### **EXTENTS AND LIMITS OF THE WORD SUPERIORITY EFFECT**

Our findings indicate that the WSE is more general than previously reported. When presented in isolation, at the center of the visual field, single words are identified better than single letters at all exposure durations between the perceptual threshold and ceiling performance. The effect is also apparent in simple vocal reaction times to unmasked stimuli, perhaps indicating that words enjoy "superiority" not only at perceptual levels of processing. However, although single words are perceived and reported better and faster than single letters, words do not enjoy the same advantage when multiple stimuli are presented simultaneously. In such cases, single letters are processed faster than words, and, in addition, more single letters than whole words can be encoded into VSTM. Also, there is a general decrease in processing speed for both stimulus types from the single item to the whole report experiment. It is well-known that both errors and RTs increase with eccentricity (Eriksen and Schultz, 1977; Carrasco et al., 1995), and thus this speed dependence on eccentricity is not unexpected.

The WSE has typically been reported in experiments using brief, masked displays of single stimuli (e.g., McClelland and Johnston, 1977), and forced choice responses. The type of masking or degradation required to evoke this effect has been widely debated (Johnston, 1981; Prinzmetal, 1992; Jordan and Bevan, 1994), and most studies have used the one exposure duration where subjects perform about 75% correctly (Pollatsek and Rayner, 2005). Our results suggest that this may not be necessary, as the WSE, at least when measured with a report task rather than forced choice, is evident over a range of exposure durations. Using a two-alternative forced-choice paradigm comparing performance with postmasked words and single letters, Jordan and de Bruijn (1993; see also Jordan and Bevan, 1994) found that word superiority persisted only when the same size masks were used for both words and letters but disappeared when the width of the masks was adjusted to the actual width of the individual stimuli. This latter approach, however, may inadvertently have resulted in a letter benefit as certain letters could easily be excluded just by the size of their masks. Hence, we used similar masks for both letters and words. Even if the WSE we observed in Experiment 2 could potentially be explained by the mask we used, this does not necessarily make the effect less interesting. Also, mask attributes cannot explain why the effect is reversed in Experiment 3, where the same stimuli and masks were employed.

In addition, the results of Experiment 1 indicate that words are processed more efficiently than single letters even when they are unmasked. Cattell (1886) was the first to record such a word superiority in vocal naming times, but the phenomenon has not been studied to any large degree, although it does, in our opinion, deserve further investigation. For instance, it is possible that some of the word advantage in RTs may have its roots on other levels of processing than in visual perception, and may perhaps be related to the ease of phonological retrieval. The relative speed of lexical and sublexical processing has been investigated within the framework of the Dual Route Cascaded model of reading (Coltheart et al., 2001). Sublexical processing (letter-sound translation processes) is slower than lexical (whole word) processing, and this may be related to the RT difference we observe between single letters and words. It may also be the case, however, that the advantage in visual processing speed observed for words compared to letters in Experiment 2 contributes to the overall difference in RT, and this would be interesting to investigate further.

One question that remains is why words, when they are so effectively processed alone, do not enjoy the same advantage when multiple stimuli are presented simultaneously. Why can our subjects not encode as many words into their VSTM as they can letters? First, this argues against the notion that words are processed as units, or at least as units encodable in VSTM. Similar to the 7 ± 2 rule for verbal short term memory (Miller, 1956), VSTM is known to have a capacity of about four items (Sperling, 1960). This is qualified by the finding that capacity decreases as objects become more complex (Alvarez and Cavanagh, 2004), which could perhaps explain our finding, as words are obviously more visually complex than single letters. On the other hand, some studies indicate that VSTM capacity is larger for objects of expertise than unfamiliar objects (Curby et al., 2009). Being fluent readers, our subjects are indeed experts in word identification, and in that light the limit of their VSTM capacity for words seems surprisingly low. Another possible explanation of the reversed effect in the whole report experiment is that stimuli were presented outside the central visual field (at 4.6° from fixation). Although previous work indicates that there is little crowding between stimuli at this eccentricity (Kyllingsbæk et al., 2007), "within stimulus crowding" (i.e., lateral masking) may have affected the processing of words in this condition. Jordan and Patching (2004) have shown that the word-letter phenomenon can be reversed when stimuli are presented in lateralized displays, which resembles the effect we find in Experiment 3. They suggest that while crowding effects (or effects of lateral masking) are counteracted by strong lexical activations when words are presented foveally, such top–down effects do not prevent crowding in lateralized displays. This presents a challenge for our ability to measure the capacity of VSTM for word stimuli, however, as it will be difficult to avoid both within and between stimulus crowding in the same paradigm, while keeping stimuli in the central visual field.

It is worth noticing, however, that if we count the number of letters encoded in the word condition in Experiment 3, we do see a WSE: While our subjects could only encode a mean of 2.5 (i.e., 2 or 3) three-letter words at the longest exposure durations, this of course translates to them having encoded between six and nine *letters*. This is clearly superior to their performance in the letter condition, where the mean capacity was about four letters. Thus, the WSE may be said to be present also in the whole report condition, but not to the same extent as in the single item task.
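As a rough check on this comparison, and assuming (as the paragraph above does) that every letter of an encoded three-letter word counts as encoded, the mean VSTM estimates from Experiment 3 translate to:

$$K_{\text{words}} \times 3 = 2.5 \times 3 = 7.5 \text{ letters} \quad > \quad K_{\text{letters}} = 3.9 \text{ letters}$$

This is only arithmetic on the reported group means; it does not by itself show that the constituent letters of each word were individually represented in VSTM.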

#### **FUTURE DIRECTIONS**

We have previously used methods based on TVA to investigate visual processing in the disorder of pure alexia, where word reading is disrupted by brain injury, typically affecting the visual word form area and surrounding structures (Starrfelt et al., 2009, 2010). We have shown that this seemingly selective reading disorder is characterized by reduced central processing speed not only for letters but also for digits, and reduced VSTM capacity for both types of stimuli. An interesting extension of the current work would be to compare pure alexia patients' performance with words and letters using similar paradigms. The reading deficit in pure alexia affects both word and letter identification, and yet a WSE (words vs. non-words) has been reported in some patients with this disorder (Behrmann et al., 1998). Indeed, in the same patients where we observed reduced central processing speed and VSTM capacity for unrelated letters and digits, we also found better report of letters from words compared with non-words (Starrfelt et al., 2013), although the WSE was generally smaller in patients than in controls. The word-letter experiments presented in the current paper seem fit to characterize the relationship between letter and word processing in pure alexia further. Pure alexia is thought to be a deficit in parallel processing of letters, resulting in a compensating strategy of serial letter identifications (and thus a large effect of word length on reading times). If this is the case, we should expect patients to show the opposite pattern of performance in our single stimulus word-letter experiments compared to normal subjects: they should be slower in naming words than letters, and show reduced processing speed for words compared with letters. Indeed, if pure alexia truly abolishes parallel letter processing, one would expect their threshold for identifying three-letter words to be three times as high as for single letters.

#### **CONCLUSION**

We have shown that the WSE, at least for simple short words, can be revealed in vocal reaction times, and that part of this superiority is probably caused by increased visual processing speed for words compared to letters. This fits neatly with previous observations of the WSE, and the interpretation that top–down connections may enhance processing of letters in words, while single letter processing may rely more on bottom–up signals. A novel finding is that the WSE is significant at a range of exposure durations, which means that at least in our paradigm, the meticulous search for a given performance level is not necessary to reveal the effect. Rather, words seem to be processed better or faster than letters from the threshold of perception. When several stimuli are presented simultaneously, we find the opposite result: letters are processed faster than words, and more letters than words can be encoded in VSTM. This indicates that words are not treated as units in VSTM.

#### **ACKNOWLEDGMENTS**

This work was supported by a grant to the Center for Visual Cognition from the University of Copenhagen's Centre of Excellence Program and from the Danish Research Council for Independent Research to R. Starrfelt. We are grateful to the Danish Lexicographic Society for creating a list of orthographic neighborhood size (N-size) for Danish words. The first author is indebted to CG Fakutsi for keeping the words together, and to an anonymous computer technician in Majori, Italy, for rescuing the data from these experiments. Thanks to Felicia Kettelz and Mark Ruby for testing and data coding.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2013; accepted: 12 August 2013; published online: 04 September 2013.*

*Citation: Starrfelt R, Petersen A and Vangkilde S (2013) Don't words come easy? A psychophysical exploration of word superiority. Front. Hum. Neurosci. 7:519. doi: 10.3389/fnhum.2013.00519*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Starrfelt, Petersen and Vangkilde. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **APPENDIX**

#### **WORD STIMULI**

All words are high frequency Danish words, with high neighborhood-size. At least two neighbor words were included in the list for all stimuli, thus making it necessary to process at least two, and for most words all three letters in the word to identify it correctly.


*<sup>a</sup>Bergenholtz (1992).*

*<sup>b</sup>Number of words in the Danish dictionary (www.ordnet.dk/ddo) differing from the target by only one letter. Values kindly calculated by the Danish Lexicographic Society.*

# Resolving the orthographic ambiguity during visual word recognition in Arabic: an event-related potential investigation

# *Haitham Taha<sup>1,2,3</sup> and Asaid Khateb<sup>1,2</sup>\**

*<sup>1</sup> The Unit for the study of Arabic language, Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, Faculty of Education, University of Haifa, Haifa, Israel*

*<sup>2</sup> Department of Learning Disabilities, Faculty of Education, University of Haifa, Haifa, Israel*

*<sup>3</sup> The Cognitive Laboratory for Learning and Reading Research, Sakhnin College for Teachers' Education, Sakhnin, Israel*

#### *Edited by:*

*Urs Maurer, University of Zurich, Switzerland*

#### *Reviewed by:*

*Thomas Koenig, University Hospital of Psychiatry, Switzerland*

*Marina Laganaro, University of Geneva, Switzerland*

#### *\*Correspondence:*

*Asaid Khateb, The Unit for the study of Arabic language, Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, Faculty of Education, University of Haifa, Mount Carmel, Haifa 31905, Israel e-mail: akhateb@edu.haifa.ac.il*

The Arabic alphabetical orthographic system has various unique features that include the existence of emphatic phonemic letters. These represent several pairs of letters that share a phonological similarity and use the same parts of the articulation system. The phonological and articulatory similarities between these letters lead to spelling errors where the subject tends to produce a pseudohomophone (PHw) instead of the correct word. Here, we investigated whether or not the unique orthographic features of written Arabic words modulate early orthographic processes. For this purpose, we analyzed event-related potentials (ERPs) collected from adult skilled readers during an orthographic decision task on real words and their corresponding PHw. The subjects' reaction times (RTs) were faster for words than for PHw. ERP analysis revealed significant response differences between words and PHw starting during the N170 and extending to the P2 component, with no difference during processing steps devoted to phonological and lexico-semantic processing. Amplitude and latency differences were also found during the P6 component, which peaked earlier for words and where source localization indicated the involvement of the classical left language areas. Our findings replicate some of the previous findings on PHw processing and extend them to involve early orthographic processes.

#### **Keywords: Arabic orthography, pseudohomophones, orthographic decision, N170 component, P2 component, P600 component, source localization**

# **INTRODUCTION**

Visual word recognition in alphabetic orthographies is thought of as a multi-sequential process in which different sub-processes occur within determined time windows (McClelland and Rumelhart, 1981; Rumelhart and McClelland, 1982; Bentin et al., 1999; Dehaene et al., 2005; Martin et al., 2006). These time windows represent different stages of information processing, which in written words correspond to the orthographic, phonological, and lexico-semantic ones (Salmelin et al., 1996; Pammer et al., 2004). Different theoretical models have tried to describe the sequence of occurrence of such cognitive processes during visual word recognition. In the dual route models, where word familiarity and frequency play a major role (Coltheart, 2005), reading non-familiar words is supposed to rely on phonological decoding strategies (the non-lexical route), while reading familiar ones passes through orthographic knowledge only (the lexical route). Accordingly, semantic access may occur either after phonological decoding takes place or directly through orthographic-lexical access. Contrary to the dual route model, other models postulate the existence of one single route which allows direct access from orthography to phonology and then to semantics (Plaut et al., 1996; Seidenberg et al., 1996). Yet, the different models agree with the notion that the process of visual word recognition begins with processing the orthographic features of the written words. In this regard, it is important to mention that the extent to which the orthography of a given system reflects the phonology of its written words might vary between languages, depending on the regularity of the so-called "grapheme-to-phoneme consistency" (e.g., Van Orden et al., 1990; Lukatela and Turvey, 1994a,b; Frost, 1998). The degree of such consistency was found to determine the use of the different routes during word recognition, a notion known as the "orthographic depth hypothesis" (Frost, 2005). 
Indeed, Frost et al. (1987) have observed that the lexical route has little impact in highly transparent orthographies like the Serbo-Croatian one, in contrast to English, where the lexical route is the dominant route for word recognition (Frost, 1998).

The question of the time of occurrence of the various stages of processing (and the routes involved) in visual word recognition in the different orthographies has often been investigated using event related brain potentials (ERPs) during various reading paradigms (see for instance Bentin et al., 1999; Kaan and Swaab, 2003; Simon et al., 2004, 2007; Maurer et al., 2005b, 2008; Grainger et al., 2006; Holcomb and Grainger, 2006; Braun et al., 2009; Briesemeister et al., 2009). One of the tasks used in this context was the lexical decision task (LDT) with pseudohomophone words (hereafter PHw, McCann and Besner, 1987; Seidenberg et al., 1996; Braun et al., 2009). PHw are pseudowords that differ from real words in their orthography but have the same phonology (e.g., *Brane* for the word *Brain* in English). In such tasks, a PHw effect is usually found in terms of longer response time and higher error rate for PHw compared to real words. Using PHw in LDTs, and specifically because of the phonological similarities between the stimuli, enforces the analysis of the orthographic features of the word in order to make the decision (see Ferrand and Grainger, 1992). In ERP studies of visual word recognition, several studies in various alphabetic orthographies have linked different early components with the hypothesized stages of word processing (Bentin et al., 1999; Kaan and Swaab, 2003; Grainger et al., 2006; Holcomb and Grainger, 2006; Simon et al., 2006, 2007; Maurer et al., 2008; Braun et al., 2009; Briesemeister et al., 2009). Most consistently, the N170 component, a negative occipito-temporal response at ∼170 ms (Simon et al., 2004, 2006; Maurer et al., 2005a; Bar-Kochva, 2011; Horie et al., 2011; Taha et al., 2013) was linked with the orthographic stage in word recognition. For instance, it was found that stimulus repetition and familiarity modulate the N170 component (Simon et al., 2007). Also, Maurer et al. 
(2005b) found that the orthographic expertise effects appear ∼170 ms after stimulus onset. In Arabic, we have recently shown that the words' internal orthographic connectivity modulated both the amplitude and latency of the N170 (Taha et al., 2013). In studies using PHw and real words, ERP differences were also reported during the early components (Newman and Connolly, 2004; Yeung et al., 2004; Grainger et al., 2006; Braun et al., 2009). For example, in a recent study Comesana et al. (2012) analyzed ERPs to examine the role of phonological and orthographic overlap in the recognition of cognate and non-cognate words, conditions that mimicked to some extent the effects of PHw in English and Portuguese. The authors indicated that the differences observed around the P2 component indexed an initial discrimination of the stimuli on the basis of their physical properties. A similar interpretation was supported by other results with logographic orthographies (Kong et al., 2012). However, in another recent ERP study it was found that responses evoked by PHw differed significantly from those evoked by the real words (taksi vs. taxi) already around 160 ms following stimulus presentation (Braun et al., 2009). This difference, which occurred around the time period of the N170 component, was explained as expressing an early phonological processing step and not an orthographic one. This interpretation appears to contradict many other ERP studies which support the notion that this early time window is more related to orthographic processing (see above), as well as other studies that suggest that phonological processing occurs later in time. Indeed, it has been proposed that the phonological stage in visual word recognition is reflected in the N320 component, which is measured in mid temporal regions at ∼320 ms (Bentin et al., 1999; Simon et al., 2004; Khateb et al., 2007b). 
This component was found to be modulated by orthographic transparency of the writing system, suggesting that it reflects the sublexical mapping between orthography and phonology (Simon et al., 2006). Lexical access, which is thought to occur at a later stage after the orthographic and the phonological ones, has been suggested to involve later components such as the N400, which has repeatedly been linked to lexical-semantic processing (Halgren et al., 2002; Simon et al., 2004; Khateb et al., 2010; Kutas and Federmeier, 2011). Since PHw have the same phonology and semantics as their base words, no differences were observed between these conditions around this component (Braun et al., 2009; Briesemeister et al., 2009). Thus, Braun et al. (2009) found differences around the N400 between words and non-words but not between words and their PHw. In contrast, it has been reported that PHw modulated the P600 component (Vissers et al., 2006), a brain response which has frequently been associated with orthographic error detection and other anomalies. The modulation of the P600 by PHw was interpreted as reflecting a monitoring process that takes place during language perception when the cognitive system is in a state of indecision. Support for this notion was recently found in a study on Chinese, a non-alphabetic orthography, where a modulation of the late positive component (600–1000 ms) was reported during orthographic decision and semantic tasks (Kuo et al., 2012). Other researchers suggested that this component is modulated by stimulus familiarity and represents the search for these stimuli in memory, such as with infrequent words (Allan et al., 1999), pseudowords or irregular words (Osterhout and Hagoort, 1998; Shaul, 2011), and with words that are syntactically inappropriate (Osterhout and Holcomb, 1992; Kaan and Swaab, 2003; Van Herten et al., 2005).

Because word recognition in Arabic has scarcely been investigated using physiological measures, our objective here was to investigate the time course of word recognition in Arabic with a special emphasis on orthographic processing steps, using an orthographic decision task with real words and PHw. Given that Arabic is an alphabetic language, we expected to replicate previous findings about PHw effects reported in other orthographies. Also, assuming that Arabic has many unique orthographic features, we expected a particular modulation of the early ERP components that are thought to reflect the orthographic stages of word processing. Indeed, the Arabic language has a very particular alphabetic orthographic writing system consisting of 29 letters of which three are long vowels. Short vowels are not considered as part of the alphabet and are represented by diacritical marks added above or below the letters (see Taha, 2013). Most Arabic letters have more than one written form, depending on the letter's position within the written word (at the beginning, middle, or end of the word) and on the letter's connectedness with preceding and subsequent letters (see Taha et al., 2013). In addition, different letters may have the same essential shape and differ only by the presence (or not) of one or more dots, or by the location of the dots above or below the letter. In order to provide the full phonological information in written Arabic words, these have to be vowelized by diacritical marks (representing short vowels) added above and below the letters within the word. In the case of vowelized written words, the written patterns are considered a shallow orthography, while in the case of nonvowelized written words, the orthography is considered a deep one. 
In this latter case, which usually appears in texts dedicated to adult readers (Abu-Rabia, 2001), the phonology is not entirely reflected through the orthography and the reader must rely on context cues to read correctly. Most particularly for our purpose and the task used here, the Arabic phonological system includes a group of phonemes referred to as the "emphatic phonemes." An emphatic phoneme is one that shares a phonological similarity with another phoneme in Arabic and uses the same articulation parts of the articulatory system, but is represented by a different grapheme (for example: the letter represents an emphatic phoneme, and its similar is the letter = d, but the = d itself is not an emphatic one). In Arabic vernaculars (i.e., spoken Arabic dialects), some of these emphatic phonemes are absent from the specific phonological system of certain dialects (for example the emphatic does not exist within some spoken vernaculars). As a main result of the phonological similarity between an emphatic phoneme and its similar (although non-emphatic) phoneme, there are difficulties in spelling words that include one or more emphatic phonemes. Such difficulties appear as inaccuracy in spelling, in the form of phonologically plausible orthographic errors where the subject writes down a PHw instead of the correct orthographic pattern of the word (e.g., instead of , like the word "kat" instead of "cat" in English). This means that, when making these errors, the subject relies on simple phoneme-to-grapheme mapping and not on the specific orthographic knowledge stored in long term memory about such words. Therefore, writing down words that contain those emphatics requires a specific familiarity with the word's orthographic pattern and demands additional cognitive and memory resources. 
In this regard, it was suggested that difficulty in discriminating between emphatic phonemes and their similar non-emphatic ones, together with the lack of sufficient orthographic knowledge, is the main reason for producing the abovementioned phonologically plausible orthographic errors during spelling in Arabic (see Abu-Rabia and Taha, 2004, 2006; Taha, 2013). The orthographic decision task used here with real words and PHw, while neutralizing phonological and semantic effects, aimed at replicating previous findings about the PHw effects and at tracking visual orthographic processes more specifically. Indeed, given that at the phonological level the real words and PHw are identical and at the semantic level they activate the same meanings, we predicted that the discrimination between words and their corresponding PHw would be a relatively difficult task that relies primarily on a careful orthographic analysis. In addition, such a discrimination process, to be efficient, should recruit additional cognitive resources to allow retrieval of orthographic knowledge from long term memory. Therefore, differences in the ERP between real words and PHw were hypothesized to occur during early and late stages of stimulus processing, but not during time periods necessarily devoted to phonological and lexico-semantic processing. At the behavioral level, we expected faster RTs to correctly written words than to PHw.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Eighteen right-handed native Arab students (15 females and 3 males) were recruited from the University of Haifa to participate in this study, which used an orthographic decision task during EEG recordings. Their age ranged from 19 to 34 years (mean = 23.4, *SD* = 3.8). All participants had normal reading development and no attention difficulties or other sensory, emotional or neurological disorders. All had normal or corrected-to-normal vision, gave their informed consent prior to inclusion in the study, and were paid for their participation (35 ILS/h).

# **STIMULI AND PROCEDURE**

The stimulus list was composed of 80 real words and their 80 corresponding PHw. The words consisted of 40 concrete literary Arabic nouns and 40 verbs varying from middle to high lexical frequency (mean frequency = 3.65 ± 0.88 on a scale from 1 to 5, as judged by 23 raters). The selected words were between 3 and 6 letters long (mean 4.31 ± 0.79) with an average number of syllables of 2.52 ± 0.82 (range between 2 and 4 syllables). For the purpose of the study, 80 corresponding pseudohomophonic words (PHw) were created. Half of the PHw were produced by replacing a letter in the first syllable and the other half by replacing a letter in the last syllable of a real word while keeping the phonology of the word identical [see the following examples: (i) the word modified into and (ii) the word modified into ]. Taken together, words and PHw totaled 160 stimuli, which were pseudorandomly mixed and divided into two experimental blocks each containing 80 stimuli.

Participants were seated comfortably in front of a computer screen, at a distance of approximately 90 cm, and performed a speeded orthographic decision task. Since the main objective of the study was to characterize brain responses involved in the orthographic analysis of words in Arabic, this task appeared more suitable than a standard LDT, in which PHw have to be rejected as non-words while at the same time activating the same phonological and semantic processes. Hence, non-words were not used here, and in the present task the subjects only had to respond whether or not the presented stimulus was written correctly, without implying other phonological and lexico-semantic analysis. Each stimulus was presented for 700 ms at the center of the screen in white over a gray background. After each stimulus, participants were asked to decide as quickly and accurately as possible using two keyboard keys. The response window was 1550 ms. The stimuli were written with "Traditional Arabic Fonts" with a point size of 45 using the E-Prime v.II software (Psychology Software Tools, Inc., PA, USA; www.pstnet.com).

# **EEG RECORDINGS AND ANALYSIS**

Experiments were carried out in an isolated, sound attenuated room. Electroencephalographic (EEG) recordings were collected continuously using a 64 channel BioSemi Active Two system (www.biosemi.com) and the ActiveView recording software (2009). Pin-type electrodes were mounted on a customized Biosemi head-cap, using an electrode gel, and arranged according to the 10–20 international system. Two flat electrodes were placed on the sides of the eyes in order to monitor horizontal eye movements. A third flat electrode was placed underneath the left eye in order to monitor vertical eye movements and blinks. The EEG signals were collected reference free (i.e., Biosemi active electrodes), with a 0.25 Hz high-pass filter, amplified and digitized with a 24 bit AD converter at a 2048 Hz sampling rate.

ERP epochs were averaged and analyzed offline using the Brain Vision Analyzer software (Brain Products). The EEG data were first filtered (low-pass filter: 30 Hz; high-pass filter: 1 Hz), then ocular artifacts were corrected using the Gratton et al. (1983) method, and the data were afterwards re-referenced to the common average of all electrodes. Epochs extended from 100 ms pre-stimulus to 900 ms post-stimulus and were extracted only for correct responses. Epochs containing artifacts, defined as amplitudes greater than 50 µV or lower than −50 µV, were rejected. The resulting data were baseline-corrected for each subject using the 100 ms pre-stimulus interval and then down-sampled to 512 Hz.
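
The epoching, artifact rejection and baseline correction steps described above can be sketched for a single channel as follows. This is only an illustrative sketch in plain Python, not the Brain Vision Analyzer pipeline used in the study: the function names, the list-based signal representation and the assumption that stimulus onsets are given as sample indices are all hypothetical, and filtering, ocular correction and re-referencing are assumed to have been applied beforehand.

```python
from statistics import mean


def extract_epoch(signal, onset, sfreq, pre_ms=100, post_ms=900):
    """Cut one epoch around a stimulus onset (given as a sample index)."""
    n_pre = round(pre_ms * sfreq / 1000)
    n_post = round(post_ms * sfreq / 1000)
    if onset - n_pre < 0 or onset + n_post > len(signal):
        raise ValueError("epoch extends beyond the recording")
    return signal[onset - n_pre:onset + n_post], n_pre


def is_artifact(epoch, limit_uv=50.0):
    """Flag an epoch if any sample exceeds +limit_uv or falls below -limit_uv."""
    return any(abs(value) > limit_uv for value in epoch)


def baseline_correct(epoch, n_pre):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    baseline = mean(epoch[:n_pre])
    return [value - baseline for value in epoch]


def average_erp(signal, onsets, sfreq):
    """Average all artifact-free, baseline-corrected epochs of one channel."""
    kept = []
    for onset in onsets:
        epoch, n_pre = extract_epoch(signal, onset, sfreq)
        if is_artifact(epoch):
            continue  # rejected: amplitude outside the +/-50 microvolt range
        kept.append(baseline_correct(epoch, n_pre))
    if not kept:
        raise ValueError("no artifact-free epochs")
    n_samples = len(kept[0])
    erp = [mean(epoch[i] for epoch in kept) for i in range(n_samples)]
    return erp, len(kept)
```

In this sketch only the onsets of correctly answered trials would be passed to `average_erp`, mirroring the restriction to correct responses.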

#### **ERP WAVESHAPE ANALYSIS**

In order to characterize response differences between words and PHw, we conducted two analyses. First, we performed a global analysis using point-wise *t*-tests on the individual ERPs of the two conditions using Cartool software© (v.3.43; https://sites.google.com/site/fbmlab/cartool). This aimed at determining the time periods and scalp locations exhibiting differences between words and PHw. Hence it was performed over all time frames (stimulus onset to 700 ms, i.e., 358 time points) and all recording sites. Time periods that exhibited significant *t*-values (at *p* < 0.05) during at least 5 consecutive time frames (∼10 ms) and involved at least three adjacent electrodes were considered significant. In the second analysis, on the basis of previous findings with PHw (see Introduction) and of our results on Arabic orthography (Taha et al., 2013), we compared the amplitude and latency of the N170, P2, and P6 components between conditions. In the analyses presented hereafter, we computed the mean signal for the N170 component in each subject and condition in the time period between 170 and 190 ms from the three left posterior (P7, PO7, and O1) and three right posterior (P8, PO8, and O2) electrodes which exhibited the maximum negativity (at PO7) for this component. We then computed the mean amplitude for the P2 component in the time period between 250 and 280 ms from the same electrodes, since these were again the ones that exhibited the maximum positivity for the component (at PO8, see **Figure 2** below). Finally, in view of previous findings regarding the P6 component (see Introduction), the analysis of the late responses was performed around the peak of the P6 component. For this purpose, we computed the mean amplitude from four left central and centroparietal electrodes (C1, C3, CP3, and CP1) and four right ones (C2, C4, CP4, and CP2) during the time period 450–600 ms. 
For the analyses of the components' latency, we determined in each subject the N170 latency as the most negative time point between 120 and 200 ms, and the P2 latency as the most positive time point immediately following it, from the same subset of electrodes (as for the amplitude, see above). For the P600, we first computed in each subject the average of 15 central, centro-parietal and parietal electrodes around Cz, CPz, and Pz which showed the highest P6 amplitude. The resulting individual "P6" waves were then lowpass filtered at 5 Hz [to avoid the selection of spurious peaks, as in Moreno and Kutas (2005) and Khateb et al. (2010) for the N400 component], and from these we determined the latency of the most positive peak occurring after 450 ms. Statistical analyses were then conducted on these measures using repeated measures ANOVAs with word condition (word vs. PHw), hemisphere and electrode as within subject factors.
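
The temporal part of the point-wise criterion can be illustrated with a short sketch. It is not the Cartool implementation: the function names and data layout are hypothetical, the critical *t* value is passed in by the caller (for 15 subjects, i.e., *df* = 14, the two-tailed critical value at *p* < 0.05 is about 2.145), and the additional requirement of at least three adjacent electrodes is deliberately left out because it needs an electrode neighborhood map. At 512 Hz, the 0 to 700 ms range corresponds to 358 time frames and 5 frames to roughly 10 ms.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for two equally long lists of per-subject values."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        # Degenerate case: identical differences in every subject.
        return 0.0 if mean(diffs) == 0 else math.copysign(math.inf, mean(diffs))
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def significant_runs(cond_a, cond_b, t_crit, min_frames=5):
    """Return (first, last) frame indices of runs of significant time frames.

    cond_a[s][t] and cond_b[s][t] hold the amplitude of subject s at time
    frame t on one electrode. A run is kept only if |t| exceeds t_crit on at
    least `min_frames` consecutive frames.
    """
    n_frames = len(cond_a[0])
    flags = []
    for frame in range(n_frames):
        t_value = paired_t([s[frame] for s in cond_a],
                           [s[frame] for s in cond_b])
        flags.append(abs(t_value) > t_crit)

    runs, start = [], None
    for frame, flag in enumerate(flags + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = frame
        elif not flag and start is not None:
            if frame - start >= min_frames:
                runs.append((start, frame - 1))
            start = None
    return runs
```

Requiring several consecutive frames is a simple guard against isolated false positives across the many tests performed; it is a heuristic rather than a formal correction for multiple comparisons.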

# **SOURCE LOCALIZATION ANALYSIS**

This analysis aimed at estimating the location of the sources in the brain whose activity differentiated the two conditions. Here, we applied LAURA (Grave De Peralta Menendez et al., 2001), a distributed linear inverse solution, to estimate the brain regions underlying the ERP differences between conditions. This technique, like other distributed inverse solution algorithms, deals with an *a priori* unknown number and location of active sources and uses a real head shape model with 4024 solution points in the gray matter. This technique has now been used in a large variety of cognitive paradigms including language tasks (see Ducommun et al., 2002; Ortigue et al., 2004; Blanke et al., 2005; Thierry et al., 2006; Khateb et al., 2007a,b, 2010; Taha et al., 2013). Here LAURA was applied to the topographic maps computed in each subject and condition from the mean signal of the periods of interest. The individual inverse solutions were first averaged to display the mean source localization over subjects and were then compared statistically using paired *t*-tests with the significance level fixed at *p* < 0.01. Source localizations were then reported using the Talairach and Tournoux (1988) *x, y, z* coordinates.

#### **BEHAVIORAL ANALYSIS**

The mean of the individual reaction times and the individual rate of correct responses were computed separately for the word and the PHw conditions. These values were compared statistically using paired *t*-tests.
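
As a minimal sketch of this behavioral analysis, the per-subject summaries and the paired comparison could be computed as below. The function names and the trial representation are hypothetical; the sketch assumes that mean RTs are computed over correct responses only and uses the 250 ms cutoff for anticipatory responses reported in the Results as a default. It returns the *t* statistic and degrees of freedom only, leaving the *p*-value to a statistical table or package.

```python
import math
from statistics import mean, stdev


def summarize(trials, min_rt_ms=250):
    """Return (mean RT in ms, percent correct) for one subject and condition.

    `trials` is a list of (rt_ms, correct) tuples. Mean RT is computed over
    correct responses at or above `min_rt_ms` (an assumption of this sketch).
    """
    accuracy = 100 * sum(1 for _, correct in trials if correct) / len(trials)
    rts = [rt for rt, correct in trials if correct and rt >= min_rt_ms]
    return mean(rts), accuracy


def paired_t_test(words, phw):
    """Paired t statistic and degrees of freedom for per-subject values."""
    diffs = [w - p for w, p in zip(words, phw)]
    t_value = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t_value, len(diffs) - 1
```

A negative *t* value here indicates smaller values (e.g., faster RTs) for words than for PHw.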

# **RESULTS**

# **BEHAVIORAL MEASURES**

The individual means of the RTs and the rate of correct responses were computed for each subject in each condition. Responses with RTs below 250 ms were excluded from the individual mean responses. The paired *t*-test comparing individual accuracy for words and PHw showed no significant difference (*p* = 0.55, mean = 86 ± 10.8% and 83 ± 19.4% respectively). The comparison of the RTs revealed significantly faster responses for words than for PHw (*t* = −3.0, *df* = 17, *p* < 0.009, mean = 729 ± 124 and 784 ± 147 ms respectively). The comparison between PHw in which a letter of the real word was changed in the first syllable and those in which it was changed in the last syllable (while keeping the phonology of the word) showed no significant difference in either accuracy or RTs. Similarly, no difference between these two types of PHw was observed when comparing the individual standard deviations of the RTs.

# **ELECTROPHYSIOLOGICAL ANALYSIS**

Due to technical problems during EEG recordings and to the presence of a high amount of artifacts in other subjects, three subjects were excluded and the analysis presented here concerns 15 subjects. As indicated in the Methods section, the first analysis using point-wise *t*-tests on all time frames and electrodes aimed at identifying time periods and locations where the electrophysiological signal differed between words and PHw. This analysis is presented in **Figure 1**, which depicts graphically in A the significant *p*-values (at *p* < 0.05) on all electrodes and over all time frames (for 10 ms consecutively) up to 700 ms post-stimulus. It shows that the earliest differences appeared around the time window of the N170 component. Significant differences then appeared at around 250 ms and again at around 500 ms, with the latter differences extending also after 750 ms. The upper row in **Figure 1B** displays the *t*-maps successively for the first period (N170), then immediately after for the second period (hereafter referred to as the P2, at 250 ms) and finally for the late period (hereafter the P6, between 450 and 600 ms). The lower row in **Figure 1B** shows the location of the electrodes with significant differences. These schematic maps indicate that: (i) the differences around the N170 concerned mainly posterior sites (with six adjacent electrodes, appearing a little more on the right) and frontal sites (again six adjacent electrodes, appearing a little more on the left), (ii) the P2 differences concerned again bilateral sites (although more dominantly on the left) and (iii) the P6 differences involved a high number of electrodes distributed mainly centro-parietally and bi-frontally. The later differences, appearing at around 750 ms onwards and being of lesser interest for our purpose, were not further analyzed here.

**FIGURE 1 | (A)** Graph depicting the statistical *p*-values (yellow-green for *p* < 0.05; dark red for *p* < 0.01 and below) of the exploratory point-wise *t*-tests analysis (see Methods) performed on all time points from 0 to 700 ms (i.e., 358 time frames, *x*-axis) and all 64 recording sites (*y*-axis) for the comparison words vs. PHw. This analysis showed that robust (most significant *p*-values) and consistent (longest in duration and involving many electrodes) differences appeared already in the N170 time window, then at ∼250 ms (i.e., P2 component) and then around the P600 component. **(B)** This panel illustrates *t*-maps (upper row) successively for the time periods at 160, 250, and 500 ms (see the same color scale for *t*-values). The lower row presents on schematic maps the location of the electrodes depicting significant differences during these time points, and shows that during the N170 there were six adjacent posterior and six anterior sites with significant *p*-values (lower left) and in the other periods there was also a high number of significant and adjacent electrodes.

**FIGURE 2 | Superimposition of grand-mean ERP (from −100 to 700 ms post-stimulus) induced by words (black traces) and PHw (red traces).** The selected electrodes represent antero-posteriorly distributed left and right electrodes where differences were maximally evident for the N170 and P2 components at electrodes PO7 and O1 (left) and PO8 and O2 (right). Note the succession of the P1 and the N170 components on P7. The differences during the P6 component are best illustrated on centro-parietal electrodes (see CP3). The inset in lower left depicts the location of the selected electrodes (see text for details).

**Figure 2** illustrates a superposition of the grand-mean ERP traces (from −100 to 700 ms) on a subset of frontal, parietal and postero-occipital electrodes that exhibited maximal response differences between words and PHw. This illustration indicates that the difference in amplitude for the N170 and the P2 involved particularly the left PO7 and O1 sites and the right PO8 and O2 ones. For the P6, the differences appeared on various centroparietal sites including here at CP3, P1, CP4, and P2. In the following analyses, we further quantified the difference between words and PHw by computing the mean amplitude (and the latency) for each of these components.
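
The two measures quantified below, mean amplitude within a time window and peak latency, can be sketched as follows. The function names are hypothetical and the sketch assumes a single averaged waveform stored as a list whose first sample lies 100 ms before stimulus onset; for the N170, for example, one would call `peak_latency` with the 120–200 ms window and negative polarity.

```python
def ms_to_index(t_ms, sfreq, baseline_ms=100):
    """Convert a post-stimulus time in ms to a sample index in the epoch."""
    return round((t_ms + baseline_ms) * sfreq / 1000)


def mean_amplitude(erp, start_ms, end_ms, sfreq, baseline_ms=100):
    """Mean signal between start_ms and end_ms (both inclusive)."""
    i0 = ms_to_index(start_ms, sfreq, baseline_ms)
    i1 = ms_to_index(end_ms, sfreq, baseline_ms)
    window = erp[i0:i1 + 1]
    return sum(window) / len(window)


def peak_latency(erp, start_ms, end_ms, sfreq, polarity, baseline_ms=100):
    """Latency in ms of the most negative or most positive sample in a window.

    `polarity` is "negative" (e.g., N170) or "positive" (e.g., P2, P6).
    """
    i0 = ms_to_index(start_ms, sfreq, baseline_ms)
    i1 = ms_to_index(end_ms, sfreq, baseline_ms)
    window = erp[i0:i1 + 1]
    pick = min if polarity == "negative" else max
    peak_index = i0 + window.index(pick(window))
    return peak_index * 1000 / sfreq - baseline_ms
```

Mean amplitude over a window is less sensitive to single-sample noise than a peak value, which is one reason both measures are commonly reported side by side.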

#### **THE N170 COMPONENT**

The 2 × 2 × 3 ANOVA performed on the N170 amplitude (computed between 170 and 190 ms) using word condition (Word vs. PHw), hemisphere (left vs. right) and electrode (3 sites: P7, PO7, and O1 on the left and their counterparts P8, PO8, and O2 on the right) as within subject factors showed significant main effects of condition [*F*(1, 14) = 5.39, *p* < 0.04], hemisphere [*F*(1, 14) = 7.61, *p* < 0.02], and electrode [*F*(2, 28) = 7.45, *p* < 0.003]. As illustrated in **Figure 3A**, the condition effect was due to higher N170 amplitude in PHw than in words. The hemisphere effect was due to a greater N170 negativity in left hemisphere electrodes. The electrode effect was due to varying N170 amplitude across the different electrodes, with PO7 showing the largest negativity on the left and PO8 on the right. The comparison of the N170 peak latency computed from these two electrodes, with the largest negativity, was performed with a 2 × 2 ANOVA using condition and hemisphere as within subject factors. No significant main effect of word condition was observed (mean = 160 ± 9 and 161 ± 10 ms for words and PHw, respectively), nor any interaction with hemisphere. Hemisphere itself showed a significant main effect [*F*(1, 14) = 5.36, *p* < 0.04], due to earlier latency in the right hemisphere.

#### **THE P2 COMPONENT**

A 2 × 2 × 3 ANOVA was performed on the P2 amplitude (computed between 250 and 280 ms) from the same electrodes as the N170, again using word condition (Word vs. PHw), hemisphere (left vs. right) and electrode (3 sites) as within-subject factors. This showed significant main effects of condition [*F*(1, 14) = 16.36, *p* < 0.002], hemisphere [*F*(1, 14) = 8.44, *p* < 0.02], and electrode [*F*(2, 28) = 5.43, *p* < 0.02]. No interaction was observed between condition and the other factors, but a significant interaction was found between hemisphere and electrode [*F*(2, 28) = 6.87, *p* < 0.004]. As shown in **Figure 3B**, the condition effect was due to a larger P2 amplitude in words (mean = 1.97 µV) than in PHw (mean = 1.06 µV). The hemisphere effect was due to a greater positivity in right than in left hemisphere electrodes. The electrode effect was due to varying P2 amplitude across the different electrodes, with PO8 showing the highest amplitude in the right. Here also, the peak latency of the P2 component was computed from PO8 and PO7. The 2 × 2 ANOVA performed on the P2 peak latency values using condition and hemisphere as factors showed no significant main effect of word condition (mean = 227 ± 8 and 225 ± 7 ms respectively for words and PHw), and no interaction was found between this factor and hemisphere.

(see text, black lines for words and red for PHw). **(C)** Traces of the averaged ERP signal from 15 centro-parietal electrodes to illustrate differences in amplitude and peak latency of the P6 in words (black) and PHw (red). The inset in lower right depicts the location of the selected electrodes for this illustration.

#### **THE P6 COMPONENT**

A 2 × 2 × 4 ANOVA was performed on the P6 amplitude, computed on the mean signal between 450 and 600 ms, using word condition (Word vs. PHw), hemisphere (left vs. right) and electrode (4 sites: C1, C3, CP3, and CP1 from the left and C2, C4, CP4, and CP2 from the right) as within-subject factors. This showed a significant main effect of condition [*F*(1, 14) = 21.53, *p* < 0.0005] due to higher P6 positivity in words (mean = 2.88 µV) than in PHw (mean = 2.04 µV). In addition, there was an electrode effect, but no interaction was observed between condition and the other factors. The electrode effect was mainly due to the fact that more centro-posterior electrodes showed globally higher amplitude than central ones. The latency of the P6, computed (as the most positive time point after 450 ms) from the average of 15 centro-parietal electrodes around Cz, showed that this component peaked significantly earlier in words than in PHw (see **Figure 3C**, *t* = −2.73, *df* = 14, *p* < 0.02, mean = 590 ± 70 and 657 ± 63 ms respectively).
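The latency comparison above is a paired (within-subject) *t*-test. The sketch below shows the underlying computation in plain Python, assuming one latency value per subject and condition. The function name `paired_t` and the four-subject example values are hypothetical and do not reproduce the study's data.

```python
import math


def paired_t(cond_a, cond_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Unbiased variance of the per-subject differences.
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_value = mean_d / math.sqrt(var_d / n)
    return t_value, n - 1


# Hypothetical per-subject peak latencies in ms (made-up values).
latency_words = [580, 600, 590, 610]
latency_phw = [650, 640, 660, 670]

t_value, df = paired_t(latency_words, latency_phw)
```

A negative *t* value here means that the first condition peaked earlier than the second. With 15 participants the test has 14 degrees of freedom, as in the result reported above. For a within-subject factor with only two levels tested on its own, the ANOVA statistic *F*(1, *n* − 1) equals the square of this *t* value.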

#### **SOURCE LOCALIZATION ANALYSIS**

In order to estimate which brain areas were at the origin of the ERP differences between words and PHw, we applied the LAURA linear inverse solution (Grave De Peralta Menedez et al., 2001) to the time periods of interest in each subject and condition and compared these solutions statistically (see **Figures 4A–C**). First, for the N170, the individual inverse solutions were computed from each subject's mean signal between 170 and 190 ms (as for the amplitude analysis), and these were then compared using paired *t*-tests to determine the differences between PHw and words. As shown in **Figure 4A**, which depicts the mean inverse solution for the N170, source maps indicated that, in both words and PHw, this period maximally involved the left temporo-occipital areas, in particular an extensive recruitment of the left inferior temporal gyrus and middle occipital gyrus (BA 19/37, Talairach *x, y, z* coordinates = −47, −58, 0 and −41, −70, −4) and, to a lesser extent, the left lingual gyrus (BA 17). Also, bilateral sources were found in the superior and middle temporal gyrus (BA 21/22, −59, −33, 3 and −59, −23, −2). The statistical comparison of PHw vs. words (since the N170 was larger in PHw; **Figure 4A**, right panel) showed more activity for PHw in the left post-central gyrus (BA 43, −47, −10, 18), left superior temporal gyrus (BA 22, −59, −34, 14), left cuneus (BA 18, −11, −80, 22) and the bilateral middle occipital gyrus (BA 19, −29, −80, 22), but also in the right middle frontal gyrus (BA 46, 46, 18, 23).

For the P2 and the P6 components, the individual inverse solutions were computed from each subject's mean signal between 250 and 280 ms and between 460 and 600 ms respectively. As illustrated in **Figure 4B**, the mean inverse solution for the P2 showed that, in both words and PHw, the maximal activity was found in the superior temporal gyrus bilaterally (BA 22, Talairach *x, y, z* coordinates = −58, −23, 3), together with weaker activity extending posteriorly into the bilateral middle temporal gyrus (BA 37/19, −53, −58, 0). The statistical comparison of words vs. PHw (since the P2 was larger in words) showed significantly more activity for words (**Figure 4B**, right panel, *p* < 0.01) in the right inferior parietal lobule (BA 40, 41, −44, 48) and in the left lingual gyrus (BA 19, −17, −64, −5). For the P6 period (**Figure 4C**), a bilateral pattern of activity with a left dominance was also observed, including the left superior frontal gyrus (BA 10, −35, 58, −1), the inferior frontal gyrus bilaterally (BA 47, −47, 29, 0), the superior temporal gyrus bilaterally (BA 22, −59, −17, 2), the bilateral inferior temporal gyrus (BA 19, −47, −58, 0) and the left lingual gyrus (BA 17, −17, −87, 1). The statistical comparison of words vs. PHw (since the P6 was larger in words) showed more activity for words only in the left hemisphere (see **Figure 4C**, right panel), including the inferior frontal gyrus (BA 46, −41, 35, 5), middle frontal gyrus (BA 11, −41, 46, −10), middle temporal gyrus (BA 20, −53, −35, −6) and inferior occipital gyrus (BA 19, −35, −69, 0).

# **DISCUSSION**

In this study, we sought to investigate the time course of word recognition in Arabic using ERP analysis, with a special emphasis on orthographic analysis processes. For this purpose, we used an orthographic decision task with real words and PHw. We expected to find word type effects in terms of RT and to observe ERP differences mainly during the early and late stages of processing. Our results replicated previous findings and showed a clear effect of stimulus type on RTs, which were significantly faster to words than to PHw. Also, our results replicate previous observations regarding the modulation of the late responses (P6 component) and extend them to emphasize the modulation of the early responses (N1-P2 components) involved in orthographic processing steps. The use of the PHw paradigm was motivated by the fact that their phonological similarity with the real words forces the reader to analyze the orthographic features of each stimulus in order to make the correct decision. Accordingly, we also postulated that, together with late serial analytic processes, early automatic orthographic processes might be involved in this judgment task. Since PHw differ from real words in their orthography but have the same phonology and lead to the same meaning, the discrimination between words and their corresponding PHw is supposed to rely on the analysis of the physical properties of the stimulus (i.e., the orthography). Here, PHw differed from the real words in just one graphemic feature but shared the same phonology and the remaining majority of their graphemes. This design should make it possible to test whether the discrimination process begins early (i.e., during the early visual recognition steps), or whether there are phonological processes involved in later stages. Because words and PHw had the same phonology and activate the same semantic meaning, differences in the ERP were not expected during periods devoted to phonological or lexical-semantic processing.

**FIGURE 4 | (A)** Topographic potential map (left) and average source localization computed during the N170 period from the individual inverse solution in words (upper row) and PHw (lower row). In both cases, maximal activity involved left temporo-occipital areas. Paired *t*-tests comparing inverse solutions to PHw vs. words (significance at *p* < 0.01, right dashed square) showed more activity for PHw in the left posterior areas (see text). **(B)** Topographic potential map (left) and average source localization computed during the P2 period in words (upper row) and PHw (lower row). In both cases, maximal activity was found in bilateral temporal areas. Paired *t*-tests comparing words vs. PHw showed more activity for words (right dashed square) in the left lingual gyrus. **(C)** The same analysis performed during the P6 period on the individual inverse solutions showed a dominant pattern of sources in the left hemisphere. Paired *t*-tests comparing words vs. PHw showed more activity for words (right dashed square) in left frontal, temporal and occipital areas. The sources estimated by the LAURA inverse solution are displayed on four successive MRI slices where the maximal activity was observed. Note that the inverse solutions in **(A,B)** are scaled to their maximum (see color scale for each row). In the right panels (dashed squares) the red color corresponds to areas where solution points (i.e., voxels) showed significant statistical differences at *p* < 0.01.

The ERP analysis supported our major assumptions and revealed that the discrimination between real words in Arabic and their PHw occurred already in the early stages, where readers assess the orthographic features of the presented words. This was reflected by a significant modulation of the N170 component, whose amplitude increased in the PHw condition. The N170 component was hypothesized to represent the processing step in word reading and recognition related to the orthographic stage [see Taha et al. (2013) for a recent discussion]. The differences observed here cannot be explained in terms of low-level physical characteristics of the stimuli such as stimulus length or spatial frequency (Bar-Kochva, 2011; Horie et al., 2011), since the stimuli were similar in length and only one grapheme was changed. Other studies have reported a modulation of the N170 by word frequency (being larger for frequent than infrequent words and pseudowords), and interpreted this result as evidence that the N170, although probably reflecting prelexical orthographic processing, could for frequent words represent a more holistic process allowing words to be processed as a global visual pattern (Simon et al., 2007). Here, one could postulate that the orthographic patterns of the real words correspond to frequent orthographic patterns while the PHw correspond to irregular and infrequent patterns. In this case, the finding that PHw induced a larger N170 is hard to reconcile with the word frequency explanation, under which one would have expected the N170 to be larger for real words. One plausible interpretation is that the larger N170 in PHw was due to the fact that these stimuli necessitated a deeper visual orthographic analysis, together with increased cognitive/attentional resources for determining the locus of the orthographic error. 
In terms of the timing of these early differences, the result observed here is in line with other ERP investigations which showed that differences between real words and PHw (Braun et al., 2009) and between words and scrambled words (Zhang et al., 1997) occurred as early as 150 ms after stimulus presentation. Although in the recent study reported by Braun et al. (2009) the early ERP differences between PHw and real words were attributed to phonological activation, we assume that, since both types of stimuli had similar phonological patterns, this earlier modulation could only be attributed to difference in orthographic analysis. This postulation is in accordance with other studies that reported differences in the latency and distribution of the N170 during visual orthographic tasks where this component, as observed here, was found larger over the left than the right hemisphere sites (Sereno et al., 1998; Bentin et al., 1999; Hauk and Pulvermuller, 2004; Maurer et al., 2005b; Appelbaum et al., 2009; Grossi et al., 2010). The source localization analysis, which aimed at shedding light on the cerebral origin of the scalp recorded responses, showed that the estimated brain activation maps explained quite nicely the first steps of visual word processing as suggested by some computational models and previous functional imaging studies. For instance, the local combination detectors (LCDs) model proposed by Dehaene et al. (2005) suggests that the visual recognition of words depends firstly on the detection of the local orienting bars which are the basic components of the letters. After responding to specific letters, the system responds to multi letter strings (i.e., morphemes and small words), a process which seems to rely on the activation of the left occipital-temporal system, thought also to be involved in object recognition (Dehaene et al., 2005). 
The source localization analysis performed here during the N170 period is compatible with this model's assumption and indicated, as illustrated here, the involvement mainly of the occipito-temporal areas during the processing of both words and PHw (see Jobard et al., 2003; Taha et al., 2013).

The early analysis of the physical orthographic features during visual word recognition went beyond the N170 component, since amplitude differences were also measured at the level of the P2 component, where words induced higher responses than PHw. This result fits with previous observations comparing words and scrambled words, which revealed long-lasting ERP differences that started after 170 ms post-stimulus and lasted up to 600 ms (Zhang et al., 1997). This finding related to the P2 is also compatible with those of a recent study (Comesana et al., 2012) using electrophysiological and behavioral measures, which examined the role of phonological and orthographic overlap in the recognition of cognate and non-cognate words. The authors suggested that the differences observed around the P2 component indexed an initial discrimination of the stimuli based on their physical properties. The modulation of the ERPs on posterior sites during the P2-N2 components (around 250 ms post-stimulus) has also been reported in other studies with different types of stimuli including faces and objects (see Pegna et al., 2008). This modulation was attributed to processes of conscious visual recognition (Koivisto et al., 2006) or visual perceptual closure (Doniger et al., 2000) and represents the earliest manifestation of awareness. In a previous ERP mapping study (Khateb et al., 2002), it was found that the map segments (i.e., ERP microstates) occurring immediately after the N170 component differentiated words and pseudowords by exhibiting shorter duration in words than in non-words (but also in images than in scrambled images). This time period was interpreted as corresponding to a phase of enhanced pattern analysis where stimuli are re-evaluated before the recognition process and final decision are carried out in successive distinct steps. 
The differences observed here around the N1-P2 complex (N170 and P200) on the posterior sites support the notion that the differentiation of written Arabic words starts early and depends on an intensive visual discrimination process (Taha et al., 2013). The clear reliance on visual discrimination processes during this period might be explained by the fact that the particular characteristics of the Arabic orthography (i.e., mainly the fact that different forms might represent almost each letter and that certain letters differ only by the presence or absence of one or more dots) oblige readers to develop automatic and sophisticated visual discrimination mechanisms. The fact that the accuracy level in PHw did not differ significantly from that for words seems to support this notion of enhanced visual verification processes in Arabic. The source localization maps estimated during the P2 component in both words and PHw involved bilaterally the superior and middle temporal gyrus, with significant differences appearing maximally and most interestingly in the left lingual gyrus. This latter area has already been implicated in visual word recognition (Mechelli et al., 2000a,b), and its activation was shown to increase with word length. Also, it has been reported that activity in the lingual gyrus decreases with increasing stimulus duration (Price and Friston, 1997). Taken together, these observations indicate that this region participates in the visual analysis of the stimulus and probably in the recognition of the word as a familiar orthographic pattern. The difference in terms of activated areas during the P2 also suggested the involvement of the inferior parietal lobule (BA 40). This part of the parietal cortex has been implicated in vigilant attention and was recently proposed to be part of neither the conventionally "dorsal" nor the conventionally "ventral" system, but of a non-spatial parieto-frontal circuit that plays a role in top-down processing (Husain and Nachev, 2007). 
Thus, the involvement of this part of the attentional system might be explained by the specific features of the Arabic orthographic system, which imposes increased cognitive demands due to the physical similarities between the letters and their changing forms according to their position in the word (see Taha et al., 2013).

Because of the similar phonology between words and PHw, no differences were expected, and indeed none were found, during the processing stages associated with the phonological discrimination steps previously described around the N320 component (Bentin et al., 1999). This component was generally measured during visual word recognition in mid-temporal regions at ∼320 ms (Bentin et al., 1999; Simon et al., 2004; Khateb et al., 2007b). For instance, in a previous ERP investigation by Khateb et al. (2007b) using a phonological judgment task, response differences between rhyming and non-rhyming words occurred already around 300 ms post-stimulus, with increased negativity for rhyming words over left temporal sites. Also, differences in the analysis of the semantic and phonological content of the words were reported in a previous study during the time window between 280 and 380 ms after stimulus presentation. Our global analysis showed no difference during these periods (i.e., the 300 ms time range) devoted to phonological processing, suggesting that the differences observed were probably not related to phonology. Similarly, as predicted, no difference was found during the time period of the N400 component, classically devoted to the semantic processing of words and sentences (Kutas and Federmeier, 2011).

The higher amplitude and the earlier latency of the P6 (P600) component might be interpreted as an index of late memory monitoring processes before and around decision making (Kaan and Swaab, 2003). The difference in the peak latency (about 67 ms) of this component, which probably explains the amplitude differences at centro-parietal electrodes, closely resembles the difference in the mean RT between words and PHw (∼55 ms). It has previously been suggested that the late components associated with anomalies and error detection, like the P6 component, reflect deep processing rather than automatic discrimination processes (Kaan and Swaab, 2003) and are typically found at centro-parietal electrodes (Osterhout and Holcomb, 1992). Here, the modulation of the P6 could be interpreted as reflecting a monitoring stage during orthographic analysis that enables correct decisions about the errors and anomalies detected in the orthographic patterns of written words in long-term memory (Allan et al., 1999; Osterhout and Hagoort, 1998; Van Herten et al., 2005; Vissers et al., 2006; Shaul, 2011). During this period, the estimated source maps pointed to a left-dominant activation that comprised the classical language areas such as the inferior frontal gyrus and the superior temporal gyrus, all involved in reading (Fiez and Petersen, 1998). The significant differences between words and PHw were found only in the left hemisphere, including in the inferior frontal gyrus. A greater activation in language areas including the inferior frontal gyrus is often observed in functional studies when testing both for word frequency and lexicality effects (Price, 2010; Woollams et al., 2010). The difference of activation between words and PHw in such language areas during this specific time period might be explained by the earlier response latency in words relative to PHw, which attests to a more efficient and rapid way to identify and read the correct stored orthographic patterns. 
Actually, one can assume that such areas, as indicated by the source estimation maps, are involved in the reading of both correctly written and erroneously written words, but the time course of this activation is reached earlier in words, hence the difference in the response amplitude.

To summarize, the present research highlights the importance of a dominant visual discrimination stage during word recognition in Arabic, especially when these words are to be distinguished from homophones that share the same phonology and lead automatically to the same meaning. This finding supports our previous findings that stressed the important role of early automatic visual discrimination during word recognition in Arabic (Taha et al., 2013). The differences observed here at the level of the N1-P2 complex and then later during the P6 (together with the lack of differences during time periods related to phonological and lexico-semantic processing) suggest that word recognition in Arabic, possibly more than in other orthographies, begins with serial orthographic-phonemic assembly processes. The late response differences, which replicate those found in previous studies, are thought to reflect error monitoring and memory processes before decision making. The findings presented here are the first of their kind to assess the time course of visual word recognition in the Arabic language using an orthographic decision task. Further research using different paradigms is still needed to better explore the brain mechanisms involved in the visual recognition of words in this particular orthography. In particular, it would be of interest to verify how the differences observed between words and PHw compare with the effects of lexical frequency among readers of Arabic.

# **AUTHOR CONTRIBUTIONS**

The first author Haitham Taha designed the experiment, collected the EEG data, performed ERP averaging and preliminary analysis, and wrote the first draft of the article. The second author Asaid Khateb completed the analysis of the ERP data, performed source localization and improved the draft, especially the results section and the discussion.

#### **ACKNOWLEDGMENTS**

This research was supported by a post-doctoral fellowship to Haitham Taha from the Edmond J. Safra Brain Research Center, and partly by the Israeli National Science Foundation (Grant no. 623/11). We thank Drs Rolando Grave de Peralta Menedez and Sara Gonzales Andino for providing the LAURA inverse solution.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 July 2013; accepted: 13 November 2013; published online: 02 December 2013.*

*Citation: Taha H and Khateb A (2013) Resolving the orthographic ambiguity during visual word recognition in Arabic: an event-related potential investigation. Front. Hum. Neurosci. 7:821. doi: 10.3389/fnhum.2013.00821*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Taha and Khateb. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The VWFA: it's not just for words anymore

#### *Alecia C. Vogel<sup>1</sup>\*, Steven E. Petersen<sup>2,3,4,5</sup> and Bradley L. Schlaggar<sup>2,3,4,6</sup>*

*<sup>1</sup> Department of Psychiatry, Washington University in St. Louis, St. Louis, MO, USA*


#### *Edited by:*

*Mohamed L. Seghier, University College London, UK*

#### *Reviewed by:*

*Joseph T. Devlin, University College London, UK Monica Baciu, Université Pierre Mendès-France, France Jason Yeatman, Stanford University, USA*

#### *\*Correspondence:*

*Alecia C. Vogel, Department of Psychiatry, Washington University in St. Louis, 660 S. Euclid Ave., St. Louis, MO 63110, USA e-mail: vogela@wustl.edu*

Reading is an important but phylogenetically new skill. While neuroimaging studies have identified brain regions used in reading, it is unclear to what extent these regions become specialized for use predominantly in reading vs. other tasks. Over the past several years, our group has published three studies addressing this question, particularly focusing on whether the putative visual word form area (VWFA) is used predominantly in reading, or whether it is used more generally in a number of tasks. Our three studies utilize a range of neuroimaging techniques, including task based fMRI experiments, a seed based resting state functional connectivity (RSFC) experiment, and a network based RSFC experiment. Overall, our studies indicate that the VWFA is not used specifically or even predominantly for reading. Rather the VWFA is a general use region that has processing properties making it particularly useful for reading, though it continues to be used in any task that requires its general processing properties. Our network based RSFC analysis extends this finding to other regions typically thought to be used predominantly for reading. Here, we review these findings and describe how the three studies complement each other. Then, we argue that conceptualizing the VWFA as a brain region with specific processing characteristics rather than a brain region devoted to a specific stimulus class, allows us to better explain the activity seen in this region during a variety of tasks. Having this type of conceptualization not only provides a better understanding of the VWFA but also provides a framework for understanding other brain regions, as it affords an explanation of function that is in keeping with the long history of studying the brain in terms of the type of information processing performed (Posner, 1978).

#### **Keywords: visual word form area, occipito-temporal cortex, fMRI, resting-state fMRI, resting-state functional connectivity, resting-state networks, reading, orthography**

Reading is central to most of our lives—after all, you are reading this manuscript. Certainly reading is often an integral part of modern life, necessary for reading scientific papers, novels and news, but also important for such quotidian tasks as reading street signs, instruction sheets, prescription information, and recipes. Reading is integral to academic success (Stanovich, 1986). Clearly, reading is important, and given that the key difference between language and reading is the use of written characters, it stands to reason that reading must rely on the development of parts of the brain devoted to processing written words.

However, while reading is important, it is a relatively new, and far from universal, human skill. Written language was developed only about 5000 years ago and the printing press was invented in the mid fifteenth century. Much of the world lacks even basic literacy. In the United States up to 17% of the population are not fluent readers with reading skill at or below the 4th grade level (Stanovich, 1986; Baer et al., 2009). This functional illiteracy characterizes up to 44% of those living in poverty (Baer et al., 2009). Thus, it is extraordinarily unlikely that the capacity to read is intrinsic to the human brain or that natural selection had an opportunity to specialize brain regions for reading. The amount and kind of overt teaching and practice needed to achieve fluent reading underscores the lack of intrinsic capacity to read afforded by the human brain (Schlaggar and McCandliss, 2007).

Knowing that reading is important, yet new and not universal, leaves us with the supposition that as we develop into fluent readers we are likely to repurpose parts of the brain originally devoted to something other than reading, *per se*, and use those regions to process written characters, turn those visual representations into sounds, and extract their meanings. Yet while this supposition is widely held, there remains disagreement about the extent to which learning to read changes the brain. Does reading truly remodel the brain and result in brain regions specifically or predominantly devoted to reading (e.g., Dehaene and Cohen, 2007), or does learning to read depend on utilizing brain regions that continue to maintain other functions or processing features (e.g., Price and Devlin, 2003)?

Prior work can be found to support both of the aforementioned hypotheses, that regions of the brain become used relatively specifically for reading and that regions of the brain used in reading also continue to be used more broadly. Proponents of the former hypothesis have termed a region of the brain in left occipito-temporal (OT) cortex the visual word form area (VWFA) (McCandliss et al., 2003; Cohen and Dehaene, 2004). The argument that the VWFA is predominantly or even specifically used for words is based on both classic work demonstrating lesions of left OT cortex near the putative VWFA disrupt fluent reading (Dejerine, 1892; Cohen et al., 2003; Gaillard et al., 2006) and more recent functional magnetic resonance imaging (fMRI) experiments demonstrating the VWFA often shows more activity for words than similar non-word stimuli such as consonant strings (Cohen et al., 2002; Polk et al., 2002; Cohen and Dehaene, 2004; Baker et al., 2007; Vinckier et al., 2007). Additionally, activity in this region is not based on simple visual stimulation, as there is similar activity, as measured by fMRI, regardless of word size or font (Cohen et al., 2003; McCandliss et al., 2003). However, proponents of the second hypothesis, that the regions of the brain used in reading continue to be utilized in other types of information processing, argue that the VWFA, while used in processing words, also shows activity when processing other visual stimuli, including numbers, line drawn pictures, colors, and gratings (Tagamets et al., 2000; Price and Devlin, 2003; Xue et al., 2006; Ploran et al., 2007; Van Doren et al., 2010; Kherif et al., 2011). Several recent reviews address the body of data around this question (Dehaene and Cohen, 2011; Price and Devlin, 2011).

Recently, our group has published three studies utilizing various neuroimaging techniques, including task based fMRI experiments, a seed based resting state functional connectivity (RSFC) experiment, and a network based RSFC experiment, in an attempt to address the competing hypotheses that regions of the brain, specifically the VWFA, are used predominantly or even specifically in reading vs. the hypothesis that regions of the brain, including the VWFA, are used more generally in a number of different tasks. First, we describe a task-based fMRI experiment indicating stronger activations for non-word visual stimuli than words in left OT cortex in the same location as the VWFA. This experiment also demonstrates activity in left OT cortex is driven by other visual properties such as visual complexity and a property we call "groupability," rather than word-likeness *per se* (Vogel et al., 2012b). Second, we describe a seed-map based RSFC experiment establishing that the VWFA has stronger resting state correlations with regions of the dorsal attention network than regions thought to be used predominantly in reading (Vogel et al., 2012a). Last, we describe network based RSFC analyses demonstrating regions thought to be used predominantly in reading have no special RSFC relationship to one another (Vogel et al., 2013). Rather than address the whole of the literature related to VWFA, we will review each of our studies in more detail below and then discuss how they relate to the larger body of work addressing the question of VWFA specificity.
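To make the seed-based approach mentioned above concrete, the following minimal Python sketch correlates the time series of a seed region with the time series of other regions. It assumes preprocessed, equally long resting-state time series and omits every preprocessing step used in real RSFC analyses. The function names (`pearson_r`, `seed_correlations`), the region labels, and the numbers are hypothetical and are not taken from the studies reviewed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long time series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(ss_x * ss_y)


def seed_correlations(seed_series, region_series):
    """Correlate one seed time series with each named region time series."""
    return {name: pearson_r(seed_series, series)
            for name, series in region_series.items()}


# Hypothetical toy time series (arbitrary units), for illustration only.
seed = [1.0, 2.0, 3.0, 4.0, 5.0]
regions = {
    "region_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises and falls with the seed
    "region_b": [5.0, 4.0, 3.0, 2.0, 1.0],   # moves opposite to the seed
}

correlations = seed_correlations(seed, regions)
```

Regions whose correlation with the seed is consistently high across participants are described as functionally connected to it, which is the sense in which the VWFA is compared with dorsal attention and reading-related regions in the studies summarized below.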

#### **SUMMARY OF STUDIES**

#### **THE LEFT OCCIPITAL-TEMPORAL CORTEX DOES NOT SHOW PREFERENTIAL ACTIVITY FOR WORDS**

As discussed above, there has been much debate about how specifically a region of left occipital-temporal cortex called the VWFA is activated by words. The region was named the visual word form area in part due to the opinion that it responded predominantly to words (McCandliss et al., 2003). However, that specificity has been debated essentially since the name VWFA was coined (Price and Devlin, 2003). We have recently published a study that compared fMRI activity elicited by a visual matching task using words, pseudowords that contained only letter combinations typically present in English words, non-words that contained letter combinations that are illegal in English, consonant strings, Amharic character strings, which comprise the writing system used in Ethiopia, and line drawn pictures (Vogel et al., 2012b).

Our study demonstrated no specificity in VWFA activity. Healthy, neurotypical, skilled adult readers were asked to determine if two simultaneously presented strings of letters, Amharic characters (described above), or line drawn pictures were the same or different and give a button press response. In a whole brain analysis looking for regions in which there was differential activity for the five types of stimuli, a region was found in left occipital-temporal (OT) cortex near the VWFA. However, in this region the activity was greatest for matching Amharic characters and line drawn pictures, which were significantly stronger than matching consonant strings, which was significantly stronger than matching non-words, pseudowords, and words (**Figure 1A**). When a region was applied directly on the reported coordinates of the VWFA (Cohen and Dehaene, 2004), the same pattern emerged, with the strongest activity seen for the Amharic character strings, less for consonants, line drawn pictures, non-word, pseudoword, and word stimuli (**Figure 1B**). Clearly, this set of results is inconsistent with the supposition that the VWFA is specifically or even predominantly used in processing words, a conception that would predict the VWFA to have the strongest activity for processing words, with less activity for the least word-like stimuli, such as Amharic characters.

**FIGURE 1 | There is more activity for Amharic character strings than letter strings in the left OT cortex. (A)** Activity profile and location of region of the left OT cortex defined in a whole brain analysis of stimulus type. The location of the region closest to the VWFA is denoted with an arrow. The timecourses of BOLD activity for each stimulus type are shown for this region. **(B)** Timecourses of BOLD activity for each stimulus type in a region applied to the classic VWFA coordinates (coordinates in MNI, original Talairach coordinates taken from Cohen and Dehaene, 2004). Figure adapted from Vogel et al. (2012b).

Given that left OT cortex, including the VWFA, was not activated specifically or predominantly by words, in the same manuscript described above, we performed a second set of analyses designed to determine what properties do drive VWFA activity (Vogel et al., 2012b). Specifically, we hypothesized that stimuli most likely to drive left OT cortex were high spatial frequency, high contrast, complex stimuli that can be processed in groups. We chose these characteristics as they comprise some of the most salient properties of letters, words and other stimuli that have been shown to activate left OT. For example, all words, as well as numbers, line drawings, and gratings, which have been shown to activate the VWFA, are high spatial frequency and high contrast. Additionally, recent studies have shown left OT cortex to be directly responsive to spatial frequency (Kveraga et al., 2007). Unfortunately, all of our stimuli were also high spatial frequency and high contrast, so we were unable to evaluate these properties. There is also evidence that the VWFA is responsive to complexity, as patients with VWFA lesions not only have difficulty reading fluently, but also have difficulty processing more visually complex stimuli (Behrmann et al., 1998).

We were able to evaluate the effect of visual complexity on VWFA activity as our stimuli did vary in complexity, which we measured as the number of brushstrokes per character, a method previously used to compare writing systems (Changizi and Shimojo, 2005). First, we divided each string type (Amharic strings, consonant strings, non-words, pseudowords, and words) into three groups based on visual complexity, or the number of brushstrokes per character. Then, we looked for regions of the cerebrum that showed differential activity between the most complex and least complex groups. A region in left OT was found to have activity differences related to complexity, and a region based analysis demonstrated that difference was driven by increased activity for the most complex strings relative to the least complex strings in left OT.
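The three-way complexity split described above can be illustrated with a minimal sketch. The stimulus labels and stroke counts below are made up for illustration, and the function name `split_into_tertiles` is hypothetical; in the study the split was performed separately within each string type, and the exact cut points used are not stated here.

```python
def split_into_tertiles(strings_with_complexity):
    """Sort (label, strokes_per_character) pairs by complexity and cut them
    into three groups: least complex, middle, and most complex."""
    ranked = sorted(strings_with_complexity, key=lambda item: item[1])
    n = len(ranked)
    cut1, cut2 = n // 3, (2 * n) // 3
    return ranked[:cut1], ranked[cut1:cut2], ranked[cut2:]


# Hypothetical stimuli: (label, mean brushstrokes per character).
stimuli = [("s1", 1.5), ("s2", 3.0), ("s3", 2.0),
           ("s4", 2.5), ("s5", 1.0), ("s6", 3.5)]

low, mid, high = split_into_tertiles(stimuli)
low_labels = [label for label, _ in low]
high_labels = [label for label, _ in high]
```

The contrast of interest is then between the `high` and `low` groups, with the middle group left out of the comparison.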

Additionally, we were able to use reaction time data on the matching task and within stimuli properties to validate that some stimuli, like words, were processed in groups of letters, while others, such as the Amharic strings, were processed as individual characters. This "grouped" processing was also reflected in fMRI activity. It is intuitive that we read words as groups of letters; reading words as a whole or in sets of graphemes is one of the hallmarks of fluent reading (Weekes, 1997; Cohen et al., 2003). Lesions to left OT cortex are shown to result in "letter by letter" reading in which each letter of a word must be processed individually and response time increases linearly with length, accordingly (Cohen et al., 2003). However, intuition also indicates that processing unfamiliar, complex strings, such as readers who are naïve to the Amharic alphabet processing Amharic strings, requires evaluating each character of the string individually.

We were able to validate these intuitions due to a second property of our four-character string stimuli. The two strings presented simultaneously contained either four identical letters or characters, differed by 2 letters or characters, or had four different letters or characters. If words are processed as whole, or at least in multi-level groups, it should make no difference whether two simultaneously presented strings are all different or all the same, it should take about the same amount of time to identify the answer (same/different). However, if one had to look at the letters individually, it would take longer to identify that two strings were the same because one would have to evaluate all four letters or characters, whereas making a decision about strings that are all different requires evaluating only one character. In our experiment, the reaction time (RT) to match strings of familiar characters such as words and pseudowords matched the proposed pattern for "grouped" processing, in that it took the same amount of time to make a same/different judgment when the two strings had all of the same letters or all different letters (RTs for words shown in **Figure 2C**). However, stimuli that were unfamiliar to the subjects, including consonant strings and Amharic character strings showed the other proposed effect; it took subjects longer to make a same/different judgment on the identical four character strings than on the all different four character strings of unfamiliar stimuli (RTs for Amharic characters shown in **Figure 2B**).

**FIGURE 2 | The left OT processes unfamiliar stimulus strings as individual characters and familiar strings as groups of characters. (A)** Location of the left OT region defined in a whole brain pair type by timecourse analysis (−44, −67, −4 in MNI coordinates). **(B)** Reaction times and timecourses of BOLD activity for Amharic character pairs that are all the same, 2 character different hard pairs, and 4 character different easy pairs. The RTs and BOLD activity increase with the number of characters that must be evaluated to make a matching decision, indicating character by character processing. Asterisks denote RT values that have differences with *p <* 0*.*05. Though not shown, consonant strings show a similar pattern of both RTs and BOLD activity. **(C)** Reaction times and timecourses of BOLD activity for word pairs that are all the same, 2 character different hard pairs, and 3 character different easy pairs. The RTs and BOLD activity are equivalent for the all same and all different easy pairs, indicating these stimuli are evaluated as a group. Asterisks denote RT values that have differences with *p <* 0*.*05. Though not shown, pseudowords, which contain all legal letter combinations, show a similar pattern of both RTs and BOLD activity. Figure adapted from Vogel et al. (2012b).
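The reaction-time predictions described above can be made concrete with a toy model. This is only a sketch under simplifying assumptions: character-by-character processing is modeled as a left-to-right check that stops at the first mismatch, grouped processing as a single evaluation step, and the function name `comparisons_needed` is hypothetical. It is not the analysis used in the study.

```python
def comparisons_needed(string_a, string_b, grouped):
    """Count evaluation steps needed to reach a same/different decision.

    grouped=True : the string is treated as one unit, so one step suffices.
    grouped=False: characters are checked in order, stopping at the first
                   mismatch (an assumed serial, self-terminating comparison).
    """
    if grouped:
        return 1
    steps = 0
    for char_a, char_b in zip(string_a, string_b):
        steps += 1
        if char_a != char_b:
            break
    return steps


identical = ("ABCD", "ABCD")
all_different = ("ABCD", "WXYZ")

serial_same = comparisons_needed(*identical, grouped=False)
serial_diff = comparisons_needed(*all_different, grouped=False)
grouped_same = comparisons_needed(*identical, grouped=True)
grouped_diff = comparisons_needed(*all_different, grouped=True)
```

Under the serial model, identical pairs require more steps than all-different pairs, which corresponds to the longer RTs reported for unfamiliar strings; under the grouped model the two pair types require the same number of steps, which corresponds to the equivalent RTs reported for words and pseudowords.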

A whole brain analysis searching for regions whose activity differed by pair type (i.e., all same vs. all different characters) showed a region in left OT cortex located at −44, −67, −4 in MNI coordinates (shown in **Figure 2A**). Planned secondary analyses showed the activity in this region also varied by stimulus type (i.e., words vs. Amharic characters) and there was an interaction between the pair type and stimulus type. Further analyses showed this interaction was driven by the same pattern described above. When matching Amharic character and consonant strings, there was more activity for matching identical strings relative to strings that differed in all characters (Amharic strings shown in **Figure 2B**). However, when matching words and pseudowords, there was equivalent activity for matching the identical and all different pairs (words shown in **Figure 2C**). While this set of results mimics the response time data, these effects remain significant even when response time was used as a regressor (Vogel et al., 2012b).

While these additional analyses demonstrated that left OT is involved in processing complex stimuli in groups, both sets of analyses primarily used whole brain approaches to determine what regions of the brain showed such effects. While this approach is the least biased way to perform these analyses, it leaves open the question of whether these effects hold in the VWFA or are even localized to the same region. Applied region of interest analyses showed the same effect of complexity and groupability, at least qualitatively, in the classically defined VWFA. More persuasive, a conjunction analysis of the three effects in question—differences between stimulus type such as words, consonant strings, and Amharic strings, differences in complexity, and differences in pair type—demonstrated that all three effects are located in the same voxels in only one place in the brain, the left OT cortex. This region of left OT cortex that had increased activity for Amharic characters and consonant strings relative to words and word-like stimuli, is more active for complex stimuli, and showed "grouped" processing of words and pseudowords but character by character processing of consonant strings and Amharic characters, was located at −46, −66, −4 (MNI coordinates), very near the VWFA.
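The logic of a conjunction analysis can be sketched as a set intersection: a voxel survives only if it is significant for every effect. The voxel indices below are arbitrary placeholders rather than real coordinates, and the function name `conjunction` is hypothetical; thresholding and statistical correction are assumed to have happened already.

```python
def conjunction(*effect_maps):
    """Return the voxels that appear in every supplied effect map."""
    common = set(effect_maps[0])
    for effect_map in effect_maps[1:]:
        common &= set(effect_map)
    return common


# Placeholder voxel indices (i, j, k) that passed threshold for each effect.
stimulus_type_effect = {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
complexity_effect = {(4, 5, 6), (7, 8, 9), (2, 2, 2)}
pair_type_effect = {(4, 5, 6), (0, 0, 0)}

overlap = conjunction(stimulus_type_effect, complexity_effect, pair_type_effect)
```

In this toy case only one voxel is shared by all three maps, mirroring the finding that the three effects overlapped in a single location.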

In total, this set of analyses demonstrated that the left OT cortex, including the VWFA, does not seem to be used predominantly for processing words, as it is more strongly activated for non-word stimuli. Rather, we have demonstrated that left OT cortex at or near the VWFA is used in processing visually complex stimuli in "groups."

#### **THE PUTATIVE VISUAL WORD FORM AREA IS FUNCTIONALLY CONNECTED TO THE DORSAL ATTENTION NETWORK**

Although fMRI is useful for defining when a part of the brain is activated and studying the pattern of activity across a variety of tasks can allow for a relatively broad definition of a region's processing properties, fMRI is still generally limited by experimental paradigms. Recently, resting state functional connectivity (RSFC) has been shown to operate outside of those single paradigm boundaries, as RSFC correlations seem to reflect the statistical likelihood that regions of the brain are co-activated across time, including a large number of tasks. RSFC uses correlations in large, slow changes in the BOLD signal that occur even at rest. Regions that are co-activated across a number of tasks seem to have high RSFC correlations (examined in Power et al., 2011; Yeo et al., 2011). For example, there are high RSFC correlations between members of the default mode network (Greicius et al., 2003; Fox et al., 2005), dorsal and ventral attention networks (Fox et al., 2006), attention control regions such as the previously defined fronto-parietal and cingulo-opercular networks (Dosenbach et al., 2007; Seeley et al., 2007), visual regions and motor regions (Biswal et al., 1995).

We have used RSFC correlations to query with which regions the VWFA is likely most often coactivated. If the VWFA is predominantly used for reading, it should be most often coactivated with other regions thought to be used predominantly in reading, leading to RSFC between them. However, if the VWFA is used in many types of tasks, for example if it is generally used in processing high spatial frequency, high contrast, complex stimuli in groups, it may be more commonly coactivated with other regions used in such tasks, and lead to RSFC with these other regions.
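At its core, a seed-based RSFC analysis correlates the timecourse of a seed region with the timecourse of every other region or voxel. The sketch below shows that step with a plain Pearson correlation. The timecourses are toy values, the region names are placeholders, and the helper names `pearson_r` and `seed_correlations` are hypothetical; the preprocessing that real RSFC data require (filtering, nuisance regression, motion control) is omitted.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length timecourses."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def seed_correlations(seed_timecourse, region_timecourses):
    """Correlate the seed timecourse with each named region timecourse."""
    return {name: pearson_r(seed_timecourse, ts)
            for name, ts in region_timecourses.items()}


# Toy data: region_a is a scaled copy of the seed, region_b is sign-flipped.
seed = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
regions = {
    "region_a": [0.0, 2.0, 4.0, 2.0, 0.0, -2.0],
    "region_b": [0.0, -1.0, -2.0, -1.0, 0.0, 1.0],
}

rsfc = seed_correlations(seed, regions)
```

Ranking the resulting values shows which regions are most strongly correlated with the seed, which is the comparison at issue when asking whether the VWFA is more closely related to reading regions or to other networks.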

Recently, we have published a set of analyses demonstrating the latter (Vogel et al., 2012a); the VWFA has strongest RSFC correlations with regions of the dorsal attention network, and has relatively weak correlations with other regions thought to be used in reading (**Figure 3**). There were very weak to weakly negative correlations between the VWFA and other putatively reading-related regions, including left inferior frontal gyrus (IFG) (Fiez and Petersen, 1998; Mechelli et al., 2003) and supramarginal gyrus (SMG), thought to be used in phonological processing (Church et al., 2008, 2010), or left angular gyrus (AG) or medial temporal gyrus (MTG), thought to be related to semantic processing (Binder et al., 2005, 2009; Graves et al., 2010) (**Figure 3**). Additionally, the strength of RSFC correlations between the VWFA and dorsal attention regions seems to increase both with age and reading ability. In contrast, the correlations between the VWFA and putative reading regions are unrelated to age or reading level (Vogel et al., 2012a), though it should be noted that all of these analyses were performed with only movement matched groups and were not subjected to the strict quality control analyses for movement that we now know to be necessary (Power et al., 2012).

We propose that the relationship between the VWFA and regions of the dorsal attention network is related to the finding that the VWFA processes familiar stimuli, such as words, in groups. In order to process words as a whole, or in groups of letters, it is necessary to direct attention to the whole of the word or the larger group of letters. However, in order to process unfamiliar stimuli such as Amharic strings as individual characters, attention must be allocated to the individual characters. Thus, the RSFC between the VWFA and dorsal attention regions, which increases with age and reading level, reflects its more general use in processing various visual stimuli in appropriately sized groups (Vogel et al., 2012a).

**the dorsal attention network than "reading related" regions. (A)** Seed map of voxels with the strongest RSFC correlations to the putative VWFA, as defined in a meta-analysis of single word reading studies. Positive correlations in RSFC are shown in warm colors, negative correlations shown in cool colors. The location of regions of the dorsal attention network are

defined by a meta-analysis of region activated by single word reading as well as a review of the literature, shown in blue. **(B)** RSFC Correlation coefficients between the VWFA and regions thought to be used predominantly in reading, shown in blue, and regions of the dorsal attention network, shown in green. Figure from Vogel et al. (2012a).

#### **FUNCTIONAL NETWORK OF READING-RELATED REGIONS ACROSS DEVELOPMENT**

In addition to defining the predominant functional connections of a given region, RSFC can be used to define the network structure of large groups of regions. For example, in the analysis described above, we used the RSFC correlations between the VWFA and the rest of the brain to demonstrate that the VWFA was more commonly co-activated with regions in the dorsal attention network than other reading related regions. Alternatively, one could look at regions across the brain and attempt to discern whether there is a "reading community," or a group of regions with strong RSFC correlations that seem to be most commonly activated in reading tasks and correspondingly, whether the VWFA is part of such a "reading community."

Defining groups or communities of highly related items within a larger group or network of items based on a similarity metric is the purview of a branch of mathematics termed graph theory. In graph theory, graphs are defined as a group of items, also termed nodes, and the relationships between them, also called edges (Sporns et al., 2004), and this theory provides a powerful new way to define communities of brain regions using similarity as defined by RSFC correlations (Power et al., 2011). Our lab has used graph theoretic techniques on RSFC correlations to define the network structure of many general use regions across the brain and to define the network structure of the brain at the level of the MRI voxel (Power et al., 2011). These whole brain analyses have defined a number of communities previously demonstrated to be commonly co-activated across tasks, including primary visual regions, default mode regions, dorsal and ventral attention regions, cognitive control regions, as well as several previously unidentified communities (Power et al., 2011). No community of regions that could be construed as "reading related" was found in these large analyses.
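The step from RSFC correlations to a graph can be sketched as follows: regions become nodes, and an edge is drawn between two nodes when their correlation meets a chosen threshold. The node labels, correlation values, threshold, and the function name `build_graph` are all illustrative assumptions, not values or code from the studies cited.

```python
def build_graph(labels, corr, threshold):
    """Return the set of undirected edges whose correlation >= threshold.

    labels: list of node names
    corr:   symmetric matrix (list of lists) of pairwise correlations
    """
    edges = set()
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if corr[i][j] >= threshold:
                edges.add((labels[i], labels[j]))
    return edges


# Toy 4-node correlation matrix with two strongly correlated pairs.
labels = ["n1", "n2", "n3", "n4"]
corr = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]

edges = build_graph(labels, corr, threshold=0.3)
```

Because the resulting graph depends on the threshold, community structure is typically examined over a range of thresholds rather than at a single value.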

While no "reading community" was defined in whole brain analyses, we wished to be certain this result was not due to either missing regions used in reading, or overwhelming any reading related effect with a whole brain analysis. Thus we performed similar graph analyses on regions defined by a meta-analysis of studies in which subjects read single words aloud (Vogel et al., 2013). Similar to the hypotheses stated above, if regions such as the VWFA are used specifically or even predominantly in reading, they should have high RSFC correlations with each other and lower RSFC correlations with other brain regions, allowing them to be grouped together into a "reading community" by the graph analytic algorithms. However, if regions used in reading are also used in a variety of other tasks, these "reading" regions will also have high RSFC correlations with the other regions with which these "reading" regions are commonly co-activated, placing them in more general communities reflecting the "reading" regions most common use (Vogel et al., 2013).

With these hypotheses in mind, we first defined all regions used in transforming a written word into spoken output via a meta-analysis of five studies of adult subjects reading single words aloud and a single developmental study of the same to capture any changes in reading activations with age (Vogel et al., 2013). The resulting group of regions included both "task general" regions, such as primary visual cortex, auditory, and motor cortex, as well as regions thought to be relatively specific to reading, including the VWFA, described above, the left SMG and IFG, thought to be related to phonological processing (e.g., Church et al., 2010), and the left AG and MTG thought to be related to semantic processing (e.g., Graves et al., 2010).

We used two graph analysis techniques, InfoMap (Rosvall and Bergstrom, 2008) and Modularity Optimization (Newman and Girvan, 2004), to define the community structure of this large network of reading-related regions across a number of RSFC thresholds, to ensure we were not biased by any one algorithm or threshold (Vogel et al., 2013). These graph analyses were performed on 3 groups of 38 subjects each, one set of adults (age 21–29 years), a set of adolescents (age 11–14 years), and a set of children (age 6–10 years) matched for image quality and movement as described in Power et al. (2012).
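As background for the modularity-based approach, the sketch below computes the Newman and Girvan modularity score Q for a given partition of an unweighted, undirected graph, using the standard form Q = Σ_c [e_c / m − (d_c / 2m)²], where m is the number of edges, e_c the number of edges inside community c, and d_c the summed degree of its nodes. This only scores a partition that is supplied to it; a modularity optimization algorithm searches for the partition that maximizes Q, and InfoMap uses a different, information-theoretic criterion. The toy graph and the function name `modularity` are assumptions for illustration.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges:        list of (u, v) pairs, each undirected edge listed once
    community_of: dict mapping each node to its community label
    """
    m = len(edges)
    degree = {}
    within = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community_of[u] == community_of[v]:
            c = community_of[u]
            within[c] = within.get(c, 0) + 1
    degree_sum = {}
    for node, k in degree.items():
        c = community_of[node]
        degree_sum[c] = degree_sum.get(c, 0) + k
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy graph: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_communities = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_community = {node: "all" for node in range(6)}

q_split = modularity(edges, two_communities)
q_single = modularity(edges, one_community)
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the sense in which a high-Q partition captures groups of nodes that are more densely connected to one another than expected by chance.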

Neither InfoMap nor modularity optimization identified a "reading community" at any threshold in any age group (Vogel et al., 2013). Rather, the regions purported to be used relatively specifically for reading, including the VWFA, were largely found to be intermixed in other, more general use communities, such as visual regions, fronto-parietal and cingulo-opercular control regions, and regions of the default mode network (**Figure 4**). Additionally, there were no significant changes in the network structure of these reading-related regions across development, which included no emerging "reading community" with age/reading skill (Vogel et al., 2013). Therefore, we conclude that regions used in reading, even those thought to be essential for reading, retain more general processing properties, resulting in these regions relating to more general use communities.

#### **DISCUSSION**

In sum, neither functional analyses nor RSFC analyses, including both region specific and large network analyses, indicate the VWFA is used specifically or even predominantly in reading. Rather, our fMRI analyses demonstrate the VWFA is activated more strongly by non-word and even non-letter stimuli such as Amharic characters and line drawn pictures than by words, and that activity seems to be driven by other stimulus properties such as visual complexity and the "group-ability" of the stimuli (Vogel et al., 2012b). These findings are supported by the RSFC correlations of the VWFA which show stronger relationships between the VWFA and regions of the dorsal attention network than regions thought to be used predominantly in reading, likely reflecting the need to allocate spatial attention to the appropriate group of stimuli (Vogel et al., 2012a). Additionally, no reading community can be found using graph analyses to define the network structure of all regions used in reading aloud, again indicating regions used in reading retain more general processing properties (Vogel et al., 2013).

While we suggest that the VWFA has some general visual processing functions, we emphasize that we are not arguing that it is a completely general use visual region. Rather, we contend that the processing performed in the VWFA is related to specific visual properties, which can be used in processing a number of stimuli, but are also very useful for reading. For example, the VWFA is responsive to the visual complexity of stimuli, which is a shared characteristic of written languages (Changizi and Shimojo, 2005). Additionally, the VWFA processes familiar stimuli in groups, which is one of the defining features of fluent reading. In fact, lesions involving the VWFA often do not abolish reading, *per se*. Rather, they abolish fluent reading, or the ability to read words of varying lengths in about the same amount of time, while continuing to allow for "letter by letter" reading, in which words are processed as single characters (Cohen et al., 2003). We believe that this conceptualization of VWFA function, based in an information processing view of the brain, is supported not only by the results presented here, but by the wider literature, and can be used as an instructive example for understanding neural specialization more generally.

#### **OUR RESULTS IN CONTEXT**

The results described in this manuscript defining the VWFA as a more general use region that is particularly suited for reading due to its specific processing capabilities are largely consistent with the state of the literature. First, there has been increased acknowledgement that while the VWFA plays an important role in reading, it is not solely used for processing words. This is supported by a number of functional imaging studies (Tagamets et al., 2000; Price and Devlin, 2003; Xue et al., 2006; Ben-Shachar et al., 2007; Ploran et al., 2007; Starrfelt and Gerlach, 2007; Xue and Poldrack, 2007; Mei et al., 2010; Van Doren et al., 2010; Kherif et al., 2011), as well as lesion studies demonstrating deficits not only in reading words but also processing groups of visual stimuli or complex visual stimuli (Behrmann et al., 1990, 1998; Starrfelt et al., 2009). Moreover, we suggest that our results, that the VWFA responds to familiar stimuli in groups, may explain some of the discrepancies in the literature. As detailed in Vogel et al. (2012b), studies that demonstrate increased or specific activity for words relative to non-word stimuli typically rely on implicit or low level processing tasks (Cohen et al., 2002; Baker et al., 2007; Vinckier et al., 2007). In contrast, studies that demonstrate more activity for non-words, consonants, or symbols rely on tasks with increased processing demands (Tagamets et al., 2000; Xue et al., 2006; Xue and Poldrack, 2007; Mei et al., 2010; Van Doren et al., 2010). An important study by Brem et al. (2010) demonstrated increased activity for words in the N150 ERP response, but no corresponding increase in BOLD activity for words during an attention demanding task. All together, these results point to faster, specialized processing for words in the VWFA based on grouped processing of these familiar stimuli. 
However, they also demonstrate that when required to attend to non-word or even non-letter stimuli, the VWFA is also active, though likely with a slower timecourse.

While we were the first to specifically address the RSFC of the VWFA, our results can also be viewed in the context of other studies of functional connectivity, RSFC, and a recent study of structural connectivity using diffusion tensor imaging (DTI). Wang et al. (2011) described the functional relatedness of the VWFA with other parts of the brain in a visual matching task of familiar and unfamiliar stimuli. The authors demonstrated that in a visual matching task the VWFA is strongly related to the same regions of parietal cortex involved in visual attention that we see in our RSFC analysis. Additionally, Koyama et al. (2010) studied the RSFC of predefined regions thought to be involved in reading. While addressing the RSFC of the VWFA was not the foremost goal of this study, a visual inspection of the VWFA seed maps presented in the manuscript shows similar results to our analysis (Vogel et al., 2012a). Finally, a recent analysis of structural connectivity using DTI demonstrated a relatively underappreciated white matter tract connecting the ventral occipital cortex near the VWFA with parietal cortex (Yeatman et al., 2013), likely in the vicinity of some of the inferior parietal lobe regions thought to be involved in visual attention.

Moreover, our results indicating the VWFA is related to other regions involved in attention processing, influencing its ability to process visual stimuli in groups, is consistent with a growing body ofliterature addressing the role of visual attentionin fluent reading. VWFA activityinfMRI taskswasfound to be related to reading skill in dyslexic children and adults in a meta-analysis by Richlan et al. (2011). Reading performance is predicted by visual attention span (Pammer et al., 2004). Furthermore, a subset of dyslexic children Vogel et al. The VWFA is not just for words

to making the projects described in this manuscript possible, including Dr. Jessica Church for her many hours of discussion and thoughtful comments on the topics discussed here, as well as her assistance with the initial analyses. We would also like to thanks Dr. John Pruett, Dr. Christina Lessov-Schlaggar, Dr. Deanna Barch, Dr. Tamara Hershey, and Dr. Judy Liu for the use of their data in the original manuscripts, Jonathan Power and Fran Miezin for their assistance in the analyses performed in the initial manuscripts, and Dr. Steven Nelson, Dr. Joe Dubis, Rebecca Coalson, Kelly McVey, and Rebecca Lepore for their assistance in the original data collection.

#### **REFERENCES**


have a reduced visual attention span (see Valdois et al., 2004; Vidyasagar and Pammer, 2010 for reviews). These dyslexic children show deficits in simultaneous processing of consonant strings (Lassus-Sangosse et al., 2008) and meaningless non-alphanumeric strings (Lobier et al., 2012). Dyslexic adults with deficits in visual attention span also have decreased activation of both ventral occipital areas in the vicinity of the VWFA and parietal areas in multi-element alphanumeric and non-alphanumeric processing tasks (Reilhac et al., 2013). Finally, there is decreased task based connectivity between the VWFA and parietal regions in dyslexic children (van der Mark et al., 2011). Together, these results emphasize the role of the VWFA in processing visually complex stimuli of multiple types in groups, as well as emphasizing the importance of the relationship between this region and others of the dorsal attention network as detailed here.

#### **PROCESSING CHARACTERISTICS vs. STIMULUS SPECIFICITY**

One of the major themes of the work presented here is an emphasis on defining information processing characteristics of regions rather than defining regions based on stimulus specificity. We believe this mindset is essential to understanding the brain, though we acknowledge determining how to best implement such a mindset is still up for debate. We suggest a reasoned approach is to examine past work and determine across sub-fields what kinds of stimuli or tasks are known to drive activity in a region, to look at what type of information processing, or stimulus transformations, are common across those tasks or particularly salient in those tasks. Lesion studies can be used as an adjunct to better understand what functions are disrupted when the processing done in a given injured region must be subsumed or circumvented by other parts of the brain. Finally, knowing the structural and functional connectivity of a region, what parts of the brain feed information into it, where it passes that information on to and what regions may have mediating effects on its processing, both in specific tasks and across a collective history of tasks, allow for further refinement of the types of information processing that could be carried out in a specific region. Lastly, it is useful to think of what types of processes can conceivably be carried out by a set of neurons (i.e., how could neurons reasonably represent a given stimulus or perform a given transformation or task).

This method has been very informative in our studies of the VWFA, and we argue it should be generally useful in studying the processing properties of regions across the brain. If brain regions are truly thought of as a set of neurons with given inputs and outputs, with an intrinsic organization constraining the processes performed, it becomes important to define those processing characteristics rather than limiting the consideration to the general stimulus class or task type that activates the region.

#### **THE VWFA IN ONLY PART OF THE VENTRAL OCCIPITAL-TEMPORAL CORTEX**

While this review has focused on the VWFA, the VWFA is only a single region within the left OT cortex. Our group has used RSFC analyses to define the complex organization of other part parts of the brain (Cohen et al., 2008), particularly parietal cortex (Nelson et al., 2010; Barnes et al., 2012). There is increasing evidence that the organization of the left OT cortex may be even more complicated. Prior functional analyses have demonstrated that there is a gradient in activation for word-like stimuli (Vinckier et al., 2007). In addition to the visual processing described here, other studies have demonstrated effects of abstract processing or memory and semantic processing, especially in more anterior portions of the left OT, and effects of attention and cue related activity, especially in more posterior portions of the left OT (Leonards et al., 2000; Corbetta and Shulman, 2002; Egner et al., 2008; Fairhall et al., 2009). Our voxel-wise RSFC network analyses also demonstrate a complex organization of the left OT (Power et al., 2011). A number of communities are represented, including visual, dorsal attention, and fronto-parietal communities in a posterior to anterior gradient. A better understanding of this complex organization should lead to a better understanding of the processing performed in each regional component. Hopefully, a better understanding of the function of these components and their connections will also help illuminate the role or roles of the left OT in reading.

Additionally, we should note that we used a group-based analysis in our studies. We believe group-based studies are the most reliable for studying the information processing properties of brain regions, as they allow for enough data to compare the individually defined timecourses elicited by various stimulus and task manipulations without requiring those timecourses to be fit to a model, and do not fall victim to the difficulty of correcting for multiple comparisons across each voxel of the brain. However, it is conceivable that by averaging the timecourses of multiple individuals one may "drown out" very small regions that are truly reading specific in this complicated landscape. Hopefully, as discussed above, a better understanding of the complex organization of the occipital-temporal cortex will be possible at a finer level of detail not only for RSFC studies, but also for functional studies.

# **CONCLUSIONS**

In sum, our recent research on the VWFA indicates that it is not specifically or even predominantly used for reading. Rather, the VWFA is a general-use region whose processing properties make it particularly useful for reading, though it continues to be used in any task that requires those general processing properties. Conceptualizing the VWFA as a brain region with specific processing characteristics, rather than a brain region devoted to a specific stimulus class, allows us to better explain the activity seen in this region during a variety of tasks. It also provides an explanation of function that is in keeping with the long history of studying the brain in terms of what type of information processing is performed (Posner, 1978).

# **ACKNOWLEDGMENTS**

This project was supported by the National Institutes of Health (grant numbers: NS0534425 to Bradley L. Schlaggar, HD057076 to Bradley L. Schlaggar, NS61144 to Steven E. Petersen, and NS6144 to Steven E. Petersen), the National Science Foundation (IGERT grant number 0548890 to Alecia C. Vogel), and the Intellectual and Developmental Disabilities Research Center at Washington University (NIH/NICHD P30 HD062171). We would also like to thank the many people whose work contributed

While we were the first to specifically address the RSFC of the VWFA, our results can also be viewed in the context of other studies of functional connectivity, RSFC, and a recent study of structural connectivity using diffusion tensor imaging (DTI). Wang et al. (2011) described the functional relatedness of the VWFA with other parts of the brain in a visual matching task of familiar and unfamiliar stimuli. The authors demonstrated that in a visual matching task the VWFA is strongly related to the same regions of parietal cortex involved in visual attention that we see in our RSFC analysis. Additionally, Koyama et al. (2010) studied the RSFC of predefined regions thought to be involved in reading. While addressing the RSFC of the VWFA was not the foremost goal of this study, a visual inspection of the VWFA seed maps presented in the manuscript shows results similar to those of our analysis (Vogel et al., 2012b). Finally, a recent analysis of structural connectivity using DTI demonstrated a relatively underappreciated white matter tract connecting the ventral occipital cortex near the VWFA with parietal cortex (Yeatman et al., 2013), likely in the vicinity of some of the inferior parietal lobe regions thought to be involved in visual attention.

Moreover, our results indicating the VWFA is related to other regions involved in attention processing, influencing its ability to process visual stimuli in groups, are consistent with a growing body of literature addressing the role of visual attention in fluent reading. VWFA activity in fMRI tasks was found to be related to reading skill in dyslexic children and adults in a meta-analysis by Richlan et al. (2011). Reading performance is predicted by visual attention span (Pammer et al., 2004). Furthermore, a subset of dyslexic children

# fMRI evidence for the interaction between orthography and phonology in reading Chinese compound words

# *Jiayu Zhan<sup>1,2</sup>, Hongbo Yu<sup>1,2</sup> and Xiaolin Zhou<sup>1,2,3,4,5</sup>\**

*<sup>1</sup> Department of Psychology, Peking University, Beijing, China*

*<sup>3</sup> Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, China*

*<sup>4</sup> Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing, China*

*<sup>5</sup> PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China*

#### *Edited by:*

*Gui Xue, Beijing Normal University, China*

#### *Reviewed by:*

*Suiping Wang, South China Normal University, China; Leilei Mei, University of California Irvine, USA*

#### *\*Correspondence:*

*Xiaolin Zhou, Department of Psychology, Peking University, 5 Yiheyuan Road, Beijing 100871, China e-mail: xz104@pku.edu.cn*

Compound words make up a major part of modern Chinese vocabulary. Behavioral studies have demonstrated that access to lexical semantics of compound words is driven by the interaction between orthographic and phonological information. However, little is known about the neural underpinnings of compound word processing. In this functional magnetic resonance imaging study, we asked participants to perform lexical decisions to pseudohomophones, which were constructed by replacing one or both constituents of two-character compound words with orthographically dissimilar homophonic characters. Mixed pseudohomophones, which shared the first constituent with the base words, were more difficult to reject than non-pseudohomophone non-words. This effect was accompanied by the increased activation of bilateral inferior frontal gyrus (IFG), left inferior parietal lobule (IPL), and left angular gyrus. The pure pseudohomophones, which shared no constituent with their base words, were rejected as quickly as non-word controls and did not elicit any significant neural activation. The effective connectivity of a phonological pathway from left IPL to left IFG was enhanced for the mixed pseudohomophones but not for pure pseudohomophones. These findings demonstrate that phonological activation alone, as in the case of the pure pseudohomophones, is not sufficient to drive access to lexical representations of compound words, and that orthographic information interacts with phonology, playing a gating role in the recognition of Chinese compound words.

**Keywords: compound word, pseudohomophone, reading, lexical processing, Chinese, fMRI**

# **INTRODUCTION**

Access to lexical semantics is a fundamental process in reading. It has been widely accepted that word meaning can be accessed in two ways. One way is through direct visual access, where visual features in the input are projected onto underlying orthographic representations in the lexicon, which are subsequently transformed directly into the activation of semantic properties. The other way is through a phonologically mediated process, wherein the orthographic input first activates lexical phonological representations and then activates semantic representations.

Psycholinguists disagree upon which route plays the predominant role in visual word recognition and to what extent the two pathways might be independent from each other (Coltheart, 1978; Van Orden, 1987; Seidenberg and McClelland, 1989; Van Orden et al., 1990; Coltheart et al., 1993; Plaut et al., 1996). Although it is widely accepted that phonological mediation plays a predominant role in accessing lexical semantics in reading alphabetic scripts (Frost, 1998), answers to these questions are more divergent for the Chinese logographic writing system. Unlike the alphabetical system, the basic meaningful units in the logographic system are characters, each of which corresponds to one morpheme and one syllable. However, given the limited number of syllables in the language, many morphemes or characters are homophonic, and these characters may or may not share orthographic features. Thus, a character's pronunciation (i.e., syllabic representation) cannot be used to uniquely identify the meaning of the corresponding morpheme, reducing the efficiency of computation from orthographic input to semantic representation via phonological mediation.

Two contrasting views have been proposed for how lexical semantics is accessed in reading Chinese characters. One view, the "universal phonological principle," is based on the idea that the phonological mediation plays the same predominant role in reading Chinese as it does with alphabetic scripts (Perfetti et al., 1992). However, the empirical findings in support of this view have proven difficult to replicate (see, for example, Chen and Shu, 2001; Xie and Zhou, 2003). The alternative view postulates that access to lexical semantics in Chinese is constrained by both phonology and orthography operating in interaction with each other, and that phonology has no inherently privileged role over orthography in driving semantic activation (Zhou and Marslen-Wilson, 1999, 2000).

Questions regarding the pathways to lexical semantics can also be asked for compound words, which consist of two or more constituent characters (morphemes) and which make up more than seventy percent of modern Chinese vocabulary (Institute of Language Teaching and Research, 1986). Taking advantage of the pseudohomophone effect in lexical decision, Zhou and Marslen-Wilson (2009) demonstrated that lexical access in reading

"fnhum-07-00753" — 2013/11/21 — 11:48 — page 1 — #1

*<sup>2</sup> Center for Brain and Cognitive Sciences, Peking University, Beijing, China*

Chinese compound words cannot rely solely on the combinatorial phonological information of the constituent characters, and that orthographic information plays an indispensable role in accessing the semantics of compounds. Pseudohomophones are non-words (e.g., *brane*) that sound like real words but are written differently. It has consistently been observed that when making lexical decisions, participants need more time to reject the pseudohomophones than the matched non-pseudohomophone non-words (e.g., *brune*). This effect has been taken as evidence that the visual input of a pseudohomophone activates the corresponding phonological representation (e.g., /brein/), and this activation automatically spreads to all semantic representations corresponding to this phonological representation, including the one corresponding to the base word of the pseudohomophone (e.g., *brain*). The semantic activation of the base word interferes with the processing system by slowing down the "no" decision.

Zhou and Marslen-Wilson (2009) created pseudohomophones in Chinese by replacing one (e.g., yan[2]ge[2]) or both (e.g., yan[2]ge[2]) constituents of two-character compound words (e.g., yan[2]ge[2], *strict*; the number in brackets indicates the tone of the syllable) with homophonic characters. They found that mixed pseudohomophones sharing either the first or second constituent with their base words were more difficult to reject than control non-words, but pure pseudohomophones sharing no constituents with their base words did not show this effect. Moreover, the pseudohomophone effect in lexical decision interacts with the frequency of the shared constituents of mixed pseudohomophones: while pseudohomophones based on high frequency words with high or low frequency characters and pseudohomophones based on low frequency words with high frequency characters were more difficult to reject than control non-words, pseudohomophones based on low frequency words with low frequency characters did not show a significant effect. The authors interpreted the pseudohomophone effect as reflecting the semantic activation of base words by both the orthographic and phonological information conveyed in mixed pseudohomophones. Taking these results together with other findings, Zhou and Marslen-Wilson (2009) argued that the direct mapping from orthography to semantics plays a dominant role in Chinese compound word recognition and that this pathway acts in an interactive manner with phonological information in driving semantic activation.

The main purpose of the current study was to examine the neural basis of the pseudohomophone effect and to provide further evidence for the interaction between orthography and phonology in reading Chinese compound words. To this end, we asked participants to carry out lexical decisions to mixed and pure pseudohomophones as in Zhou and Marslen-Wilson (2009) and measured their brain activity with functional magnetic resonance imaging (fMRI). Specifically, we attempted to find the brain regions underlying the pseudohomophone effects and to see whether or not these regions would show a pattern of activation parallel to the pattern of the behavioral finding, i.e., a significant pseudohomophone effect for mixed pseudohomophones and an absence of this effect for pure pseudohomophones. We will also relate our findings to those on the processing of single-character words and to those on alphabetic pseudohomophones.

Although there has been no neuroimaging study on how the processing of Chinese compound words is influenced by the processing of their constituent characters, previous research on Chinese characters may provide some clues as to where we might expect to find activations for different processes involved in the recognition of compound words. Studies explicitly asking participants to carry out phonological tasks, such as rhyme judgment (Xue et al., 2005), homophone judgment (Kuo et al., 2004; Dong et al., 2005), and naming (Kuo et al., 2003; Lee et al., 2004; Liu et al., 2006), have consistently implicated left inferior parietal lobule (IPL) and the dorsal part of the left inferior frontal gyrus (IFG). The left IPL is sensitive to the conflict between orthographic and phonological information, as in naming inconsistent characters (i.e., characters that share the same phonetic radical but are pronounced differently; Lee et al., 2004). Since the resolution of such conflicts relies on the extraction of the relationship between orthographic and phonological forms of the characters, it has been suggested that this region is associated with transformation and integration between orthography and phonology (Booth et al., 2002, 2006; Newman and Joanisse, 2011). Given that phonological information is automatically activated by orthographic input (Zhou and Marslen-Wilson, 2000) and that this activation may interact with orthographic information to constrain access to lexical semantics (Zhou and Marslen-Wilson, 1999, 2009), we expected to observe IPL activation for mixed pseudohomophones, relative to their non-word controls, but not for pure pseudohomophones.

Previous neuroimaging studies have also examined neural correlates of semantic processes in character processing by using tasks such as semantic relatedness judgment (Tan et al., 2001; Dong et al., 2005; Xue et al., 2005). These studies showed the activation of the left angular gyrus and the left IFG. In a recent study, Chou et al. (2009) manipulated the strength of semantic association between two characters and found that the stronger semantic association elicited greater activation in the left angular gyrus and that the weaker semantic association elicited greater activation in the left IFG during semantic relatedness judgment. The authors suggested that activation of the angular gyrus is driven by the overlapping semantic features shared by the character pairs, whereas activation of the left IFG reflects the effortful retrieval and selection of appropriate semantic features. Given that lexical decision to Chinese compound words and non-words is mainly based on semantic activation (Zhou et al., 1999; Zhou and Marslen-Wilson, 2009), and given that IFG is involved in a number of different processes in language comprehension, we expected to observe the activation of both the left angular gyrus and the left IFG only for mixed pseudohomophones, relative to control non-words, and not for pure pseudohomophones (assuming that the pure pseudohomophones are not sufficient to activate the semantic representation of the base words).

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Nineteen undergraduate and graduate students (10 males, mean age 22) participated in our experiment. They were native speakers of Chinese and were right-handed as assessed by the Chinese Handedness Questionnaire (Li, 1983). They had normal or corrected-to-normal vision, and none of them reported a history of neurological or psychiatric disorder. Written informed consent was obtained from all participants prior to scanning. This study was approved by the Ethics Committee of the Department of Psychology at Peking University. In the final data analysis, three participants (one male and two females) were excluded, one for chance-level accuracy in lexical decision and two for excessive head movements.

#### **STIMULI AND PROCEDURES**

A total of 120 two-character compound words were chosen as base words. All of these words, like most compound words in Chinese, were phonologically unambiguous. Here, phonologically unambiguous means that no other compound words have the same phonological forms as the base words used in this study. The mean frequency of the base words was 151 per million. The average character frequencies were 756 per million for the first constituent character and 615 per million for the second constituent character.

Two types of pseudohomophones were created according to whether the second constituents ("mixed") or both constituents ("pure") of the base words were replaced with orthographically dissimilar homophonic characters. We did not include pseudohomophones that were created by replacing the first characters of the base words, because this type of pseudohomophones showed the same pattern of effects as the mixed pseudohomophones used here (Zhou and Marslen-Wilson, 2009). Control non-words were created by recombining the first and second constituents of pseudohomophones. In other words, the pseudohomophones and the corresponding control non-words used the same set of characters, although the mixed and the pure pseudohomophones differed in their initial characters. Examples of pseudohomophones and their controls derived from the base words are presented in **Table 1**. Properties of the constituent morphemes of pseudohomophones (and the corresponding control non-words) are summarized in **Table 2**. These properties included the average character frequency (per million), visual complexity (in terms of the number of strokes per character), and the average "productivity," which indexed the number of compound words that contained the characters as constituents.

*Note: Pseudohomophones in the table are derived from the base words (yan[2]ge[2], strict) and (fan[4]wei[2], scope). The first characters in the "mixed" group are also the first characters of base words.*

The critical stimuli were assigned to four test versions using a Latin square design. Each version was composed of 60 pseudohomophones and 60 control non-words; half of each type were from the "mixed" group and the other half were from the "pure" group. Pseudohomophones and control non-words created from the same base words were split into different versions. Each version additionally had 120 filler words that were the same across the four versions. Each participant received one version in which pseudohomophones, control non-words, and word fillers were presented in a pseudo-random order (with the restriction that no more than three consecutive trials were from the same category). In each trial, participants were asked to decide as quickly and as accurately as possible, by pressing the "yes" or "no" button, whether the two characters presented on the screen formed a real word or not. For half the participants, the "yes" button was pressed by the right thumb and the "no" button by the left thumb; for the other half, the mappings between fingers and buttons were reversed.
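The Latin square assignment described above can be sketched as a simple rotation of conditions over base words. This is a minimal illustration, not the authors' actual procedure: the condition labels and the rotation rule are assumptions, as is the premise that each of the 120 base words contributes exactly one item to each version.

```python
# Hypothetical condition labels; the source names the stimulus types but not these identifiers.
CONDITIONS = [
    "mixed_pseudohomophone",
    "mixed_control",
    "pure_pseudohomophone",
    "pure_control",
]
N_BASE_WORDS = 120
N_VERSIONS = 4


def build_versions(n_base_words=N_BASE_WORDS, n_versions=N_VERSIONS):
    """Rotate conditions across base words so each version uses every base word once."""
    versions = []
    for v in range(n_versions):
        # Assumed rule: base word i is shown in condition (i + v) mod 4 in version v.
        assignment = {
            i: CONDITIONS[(i + v) % len(CONDITIONS)] for i in range(n_base_words)
        }
        versions.append(assignment)
    return versions


def count_conditions(assignment):
    """Count how many base words fall in each condition within one version."""
    counts = {c: 0 for c in CONDITIONS}
    for condition in assignment.values():
        counts[condition] += 1
    return counts


versions = build_versions()
for number, assignment in enumerate(versions, start=1):
    print(number, count_conditions(assignment))
```

Under this rotation every version holds 30 items per condition (60 pseudohomophones and 60 controls in total), and items derived from the same base word never appear together in one version.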

For each trial, a fixation sign ("+") was first presented at the center of the screen for 250 ms, followed by a 100 ms blank interval; the fixation sign was then presented again for another 250 ms, creating a flicker that could more firmly capture attention. A word or non-word, subtending a visual angle of about 2.5° horizontally and 1.25° vertically, was then presented for 400 ms for lexical decision. The interval between the disappearance of the stimulus and the appearance of the next fixation sign was randomized between 4000 and 6000 ms to improve the ability to detect regions of BOLD signal changes (Serences, 2004).

The 240 trials were scanned in one session, lasting about 25 min. A fixation sign was displayed at the beginning of the session for 10 s to allow the scanner to reach stability. Before entering the scanner, all the participants completed a practice session of 24 stimuli whose composition was similar to that of the formal test.
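As a rough consistency check on the trial structure, the session length can be recomputed from the timings given above. The sketch assumes that the jittered interval is uniformly distributed, so that its mean is 5000 ms; the source states only the range.

```python
FIXATION_MS = 250
BLANK_MS = 100
STIMULUS_MS = 400
ITI_MIN_MS, ITI_MAX_MS = 4000, 6000
N_TRIALS = 240
INITIAL_FIXATION_S = 10


def trial_duration_ms(iti_ms):
    """Fixation, blank, fixation again, stimulus, then the jittered interval."""
    return FIXATION_MS + BLANK_MS + FIXATION_MS + STIMULUS_MS + iti_ms


def session_minutes(mean_iti_ms):
    """Total session length in minutes, including the initial fixation period."""
    total_ms = N_TRIALS * trial_duration_ms(mean_iti_ms) + INITIAL_FIXATION_S * 1000
    return total_ms / 60000


mean_iti_ms = (ITI_MIN_MS + ITI_MAX_MS) / 2  # assumes a uniform jitter
print(trial_duration_ms(ITI_MIN_MS), trial_duration_ms(ITI_MAX_MS))
print(round(session_minutes(mean_iti_ms), 2))
```

With these assumptions a trial lasts between 5 and 7 s and the session comes to a little over 24 min, in line with the reported duration of about 25 min.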

#### **fMRI DATA ACQUISITION AND ANALYSIS**

Functional images were acquired on a 3-T Siemens Trio system at the Institute of Biophysics, Chinese Academy of Sciences, using a T2\*-weighted echo planar imaging (EPI) sequence, with 2 s repetition time, 30 ms echo time, and 90° flip angle. Each image consisted of 32 axial slices covering the whole brain. Slice thickness was 3 mm and inter-slice gap was 0.75 mm, with a 220 mm field of view, 64 × 64 matrix, and 3 mm × 3 mm × 3 mm voxel size.

Data were pre-processed with Statistical Parametric Mapping (SPM) software SPM8 (Wellcome Department of Imaging Neuroscience, London, http://www.fil.ion.ucl.ac.uk). The first five volumes were discarded to allow stabilization of magnetization. Images were realigned to the sixth volume to correct for head movement. Only participants whose head movements did not exceed 3 mm were included in the final data analysis. A temporal high-pass filter with a cutoff frequency of 1/128 Hz was used to remove low-frequency drifts in the fMRI time series, and the images were smoothed with a Gaussian kernel of 8 mm full-width half-maximum (FWHM).
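To make the smoothing and filtering parameters concrete, the sketch below converts the 8 mm FWHM into the standard deviation of the corresponding Gaussian and expresses the 128 s high-pass cutoff in scans. It uses only the standard FWHM relation and the acquisition values stated above; it does not reproduce SPM's own implementation.

```python
import math

FWHM_MM = 8.0          # smoothing kernel from the text
VOXEL_MM = 3.0         # stated voxel size
TR_S = 2.0             # repetition time
CUTOFF_PERIOD_S = 128  # period corresponding to the 1/128 Hz cutoff


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(distance, sigma):
    """Unnormalised Gaussian weight at a given distance from the kernel centre."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


sigma_mm = fwhm_to_sigma(FWHM_MM)
sigma_voxels = sigma_mm / VOXEL_MM
cutoff_in_scans = CUTOFF_PERIOD_S / TR_S

print(round(sigma_mm, 3), round(sigma_voxels, 3), cutoff_in_scans)
# By definition the weight drops to one half at a distance of FWHM / 2.
print(gaussian_weight(FWHM_MM / 2, sigma_mm))
```

In other words, the kernel's standard deviation is a little over one voxel, and signal components slower than one cycle per 64 scans are treated as drift.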

#### **Table 2 | Properties of stimuli.**

*Number of strokes measures the visual complexity of the character; total productivity refers to the number of words that contain this character as a constituent.*

Statistical analysis was based on the general linear model (GLM). The hemodynamic response to each event was modeled with a canonical hemodynamic response function (HRF) with its temporal derivative. We defined seven regressors: four corresponded to the correctly judged trials in the four conditions (regressors of interest), one corresponded to the correctly judged trials for filler words, one corresponded to the incorrectly judged trials and outliers, and one corresponded to the button press. The six rigid-body parameters were also included to correct for head motion artifacts. The onset of the critical regressors was set to the appearance of the pairs of characters. We rendered the SPMs at an uncorrected voxel threshold of *p* < 0.001 and report maxima with a cluster size of *p* < 0.05 corrected for multiple comparisons and adjusted for the entire brain, unless otherwise stated. We conducted spatially restricted region of interest (ROI) analysis using anatomically defined ROI masks based on the automatic anatomical labeling (AAL) system (Maldjian et al., 2003) with voxel threshold *p* < 0.05 (FWE-corrected) and cluster size threshold of 20 voxels.
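The event-related regressors described above can be illustrated by convolving stimulus onsets with a double-gamma response function. The sketch below is an approximation written for illustration only: the shape parameters (a peak component of shape 6, an undershoot component of shape 16, and a ratio of 6) are assumed values of a commonly used canonical form, the onsets are hypothetical, and the temporal derivative term is omitted.

```python
import math

TR_S = 2.0  # repetition time from the acquisition section


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=6.0):
    """Positive response minus a scaled, later undershoot (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - gamma_pdf(t, undershoot_shape) / ratio


def build_regressor(onsets_s, n_scans, tr_s=TR_S, dt=0.1, hrf_length_s=32.0):
    """Convolve a stick function at the onsets with the HRF, sampled once per scan."""
    n_fine = int(round(n_scans * tr_s / dt))
    sticks = [0.0] * n_fine
    for onset in onsets_s:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0
    kernel = [double_gamma_hrf(k * dt) for k in range(int(round(hrf_length_s / dt)))]
    fine = [0.0] * n_fine
    for i, stick in enumerate(sticks):
        if stick == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k < n_fine:
                fine[i + k] += stick * h
    step = int(round(tr_s / dt))
    return [fine[j * step] for j in range(n_scans)]


# Hypothetical onsets (seconds) for one condition; real onsets come from the stimulus log.
example_onsets = [10.0, 16.5, 23.0]
regressor = build_regressor(example_onsets, n_scans=30)
print([round(value, 3) for value in regressor])
```

One such column per condition, together with the motion parameters, would form the design matrix against which each voxel's time series is regressed.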

Effective connectivity analysis was performed using the Dynamic Causal Modeling tool in SPM. Bilinear DCM, which was used in this study, is characterized by three sets of parameters (Friston et al., 2003): (1) the "intrinsic" connectivity representing the latent connectivity between brain regions in the absence of experimental perturbations; (2) the "modulatory" connectivity representing the changes imposed on the intrinsic connectivity by experimental perturbations; and (3) the "input" representing the driving influence on brain regions by external perturbations. Since we were interested in seeing whether the semantic representation can be accessed through a phonologically mediated route (as the strong phonological view argues) or through interaction between orthographic and phonological information, the model was restricted to the phonology- and semantics-related regions activated for the main effect of pseudohomophones (i.e., IPL, MNI coordinates: −46, −46, 44; IFG, MNI coordinates: −46, 8, 22; see Results). Specifically, we examined whether the activity of this network was modulated by the orthographic information carried by the pseudohomophones (i.e., the type of pseudohomophones). For each volume of interest (VOI), a time series was extracted as the first principal component of all voxel time series within a sphere (radius 4 mm) centered on the group maximum. We constructed and compared four models, which had the same input region (i.e., the left IPL) and intrinsic connectivity pattern (bidirectional connectivity between IFG and IPL) but differed in the way in which the experimental manipulations (i.e., mixed vs. pure pseudohomophone conditions) modulated the connectivity. We chose the left IPL as the input region since it is implicated in orthography-to-phonology mapping.
For Model 1 and Model 2, the modulatory effects were exerted on the IPL-to-IFG intrinsic connectivity, with only the mixed (Model 1) or both mixed and pure pseudohomophones (Model 2) as the modulatory factors. For Model 3 and Model 4, the modulatory effects were exerted on the IFG-to-IPL intrinsic connectivity, with only the mixed (Model 3) or both mixed and pure pseudohomophones (Model 4) as the modulatory factors. The four models were compared using random-effect Bayesian Model Selection (BMS; Penny et al., 2004; Stephan et al., 2009), by which the "exceedance probability" (the probability of each model being more likely than any other model) of each model was calculated. Effective connectivity strength was estimated based on the model with the highest exceedance probability (i.e., the winning model).
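For orientation, the four models can be written out in the general bilinear form of Friston et al. (2003). The symbols below are labels introduced here for illustration: regions are ordered (IPL, IFG), entry (i, j) of a matrix denotes the influence of region j on region i, and both stimulus types are assumed to enter as driving inputs to the IPL.

```latex
\dot{z} = \Bigl(A + \sum_{j \in \{\mathrm{mixed},\,\mathrm{pure}\}} u_j\, B^{(j)}\Bigr) z + C u,
\qquad
z = \begin{pmatrix} z_{\mathrm{IPL}} \\ z_{\mathrm{IFG}} \end{pmatrix},
\quad
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},
\quad
C = \begin{pmatrix} c_{\mathrm{mixed}} & c_{\mathrm{pure}} \\ 0 & 0 \end{pmatrix}

% Modulation of the IPL-to-IFG connection (Models 1 and 2)
\text{Model 1: } B^{(\mathrm{mixed})} = \begin{pmatrix} 0 & 0 \\ b_{21} & 0 \end{pmatrix},\; B^{(\mathrm{pure})} = 0
\qquad
\text{Model 2: } B^{(\mathrm{mixed})} = \begin{pmatrix} 0 & 0 \\ b_{21} & 0 \end{pmatrix},\; B^{(\mathrm{pure})} = \begin{pmatrix} 0 & 0 \\ b'_{21} & 0 \end{pmatrix}

% Modulation of the IFG-to-IPL connection (Models 3 and 4)
\text{Model 3: } B^{(\mathrm{mixed})} = \begin{pmatrix} 0 & b_{12} \\ 0 & 0 \end{pmatrix},\; B^{(\mathrm{pure})} = 0
\qquad
\text{Model 4: } B^{(\mathrm{mixed})} = \begin{pmatrix} 0 & b_{12} \\ 0 & 0 \end{pmatrix},\; B^{(\mathrm{pure})} = \begin{pmatrix} 0 & b'_{12} \\ 0 & 0 \end{pmatrix}
```

Here A holds the intrinsic connections, each B matrix holds the change in a connection induced by one condition, and C holds the driving inputs, so the models differ only in which entries of B are allowed to be non-zero.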

# **RESULTS**

#### **BEHAVIORAL RESULTS**

**Figure 1** shows the mean response times (RTs) for the final 16 participants (4 for each version of stimuli) based on correct, untrimmed responses. Repeated-measures ANOVAs were conducted separately for RTs and error rates, with both stimulus type (pseudohomophone vs. control) and stimulus group (mixed vs. pure) as within-participant factors. For RTs, the main effect of stimulus type was not significant, although the main effect of stimulus group was, *F*(1,15) = 20.917, *p* < 0.001, with RTs in the mixed group significantly longer than RTs in the pure group. Importantly, the interaction between stimulus type and stimulus group was marginally significant, *F*(1,15) = 3.96, *p* = 0.06, such that RTs for mixed pseudohomophones (mean = 692 ms, SD = 161 ms) were significantly longer than RTs for the controls (667 ± 167 ms), *t*(15) = 2.74, *p* < 0.05, whereas no difference was found between the pure pseudohomophones (658 ± 167 ms) and their controls (654 ± 176 ms), *t*(15) = 0.43, *p* > 0.1. For error rates, there was a significant main effect of stimulus type, *F*(1,15) = 8.94, *p* < 0.01, and a significant main effect of stimulus group, *F*(1,15) = 21.42, *p* < 0.001, indicating that, in general, participants made more errors on pseudohomophones than on control non-words and more errors in the mixed group than in the pure group.

**FIGURE 1 | Mean response times of lexical decisions to mixed and pure pseudohomophones and their respective controls.** \**p* < 0.05.
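Because every effect in this 2 × 2 within-participant design has one numerator degree of freedom, each reported *F*(1,15) equals the square of a *t* statistic with 15 degrees of freedom. The sketch below uses that relation to recompute two-tailed *p*-values by numerically integrating the *t* density. It is an independent check written for illustration, not the authors' analysis code.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p_from_t(t_value, df, n_steps=2000):
    """Two-tailed p-value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t_value)
    h = upper / n_steps  # n_steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * (h / 3) * total  # probability that |T| <= |t|
    return 1 - central_mass


def p_from_f_1_df(f_value, df_error):
    """With one numerator degree of freedom, F equals t squared."""
    return two_tailed_p_from_t(math.sqrt(f_value), df_error)


# Interaction on response times, then the mixed pseudohomophone vs. control comparison.
print(round(p_from_f_1_df(3.96, 15), 3))
print(round(two_tailed_p_from_t(2.74, 15), 3))
```

The interaction falls between the 0.05 and 0.10 levels, while the simple effect for mixed pseudohomophones is below 0.05, which matches the pattern reported above.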

#### **fMRI RESULTS**

#### *General linear model analysis*

We first identified the brain regions involved in the main effect of pseudohomophones (collapsed over the Mixed and Pure groups). Compared with control non-words, reading pseudohomophones invoked greater activity in the bilateral IFG (left BA44 and right BA 44/45/48), left IPL (BA40), and left angular gyrus (BA7; **Figure 2A**).

As we were interested in the differential activations associated with different types of pseudohomophones, we contrasted mixed and pure pseudohomophones with their controls respectively.

Compared with controls, mixed pseudohomophones activated the bilateral IFG and left angular gyrus, as well as the left insula and the middle and posterior cingulate cortex (**Figure 2B**). The activation of left IPL, however, failed to reach the statistical threshold in the whole-brain analysis after separating the two types of pseudohomophones. Since the left IPL has been consistently implicated in phonological processing and may play an important role in accessing lexical representations of the base words in this study, we conducted a spatially restricted analysis of this region using anatomically defined ROI masks based on the AAL system. A significant cluster was found for the contrast "mixed pseudohomophones > controls" within the ROI (MNI: −48, −42, 44; *k* = 476). For pure pseudohomophones, no brain regions reached the cluster level threshold.

We also examined the interaction of stimulus type (pseudohomophone vs. non-word) by stimulus group (mixed vs. pure) by conducting the contrast between "mixed pseudohomophones > controls" and "pure pseudohomophones > controls." This contrast revealed activations in the medial orbitofrontal cortex and anterior cingulate cortex (**Figure 2C**).

To show more detailed information concerning the activity in the regions revealed in the main contrast (pseudohomophone vs. non-word), we computed the average beta values of these regions and conducted ROI analysis. Each ROI was defined as a cube with a side length of 5 mm, centered at the maximum coordinates of a cluster listed in **Table 3**. **Figure 3** plots the beta values in these ROIs. Statistical tests showed significant main effects of stimulus type for all three regions, consistent with the main contrast: left IFG, *F*(1,15) = 8.31, *p* < 0.05; left IPL, *F*(1,15) = 8.75, *p* = 0.01; and left angular gyrus, *F*(1,15) = 17.42, *p* = 0.001. Importantly, the interaction between stimulus type and stimulus group was marginally significant for the left IPL, *F*(1,15) = 4.33, *p* = 0.055, and significant for the left IFG, *F*(1,15) = 5.16, *p* < 0.05. The same trend was also observed for the left angular gyrus, *F*(1,15) = 3.57, *p* = 0.078. Tests of simple effects showed that activations were significantly higher for the mixed pseudohomophones than for the controls: *t*(15) = 3.09, *p* < 0.01 for IFG; *t*(15) = 3.83, *p* < 0.05 for IPL; and *t*(15) = 3.96, *p* = 0.01 for angular gyrus. However, no differences were found between the pure pseudohomophones and their controls: *t*(15) = 1.57, *p* = 0.14 for IFG; *t*(15) = 0.71, *p* = 0.49 for IPL; and *t*(15) = 1.02, *p* = 0.32 for left angular gyrus. The pattern of effects in this ROI analysis is consistent with the findings in the above comparisons for the two types of pseudohomophones.

# *Effective connectivity analysis*

**Figures 4A–D** present four DCM models for the connectivity between the left IPL and the left IFG. BMS showed that Model 1 had an exceedance probability of 35.2%, which was greater than that of any other model (**Figure 4E**). The estimated connectivity strengths of Model 1 yielded the following results (**Table 4**; **Figure 4F**): the input to the left IPL by mixed pseudohomophones, but not pure pseudohomophones, was significantly greater than zero (*p* < 0.01, Bonferroni-corrected for multiple comparisons). The intrinsic connectivity from the left IPL to the left IFG, but not the other way around, was significantly larger than zero (*p* < 0.05, Bonferroni-corrected for multiple comparisons). Finally, the modulation of the intrinsic connectivity from the left IPL to the left IFG by the mixed pseudohomophones was positive and significantly greater than zero (*p* < 0.05).

*[Table fragment: peak statistics and coordinates for clusters including the MCC, PCC, IPL, and angular gyrus; the row and column layout is not recoverable from the source.]*
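For readers unfamiliar with exceedance probabilities, the sketch below shows one common way to estimate them: sample model frequencies from a Dirichlet posterior and count how often each model has the largest frequency. The Dirichlet parameters, sample count, and function name are hypothetical and do not reproduce the values reported here. It is a Monte Carlo illustration under those assumptions, not the routine used in the study.

```python
import random


def exceedance_probabilities(alpha, n_samples=20000, seed=0):
    """Estimate P(model k is the most frequent) under Dirichlet(alpha).

    A Dirichlet draw equals independent Gamma(alpha_k, 1) draws divided by
    their sum; the normalization does not change which entry is largest,
    so it is skipped here.
    """
    rng = random.Random(seed)
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


# Hypothetical Dirichlet parameters for four competing models.
alpha = [6.0, 4.5, 4.0, 3.5]
xp = exceedance_probabilities(alpha)
best_model = xp.index(max(xp)) + 1  # 1-based model index
```

The exceedance probabilities sum to one across models, so with four candidates a value of 35.2% can be the largest while still leaving substantial probability for the alternatives.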

# **DISCUSSION**

This study provides the first neural evidence for the processing of Chinese compound words. By applying a lexical decision task to pseudohomophones, we demonstrated that, compared with control non-words, only pseudohomophones that share one constituent with their corresponding base words were more difficult to reject, and that this effect was found in language-related brain regions such as the left IFG, left IPL, and left angular gyrus. Pure pseudohomophones that had no orthographic similarity to the base words were no more difficult to reject and showed no significant brain activation compared with non-word controls. These results suggest that an interaction between orthography and phonology, rather than a predominant phonological mediation, is responsible for the semantic activation in reading Chinese compound words. The connectivity analysis further showed that the link between the left IPL and IFG is not simply phonological, as mixed pseudohomophones, but not pure pseudohomophones, enhanced the effective connectivity between these two brain areas.

The left IPL has been shown to be involved in phonological processing both in reading alphabetic scripts (e.g., Gold and Buckner, 2002; Gold et al., 2005) and in reading Chinese (e.g., Booth et al., 2006; Liu et al., 2009). In the current study, the left IPL was activated only by the mixed, not the pure, pseudohomophones, indicating that orthographic input gates the neural processing of phonological information associated with compound words. The left IPL serves to construct the phonological representation of Chinese compound words by integrating the phonological and orthographic information of their constituent morphemes. This is in line with a previous finding that the left IPL is involved in lexical decision to orthographically similar English homophones (*dear* vs. *deer*), as compared with non-homophone control words (Newman and Joanisse, 2011). Disambiguating these homophones also requires the integration of orthographic and phonological information.

The dorsal part of the left IFG (BA44) has been found to serve as a control center that collaborates with posterior brain regions for phonological retrieval or selection (e.g., Poldrack et al., 1999; Burton et al., 2000; Fiebach et al., 2002; Gold and Buckner, 2002; McDermott et al., 2003; Gold et al., 2005; Joseph et al., 2006; Liu et al., 2009). Moreover, activation of this part of the left IFG has also been observed in tasks that require access to the meaning of words (Thompson-Schill et al., 1997; Wagner et al., 2001; Gold and Buckner, 2002; Gold et al., 2005; Xue et al., 2005; Chou et al., 2009). Given that the pseudohomophone effect in lexical decision to Chinese compounds reflects the semantic activation of base words (Zhou and Marslen-Wilson, 2009) and given the absence of IFG activation for pure pseudohomophones, we argue that the activation of the left IFG in responding to mixed pseudohomophones reflects the process of retrieving semantic properties of base words. This semantic processing is interactively supported by the phonological representation of the compounds (i.e., the combination of syllabic representations) and appropriate orthographic input (the character shared between the base word and the mixed pseudohomophone).

One could argue that the activation of the left IFG might serve as a top-down modulation of the activation in the left IPL (Cao et al., 2008). However, our effective connectivity analysis found only intrinsic connectivity from the left IPL to the left IFG, not the other way around, indicating a unidirectional impact of IPL activation on IFG activation. The connectivity from the left IPL to the left IFG has also been found in previous studies on word recognition (Levy et al., 2008, 2009) and rhyme judgment (Cao et al., 2008), indicating its role in phonological analysis. However, the fact that this connectivity was enhanced only by the mixed pseudohomophones, and not by the pure pseudohomophones, suggests that orthographic information plays a vital role in this "phonology-to-semantic network," at least for the processing of compound words. In other words, using phonological information to access lexical semantics relies on appropriate orthographic support. Thus, in reading Chinese compound words, orthographic and phonological information is integrated in the left IPL, and this integration is then projected to the left IFG for the retrieval of lexical semantics.

Previous neuroimaging studies demonstrated the involvement of the left angular gyrus in processing anomalous words embedded in sentences (Ni et al., 2000; Friederici et al., 2003; Newman et al., 2003), suggesting that it functions to integrate individual concepts into larger ones (Lau et al., 2008; Binder et al., 2009). Indeed, Binder et al. (2009) suggested that this region also functions as a communication hub where different types of intra-lexical information, such as orthography, phonology, and semantics, converge and interact. Zhou et al. (1999) suggested that, in the real-time processing of a Chinese compound word, both the semantic representation of the whole word and the semantic representations of its constituent morphemes are activated in parallel, and that the semantic activation of constituent morphemes can be consistent or in conflict with the activation of the whole word. It is plausible that the activation of the left angular gyrus for mixed pseudohomophones may reflect this parallel activation and integration. Further studies are needed to investigate systematically the neural basis of competition and collaboration between semantic activation of whole words and constituent morphemes.

Direct contrast between the pseudohomophone effects for the mixed and pure pseudohomophones (i.e., the interaction analysis) did not show any activation of language-related areas such as the left IPL, IFG, and angular gyrus, although the interaction *was* found in the ROI analysis for these regions. Instead, the whole-brain interaction analysis showed the activation of the anterior cingulate cortex and medial orbitofrontal gyrus. These regions have long been associated with conflict detection and cognitive control (van Veen et al., 2001; Milham et al., 2003; Stephan et al., 2003; Ye and Zhou, 2009). It is possible that a more effortful control process is needed when making "no" responses to mixed pseudohomophones when the corresponding base words are activated not only by the phonological and orthographic information associated with the pseudohomophones but also by the morphemic representation for the shared (the first constituent) morphemes (Zhou et al., 1999; Zhou and Marslen-Wilson, 2009). The activation of the right IFG for mixed pseudohomophones also reflects the involvement of cognitive control in processing the mixed pseudohomophones (Xue et al., 2008; Vigneau et al., 2011).
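The interaction tested here can be written as a difference of differences: the pseudohomophone effect for the mixed group minus the pseudohomophone effect for the pure group. The sketch below spells out the corresponding contrast weights for a single participant. The condition names and beta values are made up for illustration, and the function is a hypothetical helper rather than part of the reported analysis.

```python
def interaction_score(betas):
    """(mixed PsH - mixed control) - (pure PsH - pure control)."""
    weights = {
        "mixed_psh": 1,
        "mixed_ctrl": -1,
        "pure_psh": -1,
        "pure_ctrl": 1,
    }
    return sum(weights[name] * betas[name] for name in weights)


# Made-up betas for one participant in one region.
example = {"mixed_psh": 3.0, "mixed_ctrl": 1.0, "pure_psh": 2.0, "pure_ctrl": 2.0}
score = interaction_score(example)
```

A positive score means the pseudohomophone effect is larger for the mixed than for the pure stimuli; testing such scores across participants against zero corresponds to the interaction term of the 2 × 2 design.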

We need to consider an alternative account of the brain activations for mixed pseudohomophones. This account states that the difficulty in rejecting mixed pseudohomophones and the associated brain activations reflect the processing of orthographic information for the characters shared between the pseudohomophones and the base words; phonological information and its interaction with orthographic information play no role in processing the compound words. Although our experimental design does not allow us to rule out this account completely, the pattern of brain activations here and our unpublished behavioral data suggest that the pseudohomophone effect observed for mixed pseudohomophones was not driven purely by orthographic information. Compound non-words that were orthographically similar to the base words but shared no morpheme with the base words were only slightly more difficult to reject than non-words composed of randomly chosen characters. A large number of studies have also demonstrated that the processing of orthographic information alone occurs mainly in the occipitotemporal cortex (e.g., Kuo et al., 2004; Liu et al., 2008; Chan et al., 2009; Wang et al., 2011), whereas here we observed activations in parietal and frontal regions. Indeed, the orthographic properties of the pseudohomophones and control non-words were perfectly matched in this study, as the pseudohomophones and the corresponding non-word controls used the same sets of characters.

How do we relate the current findings to those for single-character words? As we reviewed previously, single-character words activate the left IPL in a variety of tasks that require explicit phonological processing (e.g., Booth et al., 2006; Liu et al., 2009). This phonological processing may take the form of linking the orthographic information directly with the syllabic representation. For compound words, however, in addition to activating syllabic representations for constituent morphemes, the co-occurrence information for the two constituent morphemes should also be activated (Zhou et al., 1999; Zhou and Marslen-Wilson, 2009). This activation of co-occurrence information in the left IPL, which plays a part in constructing phonological representations for the whole word by integrating the constituents' representations (i.e., syllables), must be supported by appropriate orthographic information, as only the mixed pseudohomophones, not the pure pseudohomophones, showed the IPL activation. Given that the activation of this co-occurrence information is modulated by the frequency of the whole words and the constituent morphemes (Zhou and Marslen-Wilson, 2009), an interesting issue for further studies is how the activation in the left IPL is affected by these factors.

In previous research, the left angular gyrus did not show activation for single-character words in tasks that might or might not activate semantic information (e.g., Kuo et al., 2003; Lee et al., 2004; Liu et al., 2006), but it was activated in a task judging the semantic relatedness between two characters (Chou et al., 2009). Here, for mixed pseudohomophones, we also observed left angular gyrus activation. It is possible that this region plays a general role in semantic integration, as the processing of compound words may need to evaluate and integrate the semantic properties of constituent morphemes and whole words (Zhou et al., 1999).

Finally, how do we relate the current findings to those for alphabetic pseudohomophones? Compared with control non-words sharing most letters with the pseudohomophones, English pseudohomophones activated the left IFG, precentral gyrus, and cingulate cortex in a lexical decision task (Newman and Joanisse, 2011; see also Edwards et al., 2005). The authors attributed the pseudohomophone effects observed in these regions to phonologically mediated activation of the base words. However, these authors did not make explicit that their pseudohomophones shared most orthographic information with the base words (i.e., these pseudohomophones were more similar to the mixed pseudohomophones than to the pure pseudohomophones in this study). In this study, we also observed left IFG activation for the mixed pseudohomophones. Importantly, we additionally observed left IPL and angular gyrus activation. As we argued earlier, this additional activation, together with the evident connectivity between the IPL and IFG, may indicate the role of orthographic information in the "phonologically mediated" semantic activation. The processing of orthographic information plays an indispensable role in reading logographic Chinese and accessing lexical semantics (Zhou and Marslen-Wilson, 1999, 2000, 2009; Zhou et al., 1999).

# **CONCLUSION**

By asking participants to carry out lexical decision to Chinese compound words and by introducing different types of pseudohomophones, we found a significant delay in rejecting mixed pseudohomophones and no such effect for pure pseudohomophones. Neurally, relative to non-word controls, mixed pseudohomophones activated the bilateral IFG, left IPL, left angular gyrus, and regions related to cognitive control; the processing of mixed pseudohomophones modulated the "phonological pathway" from the left IPL to the left IFG. Pure pseudohomophones showed no significant brain activation compared with their non-word controls. These findings support an interactive view according to which access to lexical semantics in reading logographic Chinese is driven by the interaction between orthographic and phonological information.

# **ACKNOWLEDGMENTS**

This study was supported by the National Basic Research Program of China (973 Program: 2010CB833904) and by grants from the Natural Science Foundation of China (30110972, 91232708) and the Social Science Foundation of China (12&ZD119).

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 July 2013; accepted: 20 October 2013; published online: 21 November 2013.*

*Citation: Zhan J, Yu H and Zhou X (2013) fMRI evidence for the interaction between orthography and phonology in reading Chinese compound words. Front. Hum. Neurosci. 7:753. doi: 10.3389/fnhum.2013.00753*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Zhan, Yu and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*