# UNDERSTANDING DEVELOPMENTAL DYSLEXIA: LINKING PERCEPTUAL AND COGNITIVE DEFICITS TO READING PROCESSES

EDITED BY: Pierluigi Zoccolotti, Peter F. de Jong and Donatella Spinelli

PUBLISHED IN: Frontiers in Human Neuroscience and Frontiers in Psychology

### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

> *The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-864-1 DOI 10.3389/978-2-88919-864-1

## About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

## Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

## What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## Topic Editors:

**Pierluigi Zoccolotti,** Sapienza University of Rome, Italy **Peter F. de Jong,** University of Amsterdam, Netherlands **Donatella Spinelli,** Università di Roma "Foro Italico", Italy

Understanding the mechanisms responsible for developmental dyslexia (DD) is a key challenge for researchers. A large literature, mostly concerned with learning to read in opaque orthographies, emphasizes phonological interpretations of the disturbance. Other approaches focused on the visual-perceptual aspects of orthographic coding. Recently, this perspective was supported by imaging data showing that individuals with DD have hypo-activation in occipito-temporal areas (a finding common to both transparent and opaque orthographies). Nevertheless, it is difficult to infer causal relationships from activation data. Accommodating these findings within the cognitive architecture of reading processes is still an open issue.

Figure by Donatella Spinelli

This is a general problem, which is present in much of the literature. For example, several studies investigating the perceptual and cognitive abilities that distinguish groups of children with and without DD failed to provide explicit links with the reading process. Thus, several areas of investigation (e.g., acoustic deficits or magnocellular deficiencies) have been plagued by replication failures. Furthermore, much research has neglected the possible contribution of comorbid symptoms. By contrast, it is now well established that developmental disorders present a large spectrum of homotopic and heterotopic co-morbidities that make causal interpretations problematic. This has led to the idea that the etiology of learning difficulties is multifactorial, thus challenging the traditional models of DD. Recent genetic studies provide information on the multiple risk factors that contribute to the genesis of the disturbance.

Another critical issue in DD is that much of the research has been conducted in English-speaking individuals. However, English is a highly irregular orthography and doubts have been raised about the appropriateness of automatically extending interpretations based on English to other more regular orthographies. By contrast, important information can be obtained from systematic comparisons across languages. Thus, the distinction between regular and irregular orthographies is another potentially fruitful area of investigation.

Overall, in spite of much research, current interpretations seem unable to integrate all available findings. Some proposals focus on the cognitive description of the reading profile and explicitly ignore the distal causes of the disturbance. Others propose visual, acoustic or phonological mechanisms but fail to link them to the pattern of reading impairment present in different children.

The present Research Topic brings together studies based on different methodological approaches (i.e., behavioural studies examining cognitive and psycholinguistic factors, eye movement investigations, biological markers, neuroimaging and genetic studies), involving dyslexic groups with and without comorbid symptoms, and in different orthographies (transparent and opaque) to identify the mechanisms underlying DD. The RT does not focus on a single model or theory of dyslexia but rather brings together different approaches and ideas which we feel are fruitful for a deeper understanding of developmental dyslexia.

**Citation:** Zoccolotti, P., de Jong, P. F., Spinelli, D., eds. (2016). Understanding Developmental Dyslexia: Linking Perceptual and Cognitive Deficits to Reading Processes. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-864-1

# Table of Contents

*Editorial: Understanding Developmental Dyslexia: Linking Perceptual and Cognitive Deficits to Reading Processes*

Pierluigi Zoccolotti, Peter F. de Jong and Donatella Spinelli

## **1. ORTHOGRAPHIC LEARNING**


Hua-Chen Wang, Eva Marinus, Lyndsey Nickels and Anne Castles

## **2. DEALING WITH MULTIPLE STIMULI**

*The contribution of discrete-trial naming and visual recognition to rapid automatized naming deficits of dyslexic children with and without a history of language delay*

Filippo Gasperini, Daniela Brizzolara, Paola Cristofani, Claudia Casalini and Anna Maria Chilosi

*Modeling individual differences in text reading fluency: a different pattern of predictors for typically developing and dyslexic readers*

Pierluigi Zoccolotti, Maria De Luca, Chiara V. Marinelli and Donatella Spinelli

## **3. PRE-LEXICAL PROCESSES IN DYSLEXIA**


Muriel A. Lobier, Carole Peyrin, Cédric Pichat, Jean-François Le Bas and Sylviane Valdois

## **4. META-ANALYSES OF NEUROIMAGING STUDIES**

*Reading the dyslexic brain: multiple dysfunctional routes revealed by a new meta-analysis of PET and fMRI activation studies*

Eraldo Paulesu, Laura Danelli and Manuela Berlingeri

*Functional neuroanatomy of developmental dyslexia: the role of orthographic depth*

Fabio Richlan

## **5. BIOLOGICAL INDICATORS OF LEARNING AND LEARNING DEFICITS**


## **6. EYE MOVEMENTS AND THE MAGNOCELLULAR HYPOTHESIS OF DYSLEXIA**

*A similar correction mechanism in slow and fluent readers after suboptimal landing positions*

Benjamin Gagl, Stefan Hawelka and Florian Hutzler

## **7. THE PHONOLOGICAL HYPOTHESIS OF DYSLEXIA: FROM GENERAL TO SPECIFIC TESTS OF THE HYPOTHESIS**

## **7.1 Role of auditory and speech perception**

*Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia*

Victoria Leong and Usha Goswami

*The relationship of phonological ability, speech perception, and auditory perception in adults with dyslexia*

Jeremy M. Law, Maaike Vandermosten, Pol Ghesquiere and Jan Wouters

## **7.2 Reading and handwriting quality**

*Neuroimaging correlates of handwriting quality as children learn to read and write*

Paul Gimenez, Nicolle Bugescu, Jessica M. Black, Roeland Hancock, Kenneth Pugh, Masanori Nagamine, Emily Kutner, Paul Mazaika, Robert Hendren, Bruce D. McCandliss and Fumiko Hoeft

## **7.3 Orthographic-phonological binding hypothesis**

*Does pronounceability modulate the letter string deficit of children with dyslexia? A study with the rate and amount model*

Chiara V. Marinelli, Daniela Traficante and Pierluigi Zoccolotti

## **7.4 Learning serial order information hypothesis**

*Order short-term memory is not impaired in dyslexia and does not affect orthographic learning*

Eva Staels and Wim Van den Broeck

## **7.5 Comorbidity with language delay**

*Age, dyslexia subtype and comorbidity modulate rapid auditory processing in developmental dyslexia*

Maria Luisa Lorusso, Chiara Cantiani and Massimo Molteni

## **8. MORPHOLOGICAL STRUCTURE IN A TRANSPARENT ORTHOGRAPHY**

*The effect of morphology on spelling and reading accuracy: a study on Italian children*

Paola Angelelli, Chiara Valeria Marinelli and Cristina Burani

## **9. MODELING DYSLEXIA WITHIN THE COMORBIDITY PERSPECTIVE**

*The intergenerational multiple deficit model and the case of dyslexia*

Elsje van Bergen, Aryan van der Leij and Peter F. de Jong

## Editorial: Understanding Developmental Dyslexia: Linking Perceptual and Cognitive Deficits to Reading Processes

#### Pierluigi Zoccolotti<sup>1,2</sup>\*, Peter F. de Jong<sup>3</sup> and Donatella Spinelli<sup>2,4</sup>

<sup>1</sup> Department of Psychology, Sapienza University of Rome, Rome, Italy, <sup>2</sup> Neuropsychology Unit, IRCCS (National Institute for Research and Treatment) Fondazione Santa Lucia, Rome, Italy, <sup>3</sup> Department of Child Development and Education, University of Amsterdam, Amsterdam, Netherlands, <sup>4</sup> Department of Human Movement Sciences and Health, University of Rome "Foro Italico", Rome, Italy

Keywords: dyslexia, reading

### **The Editorial on the Research Topic**

## **Understanding Developmental Dyslexia: Linking Perceptual and Cognitive Deficits to Reading Processes**

The problem of causation has proven particularly elusive in the case of developmental dyslexia (DD). The field has been dominated by very general hypotheses, such as the idea that DD is caused by a phonological deficit and/or an impairment of the magnocellular pathway. Results are conflicting, and unidirectional causal links have not been persuasively demonstrated.

Some studies in the Research Topic (RT) re-examine these general hypotheses from the critical perspective of more selective predictions. Others focus on less general deficit hypotheses and stay closer to reading by investigating specific aspects of the reading process such as orthographic learning ability or the ability to deal with multiple-stimulus displays. Studies benefit from new research paradigms as well as new information from research areas such as neuroimaging or genetics. Below, we sketch the general questions tackled by these studies.

### Edited and reviewed by:

Hauke R. Heekeren, Freie Universität Berlin, Germany

> \*Correspondence: Pierluigi Zoccolotti

pierluigi.zoccolotti@uniroma1.it

Received: 04 November 2015 Accepted: 15 March 2016 Published: 31 March 2016

### Citation:

Zoccolotti P, de Jong PF and Spinelli D (2016) Editorial: Understanding Developmental Dyslexia: Linking Perceptual and Cognitive Deficits to Reading Processes. Front. Hum. Neurosci. 10:140. doi: 10.3389/fnhum.2016.00140

## ORTHOGRAPHIC LEARNING

Unlike standard studies, which provide a static snapshot of reading performance, learning studies allow asking questions about how children acquire words. Kwok and Ellis and Suárez-Coalla et al. capitalize on the observation that presenting pseudo-words in repeated blocks reduces the size of the length effect (Martens and de Jong, 2008). Results are generally in keeping with the idea that dyslexic children are impaired in forming orthographic representations and continue to use sublexical reading during the course of learning. Wang et al. examine orthographic learning as a function of specific (phonological and surface) individual reading profiles using a new learning task. They point out that orthographic knowledge predicts orthographic learning over and above phonological decoding and that orthographic impairment is actually more important than phonological impairment in the learning of new words.

## DEALING WITH MULTIPLE STIMULI

Current models of reading focus on single word reading but are commonly extended to explain reading in more natural contexts, i.e., text reading. One potentially important way to understand DD is to contrast reading of single vs. multiple stimulus displays.

Control children read multiple items faster than single items; this indicates that they process the next visual stimulus while uttering the current target; dyslexic children fail to show such an advantage (Zoccolotti et al., 2013). A paradigm that captures the need to smoothly integrate all the various sub-components involved in reading (except for orthographic analysis) is rapid automatized naming or RAN (Denckla and Rudel, 1976). Two studies capitalize on this observation (Gasperini et al.; Zoccolotti et al.) and point out the importance of considering the multicomponential nature of reading to obtain a full description of DD.

## FROM ATTENTIONAL HYPOTHESES OF DYSLEXIA TO THE HYPOTHESIS OF PRE-LEXICAL LOCI OF THE DISTURBANCE

Various attentional deficits have been identified in dyslexic children (e.g., Vidyasagar and Pammer, 2010), but their precise role is still underspecified.

Kezilas et al. report the possible causes of letter position dyslexia. Their evidence is in keeping with the idea of a deficit in the coding of letter positions at the orthographic-visual analysis stage of reading. Lukov et al. note various forms of double dissociations between reading and attention deficits: attention categories, such as sustained, selective, orienting and executive attention functioning, do not effectively map into reading difficulties. Lobier et al.'s study stems from the visual attention (VA) span deficit hypothesis of DD (Bosse et al., 2007); it shows dysfunctions in a categorization task for multiple (but not single) alphanumeric (and non-alphanumeric) stimuli.

Overall, attentional deficits are clearly dissociated from reading deficits; thus, specific hypotheses (such as the VA span deficit hypothesis) are needed to explain reading related attentional deficits.

## META-ANALYSES OF NEUROIMAGING STUDIES

There has been a dramatic increase in studies on reading that are based on imaging paradigms. Paulesu et al. report a meta-analysis of 53 neuroimaging studies of DD. When activations are analyzed, those of dyslexic subjects (but not controls) indicate a distributed set of local malfunctions in "associative" regions normally involved in more than one behavior/cognitive domain. Richlan's meta-analysis focuses on whether different manifestations of dyslexia across languages are associated with different functional neuroanatomical manifestations. The effect of orthography is a relevant general question, which is underscored also in other papers (Angelelli et al.; Lukov et al.; Kezilas et al.). In particular, Angelelli et al. demonstrate that even in a very regular language (such as Italian) morphological information is a useful resource for both reading and spelling. Future neuroimaging studies should be usefully informed by the articulated conclusions of these meta-analyses.

## BIOLOGICAL INDICATORS OF LEARNING AND LEARNING DEFICITS

Schiavone et al. show that two EEG biomarkers recorded in 3-year-old children from families at risk of dyslexia correlate with performance in various tasks including reading fluency, phonological awareness, orthographic knowledge and RAN assessed at 9 years of age. Hasko et al. investigate whether the EEG neurophysiological profile of children with dyslexia before intervention predicts the success or failure of future training. Longitudinal and intervention studies in dyslexia that include biomarkers are rare and important: these two studies indicate the growing interest in the biological indicators of dyslexia and learning deficits.

## THE MAGNOCELLULAR HYPOTHESIS OF DYSLEXIA

A well-known hypothesis sees DD as due to a magnocellular deficit (Stein, 2001). Possibly indicating little interest in this theoretical framework, no work in the RT directly tests this hypothesis. However, in their extensive meta-analysis Paulesu et al. note the absence of any deficit in the V5/MT area (the core magnocellular region) in dyslexics. A key area of investigation in the magnocellular hypothesis is the study of eye movements (Boden and Giaschi, 2007). The study by Gagl et al. confirms that slow readers process words by means of serial decoding but have corrective processes similar to those of proficient readers after landing at unfavorable positions within a word. Overall, these findings are not in keeping with the notion that magnocellular dysfunction generates DD.

## THE PHONOLOGICAL HYPOTHESIS OF DYSLEXIA: FROM GENERAL TO SPECIFIC TESTS OF THE HYPOTHESIS

Much research on DD is based on "the pivotal role of phonemic awareness as a predictor of individual differences in reading development" (Melby-Lervag et al., 2012). However, a correlation between abilities does not mean that a deficit in phonological abilities causes a reading deficit, as it is difficult to exclude the alternative possibility, i.e., that the lack of reading experience associated with DD causes poor performance on meta-phonological tests. Some studies examine the relationship between phonology and dyslexia through questions that are more finely tuned than those adopted in previous research.

Leong and Goswami work within the oscillatory temporal sampling framework of dyslexia (the reader can also find relevant information in a recent RT; Goswami et al.). Law et al. examine whether auditory, speech perception, and phonological skills are tightly interrelated or contribute independently to reading. Gimenez et al. evaluate the correlation between reading and handwriting at the beginning of formal handwriting instruction with the hypothesis that handwriting and reading may initially share a common neural mechanism.

A variant of the phonological hypothesis is that DD is due to an inability to bind orthographic and phonological information (Blomert, 2011). Marinelli et al. test this hypothesis by contrasting it with the idea that the reading deficit may be due to a deficit at the pre-lexical graphemic level.

Within this general framework, a recent hypothesis refers to a deficit in learning serial order information either in the consolidation phase of learning (Szmalec et al., 2011) or at the STM level (Martinez Perez et al., 2012, 2013). Staels and van den Broeck consider this latter possibility and provide evidence against this view.

The idea that dyslexia can be ascribed to many factors raises the question of co-morbidities in the genesis of the behavioral disturbances shown by children with dyslexia. Consistent with previous data (e.g., Brizzolara et al., 2006; Chilosi et al., 2009), Lorusso et al. demonstrate the important modulating role of a previous language delay on DD.

## MODELING DYSLEXIA WITHIN THE COMORBIDITY PERSPECTIVE

Several of the studies in the RT point out the multi-factorial nature of reading deficits. A theoretical perspective which is particularly suited to this aim is that reading (and more generally learning) disorders can be effectively described within a comorbidity perspective (Pennington, 2006). Drawing on Pennington's model, as well as on Plomin and Kovas's (2005) generalist genes hypothesis of learning (dis)abilities, van Bergen et al. propose the intergenerational multiple deficit model in which both parents confer liability via intertwined genetic and environmental pathways.

## FINAL REMARKS

The main tendency of the studies presented in the RT is to move away from broad, general hypotheses of the disorder, such as phonological or attentional ones, and to consider hypotheses that on the one hand are more explicit about the perceptual and linguistic processes specifically involved in reading (such as orthographic learning ability or the ability to deal with multiple-stimulus displays) and on the other try to link these mechanisms to a proximal analysis of the reading processes (as in the analysis of letter position dyslexia). Overall, dyslexia emerges as a multiple-cause deficit and in this light future research should be oriented toward considering the problem of comorbidity.

## AUTHOR CONTRIBUTIONS

All authors gave a similar contribution. PZ: wrote part of the first draft of the paper. DS: made several changes to various versions of the manuscript. PdJ: made several changes to various versions of the manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Zoccolotti, de Jong and Spinelli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Visual word learning in adults with dyslexia

## *Rosa K. W. Kwok and Andrew W. Ellis\**

*Department of Psychology, University of York, York, UK*

### *Edited by:*

*Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

### *Reviewed by:*

*Heinz Wimmer, University of Salzburg, Austria Rob Davies, Lancaster University, UK*

### *\*Correspondence:*

*Andrew W. Ellis, Department of Psychology, University of York, York, YO10 5DD, UK e-mail: andy.ellis@york.ac.uk*

We investigated word learning in university and college students with a diagnosis of dyslexia and in typically-reading controls. Participants read aloud short (4-letter) and longer (7-letter) nonwords as quickly as possible. The nonwords were repeated across 10 blocks, using a different random order in each block. Participants returned 7 days later and repeated the experiment. Accuracy was high in both groups. The dyslexics were substantially slower than the controls at reading the nonwords throughout the experiment. They also showed a larger length effect, indicating less effective decoding skills. Learning was demonstrated by faster reading of the nonwords across repeated presentations and by a reduction in the difference in reading speeds between shorter and longer nonwords. The dyslexics required more presentations of the nonwords before the length effect became non-significant, only showing convergence in reaction times between shorter and longer items in the second testing session where controls achieved convergence part-way through the first session. Participants also completed a psychological test battery assessing reading and spelling, vocabulary, phonological awareness, working memory, nonverbal ability and motor speed. The dyslexics performed at a similar level to the controls on nonverbal ability but significantly less well on all the other measures. Regression analyses found that decoding ability, measured as the speed of reading aloud nonwords when they were presented for the first time, was predicted by a composite of word reading and spelling scores ("literacy"). Word learning was assessed in terms of the improvement in naming speeds over 10 blocks of training. Learning was predicted by vocabulary and working memory scores, but not by literacy, phonological awareness, nonverbal ability or motor speed. The results show that young dyslexic adults have problems both in pronouncing novel words and in learning new written words.

**Keywords: word learning, reading, dyslexia, word length, repetition, working memory, phonological awareness**

## **INTRODUCTION**

The problems that dyslexic children and adults experience in reading and spelling have been well documented, even if there is continuing debate about the underlying causes of those difficulties (Snowling, 2001; Vellutino et al., 2004; Van den Broeck and Geudens, 2012). One aspect of reading skill that has received less attention than most in the literature, however, is how dyslexics learn new written words and how their ability to learn new words compares with that of normal readers (Reitsma, 1983; Ehri and Saltmarsh, 1995; Mayringer and Wimmer, 2000; Share and Shalev, 2004; Thomson and Goswami, 2010; De Jong and Messbauer, 2011). The current paper develops a methodology for studying basic aspects of word learning that we believe has considerable potential and applies it to understanding visual word learning in groups of dyslexic adults and normally-reading controls.

As children grow older, reading becomes an important source of new words that they must learn to recognize and understand if they are to function effectively (Cunningham et al., 2002; Cunningham, 2006; Nation, 2008, 2009). Nowhere is this more true than in higher education where, if students are to progress satisfactorily, they must learn new words connected with their academic studies that are often encountered first in written form (Mortimore and Crozier, 2006). Our concern in the present study is not with how dyslexics learn to associate new words with meanings, but rather with the process by which initially unfamiliar words become familiar through exposure and repetition, reaching the point where they can be recognized and processed as whole units rather than in piecemeal fashion.

The starting point for our investigation was a study by Weekes (1997) who asked skilled adult readers (undergraduate students at a UK university) to read aloud a mixture of familiar words and invented nonwords as quickly as possible. Naming latencies were measured as the time between a word or nonword appearing on the screen and the participant beginning to pronounce it. The words were either high frequency (e.g., bed, large) or low frequency (e.g., beg, latch): the nonwords were pronounceable sequences of letters that could be words but happen not to be (e.g., bam, lorge). Words and nonwords varied in length from 3 to 6 letters. In line with previous studies, naming latencies were substantially slower for the nonwords than for the familiar words (cf. Lupker et al., 1997; Rastle et al., 2003). Latencies for the nonwords increased substantially as letter length increased. In contrast, low frequency words showed only a small effect of length on naming speeds while high frequency words showed no significant effect at all. Stronger effects of length on naming latencies for nonwords than words in skilled readers have now been reported in English, German and French (Ziegler et al., 2001; Juphard et al., 2004; Valdois et al., 2006) while stronger effects of length on latencies for low than high frequency English words have been reported by Yap and Balota (2009) and others.

Weekes (1997) argued that slower reading of nonwords than familiar words, and larger effects of letter length for nonwords than words, could be explained within the dual-route (DRC) model of visual word recognition proposed by Coltheart et al. (2001). According to the DRC model, when an unfamiliar word or nonword is encountered for the first time, it is translated from written into spoken form through the application of letter-sound (grapheme-phoneme) conversion rules (referred to in the DRC model as the *nonlexical route*). The grapheme-phoneme conversion rules act in a serial, left-to-right manner, working systematically through a novel word from the beginning to the end until a pronunciation has been generated (Coltheart and Rastle, 1994). As a new word becomes familiar through repeated encounters, entries are created for that word in the mental lexicon. In the DRC model that process of lexicalization involves creating a representation of the written form of the word in an *orthographic input lexicon* and a representation of its spoken form in a *phonological output lexicon*. A route from print to sound becomes available for the newly-learned word through the two lexicons. This is known as the *lexical route*. Access to the orthographic input lexicon for familiar written words is both fast and parallel, with all of the component letters in a word being processed simultaneously. As a result, pronouncing a familiar word (lexical route) is faster than generating the pronunciation of an unfamiliar word or nonword (nonlexical route) and the impact of letter length is greatly reduced in familiar words (see Coltheart et al., 2001, pp. 238–239, where a simulation of the Weekes, 1997, results is presented). The more familiar a word is, the more its pronunciation will be captured by the lexical route, hence the progressively smaller effect of length seen in low and high frequency words.
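The qualitative prediction in this account can be illustrated with a deliberately simple sketch. It is not the DRC model or its published simulation: it only assumes that a serial route adds a fixed cost per letter, that a parallel route takes a constant time, and that familiarity shifts naming from one to the other. The function names, the blending rule and all millisecond values are hypothetical, chosen for illustration only.

```python
def naming_latency(n_letters, familiarity,
                   base_ms=450.0, per_letter_ms=30.0, lexical_ms=500.0):
    """Toy naming latency in ms (all parameter values are invented).

    familiarity = 0.0 means purely nonlexical (serial, letter-by-letter);
    familiarity = 1.0 means purely lexical (parallel, length-insensitive).
    """
    if not 0.0 <= familiarity <= 1.0:
        raise ValueError("familiarity must lie between 0 and 1")
    nonlexical = base_ms + per_letter_ms * n_letters  # grows with length
    lexical = lexical_ms                              # independent of length
    return familiarity * lexical + (1.0 - familiarity) * nonlexical


def length_effect(familiarity, short=3, long=6):
    """Latency difference between a long and a short item at a given familiarity."""
    return naming_latency(long, familiarity) - naming_latency(short, familiarity)


if __name__ == "__main__":
    # Hypothetical familiarity levels standing in for nonwords,
    # low frequency words and high frequency words.
    for label, fam in [("nonword", 0.0), ("low frequency", 0.5), ("high frequency", 1.0)]:
        print(label, length_effect(fam))
```

Under these assumptions the length effect shrinks in proportion to familiarity and vanishes for fully lexicalized items, which is the ordering (nonwords, then low frequency words, then high frequency words) described above.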

If this account is broadly correct, it should be possible to observe the transition from nonlexical to lexical reading by presenting unfamiliar words or nonwords repeatedly. When the novel items are read for the first time, naming should reflect the operation of the nonlexical route: latencies should be slow and sensitive to the number of letters in the sequence. But as the novel words become familiar, lexical representations should be established and processing should make the transition from nonlexical to lexical reading, with naming latencies becoming faster and less affected by length. Maloney et al. (2009) observed the beginnings of this transition. They took the 100 nonwords varying in length from 3 to 6 letters that were used by Weekes (1997) and presented them to skilled readers in four consecutive blocks of trials. Participants were instructed to read each one aloud as quickly as possible. As predicted, naming latencies became faster across the four blocks as the items became more familiar and the effect of length reduced.

In unpublished experiments we have replicated and extended Maloney et al.'s (2009) results. In one experiment we measured naming latencies for 4-letter, single-syllable nonwords and 7-letter, two-syllable nonwords. The nonwords were presented 10 times in consecutive blocks of trials, using a different random order of presentation in each block. Accuracy was very high across the experiment. In the first block, when all of the nonwords were new and unfamiliar, naming latencies were relatively slow and the effect of length was substantial. Reaction times (RTs) then reduced with repeated presentations and the impact of length diminished, becoming non-significant after five or six presentations of the nonwords. We obtained the same pattern of results in a second experiment using a different set of nonwords. In that experiment we also invited the participants back for a second testing session 7 days after the first session to assess the extent to which the learning effects persisted in the absence of any further experience with the nonwords. Naming latencies in block 1 of day 7 were a little slower than at the end of day 1, but much faster than at the start of day 1, demonstrating considerable retention of lexical knowledge about the newly-learned items. By the fourth block of day 7, the effect of length had completely disappeared: the nonwords had become familiar, created lexical entries, and been unitized to the point where they were read aloud in the same way as familiar words.

The present paper compares the performance of university and college students with a diagnosis of dyslexia with typically-reading controls on the same task. Nonwords composed of either 4 or 7 letters were presented 10 times in a first testing session, then 10 more times in a second testing session 7 days later. Accuracy of reading the nonwords aloud was assessed along with naming latencies. Bruck (1990) and Ben-Dror et al. (1991) found slower and less accurate reading of both words and nonwords in American college dyslexics than controls. Similar results have been reported for Polish (Reid et al., 2006) and Swedish (Wolff, 2009) dyslexic university students and controls. Less accurate reading aloud of both words and nonwords by student dyslexics than controls was reported by Snowling et al. (1997) and Hatcher et al. (2002) in very similar participant groups to those reported here (see also Callens et al., 2012; Deacon et al., 2012). These observations, combined with reports of less proficient reading of both nonwords and words by dyslexic children (Zoccolotti et al., 2005; Reid et al., 2006; Wolff, 2009; Paizi et al., 2013), led us to expect that the dyslexic students in our experiment would be slower and possibly less accurate than controls throughout the experiment, not only when the nonwords were presented for the first time, but even after multiple encounters.

We also expected that the adult dyslexics would show stronger effects of letter length on reading speed than the controls. There are two reasons why such a difference could come about. First, it has often been proposed that nonword reading presents particular problems for dyslexics (Rack et al., 1992; Herrman et al., 2006; though see Van den Broeck and Geudens, 2012). Wimmer (1996), for example, found that 10-year-old German dyslexic children read nonwords more slowly than younger normal readers who were matched to the dyslexics on the speed of reading familiar, high frequency words. If nonlexical reading is indeed differentially poor in many dyslexics, length effects should be greater in dyslexics than typical readers because the dyslexics will require more time per additional letter to convert that letter into sound.

Second, if dyslexics are slower than typical readers to create new lexical entries, then in the course of an experiment involving 20 presentations of each nonword across two separate sessions, the dyslexics may be slower than the controls to create orthographic and phonological representations for the novel items. The result would be that they spend more time reading nonlexically (with consequent length effects) and would be slower to switch to lexical reading (with reduced length effects). We are not aware of any studies of word learning in dyslexia that have involved adult participants, but research involving dyslexic children suggests problems learning both the spoken and the written forms of new words. Regarding the learning of spoken word-forms, Mayringer and Wimmer (2000) found that German-speaking dyslexic children were impaired at learning novel spoken words that were taught as the names of children shown in pictures. In contrast, the dyslexics were unimpaired at learning to associate familiar German names with pictures of children. The authors concluded from this that the dyslexic children's difficulty lay in learning the new spoken words rather than in associating names with people (see also Elbro and Jensen, 2005; Thomson and Goswami, 2010).

Mayringer and Wimmer (2000) suggested that if dyslexics have problems learning new written words, part of those problems could lie in learning the spoken (phonological) forms rather than their written (orthographic) forms. Visual word learning involves creating phonological as well as orthographic representations: difficulties in learning spoken word-forms would be expected to impact on visual word learning. The few published studies of visual (rather than spoken) word learning in dyslexia suggest, however, that dyslexics have problems learning new written word-forms over and above any problems they experience in learning spoken words (Reitsma, 1983; Ehri and Saltmarsh, 1995; Share and Shalev, 2004; De Jong and Messbauer, 2011; O'Brien et al., 2013). Reitsma (1983; Expt. 3) compared visual word learning in Dutch children with reading disabilities with learning in a group of younger normal readers. The children first practiced reading aloud novel words embedded in sentences. Three days later they were asked to read aloud the novel words as quickly as possible as they were presented individually on a computer screen. Half of the novel words were presented in exactly the same written form as in the training while the other half were presented in a form that had a different spelling but was pronounced the same. (An equivalent English example might be to train children to read *breet* then test them three days later on either *breet* or *breat*). The normal readers were faster to read aloud the versions of the novel words that they had been trained on three days earlier than the re-spelled version, though they were faster on both than on entirely new and untrained nonwords (so faster on *breet* than *breat* but faster on both of them than on *broat*).
In contrast, the children with reading disability read both forms of the trained novel words (*breet* and *breat*) faster than the untrained items (*broat*) but showed no difference between the versions of the trained items that preserved the original spellings (*breet*) and the versions that changed those spellings (*breat*). The implication of these results is that the normal readers learned both the orthographic and phonological forms of the novel words in training and retained that knowledge through to the test three days later. The disabled readers remembered something of the phonological forms of the trained novel items across the retention interval but seemed not to retain any detectable orthographic information.

If dyslexic children combine less efficient nonlexical reading with slower creation of lexical entries, we would expect them to show larger length effects in nonword reading than typically-reading controls. We would also expect dyslexics to show larger effects of letter length in word reading arising from the fact that they are less efficient than controls at switching from nonlexical to lexical reading so read more words nonlexically than controls do. This prediction is supported by reports of stronger effects of letter length on naming latencies for real words in dyslexic children than controls in English, Dutch, German, Spanish and Italian (e.g., Ziegler et al., 2003; Marinus and De Jong, 2010; Paizi et al., 2011; Davies et al., 2013; Martelli et al., 2014).

Dyslexics may have difficulty learning new spoken and written word-forms but dyslexic Italian children have been reported to read words faster than nonwords (Paizi et al., 2013) thereby demonstrating some acquisition of word-specific knowledge. Paizi et al. (2013) also reported faster reading of high than low frequency words in dyslexic Italian children, indicating that regular exposure facilitates the creation of effective lexical entries in those readers. If dyslexics are capable of building up a vocabulary of words they can read in a relatively wholistic manner, albeit more slowly and effortfully than typical readers, that could explain the reduction in the impact of letter length on word reading with age that Zoccolotti et al. (2005) and De Luca et al. (2008) observed in both dyslexic Italian children and controls. Hence, on the basis of this admittedly incomplete literature, much of which is concerned with children rather than adults, we expected to see signs of word learning in the dyslexic participants in our experiment (i.e., faster naming latencies across blocks and a reduction in the impact of letter length with repeated exposure). We expected, however, that word learning would occur more slowly in the dyslexic participants than in controls (typical readers) and that if convergence between reading speeds for shorter and longer items was achieved, it would require more presentations of the nonwords.

Finally, our participants were given a short battery of tests to characterize their broader cognitive abilities. The cognitive profiles of dyslexic students at the same institution as many of the participants in the present study (the University of York, UK) were described a decade ago by Hatcher et al. (2002) and more recently by Warmington et al. (2013b). Hatcher et al. (2002) found that the student dyslexics performed at comparable levels to normally-reading controls on nonverbal ability (Raven's Advanced Progressive Matrices) but more poorly on a range of measures including verbal ability (WAIS-R vocabulary), word reading and spelling, forward and backward digit span, phonological tasks [object naming, digit naming and spoonerisms (exchanging sounds between words)] and mental arithmetic. Similar profiles were reported by Snowling et al. (1997) and Warmington et al. (2013b) for UK student dyslexics and Callens et al. (2012) for Belgian dyslexic students. A wider review and meta-analysis of dyslexia in adults is provided by Swanson and Hsieh (2009).

In addition to comparing the dyslexics and controls on the test battery, we used regression analyses to explore the ability of performance on the different cognitive tests to predict two aspects of performance in the experiment, namely initial reading speeds for the longer (7-letter) nonwords and the change in reading speeds across the 10 presentations in the first testing session. Initial reading speeds assess efficiency of converting unfamiliar letter sequences into sounds (in DRC terms, the efficiency of the nonlexical route), while the change in RTs across repetitions assesses the efficiency of word learning and the switch from nonlexical to lexical reading. Previous research has associated the speed and accuracy of reading nonwords or unfamiliar words with phonological awareness (Durand et al., 2005; Melby-Lervåg et al., 2012). For example, Pennington et al. (1990) documented persisting deficits in phonological awareness in adult dyslexics that were particularly linked to problems with nonword reading. Training studies have suggested, however, that phonological awareness must be linked to a knowledge of how letters map onto phonemes if improvements in phonological awareness are to be translated into improvements in reading (Hatcher et al., 1994; Melby-Lervåg et al., 2012).

Word learning has been more strongly associated with working memory than with phonological awareness (Gathercole et al., 1997, 1999; Avons et al., 1998). For example, Gathercole et al. (1999) reported an association between phonological working memory and vocabulary size in both 4-year-old and teenage children. Experimental studies by Jarrold et al. (2009) and Majerus and Boukebza (2013) reported a relationship between verbal working memory and ability to learn the form (rather than the referent) of new words by children and teenagers, while Martin and Ellis (2012) found that word learning in an artificial second language by university students was predicted by performance on phonological short-term / working memory tasks. Short-term and working memory have consistently been found to be impaired in dyslexia (Swanson et al., 2009), which may relate to the problems in word learning mentioned above.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

Participants were 30 students with a diagnosis of dyslexia (20 female, 10 male) and 30 typical readers who served as a control group (12 female, 18 male). The dyslexic students had a mean age of 21.5 years (*SD* = 3.6; range 17–36) while the controls had a mean age of 20.7 years (*SD* = 3.2; range 17–32). All were native speakers of English with normal or corrected-to-normal vision. The participants were students at the University of York (*n* = 27 per group), York Saint John University (*n* = 1 per group) and York College (*n* = 2 per group). The participants with dyslexia had all been diagnosed by a registered educational psychologist and supplied a copy of their diagnosis documents to the experimenters. Individuals with additional learning disabilities, a history of mental illness, epilepsy or other neurological disorders were excluded. Participants received either course credit or a small payment. The experiment was approved by the Ethics Committee of the Department of Psychology, University of York.

## **TEST BATTERY**

The psychological test battery given to all the participants contained tests assessing vocabulary, reading and spelling, phonological awareness, working memory, nonverbal ability and motor speed. Published tests were scored according to the test manuals and the results are presented as standardized scores.

### *Vocabulary*

Vocabulary was assessed using the Vocabulary subtest of the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999), which requires participants to define words verbally.

### *Word reading*

Word reading was assessed using the reading subtest of the Wide Range Achievement Test (WRAT 4; Wilkinson and Robertson, 2006), which involves reading aloud single words of increasing length and difficulty (from *see* to *synecdoche*), and the Sight Word Efficiency subtest of the Test of Word Reading Efficiency (TOWRE SWE; Torgesen et al., 1999), which requires participants to read aloud as many words of increasing length and difficulty as possible in 45 s.

### *Nonword reading*

Nonword reading was assessed using the Phonemic Decoding Efficiency (PDE) subtest of the TOWRE which requires participants to read aloud as many nonwords of increasing length and difficulty as possible in 45 s.

### *Word spelling*

Word spelling was assessed using the Spelling Subtest of the WRAT 4 which requires participants to write single words to dictation.

### *Phonological awareness*

Phonological awareness was measured using that part of the elision test from the Comprehensive Test of Phonological Processing (CTOPP; Wagner et al., 1999) in which a single initial, medial or final phoneme of a word must be deleted and the participant must say what remains (e.g., deleting the /k/ from "fixed" and responding "fist").

### *Working memory*

Working memory was assessed using four tests from the Automated Working Memory Assessment (AWMA; Alloway, 2007). All the tests used span procedures in which sequence lengths were increased to the point where three or more errors were made within a block of trials. Standardized scores were calculated for each test. *Verbal short-term memory* was measured using immediate serial recall of lists of digits presented auditorily at a rate of 1/s. *Verbal working memory* was assessed using a test in which participants were presented with a sequence of spoken sentences. They were required to decide whether each sentence was true or false then recall the final words of each of the sentences at the end of the sequence. *Visuospatial short-term memory* was assessed using a dot matrix task in which a sequence of red dots appeared in squares of a 4 × 4 grid at a rate of one per 2 s. At the end of the sequence, the participant was required to touch the squares of the grid in the same order. *Visuospatial working memory* was measured using a spatial recall task. Participants were presented with pairs of shapes. The shape on the right always had a red dot in it. The shape on the left was either the same as the one on the right or different. The shape on the left could also be rotated with respect to the one on the right. The participant's task was first to say whether the two shapes were the same or different. After making those judgments to a sequence of pairs of shapes, the participant then had to indicate in the correct order where the red dot was positioned in each of the shapes on the right using a compass display with three points.

### *Nonverbal ability*

Nonverbal ability was assessed using the matrix reasoning subtest of the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999).

### *Motor speed*

Motor speed was assessed using a set of tapping tasks (Warmington et al., 2013a). Participants were asked to tap keys on a computer keyboard as many times as possible within 5 s. The start and end of each time interval was signaled both visually and auditorily. The task consisted of three conditions with 6 trials in each condition. In Condition 1, the participants tapped one key using the index finger of their preferred hand as many times as possible. In Condition 2, the participants alternately tapped two keys using the index finger of their preferred hand as many times as possible. In Condition 3, the participants alternately tapped two keys using the first two fingers of their preferred hand as many times as possible. The score is the average time between taps across the three conditions.

### **EXPERIMENTAL STIMULI**

The experimental stimuli were 12 4-letter, single-syllable nonwords and 12 7-letter, two-syllable nonwords. To reduce problems of voice key activation, none of the stimuli began with a voiceless fricative ("f," "s," "sh," or "th"). The 4- and 7-letter items were matched on naming accuracy from a pilot study involving typical student readers. They were also matched on mean log bigram frequency (4-letter mean = 3.28, range 2.72–3.57; 7-letter mean = 3.27, range 3.10–3.43; Duyck et al., 2004) and on initial letters and phonemes. The 4-letter experimental nonwords were: *brup, carg, dreb, jeph, lont, munt, nate, plin, relb, trok, varb,* and *zort*. The 7-letter experimental nonwords were: *blispod, coftrip, drentcy, joshule, larquof, mattoch, nelpoon, pronnet, roffler, trimsol, vushood,* and *zadroon*. Sixteen additional nonwords (8 4-letter and 8 7-letter) were selected for use in practice trials prior to the main experiment.

## **PROCEDURE**

Participants attended for two sessions. The first session began with the participants reading and signing a consent form then completing the psychological assessment battery. That took approximately 45 min. After a break of around 10 min they began the experimental task. They were given practice at reading aloud 8 4-letter and 8 7-letter nonwords presented in a random order. That was followed by the 10 blocks of the experiment. Participants were seated approximately 60 cm from a computer screen on which the nonwords were displayed in black, lower case letters on a white background. The nonwords were presented in 18-point Times New Roman font with a height on the screen of approximately 10 mm. Each trial consisted of a centrally-presented fixation cross displayed for 1000 ms, followed by the nonword stimulus for 2000 ms, then a blank screen for 1000 ms before the next trial began. Participants were instructed to read each nonword aloud as quickly and as accurately as possible. The 24 nonwords were presented once in a random order. Participants were informed when the block was complete and pressed the space bar on a computer keyboard to initiate the next block when they were ready to continue. This process was repeated across 10 blocks with the stimuli being presented in a different random order in each block. Participants wore headphones with a high-sensitivity microphone connected to a voice key that was linked to the computer. Presentation of the stimuli and recording of naming latencies was controlled by E-prime experiment generator software (version 1.2; Schneider et al., 2002). The experimenter noted any trials in which the participant misread a nonword, hesitated or made a false start or other form of error.

Participants returned 7 days later for the second session, which was a repeat of session 1 involving reading all the experimental nonwords aloud 10 more times in 10 blocks using a different random order in each block.
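The block structure described in the procedure can be sketched in a few lines of Python. This is only an illustration of the design, not the E-prime script used in the study: the function names and the `seed` argument are hypothetical, and the duration figure covers only the fixed trial timings, not the self-paced pauses between blocks.

```python
import random

# Timings and counts taken from the procedure described in the text.
FIXATION_MS, STIMULUS_MS, BLANK_MS = 1000, 2000, 1000
N_BLOCKS = 10

FOUR_LETTER = ["brup", "carg", "dreb", "jeph", "lont", "munt",
               "nate", "plin", "relb", "trok", "varb", "zort"]
SEVEN_LETTER = ["blispod", "coftrip", "drentcy", "joshule", "larquof", "mattoch",
                "nelpoon", "pronnet", "roffler", "trimsol", "vushood", "zadroon"]


def build_session(seed=None):
    """Return N_BLOCKS blocks, each a fresh random ordering of all 24 nonwords."""
    rng = random.Random(seed)  # seed is a hypothetical convenience for reproducibility
    items = FOUR_LETTER + SEVEN_LETTER
    blocks = []
    for _ in range(N_BLOCKS):
        order = items[:]       # copy so that each block is shuffled independently
        rng.shuffle(order)
        blocks.append(order)
    return blocks


def block_duration_ms(n_trials):
    """Fixed presentation time for one block, excluding self-paced breaks."""
    return n_trials * (FIXATION_MS + STIMULUS_MS + BLANK_MS)


session = build_session(seed=1)
print(len(session), len(session[0]), block_duration_ms(len(session[0])))
```

On these timings each block of 24 trials occupies 96 s of fixed presentation time, so the 10 blocks of one session take roughly 16 min plus breaks.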

## **RESULTS**

### **PERFORMANCE ON THE TEST BATTERY**

**Table 1** shows the results for the dyslexics and controls on the battery of tests together with the results of *t*-tests comparing the two groups along with the effect sizes (*r*; Field, 2009). Dyslexics performed significantly less well than the controls on every test except nonverbal reasoning. The effect sizes for the differences between the groups were largest for nonword reading, followed by spelling and word reading. The effect sizes for the differences between groups on verbal and visuospatial working memory tasks were similar.
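For readers unfamiliar with the effect size *r*, the conversion from a *t* statistic given by Field (2009) is r = sqrt(t² / (t² + df)). The sketch below applies it to an independent-samples comparison of two groups of 30, so df = 58. The *t* value used is purely hypothetical, since the Table 1 values are not reproduced in the text.

```python
import math


def effect_size_r(t, df):
    """Convert a t statistic to the effect size r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


# Two independent groups of 30 participants give df = 30 + 30 - 2.
df = 30 + 30 - 2

# Hypothetical t value for illustration only; not taken from Table 1.
example_t = 2.0
print(round(effect_size_r(example_t, df), 3))
```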

### **PERFORMANCE ON THE EXPERIMENTAL TASK**

Naming errors, hesitations and failures to activate the voice key were removed from the analysis of performance on the experimental task along with RTs less than 100 ms or longer than 2.5 *SD*s above the mean (defined separately for each participant in each block and for each length). Table S1 (Supplementary Materials) shows the full results (accuracy and mean RTs for correct, trimmed responses). Accuracy was very high (97.3% correct overall and never less than 95.5% correct for either group in any condition or block of trials). Given the high levels of accuracy in both groups, nonparametric Mann-Whitney *U* tests found no significant difference between dyslexics and typical readers on overall accuracy across the two days for either 4-letter nonwords, *U*(60) = 464, *Z* = 0.208, *p* = 0.835, or 7-letter nonwords, *U*(60) = 346, *Z* = −1.548, *p* = 0.122. Wilcoxon matched pairs, signed ranks tests found no difference between accuracy for 4- vs. 7-letter nonwords across the two sessions for both groups of participants combined, *W*(12) = 23.0, *Z* = 1.26, *p* = 0.209.
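The RT trimming rule can be illustrated with a minimal Python sketch applied to one cell of the design (one participant, one block, one length). Two details are assumptions, because the text does not specify them: responses below the 100 ms floor are dropped before the mean and *SD* are computed, and the sample (n − 1) standard deviation is used. The function name and the example RTs are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, floor_ms=100.0, sd_cutoff=2.5):
    """Trim the correct-trial RTs from one participant x block x length cell.

    Assumption: anticipations (< floor_ms) are removed first, then any RT more
    than sd_cutoff sample SDs above the mean of the remaining RTs is removed.
    """
    kept = [rt for rt in rts if rt >= floor_ms]
    if len(kept) < 2:          # an SD cannot be computed from fewer than two values
        return kept
    upper = mean(kept) + sd_cutoff * stdev(kept)
    return [rt for rt in kept if rt <= upper]


# Hypothetical cell: ten ordinary RTs, one anticipation and one very slow response.
cell = [520, 540, 560, 530, 550, 545, 535, 525, 555, 515, 90, 1900]
trimmed = trim_rts(cell)
print(len(trimmed), round(mean(trimmed), 1))
```

Because the cutoff is defined within each cell, the same absolute RT can be retained for a slow participant and removed for a fast one.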

### *Naming latencies (RTs)*

The main analyses focused on the RT data from the experimental task. **Figure 1** shows the pattern of RTs for correct, trimmed responses across blocks for the dyslexics (in red) and the controls (in blue). Inspection of the figure indicates that naming latencies were slower for the dyslexics than the controls throughout the experiment. At the start of the experiment, both groups were slower to read aloud 7- than 4-letter nonwords. The difference in naming RTs for shorter and longer nonwords reduced with repetitions, but the dyslexic participants appear to have required more exposures to the nonwords before the RTs for shorter and longer items converged. These indications were explored in a series of ANOVAs. When Mauchly's test of sphericity was significant, the Greenhouse-Geisser correction was applied. Full details of the statistical analyses are presented in the Appendix (Supplementary Materials) where effect sizes are reported in terms of the partial eta squared statistic (η<sub>p</sub><sup>2</sup>). We will summarize the important outcomes here.

**Table 1 | Results of the dyslexic and typical readers on the psychological test battery.**

*Global analysis.* The first ANOVA was a global analysis conducted on the RT data for both testing sessions with Group, Day, Blocks and Length as factors. There were significant main effects of Group (faster overall RTs for the controls than the dyslexics), Day (faster RTs on day 7 than day 1), Blocks (RTs becoming faster across blocks) and Length (faster overall RTs to 4- than 7-letter nonwords). All of the interactions were significant, including the interaction between Group and Length (larger length effects in the dyslexics than the controls) and Group × Blocks × Length (the reduction in the length effect across blocks occurring more quickly in the controls than in the dyslexics). These results were explored further by means of separate analyses of RTs in day 1 and day 7, including separate analyses of the performance of the dyslexic and control groups on each day.

*Day 1.* Day 1 RTs were analyzed with Group, Blocks and Length as factors. There were significant main effects of Group (faster RTs in the controls than the dyslexics), Blocks (RTs becoming faster across blocks) and Length (faster RTs to 4- than 7-letter nonwords). All of the interactions were significant. Day 1 RTs were then analyzed separately for controls and dyslexics. The controls showed significant main effects of Blocks and Length with a Blocks × Length interaction. Bonferroni-corrected *t*-tests were used to compare RTs to 4- and 7-letter nonwords in blocks 1–10. The effect of length was significant for the controls in blocks 1, 2, and 3 but was no longer significant from block 4 onwards. The dyslexics also showed effects of Blocks and Length combined with a Blocks × Length interaction. In their case, Bonferroni-corrected *t*-tests found effects of length in blocks 1–5, 7, 9, and 10 with marginally significant effects in blocks 6 and 8 (see Appendix; Supplementary material).

In sum, nonword naming RTs in day 1 were slower for the dyslexics than the controls. Both groups showed significant effects of length in the first three blocks, but while the controls showed no difference in naming speed after block 3, the dyslexics continued to show longer RTs to 7- than 4-letter nonwords throughout day 1.

*Day 7.* The next set of analyses focused on RTs in day 7. As in day 1, there were main effects of Group (faster RTs in the controls than the dyslexics), Blocks (RTs becoming faster across blocks) and Length (faster RTs to 4- than 7-letter nonwords). A significant Blocks × Length interaction reflected an overall reduction in the effect of length across blocks. There were also significant Group × Blocks and Group × Length interactions reflecting more change across blocks and stronger effects of length in the dyslexics than the controls. The 3-way Group × Blocks × Length interaction was marginally significant (*p* = 0.06). These interactions were explored further by means of separate analyses of day 7 RTs for controls and dyslexics.

Controls showed effects of Blocks and Length on day 7 with a significant Blocks × Length interaction. Bonferroni-corrected *t*-tests found a difference in RTs to 4- and 7-letter nonwords in block 1 only. Dyslexics also showed effects of Blocks and Length with a Blocks × Length interaction. In their case, Bonferroni-corrected *t*-tests found effects of length in blocks 1, 2, and 3, but not from block 4 onwards.

In sum, the controls showed a small effect of length at the start of day 7, but that effect disappeared by block 2. Dyslexics required 3 or 4 presentations in day 7 before they began to show (for the first time) no significant difference between naming RTs to short and long nonwords.

## **PREDICTORS OF INITIAL NONWORD READING SPEED AND NOVEL WORD LEARNING**

The final set of analyses brought together performance on the test battery with two aspects of the naming latency data. Nonlexical reading skill (decoding) was measured in terms of RTs to 7-letter nonwords in block 1 of day 1 while novel word learning was measured in terms of the change in RTs to 7-letter nonwords from block 1 to block 10 on day 1.

The number of predictor variables was reduced before the regression analyses were run, and some of the variables were transformed to improve the normality of their distributions. There were high correlations among the two word reading tests and the word spelling test (*r*s = 0.67–0.84, all *p*'s < 0.001). A composite Literacy score was therefore calculated for each participant by averaging the standardized scores from the WRAT Reading, TOWRE word reading and WRAT Spelling tests. To avoid using nonword reading in one task to predict nonword reading in another task, performance on the TOWRE-PDE nonword reading task was not included in the composite Literacy score. Substantial correlations were also observed among the four tests of working memory (*r*s = 0.50–0.56, all *p*'s < 0.001). A composite Working memory score was therefore computed for each participant by averaging the standardized scores from the four working memory tasks.
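As a small illustration of how such a composite is formed, the sketch below averages three standardized scores for one participant and deliberately leaves the nonword reading score out. The dictionary keys and the score values are hypothetical and are not data from the study.

```python
from statistics import mean

# Hypothetical standardized scores for a single participant.
participant = {
    "wrat_reading": 104,
    "towre_swe": 92,
    "wrat_spelling": 98,
    "towre_pde": 85,   # nonword reading: excluded from the Literacy composite
}

LITERACY_TESTS = ("wrat_reading", "towre_swe", "wrat_spelling")


def composite(scores, tests):
    """Average the standardized scores on the named tests."""
    return mean(scores[name] for name in tests)


literacy = composite(participant, LITERACY_TESTS)
print(literacy)
```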

Univariate normality was tested for each predictor and the dependent variables (RTs to 7-letter nonwords in blocks 1 and 10 of day 1). Phonological awareness, Nonverbal ability and Motor speed were found to violate the assumption of normality (Kolmogorov-Smirnov test of normality, *p* < 0.05). Distributions approximated normality most closely when Phonological awareness was reverse transformed (thereby reversing the normal direction of correlations) and Nonverbal ability and Motor speed were square root transformed. RTs were log transformed to reduce skew.

Reducing the number of variables helps to reduce the risks associated with multicollinearity (intercorrelation among the predictor variables). Multicollinearity among the final versions of the predictor variables was assessed using the variance inflation factor (VIF). VIF scores of less than 4 indicate that multicollinearity will not significantly influence the stability of the parameter estimates (Myers, 1990). VIF scores for the predictor variables ranged between 1.04 and 3.01.
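For reference, the VIF for predictor *j* is 1 / (1 − R<sub>j</sub><sup>2</sup>), where R<sub>j</sub><sup>2</sup> comes from regressing predictor *j* on the remaining predictors. This is equal to the *j*-th diagonal element of the inverse of the predictor correlation matrix, which is what the dependency-free Python sketch below computes. The correlation values are hypothetical and the function names are illustrative; this is not the software used in the study.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(corr):
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(corr))]


# Hypothetical correlations among three predictors.
corr = [[1.0, 0.6, 0.3],
        [0.6, 1.0, 0.4],
        [0.3, 0.4, 1.0]]
print([round(v, 2) for v in vifs(corr)])
```

With only two predictors correlated at *r*, the VIF reduces to 1 / (1 − r²), which gives a quick check on the implementation.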

**Table 2** shows the correlations among the final predictor variables; also the correlations between the predictor variables and RTs to 7-letter nonwords in block 1 of day 1. There were significant correlations among all the predictor variables except Nonverbal ability which did not correlate significantly with any of the other predictors. All of the predictors except Nonverbal ability correlated significantly with RT, with Literacy showing the highest correlation, followed by Vocabulary, Working memory, Motor speed and Phonological awareness.

**Table 2 | Correlations among the predictor variables, and between the predictor variables and naming RTs for 7-letter nonwords in block 1 of day 1.**

*\*p* < *0.05, \*\*p* < *0.01. Note that phonological awareness was reverse transformed (thereby reversing the normal direction of correlations). Nonverbal ability and motor speed were square root transformed. RT was log transformed.*

Linear mixed effects modeling was used to explore the ability of Vocabulary, Literacy, Phonological awareness, Working memory, Nonverbal ability and Motor speed to predict initial nonword reading speed and novel word learning. Linear mixed effects (LME) methods analyze all the available data and do not rely on averaging across participants or across items. They are particularly useful for analyzing data from heterogeneous groups (such as individuals with dyslexia) because they allow differences in the baseline performance among participants and items (*random effects*) to be separated from the effects of the predictor variables (*fixed effects*) (Baayen et al., 2008; Jones et al., 2008). The analyses were conducted in R using the lme4 (Bates et al., 2012) and languageR (Baayen, 2009) packages.
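One way to write down a model of this kind is given below. It is a sketch under stated assumptions rather than the exact specification that was fitted: it assumes random intercepts only, for participants and for items, and all of the symbols are introduced here for illustration.

```latex
\log \mathrm{RT}_{ij} \;=\; \beta_0 \;+\; \sum_{k=1}^{6} \beta_k\, x_{ki} \;+\; u_i \;+\; w_j \;+\; \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
w_j \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here RT<sub>ij</sub> is the naming latency of participant *i* on nonword *j*, x<sub>ki</sub> is that participant's score on the *k*-th of the six predictors (fixed effects), and u<sub>i</sub> and w<sub>j</sub> capture baseline differences among participants and among items (random effects).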

### *Predicting initial nonword reading speed*

The contribution of each predictor variable to predicting RTs for 7-letter nonwords presented in block 1 of day 1 was evaluated by using likelihood ratio tests to compare a model that contained all the fixed and random effects with a sequence of models in which different predictor variables were removed one at a time. These analyses showed that Literacy made a significant independent contribution to predicting nonword naming speed, χ<sup>2</sup>(10) = 16.12, *p* < 0.001; β = −0.005, *t* = −4.30, *p* < 0.001. In contrast, Vocabulary, χ<sup>2</sup>(10) = 2.71, *p* = 0.096, Phonological awareness, χ<sup>2</sup>(10) = 1.41, *p* = 0.235, Working memory, χ<sup>2</sup>(10) = 1.53, *p* = 0.217, Nonverbal ability, χ<sup>2</sup>(10) = 1.37, *p* = 0.243, and Motor speed, χ<sup>2</sup>(10) = 1.12, *p* = 0.293, made no independent contributions.
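The logic of a single likelihood ratio comparison can be sketched as follows. This is a minimal illustration, not the authors' R code: it assumes that the full and reduced models differ by exactly one parameter (1 degree of freedom, an assumption made for this sketch and not taken from the reported statistics), and the log-likelihood values and function names are hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    return math.erfc(math.sqrt(x / 2.0))


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float):
    """Return (statistic, p) for nested models assumed to differ by one parameter."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf_1df(statistic)


# Hypothetical log-likelihoods, chosen only to illustrate the arithmetic.
stat, p = likelihood_ratio_test(loglik_full=-1204.3, loglik_reduced=-1206.9)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

A predictor is judged to make an independent contribution when removing it lowers the log-likelihood by more than chance would allow, that is, when the resulting p value is small.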

### *Predicting learning*

Novel word learning was assessed in terms of the change in naming RTs for 7-letter nonwords between blocks 1 and 10 of day 1. RTs from both blocks were entered into the analysis. A categorical variable of Time was created to reflect the change in RTs between blocks 1 and 10. A set of predictor variables was then created, consisting of the interactions of Time with Vocabulary, Literacy, Phonological awareness, Working memory, Nonverbal ability and Motor speed. This makes it possible to evaluate the contribution of each independent variable to predicting the change in naming RTs to the 7-letter nonwords across blocks (Shek and Ma, 2011; Field, 2012). The effect of the categorical variable of Time was significant, χ<sup>2</sup>(11) = 516.29, *p* < 0.001, reflecting the reduction in RTs from block 1 to block 10. The interactions of Time with Vocabulary, χ<sup>2</sup>(17) = 6.57, *p* < 0.05; β = 0.002, *t* = 2.57, *p* < 0.05, and Time with Working memory, χ<sup>2</sup>(17) = 26.12, *p* < 0.001; β = 0.003, *t* = 5.14, *p* < 0.001, were also significant. The interactions of Time with Literacy, χ<sup>2</sup>(17) = 0.71, *p* = 0.401, Phonological awareness, χ<sup>2</sup>(17) = 1.79, *p* = 0.181, Nonverbal ability, χ<sup>2</sup>(17) = 3.65, *p* = 0.100, and Motor speed, χ<sup>2</sup>(17) = 0.10, *p* = 0.753, made no independent contributions to predicting RT change across blocks.
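The way a Time-by-predictor interaction term enters such a model can be sketched as below. This is a hedged illustration only: it assumes simple dummy coding with block 1 as the reference level, and the function name, column names and score values are hypothetical rather than taken from the study.

```python
def design_row(block: int, vocabulary: float) -> dict:
    """Build fixed-effect columns: Time, the predictor, and Time x predictor."""
    time = 0 if block == 1 else 1  # assumed coding: block 1 = 0, block 10 = 1
    return {
        "time": time,
        "vocabulary": vocabulary,
        # The interaction column is zero in block 1, so its coefficient
        # captures how the block 1 -> 10 change varies with vocabulary.
        "time_x_vocabulary": time * vocabulary,
    }


# Hypothetical participant with a vocabulary score of 12, seen in both blocks.
rows = [design_row(block=1, vocabulary=12.0), design_row(block=10, vocabulary=12.0)]
print(rows)
```

Under this coding, a significant interaction coefficient indicates that the size of the RT change between the two blocks depends on the predictor, which is the sense in which a variable "predicts learning" here.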

In sum, reading latencies for the more difficult, 7-letter nonwords seen for the first time correlated significantly with all of the predictor variables except Nonverbal ability. The highest correlation was with Literacy. When the ability of each of the variables to predict naming RT was assessed in the context of the other variables (in analyses which took into account the differences between participants and items in overall naming speed), only Literacy was significant. Novel word learning was assessed as the change in RTs for 7-letter nonwords between blocks 1 and 10 of day 1. Only Vocabulary and Working memory predicted the degree of learning across blocks in session 1.

## **DISCUSSION**

The adult dyslexics in the current experiment were all studying at university or in a college of higher education. They performed at a comparable level to typically-reading controls on a test of nonverbal ability (matrix reasoning) but had lower vocabulary scores, slower and less accurate reading and spelling of words, less efficient reading of nonwords, poorer phonological awareness, poorer performance on both verbal and nonverbal tests of span and working memory, and slower motor speed. These findings match other reports in the literature that dyslexics in higher education have cognitive problems that extend beyond reading and writing to wider aspects of linguistic, working memory and motor performance while typically sparing nonverbal reasoning (cf. Bruck, 1992; Gallagher et al., 1996; Snowling et al., 1997; Hatcher et al., 2002; Smith-Spark et al., 2003; Smith-Spark and Fisk, 2007; Callens et al., 2012; Warmington et al., 2013b). The working memory problems extend to visuospatial as well as verbal tasks (cf. Smith-Spark and Fisk, 2007; Menghini et al., 2011; Hachmann et al., 2014).

The largest difference between dyslexics and controls in the present study (as indicated by the effect size) was on the TOWRE Phonemic Decoding Efficiency test (Torgesen et al., 1999), a test of nonword reading. A great deal of effort is put into teaching phonic decoding skills to dyslexic children in the UK (Rose, 2009). The dyslexics who participated in our study had mastered the letter-sound correspondences of English sufficiently to enable them to read correctly nonwords like *drentcy* and *larquof* on the first encounter, but they were substantially slower than the controls. The results of the TOWRE-PDE indicate that pronouncing unfamiliar nonwords (and, by extension, unfamiliar real words) remains a problem for dyslexics in higher education (cf. Bruck, 1990; Ben-Dror et al., 1991; Reid et al., 2006; Wolff, 2009).

In the experimental task, the typical readers behaved very similarly to the participants in Maloney et al. (2009) who were drawn from a similar population. Letter length exerted a major effect on reading speeds for nonwords seen for the first time, but the impact of length declined as naming latencies reduced across blocks, becoming nonsignificant from block 4 of day 1. The results showed, therefore, that skilled adult readers can create representations of unfamiliar letter sequences after 4 or 5 presentations that allow them to recognize and pronounce the novel "words" quickly and to process their component letters in parallel.

The dyslexics were substantially slower at reading the nonwords throughout both sessions of the experiment. When the dyslexics read the 7-letter nonwords for the first time in block 1 of day 1, they did so with a mean latency that was over 300 ms slower than the controls. When performance on the 4- and 7-letter nonwords was compared, the dyslexics required 57 ms per letter in order to pronounce a nonword seen for the first time where the controls required just 23 ms per letter (less than half as much as the dyslexics). Ability at reading and spelling real words ("literacy") predicted decoding speed across the two groups. When the effect of literacy was taken into account there was no additional effect of vocabulary, phonological awareness or working memory on decoding speed for these particular readers.
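The per-letter figures can be read as the slope of a linear length effect between the two stimulus lengths. The expression below is a sketch under that assumption; the symbols $\overline{RT}_4$ and $\overline{RT}_7$ (mean first-exposure latencies for 4- and 7-letter nonwords) are introduced here for illustration.

$$
\text{cost per letter} = \frac{\overline{RT}_7 - \overline{RT}_4}{7 - 4},
\qquad
\frac{23\ \text{ms}}{57\ \text{ms}} \approx 0.40
$$

On these figures the controls' per-letter cost was about 40% of the dyslexics', and a linear slope would imply 4- versus 7-letter differences of 3 × 57 = 171 ms and 3 × 23 = 69 ms, respectively.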

The dyslexics in the present study were clearly capable of visual word learning. **Figure 1** shows that their naming latencies reduced across blocks and that their naming latencies to 4- and 7-letter nonwords eventually converged. Learning occurred considerably more slowly in the dyslexics, however, than in the typical readers. Whereas the difference in RTs between shorter and longer nonwords became nonsignificant in the typical readers around the middle of session 1, the dyslexics showed slower naming of longer nonwords throughout session 1, only losing the length effect part-way into session 2 (day 7). The present study confirms, therefore, that the problems with word learning that have been documented in dyslexic children persist into early adulthood, even in high-functioning dyslexics (cf. Reitsma, 1983; Ehri and Saltmarsh, 1995; Mayringer and Wimmer, 2000; Share and Shalev, 2004; Elbro and Jensen, 2005; Thomson and Goswami, 2010; De Jong and Messbauer, 2011).

Importantly, the naming latencies for the dyslexics remained substantially longer than those of the typical readers through to the end of session 2. **Figure 1** suggests that the difference between the two groups had more or less stabilized by the second half of session 2. We know that dyslexic university and college students read familiar words aloud more slowly than normal readers (Bruck, 1990; Ben-Dror et al., 1991; Reid et al., 2006; Wolff, 2009): one interpretation of that finding and the present evidence is that no amount of exposure to individual words will allow dyslexic students to reach the point where they can convert them from print to sound as efficiently as typical readers.

In terms of the DRC model of reading (Coltheart et al., 2001), less efficient reading of nonwords in the TOWRE-PDE test and in the experimental task indicates less efficient functioning of the nonlexical route in undergraduate dyslexics than in typical readers. Slower convergence between RTs to shorter and longer nonwords in the dyslexics suggests that the creation of new lexical entries in the orthographic input lexicon and the phonological output lexicon occurs less efficiently in adult dyslexics than in typical readers. This results in a slower switch-over from sublexical to predominantly lexical reading in the dyslexics. Finally, the fact that nonword reading remains slower in the dyslexics than in the controls even at the end of session 2, combined with the fact that adult dyslexics are slower than controls to read familiar words aloud, indicates that the lexical route also functions less efficiently in adult dyslexics than in typical readers. That could be due to slower operation of the two lexicons or the pathways between them, or it could also be due to less efficient functioning of the final stages involving activating phoneme sequences and converting those sequences into articulation. Problems at the phonological output stage in dyslexics that compromise the functioning of both the lexical and nonlexical routes would be compatible with other evidence for impairments in dyslexics at the speech output stage (see Coltheart, 2005; Ziegler et al., 2008; Hawelka et al., 2010, for discussions of developmental dyslexia within a DRC framework).

Across the two groups, the ability to learn novel words (measured here as the change in RTs to longer nonwords between blocks 1 and 10 of day 1) was predicted by vocabulary and working memory. Ricketts et al. (2007) found that vocabulary predicted the ability of normal 8–10-year-olds to read words with irregular or exceptional spellings but did not predict their ability to read nonwords. By definition, irregular words like *deaf* or *yacht* violate the grapheme-phoneme correspondences of English. Nonlexical procedures cannot read those words correctly: readers must rely instead on word-specific learning and the creation of lexical entries. The results of Ricketts et al. (2007) are therefore in line with the present findings, albeit for a younger group of readers.

If a reader has a larger vocabulary, novel words they encounter in reading are likely to have more orthographic and phonological neighbors; that is, familiar words that look and sound like the novel words, differing from them by only a few letters or phonemes. Storkel et al. (2006) taught adults novel spoken words paired with novel objects through stories and pictures. Learning was better for nonwords with many neighbors than for nonwords with few neighbors. In the DRC model, words that are already established in the orthographic and phonological lexicons support the processing of new words or nonwords which resemble them. This is done through interactions between the two lexicons and the systems that encode and represent letter and phoneme sequences. Those interactions allow the model to process nonwords with many neighbors more efficiently than nonwords with fewer neighbors. Lexical support for novel words during learning could explain the advantage for nonwords with many neighbors reported by Storkel et al. (2006) and the benefit of a larger vocabulary found by Ricketts et al. (2007) and in the present study.

As regards the contribution of working memory, we noted in the Introduction that studies of children and young adults by Jarrold et al. (2009), Majerus and Boukebza (2013) and Martin and Ellis (2012) found a relationship between working memory and the ability to learn novel words, with working memory apparently related more closely to acquiring new word-forms rather than their meanings. Those observations fit well with the present findings. The DRC model does not engage with the working memory literature directly, but an important part of working memory is the interaction between short- and long-term memory systems exemplified by the interaction between phoneme representations and lexical entries (the phonological output lexicon in the DRC model). Jarrold et al. (2009) and Martin and Ellis (2012) explained the relationship they observed between verbal short-term memory and word learning in terms of individual differences in the ability to maintain accurate phonological representations of novel words. Majerus et al. (2006) argued that maintaining information about the order of phonemes in words is particularly important for successful word learning. In that context, we note the report by Hachmann et al. (2014) that short-term recall of order information is particularly impaired in dyslexia, which may contribute to their word learning problems.

Phonological awareness did not emerge as a predictor of either initial naming RTs or learning when the contributions of the other predictors were taken into account. Research has established that phonological awareness alone is not enough to improve decoding skills: only when phonological training is combined with training on the mappings between letters and phonemes does reading improve (Hatcher et al., 1994; Melby-Lervåg et al., 2012). Knowledge of the links between letters and sounds may be better captured by the kind of measures of word reading and spelling that went into the Literacy variable in the present study than by phonological awareness based on spoken stimuli and responses.

In conclusion, our results show that adult dyslexics in the UK university and further education system continue to experience difficulty reading novel words and nonwords. They are slower to read nonwords aloud than typical readers, requiring more time per letter to pronounce unfamiliar sequences of letters. They show learning of novel words as a result of repeated exposures, but they require more exposures than typical readers before they establish effective lexical representations. Even after multiple presentations their speed of reading aloud is substantially slower than typical readers. They remain slower than typical readers even at reading familiar words aloud. Across both dyslexic and typical readers, decoding speed for nonwords was predicted by skill at reading and spelling real words ("literacy") while individual differences in word learning were predicted by vocabulary size and working memory. As others have also shown, the problems that adult dyslexics experience extend beyond reading and spelling to word learning, vocabulary, phonological awareness, working memory and even basic motor speed. Taken together, those problems will conspire to make it very challenging for adult dyslexics to function successfully within higher education.

## **ACKNOWLEDGMENTS**

We thank Dr. Lisa Henderson and Dr. Meesha Warmington for advice on the design of the study and selection of the test battery. Jo Coulthard helped with recruiting participants while Carol Yuan and Madeline Croucher assisted with data gathering.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00264/abstract

### **REFERENCES**


*J. Speech Lang. Hear. Res.* 49, 1175–1192. doi: 10.1044/1092-4388(2006/085)


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 December 2013; paper pending published: 24 January 2014; accepted: 09 April 2014; published online: 06 May 2014.*

*Citation: Kwok RKW and Ellis AW (2014) Visual word learning in adults with dyslexia. Front. Hum. Neurosci. 8:264. doi: 10.3389/fnhum.2014.00264*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Kwok and Ellis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Influence of context-sensitive rules on the formation of orthographic representations in Spanish dyslexic children

## *Paz Suárez-Coalla\*, Rrezarta Avdyli and Fernando Cuetos*

*Department of Psychology, University of Oviedo, Oviedo, Spain*

### *Edited by:*

*Donatella Spinelli, Università Degli Studi di Roma "Foro Italico", Italy*

#### *Reviewed by:*

*Angela Heine, Hochschule Rhein-Waal, Germany Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

### *\*Correspondence:*

*Paz Suárez-Coalla, Department of Psychology, University of Oviedo, Plaza Feijoo s/n, 33003 Oviedo, Spain e-mail: suarezpaz@uniovi.es*

Spanish-speaking developmental dyslexics are mainly characterized by poor reading fluency. One reason for this lack of fluency could be a difficulty in creating and accessing lexical representations because, as the self-teaching theory suggests, it is necessary to develop orthographic representations in order to use direct reading (Share, 1995). It is possible that this difficulty in acquiring orthographic representations is specifically related to words that contain context-sensitive graphemes, since it has been demonstrated that reading is affected by this kind of grapheme (Barca et al., 2007). In order to test this possibility we compared a group of dyslexic children with a group of normal readers (9–13 years) in a task of repeated reading. Pseudo-words (half short and half long) with simple and context-dependent rules were used. The reduction of the length effect on reading speed after repeated exposure was considered an indicator of orthographic representation development, as the length effect is strong when reading unknown words, but absent when reading familiar words. The results show that dyslexic children have difficulties in developing orthographic representations, not only with context-sensitive graphemes, but also with simple graphemes. In contrast to the control children, in the dyslexic group the differences between reading times for short and long stimuli did not change significantly after six presentations. Moreover, this happened with context-sensitive rules and also with simple grapheme–phoneme conversion rules. In addition, response and articulation times were greatly affected by length in dyslexic children, indicating the use of serial reading. Results suggest that the problems related to storing orthographic representations could be caused by a learning deficit, independently of whether the word contained context-sensitive rules or not.

**Keywords: dyslexia, orthographic representations, fluency, transparent orthography, context-sensitive rules**

### **INTRODUCTION**

Dyslexic children learning to read in transparent orthographic systems make relatively few errors in the reading of words when compared with dyslexics using opaque orthographic systems (Rack et al., 1992; Yap and Van der Leij, 1993; Sprenger-Charolles et al., 2000; De Jong and van der Leij, 2002; Nikolopoulos et al., 2003). Several cross-linguistic studies have shown that orthographic depth largely determines the reading skills of dyslexics, so that decoding problems are more evident in opaque orthographic systems, such as English, than in transparent orthographic systems (Wimmer and Goswami, 1994; Landerl et al., 1997). It appears that the high consistency between graphemes and phonemes facilitates learning of the alphabetic code, and consequently reading accuracy, even in dyslexic children.

Dyslexics in transparent orthographic systems, however, fail to achieve an acceptable level of reading speed (Wimmer, 1993; Spinelli et al., 2005); their reading is generally slow and laborious, similar to the reading of dyslexics in deep orthographic systems (Oney and Goldman, 1984; Landerl, 2001; Zoccolotti et al., 2005; Suárez-Coalla and Cuetos, 2012). As the reading speed problems are more striking than the accuracy problems for these dyslexics (although they are also more error-prone than age-matched children), difficulty in acquiring reading speed is considered a marker of dyslexia in transparent orthographic systems such as Spanish, German, Italian, or Greek (Ziegler et al., 2003; Davies et al., 2007; Constantinidou and Stainthorp, 2009; Wimmer et al., 2010). In particular, dyslexics are much slower than normal children reading long words and non-words (Davies et al., 2007; De Luca et al., 2008; Suárez-Coalla and Cuetos, 2012).

Why don't dyslexic children develop reading fluency? Is it because they have difficulties learning and automating the grapheme–phoneme conversion rules? Probably, because differences between dyslexic and normal children are bigger in reading non-words (Yap and Van der Leij, 1993; Snowling, 1995), which indicates problems in using the sublexical route, although in recent years several authors have questioned this conclusion, mostly on methodological grounds (see Van den Broeck and Geudens, 2012, for a very thorough discussion). But dyslexics are also poorer in reading familiar words (Hatcher et al., 2002), which indicates difficulties in developing orthographic representations of the words. What, then, prevents them from forming and accessing orthographic representations? According to the self-teaching theory, orthographic representations are developed through accurate and repeated reading (Reitsma, 1989; Share, 1995, 1999; Cunningham et al., 2002; De Jong et al., 2009). If dyslexics have difficulty developing orthographic representations, they should be even slower than normal individuals when reading frequent words, because they also have to read them by the sublexical route; indeed, some studies have confirmed this observation (Defior et al., 1998; Barca et al., 2006). The self-teaching hypothesis has been tested in different languages (Hebrew, English, and Dutch), with the findings suggesting that few exposures are required to form orthographic representations and pass from sublexical to lexical reading. However, this transition is more difficult for children with dyslexia (Manis, 1985; Reitsma, 1989). Additionally, it has been suggested that dyslexics are inefficient in learning graphemic materials because they were slower than controls in the learning rate of novel words when previous experience with texts was minimized (Pontillo et al., 2014).

Different methodologies have been proposed to investigate when the orthographic representations are formed (writing from dictation, choosing between several homophones of the target stimulus, reading latencies, etc.), but one that has been widely used recently is based on the reduction of the length effect. Since the study by Weekes (1997), it has been well known that in typical readers, word length (number of letters) has a large influence on reading unfamiliar words and pseudowords, but has a small effect on low frequency words and no effect at all on high frequency words. The explanation, according to the dual route model, is that low frequency words and pseudowords are read in a serial or sublexical way, so the more graphemes, the longer the latencies. On the other hand, familiar words have a representation in the orthographic lexicon; so, when reading familiar words, all of the letters are identified in parallel, and the difference between the latencies of short and long words disappears (Coltheart et al., 2001). Therefore, the formation of orthographic representations will be reflected in a reduced length effect after repeated exposures to pseudowords (repeated reading). As a demonstration of this effect, Maloney et al. (2009) asked a group of participants to read the 100 non-words from the Weekes study repeatedly. They found that reading times became shorter but, more crucially, there was a reduction in the length effect: differences between long and short non-words became increasingly smaller.
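The length-effect measure described above can be sketched in a few lines. The latencies below are invented purely for illustration, and the function and variable names are hypothetical; the sketch only shows how a shrinking long-minus-short difference across blocks would be computed.

```python
from statistics import mean

# Hypothetical naming latencies (ms) per block, for illustration only.
# Each block maps a length category to a list of trial RTs.
blocks = [
    {"short": [620, 640, 600], "long": [760, 800, 780]},  # first exposure
    {"short": [590, 600, 580], "long": [650, 670, 660]},
    {"short": [570, 580, 560], "long": [575, 590, 560]},  # last exposure
]


def length_effect(block: dict) -> float:
    """Mean RT for long items minus mean RT for short items, in ms."""
    return mean(block["long"]) - mean(block["short"])


effects = [length_effect(b) for b in blocks]
reduction = effects[0] - effects[-1]  # first-block effect minus last-block effect
print(effects, reduction)
```

A large positive reduction is what the approach treats as evidence that lexical representations have formed; a length effect that stays roughly constant across blocks suggests continued serial decoding.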

In a recent study, Kwok and Ellis (2014) used this length effect methodology to study the formation of orthographic representations in dyslexic adults. Participants had to read a list of 24 non-words, half short and half long, repeated across ten blocks. Results showed a reduction in the difference in reading latencies between short and long words across blocks in normal readers, but dyslexics only showed convergence in the second session 7 days later. It seems that adult dyslexics need more exposures than control readers to create lexical representations.

With the same methodology, Suárez-Coalla et al. (2014) presented eight unfamiliar words, four long and four short, to a group of Spanish children with dyslexia and a control group to read in six different blocks. In a first experiment the unfamiliar words were presented within the context of a story and in a second experiment the words were presented in isolation. Reading and articulation times for the first and last block of the unfamiliar words were compared. In both experiments a decrease in the influence of length was found for the control group. However, for the dyslexic children, the influence of length remained unchanged after the repeated reading of the unfamiliar words. These results seem to indicate that dyslexic children may be unable to develop orthographic representations, at least after six exposures, and that they may need to read each word more times.

Why do dyslexic children need more exposures to the words than normal children? It is quite possible that these results are a consequence of the difficulties they have in using grapheme–phoneme rules. Slow and inaccurate reading could prevent these children from developing orthographic representations. If so, the difficulties in forming orthographic representations will be greater for words associated with difficult rules, as for example those containing context-sensitive graphemes (Rastle and Coltheart, 1998; Rey and Schiller, 2005; Barca et al., 2007).

The Spanish orthographic system has 30 graphemes and is highly consistent; however, the pronunciation of "c" and "g" depends on the letter that follows (e.g., the letter "g" is pronounced as /γ/ when it is followed by "a," "o," "u"; but it is pronounced as /χ/ when followed by "e" and "i"). Therefore, the Spanish orthographic system is transparent, but contains some context-sensitive graphemes. Reading words and pseudowords is affected by graphemic complexity (complex GPC rules) in different languages (English: Rastle and Coltheart, 1998; French: Rey and Schiller, 2005; Italian: Barca et al., 2006). In Italian, a transparent language similar to Spanish, graphemic complexity (contextuality) was tested in young readers (third and fifth grades) using words with simple or contextual letter-sound conversion rules (Barca et al., 2007). In both groups, the words with contextual rules were read more slowly than words with simple rules. On the basis of this result, we predict that it would be harder to build up orthographic representations for novel words that contain context-sensitive GPC rules than for words that are made up of simple GPC rules only. We consider that this effect could be stronger for children with dyslexia than for normal readers because these rules are more difficult to learn and decode for dyslexics. So if these children have problems automating GPC rules, the context-sensitive GPC rules could entail an increased difficulty for them.
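The context-sensitive rule for "g" described above can be written as a small lookup on the following letter. This is a minimal sketch that covers only the rule as stated in the text; the function name is hypothetical, the example words ("gato", "gente", "gigante") are ordinary Spanish words added here for illustration, and any context the text does not mention is left unresolved.

```python
# Phoneme symbols follow the notation used in the text.
G_BEFORE_A_O_U = set("aou")  # "g" followed by these letters -> /γ/
G_BEFORE_E_I = set("ei")     # "g" followed by these letters -> /χ/


def g_phonemes(word: str) -> list:
    """Return the phoneme selected for each "g" in word, based on the next letter."""
    result = []
    for i, letter in enumerate(word):
        if letter != "g":
            continue
        following = word[i + 1] if i + 1 < len(word) else ""
        if following in G_BEFORE_A_O_U:
            result.append("γ")
        elif following in G_BEFORE_E_I:
            result.append("χ")
        else:
            result.append("?")  # context not covered by the rule as stated
    return result


for example in ("gato", "gente", "gigante"):
    print(example, g_phonemes(example))
```

The point of the sketch is that decoding "g" requires looking one letter ahead, whereas a simple grapheme such as "m" or "p" can be converted without reference to its neighbours.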

As a consequence, our goal in this study was to test, using the length effect reduction, whether dyslexic Spanish children have problems in developing orthographic representations after repeated reading, and whether these problems are greater when words include context-sensitive graphemes. Including context-sensitive graphemes would therefore allow us to test whether dyslexics have poor learning of pseudowords because they have difficulties in learning and automating the grapheme–phoneme conversion rules.

In addressing that objective, we compared a group of Spanish dyslexic children with a group of normal readers on a task of repeated reading of pseudowords. Reading and articulation times were collected in order to discover whether differences between short and long pseudowords decreased after repeated reading, and whether this reduction of the length effect was context-sensitive. In addition to reaction times (RTs), we included articulation times (ATs), following previous studies (Davies et al., 2012; Suárez-Coalla and Cuetos, 2012) in which AT was a measure sensitive to the reader's ability and to the characteristics of the stimuli.

### **EXPERIMENT**

### **PARTICIPANTS**

A total of 50 children took part in the study, all native Spanish speakers with normal or corrected-to-normal vision and without any known cognitive impairment (apart from dyslexia). Children did not have sensory disorders. They all had received adequate schooling. Twenty-five were dyslexics, with ages ranging between 8 and 13 years (*M* = 10.36, SD = 1.5), and 25 were normal readers (*M* = 10, SD = 1.5). Both groups were matched by gender (13 female and 12 male) and age. The dyslexic children were attending a private center for individualized treatment and received special attention in school. Both groups shared the same social background (middle-class families in all cases). For the diagnosis of the dyslexic children, in addition to the Wechsler Intelligence Scale for Children (WISC; Wechsler, 2001), a Spanish reading process assessment battery, PROLEC-R (Cuetos et al., 2007), was used. The battery was administered individually and required the child to read aloud a list of 40 words and pseudowords as quickly and as accurately as they could. These words varied quite broadly in frequency (high or low) as well as in length (five and eight letters). Accuracy and reading speed (measured as the time taken to complete the task) were scored. Children in the control group were also assessed using the PROLEC-R battery and the WISC test. The average intelligence quotient (IQ) in the dyslexic group was 106, ranging from 90 to 116; in the control group the mean IQ was 115, ranging from 95 to 126. Both groups were matched regarding performance IQ; the dyslexic children differed significantly from the control group in verbal IQ (*p* = 0.006; common in people with dyslexia, and reported in other studies, e.g., Perea et al., 2014). Reading scores differed between the dyslexic and the control group; in addition, the dyslexic group's scores were 1.5–2 SD below the average for each age category in the reading assessment battery (see **Table 1**). Furthermore, we confirmed significant differences between groups (dyslexics vs. controls) in reading accuracy of words [*t*(48) = −5.18; *p* < 0.001]; reading speed of words [*t*(48) = 4.90; *p* < 0.001]; reading accuracy of pseudowords [*t*(48) = −7.62; *p* < 0.001]; and reading speed of pseudowords [*t*(48) = 5.88; *p* < 0.001].

*IQ, intelligence quotient; M, mean; SD, standard deviation.*

The study was approved by the Ethics Committee of the Department of Psychology, University of Oviedo. Before performing the experiments, informed written consent from all parents and teachers was obtained. A document was given to the parents describing the objectives of the study, the type of tasks to be performed and their duration. The study involved only children whose parents signed the informed consent forms. Additionally, before starting the experiment, tasks were explained to the children and they were asked if they agreed to participate in the study. All children agreed to participate in the tasks.

### **MATERIALS**

Sixteen pseudowords in Spanish, half including consistent graphemes (d, t, m, or p), and half context-dependent graphemes (g, j, z, or c), were used for this experiment. The pronunciation of context-dependent graphemes varies according to the vowel that follows, as explained above. Half of the pseudowords were short (four letters, two syllables) and half long (six letters, three syllables); all had a consonant–vowel (CV) syllabic structure (e.g., mepa, polato, zuge, gukato). According to the orthographic depth hypothesis, learning the alphabetic code is influenced by orthographic consistency (Seymour et al., 2003); therefore, there would be more reading errors and lower reading fluency when stimuli are inconsistent than when they are consistent. This would in turn cause difficulties in the formation of orthographic representations.

### **PROCEDURE**

The participants were asked to read aloud the pseudowords, which were presented in random order within each of six blocks. Each trial followed this sequence: an asterisk appeared as a fixation point for 500 ms, followed by a blank screen for another 500 ms, and then the pseudoword appeared on the screen for 3500 ms. A pilot study was conducted to determine the timing of this sequence. We found that a shorter time was insufficient for children with dyslexia, as they did not have time to read the entire stimulus; without being able to read the stimulus, it would have been impossible for them to form representations. After each block, a pause was marked and participants pressed the space bar to continue. Before the experiment, six practice trials were run in order to familiarize the children with the reading task. Stimuli were presented with the DMDX software (Forster and Forster, 2003) on a laptop computer (12″) using a 24-point Arial font, colored black on a white background. Once the children were seated, the following instructions appeared on the screen: "Some invented words are going to appear in the screen; you must read them aloud as quickly as possible and without making any mistake." The task was performed individually in a quiet room at the children's school or in the private center. The test lasted approximately 15 min. The children were not corrected if they misread pseudowords, in an attempt to simulate the natural conditions of individual reading (self-teaching). Once the data were gathered, they were analyzed with the CheckVocal software (Protopapas, 2007) in order to obtain the correct responses and the reaction and articulation times.
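The trial sequence described above can be laid out as a simple timeline. The following is a minimal sketch, not the DMDX script used in the study: the function names are hypothetical, and the session estimate assumes that each of the 16 pseudowords appears once per block and that every stimulus stays on screen for the full 3500 ms.

```python
# Durations taken from the procedure description (milliseconds).
FIXATION_MS = 500
BLANK_MS = 500
STIMULUS_MS = 3500

N_PSEUDOWORDS = 16  # from the Materials section
N_BLOCKS = 6


def trial_timeline():
    """Return (event, onset_ms, offset_ms) tuples for a single trial."""
    events = [
        ("fixation", FIXATION_MS),
        ("blank", BLANK_MS),
        ("pseudoword", STIMULUS_MS),
    ]
    timeline = []
    onset = 0
    for name, duration in events:
        timeline.append((name, onset, onset + duration))
        onset += duration
    return timeline


def presentation_time_ms():
    """Total presentation time, ignoring pauses, practice trials, and instructions."""
    trial_ms = trial_timeline()[-1][2]  # offset of the last event in a trial
    return trial_ms * N_PSEUDOWORDS * N_BLOCKS


if __name__ == "__main__":
    for event in trial_timeline():
        print(event)
    print(presentation_time_ms() / 60000, "minutes of stimulus presentation")
```

Under these assumptions the presentation time alone comes to a little over 7 min, which is compatible with the reported session length of approximately 15 min once pauses, practice trials, and instructions are included.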

### **ANALYSIS**

Using the SPSS 19 statistical package, a mixed between-within subjects analysis of variance was conducted. Group (2: dyslexics *vs*. controls) was the between-subjects factor; and block (2: first *vs*. sixth), stimulus type (2: consistent vs. context dependent graphemes) and length (2: short *vs*. long) the within-subjects factors.

Two dependent variables were considered: RTs (the time from the stimulus appearing on the computer screen until the child began to read) and ATs (the time children spent reading the stimulus aloud). ATs have not been widely used in the literature, although some studies have used this measure (Davies et al., 2012; Suárez-Coalla and Cuetos, 2012). This measure is of interest with children because a length effect in ATs could be an indicator (other than RTs) of serial reading. A length effect in the ATs of dyslexic children has been found, which was interpreted as reflecting an absence of orthographic representations and thus sequential reading (Suárez-Coalla and Cuetos, 2012). We used only the correct responses for the RT and AT analyses; these responses are important in enabling us to discover whether the lack of automatization of phoneme–grapheme rules is the problem concerning orthographic representations.
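As an illustration of how a length effect can be derived from such data, the sketch below computes the difference between mean RTs for long and short items within a block, using correct responses only. The trial records are hypothetical values invented for the example, not data from the study, and the function name is an assumption.

```python
from statistics import mean

# Hypothetical trial records: (block, length, correct, rt_ms).
# These numbers are illustrative only and do not come from the study.
trials = [
    (1, "short", True, 900),
    (1, "short", True, 1000),
    (1, "long", True, 1200),
    (1, "long", False, 2000),  # error trial: excluded from the RT analysis
    (1, "long", True, 1300),
    (6, "short", True, 850),
    (6, "long", True, 900),
    (6, "long", True, 950),
]


def length_effect(records, block):
    """Mean RT for long items minus mean RT for short items (correct trials only)."""

    def cell_mean(length):
        values = [
            rt
            for b, item_length, correct, rt in records
            if b == block and item_length == length and correct
        ]
        return mean(values)

    return cell_mean("long") - cell_mean("short")


if __name__ == "__main__":
    print("Length effect, block 1:", length_effect(trials, 1))
    print("Length effect, block 6:", length_effect(trials, 6))
```

A length effect that shrinks from the first to the last block, as in these made-up numbers, is the pattern the study treats as an indicator of orthographic learning. The same computation applies to ATs.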

### **RESULTS**

A total of 4,800 responses were obtained from both groups, with 2,400 responses from each group. Considering the six blocks of stimuli, the dyslexic group committed a total of 465 errors (19.37%), and 73 non-responses (3.04%). Thirty (1.25%) responses were considered outliers (2 SDs above or below the mean). In contrast, the control group had a total of 207 errors (8.62%), three non-responses (0.12%), and 23 (0.96%) responses that were considered outliers. In the following analysis, only RTs and ATs to correct responses were used.
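The response counts above follow from the design (25 children per group, 16 pseudowords, six blocks), and the percentages can be rechecked with a few lines of arithmetic. This is a minimal sketch; the variable and function names are hypothetical.

```python
N_CHILDREN_PER_GROUP = 25
N_PSEUDOWORDS = 16
N_BLOCKS = 6

# 25 children x 16 pseudowords x 6 blocks = 2,400 responses per group.
responses_per_group = N_CHILDREN_PER_GROUP * N_PSEUDOWORDS * N_BLOCKS

# Counts as reported in the Results section.
counts = {
    "dyslexic": {"errors": 465, "non_responses": 73, "outliers": 30},
    "control": {"errors": 207, "non_responses": 3, "outliers": 23},
}


def percentages(group_counts, total):
    """Express each response category as a percentage of all responses."""
    return {name: 100 * n / total for name, n in group_counts.items()}


if __name__ == "__main__":
    print("Responses per group:", responses_per_group)
    for group, group_counts in counts.items():
        for name, pct in percentages(group_counts, responses_per_group).items():
            print(f"{group:9s} {name:14s} {pct:.3f}%")
```

The recomputed values agree with the reported percentages to within rounding in the second decimal place.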

### *Reaction times*

In the ANOVA we found a main effect of group [*F*(1,48) = 46.400, *p* < 0.001, partial η<sup>2</sup> = 0.502], with the dyslexic group slower than the control group; a block effect [*F*(1,48) = 4.690, *p* = 0.036, partial η<sup>2</sup> = 0.093], as a consequence of the reduction in the RTs across blocks; a stimulus type effect [*F*(1,48) = 15.897, *p* < 0.001, partial η<sup>2</sup> = 0.257], with longer RTs for pseudowords with context-dependent graphemes than pseudowords with consistent graphemes; and a length effect [*F*(1,48) = 60.681, *p* < 0.001, partial η<sup>2</sup> = 0.569], due to RTs being faster for short than for long stimuli. We also found a block by group interaction [*F*(1,48) = 10.370, *p* = 0.002, partial η<sup>2</sup> = 0.184], as the difference between RTs in first and last block was greater in the control than in the dyslexic group; a length by group interaction [*F*(1,48) = 6.727, *p* = 0.013, partial η<sup>2</sup> = 0.128], showing a larger difference between short and long stimuli in the dyslexic than in the control group; and a block by length by group interaction [*F*(1,48) = 4.833, *p* = 0.033, partial η<sup>2</sup> = 0.095]. This latter interaction indicates that differences between short and long pseudowords decrease after repeated reading in the control group, but not in the dyslexic group (see **Figure 1**). In a more detailed analysis (comparing the first block with the rest of the blocks) it was found that the reduction of the length effect was only significant for the last block.

In order to further explore the results and confirm the decrease of the length effect in the control group, RTs were analyzed separately for control and dyslexic children. In the analysis of the control group, we found a block effect [*F*(1,24) = 17.927, *p* = 0.001, partial η<sup>2</sup> = 0.428], indicating a decrease of RTs with repeated reading in this group; a stimulus type effect [*F*(1,24) = 18.051, *p* < 0.001, partial η<sup>2</sup> = 0.429], showing faster RTs in pseudowords with consistent graphemes than in pseudowords with context-dependent graphemes; and a length effect [*F*(1,24) = 23.211, *p* < 0.001, partial η<sup>2</sup> = 0.492], with longer RTs for long than short pseudowords.

We also found a stimulus type by length interaction [*F*(1,24) = 6.488, *p* = 0.018, partial η<sup>2</sup> = 0.213], as the length effect was smaller in pseudowords with consistent graphemes than in pseudowords with context-dependent graphemes; a block by stimulus type interaction that was close to significance [*F*(1,24) = 3.571, *p* = 0.071, partial η<sup>2</sup> = 0.130], indicating that RTs of pseudowords with consistent graphemes decreased more quickly with repeated reading; and a block by length interaction [*F*(1,24) = 18.629, *p* < 0.001, partial η<sup>2</sup> = 0.437], as the typical readers showed a length effect reduction with the repetitions.
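The effect sizes reported alongside each *F* ratio appear to be partial eta squared, which can be recovered from *F* and its degrees of freedom with the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the control-group RT analysis; the function and variable names are hypothetical, and the degrees of freedom are those printed in the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported for the control-group RT analysis, all with df = (1, 24).
control_rt_f = {
    "block": 17.927,
    "stimulus type": 18.051,
    "length": 23.211,
    "stimulus type x length": 6.488,
    "block x stimulus type": 3.571,
    "block x length": 18.629,
}

control_rt_eta = {
    effect: round(partial_eta_squared(f, 1, 24), 3)
    for effect, f in control_rt_f.items()
}

if __name__ == "__main__":
    for effect, eta in control_rt_eta.items():
        print(f"{effect}: partial eta squared = {eta:.3f}")
```

For these six effects the recomputed values match the reported ones to three decimal places.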

By contrast, in the dyslexic group only a length effect was found [*F*(1,24) = 36.220, *p* < 0.001, partial η<sup>2</sup> = 0.622], indicating slower RTs for the longer pseudowords. The stimulus type effect was close to significance [*F*(1,24) = 3.776, *p* = 0.065, partial η<sup>2</sup> = 0.146], with longer RTs for pseudowords with context-dependent graphemes than for pseudowords with consistent graphemes. No effect of block was found, indicating that reading times did not decrease after six exposures in the dyslexic group (see **Table 2** for the RTs in block 1 and block 6).



*Stim. type, stimulus type (consistent vs. context-dependent graphemes); Cont. dep., context dependent; B1, block 1; B6, block 6; RTs, reaction times; ATs, articulation times; M, mean; SD, standard deviation.*

### *Articulation times*

In the ANOVA on articulation times, we found main effects of group [*F*(1,48) = 13.166, *p* = 0.001, partial η<sup>2</sup> = 0.223], with longer ATs in the dyslexic group; block [*F*(1,48) = 6.267, *p* = 0.016, partial η<sup>2</sup> = 0.120], with longer ATs in the first than in the sixth block; stimulus type [*F*(1,48) = 139.871, *p* < 0.001, partial η<sup>2</sup> = 0.753], as pseudowords with context-dependent graphemes took more time than pseudowords with consistent graphemes; and length [*F*(1,48) = 356.407, *p* < 0.001, partial η<sup>2</sup> = 0.886], with shorter ATs for short than long pseudowords. Moreover, we found a length by group interaction [*F*(1,48) = 27.866, *p* < 0.001, partial η<sup>2</sup> = 0.377], and a block by length interaction [*F*(1,48) = 5.438, *p* = 0.024, partial η<sup>2</sup> = 0.106], indicating that the length effect was more evident in the dyslexic than in the control group and decreased in the last compared to the first block. A more detailed analysis (comparing the first block with the rest of the blocks) showed that the reduction of the length effect already appeared in block 5 (block by length interaction) and was maintained in block 6 (see **Figure 2**).

As with the RTs, we conducted separate analyses for controls and dyslexics. In the analysis of the control group data, we found a stimulus type effect [*F*(1,24) = 278.345, *p* < 0.001, partial η<sup>2</sup> = 0.921], with faster ATs in pseudowords with consistent graphemes than in pseudowords with context-dependent graphemes; and a length effect [*F*(1,24) = 703.892, *p* < 0.001, partial η<sup>2</sup> = 0.967]. Moreover, the block by length interaction was significant [*F*(1,24) = 8.238, *p* = 0.008, partial η<sup>2</sup> = 0.256], as the length effect decreased across blocks.

In the group with dyslexia, we found a length effect [*F*(1,24) = 144.292, *p* < 0.001, partial η<sup>2</sup> = 0.868], with shorter ATs for short than for long stimuli; and a stimulus type effect [*F*(1,24) = 45.494, *p* < 0.001, partial η<sup>2</sup> = 0.674], with longer ATs for pseudowords with context-dependent graphemes than pseudowords with consistent graphemes. By contrast, the block by length interaction was not significant, indicating that length continued to affect the ATs after six repetitions.

### **DISCUSSION**

In this study, we addressed the difficulty of Spanish-speaking dyslexic children in developing orthographic representations and investigated whether this difficulty is related to words that contain context-sensitive graphemes. In order to test this hypothesis, we compared children with dyslexia and typical readers using pseudowords with or without contextual grapheme–phoneme rules. The length effect reduction on reading speed, after repeated exposure, was considered as an indicator of orthographic representation development.

The results showed that dyslexic children were significantly slower at reading (RTs and ATs) than controls in all blocks, especially with long pseudowords. Additionally, children in the control group reduced the RTs across blocks (83 ms difference between the first and the sixth block), while the RTs of dyslexics remained the same through repetitions (2 ms difference between the first and the sixth block).

A critical finding was that typical readers showed a significant reduction of the length effect in the sixth block after repeated reading, i.e., the difference between short and long pseudowords was not significant in the last block (only 16 ms difference), suggesting development of orthographic representations. By contrast, dyslexic children continued to manifest a length effect in the sixth block (174 ms difference between short and long stimuli). These results are consistent with studies in other orthographic systems reporting that dyslexics have difficulties in storing the orthographic representations of words (Hogaboam and Perfetti, 1978; Manis, 1985; Reitsma, 1989; Ehri and Saltmarsh, 1995; Martens and de Jong, 2008; Kwok and Ellis, 2014). They also confirm results recently obtained with Spanish dyslexic children using the same methodology (Suárez-Coalla et al., 2014).

Regarding orthographic consistency (i.e., consistent vs. context-dependent rules), we found that pseudowords with context-dependent rules were associated with longer RTs, longer ATs, and a greater number of reading errors in both dyslexics and controls. Therefore, it seems that context-dependent rules were more difficult to learn and automate, even for typical readers, in accordance with other studies (Rastle and Coltheart, 1998; Rey and Schiller, 2005; Barca et al., 2006). Nevertheless, the influence of context-dependent graphemes on the formation of orthographic representations seems to be stronger in the control group than in the dyslexic children, since in the typical readers the reduction of the length effect after repeated reading was smaller for the pseudowords with these graphemes than for the pseudowords composed of consistent rules. In the dyslexic children, on the other hand, the length effect was similar for both types of graphemes. This suggests that dyslexics may have problems forming orthographic representations even for words with consistent rules. In fact, dyslexic children were not able to develop orthographic representations with six exposures and continued using sublexical reading for all new words. They probably need more exposures to achieve direct reading, as suggested by Kwok and Ellis (2014) in their study with dyslexic adults. Overall, we conclude that dyslexic children show a selective learning deficit in forming orthographic representations, regardless of whether stimuli contained consistent or context-sensitive rules. This independence from context-sensitive rules suggests a lexical locus for the learning difficulty of children with dyslexia.

Notably, dyslexic children remained slower than controls for both short and long stimuli. This highlights the known difficulties of dyslexics in reading new words or pseudowords (Rack et al., 1992; Grainger et al., 2003; Suárez-Coalla and Cuetos, 2012). Their reading speed was more or less constant throughout the task; their RTs in the last block were even longer than those of controls on the first exposure. Inaccuracy may interfere with orthographic learning because correct mastery of the alphabetic code seems crucial: the more times a word is read accurately, the greater the chances of storing its representation in memory (Share, 1995). In this study, we found that dyslexic children made more errors than children without dyslexia, although reading accuracy improved across blocks for both groups of children.

Besides RTs, dyslexics were also slower than controls in ATs. This measure, like the RTs, decreased across blocks and was affected by orthographic consistency and length, with more time needed to pronounce context-dependent and long pseudowords than consistent and short ones. These results are in keeping with the proposal of Davies et al. (2012) that cognitive processes continue after response onset when the word's pronunciation is not yet fully prepared. We should underscore, however, that the length effect was stronger in dyslexics and, furthermore, did not decrease across blocks, as it did for the controls. This means that dyslexic children continue to read serially, even after several repetitions.

Finally, considering these results and those of other studies (Kwok and Ellis, 2014; Suárez-Coalla et al., 2014), it would be interesting to conduct a study with a larger number of repetitions, spread over different days, in order to determine whether dyslexic children are able to develop orthographic representations with more exposures (Kwok and Ellis, 2014).

In summary, this study addressed the formation of orthographic representations in dyslexic children and the possible influence of context-sensitive rules. Previous studies have investigated this issue, but this is the first to examine whether the formation of orthographic representations depends on the presence of context-dependent rules. Our results indicate that Spanish dyslexic children have problems forming orthographic representations (independent of the presence of context-dependent graphemes) and continue using sublexical reading even after several exposures.

### **ACKNOWLEDGMENT**

This study was funded by Grant PSI2012-31913 from the Spanish Government.

### **REFERENCES**


and typical word and pseudoword reading in a transparent orthography. *Read. Writ.* 26, 721–738. doi: 10.1007/s11145-012-9388-1


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 January 2014; accepted: 24 August 2014; published online: 04 December 2014.*

*Citation: Suárez-Coalla P, Avdyli R and Cuetos F (2014) Influence of context-sensitive rules on the formation of orthographic representations in Spanish dyslexic children. Front. Psychol. 5:1409. doi: 10.3389/fpsyg.2014.01409*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Suárez-Coalla, Avdyli and Cuetos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Tracking orthographic learning in children with different profiles of reading difficulty

## *Hua-Chen Wang\*, Eva Marinus, Lyndsey Nickels and Anne Castles*

*Department of Cognitive Science, ARC Centre of Excellence in Cognition and its Disorders, Macquarie University, Sydney, NSW, Australia*

### *Edited by:*

*Peter F. De Jong, University of Amsterdam, Netherlands*

### *Reviewed by:*

*Wim Van Den Broeck, Vrije Universiteit Brussel, Belgium David L. Share, University of Haifa, Israel*

### *\*Correspondence:*

*Hua-Chen Wang, Department of Cognitive Science, ARC Centre of Excellence in Cognition and its Disorders, Macquarie University, Sydney, NSW 2109, Australia e-mail: huachen.wang@mq.edu.au*

Previous studies have found that children with reading difficulties need more exposures to acquire the representations needed to support fluent reading than typically developing readers (e.g., Ehri and Saltmarsh, 1995). Building on existing orthographic learning paradigms, we report on an investigation of orthographic learning in poor readers using a new learning task tracking both the accuracy (untimed exposure duration) and fluency (200 ms exposure duration) of learning novel words over trials. In Study 1, we used the paradigm to examine orthographic learning in children with specific poor reader profiles (nine with a surface profile, nine with a phonological profile) and nine age-matched controls. Both profiles showed improvement over the learning cycles, but the children with a surface profile showed impaired orthographic learning in spelling and orthographic choice tasks. Study 2 explored predictors of orthographic learning in a group of 91 poor readers using the same outcome measures as in Study 1. Consistent with earlier findings in typically developing readers, phonological decoding skill predicted orthographic learning. Moreover, orthographic knowledge significantly predicted orthographic learning over and beyond phonological decoding. The two studies provide insights into how poor readers learn novel words, and how their learning process may be compromised by less proficient orthographic and/or phonological skills.

**Keywords: orthographic learning, developmental dyslexia, subtypes, phonological decoding, orthographic knowledge**

## **INTRODUCTION**

Orthographic learning has been defined as the transition from the slow sounding out of an unfamiliar new word to the rapid automatic recognition of the same word. It is widely acknowledged that beginning readers need to make this transition in order to become proficient readers (e.g., Ehri and Wilce, 1983; Share, 1995; Castles and Nation, 2008). In this study, we explored orthographic learning in children with poor reading ability and investigated the factors that are associated with their success in acquiring new orthographic representations.

Most developmental theories propose that the sounding out of words, phonological decoding, is an important mechanism for reaching the final stage of automatic reading (for a review, see Ehri, 2005). Among these theories, the self-teaching hypothesis is associated with a strong claim for the importance of phonological decoding in orthographic learning (Share, 1995, 1999). It proposes that phonological decoding is the first and most important step of orthographic learning, providing an opportunity for this learning to take place. The act of phonological decoding is proposed to allow the reader access to a word's spoken form, as well as to draw their attention to the order and identity of the letters. This, together with repeated exposure to the new word, assists the reader in establishing an orthographic representation. According to the self-teaching hypothesis, although phonological decoding is crucial in orthographic learning, it is not the only factor: there is a secondary, orthographic processing component, which also determines the success of orthographic learning, although the nature of this mechanism is little understood (Share, 1995, 2011).

If phonological decoding is important for acquiring orthographic representations, proficient phonological decoding processes should increase the likelihood of successful orthographic learning. Conversely, impaired phonological decoding processes should be expected to lead to difficulties in orthographic learning. Indeed, abundant studies seem to support the view that deficits in phonological processing skills may be a primary cause of reading difficulties (e.g., Rack et al., 1992; Stanovich and Siegel, 1994) as well as orthographic learning difficulties (Share and Shalev, 2004).

However, a large body of evidence on heterogeneity within the dyslexic population and on the existence of different subtypes of developmental dyslexia suggests that the relationship between phonological decoding skills and orthographic learning may not be straightforward (Castles and Coltheart, 1993; Manis et al., 1996; Stanovich et al., 1997; Valdois et al., 2003; Castles et al., 2010b; Jones et al., 2011; McArthur et al., 2013; Peterson et al., 2013). The outcomes of these studies show that impairment in phonological decoding and impairment in automatic whole-word recognition can occur selectively—one aspect of reading can be impaired while the other develops normally. Namely, children with a *surface dyslexia* profile struggle to read irregular words (e.g., *yacht*) but are not impaired in reading nonwords (e.g., *grep*). This indicates that they are specifically impaired in recognizing whole words, whilst their phonological decoding skills are intact. Conversely, children with a *phonological dyslexia* profile show impairments in nonword reading but not irregular word reading, indicating a specific impairment in phonological decoding processes while being intact in sight word recognition skills. Note that children with these profiles are seen as falling at the ends of a normal continuum of ability on the relevant reading subskills, not as qualitatively distinct subtypes.

In sum, based on the view that phonological decoding is the primary factor in successful orthographic learning, the reading profiles of surface dyslexia and phonological dyslexia seem to be somewhat of a paradox. How are some beginning readers able to build up orthographic knowledge despite poor decoding skills (phonological dyslexics) and why do some children with proficient decoding skills fail to build up orthographic representations (surface dyslexics)? In order to address these questions we need to examine how orthographic learning occurs in these two subtype profiles.

To date, only two studies have explicitly contrasted differences in orthographic learning in children with phonological and surface dyslexia. Castles and Holmes (1996) found that, as expected by their specific reading profiles, children with a surface dyslexia profile were poorer at learning novel irregular words (measured by an orthographic choice task in which the child had to choose the target item from its distracters) than children with a phonological dyslexia profile. However, since the novel words in their study were all irregular, it may be that children with a surface profile performed more poorly than children with a phonological profile because their usual phonological decoding strategy is not effective for such items (e.g., how would one phonologically decode a word like "laugh"?).

Bailey et al. (2004) built on the results of Castles and Holmes (1996) by comparing orthographic learning of both regular and irregular words in children with profiles of phonological dyslexia and surface dyslexia. They found that overall, the two profiles were no different from each other, but were both more impaired than chronological age controls in orthographic learning (as measured by reading accuracy). In addition, they found that both children with a surface profile and the controls showed an advantage in learning regular words as compared to irregular words. In contrast, children with a phonological profile showed no difference between regular and irregular words, suggesting that phonological decoding was not relied on during orthographic learning. Although this study provided more insight into the orthographic learning processes of these two profiles of dyslexia, there are still some limitations. First, orthographic learning results were based on reading accuracy only<sup>1</sup>. Thus, the finding that children with a surface profile were more accurate in reading regular than irregular words may have been a function of the "decodability" of the words rather than orthographic learning *per se* (e.g., *cat* can be read correctly by phonological decoding, whereas this is not possible for *yacht*). Hence, to determine whether children with surface dyslexia are indeed better at acquiring (and not just decoding) regular orthographic representations than irregular ones, improved measures of orthographic learning with minimal influence from decoding ability are required. Second, selection of the subgroups was based on a relatively lax criterion. Instead of selecting children with a surface profile that were impaired on irregular word reading only, and phonological profile children that were impaired on nonword reading only, Bailey and colleagues based their selection on a discrepancy score between nonword and irregular word reading. Hence, for example, a phonological profile child in their study could have been poor on both nonword and irregular word reading, but with irregular word being relatively better than nonword reading. The design of the study presented here allows us to address these problems.

The first aim of the present study was to further extend our understanding of orthographic learning in children with surface and phonological profiles. By studying orthographic learning in these two subgroups we also aimed to bridge the gap between previous work on orthographic learning (mostly conducted with normal readers) and the extensive literature on subtypes in dyslexia. Building on the studies of Castles and Holmes (1996) and Bailey et al. (2004), we used a more stringent subgroup criterion in selecting participants and developed a novel paradigm to explore orthographic learning. Just like Bailey et al. we included a sample of typical readers as controls so that we could not only compare the performance of the children with different profiles, but also contrast their performance to that of normal readers.

We also included a broader range of measures of orthographic learning than in the previous studies. Given that spelling tasks are often difficult for poor readers, we included an orthographic choice task. Finally, we developed a new learning paradigm that assesses reading accuracy under both untimed and time-limited exposure conditions. Time-limited exposure reading accuracy is interpreted here as a fluency measure of item specific orthographic knowledge, as rapid recognition of words is considered a hallmark of the acquisition of orthographic representations (Yap and van der Leij, 1993; Marinus et al., 2012). This reasoning is similar to the idea that the time that is required to read a word is reduced when a word is read as a whole unit rather than by phonological decoding (e.g., Coltheart, 1983; Ehri, 2005). An additional benefit of this paradigm is that it allowed us to tap orthographic learning by tracking improvement of fluency over learning cycles. Hence, we were able to monitor orthographic learning in a dynamic and ongoing fashion. Finally, just like Bailey and colleagues, we included both regular and irregular words in order to see if we could replicate the regularity effect for children with a surface profile, and the absence of a regularity effect for children with a phonological profile.

In our novel word-learning paradigm, novel letter strings were assigned regular or irregular pronunciations and presented in three learning cycles. After each cycle, we measured reading accuracy under both untimed and time-limited stimulus exposure duration. After the three cycles were completed, traditional spelling and orthographic choice tasks were administered.

<sup>1</sup>Bailey et al. (2004) also used a spelling task to measure orthographic learning; however, because spelling accuracy was at floor, only the reading accuracy results were discussed.

The untimed reading condition provided the opportunity for children to decode and build up orthographic representations of the novel words. In order to measure whether orthographic learning had taken place, each untimed reading block was followed by a time-limited exposure block in which items were presented for only 200 ms.
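The structure of the learning paradigm can be summarized as an ordered schedule of phases. This is a minimal sketch with hypothetical field names; the relative order of the spelling and orthographic choice tasks after the final cycle is an assumption, since the text only states that both were administered after the three cycles.

```python
N_CYCLES = 3
TIME_LIMITED_MS = 200  # exposure duration in the time-limited reading blocks


def session_schedule():
    """Return the ordered phases of one session of the word-learning paradigm."""
    steps = []
    for cycle in range(1, N_CYCLES + 1):
        # Untimed reading: the child can decode the item (exposure_ms=None means untimed).
        steps.append({"task": "reading", "cycle": cycle, "exposure_ms": None})
        # Time-limited reading: taken as a fluency measure of orthographic learning.
        steps.append({"task": "reading", "cycle": cycle, "exposure_ms": TIME_LIMITED_MS})
    # Post-learning measures; their relative order is assumed here.
    steps.append({"task": "spelling", "cycle": None, "exposure_ms": None})
    steps.append({"task": "orthographic choice", "cycle": None, "exposure_ms": None})
    return steps


if __name__ == "__main__":
    for step in session_schedule():
        print(step)
```

Tracking accuracy in the time-limited blocks across cycles 1 to 3 is what allows learning to be monitored as it unfolds rather than only at the end of the session.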

As mentioned earlier, this paradigm not only allows us to explore whether the two groups with contrasting reading profiles differ in their orthographic learning performance, but also to examine whether and to what extent orthographic learning improves with the number of learning exposures. Previous studies examining the transition from decoding to rapid word recognition (as measured by increases in reading speed) have found that children with dyslexia need many more exposures to acquire novel word representations than typically developing readers (Reitsma, 1983; Manis, 1985; Ehri and Saltmarsh, 1995). Reitsma (1983, Experiment 3) reported that even six exposures to novel words were not enough to result in any increase in reading speed (taken as an index of orthographic learning) in a group of children with dyslexia. In contrast, in the same experiment, a group of younger readers without reading difficulties showed a steep increase in word reading speed. Note that none of these studies made a distinction between different profiles of reading difficulty. Hence, we used the current paradigm to monitor orthographic learning within two groups of poor readers with contrasting reading profiles.

Using the same paradigm, but with a larger sample of poor readers, we conducted a second study to explore to what extent different reading and language skills predict orthographic learning. For this purpose, we drew on an explicit model of component processes involved in skilled reading, the dual-route model of reading aloud (Coltheart et al., 1993, 2001). The six components of this model are letter analysis, letter-sound conversion, phonemic buffer, orthographic lexicon, semantics, and phonological lexicon. We used regression analyses to investigate the association between these components and orthographic learning. Reading and language skills mapping onto the six different components were used as predictors, and the orthographic learning results were used as outcome measures.

## **STUDY 1**

As outlined in the Introduction, the existence of children with surface and phonological reading profiles challenges the role of phonological decoding in orthographic learning. The aim of Study 1 was to investigate how orthographic learning takes place in poor readers with contrasting reading profiles. Study 1 consisted of two parts. The first part aimed to validate the group membership of the phonological and surface profiles. In order to do this we measured language and reading skills involved in reading processes based on the dual-route model of reading aloud. In the context of the dual route model, we expect the children with a phonological profile to be impaired in the letter-sound knowledge process of the nonlexical route. In contrast, children with a surface profile are thought to be impaired in the lexical route, the orthographic lexicon in particular.

In the second part of Study 1, we used the novel word learning paradigm to investigate orthographic learning of the two profiles. The questions of interest were: (1) Are these children able to learn novel words at all? If so, is their learning rate slower than controls? This part of the study aimed to replicate previous studies suggesting that children with dyslexia are impaired at orthographic learning (Reitsma, 1983, 1989). (2) Will children with a phonological profile, having impaired phonological decoding skills, be less efficient at learning novel words than control children and those with a surface profile? Alternatively, will children with a phonological profile learn novel words faster than children with a surface profile as predicted by their subtype reading profiles? (3) Will children with phonological and surface profiles differ in the size of the regularity effect? Typically developing children have been shown to learn regular words better than irregular words as regular words are more "phonologically decodable" than irregular words (Wang et al., 2012). However, as suggested by Bailey et al. (2004), children with a phonological profile may show no effect of regularity on orthographic learning due to their impaired phonological decoding skill. Instead, they may learn novel words via some kind of rote association between the sound and the form of the novel words, bypassing the phonological decoding process. Children with a surface profile, in contrast, may show a normal word regularity effect on orthographic learning as they have average phonological decoding skills.

## **PARTICIPANTS**

Ninety-one poor readers (average age 9.3, range 7.2–12.3) were recruited from schools, clinics or via newspaper advertisements to participate in a reading training study at Macquarie University. Children were included in the study if they scored at least one standard deviation below average for their age on one or both subscales (irregular word and nonword reading) of the Castles and Coltheart 2 test (CC2; Castles et al., 2010a). All poor readers scored within the normal range on non-verbal IQ (Kaufman Brief Intelligence Test, K-Bit; Kaufman and Kaufman, 1990).

From this larger sample we selected two groups of poor readers: one with a surface profile and one with a phonological profile. We will from here refer to them as the "surface group" and the "phonological group." The criteria for a surface profile were performance within the normal range (z-score > −1.00) on nonword reading accuracy and below average performance on irregular word reading (z-score < −1.00, which is equivalent to the bottom 15% of the norms). In addition, to ensure a discrepancy in skills, the z-score difference between nonword and irregular word reading had to be more than 0.5. The same test was administered twice in two sessions that were 8 weeks apart. Only children with consistent reading profiles across the two sessions were included. Nine poor readers fitted our stringent criteria of a surface profile on both testing sessions. Next, we identified children showing consistent profiles of phonological dyslexia (nonword z-score < −1.00, irregular word > −1.00, with a difference of more than 0.5), resulting in a subsample of 22. From this sample we selected nine participants with a phonological profile, matching the surface group in age, IQ and level of impairment on the relevant reading subtest. Finally, we recruited nine age-matched typical readers who were participating in reading studies at Macquarie University as controls. The reading accuracy of these controls fell between one standard deviation below and 1.5 standard deviations above the average on all three subscales of the CC2 (please see **Table 1** for the characteristics of the three groups).
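The selection criteria above can be sketched in code. This is a minimal illustration only: the function names and the treatment of scores exactly at a cut-off are assumptions, not the authors' screening procedure; the cut-off (−1.00) and the minimum discrepancy (0.5) are taken from the text.

```python
from typing import Optional, Tuple

Z_CUTOFF = -1.0  # z-scores below this count as impaired (from the text)
MIN_GAP = 0.5    # required z-score discrepancy between the two skills (from the text)


def classify_profile(nonword_z: float, irregular_z: float) -> Optional[str]:
    """Classify one testing session as 'surface', 'phonological', or None.

    Hypothetical helper: scores exactly at a cut-off are treated as
    meeting neither criterion, which is an assumption.
    """
    if abs(nonword_z - irregular_z) <= MIN_GAP:
        return None  # the two skills are not discrepant enough
    if nonword_z > Z_CUTOFF and irregular_z < Z_CUTOFF:
        return "surface"  # intact nonword reading, impaired irregular word reading
    if nonword_z < Z_CUTOFF and irregular_z > Z_CUTOFF:
        return "phonological"  # impaired nonword reading, intact irregular word reading
    return None


def consistent_profile(session1: Tuple[float, float],
                       session2: Tuple[float, float]) -> Optional[str]:
    """Return a profile only if both sessions (8 weeks apart) agree."""
    first = classify_profile(*session1)
    second = classify_profile(*session2)
    return first if first is not None and first == second else None
```

Each session is passed as a `(nonword_z, irregular_z)` pair, so a child whose profile changes between sessions is excluded, mirroring the consistency requirement described above.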

### **SUBGROUP VALIDATION**

In this first part of Study 1, we validated subgroup membership by examining the language and reading skills of the two groups with contrasting reading profiles. We designed tasks that aimed to tap different components of the dual route model of reading aloud (see **Figure 1**). This model proposes a *lexical* route through which words are directly recognized as whole units and a *nonlexical* route through which words are decoded phonologically.

As can be seen from **Figure 1**, each of these routes consists of a number of processing components, some shared across the routes and some separate. When a reader sees a printed word, the letters will first be recognized in the *letter analysis* component. Then in the nonlexical route, the graphemes of the word are phonologically decoded by the *letter-sound knowledge* component (also referred to as the "grapheme-to-phoneme conversion" component). In the lexical route, the orthography of known words is activated as a whole unit in the *orthographic lexicon*. Subsequently, in the *semantic system*, the meaning of the word is activated and then in the *phonological lexicon* the sound of the word is activated. The final component of the model is the *phonemic buffer* where phonemes are activated and temporarily stored before they are spoken.

### *Measures of reading processes*

Each of the six basic components in the dual-route model was assessed with one test as described in the sections below. Test–retest reliability (Pearson's *r*) is reported for each measure based on scores over two testing sessions that were 8 weeks apart, with a sample of 115 children, aged 7–12, in a larger reading training study (McArthur et al., 2013).

### *Letter analysis*

Letter analysis was measured with a cross-case copying task (McArthur et al., 2013). This task consists of 14 letters, 7 in upper case and 7 in lower case. For lower case letters the child was asked to write down the upper case of the same letters (e.g., t − T), and vice versa for upper case letters. Test–retest reliability, *r* = 0.75.

### *Letter-sound knowledge*

The ability to convert letters or letter strings into sounds was tested with the Letter-Sound Test (LeST, Larsen et al., 2011). Each child was asked to produce the appropriate sound for 51 single-letter and multiletter graphemes. The items were presented on individual flash cards. The graphemes were selected as being consistent, in other words they had the same pronunciation in more than 75% of occurrences of that grapheme according to the CELEX database (Baayen et al., 1993). Test–retest reliability, *r* = 0.84.

**Table 1 | Characteristics and reading processing skills of the control, phonological, and surface groups.**

*The asterisks in the "phonological profile" and "surface profile" columns indicate significant differences compared to the control group.* \**p* < 0.05, \*\**p* < 0.01, <sup>+</sup>*p* < 0.1.

### *Orthographic lexicon*

Word-specific orthographic knowledge was assessed with the DOOR/DOAR lexical decision test (McArthur et al., 2013). Thirty target words, ranging in frequency from 3 to 625 instances per million words, were selected from the Children's Printed Word Database (CPWD, Masterson et al., 2003). All words were selected to have alternative, homophonic spellings with adjustments of the vowel (e.g., FLAME changed to FLAIM) or a consonant (e.g., CURL changed to KURL). Each item was presented paired with its alternative homophonic spelling (e.g., DOOR and DOAR). The child was asked to circle the correct spelling. Test–retest reliability, *r* = 0.57.

### *Semantics*

Semantic knowledge was measured with the Peabody Picture Vocabulary Test 4 (PPVT-IV, Dunn and Dunn, 2007). For each item the child was presented with four pictures and asked to point to the picture that was named by the tester. The administration of the test was stopped when the child made more than eight errors in a set of 12 items. Scores were standard scores with a mean of 100 and a standard deviation of 15. Test–retest reliability, *r* = 0.84.

### *Phonological lexicon*

The ability to access the phonological lexicon was measured with the Naming subtest of the Assessment of Comprehension and Expression (ACE6–11 test; Adams et al., 2001). The child was asked to name 25 pictures. No stopping rule was applied. Test–retest reliability, *r* = 0.87.

### *Phonemic buffer*

We tested the phonological output buffer with a standardized nonword repetition task, a subtest of the NEuroPSYchology (NEPSY) test (Korkman et al., 1998). In this task, the child was asked to listen to and orally repeat digitally recorded nonwords (e.g., crumsee). Scores were standard scores with a mean of 10 and a standard deviation of 3. Test–retest reliability, *r* = 0.72.

## **RESULTS: SUBGROUP VALIDATION**

**Table 1** presents the performance of the surface, phonological and control groups on the selection measures and the other measures of reading processing skills. The two groups were significantly different on the selection measures: nonword and irregular word reading accuracy. In addition, and as would be predicted, the phonological group performed significantly more poorly on the letter-sound knowledge test and the surface group performed significantly more poorly on the orthographic knowledge test (DOOR/DOAR). In addition, the difference on the nonword repetition test (NEPSY) approached significance (*p* = 0.05), with the surface group appearing to outperform the phonological group. However, as both groups still performed within the normal range on this task, this result is not discussed further. The two groups did not differ on any other measure. The results of the assessment of reading processing skills therefore confirmed that the phonological group had inferior letter-sound knowledge in the nonlexical route and the surface group showed lower proficiency in orthographic knowledge in the lexical route.

## **ORTHOGRAPHIC LEARNING TASK**

### *Materials*

This task consisted of eight four- to five-letter nonwords (e.g., *vack*), four of which were assigned regular pronunciations and the other four irregular pronunciations (please see Supplementary Material). The nonwords were created in the same way as the items used in Wang et al. (2011), but the items are not identical to those of Wang et al. due to differences in the experimental design. The regular items were pronounced according to a set of typical grapheme-phoneme correspondence rules (Rastle and Coltheart, 1999). "Typical" was defined on the basis that the pronunciation of the vowel occurred in more than 50% of words containing that vowel grapheme in both the CELEX database (Baayen et al., 1993) and the CPWD (Masterson et al., 2003). The irregular nonwords had pronunciations that did not follow typical letter-sound rules: the allocated pronunciation of the vowel in the target word occurred in fewer than 50% of words in the CELEX and the CPWD. All of the irregular pronunciations were nevertheless existing grapheme-phoneme correspondences in English. However, the pronunciations were infrequent and did not occur in the context of the vowels and the final consonants (bodies) of the irregular nonwords that were used in this task. For example, the nonword *cleap* was assigned the pronunciation "claip"; *ea* is pronounced this way in, for example, *great* and *break*, but is always pronounced "ee" when followed by *–p* (e.g., *heap, leap*).

### *Procedure*

Children were tested individually in a quiet room. They learned four regular items followed by four irregular items. For both regular and irregular words, the same procedure was used: an initial exposure phase, learning trials and two post-tests (see **Figure 2**).

During the initial exposure phase, the child was first presented with a picture with elves and was told they were going to learn the names of some of these elves. Next, the tester introduced the spoken forms of the four target nonwords ("elves' names") to the children (initial exposure). This was necessary in order to expose the children to the pronunciations of the irregular nonwords. After this, the child was seated in front of a computer and the nonwords appeared on the screen one at a time. During the first presentation on the computer screen the tester said: "The name of this elf is \_\_\_\_." The children were not asked to read or repeat the novel words at this point and no accuracy was recorded. After a nonword had been introduced to the child orally and in print in the exposure phase, the first cycle of the learning trials began and reading accuracy was recorded. The four nonwords would appear on the screen one by one in a randomized order, and the child was asked to read them aloud. This was the untimed exposure reading. Feedback was provided regardless of whether the child read the target word correctly or not, to give an equal number of phonological exposures to each word. For example, after each response was given by the child, the experimenter said, "that's correct, it's a ferb" for correct responses; or "not quite, it is a ferb" for incorrect responses.

All three rounds of untimed reading were followed by a block of time-limited reading (200 ms presentation, with #### as backward masks) of the target words. This setup allows us to obtain an ongoing measure of orthographic learning (i.e., the ability to recognize words instantaneously) after each exposure (i.e., untimed reading with plenty of time to decode the word plus feedback). Again, all target words were presented in random order. This step was introduced to the child as the "speed reading game." One block of untimed reading followed by one block of time-limited exposure duration reading was considered a cycle, and this cycle was repeated three times.
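The cycle structure described above (an untimed block followed by a 200 ms block, repeated three times, with items re-randomized in every block) can be sketched as follows. This is an illustrative sketch rather than the presentation software used in the study; the function name, the dictionary layout and the placeholder item names are assumptions.

```python
import random
from typing import Dict, List, Optional


def build_schedule(items: List[str], n_cycles: int = 3,
                   exposure_ms: int = 200,
                   seed: Optional[int] = None) -> List[Dict]:
    """Return the ordered list of blocks for one set of nonwords.

    Each cycle is one untimed block followed by one time-limited block.
    Item order is shuffled independently within every block.
    """
    rng = random.Random(seed)
    schedule = []
    for cycle in range(1, n_cycles + 1):
        # duration None stands for untimed presentation (an assumption of this sketch)
        for block, duration in (("untimed", None), ("time-limited", exposure_ms)):
            order = list(items)
            rng.shuffle(order)
            schedule.append({
                "cycle": cycle,
                "block": block,
                "duration_ms": duration,
                "items": order,
            })
    return schedule


# Placeholder item names; the actual stimulus sets are in the Supplementary Material.
example = build_schedule(["ferb", "vack", "nonword3", "nonword4"], seed=1)
```

With four items this yields six blocks and 24 reading trials per word type, of which the 12 time-limited trials serve as the online measure of orthographic learning.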

### *Post-test measures*

After the three learning cycles were completed, two post-tests were conducted to measure orthographic learning using both spelling and orthographic choice tasks. For the spelling task, the tester dictated all trained words in a random order. The children were asked to write down the elves' names exactly as they had learned them on the computer. For the orthographic choice task, each target item (e.g., *ferb*) was presented together with its homophonic foil (e.g., *furb*) and two visual distractors (e.g., *ferq*, *furq*) on one A4 sheet of paper. The children were asked to choose the correct spelling of the elf's name that they had learned from those four options. These two tasks were measured immediately after the learning trials and again after an hour to increase assessment reliability and statistical power. Thus, eight was the maximum score across two testing points for the orthographic choice task and spelling task for each word type—regular and irregular.

### **RESULTS: ORTHOGRAPHIC LEARNING**

### *Learning cycles: untimed and time-limited exposure duration measures*

**Table 2** summarizes results of the orthographic learning trials for untimed and time-limited exposure duration reading for the two profiles of poor readers and the controls. We aimed to examine the improvement in learning over cycles for regular and irregular items between the three groups of children with different reading profiles. We ran a repeated measures ANOVA with cycle (1, 2, 3), regularity (regular items, irregular items), and exposure duration (untimed, time-limited) as within-subject factors, and group (phonological profile, surface profile, controls) as a between-subject factor. We specified two planned contrasts on the between-subject factor in order to compare the performance of the three different groups. The first contrast compared the performance of the two poor reader groups with the controls and the second contrast compared the performance of the two poor reader groups.
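The two planned contrasts can be written as weight vectors over the three group means. The weights below are one conventional coding and are an assumption, since the text does not report the coefficients that were used; the example group means are placeholders, not data from the study.

```python
from typing import Sequence

GROUPS = ("phonological", "surface", "controls")

# Contrast 1: both poor reader groups vs. controls.
POOR_VS_CONTROLS = (1, 1, -2)
# Contrast 2: phonological group vs. surface group (controls ignored).
PHON_VS_SURFACE = (1, -1, 0)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def contrast_value(weights: Sequence[float], group_means: Sequence[float]) -> float:
    """Weighted combination of group means tested by a contrast."""
    return dot(weights, group_means)


# Placeholder means in the order given by GROUPS (hypothetical values).
example_means = (0.60, 0.60, 0.90)
```

Each set of weights sums to zero, and with equal group sizes (nine children per group) the two contrasts are orthogonal because the dot product of their weights is zero, so they partition the two degrees of freedom of the group effect.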

We found a main effect of cycle, *F*(2, 25) = 19.46, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.45, but no interaction between cycle and group, *Fs* < 1; nor were any of the higher level interactions between cycle and group with either regularity and/or exposure duration significant (*Fs* < 1). This indicates that across regular and irregular words, untimed and time-limited exposure conditions, all three groups improved over the learning cycles and that the degree of improvement did not differ between the groups.

The main effect of regularity was significant, *F*(1, 24) = 28.39, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.54, but the interaction between regularity and group was not, *F*(1, 24) = 1.79, *p* = 0.19. All three groups performed better on regular words than on irregular words. However, the interaction between regularity and exposure duration was significant, *F*(1, 24) = 5.48, *p* = 0.03, η<sub>p</sub><sup>2</sup> = 0.19. Considering the patterns of means across the conditions, this interaction indicated that for regular words, performance did not differ for untimed and time-limited exposure duration, *t*(1, 26) = 0.77, *p* = 0.45. However, for irregular words, performance was better under the time-limited condition than under the untimed exposure duration, *t*(1, 26) = −2.23, *p* = 0.03. There was also an interaction between exposure duration and group, *F*(2, 24) = 4.33, *p* = 0.03. The interaction reflected the fact that for the control and surface groups, there were no differences between exposure durations [control: *t*(1, 8) = 0.54, *p* = 0.61; surface: *t*(1, 8) = 1.04, *p* = 0.33]; but for the phonological group performance was better in the time-limited condition compared to the untimed condition, *t*(1, 8) = 3.46, *p* < 0.01.

Finally, there was a main effect of group, *F*(2, 24) = 8.56, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.42. The first planned contrast (both poor reader groups vs. controls) showed that, across conditions (regular/irregular, untimed/time-limited), the controls performed better than the poor reader groups, *F*(1, 24) = 17.12, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.42. However, there was no difference in overall performance between the two poor reader groups, *Fs* < 1.
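For readers who wish to relate the reported *F* ratios to the partial eta squared values, the standard identity is η<sub>p</sub><sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies it to the group effect reported above; the function name is an illustrative choice.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Main effect of group, F(2, 24) = 8.56
group_effect = round(partial_eta_squared(8.56, 2, 24), 2)

# First planned contrast (poor readers vs. controls), F(1, 24) = 17.12
contrast_effect = round(partial_eta_squared(17.12, 1, 24), 2)
```

Both calls return 0.42, matching the values reported for the main effect of group and for the first planned contrast.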

It should be noted that the performance of the control group is at ceiling on the regular items at the later cycles and hence did not meet the statistical assumption of equal variance. Therefore, we ran a nonparametric randomization test that does not make any assumptions about the distribution of the data (Lunneborg, 2001). This randomization test was conducted for the regular as well as the irregular items on the main effect of group. The results confirmed that across cycle and exposure duration conditions, the controls performed better than the two poor reader groups (regular: *p* = 0.03; irregular: *p* < 0.01).

**Table 2 | Reading accuracy across learning cycles.**
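A randomization test of the controls versus poor readers comparison can be sketched as below. The text does not specify the test statistic or the number of reshuffles, so the difference in group means, the two-sided criterion and the default of 10,000 permutations are all assumptions, and the function name is hypothetical.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def randomization_test(controls: Sequence[float],
                       poor_readers: Sequence[float],
                       n_permutations: int = 10000,
                       seed: int = 0) -> Tuple[float, float]:
    """Two-sided randomization test on the difference in group means.

    Group labels are reshuffled at random; the p-value is the share of
    reshuffles whose absolute mean difference is at least as large as
    the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = mean(controls) - mean(poor_readers)
    pooled = list(controls) + list(poor_readers)
    n_controls = len(controls)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of scores to groups
        diff = mean(pooled[:n_controls]) - mean(pooled[n_controls:])
        if abs(diff) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Because the reference distribution is built from the data themselves, ceiling performance and unequal variances in one group do not invalidate the p-value in the way they can for the parametric ANOVA.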

In summary, it was found that all three groups improved over learning cycles, but across learning cycles, the controls performed better than both poor reader groups. Importantly, the performance of the children with phonological and surface profiles did not differ. In addition, all three groups performed better on items with regular pronunciations than those with irregular pronunciations, and there was no difference in this regularity effect between the surface and phonological profiles. However, we need to interpret the results with caution as the controls were at ceiling for the regular items in the untimed reading condition. Lastly, it was found that for irregular words but not regular words, the performance was better under the time-limited exposure duration condition than under the untimed condition, particularly for the phonological group. This can be explained by the fact that the untimed condition provides an opportunity to decode a word, and in the case of irregular items, decoding results in incorrect responses. This result indicates that performance in the time-limited condition is minimally influenced by phonological decoding.

### *After learning cycles: spelling and orthographic choice measures*

**Table 3** summarizes results of the spelling and orthographic choice measures. We ran repeated measures ANOVAs with word regularity (regular items, irregular items) as a within-subject factor and group (phonological profile, surface profile, controls) as a between-subject factor. We specified the same two planned contrasts on the between-subject factor to compare the performance of the three different groups. Analyses were conducted separately for the spelling and orthographic choice tasks.

For the spelling task, the main effect of regularity approached significance, *F*(1, 24) = 4.20, *p* = 0.052, η<sub>p</sub><sup>2</sup> = 0.15, and there was no interaction between regularity and group (*Fs* < 1). The main effect for group was significant, *F*(2, 24) = 6.11, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.34. The planned contrasts showed that this group main effect reflected the significantly lower performance of the surface group compared to the controls, *F*(1, 24) = 10.44, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.30, as well as compared to the phonological group, *F*(1, 24) = 7.67, *p* = 0.01, η<sub>p</sub><sup>2</sup> = 0.24. The phonological group, on the other hand, performed as well as the controls, *Fs* < 1.

**Table 3 | Accuracy on Spelling and Orthographic Choice Measured after the Learning Cycles (with SDs in brackets).**

For the orthographic choice task, the difference between regular and irregular items was not significant, *F*(1, 24) = 2.61, *p* = 0.12, nor was the interaction between word regularity and group, *Fs* < 1. As with spelling performance, there was a main effect of group, *F*(2, 24) = 12.97, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.52. The planned contrasts showed that the surface group was worse on the orthographic choice task compared to the controls, *F*(1, 24) = 19.93, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.45, and compared to the phonological group, *F*(1, 24) = 18.97, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.44. Again, the phonological group performed at the same level as the controls, *Fs* < 1. An additional analysis confirmed that all three groups performed above chance level (25% accuracy), even the surface group, *t*(1, 8) = 7.54, *p* < 0.01.

In summary, the pattern of results of the spelling task was consistent with that of the orthographic choice task. For both tasks, the phonological group did not perform differently from the controls whereas the surface group was worse than both the phonological group and the control group. However, in contrast to the findings of the learning trials, the children did not perform better on the regular items than on the irregular items for the orthographic choice task and the difference in performance only approached significance for the spelling task.

### **DISCUSSION**

The first question of interest in Study 1 was whether children with reading difficulties are able to learn novel words at all or to learn at the same pace as typically developing readers. We found that, just like the controls, children with both types of reading profile showed learning over the learning cycles as evidenced by untimed and time-limited reading accuracy, and the rate of improvement was no different from controls. This is in contrast with previous studies that have found little or no evidence of orthographic learning in children with dyslexia (Reitsma, 1983, 1989; but also see Staels and van den Broeck, 2013). The difference between the results of the present and previous studies may be explained by the sensitivity of the measure of orthographic learning. In the current study orthographic learning was monitored online during the exposure trials, whereas Reitsma (1983, 1989) found that Dutch poor readers showed no evidence of orthographic learning measured by an orthographic choice task after learning took place. However, we found that although the two dyslexic groups showed evidence of orthographic learning, their reading accuracy overall was worse than that of the controls, and this is in line with previous studies using a group of mixed dyslexics (e.g., Manis, 1985; Share and Shalev, 2004).

The second question of interest was to explore the contrasting predictions made by the hypothesis of phonological decoding as the primary factor in orthographic learning vs. that of the children's reading profiles. According to the phonological decoding hypothesis, the phonological group, with their impaired phonological decoding skill, would be predicted to perform less well than the controls and the surface group. In contrast, the subtype profiles of the two groups predict that the phonological group would be better at orthographic learning than the surface group based on their superior sight word reading ability. The results of the present study did not fully support either of the hypotheses but were more consistent with the children's subtype profiles. The performance of the two poor reader groups did not differ on either untimed or time-limited exposure duration reading accuracy. However, the phonological group was found to be no different from the control group and to outperform the surface group on the subsequent spelling and orthographic choice tasks. The surface group was also significantly worse than the controls on the spelling and orthographic choice tasks. It should be noted that both the control and the phonological group performed close to ceiling on the orthographic choice tasks, hence making the differences between the two groups hard to detect. Nevertheless, the orthographic learning results of these two groups were in line with the selective difficulties in their reading profiles within the framework of the dual route model. The results of Study 1 thus showed that the difference in phonological decoding skill between the two subtype profiles did not directly translate into differences in the ability to acquire novel word representations, which is in contrast with the hypothesis that phonological decoding is the primary determinant of successful orthographic learning.

It is important to note that the surface group did not outperform the phonological group on untimed reading accuracy and did perform worse than the control group. We did not expect this result as we preselected the surface group to have average phonological decoding skills. The finding that the two poor reader groups did not differ in untimed reading accuracy might be explained by the test items that were used. For this task we used only four regular nonwords of four to five letters. It might be the case that the test was not sensitive enough and/or might not have had enough statistical power to differentiate between the two poor reader groups. In addition, the finding that the surface group was worse at decoding the novel words than the control group could be due to the superior decoding ability of the control group. That is, although the surface group was within the average range on the nonword reading selection measure (*z* = −0.57, *SD* = 0.21), they were still on average worse than controls (*z* = 0.15, *SD* = 0.66), *t*(1, 17) = −2.47, *p* = 0.03.

The last aim of Study 1 was to examine the effect of word regularity on orthographic learning. All groups showed higher reading accuracy when learning regular items than when learning irregular items. This implies that phonological decoding *was* used by both poor reader groups, as well as the typical readers, during the orthographic learning process. Together these results suggest that phonological decoding plays a role in orthographic learning for both subtype groups, yet it is also clear that this skill is not sufficient to fully account for the success of orthographic learning.

If phonological decoding skill cannot fully explain successful orthographic learning then what are the other factors determining this learning process? In order to further explore orthographic learning in poor readers, the second half of this study investigated the predictors of orthographic learning beyond phonological decoding. It has been proposed that poor readers could be relying on alternative learning strategies in order to compensate for poor phonological decoding skills (Stanovich and Siegel, 1994; Siegel et al., 1995; Castles et al., 1999). For example, it is possible that for children who have difficulties with phonological decoding, vocabulary knowledge is relied on more heavily during orthographic learning. In support of this, previous studies have found that when decoding can only be partially successful, in the case of irregular novel words, contextual information and vocabulary knowledge play a role in orthographic learning (Wang et al., 2011, 2013; Duff and Hulme, 2012). Similarly, word meaning has also been found to assist the reading of irregular words (e.g., Nation and Snowling, 1998; Ouellette, 2006; Bowey and Rutherford, 2007; Ricketts et al., 2007; McKay et al., 2008; Nation and Cocksey, 2009).

In addition to phonological decoding skills and vocabulary knowledge, pre-existing orthographic knowledge has also been considered an important factor in orthographic learning (Cunningham et al., 2002; Conners et al., 2010). Share (1995) suggested that although phonological decoding is the primary component of orthographic learning, the secondary, orthographic component determines how quickly and accurately orthographic representations are acquired.

In sum, we aimed to develop a more detailed picture of the reading processes associated with orthographic learning by exploring the strengths of the relationship between different reading and language skills and orthographic learning of regular and irregular words.

## **STUDY 2**

For the purpose of exploring how well different skills involved in reading predict orthographic learning of regular and irregular words within a larger group of poor readers, we again drew on the language and reading skills in the dual route model as we did for subgroup validation in Study 1 (see **Figure 1** earlier).

As noted earlier, phonological decoding is often assumed to be the key to orthographic learning. Based on this hypothesis, we predicted that skills reflecting nonlexical processing (phonological decoding in particular) would be important for orthographic learning of both regular words and irregular words. However, when accurate decoding is compromised or not possible (such as when phonological decoding skill is impaired or when words are irregular), knowledge of semantics, phonology or orthography may become more important.

### **METHODS**

### *Participants*

The same cohort of 91 poor readers screened in Study 1 participated in Study 2. As mentioned in Study 1, these children scored at least one standard deviation below average for their age on one or more of the two subscales (irregular word and nonword reading) of the Castles and Coltheart 2 word reading test (CC2; Castles et al., 2010a). On average, the children scored −1.59 (*SD* = 0.65) on nonword reading; and −1.40 (*SD* = 0.67) on irregular word reading.

### *Materials and procedure*

We assessed the poor readers on tasks tapping the six basic components of the dual-route model. In addition, all children completed the same orthographic learning task described in Study 1. In order to increase statistical power for the analyses used in Study 2, we created another set of nonword stimuli, consisting of an additional four regular and four irregular items. The extra set of items was created in the same way as described in Study 1. The same procedure of the orthographic learning task was applied in a separate session 8 weeks after the first set of nonwords were learnt. Each child was tested individually in a quiet room, and the children took approximately 100–120 min to complete all the assessments. The results of the six reading and language skill measures were used as predictors, and the orthographic learning performances were used as outcome measures.

### **RESULTS**

To investigate how well each reading subcomponent predicts orthographic learning, a set of correlations, followed by stepwise multiple regressions, was conducted with the dual route processing components as predictors, and the various orthographic learning measures as the dependent variables. Regressions were carried out in addition to correlations as the predictor tasks are themselves intercorrelated, in order to identify the relationship between these factors and orthographic learning outcomes when the intercorrelations between the variables are controlled.

**Table 4** shows the results of a series of correlations and partial correlations controlling for age and non-verbal IQ between the outcomes of the orthographic learning task (outcome measures: no. 1–8) and the components involved in lexical, nonlexical, and both routes (predictors: no. 9–14). Before the effects of age and non-verbal IQ were partialled out, all of the components involved in the reading routes correlated with almost all measures of orthographic learning. After controlling for age and IQ, the main difference is that the associations between orthographic learning measures and semantic knowledge (PPVT), and phonological lexicon functioning (ACE) were no longer significant.
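The partial correlations described above can be illustrated with a minimal sketch. It assumes the standard recursive formula for partial correlations and removes two covariates (here standing in for age and non-verbal IQ) from the relation between one predictor and one outcome. The function names are hypothetical and are not taken from the original analysis, which may have used a statistics package.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_corr_two(x, y, z1, z2):
    """Second-order partial correlation of x and y, controlling for z1 and z2."""
    # Step 1: partial z1 out of every pairwise relation that is needed.
    r_xy_1 = partial_r(pearson(x, y), pearson(x, z1), pearson(y, z1))
    r_x2_1 = partial_r(pearson(x, z2), pearson(x, z1), pearson(z2, z1))
    r_y2_1 = partial_r(pearson(y, z2), pearson(y, z1), pearson(z2, z1))
    # Step 2: partial z2 out of the z1-adjusted correlations.
    return partial_r(r_xy_1, r_x2_1, r_y2_1)
```

A raw correlation that shrinks toward zero after this adjustment, as reported for the PPVT and ACE measures, indicates that the association was largely carried by the covariates.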

The results of the regression analyses are summarized in **Table 5**. In the first step, age and non-verbal IQ were entered, followed by all the other potential predictor variables at step 2. Overall, letter-sound knowledge and orthographic knowledge seemed to be the best predictors of orthographic learning. For regular items, untimed exposure duration reading accuracy was predicted by letter-sound knowledge and phonemic buffer efficiency. Time-limited exposure duration reading accuracy was predicted by letter-sound knowledge, phonemic buffer efficiency, and word-specific orthographic knowledge (functioning of orthographic lexicon). Spelling accuracy was predicted by letter-sound knowledge and orthographic knowledge, whereas orthographic choice accuracy was only predicted by orthographic knowledge. For irregular items, orthographic knowledge was the only significant predictor for all measures except untimed exposure duration reading accuracy, which was predicted by both orthographic knowledge and letter-sound knowledge.
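The two-step entry of predictors can be sketched as a hierarchical regression that reports how much variance the step-2 predictors explain beyond the step-1 controls. This is a minimal illustration under stated assumptions: it uses ordinary least squares solved through the normal equations, it does not perform the stepwise selection among step-2 predictors, and all function names are hypothetical rather than taken from the original analysis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(predictors, y):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bk * col[i] for bk, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def delta_r_squared(step1, step2, y):
    """Return R^2 at step 1, R^2 of the full model, and the increase between them."""
    r2_step1 = r_squared(step1, y)
    r2_full = r_squared(step1 + step2, y)
    return r2_step1, r2_full, r2_full - r2_step1
```

Here `step1` would hold the control variables (age and non-verbal IQ) and `step2` the reading components, each as a list of per-child scores. The increase in R² shows what the components add once the controls are accounted for.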

It should be noted that although it seemed that letter-sound knowledge was a better predictor of performance for the regular items than the irregular items, and that orthographic knowledge was a better predictor for irregular items than regular items, these correlational differences between regular and irregular items did not reach significance. However, across the two predictors, orthographic knowledge was a significantly better predictor than letter-sound knowledge for scores on spelling and orthographic choice measures, regardless of word regularity (regular spelling: *z* = −2.17, *p* = 0.03; irregular spelling: *z* = −2.30, *p* = 0.02; regular orthographic choice: *z* = −2.82, *p* < 0.01; irregular orthographic choice: *z* = −3.67, *p* < 0.01).
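The text does not say which test produced these *z* values. As an illustration only, the sketch below uses one common choice for comparing two correlations that share an outcome variable in the same sample: the *z* test of Meng, Rosenthal and Rubin (1992). The function name and any input values are hypothetical, and the original analysis may have used a different procedure.

```python
from math import atanh, erf, sqrt

def compare_dependent_correlations(r1, r2, r12, n):
    """z test for two correlations sharing one variable (Meng, Rosenthal and Rubin, 1992).

    r1, r2: correlations of predictor 1 and predictor 2 with the same outcome.
    r12:    correlation between the two predictors.
    n:      sample size.
    Returns the z statistic and a two-tailed p value.
    """
    mean_r2 = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r12) / (2 * (1 - mean_r2)), 1.0)  # f is capped at 1
    h = (1 - f * mean_r2) / (1 - mean_r2)
    # Difference of Fisher-transformed correlations, scaled by its standard error.
    z = (atanh(r1) - atanh(r2)) * sqrt((n - 3) / (2 * (1 - r12) * h))
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p = 2 * (1 - normal_cdf)
    return z, p
```

With letter-sound knowledge as predictor 1 and orthographic knowledge as predictor 2, a negative *z* means the second predictor correlates more strongly with the outcome, which matches the sign of the values reported above.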

### **DISCUSSION**

The results showed that letter-sound knowledge predicted the outcomes of all measures assessing regular word learning except for orthographic choice. Letter-sound knowledge also predicted the untimed reading accuracy during irregular word learning, but it did not predict any other measures assessing irregular word learning. The ability to repeat nonwords was used as an index of phonemic buffer proficiency, and performance on this task predicted orthographic learning of regular words when learning was measured by accuracy in reading aloud (both timed and untimed).

Our measure of orthographic knowledge predicted success on our dynamic measures of orthographic learning of both regular and irregular words except for untimed reading accuracy of regular words. The fact that orthographic knowledge predicted time-limited but not untimed accuracy in reading the regular items is interesting as it suggests that, when rapid and fluent access to the orthographic representation is required, orthographic knowledge may play a more important role than when reading is untimed. In addition, orthographic knowledge was a better predictor than letter-sound knowledge when orthographic learning was measured by spelling and orthographic choice tasks. Finally, in contrast to the prediction that poor readers may use alternative skills such as vocabulary knowledge when learning to read, better functioning of the semantic system and/or phonological lexicon did not predict better orthographic learning.

**Table 4 | Correlations of the outcome measures and the predictors with (below the diagonal line) and without (above the diagonal line) age and nonverbal IQ controlled.**

*Predictors 9–14 refer to the six reading components tested by: Cross-case copying (Letter Analysis, 9); Letter-sound test (Letter-sound knowledge; 10); Door/Doar Lexical Decision (Orthographic lexicon, 11); PPVT (Semantic knowledge, 12); ACE (Phonological lexicon, 13); NEPSY (Phonemic buffer, 14); Age (15); Non-verbal IQ (K-Bit, 16).*

*Values represent Pearson's correlation coefficients; numbers in bold indicate p* < *0.05, 2-tailed.*

## **GENERAL DISCUSSION**

In this paper we examined orthographic learning in poor readers. Study 1 focused on orthographic learning of regular and irregular novel words in children with surface and phonological reading profiles. We developed a novel paradigm to track orthographic learning online. Participants were first asked to read the presented novel words untimed, then the items were presented under a time-limited exposure duration of 200 ms. This cycle was repeated three times and followed by spelling and orthographic choice tasks. This setup allowed us to track orthographic learning more dynamically (i.e., untimed and time-limited reading accuracy) than traditional measures (such as spelling and orthographic choice), which are typically administered after learning has taken place. With our novel and traditional measures of orthographic learning, we aimed to examine the role of phonological decoding in orthographic learning.

More specifically, we wanted to investigate whether phonological decoding is primary in orthographic learning as is widely proposed (e.g., Brady and Shankweiler, 1991; Byrne, 1992, 1998; Share, 1995). In this context, the orthographic learning of two subgroups is particularly interesting: children with specific difficulties with phonological decoding (a phonological profile) and those with specific difficulties in orthographic knowledge (a surface profile). If phonological decoding is indeed primary to orthographic learning, this should result in poorer orthographic learning in children with a phonological profile and normal orthographic learning in children with a surface profile. In addition to our measures of orthographic learning, we also examined the degree to which the different poor reader groups relied on phonological decoding by comparing the difference in their performance on regular and irregular word learning.

We found that children with phonological and surface profiles showed the same amount of orthographic learning on the dynamic measures (scores on untimed and time-limited trials). However, orthographic learning was still less efficient overall compared to that of the age-matched controls. This finding is consistent with previous studies suggesting that poor readers take longer to learn to read novel words (e.g., Manis, 1985; Share and Shalev, 2004). The results of our study add evidence that this less efficient orthographic learning is already apparent during online learning trials. The finding that children with a surface profile have superior phonological decoding ability but did not outperform children with a phonological profile seems to be inconsistent with the view of phonological decoding as the primary factor for orthographic learning. However, a key feature of the self-teaching hypothesis can also explain this finding. According to this hypothesis, orthographic learning is item based. Consequently, what is relevant to the success of orthographic learning is the correct decoding of the items to be learnt rather than one's phonological decoding ability in general. On the untimed trials in our dynamic measure, the children with a surface profile did not decode the words better than the children with a phonological profile. In other words: they did not show superior decoding skills on this task to start with. Hence, in this regard it is not surprising that they did not do better than the children with phonological dyslexia on the time-limited exposure duration trials.

**Table 5 | Summary of regression results predicting orthographic learning of regular and irregular words from lexical and nonlexical processing components.**

*\*p* < *0.05.*

Our findings on the traditional measures (spelling and orthographic choice) painted a different picture. Here the children with a surface profile performed more poorly than both the children with a phonological profile and the controls, which is consistent with what was found by Castles and Holmes (1996). This result is also consistent with the prediction based on the two groups' reading difficulties within the framework of the dual route model: the phonological group had normal sight word reading ability and the surface group had impaired sight word reading ability. Moreover, the children with a phonological profile performed as well as the controls despite their poorer performance on reading accuracy during the learning trials. This imbalance between decoding performance and orthographic learning results suggests again that orthographic learning ability cannot be explained by phonological decoding ability alone.

One possible explanation for the inconsistent performance of the phonological group across dynamic and traditional measures is that different task demands are imposed by the different orthographic learning tasks used for this study. More specifically, the dynamic measures (untimed and time-limited reading aloud) required verbal output whereas the traditional measures (spelling and orthographic choice) did not. As phonological impairment in reading is often associated with a deficit in verbal output (Hulme and Snowling, 1992; Szenkovits and Ramus, 2005), it is not surprising that the phonological group performed worse than typical readers on measures requiring verbal output but not on those without this requirement. This explanation is also consistent with findings from Study 2 (results will be discussed in more detail later), where performance on dynamic measures was more strongly associated with letter-sound knowledge and phonemic buffer functioning than was performance on traditional measures.

The role of phonological decoding in orthographic learning was also examined by manipulating word regularity. We found word regularity effects for the dynamic measures (untimed and time-limited reading) but not for the traditional post-test measures (spelling and orthographic choice). Moreover, these effects were the same for all groups. The regularity effect found in dynamic measures suggests that phonological decoding does play a role during the learning process, even for children with a phonological profile. However, the word regularity effect was not significant for any of the post-test measures. The absence of a regularity effect for spelling and orthographic choice is in line with outcomes of studies examining regularity effects in word recognition. These studies showed that regularity effects are restricted to tasks involving reading aloud and typically not found in word recognition tasks, such as lexical decision (e.g., Coltheart et al., 1979; Seidenberg et al., 1984; Schmalz et al., 2013; but see Parkin, 1982).

Study 2 explored to what degree the different skills that underlie reading predicted orthographic learning of regular and irregular words in poor readers. We selected the underlying component skills from the dual-route model of reading as predictors, and as outcome variables we used the dynamic and traditional orthographic learning measures. According to the view that phonological decoding is the primary factor in successful orthographic learning, we expected that skills reflecting nonlexical processing (letter-sound knowledge in particular) would be stronger predictors of orthographic learning of both regular and irregular words than skills reflecting lexical processing. We found that letter-sound knowledge is indeed a strong predictor of orthographic learning.

However, we did not find letter-sound knowledge to be a stronger predictor of orthographic learning than skills reflecting lexical processing. In fact, we found that orthographic knowledge, a lexical processing factor, was a good predictor of both regular and irregular word learning, and an even better predictor than letter-sound knowledge for spelling and orthographic choice. The association between orthographic knowledge and orthographic learning appeared to be particularly evident when the measure of learning directly tapped word-specific (spelling and orthographic choice) and fluent (time-limited reading) access to orthographic representations.

Thus, both Studies 1 and 2 showed that orthographic knowledge was associated with success in orthographic learning. In Study 1, we found that children with average phonological decoding skill but good orthographic knowledge showed normal orthographic learning on the traditional learning measures. In contrast, children with an opposite reading profile (impaired orthographic knowledge but good phonological decoding skill) showed impaired orthographic learning. In Study 2, we found that orthographic knowledge significantly predicted orthographic learning even after phonological decoding skills were controlled for. Orthographic knowledge also appeared to be a stronger predictor than phonological decoding skill when orthographic learning was measured by traditional measures (spelling and orthographic choice). Together these findings suggest that orthographic knowledge is not only important in orthographic learning, but also that having impaired orthographic knowledge could be more detrimental than having impaired phonological decoding skill when learning new words. The importance of orthographic knowledge in orthographic learning has also been reported in previous studies with typically developing readers (Cunningham, 2006; Conners et al., 2010). Similarly, the self-teaching hypothesis suggests that, although phonological decoding provides the opportunity for orthographic learning to take place, orthographic knowledge is the secondary factor required for successful orthographic learning (Share, 1995, 2011). Our results support the view that orthographic knowledge is important in orthographic learning, but challenge the view that orthographic knowledge is a "secondary" factor. It seems that orthographic knowledge may actually be as important as, or even more important than, phonological decoding in building up orthographic representations.

However, it must be considered in this context that there are two alternative explanations as to why orthographic knowledge might be a significant predictor of orthographic learning. First, the way orthographic knowledge was measured in this study could be seen as a measure of the children's historic ability to acquire orthographic representations. Hence, the relationship between orthographic knowledge and orthographic learning could simply be that both are a reflection of the children's ability to acquire lexical representations: children with better abilities to acquire orthographic representations will be better at both the task tapping the orthographic lexicon (as a result of past orthographic learning) and at our orthographic learning task (current orthographic learning). Second, it could be that existing orthographic representations contribute to the actual process of acquiring new representations with children using this knowledge during the learning process. This might occur by using analogies of known words or utilizing familiarity of orthographic patterns. Further research is required to investigate how exactly existing orthographic knowledge assists orthographic learning of novel words.

As mentioned earlier, one might expect that for poor readers, the success of orthographic learning might rely on alternative skills, such as semantics, compensating for poor phonological decoding skills. In contrast to this prediction, this study did not find any evidence that pre-existing semantic and phonological knowledge (measured by PPVT-IV and ACE6-11) predicted orthographic learning of regular or irregular words. However, the orthographic learning paradigm in this study did not provide word-specific vocabulary (semantic and phonological) knowledge for the novel words. Hence, there was little opportunity for the children to use such skills. It would be interesting for future studies to explore whether word-specific vocabulary knowledge affects orthographic learning in poor readers. Further investigation using a design that provides vocabulary knowledge of the novel words prior to written exposure, such as that used in Wang et al. (2011, 2013), is needed.

There are a number of limitations of the present study that require further consideration. First, due to the manipulation of word regularity, the items were read to the children as an initial exposure before the learning cycles started, and during the learning cycles feedback was provided. Consequently, although after the initial exposure the children were asked to first read the target words by themselves to simulate a partial self-teaching paradigm, it was not an independent learning environment. Therefore, it is possible that the results would have been different had the children learned the words without the experimenter's input. For example, the children with a phonological dyslexia profile may have benefited from the input and feedback to compensate for their poor decoding of the regular items, resulting in their untimed and time-limited reading accuracy not being different from the children with a surface dyslexia profile. Although providing feedback is still ecologically valid, as children often receive feedback when learning to read, particularly with irregular words, we cannot interpret the results in the context of pure self-teaching.

Second, it is possible that factors beyond phonological decoding and orthographic knowledge influenced the pattern of impaired orthographic learning in surface dyslexia and normal orthographic learning in phonological dyslexia. According to the self-teaching hypothesis (Share, 1995), phonological decoding draws the reader's attention to the order and identity of the letters in the word, and produces the phonology. This then allows bonding to occur between the phonology and the orthography via some kind of associative learning procedure. It is possible that differences in the ability to establish associations between phonology and orthography are the source of the difference in orthographic learning skill between the two subtypes of dyslexia. In studies that did not make distinctions between subtypes, children with dyslexia were found to have difficulties in learning paired associations (Gascon and Goodglass, 1970; Vellutino et al., 1975; Messbauer and de Jong, 2003; Litt et al., 2013). In addition, other abilities may also contribute to individual differences in learning to read, such as mapping the output of letter-sound correspondences to the existing phonology of a word (e.g., *was:* from /w..aa..ss/ ... to /woz/; Elbro et al., 2012), and capitalizing on contextual and syntactic information (Tunmer et al., 1987; Tunmer and Chapman, 2004). Future studies are required to further investigate orthographic learning in children with subtypes of reading profiles.

In sum, Study 1 used a novel paradigm that allowed us to explore the role of phonological decoding and track orthographic learning in two groups of poor readers who had contrasting reading impairments. The two poor reader groups showed orthographic learning patterns that were consistent with their reading profiles, which suggested that phonological decoding skill alone is insufficient for acquiring orthographic representations. Study 2 was the first to break down components of reading processes based on the dual route model of reading and to use these components to explore factors associated with orthographic learning. The results of this study indicated that, in addition to phonological decoding (letter-sound knowledge), prior orthographic knowledge also predicted the success of orthographic learning. Together, the outcomes of the two studies suggest that phonological decoding plays a role in orthographic learning of both regular and irregular words, and for children with and without phonological decoding difficulties. Orthographic knowledge was also found to be important in orthographic learning, especially when the measures of learning directly tapped word-specific and fluent access to orthographic representations.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00468/abstract

### **REFERENCES**


Kaufman, A., and Kaufman, N. (1990). *Kaufman Brief Intelligence Test*. Circle Pines, MN: American Guidance Service, Inc.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 December 2013; accepted: 08 June 2014; published online: 04 July 2014. Citation: Wang H-C, Marinus E, Nickels L and Castles A (2014) Tracking orthographic learning in children with different profiles of reading difficulty. Front. Hum. Neurosci. 8:468. doi: 10.3389/fnhum.2014.00468*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Wang, Marinus, Nickels and Castles. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The contribution of discrete-trial naming and visual recognition to rapid automatized naming deficits of dyslexic children with and without a history of language delay

**Filippo Gasperini\*, Daniela Brizzolara, Paola Cristofani, Claudia Casalini and Anna Maria Chilosi**

Department of Developmental Neuroscience, IRCCS Fondazione Stella Maris, Pisa, Italy

### **Edited by:**

Pierluigi Zoccolotti, Sapienza University of Rome, Italy

### **Reviewed by:**

Marialuisa Martelli, Sapienza University of Rome, Italy

Luís Faísca, Universidade do Algarve, Portugal

### **\*Correspondence:**

Filippo Gasperini, Department of Developmental Neuroscience, IRCCS Fondazione Stella Maris, Viale del Tirreno 331, 56018 Calambrone, Pisa, Italy e-mail: fgasperini@fsm.unipi.it

Children with Developmental Dyslexia (DD) are impaired in Rapid Automatized Naming (RAN) tasks, where subjects are asked to name arrays of high frequency items as quickly as possible. However, the reasons why RAN speed discriminates DD from typical readers are not yet fully understood. Our study aimed to identify some of the cognitive mechanisms underlying the RAN-reading relationship by comparing one group of 32 children with DD with an age-matched control group of typical readers on a naming and a visual recognition task, both using a discrete-trial methodology, in addition to a serial RAN task, all using the same stimuli (digits and colors). Results showed a significant slowness of DD children in both serial and discrete-trial naming (DN) tasks regardless of type of stimulus, but no difference between the two groups on the discrete-trial recognition task. Significant differences between DD and control participants in the RAN task disappeared when performance in the DN task was partialled out by covariance analysis for colors, but not for digits. The same pattern held in a subgroup of DD subjects with a history of early language delay (LD). By contrast, in a subsample of DD children without LD the RAN deficit was specific for digits and disappeared after slowness in DN was partialled out. Slowness in DN was more evident for LD than for noLD DD children. Overall, our results confirm previous evidence indicating a name-retrieval deficit as a cognitive impairment underlying RAN slowness in DD children. This deficit seems to be more marked in DD children with previous LD. Moreover, additional cognitive deficits specifically associated with serial RAN tasks have to be taken into account when explaining the deficient RAN speed of these latter children. We suggest that partially different cognitive dysfunctions underpin superficially similar RAN impairments in different subgroups of DD subjects.

**Keywords: dyslexia, RAN, discrete-trial naming, discrete-trial recognition, language delay**

## **INTRODUCTION**

One of the most robust research findings on the cognitive bases of developmental reading disorders (also known as Developmental Dyslexia, DD) is a deficit of children with DD on rapid serial naming tasks (for reviews see Wolf and Bowers, 1999 and Kirby et al., 2010). The most commonly used measure of this is the rapid automatized naming (RAN) task, in which subjects are presented with arrays of high frequency items (letters, digits, colors, or objects) and asked to name them as quickly as possible. Usually, children with DD perform this task accurately but slowly.

RAN speed deficits in children with DD were first demonstrated in English-speaking readers. However, since early English-based research in the 1970s and 1980s, a strong relation between RAN speed measures and reading acquisition has been documented in a wide array of languages, with both inconsistent (e.g., French: Plaza and Cohen, 2003) and consistent (e.g., German: Wimmer, 1993; Dutch: de Jong and van der Leij, 1999; Finnish: Holopainen et al., 2001; Italian: Di Filippo et al., 2005) alphabetic orthographies.

Despite this evidence, we are currently some way from obtaining a complete understanding of the reasons why RAN performance is related to reading.

Research aiming to reveal the nature of this relationship dates back to the early 1980s (for a review see Bowers and Swanson, 1991); since then, many studies have investigated the issue, but the evidence is mostly correlational (for reviews see Wolf et al., 2000 and Kirby et al., 2010). Only in the last decade have a few studies experimentally manipulated factors that may account for the RAN-reading relationship (Neuhaus and Swank, 2002; Jones et al., 2009; Georgiou et al., 2013; Zoccolotti et al., 2013). Although a consensus is beginning to emerge for some possible underlying mechanisms, the debate is still largely open for others.

Substantial agreement exists that post-lexical access articulatory factors do not account for differences in RAN speed among children with different levels of reading ability. For example, when total time to complete RAN tasks is experimentally segregated into *pause time* (the duration of pauses between items in sequenced articulations) and *articulation time* (the time to articulate each item), pause time and not articulation time significantly predicts reading ability, in both normal and DD readers (Georgiou et al., 2006; Araújo et al., 2011b). Hence, these results point to inter-item processing (the processes occurring between attentional disengagement from one stimulus and name retrieval for the next one) as the *locus* that drives the RAN-reading association.
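The split of total naming time into pause time and articulation time can be sketched as follows. The sketch assumes that the onset and offset of each spoken item have already been marked in the recording. The function name and the input format are hypothetical and do not reproduce the procedure of the cited studies.

```python
def segment_ran_times(intervals):
    """Split total RAN naming time into articulation time and pause time.

    intervals: list of (onset, offset) pairs in seconds, one per named item,
               in the order in which the items were spoken.
    Returns (articulation_time, pause_time).
    """
    # Articulation time: summed duration of the spoken items themselves.
    articulation = sum(offset - onset for onset, offset in intervals)
    # Pause time: summed gaps between the end of one item and the start of the next.
    pauses = sum(nxt[0] - cur[1] for cur, nxt in zip(intervals, intervals[1:]))
    return articulation, pauses
```

For three items spoken at `[(0.0, 0.4), (0.9, 1.3), (1.6, 2.0)]`, the articulation time is the sum of the three item durations and the pause time is the sum of the two gaps between them.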

According to one influential view, RAN should be considered as part of a larger phonological construct together with phonological awareness (PA) and phonological short-term memory (PSTM), in that it primarily reflects the rate of access to and retrieval of stored phonological information in long-term memory (Wagner et al., 1993; Pennington et al., 2001; Chiappe et al., 2002). Evidence supporting a role of phonological lexical access in mediating the RAN-reading relationship comes from studies using a discrete-trial methodology. These studies have often found that latency to name singly presented highly familiar stimuli (the same used in the serial RAN tasks) is significantly correlated with reading measures, in both samples of children unselected for reading skill (Levy and Hinchley, 1990) and children with DD (Bowers and Swanson, 1991; Fawcett and Nicolson, 1994; Chiappe et al., 2002; Jones et al., 2009).

However, not all authors consider RAN performance as just another instance of phonological processing ability together with PA and PSTM. For example, Wolf and Bowers (1999) emphasize the multi-componential nature of serial RAN tasks, placing stronger emphasis on the efficiency with which multiple processes (attentional, visual, phonological, semantic and articulatory) are integrated through precise timing mechanisms, which in turn calls into question general processing speed. As a consequence, Wolf and Bowers (1999) state that phonological processes—which they index essentially through PA—and mechanisms underlying RAN performance represent two independent sources of variability in reading ability.

A good deal of evidence supports Wolf and Bowers' position. For example, RAN and PA are not strongly correlated (Swanson et al., 2003). Moreover, many cross-sectional and longitudinal studies have found that RAN speed contributes uniquely to the variance in reading ability when PA is controlled (Torgesen et al., 1997; Parrila et al., 2004; Kirby et al., 2009). However, PA does not exhaust phonological processing; in their empirical model of phonological processing, Wagner et al. (1993) identified three significantly correlated but distinct factors: PA (blending and segmenting sounds from words), PSTM (digit span), and lexical access (naming speed). Therefore, the fact that RAN speed and PA are only modestly correlated and each predicts a unique proportion of variance in reading is not in conflict with the hypothesis that phonological processing abilities, in particular access to and retrieval of phonological codes from long-term memory, contribute substantially to the RAN-reading relationship.

Other possible cognitive mechanisms underlying RAN speed and mediating its relationship with reading include early visual processes. RAN efficiency might reflect both the initial ability to visually analyze the stimuli's constituent features and the subsequent proficiency in integrating visual pattern information with stored representations. The role of visual processing in the RAN-reading relationship has so far received little attention and remains poorly understood. Some indirect evidence seems consistent with the hypothesis of a contribution of low-level visual factors to RAN performance and its association with reading (Stainthorp et al., 2010; Araújo et al., 2011a). Using a visual naming task, Araújo et al. (2011a) manipulated some variables related to early stages of visual processing of objects and found that, in contrast to control readers, naming performance of DD subjects did not improve with color or 3-dimensional object presentation compared with black-and-white or 2-dimensional object presentation, respectively. These results led the authors to state that "*processes involved in early visual feature analysis or in the integration of visual information stored in long-term memory might be affected in dyslexia*" (p. 224). However, directly comparing the contribution of a RAN task and a visual search task using the same stimulus materials, Di Filippo et al. (2006) found that the disadvantage of DD children with respect to controls on RAN tasks remained unchanged when the visual search performance was partialled out by covariance analysis. These latter findings are at odds with a significant role of early visual processing in driving the RAN-reading relationship.

Further explanations of the RAN-reading relationship arise from experimental evidence that serial RAN tasks are usually more closely correlated with reading than discrete-trial RAN paradigms (Bowers, 1995; Pennington et al., 2001; Jones et al., 2009; Logan et al., 2011). One interpretation emphasizes the importance of the sequential requirements of RAN tasks as a way of explaining their relation with reading, as it is reasonable that speeded naming of multiple items in a matrix format requires both inhibition of previous (already named) stimuli and efficient processing of upcoming items (Jones et al., 2009). Indeed, recent findings indicate that both foveal and peripheral processing occur while performing RAN tasks as well as in reading. Recent experimental evidence indicates that children with DD have difficulty not only distinguishing between multiple competing phonological representations at foveal stages of processing (when a verbal response is required), but also between multiple activated orthographic codes during parafoveal processing (Jones et al., 2013). More generally, RAN and reading could be related because several items are visible at once in both tasks, allowing subjects to pre-process some items in the parafovea. Studies on text reading have shown that for typical readers the availability of upcoming words in the parafovea increases the speed with which the text is read (Sereno and Rayner, 2000); however, parafoveal information may in fact act as a source of interference for children with DD, in both reading (Chace et al., 2005; Pernet et al., 2006) and RAN tasks (Jones et al., 2009).

Overall, despite a growing number of studies in the last three decades, the cognitive mechanisms underlying the RAN-reading relationship are still a matter of considerable debate. It should also be considered that the processes mediating the RAN-reading relationship may at least partially change with reading development, when a progressive shift occurs from a serial grapheme-phoneme decoding strategy toward parallel "sight-word" reading. Recent data on Greek-speaking children (Protopapas et al., 2013) support this hypothesis by showing that the amount of common variance between serial RAN and reading was mostly explained by discrete-trial naming (DN) in 2nd graders, while a stronger contribution from sequential processing was evident among 6th graders. Based on these results, it is plausible to expect that the processes driving the RAN-reading relationship may also differ between normally developing and DD readers.

The general aim of the present study was to contribute to identifying the cognitive mechanisms underlying the RAN-reading relationship in children learning Italian, a language characterized by a transparent orthography. In a previous study, we obtained evidence that RAN and not phonological processing abilities (as assessed through PA and PSTM tasks) may represent the main cognitive marker of DD in Italian children (Brizzolara et al., 2006). This evidence is consistent with a growing body of studies in which RAN speed has been shown to be a strong predictor of reading ability in orthographically transparent languages, both in samples of children unselected for reading ability and in children with reading disability (e.g., for Finnish: Holopainen et al., 2001; for German: Landerl, 2001; for Dutch: de Jong and van der Leij, 2003; for Greek: Papadopoulos et al., 2009). Because of the relevance of the cognitive processes underlying RAN speed for reading ability in transparent orthographies, a deeper understanding of the mechanisms driving the RAN-reading relationship seems particularly important.

In the present study, we focused on some of the cognitive processes underpinning RAN speed which may differentiate between children with DD and average readers, in particular, visual recognition of single items, lexical access and sequential processing of multiple items.

Firstly, we compared a group of DD children with an age-matched control group of typical readers on a DN paradigm in which items were digits or colors.

Secondly, children with DD and their controls were contrasted on a motor choice-reaction time task using the same stimuli as in the DN, in which subjects had to discriminate between a target stimulus and four distracters. Such an experimental manipulation at the output stage of DN removes much if not all of the name-retrieval component, allowing a direct test of the role of early visual processes involved in the naming of isolated items. To our knowledge this is the first attempt to investigate the potential contribution of early visual factors to the RAN-reading relationship using the same material as the serial RAN task, with a specific focus on the single-item processing level. Di Filippo et al. (2005, 2006) had previously made a similar experimental manipulation, using a visual search task with the same stimulus material as RAN; however, in Di Filippo et al.'s studies the control on early visual processing was more wide-ranging (including scanning of the stimuli) and less specifically focused on single stimulus identification.

At the same time, investigating both DN and recognition in the same subjects using the same stimulus material allows testing the role of lexical access in differentiating RAN performance of DD and average readers, once potential differences at the stage of visual identification have been removed.

A further step of our study was to compare children with DD and controls on a serial RAN paradigm using the same stimuli as in the other two tasks, to verify whether the expected differences in serial RAN survived statistical control for the possible significant effects of both DN and discrete recognition (DR). If serial RAN continues to discriminate between children with DD and controls, then a further contribution to the RAN-reading relationship probably has to be sought in cognitive factors outside single-item processing (such as sequencing of multiple items and/or higher demands on precise timing mechanisms posed by serial RAN).

In our study children with DD were further assigned to two sub-groups according to whether or not they had a history of early language delay (LD) as determined retrospectively by parental report. In previous work we demonstrated a somewhat different cognitive profile in Italian DD children with and without a history of LD (Brizzolara et al., 2006; Chilosi et al., 2009; Pecini et al., 2011). In fact, while both groups shared a comparable RAN speed deficit, only DD children with a previous LD showed a moderate but widespread verbal deficit, extending from phonological processing (PA and PSTM) to other components of linguistic processing (lexical phonology, semantics and syntax). However, as the classification of LD in children with DD is based on a retrospective criterion (i.e., based on parents' report), in the present study a set of standardized behavioral and cognitive tests was also administered. These included measures of sub-lexical and lexical reading, written text comprehension, PSTM, verbal semantic knowledge and vocabulary. We expected to find lower scores among DD children with LD than among DD children without LD (LD-DD and noLD-DD, respectively) in all the oral verbal measures and in the test of reading comprehension, consistently with previous evidence indicating a moderate but widespread verbal deficit (i.e., not limited to phonological processing) among LD-DD children that does not affect noLD-DD subjects (Chilosi et al., 2009).

There is now mounting evidence that DD is a heterogeneous neurobiological condition (Eckert, 2004; Jednoróg et al., 2013) associated with multiple impairments in different cognitive domains (Bosse et al., 2007; Menghini et al., 2010), including phonological processing (Vellutino et al., 2004), early visual analyses (Stein, 2001; Martelli et al., 2009), skills automatization (Nicolson et al., 2001) and visual-spatial attention (Hari and Renvall, 2001; Franceschini et al., 2012). As a consequence, it would not be surprising if different cognitive deficits underpin impaired RAN performance in subgroups of DD children with distinct neurocognitive phenotypes.

On the basis of the reviewed evidence, we expected that children with DD considered as a group would be significantly slower than chronological age-matched controls on both serial RAN and DN tasks. We also expected that differences between DD and control groups on serial RAN tasks would survive statistical control for differences in DN tasks. Both expectations seemed plausible in children learning a transparent orthography, such as Italian, in which DD has been mainly characterized by a reading fluency deficit in the face of rather accurate decoding (for a review see Wimmer and Schurz, 2010; see also Zoccolotti et al., 1999 for evidence in Italian). The characteristic reading speed deficit of these children might be, in fact, equally well explained by a deficient orthography-to-phonology mapping at both the sub-lexical and lexical levels of reading, indexed by impaired DN (de Jong, 2011), as well as by a reduced ability to simultaneously process multiple adjacent items in the written text, tapped by the unique contribution of serial RAN to reading difficulties (Protopapas et al., 2013).

Predictions about DR tasks in our study were more open due to the variable results reported in the literature.

Hypotheses on possible different cognitive mechanisms underlying the RAN-reading relationship in LD-DD and noLD-DD were more speculative, as to our knowledge this is the first study to address such a topic. One possibility is that impaired lexical access is particularly relevant for RAN speed deficits in LD-DD children. Although name-retrieval deficits have often been described both in DD subjects without apparent previous or concurrent language difficulties (Wolf and Obregon, 1992; Swan and Goswami, 1997; Hanly and Vanderberg, 2010) and in children with specific oral language difficulties (also known as Specific Language Impairment, SLI; Kail and Leonard, 1986; Befi-Lopes et al., 2010; Coady, 2013), word finding difficulties might be more pronounced in LD-DD children in comparison with noLD-DD children: the two groups might share a common phonological lexical access deficit, while only the former would show an additional semantic deficit. Indeed, evidence exists for both a phonological (Coady, 2013) and a semantic (Kail and Leonard, 1986; Befi-Lopes et al., 2010) account of naming difficulties in children with oral language difficulties.

If impaired lexical access is particularly relevant for RAN speed deficits in LD-DD children, then slower performance on the DN tasks should be more evident in these subjects than in noLD-DD children. However, this might be true for the color more than for the digit condition. In fact, conceptual activation would mediate mainly color naming (Heurley et al., 2013), while more direct arbitrary mappings from visual stimuli to phonological labels occur for digits (Manis et al., 1999). Possibly, the unique contribution of serial RAN tasks over the one played by DN tasks would be smaller in LD-DD than in noLD-DD subjects.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

Informed consent was obtained from all the parents of participants in compliance with the Helsinki Declaration.

Participants with DD were selected on the basis of consecutive referrals to the Division of Child Neurology and Psychiatry of the University of Pisa from May 2009 to October 2010 for suspected specific learning disabilities. Criteria for inclusion in the DD sample were the following:


Thirty-two (18 males and 14 females) children fulfilled these criteria. The mean age for children with DD was 11 years and 2 months, with a standard deviation (SD) of 1 year and 9 months. The youngest children were second graders and the oldest were eighth graders. More specifically, 15 children attended Primary school (from second grade to fifth grade) while 17 children attended Secondary school (from sixth to eighth grade). All DD participants were Italian native speakers. No child had been diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) at the time of assessment.

Each child's clinical history was investigated by means of an assessment interview with his or her parents; this was carried out by the same child neuropsychiatrist with special expertise in speech and language disorders (A.C.) who assessed oral language abilities of children. The parents were also asked to fill out a questionnaire (Brizzolara et al., 2006; Chilosi et al., 2009) on motor, cognitive, and language developmental milestones. In order to encourage the parents to recall basic language milestones, examples of typical children's utterances were provided. A child was considered to have a positive history of LD if the analysis of his/her questionnaire revealed at least one of these signs: (1) no vocabulary burst before 24 months; (2) late combinatory use of words, that is, after 30 months; (3) persistence of grammatically incomplete sentences after 4 years of age; and (4) persistence of phonological mispronunciations after 4 years of age. On the basis of these criteria 18 children (11 males, 7 females) were considered as having had a LD. They had a mean age of 11 years and 1 month (SD = 2 years and 4 months). No language delay (noLD) was documented retrospectively in 14 children (7 males and 7 females). Their mean age was 11 years and 5 months. No significant difference for age (*F*(1,30) = 0.21, *ns*) was present between LD and noLD DD groups.
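The retrospective classification rule just described (a positive history of LD if at least one of four signs is present) can be sketched as a simple predicate. This is only an illustration: the field names are hypothetical and do not come from the questionnaire, and the exact handling of boundary ages (e.g., a vocabulary burst occurring exactly at 24 months) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Milestones:
    """Parent-reported milestones (hypothetical field names)."""
    vocabulary_burst_months: float        # age at vocabulary burst
    word_combination_months: float        # age at first combinatory use of words
    incomplete_sentences_after_4y: bool   # grammatically incomplete sentences persist
    mispronunciations_after_4y: bool      # phonological mispronunciations persist


def has_language_delay(m: Milestones) -> bool:
    """Return True if at least one of the four signs is present."""
    signs = [
        m.vocabulary_burst_months >= 24,  # (1) no vocabulary burst before 24 months
        m.word_combination_months > 30,   # (2) word combinations after 30 months
        m.incomplete_sentences_after_4y,  # (3)
        m.mispronunciations_after_4y,     # (4)
    ]
    return any(signs)
```

Because the rule is disjunctive, a single late milestone is sufficient for assignment to the LD group.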

One group of 32 Italian, native-speaking typical readers served as control for the children with DD. These participants were selected from several Primary and Secondary public schools and individually matched with DD subjects by sex and chronological age (± 3 months). For the children to be included, they also had to perform normally on the same standardized single word reading test (see below) and non-verbal intelligence tests used in children with DD. The mean age of the control group was 11 years and 2 months (SD = 1 year and 8 months), a value not significantly different from that of the DD group as expected on the basis of the selection criterion (*F*(1,62) = 0.01, *ns*).

Control children were further subdivided into two groups, in which subjects were matched individually with DD children of LD and noLD group respectively. As expected, each DD group did not differ significantly from its own control group for age (*F*(1,34) = 0.06 for the comparison involving the LD children and *F*(1,26) = 0.00 for the comparison with the noLD children, both *ns*).

### **MATERIALS AND PROCEDURE**

### **Non-verbal intelligence**

Non-verbal intelligence was assessed by Raven's Colored Progressive Matrices (Raven, 1984; Italian standardization, Pruneti et al., 1996) for children from third to fifth grade and by Picture Completion and Block Design subtests of WISC-III (Wechsler, 1991; Italian standardization Orsini and Picone, 2006) jointly considered for children from sixth to eighth grade, in both the DD and control samples.

The cut-off for non-verbal intelligence within normal limits was set at one SD below the mean of the normative sample for the appropriate age level; for children from sixth to eighth grade the average between the Picture Completion and Block Design scaled scores was computed.

Mean *z* score (and SD) for non-verbal intelligence of the DD and control groups are reported in **Table 1**. The two groups did not differ significantly for non-verbal intelligence (*F*(1,62) = 0.16, *ns*).

Likewise, no differences existed in non-verbal intelligence between the DD groups and their control subgroups (*F*(1,34) = 0.77 for the comparison involving LD children and *F*(1,26) = 0.20 for the comparison with the noLD children, both *ns*), as well as between the LD-DD and the noLD-DD subjects (*F*(1,30) = 0.83, *ns*).

Mean *z* scores (and SDs) for non-verbal intelligence of the DD and control subgroups are reported in **Table 1**.

### **Reading assessment**

Reading level for inclusion in the DD or in the control group was assessed using one subtest from the Battery for the Evaluation of DD and Dysorthography, 2nd Edition (Sartori et al., 2007). In this subtest subjects have to read aloud as quickly and accurately as possible four lists of 28 words with high or low frequency (4 to 8-letter long). Total number of errors and speed of reading (syllables/second) are computed. Raw scores were converted to *z* scores according to standard reference data; normative data are available separately for each grade from second to eighth grade.

A *z* score lower than −2 with respect to the mean of the normative sample for either accuracy or speed was taken as the cut-off criterion for inclusion in the DD group. This disjunctive criterion was used because it has been shown that subjects with DD can flexibly adapt their speed-accuracy trade-off (Hendriks and Kolk, 1997); consequently, a selection based on both parameters might fail to detect selective cases of pathological performance. At the same time, to limit overlaps between the DD and the control group, we adopted a conservative criterion: for children to be included in the control group the performance in the reading test could not be lower than one SD below the mean of the normative sample for either accuracy or speed.
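The two reading cut-offs can be made concrete with a short sketch. It assumes that the DD cut-off lies two SDs below the normative mean (z < −2) and that z scores are signed so that lower values indicate worse performance on both parameters; the function name and the "excluded" label for intermediate cases are illustrative. The sketch covers only the reading criterion, not the other inclusion criteria.

```python
DD_CUTOFF = -2.0       # assumed: two SDs below the normative mean
CONTROL_CUTOFF = -1.0  # one SD below the normative mean


def classify_reader(z_accuracy: float, z_speed: float) -> str:
    """Apply the reading criterion only (hypothetical helper)."""
    # Disjunctive criterion: a deficit on EITHER parameter is sufficient for DD.
    if z_accuracy < DD_CUTOFF or z_speed < DD_CUTOFF:
        return "DD"
    # Conservative criterion: controls must be within one SD on BOTH parameters.
    if z_accuracy >= CONTROL_CUTOFF and z_speed >= CONTROL_CUTOFF:
        return "control"
    # Intermediate performance qualifies for neither group.
    return "excluded"
```

For instance, `classify_reader(-0.3, -2.4)` returns `"DD"` even though accuracy is normal, which is exactly the kind of selective deficit a conjunctive criterion would miss.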

Mean (and SD) *z* scores for the reading measures of both the DD and the control group are reported in **Table 1**. The control readers' performance was close to zero for both accuracy and speed. On the contrary, DD readers performed very poorly on both parameters. As expected on the basis of selection criteria, DD children scored significantly worse than control readers on both accuracy (*F*(1,62) = 10.05, *p* < 0.01) and speed (*F*(1,62) = 22.18, *p* < 0.001). Given the heterogeneity of age in our sample, we compared Primary with Secondary school children in both speed and accuracy raw scores of single-word reading. No significant difference emerged for either measure (*F*(1,30) = 1.49, *p* = 0.23, η<sup>2</sup> = 0.05 for speed; *F*(1,30) = 1.64, *p* = 0.21, η<sup>2</sup> = 0.05 for accuracy).

**Table 1** also reports the mean (and SD) *z* scores for the reading measures of both LD- and noLD-DD children and the corresponding control children. No difference emerged between LD and noLD groups on accuracy (*F*(1,30) = 1.59, *ns*) and speed (*F*(1,30) = 1.46, *ns*). On the contrary, as expected, both DD samples differed significantly from the corresponding control group on both the reading parameters (as regards LD children: *F*(1,34) = 6.25, *p* < 0.05 and *F*(1,34) = 11.02, *p* < 0.01 for accuracy and speed, respectively; as regards noLD participants: *F*(1,26) = 13.90, *p* < 0.01 and *F*(1,26) = 76.09, *p* < 0.001 for accuracy and speed, respectively).

z scores for both non-verbal intelligence level and reading measures were computed according to the corresponding normative data for age (non-verbal intelligence level) or grade level (reading measures; for details see the text).

### **Other literacy measures**

Reading decoding abilities were further investigated by means of a test of non-word reading, which is usually considered to tap the sub-lexical reading route. Non-word reading was assessed with a subtest from the same standardized battery of the word reading test (Sartori et al., 2007). In this subtest subjects have to read aloud as quickly and accurately as possible three lists of 16 non-words (5 to 9-letter long) in line with the phonotactic and phonographic rules of the Italian language. Total number of errors and speed of reading (syllables/second) are computed. Raw scores were converted to *z* scores according to standard reference data; normative data are available separately for each grade from second to eighth grade.

Text reading comprehension was also investigated, using a standardized test from the MT Reading battery (Cornoldi and Colpo, 1995, 1998). A meaningful passage is presented without a time limit. The participant has to read it silently and respond to multiple-choice questions. Stimulus materials, number of questions (10 to 15) and related reference norms vary from school level. Raw scores were converted to *z* scores according to standard reference data.

Mean (and SD) *z* scores for the non-word reading and written text comprehension measures of both the DD groups are reported in **Table 2**. The two groups did not differ significantly on nonword reading (*F*(1,30) = 2.03, *ns* and *F*(1,30) = 1.07, *ns* for accuracy and speed, respectively). A significant difference emerged for text reading comprehension (*F*(1,30) = 4.95, *p* < 0.05), with a lower performance from LD-DD than noLD-DD children. In absolute terms, the mean performance of the former subgroup was more than one SD below the mean of the normative sample.

### **Oral verbal measures**

*Phonological short-term memory (PSTM).* PSTM abilities were examined with a shortened version of a word span test (Ferretti et al., 2003), requiring the child to repeat two lists of Italian high-frequency, disyllabic words varying in phonological similarity. Stimulus presentation is controlled by a PC using dedicated software. For each list (with or without phonologically similar words), sequences of increasing length are presented (from two to seven words). The child is required to repeat the words in the correct order. The list presentation is interrupted when the child fails on three out of five series of the same length. For each list, memory span is the number of words of the longest sequence correctly repeated in at least three out of five presentations. The raw scores were converted to *z* scores according to standard reference data of the test, separately for each list, and then averaged to get a single *z* score for each subject.
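The span scoring and discontinuation rules can be illustrated with a minimal sketch. The data layout (a mapping from sequence length to five pass/fail outcomes) and the function name are assumptions made for illustration, as is the convention of returning 0 when the child already fails at the shortest length.

```python
def memory_span(results):
    """Compute the span for one list.

    `results` maps sequence length (2..7) to a list of five booleans,
    True meaning the sequence was repeated in the correct order.
    """
    span = 0  # assumed convention when even the shortest length is failed
    for length in sorted(results):
        trials = results[length]
        n_correct = sum(trials)
        # Span = longest length repeated correctly on at least 3 of 5 trials.
        if n_correct >= 3:
            span = length
        # Discontinue once three of the five series at this length are failed.
        if len(trials) - n_correct >= 3:
            break
    return span
```

With five trials per length the two conditions are mutually exclusive, so testing effectively stops at the first length at which fewer than three sequences are repeated correctly.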

**Table 2** shows the means and SDs for the PSTM task of both LD-DD and noLD-DD group.

PSTM was significantly lower in the LD than in the noLD-DD children (*F*(1,30) = 7.44, *p* < 0.05), falling more than one SD below the mean of the normative sample in the former.

*Verbal semantics.* Verbal semantic knowledge was investigated with the WISC-III Information sub-test (Wechsler, 1991; Orsini and Picone, 2006). Raw scores were transformed into scaled scores on a 1–19 scale, with mean = 10 and SD = 3. Data for this test were available for 28 children with DD of our sample (15 LD and 13 noLD; see **Table 2**). The two groups differed significantly on this subtest (*F*(1,26) = 5.86, *p* < 0.05): LD-DD children performed worse than noLD-DD children, scoring as a group more than one SD below the population mean.

*Vocabulary.* Word knowledge (and also verbal concept formation) was examined using the WISC-III Vocabulary sub-test (Wechsler, 1991; Orsini and Picone, 2006). Raw scores were transformed into scaled scores on a 1–19 scale, with mean = 10 and SD = 3. Data for this test were available for 28 children with DD of our sample (15 LD and 13 noLD; see **Table 2**). The LD children performed somewhat lower than the noLD-DD subjects on the Vocabulary sub-test, but the difference was not significant (*F*(1,26) = 1.59, *ns*).

**Table 2 | Means and SDs (in brackets) of sub-groups of DD participants with (LD) and without (noLD) a history of language delay on literacy (non-word reading and written text comprehension) and oral verbal measures**.

z scores for the literacy and Phonological STM measures and scaled scores for the WISC-III subtests were computed according to corresponding normative data for grade level (reading measures) or age (Phonological STM and WISC-III subtests) (for details see the text).

## **Experimental tasks**

## *RAN test*

*Stimuli.* For the RAN (or serial rapid naming) tasks, materials were adapted from Di Filippo et al. (2005, 2006). Stimuli were matrices of digits or colors on a white background. In each condition, five different stimuli were presented. The digits were 2, 4, 6, 7, and 9, displayed in black Helvetica font (size 24). The colors were presented in small 1 by 1 cm squares; they were black, blue, red, yellow, and green. There were five rows of 10 stimuli in each matrix for a total of 50 stimuli. There was one matrix for each condition. A smaller matrix with two rows of five stimuli was also created for both the digit and the color condition to be used in a practice trial (see below). Target sequence was randomized within each matrix.

*Procedure.* Stimuli were displayed on a PC screen. The child was requested to name each stimulus in the matrix as quickly and accurately as possible, working from left to right and from top to bottom. A practice session with a small (10 stimuli) matrix was run for each condition; during this session, the examiner corrected any errors made by the child. For each condition (numbers or colors) time to complete the task was measured using a stopwatch and total time in seconds was used as the dependent measure. Naming errors were also recorded.

## *Discrete-trial naming (DN) test*

*Stimuli.* Stimuli were the same as in the RAN test, except that in the DN tests both numbers and colors appeared singly on the PC screen on a white background. Target sequence was randomized for each condition. Similarly to the RAN task, for each condition (numbers or colors) a total of 50 stimuli were presented within a single block.

*Procedure.* For each condition (numbers and colors), stimuli were displayed singly in the center of a PC screen. Presentation was controlled by SuperLab Pro 2.0 package (Cedrus Corporation, 2002; San Pedro, California). The child was asked to name each digit or color as fast and accurately as possible. The stimulus remained on the screen until the subject responded or for a time limit of 6000 ms. Then a 500 ms blank followed before the next stimulus appeared. In order to avoid false responses, subjects were explicitly requested not to self-correct if they realized a naming error had occurred. For each condition, participants were given a practice session with 10 stimuli; at the end of this session children were corrected for any naming errors and taught to avoid self-corrections. Vocal latencies were recorded using the SuperLab Pro 2.0 package. Naming errors were also computed. In a few instances, trials were not valid due to technical failures or false responses; more generally, all latencies under 250 ms were counted as invalid trials. Only latencies for valid and correct trials were analyzed. For each condition the median response latency and the total number of naming errors were computed. Median latency was chosen as measure of central tendency to remove the influence of outlier values.
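The latency scoring described above can be sketched as follows. The trial representation (a latency in ms, or `None` for a technical failure, paired with a correctness flag) and the choice to count errors only among valid trials are assumptions made for illustration; only the 250 ms threshold and the use of the median come from the text.

```python
from statistics import median


def summarize_naming(trials, min_valid_ms=250.0):
    """Return (median latency of valid correct trials, number of naming errors).

    `trials` is a list of (latency_ms_or_None, is_correct) pairs.
    """
    # Discard technical failures and latencies under the validity threshold.
    valid = [(rt, ok) for rt, ok in trials if rt is not None and rt >= min_valid_ms]
    n_errors = sum(1 for _, ok in valid if not ok)
    correct_rts = [rt for rt, ok in valid if ok]
    med = median(correct_rts) if correct_rts else None
    return med, n_errors


# Illustrative (made-up) trials: one anticipation, one error, one failure, one slow outlier.
example = [(520, True), (610, True), (180, True), (700, False), (None, False), (2500, True)]
print(summarize_naming(example))  # (610, 1)
```

In this made-up example the mean of the valid correct latencies would be 1210 ms because of the single 2500 ms trial, whereas the median stays at 610 ms, which illustrates why the median is the more robust summary here.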

## *Discrete-trial recognition (DR) test*

*Stimuli.* Stimuli and format presentation were the same as in the DN test. Target sequence was randomized for each condition, but the order of presentation was different than in the DN test.

*Procedure.* For each stimulus the trial sequence and temporal parameters were the same as in the DN test. The stimulus was singly displayed at the center of the PC screen and disappeared with the subject's response or after a time limit of 6000 ms. Then, a 500 ms blank followed before the next stimulus was presented. A motor choice-reaction time task was used to estimate efficiency of visual processing underpinning single-item naming. Participants were asked to press one of two keys on a response pad as fast and accurately as possible when the target stimulus appeared (number 7 for the digit condition and a green square for the color condition) and to press the other key when all the other stimuli were displayed. Responses to the target were made using the index finger of the dominant hand, while those to the other stimuli with the index finger of the non-dominant hand. Instructions were given to keep both index fingers on the corresponding keys for the entire session. For each condition, a practice session with 10 stimuli was given, in which participants' errors were corrected.

Both latency and accuracy of response were recorded by SuperLab Pro 2.0 for each stimulus. Latencies under 150 ms were considered as invalid trials, as they could be either technical failures or anticipations. Only latencies for valid and correct trials were analyzed. For each condition both median response latency and total number of recognition errors were computed.

## **General procedure**

Participants were tested individually in a quiet room, located at the Division of Child Neurology and Psychiatry of Pisa University for children with DD or at their own school for control children. All the tests were administered in a single session. Each session started with the non-verbal intelligence tests, followed by the reading test and then by the experimental tasks. For each group, presentation sequence of the three experimental tasks was counterbalanced across participants according to a 3 × 3 Latin Square design. Likewise, for each experimental task the order of the two conditions (digits and colors) was counterbalanced across participants using a 2 × 2 Latin Square design.
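A Latin Square counterbalancing scheme of this kind can be sketched as below. The text does not say which particular square was used or how participants were allocated to its rows, so the cyclic construction and the round-robin assignment are assumptions; the participant identifiers are hypothetical.

```python
def latin_square(items):
    """Cyclic Latin square: each item appears once per row and once per position."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(participant_ids, items):
    """Assign rows of the square to participants in rotation (assumed scheme)."""
    square = latin_square(items)
    return {pid: square[i % len(square)] for i, pid in enumerate(participant_ids)}


task_orders = latin_square(["RAN", "DN", "DR"])
# [['RAN', 'DN', 'DR'], ['DN', 'DR', 'RAN'], ['DR', 'RAN', 'DN']]
condition_orders = latin_square(["digits", "colors"])
# [['digits', 'colors'], ['colors', 'digits']]
```

Each task therefore occupies each serial position equally often across participants, so practice or fatigue effects are not confounded with task.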

## **DATA ANALYSIS**

A first series of analyses compared the performances of control and DD subjects considered as one group on the RAN, DN and DR tests, separately for each condition (digits or colors).

Accuracy was very high for every task and condition, for both control and DD children. The mean percentages of errors are presented in **Table 3** for all three tasks according to condition (digits and colors) separately for control and DD subjects. Even in the DR tests, where the level of accuracy was slightly lower, the percentage of errors was always below 5% and not statistically different between DD and control children (*F*(1,62) = 1.85 for the comparison on DR of digits and *F*(1,62) = 0.15 for the comparison on DR of colors, both *ns*). As a consequence, for each experimental task statistical comparisons were restricted to time measures. Mean total naming times in RAN tests and median response latencies in DN and DR tests were submitted to a one-way ANOVA with group (DD children, controls) as unrepeated factor. ANOVAs were carried out separately for digits and colors in each experimental task. In order to assess equality of variances between DD and control children, Levene's test was run for each comparison. In no case did variances differ significantly between groups (all *p* > 0.05).
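As a reminder of what Levene's test computes, the following sketch implements the classical mean-centered version: a one-way ANOVA on the absolute deviations of each score from its group mean. Whether the mean-centered or a median-centered variant was used in the study is not stated, so the centering is an assumption; the data are made up, and no *p* value is computed (that would require the *F* distribution).

```python
from statistics import mean


def levene_w(groups):
    """Return (W, df_between, df_within) for mean-centered Levene's test."""
    # Absolute deviation of every score from its own group mean.
    devs = [[abs(v - mean(g)) for v in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = mean([v for d in devs for v in d])
    # One-way ANOVA on the deviations.
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    ss_within = sum((v - mean(d)) ** 2 for d in devs for v in d)
    df_between, df_within = k - 1, n_total - k
    w = (ss_between / df_between) / (ss_within / df_within)
    return w, df_between, df_within


# Made-up data: the second group is more spread out than the first.
print(levene_w([[1, 2, 3], [0, 2, 4]]))
```

A large W relative to the *F* distribution with the returned degrees of freedom signals unequal variances between groups.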

For each comparison between DD and control groups effect size was also calculated by computing the Eta squared (η <sup>2</sup>) value, which indicates the proportion of variance in the dependent variable explained by the independent variable (the reading group in our case).
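In a one-way design, eta squared is the ratio of the between-groups sum of squares to the total sum of squares, so it can be recovered from a reported *F* ratio and its degrees of freedom as F·df1 / (F·df1 + df2). The sketch below applies this identity to the two whole-sample RAN comparisons reported in the Results; the function name is illustrative, and it assumes the reported values are classical eta squared from the one-way ANOVAs.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared for a one-way design, recovered from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported for the whole DD sample vs. controls on the RAN tasks.
ran_digits = eta_squared_from_f(27.95, 1, 62)
ran_colors = eta_squared_from_f(8.96, 1, 62)
print(round(ran_digits, 2), round(ran_colors, 2))
```

Both values round to the effect sizes given in the text for RAN of digits and colors (0.31 and 0.13).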

As significant differences between the two groups on both RAN and DN tests emerged, an ANCOVA on mean RAN times using DN median latencies as covariates was carried out in order to determine the possible modulating role of the performance on the DN tests over the RAN tests. Two separate ANCOVAs were run for digit and color conditions.
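The logic of this covariance analysis can be sketched as a comparison between two regression models: one that predicts RAN time from the DN latency alone, and one that also allows a separate intercept for each group. The code below is a minimal sketch under the usual ANCOVA assumption of a common slope across groups; the function names are illustrative and it is not the authors' analysis script.

```python
from statistics import mean


def _sums(xs, ys):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def ancova_group_f(groups):
    """F test for the group effect after adjusting for one covariate.

    `groups` is a list of (covariate_values, outcome_values) pairs, one per group.
    Assumes a common regression slope across groups.
    """
    k = len(groups)
    n = sum(len(x) for x, _ in groups)
    # Full model: separate intercepts, pooled within-group slope.
    within = [_sums(x, y) for x, y in groups]
    sxx_w = sum(s[0] for s in within)
    sxy_w = sum(s[1] for s in within)
    syy_w = sum(s[2] for s in within)
    sse_full = syy_w - sxy_w ** 2 / sxx_w
    # Reduced model: a single regression line, group membership ignored.
    all_x = [v for x, _ in groups for v in x]
    all_y = [v for _, y in groups for v in y]
    sxx_t, sxy_t, syy_t = _sums(all_x, all_y)
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t
    df_effect, df_error = k - 1, n - k - 1
    f = ((sse_reduced - sse_full) / df_effect) / (sse_full / df_error)
    return f, df_effect, df_error


# Made-up (DN latency, RAN time) values, NOT data from the study.
dd = ([0.62, 0.70, 0.66, 0.75], [30.0, 36.0, 31.0, 39.0])
controls = ([0.50, 0.55, 0.58, 0.61], [22.0, 25.0, 24.0, 28.0])
f, df1, df2 = ancova_group_f([dd, controls])
print(f"F({df1},{df2}) = {f:.2f}")
```

A group effect that shrinks towards zero in this test means that the RAN difference between groups is largely accounted for by the difference in DN latencies.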

In a second series of analyses the same statistical techniques were used to compare, separately, both LD- and noLD-DD children with their own control subgroups. As in the whole DD sample, percentages of errors (see **Table 3**) were very low for all three experimental tasks in both DD subgroups; they were slightly higher for the DR tasks, although always under 6% for both groups and with no significant differences from controls for either the LD-DD group (*F*(1,34) = 0.41 and *F*(1,30) = 1.24 for digits and colors respectively, both *ns*) or the noLD-DD group (*F*(1,26) = 1.57 and *F*(1,26) = 2.97 for digits and colors respectively, both *ns*). As a consequence, accuracy was not further examined. Time measures of each experimental task were submitted to two separate one-way ANOVAs with group (either LD- or noLD-DD *vs*. respective controls) as unrepeated factor. When Levene's test for equality of variances between groups was applied, variances differed significantly between groups for only 1 out of 12 comparisons (the one on RAN of digits for LD-DD *vs*. control children; *F* = 5.19, *p* < 0.05); in this case, violation of equality of variances was corrected using the Welch-Satterthwaite method. Effect sizes (η <sup>2</sup>) were also computed for the different comparisons between LD- and noLD-DD children and respective controls.
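For two groups, the Welch-Satterthwaite correction amounts to computing the test statistic with separate variance estimates and replacing the usual error degrees of freedom with an approximated value. The sketch below shows the computation in its *t* form (for two groups the corresponding *F* is *t* squared); the function name and the scores are illustrative assumptions, not material from the study.

```python
from statistics import mean, variance


def welch_test(a, b):
    """Return (t, df) for Welch's unequal-variance t test.

    df follows the Welch-Satterthwaite approximation.
    """
    # Squared standard errors of the two group means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up naming times in seconds, NOT data from the study.
ld_dd = [30.0, 41.0, 27.0, 45.0, 36.0]
controls = [24.0, 26.0, 23.0, 25.0, 27.0]
t, df = welch_test(ld_dd, controls)
print(f"t = {t:.2f}, df = {df:.2f}, F = {t * t:.2f}")
```

The approximated degrees of freedom are generally not an integer and are never larger than the pooled value, which makes the test more conservative when variances are unequal.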

Again, as both RAN and DN time measures were significantly different in most cases, when necessary mean RAN times were submitted to ANCOVAs with group (either LD- or noLD-DD children *vs*. respective controls) as unrepeated factor and DN median latencies as covariates, separately for the digit and color condition, to determine if differences on RAN performance survived when DN performances were partialled out.

Finally, one-way ANOVAs were carried out to directly compare LD- and noLD-DD children on the three experimental tasks. Effect sizes (η <sup>2</sup>) were calculated for each comparison.

For all statistical analyses significance level was set at *p* < 0.05.

### **RESULTS**

### **WHOLE DD SAMPLE**

Means (and SDs) for the three experimental tasks of both the whole DD sample and control group are reported in **Table 4** according to type of stimulus.

ANOVA on RAN times revealed a significant group effect, for both digits (*F*(1,62) = 27.95, *p* < 0.001) and colors (*F*(1,62) = 8.96, *p* < 0.01): DD children were significantly slower than controls in both RAN of digits and colors. Effect size was very large for digits (η <sup>2</sup> = 0.31, that is 31% of RAN times for digits explained by reading group) and medium-high for colors (η <sup>2</sup> = 0.13).

A significant group effect emerged also for DN response latencies, regardless of type of stimulus: latencies to respond were higher in DD group than in control group, for both digits (*F*(1,62) = 29.16, *p* < 0.001) and colors (*F*(1,62) = 23.22, *p* < 0.001). For both types of stimulus effect size was very large (η <sup>2</sup> = 0.32 and η <sup>2</sup> = 0.27 for digits and colors respectively).

**Table 3 | Mean percentages of errors and SDs (in brackets) of the whole DD sample, of the sub-groups of DD participants with (LD) and without (noLD) a history of language delay and the respective control groups on the three experimental tasks as a function of type of stimulus (digits and colors)**.



**Table 4 | Means and SDs (in brackets) of whole DD sample, of the sub-groups of DD participants with (LD) and without (noLD) a history of language delay and the respective control groups on the three experimental tasks as a function of type of stimulus (digits and colors)**.

By contrast, no statistical difference emerged in DR response latencies between DD subjects and normal readers. This was true for both digits (*F*(1,62) = 0.96, *ns*) and colors (*F*(1,62) = 0.41, *ns*). Effect size was very small for both types of stimuli (η <sup>2</sup> = 0.01 in both conditions). It should be noted that DD and control participants also did not differ in number of errors on the DR tasks (see the Data Analysis section above); thus, a speed-accuracy trade-off on these tasks in the DD sample can be excluded.

Results of ANCOVA on RAN times using DN response latencies as covariates gave different results for the two types of stimulus. When DN response latencies were partialled out, the group effect on RAN times remained statistically significant for digits (*F*(1,62) = 6.81, *p* = 0.01) but not for colors (*F*(1,62) = 0.26, *ns*), with effect size medium for the former (η <sup>2</sup> = 0.10) and negligible for the latter (η <sup>2</sup> = 0.00).

### **LD-DD SAMPLE**

**Table 4** shows means and SDs for all experimental tasks of both LD-DD children and respective controls according to type of stimulus.

LD-DD participants and their controls differed significantly on RAN speed for both digits (*F*(1,34) = 31.14, *p* < 0.001) and colors (*F*(1,34) = 9.60, *p* < 0.01) with large effect sizes of the group factor for both types of stimulus (η <sup>2</sup> = 0.48 and η <sup>2</sup> = 0.22 for digits and colors, respectively).

Significant differences between the groups were also evident on the DN response latencies, regardless of type of stimulus: mean response latencies in DN tasks were higher in LD-DD children than in typically developing readers for both digits (*F*(1,34) = 34.60, *p* < 0.001) and colors (*F*(1,34) = 18.32, *p* < 0.001). In both conditions effect sizes were very large (η <sup>2</sup> = 0.50 and η <sup>2</sup> = 0.35 for digits and colors, respectively).

By contrast, no significant group effect was evident on DR response latencies. This applied to both digit (*F*(1,34) = 1.56, *ns*) and color condition (*F*(1,34) = 1.44, *ns*). Effect size was small for both types of stimuli (η <sup>2</sup> = 0.04 for both digits and colors). As already reported, also accuracy level did not differ between LD-DD children and their controls on DR tasks.

When the ANCOVA was performed on RAN times with DN latencies as covariates, a significant group effect was still evident in the digit condition (*F*(1,34) = 9.06, *p* < 0.01), with effect size remaining large (η <sup>2</sup> = 0.21), but not in the color condition (*F*(1,34)=1.77, *ns*) for which the effect size of group was small (η <sup>2</sup> = 0.05).

### **noLD-DD SAMPLE**

Means (and SDs) for the three experimental tasks of both noLD-DD participants and the respective control group are reported in **Table 4** according to type of stimulus.

As for RAN times, a significant group effect was evident only in the digit condition (*F*(1,26) = 5.15, *p* < 0.05): the noLD-DD group performed more slowly than control average readers on RAN of digits. Effect size for the group factor was large (η <sup>2</sup> = 0.16). RAN times for colors were not significantly different between the two groups (*F*(1,26) = 0.96, *ns*), the effect size being small (η <sup>2</sup> = 0.04).

ANOVA on DN response latencies showed significant differences between groups for both digits (*F*(1,26) = 4.06, *p* = 0.05) and colors (*F*(1,26) = 5.52, *p* < 0.05): noLD-DD children were slower than control children with both types of stimulus. Effect sizes were medium-high and large for digits (η <sup>2</sup> = 0.13) and colors (η <sup>2</sup> = 0.17), respectively.

Finally, DR latencies did not differ significantly between noLD-DD and controls, for both digits (*F*(1,26) = 0.06, *ns*) and colors (*F*(1,26) = 0.03, *ns*), with no appreciable effect size of group in both conditions (η <sup>2</sup> = 0.00 for both digits and colors). No difference emerged also for accuracy level (see Section on Data Analyses), ruling out the possibility of a speed-accuracy trade-off in the noLD-DD group.

When differences on DN latencies were partialled out by ANCOVA, differences between the two groups on RAN times for digits disappeared (*F*(1,26) = 1.47, *ns*), with a small effect size of group (η <sup>2</sup> = 0.05).

### **LD-DD VS. noLD-DD SAMPLE**

No significant difference emerged between LD-DD and noLD-DD children, regardless of the experimental task and the type of stimulus (*F*(1,30) = 0.23 and *F*(1,30) = 1.46 for RAN of digits and colors respectively, *F*(1,30) = 2.18 and *F*(1,30) = 0.39 for DN of digits and colors respectively, *F*(1,30) = 0.03 and *F*(1,30) = 0.12 for DR of digits and colors, respectively).

Effect size of the group was small or absent for all comparisons, with the only exception of DN of digits in which a medium effect size was evident (η <sup>2</sup> = 0.07).

## **DISCUSSION**

Children with DD were significantly slower than controls on serial RAN task, in both the digit and the color condition. By contrast, accuracy was quite high in both groups. These results are entirely consistent with those from a wide literature documenting deficient RAN speed in subjects with DD in spite of a very low incidence of naming errors (Wolf and Bowers, 1999; Kirby et al., 2010).

DD participants also showed significantly longer latencies than controls on naming items presented in a discrete form. Slowness of DD children on the DN task was evident for both digits and colors and emerged as a robust group effect in both the conditions. This finding confirms a substantial amount of evidence showing slowness of children with DD in naming familiar items even under the simple condition in which items are singularly presented (Bowers and Swanson, 1991; Fawcett and Nicolson, 1994; Chiappe et al., 2002; Jones et al., 2009; Zoccolotti et al., 2013). The typical interpretation offered for this result is in terms of impaired lexical access and/or retrieval from long-term memory (Walsh et al., 1988; Bowers and Swanson, 1991; Pennington et al., 2001). Such interpretation would also be consistent with results from another line of research documenting impaired performance of children with DD on both confrontation naming and naming to definition tasks (for a review see Snowling, 2000).

If a word-retrieval deficit is the reason for delayed vocal reaction times of children with DD on DN tasks, the same deficit could be easily identified as one factor underlying RAN difficulties of these subjects as serial RAN tasks necessarily involve single items naming. Indeed, this is one of the prominent explanations of such difficulties (Torgesen et al., 1997; Pennington et al., 2001; Chiappe et al., 2002). By this reasoning, RAN speed relates to reading performance as the former taps the same efficiency of accessing and retrieving phonological information the latter requires for accurately and effortlessly mapping orthography onto phonology at both lexical and sub-lexical level.

However, naming of single items requires not only lexical access, but also visual recognition of the item to be named. Thus, interpreting the impaired performance of children with DD on both DN and (at least partially) RAN tasks in terms of a name-retrieval deficit remains speculative, although plausible, until a contribution of deficient visual processing to DN performance can be excluded.

To this aim, in the present study we introduced a motor choice-reaction time task using the same singly presented stimuli as in the DN task, where participants had to discriminate between a target stimulus and four distracters. Results on this task did not discriminate between children with DD and typical reader controls; response latencies in our DR task were almost the same in the two groups, regardless of type of stimulus. No statistical difference emerged for level of response accuracy either, which rules out a speed-accuracy trade-off in the performance of participants with DD. Thus, our results do not support the hypothesis that some early visual deficit in single item recognition underlies the deficient performance of DD subjects on DN tasks and, consequently, their reduced speed on RAN tasks. To the best of our knowledge, this is the first time that visual-perceptual factors underlying performance on the DN task have been controlled for using the same stimulus materials as in that task.

Results discussed up to now support the role of a phonological-retrieval deficit in explaining slowed response latencies in the DN tasks, and possibly also reduced speed in RAN tasks, in subjects with DD. However results in our DD sample indicate that other mechanisms underpinning RAN performance contribute in mediating its relationship with reading. In fact, when influence of DN tasks latencies on RAN speed was controlled by covariance analysis, differences between DD and normal readers on RAN survived remaining robust for the digit condition, while disappearing for the colors condition. The unique contribution of serial RAN tasks to reading performance over that played by discrete-trial format of naming tasks is well documented in the literature (Bowers and Swanson, 1991; Jones et al., 2009; Logan et al., 2011; Georgiou et al., 2013; Zoccolotti et al., 2013).

Several aspects of serial RAN tasks might account for the greater differentiation of reader groups by this measure compared to that by discrete trial measures. One of these aspects refers to oculomotor requirements for efficient left-to-right visual scanning of stimuli presented in a matrix format, very similar to those necessary for efficient reading of texts. Another putative mechanism has been identified in attentional processes pertinent to the managing of serial information, especially those underlying parafoveal processing of upcoming items, consequent saccadic preparation, eye-movement execution and subsequent articulation of speech output. In a recent study where total response times (from the stimulus onset to the end of its pronunciation) were recorded, Zoccolotti et al. (2013) found that typical readers were significantly faster reading words arranged in rows than singly displayed words, unlike participants with DD, who were at a disadvantage in reading multiple stimuli for long words. This last result is in line with other evidence documenting deficient parafoveal processing in subjects with DD (Chace et al., 2005; Jones et al., 2009, 2013). Interpretations of RAN deficits in terms of managing serial information are also reminiscent of another longstanding theory of RAN underpinnings proposed by Wolf, Bowers and colleagues (Bowers, 1995; Wolf and Bowers, 1999; Wolf et al., 2000). According to Bowers (1995) ". . .*although rapid/precise timing mechanisms may underpin performance on all the naming speed measures, the serial format for naming speed requires additional coordination of processes used to extract information from serial visual arrays*" (p. 211–212), thus making RAN tasks more similar to reading than DN tasks.

In our study, the unique contribution of serial RAN over that of DN in discriminating between DD and average readers was evident for the digits, but not for the colors, condition. Indeed, this pattern of results seems consistent with the interpretations which emphasize the role of processes pertaining to management and integration of multiple sub-processes in mediating RAN-reading relationship. A stronger predictive role of RAN of alphanumeric stimuli than non-alphanumeric stimuli over reading is not uncommon in the RAN literature (Walsh et al., 1988; McBride-Chang, 1996; Schatschneider et al., 2004). The usual explanation is that digits (like letters) constitute highly constrained categories that can be processed more "automatically" with practice than colors and figures (Wolf and Bowers, 1999). In turn, faster naming for alphanumeric than non-alphanumeric stimuli would let the integration of multiple sub-components involved in serial RAN to occur more efficiently with the former (Protopapas et al., 2013), also making RAN of letters and digits a closer approximation to fluent reading.

In the present study we treated RAN times as a unitary measure, not distinguishing between times to articulate each item and duration of pauses between subsequent articulations. As a consequence, another possible explanation for the unique contribution of serial RAN tasks to reading performance over that played by discrete-trial format might be that a lower articulation rate of participants with DD with respect to controls would selectively lengthen RAN times in the former without affecting response latencies in the DN tasks. However, various considerations make this hypothesis unlikely. First, although exceptions exist (e.g., Georgiou et al., 2008), recent investigations of the RAN components have mostly agreed that interitem pauses and not articulation times are significantly related to reading (Neuhaus et al., 2001; Georgiou et al., 2006; Araújo et al., 2011b). Second, in a recent study where vocal reaction times and pronunciation times during a single-item reading task were measured, longer pronunciation times of DD with respect to control readers were found only for non-words and words exceeding six letters (Martelli et al., 2014). However, in our study words to be articulated for the naming tasks were all very high frequency words and shorter than six letters. Moreover, in a previous study on Italian second graders (unselected for reading ability) the association between articulation rate and reading speed of a text (which includes pronunciation times) was virtually absent; likewise, the contribution of RAN total times to reading speed of text remained significant and substantially unchanged after controlling for articulation rate (Gasperini et al., 2008).

The pattern of results described above refers to the whole sample of children with DD. Did this pattern hold when analyses were run separately on the two subgroups with (LD) or without (noLD) a history of previous LD? In recent years a number of studies have provided evidence for a partially different behavioral and neurocognitive profile between LD- and noLD-DD children (Brizzolara et al., 2006; Scuccimarra et al., 2008; Chilosi et al., 2009; Pecini et al., 2011). Consistent with such evidence, we expected a concurrent language weakness in the former group, which might have a particularly relevant impact on the DN performance of DD children with LD. Indeed, this is what we found. The LD-DD children scored significantly lower than the noLD-DD children in measures of phonological processing (PSTM), verbal semantic knowledge and written text comprehension, confirming our previous results. Moreover, although both DD groups were slower than typical readers on the DN task, naming single items proved to be more difficult for LD- than noLD-DD children, as indicated by the larger effect size of group (in the comparison with the respective control group) for LD-DD than for noLD-DD participants, for both digits and colors. Indeed, a possibly stronger lexical access deficit in the LD- than in the noLD-DD group had been anticipated. However, such a deficit was not accounted for by a semantic retrieval impairment in the former group, in addition to a phonological retrieval deficit shared by both DD groups, as we had hypothesized. In fact, the DN deficit of the LD-DD children was not more evident for colors than for digits, as would be expected if a semantic impairment underlay the lexical access deficit of the LD-DD children. One possible explanation for the more marked lexical access difficulties of these subjects might well be only in terms of a phonological retrieval deficit, which would be more pronounced for the LD-DD children. Such an explanation would be consistent with data showing poorer phonological processing abilities in children with SLI than in children with DD without oral language problems (Kamhi and Catts, 1986; Tallal et al., 1997).

By contrast, both LD- and noLD-DD groups were indistinguishable from controls on the DR task, regardless of the type of stimulus, on both response latencies and accuracy.

Overall, these results indicate that children with DD have a discrete-item naming deficit which cannot be accounted for by a visual-perceptual impairment, but needs to be explained as a name-retrieval deficit. Such a deficit is more marked in DD children with a previous LD and a concomitant verbal weakness.

Other differences between LD- and noLD-DD children also emerged in our study. A different pattern of results was in fact evident in the two groups when analyzing data of the RAN task. First, while for LD-DD children differences from controls were present regardless of the type of stimulus, for noLD-DD children impaired RAN speed occurred only for digits. Moreover, while a significant slowness of LD-DD children with respect to controls on the RAN task survived for digits (but not for colors) after controlling for differences in DN response latencies, no significant difference remained on RAN of digits between noLD-DD and control readers when DN speed was controlled.

The absence of a "specific" serial RAN deficit for colors in both DD groups is consistent with the hypothesis of a general reduced role of processes pertaining to the managing of multiple activities in RAN of non-alphanumeric stimuli (Protopapas et al., 2013). However, a "non-specific" RAN deficit for colors was still evident in LD-DD children, possibly as a consequence of their marked name-retrieval deficit for this type of stimuli, which would "propagate" to the serial condition. The same would not occur in the noLD-DD group, who showed a much smaller deficit of lexical access for colors with respect to LD children, with the result that noLD-DD children did not differ significantly from controls on RAN of colors.

The difference between the two DD groups was, however, more relevant on RAN of digits, in which both samples were impaired, but only in the LD-DD children did significant slowness with respect to controls survive after controlling for differences in response latencies in the DN task. This finding seems indicative of a greater difficulty of LD- with respect to noLD-DD children in rapidly and precisely integrating different cognitive and linguistic processes underlying RAN performance with digits. On the basis of our data, reasons for a different involvement of "synchronization" deficits in the two DD subtypes cannot be much more than speculative. It might be that the larger name-retrieval deficit of LD- than noLD-DD children is responsible for a greater impairment of precise/timing mechanisms in the former than in the latter. When lexical access severely taxes the processing capacity of the child, in fact, integrating such a process with other activities is also severely affected. However, a specific deficit in a precise/timing mechanism and/or sequential processing *per se* might also affect LD-DD children, accounting for the unique contribution of serial RAN over that of discrete naming in differentiating this group from controls.

It remains unclear why noLD-DD subjects showed a "nonspecific" RAN deficit for digits, but not for colors; their lexical access deficit was in fact very similar for the two types of stimulus. An explanation for such a difference in terms of a reduced efficiency of noLD-DD children in simultaneously performing multiple activities, which would mainly occur for digits rather than colors, seems unlikely: not only was the difference of these subjects from controls on RAN of digits no longer statistically significant once differences on DN of the same material were controlled for, but the group effect size *per se* (as resulting from the covariance analysis) was small.

Results of our study were obtained on groups of subjects widely ranging in school grade (from second to eighth). Such heterogeneity might raise some doubts on the interpretation of results concerning cognitive underpinnings of the RAN speed deficit in children with DD. However, when we compared absolute speed of reading (syllables/s in reading of lists of single words) between the subgroup of Primary school children (from second to fifth grade) and the subgroup of Secondary school children (from sixth to eighth grade), no significant difference emerged. The absolute reading speed level of both subgroups was comparable with that of Italian second grade typical readers, as provided by the norms of the standardized reading test employed (Sartori et al., 2007). At such a level, Italian normal readers are in a stage in which they are shifting from sub-lexical (serial) to lexical (holistic) written word processing (Zoccolotti et al., 2005; Orsolini et al., 2006) and, consequently, the need for rapid access to lexical phonology becomes crucial. Indeed, according to some authors, the characteristic reading speed deficit of Italian children with DD would mainly reflect a problem in acquiring efficient use of a lexical strategy in reading (Zoccolotti et al., 1999; Orsolini et al., 2009). However, Zoccolotti et al. (2013) recently highlighted a further *locus* for the impaired reading fluency of Italian children with DD: a reduced advantage in reading multiple over single words with respect to average readers. Both of these impairments (lexical-reading and simultaneous processing of multiple items) are consistent with deficits in lexical access and in managing the sequential information in serial RAN for DD children, as suggested by our results.

One limitation of the present study is that evidence of LD in the pre-school years is retrospective, but this was unavoidable as the children with DD were referred to us for assessment of academic achievements at school age. Such limitation might raise some doubts as to the reliability of classification of participants with DD according to whether or not they had a history of previous LD. However, we also found *concomitant* weaker verbal abilities of the LD- than the noLD-DD group, not limited to deficient oral phonological processing but also encompassing impaired verbal semantic knowledge and text reading comprehension, despite absence of differences in reading decoding abilities.

Another weakness of our study is the relatively small size of each group with DD, when DD participants were subdivided according to the presence or absence of a previous LD. In these conditions real, but not strong enough, effects of both reading group and type of stimulus on performance in the different experimental tasks might not emerge because of low statistical power. Thus, future studies should address the issue of cognitive deficits underpinning impaired RAN performance of different categories of DD subjects using larger sample sizes for each subgroup. It should be noted, however, that in our study the group effect sizes on the experimental measures turned out to be small at best for those comparisons which were not statistically significant. Thus, a substantial change of our results with larger samples does not seem very likely.

Despite the above limitations, we think our study contributes to a better understanding of the reasons why RAN speed discriminates between DD and typical readers, as well as pointing out possible different cognitive mechanisms at the basis of RAN impairment in different subgroups of children with DD. Recently, there has been a growing interest in potential differences in cognitive processes underpinning RAN performance in different populations of subjects. However, up to now this interest has mainly concentrated on identifying possible developmental differences in the pattern of interrelations among different naming paradigms and/or measures and reading in samples of mostly typically developing readers in different school grades (e.g., Georgiou et al., 2006; de Jong and Messbauer, 2011; Protopapas et al., 2013). In the present study we expanded such an aim by comparing different subtypes of children with DD, classified according to whether or not they had a history of previous language delay, which represent two partially different neurocognitive phenotypes. Results from this comparison indicate that this is a goal worth pursuing and a potentially fruitful area of research, as superficially similar RAN impairments in different populations of subjects with DD may obscure at least partially different underlying cognitive deficits.

## **AUTHORS' CONTRIBUTION**

Filippo Gasperini, Child Neuropsychologist, PhD, contributed to the design of the work, acquisition, analysis, and interpretation of the data, to writing the paper and final approval of the version to be published.

Daniela Brizzolara, Child Neuropsychologist, contributed to the design of the work, to interpretation of the data and to writing the paper, revising the paper critically for intellectual content and final approval of the version to be published.

Paola Cristofani, Child Psychologist, contributed to acquisition and analysis of experimental data and final approval of the version to be published.

Claudia Casalini, Child Psychologist, contributed to acquisition of clinical data, revising the final manuscript and final approval of the version to be published.

Anna Maria Chilosi MD, PhD, Specialist in Child Neurology and Psychiatry, contributed to acquisition of anamnestic data of all children with DD, to writing, revising the paper critically for intellectual content and final approval of the version to be published.

## **ACKNOWLEDGMENTS**

We would like to thank Irene Fioretti for collecting some of the cognitive, reading and experimental data on typically developing readers of the study at their schools.

We are also grateful to Claudia Brancati, who helped in creating digital stimulus materials for experimental tasks of the study.

## **REFERENCES**


a regular orthography. *J. Educ. Psychol.* 95, 22–40. doi: 10.1037/0022-0663.95.1.22


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 March 2014; accepted: 05 August 2014; published online: 04 September 2014*.

*Citation: Gasperini F, Brizzolara D, Cristofani P, Casalini C and Chilosi AM (2014) The contribution of discrete-trial naming and visual recognition to rapid automatized naming deficits of dyslexic children with and without a history of language delay. Front. Hum. Neurosci. 8:652. doi: 10.3389/fnhum.2014.00652*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Gasperini, Brizzolara, Cristofani, Casalini and Chilosi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Modeling individual differences in text reading fluency: a different pattern of predictors for typically developing and dyslexic readers

#### *Pierluigi Zoccolotti <sup>1,2</sup>, Maria De Luca <sup>2</sup>\*, Chiara V. Marinelli <sup>2</sup> and Donatella Spinelli <sup>2,3</sup>*

*<sup>1</sup> Department of Psychology, Sapienza University of Rome, Rome, Italy*

*<sup>2</sup> Neuropsychology Unit, IRCCS Fondazione Santa Lucia, Rome, Italy*

*<sup>3</sup> Department of Human Movement Sciences and Health, University of Rome "Foro Italico", Rome, Italy*

### *Edited by:*

*Peter F. De Jong, University of Amsterdam, Netherlands*

### *Reviewed by:*

*Stefan Hawelka, University of Salzburg, Austria*

*Athanassios Protopapas, University of Athens, Greece*

### *\*Correspondence:*

*Maria De Luca, Neuropsychology Unit, IRCCS Fondazione Santa Lucia, Via Ardeatina 306, 00176 Rome, Italy; e-mail: m.deluca@hsantalucia.it*

This study was aimed at predicting individual differences in text reading fluency. The basic proposal included two factors, i.e., the ability to decode letter strings (measured by discrete pseudo-word reading) and integration of the various sub-components involved in reading (measured by Rapid Automatized Naming, RAN). Subsequently, a third factor was added to the model, i.e., naming of discrete digits. In order to use homogeneous measures, all contributing variables considered the entire processing of the item, including pronunciation time. The model, which was based on commonality analysis, was applied to data from a group of 43 typically developing readers (11- to 13-year-olds) and a group of 25 chronologically matched dyslexic children. In typically developing readers, both orthographic decoding and integration of reading sub-components contributed significantly to the overall prediction of text reading fluency. The model prediction was higher (from ca. 37 to 52% of the explained variance) when we included the naming of discrete digits variable, which had a suppressive effect on pseudo-word reading. In the dyslexic readers, the variance explained by the two-factor model was high (69%) and did not change when the third factor was added. The lack of a suppression effect was likely due to the prominent individual differences in poor orthographic decoding of the dyslexic children. Analyses on data from both groups of children were replicated by using patches of colors as stimuli (both in the RAN task and in the discrete naming task), with similar results. We conclude that it is possible to predict much of the variance in text-reading fluency using basic processes, such as orthographic decoding and integration of reading sub-components, even without taking into consideration higher-order linguistic factors such as lexical, semantic and contextual abilities. The validity of the approach of using proximal vs. distal causes to predict reading fluency is discussed.

**Keywords: reading, individual differences, dyslexia, suppression effect, RAN, vocal reaction times**

## **INTRODUCTION**

Fluent reading of texts is an important requisite for school achievement. The present study was aimed at investigating the factors that modulate individual differences in this skill.

Fluent reading aloud requires the integration of multiple subcomponents (or processing them in a cascaded manner, according to the terminology adopted by Protopapas et al., 2013). When a word is being fixated and decoded, readers plan the next saccade (based on para-foveal pre-processing of text on the right) but keep information about the previous words of the text so that they are able to utter them; readers also have to understand and memorize the meaning of what they are reading. A measure of this multiple-processing task is the asynchrony between eye position and speech output, referred to as eye-voice span (Buswell, 1921) or eye-voice lead (Fairbanks, 1937); indeed, typically developing readers are able to scan and process words much in advance of the word they are actually uttering.

In adult proficient readers reading aloud occurs fluently and effortlessly, with maximum reading speed for texts (in standard conditions) estimated at approximately 300 words per minute (Carver, 1982). Notably, even higher estimates are obtained using paradigms such as the rapid serial visual presentation which control for the influence of eye movements (e.g., Rubin and Turano, 1992). However, this performance is the endpoint of several years of practice, indicating slow power-function improvement in fluency (Zoccolotti et al., 2009). Notably, increases in reading speed (see data in Carver, 1982), as well as in the size of the eye-voice span (Buswell, 1921), have been observed up to college age.

Many children fail to acquire adequate reading skills, a deficit referred to as developmental dyslexia. Children with dyslexia do not learn to read fluently (e.g., Wimmer, 1993), produce frequent paralexias and characteristically have a very small eye-voice lead (e.g., De Luca et al., 2013). The literature on this disorder is large, particularly that focused on interpreting the nature of reading errors (e.g., see Castles et al., 2006; Temple, 2006; Friedmann and Lukov, 2008; Hulme and Snowling, 2014). Here we focus on the speed deficit of dyslexic children, that is, the deficit in reading fluency that is especially noted in languages with regular orthography (Wimmer, 1993; Zoccolotti et al., 1999). Considering reading fluency as the end-point of the integration of multiple sub-components of reading, some key questions arise. Which components contribute to the reading slowness shown by dyslexic children and how can they be measured and characterized? Does the need to integrate multiple sub-processes also contribute to generating the reading deficit?

To understand individual differences in reading fluency in typically developing and dyslexic children, we started from the working hypothesis that at least two basic factors contribute to the ability of all children to read fluently. The first is efficient orthographic analysis, i.e., *orthographic decoding,* and the second is the ability to integrate decoding of the on-going stimulus with utterance of the target and programming of the next saccade, i.e., *integration of reading sub-components*. The present preliminary study was aimed at evaluating whether these two components explain a relevant portion of the individual differences in text reading fluency. To rationalize our focusing on these two processes we capitalize on two major lines of reading speed research. The first one characterizes the basic difficulty in orthographic processing encountered by dyslexic children. The second one features studies that contrasted discrete and multiple presentations of stimuli and provides information on the integration of reading sub-components in typically developing and dyslexic readers. Below, we briefly review these two lines of research.

### **ORTHOGRAPHIC DECODING DEFICITS IN DYSLEXIA**

A vast literature shows that orthographic decoding is the key difficulty in developmental dyslexia. Indeed, very clear reading deficits are detected in reading single words, i.e., when the requirement to read is stripped of the need to place the stimulus within a sentence and to pronounce it (e.g., van den Broeck and Geudens, 2012).

One related question is whether a developmental deficit can also be reliably detected for single letters or short letter strings. It is generally held that children with dyslexia show deficits in reading words (e.g., Katz and Wicklund, 1971) but not in recognizing letters (e.g., Katz and Wicklund, 1972). Notably, this sparing has also been shown with methodologies that allow controlling for the general difficulty of the task. For example, Martelli et al. (2009) examined the contrast threshold to identify single letters and words and found that dyslexic and typically developing readers needed about the same amount of contrast to identify single letters but differed greatly in the case of long words. Bosse et al. (2007) found that dyslexic children were not impaired in identifying briefly presented letters but had severely impaired visual spans, i.e., they were unable to process a multi-element array of consonants in parallel. In a later study (Lassus-Sangosse et al., 2008), they showed that the string letter deficit was present only when the presentation of letters was simultaneous not when it was sequential. In a similar vein, De Luca et al. (2010) found that dyslexic children were only mildly affected in letter, bigram and two-letter syllable tasks but were severely affected in the case of both words and non-words. Performance in these latter tasks was well accounted for by a single global factor referred to as a "*letter-string*" factor to mark, on one hand, that it was present only in the case of multi-letter displays and, on the other, that it was independent from lexical activation.

The presence of this global letter-string factor has been confirmed in a number of studies that provide information about its characteristics (Zoccolotti et al., 2008; Paizi et al., 2011, 2013; Di Filippo and Zoccolotti, 2012). In particular, the global factor that marks the decoding deficit of children with dyslexia was present when they named orthographic but not pictorial stimuli (Zoccolotti et al., 2008) and when targets were presented visually but not acoustically (Marinelli et al., 2011). Notably, in all these studies the global factor accounted for a very large proportion of the variance in group differences between dyslexic and typically developing readers. Overall, children with dyslexia are severely impaired in decoding when the task requires the parallel processing of a string of letters presented visually regardless of whether the letter string represents a legal word or not. We proposed that this global factor indicates a deficit in a pre-lexical "grapheme description" independent of case, font, location or orientation (see Marsh and Hillis, 2005). Dehaene et al. (2005) proposed a neural model to account for the abstract ability to process words regardless of their location, font and size. According to the Local Combination Detector (LCD) model written words are encoded by a hierarchy of detectors tuned to increasingly larger and more complex word fragments (visual features, single letters, bigrams, quadrigrams and, possibly, words). Over years of practice, learning of local combination detectors allows portions of the left ventral occipito-temporal visual system (referred to as visual word form area, VWFA) to become attuned to the regularities of the writing system, yielding fast parallel processing in reading (Cohen et al., 2000, 2002). The construction of this mechanism seems defective in dyslexic children (Richlan et al., 2009; Pontillo et al., 2014). 
This mechanism fits well with the pre-lexical "grapheme description" we found defective in dyslexic children.

In the cited studies of dyslexic children (i.e., Zoccolotti et al., 2008; Marinelli et al., 2011; Paizi et al., 2011, 2013; Di Filippo and Zoccolotti, 2012), in agreement with the predictions of the rate and amount model (RAM, Faust et al., 1999) the presence of a letter-string factor was inferred through linear regression analysis on the basis of performance on a large variety of tasks (reading high- or low-frequency words of different lengths, making lexical decisions on words or pseudo-words, etc). Notably, the predictions of the RAM apply at both a group and individual level (Faust et al., 1999). Thus, one may use the parameters of the linear regression of the condition means of a given dyslexic child over those of the total group of readers to obtain estimates of the impairment of the child in terms of the global factor (for a discussion on this point see Kail and Salthouse, 1994). For example, van den Boer et al. (2013) recently showed that the slope and the intercept were expressing different reading processes: the slope indicated the degree of serial processing while the intercept expressed the overall reading speed of words and non-words. Based on the RAM, individual slopes calculated for reading words and pseudo-words using RTs (De Luca et al., 2010) or mean total reading times per item (Di Filippo and Zoccolotti, 2012) correlated significantly with reading speed (and accuracy) in a standard reading test.
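The individual-level use of the RAM described above can be sketched as an ordinary least-squares fit of one child's condition means on the condition means of the whole group. The following Python fragment is only a minimal illustration, not the analysis code of the cited studies; the function name `fit_line` and all timing values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical condition means (seconds per item) for the total group of readers
group_means = [0.50, 0.60, 0.80, 1.00]
# Hypothetical condition means of one child on the same four conditions
child_means = [0.65, 0.80, 1.10, 1.40]

slope, intercept = fit_line(group_means, child_means)
```

In such a fit, a slope greater than 1 means that the child's times increase more steeply across conditions than the group means do. With only a handful of conditions, as in this toy example, the individual estimates can be unstable.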

However, when studying reading with a correlational approach as in the present study, the use of a single target task may prove advantageous to establish individual performance as compared to the extraction of a single index from a variety of experimental conditions. On the one hand, it is considerably more economical. On the other hand, it avoids the difficulty of obtaining reliable regression coefficients (i.e., slopes and intercepts) at an individual level. Indeed, these are typically based on relatively few conditions and few trials per condition on each observer; thus, individual outliers may occasionally be present for whom the linear regression accounts for only a small proportion of variance. As described in greater detail below, in the present study, we selected a task particularly apt to measure orthographic decoding ability i.e., reading visually displayed single pseudo-words with the instruction to read as fast as possible (ASAP). This task captures the critical characteristics of the letter-string factor because it is in the visual modality and it calls for the fast parallel processing of a string of graphemes without requiring direct access to the lexicon. At the same time, it does not imply the ability to deal with multiple items as this represents a separate factor contributing to reading fluency. Note that processing of a letter string requires dealing with multiple elements (i.e., a set of graphemes) in parallel. Thus, if parallel processing for string is not developed, such as when learning to read, integration processes are evident also within a single word, and, for example, this is indicated by multiple fixations on the string and/or parceled uttering of the target. In the present context with 6th graders, we only focus on the contrast between the orthographic decoding of a single (although in itself complex) target with the ability to integrate this processing with the decoding of other adjacent targets as typical of functional reading.

### **INTEGRATION OF READING SUB-COMPONENTS: DISCRETE- vs. MULTIPLE-STIMULUS PRESENTATION**

Fluent reading requires the ability to integrate the decoding of the on-going stimulus with utterance of the target and programming of the next saccade. This ability implies various sub-components. Previous research has shown that sub-components, such as visual scanning or eye movements, are not affected *per se* in dyslexic children. Thus, scanning and eye movements appear largely unimpaired if non-linguistic stimuli are presented (e.g., Brown et al., 1983; Olson et al., 1983; De Luca et al., 1999). Similarly, no articulatory deficit is present (e.g., Di Filippo et al., 2005; Wimmer et al., 1998).

However, there is evidence suggesting that *integration of the subcomponents involved in reading* is defective in children with dyslexia also when they perform a non-orthographic task. This evidence comes from studies comparing the presentation of discrete- vs. multiple-stimulus displays. Indeed, several of these studies stemmed from research on the paradigm known as "rapid automatized naming" or RAN (Denckla and Rudel, 1974, 1976). In the typical display, the child has to name 50 stimuli (i.e., digits, patches of colors, drawings of objects, etc.) regularly placed on a sheet of paper. Only a few targets (usually five) are used for each trial. The children are trained so they have no uncertainty about the repeated target names. Denckla and Rudel (1976) reported that dyslexic children performed this task more slowly than typically developing readers but were relatively accurate. The nature of the dyslexic children's difficulty in this seemingly simple task has been debated.

Some authors see RAN as just another example of a phonologically laden task (e.g., Ramus et al., 2003). In this view, dyslexic children are slow because of their inefficiency in retrieving the color, digit or picture names. Some correlational evidence goes in this direction. Thus, performance on RAN tasks generally correlates with performance on other phonological awareness tasks (Katz, 1986; Wagner and Torgesen, 1987; Compton et al., 2001; Chiappe et al., 2002). An alternative interpretation was advanced by Wolf and Bowers (1999; see also Wolf et al., 2000). They proposed that RAN is highly correlated with reading as it reproduces its "*microcosm*," i.e., it involves all the sub-components comprising functional reading with the exception of orthographic decoding (see also Blachman, 1984). In this view, dyslexic children are impaired because they are slow in organizing a fluent stream of multiple processes. In this hypothesis, the comparison between discrete and multiple presentations of stimuli is crucial, as only the latter format should show a relationship with reading. By contrast, according to a phonological explanation inefficiency in retrieving color, digit or picture names is expected in both cases. Supporting Wolf and Bowers's view, much research has shown that if stimuli (i.e., digits, colors, pictures) are presented individually, correlations with reading skills are lower than with serial naming (e.g., Stanovich et al., 1983; Bowers, 1995; Chiappe et al., 2002; Logan et al., 2011).

Several studies have dealt with this issue in the last few years. de Jong (2011) examined the development of the relationship between RAN and reading fluency as a function of the format (i.e., discrete vs. serial stimulus presentation) in first, second and fourth grade children. The author found that similar formats of RAN and reading were more strongly related than dissimilar formats among "advanced readers" (i.e., children that read words by sight; almost all 2nd and 4th grade children). Discrete RAN was more related to discrete reading fluency of high-frequency one-syllable words than to the serial reading of these words, while serial RAN was more related to serial word reading fluency than to discrete word reading. Moreover, discrete RAN made a unique contribution in predicting discrete word reading among "advanced readers," whereas serial RAN did not. On the contrary, for "beginning readers" (i.e., those who still read such words serially), RAN was the strongest predictor (whereas the contribution of discrete RAN was negligible) in word reading irrespective of the serial-discrete format (see also de Jong, 2008). Note that serial RAN predicted a large amount of unique variance in serial word reading in both advanced and beginning readers. In a recent study, Georgiou et al. (2013) compared discrete and serial RAN in a variety of experimental conditions. They found that RAN was related to reading partly because it involved serial processing (no correlation with reading was present in the case of discrete naming) and partly because it required the oral production of the different names of the stimuli. In fact, the correlation with reading dropped when subjects were instructed to give fixed oral responses to target and non-target stimuli (i.e., yes or no, 2 or 5, and apple or chicken). Georgiou et al.'s (2013) findings indicate that the whole set of cognitive operations involved in reading is necessary to yield the relationship between RAN tasks and reading.
In the same vein, it has been observed that scanning the same RAN targets to cross out a given target is not correlated with reading (see also Wimmer et al., 1998; Landerl, 2001; Di Filippo et al., 2005; Georgiou et al., 2013).

Logan et al. (2011) found that serial naming uniquely predicted reading and that the relation was stronger when isolated naming was controlled for, suggesting that isolated naming functioned as a suppressor variable in the relation between serial naming and reading. In the case of suppression an independent variable contributes little or no variance to the dependent variable but may have a sizeable beta weight because it "purifies" one or more independent variables of their irrelevant variance, thereby allowing their predictive power to increase (Capraro and Capraro, 2001). Notably, specific analyses are needed to show these suppression effects (typically not adopted in early research on serial-discrete RAN). Evidence for a suppressive effect was recently confirmed by Protopapas et al. (2013) who compared the performance on discrete and serial naming of digits, objects, and words of second and sixth grade Greek children. Discrete and serial word reading correlated highly in younger children but less in older children. A reading–naming dimension explained the data well for the younger children; by contrast, a dimension in terms of serial-discrete processing emerged with older children. Thus, although RAN and reading are correlated at different ages the underlying structure of this relationship may actually change as a function of reading experience. So, younger children appeared to process stimuli predominantly as a series of isolated items while older children started using serial procedures in a cascaded manner effectively. Protopapas et al. (2013) also examined the contribution of naming tasks over and above that of the effect of discrete word reading through regression and commonality analyses. The results for commonality analyses are particularly relevant here as we used the same approach. For sixth graders, multiple RAN contributed unique variance to the prediction of serial word reading, while discrete word reading was a minor contributor.
The reverse held for younger children; in this case there was a large contribution of discrete word reading and multiple RAN did not explain unique variance.
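The logic of suppression described above can be illustrated numerically with the standard two-predictor regression formulas expressed in terms of pairwise correlations. The sketch below is in Python; the function name and the three correlation values are hypothetical and are not taken from any of the cited studies.

```python
def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardized betas and R^2 for y regressed on two predictors,
    computed from the three pairwise correlations."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


# Hypothetical correlations chosen only to illustrate suppression
r_reading_serial = 0.5    # reading fluency with serial naming
r_reading_discrete = 0.0  # reading fluency with discrete naming
r_serial_discrete = 0.5   # serial naming with discrete naming

beta_serial, beta_discrete, r2_both = two_predictor_regression(
    r_reading_serial, r_reading_discrete, r_serial_discrete
)
# Variance explained by serial naming when it is the only predictor
r2_serial_alone = r_reading_serial ** 2
```

With these illustrative values, discrete naming is uncorrelated with reading, yet entering it raises the explained variance from 0.25 to about 0.33 and it receives a negative beta weight. This is the pattern expected of a suppressor variable.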

Notably, most research on discrete and multiple targets is correlational and direct experimental comparisons between these two types of presentation are very few (particularly in the case of reading tasks). One possible reason is that different (and not directly comparable) dependent measures are characteristically used in the two domains. Studying the reading of isolated words (and non-words) largely rests on the analysis of vocal reaction times (RT). Thus, only the time between stimulus onset and the beginning of the vocal response is measured; this putatively captures the decoding part of the response, whereas the actual pronunciation is usually considered as not interesting (but, for a recent analysis of the characteristics of the pronunciation component of the response see Davies et al., 2013; Martelli et al., 2014). By contrast, reading fluency with multiple stimuli, such as word lists or texts, is measured by calculating the time needed to entirely process each stimulus. Thus, the whole time needed to decode and utter a target is considered in this case. Analysing total reading time of discrete stimuli (i.e., the time from onset of the stimulus to the end of the pronunciation) allows for a direct comparison between reading of discrete vs. multiple words (or non-words).

Using this approach, we recently found that 12-year-old typically developing readers had a clear advantage on multiple over discrete items in both RAN and reading tasks (Zoccolotti et al., 2013). Thus, they were able to partially process the next visual stimulus while uttering the current target, producing the time advantage over discrete items. The children with dyslexia of the same age showed a smaller advantage for multiple stimuli in naming colors and digits but presented the opposite pattern in reading, i.e., they were faster when they read discrete than multiple targets. Accordingly, we proposed that dyslexic children's great impairment on multiple arrays indicates a selective difficulty in integrating the multiple subcomponents of the reading task (Zoccolotti et al., 2013). As stated above, direct comparisons of reading under discrete and serial conditions are lacking; thus, to the best of our knowledge, we cannot compare our data with those of other laboratories. Using a somewhat different paradigm, Jones et al. (2009) directly compared discrete and multiple RAN-type tasks and reported that dyslexic young adults showed a greater deficit for multiple than discrete items, whereas non-dyslexic individuals showed a marginal facilitation with this format.

Overall, it seems that the integration of multiple subcomponents (analogous to those implied in reading) is defective in dyslexic children over and above the basic nuclear deficit in decoding words (Zoccolotti et al., 2013). Thus, in the present study, we considered integration ability as a separate factor in predicting reading fluency.

## **PRESENT STUDY**

The present study aimed to evaluate the factors that account for individual differences in the reading fluency of typically developing and dyslexic readers. As dependent measure we chose to examine reading of texts rather than single words because it has a clear functional value and includes dealing with both orthographic materials and multiple target displays. These two latter aspects correspond to the two critical factors we selected to account for children's ability to read fluently: (1) decoding strings of letters presented visually (referred to as *orthographic decoding)*; and (2) integrating decoding of the on-going stimulus with utterance of the target and motor preparation of the next saccade, which requires parafoveal analysis of the future target (referred to as *integration of reading sub-components)*. Both factors are active when children read a meaningful text. However, measuring reading fluency does not directly allow understanding which of them is responsible (and to what extent) for a reading delay because both factors are involved in the performance. Indeed, only one of them (or both but to a variable degree) may be inefficient. A model that separately evaluates the contribution of these two factors may offer, at least in principle, the basis for future investigations of selective disturbances of each factor and/or their interaction.

To measure these two factors separately, we selected single pseudo-word reading and a standard RAN task requiring the naming of digits (or colors). As stated above, single pseudo-word reading appears to be a particularly appropriate measure of the ability to decode a string of letters. Critically, on one hand, this performance does not require integrating multiple subcomponents (as in standard reading) and on the other hand it does not involve the orthographic lexicon. The performance of digit (or color) RAN represents a particularly suitable measure of the ability to integrate the various sub-components typically involved in reading except for orthographic decoding (and keeping lexical and semantic processing aside).

This proposal may be seen as a simplified schema of the processes involved in text reading fluency. As proposed above, the motivation to develop this model stems from the observation that dyslexic children's impairment on multiple stimuli cannot be entirely explained by their single word performance (Zoccolotti et al., 2013). Although they have many different key features, most accepted models of reading, such as the dual route model (Coltheart et al., 2001), the CDP+ model (Perry et al., 2007) or the triangle model (Plaut et al., 1996), focus on the word level; thus, they are only partially informative when examining dyslexic children's reading slowness on texts and more generally when the aim is to predict reading fluency.

Clearly, the proposed model is only a skeleton focused on the processes that, based on previous research, we expect to be closely related to individual differences in text reading fluency. A full model would require specifying all the processes involved in reading fluency (e.g., spelling out all the processes that converge to determine the "integration of the reading sub-components" factor); this enterprise is beyond the aims of the present study, which was intended as a first step in this direction. At any rate, it is important to keep in mind that other factors may also play a role in predicting individual differences in text reading fluency. In particular, higher-order linguistic factors may moderate this relationship. These in turn should include efficiency in accessing the orthographic and the phonological lexicon as well as semantic and contextual abilities. In the present preliminary study, however, we were specifically interested in examining how much individual differences in reading fluency can be accounted for by relying only on basic reading processes.

One question concerns the relative independence of the two factors considered. For instance, to explain dyslexics' difficulties in RAN tasks Wolf et al. (2000) proposed that there are "*connections among processes underlying naming speed, automatic orthographic pattern recognition, word identification, and reading fluency*" (Wolf et al., 2000). According to this "connection" hypothesis, one would expect the two factors to be partially related in their influence on reading fluency.

Operationally, we tested whether two variables (discrete pseudo-word reading and multiple RAN) alone or in combination significantly predicted reading fluency on meaningful texts. For RAN, both digit and color stimuli were used. It has been proposed that these two sets of stimuli generate partially different patterns of response (e.g., van den Bos et al., 2002). Notably, naming digits requires the arbitrary mapping of visual stimuli into phonological labels and is expected to produce generally more automatic processing; naming colors is mediated by semantic activation and yields generally slower and less automatized responses than digit stimuli. Thus, we decided to analyze digit and color conditions separately. As a measure, we considered a unit (i.e., total reading time per item) that was directly comparable with both discrete and multiple stimulus presentations as well as reading and naming tasks. We expected both variables i.e., discrete pseudo-word reading and multiple RAN, to contribute unique variance to the prediction and evaluated whether they also shared a common portion of the variance. Moreover, we used an additional control task, i.e., naming times for the isolated presentation of digits (or colors) which, based on previous research, was expected to contribute to the variance indirectly by acting as a suppressor variable (Logan et al., 2011; Protopapas et al., 2013; Logan and Schatschneider, 2014). As we expected predictors to show varying degrees of inter-correlation we used commonality analysis, a type of multiple linear regression that allows partitioning the total variance explained by independent variables into variance unique to each variable and variance shared by a subset of independent variables (Pedhazur, 1982). Commonality analysis is particularly suited when collinearity of predictors is expected as well as the presence of suppression effects (Nimon and Reio, 2011). 
Based on previous research, we expected a suppression effect of the discrete naming variable (Logan et al., 2011; Protopapas et al., 2013).
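The partitioning performed by commonality analysis can be sketched for the simplest case of two predictors, where the explained variance splits into two unique components and one common component. The Python fragment below is a minimal illustration under stated assumptions: it works from pairwise correlations rather than raw data, the function names are hypothetical, and the correlation values are invented. The analyses in this study involved up to three predictors, which yields seven components instead of three.

```python
def r_squared_two(r_y1, r_y2, r_12):
    """R^2 of y regressed on two predictors, from pairwise correlations."""
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12
    return numerator / (1.0 - r_12 ** 2)


def commonality_two(r_y1, r_y2, r_12):
    """Partition R^2 into unique and common components (two predictors)."""
    r2_1 = r_y1 ** 2                          # predictor 1 alone
    r2_2 = r_y2 ** 2                          # predictor 2 alone
    r2_12 = r_squared_two(r_y1, r_y2, r_12)   # both predictors together
    return {
        "unique_1": r2_12 - r2_2,
        "unique_2": r2_12 - r2_1,
        "common": r2_1 + r2_2 - r2_12,
        "total": r2_12,
    }


# Hypothetical correlations: each predictor with fluency, and with each other
parts = commonality_two(r_y1=0.5, r_y2=0.5, r_12=0.3)
```

By construction the three components sum to the total explained variance. A negative common component, which can arise with other correlation patterns, is one way in which suppression may show up in this kind of analysis.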

First, we present data relative to a group of typically developing readers (Study 1); second, we present data relative to a group of dyslexic readers, highlighting possible differences in the weight of predictors between the two groups (Study 2). In the main text we report data using the digit conditions (both RAN and discrete naming); we synthetically report the same analyses for the color conditions for both typically developing and dyslexic children as Supplementary Materials.

### **STUDY 1: PREDICTING READING FLUENCY IN TYPICALLY DEVELOPING READERS**

Below we present data from a group of 11- to 13-year-old children with typical reading development. At this age level acquisition of reading speed is almost complete (Zoccolotti et al., 2009). Furthermore, recent evidence indicates that in children in this age range the processing of multiple displays is well differentiated from that of isolated stimuli (Protopapas et al., 2013).

### **MATERIALS AND METHODS**

### *Participants*

Forty-three typically developing readers (20 males and 23 females; mean age 11.6 ± 0.4 years) participated in the experiment. Nonverbal IQ level was assessed using Raven's Colored Progressive Matrices. All children scored well within the normal limits according to the Italian norms (Pruneti et al., 1996); mean raw score was 28.8 ± 3.4; mean z score was −0.32 ± 0.80. Reading efficiency was assessed by the MT Reading test (Cornoldi and Colpo, 1995, see below). All participants had normal or corrected-to-normal visual acuity.

## *MT reading test*

The child reads a passage aloud within a 4-min time limit. Reading time (s/syllable) and accuracy (number of errors, adjusted for the amount of text read) are scored (Cornoldi and Colpo, 1995). As for raw data, the average reading time per syllable was 0.23 s (*SD* = 0.04), and the mean number of errors was 6.2 (*SD* = 3.6). Mean z scores (based on normative values, Cornoldi and Colpo, 1995) were near zero for all parameters (0.02 and −0.09 for reading time and accuracy, respectively).

Note that, for the specific aims of the present study, the reading speed at the MT test was the dependent measure for estimating text reading fluency. As for all other measures (see below), an inverse transformation was applied to the data so that item/s was considered in the statistical analyses.

### *Reading pseudo-words*

Twenty 5- and 20 7-letter pseudo-words (matched for initial phoneme across lengths) were derived from words by changing one (or two) letter(s) of each word (see Appendix). Words were selected from the LEXVAR database (Barca et al., 2002; http://www.istc.cnr.it/grouppage/lexvar) and were matched for frequency across length (mean log frequency = 1.4) as well as for bigram frequency (according to the children corpus of word frequency by Marconi et al., 1993). The mean number of syllables was 2.0 for five-letter items and 2.9 for seven-letter items.

Pseudo-words appeared in black lowercase Times New Roman on a white background. Center-to-center letter distance subtended 0.4° horizontally at a viewing distance of 57 cm. Items were singly presented on a PC screen in two blocks, separately for the two lengths.

## *Naming digits and colors*

Stimuli were five digits (2, 4, 6, 7, and 9) and five colored squares (black, yellow, and the primary green, red, and blue, digitally defined according to the red, green and blue (RGB) triplets for standard colors) on a white background. Both digits and color names had a mean number of syllables of 1.8 (mean of letter length = 4.6 for colors and 4.4 for digits, respectively) and did not differ for bigram frequency (Marconi et al., 1993). Note that pseudo-words in the reading experiment did not differ from digit names for bigram frequency (Mann-Whitney *U* Test, *Z* = −0.28, n.s.), but differed for number of syllables (Mann-Whitney *U* Test, *Z* = 6.18, *p* < 0.0001) and letters (Mann-Whitney *U* Test, *Z* = 5.54, *p* < 0.0001); pseudo-words differed from color names for bigram frequency (Mann-Whitney *U* Test, *Z* = −4.06, *p* < 0.0001), number of syllables (Mann-Whitney *U* Test, *Z* = 5.17, *p* < 0.0001) and number of letters (Mann-Whitney *U* Test, *Z* = 5.54, *p* < 0.0001).

In the discrete stimulus condition, a single digit (color) appeared on the screen. Twenty-five digit- and 25 color-trials were given in two separate blocks. In the multiple stimuli condition (RAN), 100 digits and 100 colored squares were printed on separate sheets of A4 paper; there were two sheets for each stimulus type (for a total of four sheets), each containing an array of 50 items arranged in 10 rows of five columns.

Each digit (Helvetica, black) subtended 0.9° and each square 2.5°, horizontally, both in the discrete (at 57 cm viewing distance) and the multiple (at 40 cm viewing distance) conditions.
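Because the same visual angle is reported at two viewing distances, the physical size of the stimuli must have differed between the screen and the printed sheets. The following minimal sketch applies the standard visual-angle formula to illustrate this; the physical sizes it yields are derived values, not measurements reported in the study, and the function name is hypothetical.

```python
import math


def size_for_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Horizontal extent needed for a digit to subtend 0.9 degrees
# at the two viewing distances given in the text.
digit_cm_screen = size_for_visual_angle(0.9, 57.0)  # discrete condition (PC screen)
digit_cm_sheet = size_for_visual_angle(0.9, 40.0)   # multiple condition (A4 sheet)
```

At 57 cm, 1 cm on the display corresponds to roughly 1° of visual angle, so a 0.9° digit is about 0.9 cm wide on the screen and about 0.63 cm wide on the sheet viewed at 40 cm.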

### *Procedure*

Each participant was tested individually in a quiet room. All experiments were administered the same day with a pause after each condition.

In the discrete condition, both digit (color) stimuli and pseudo-words were displayed singly on a PC screen controlled by DMDX software (Forster and Forster, 2003) according to the following trial sequence: 15 ms acoustic tone, 400 ms blank field, 250 ms fixation cross, stimulus onset. The stimulus disappeared at pronunciation onset or after 4000 ms. Stimuli appeared in a pseudo-randomized fixed order in each block. The child was instructed to name the digit (or color name) or to read the pseudo-word aloud as fast and accurately as possible. Reaction time was measured and the whole utterance was digitally recorded.

In the multiple condition, a total of four sheets (two for each of the types of stimulus) were presented to the participant. The child was instructed to name the items aloud as fast and accurately as possible, progressing row-by-row and from left to right. The total time to complete the task was measured with a stopwatch and the errors were noted.

A short practice preceded task execution, separately for the different conditions. The order of conditions (discrete, multiple) as well as the order of type of stimulus (color, digits; five- or seven-letter pseudo-words) was counterbalanced across participants.

## *Data analysis*

In the discrete condition, naming or reading times per item were the time between the onset of the stimulus and the offset of the vocal response (manually detected by means of Check Vocal software; Protopapas, 2007).

In the multiple condition, the total naming time for each stimulus type was computed and divided by the number of stimuli in the arrays (100) to obtain a measure of naming time per item.

Preliminary analyses indicated a moderate tendency for the distribution of time scores to be skewed, as is often reported for this type of measure. In particular, the discrete digit naming condition deviated appreciably from normal distribution (Chi-Square goodness-of-fit test = 9.03, *p* < 0.05) although data from the other conditions did not deviate significantly (all *p*s > 0.05). Thus, inverse transformations for all measures were used, i.e., number of items/s. Normality tests indicated that none of these scores deviated from the normal distribution (all *p*s > 0.05 according to the Chi-squared goodness-of-fit test). So, this measure was adopted for all conditions.
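The two steps described above (time per item, then the inverse transformation to items/s) can be sketched as follows. This is only an illustration: the 52 s completion time is a hypothetical value, the function names are not from the original analysis, and the last line uses the group mean reading time reported for Study 1 purely to show the unit conversion.

```python
def time_per_item(total_time_s: float, n_items: int) -> float:
    """Average time per item (s/item) for a multi-item list."""
    return total_time_s / n_items


def items_per_second(time_per_item_s: float) -> float:
    """Inverse transformation: from s/item to items/s."""
    return 1.0 / time_per_item_s


# Hypothetical child: 52 s in total to name the two 50-item digit sheets.
ran_time = time_per_item(52.0, 100)     # s/item
ran_rate = items_per_second(ran_time)   # items/s

# Unit conversion of the Study 1 group mean (0.23 s/syllable) to syllables/s.
text_rate = items_per_second(0.23)
```

Note that the inverse of a group mean is not the mean of the individual inverses; in an analysis of this kind the transformation would be applied to each child's score before computing group statistics.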

Z scores were computed separately for digits and colors based on the group condition means and SDs. This was done separately for the discrete and multiple conditions. To obtain a single measure for pseudo-word reading performance in the discrete conditions, data for five- and seven-letter pseudo-words were collapsed. Z scores were computed based on the group condition means and SDs and averaged to obtain a single z score for pseudo-words.
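A minimal sketch of the standardization and averaging step for pseudo-words is given below. The rates are hypothetical values for four children, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values against their own group mean and (sample) SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical reading rates (items/s) for four children.
five_letter = [1.20, 1.50, 1.35, 1.65]
seven_letter = [0.90, 1.10, 1.05, 1.25]

# Standardize each length condition separately, then average per child
# to obtain a single pseudo-word z score.
z5 = z_scores(five_letter)
z7 = z_scores(seven_letter)
pseudo_word_z = [(a + b) / 2.0 for a, b in zip(z5, z7)]
```

Standardizing before averaging gives the two lengths equal weight in the composite, even though seven-letter items are read more slowly.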

To summarize, the final time measures entered in the analyses were: text reading (MT reading test), discrete pseudo-word reading, multiple digit (or color) RAN and discrete digit (or color) naming.

To test the influence of predictors on reading fluency in text reading we used commonality analysis, a method of variance partitioning designed to identify proportions of variance in the dependent variable that can be attributed uniquely to each of the independent variables, and proportions of variance that are attributed to various combinations of independent variables (Pedhazur, 1982; Nimon, 2010). To test our hypothesis that fluency in text reading can be effectively predicted by orthographic decoding and integration of reading sub-components, we first ran an analysis using only discrete pseudo-word reading and multiple digit (or color) RAN as predictors. Then we added the additional predictor "discrete digit (or color) naming" to see whether there was an increase in the explanatory power of the analysis.
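As an illustration of the variance partitioning described above, the following is a minimal sketch of a commonality analysis with two predictors. It is not the software used in the study: the function names and the correlation values are hypothetical, and R² is obtained from the standard expression for two predictors in terms of zero-order correlations.

```python
def r2_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 of y regressed on x1 and x2, from zero-order correlations."""
    return (r_y1**2 + r_y2**2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12**2)


def commonality_two(r_y1: float, r_y2: float, r_12: float) -> dict:
    """Partition R^2 into two unique components and one common component."""
    r2_full = r2_two_predictors(r_y1, r_y2, r_12)
    return {
        "r2": r2_full,
        "unique_1": r2_full - r_y2**2,          # what x1 adds over x2 alone
        "unique_2": r2_full - r_y1**2,          # what x2 adds over x1 alone
        "common": r_y1**2 + r_y2**2 - r2_full,  # variance shared by x1 and x2
    }


# Hypothetical correlations (NOT the values reported in Table 1).
parts = commonality_two(r_y1=0.5, r_y2=0.5, r_12=0.4)
```

By construction the two unique components and the common component sum to the model R². The common component can be negative, which is the signature of suppression.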

## **RESULTS**

**Table 1** presents the matrix of inter correlations between all predictors and the dependent variable, i.e., fluency in text reading. A 0.003 *p* level (based on Bonferroni correction for multiple comparisons) was adopted. An inspection of the table identifies a number of major results:


– finally, discrete naming (whether digits or colors) and discrete pseudo-word reading are significantly correlated.

**Tables 2A,B** present the results of the multiple regression analysis using the digit conditions. **Table 2A** reports the commonality coefficients for the multiple digit RAN and discrete pseudo-word reading variables. As to the percentage of variance explained (see the rightmost column in **Table 2A**), the unique contributions of the "multiple digit RAN" (27.24%) and "discrete pseudo-word reading" (34.61%) variables are present as well as the commonality between the two predictors (38.15%).

Unique and common contributions are summarized in **Table 2B** along with other parameters of the analysis, including the total variance explained by the model (37%) and the standardized β coefficients (and their significance values). For the sake of presentation we refer to this model as "*Model 1*". The last column of the table reports the percentage of variance explained by the two factors considered (due to the presence of the common variance of the two factors the sum of the values exceeds 100%). By and large, results for the color conditions ("*Model 1 color*") are consistent with those for the digit conditions (see Supplementary Materials).
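The percentages quoted for Model 1 can be checked with simple arithmetic. The sketch below uses only the three figures given in the text; the variable names are illustrative, and the "total" values follow the definition in the table footnote (Total = Unique + Common) rather than being read from **Table 2B**.

```python
# Percentages of explained variance quoted in the text for Model 1 (digits).
unique_ran_pct = 27.24   # unique to multiple digit RAN
unique_pw_pct = 34.61    # unique to discrete pseudo-word reading
common_pct = 38.15       # shared by the two predictors

# The three components partition the explained variance.
partition_sum = unique_ran_pct + unique_pw_pct + common_pct

# Per-predictor totals count the common component once for each predictor,
# which is why their sum exceeds 100%.
total_ran_pct = unique_ran_pct + common_pct
total_pw_pct = unique_pw_pct + common_pct
totals_sum = total_ran_pct + total_pw_pct
```

The partition sums to 100%, whereas the two per-predictor totals (65.39% and 72.76%) sum to 138.15%, the excess being exactly the common component.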

**Tables 3A,B** present "*Model 2*," i.e., the commonality coefficients when the "discrete digit naming" variable is added as a predictor to the multiple regression analysis. Unique and common contributions are summarized in **Table 3B** along with the other parameters of the analysis, including the total variance explained by the model and the standardized β coefficients. Note that the total variance explained by "*Model 2*" increases substantially with respect to "*Model 1*," passing from 37 to 52%. This increase is due to the influence of the "discrete naming" variable; specifically, the effect of this variable is suppressive with regard to the influence of the "discrete pseudo-word reading" variable (coefficient: −0.14, corresponding to −27.52% of explained variance, **Table 3A**). Again, results for the color conditions were similar (see Supplementary Materials).

## **DISCUSSION**

Both basic factors, i.e., orthographic decoding and integration of reading sub-components, contributed significantly to the overall prediction of text reading fluency. Furthermore, the prediction was higher when discrete naming was added to the model than when only the two original factors were considered. The general pattern of findings was similar for the digit and color conditions, indicating that it is the variance common to these two sets of stimuli that carries the relationship.

**Table 1 | Matrix of correlations between all predictors and the dependent variable, i.e., speed in text reading (MT test) for the group of proficient readers.**

*<sup>\*</sup>p* < *0.003.*

**Table 2 | (A) Commonality coefficients and percentage of explained variance for predictors of text reading ("Multiple RAN" and "Discrete pseudo-word reading"): proficient readers (MODEL 1). (B) Unique and common contributions of "Multiple RAN" and "Discrete pseudo-word reading" to fluency measure: proficient readers (MODEL 1).**

*Adj., adjusted; St., standardized; Unique, predictor's unique effect; Common, predictor's common effects; Total, Unique* <sup>+</sup> *Common; % of R2, Total/R2.*

**Table 3 | (A) Commonality coefficients and percentage of explained variance for predictors of text reading ("Multiple RAN," "Discrete digit naming" and "Discrete pseudo-word reading"): proficient readers (MODEL 2). (B) Unique and common contributions of Multiple RAN, Discrete Naming and Discrete pseudo-word reading to fluency measure: proficient readers (MODEL 2).**

*Adj., adjusted; St., standardized; Unique, predictor's unique effect; Common, predictor's common effects; Total, Unique* <sup>+</sup> *Common; % of R2, Total/R2.*

As to the orthographic decoding factor, performance in discrete pseudo-word reading exerted a large unique influence in the analyses with both the two- and three-factor models (i.e., "*Models 1* and *2*"). We proposed that this factor marks the individual efficiency of the pre-lexical graphemic description of the letter string (Zoccolotti et al., 2008).

As to the integration of the reading sub-components factor, the presence of a unique contribution of multiple RAN confirms that RAN tasks capture a proportion of variance (coefficient 0.07; about 13% of explained variance in "*Model 2*," **Table 3A**) which is different from that accounted for by orthographic processing. This is in keeping with the idea that the RAN paradigm captures a portion of variance related to the processing of multiple stimuli.

The two variables also exerted a substantial influence together. One might think that the degree of efficiency in dealing with orthographic analysis of a string of letters contributes to managing multiple stimuli. In this vein, the interaction between multiple naming and reading would change as a function of reading experience. There is some evidence that the correlation between RAN and reading increases with reading experience (Kirby et al., 2003). Furthermore, Protopapas et al. (2013) recently reported that the co-variance between reading and RAN is best expressed in terms of a reading-naming latent structure in younger children and in terms of a serial-discrete dimension in more experienced children.

These relationships are schematized in **Figure 1**. Note that orthographic decoding and integration of reading sub-components influence reading fluency directly (both singly and in interaction with each other). An indirect influence is also presented in the figure; indeed, the discrete digit naming variable exerted a suppressive effect selectively on discrete pseudo-word reading (but not on multiple RAN). A suppressor variable is one that is not directly correlated with the dependent variable but acts indirectly through one or more other predictors (note the non-significant correlation in **Table 1** between reading fluency and digit or color naming). When added to the model, the suppressive factor allows for a better overall prediction by accounting for some irrelevant variance in the predictor variables, resulting in an increase of the relationship between the predictors and the outcome. This was clearly the case when we passed from the two-variable ("*Model 1*") to the three-variable analysis ("*Model 2*") and obtained an increase in explanatory power (from 37 to 52%; from 31 to 43%, in the case of the color conditions).
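The logic of suppression can be illustrated with a small simulation. The sketch below uses entirely synthetic data and hypothetical variable names; it is not a re-analysis of the present data. It assumes that the predictor mixes a relevant component with an irrelevant "speed" component, and that the suppressor measures only the irrelevant component.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of y on two predictors, from zero-order correlations."""
    return (r_y1**2 + r_y2**2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12**2)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000
decoding = [rng.gauss(0, 1) for _ in range(n)]   # relevant component
speed = [rng.gauss(0, 1) for _ in range(n)]      # irrelevant component
noise = [rng.gauss(0, 0.5) for _ in range(n)]

fluency = [d + e for d, e in zip(decoding, noise)]      # outcome
pseudo_word = [d + s for d, s in zip(decoding, speed)]  # contaminated predictor
discrete_naming = speed                                 # suppressor

r_y1 = pearson(fluency, pseudo_word)
r_y2 = pearson(fluency, discrete_naming)   # close to zero by construction
r_12 = pearson(pseudo_word, discrete_naming)

r2_one = r_y1**2                               # predictor alone
r2_both = r2_two_predictors(r_y1, r_y2, r_12)  # predictor plus suppressor
common = r_y1**2 + r_y2**2 - r2_both           # negative under suppression
```

In this toy setting the suppressor is essentially uncorrelated with the outcome, yet adding it raises R² markedly because it removes the irrelevant variance from the predictor; the corresponding commonality component is negative.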

The idea that naming isolated non-orthographic items can have a suppressive effect in accounting for individual differences in reading was first conceived by Logan et al. (2011) and later supported by Protopapas et al.'s (2013) findings. Furthermore, Logan and Schatschneider (2014) recently re-analyzed seven different studies and confirmed that isolated naming acts as a suppressor variable in the relation of serial naming with reading. The present results are in part consistent with these previous studies and in part different. In considering the different outcomes it must be noted that Logan et al. (2011) only examined tasks with non-orthographic stimuli. By contrast, we observed ("*Model 2*") that the suppressive effect of the discrete digit naming variable was mostly on discrete pseudo-word reading (i.e., −27.52%) and was not detected directly on multiple RAN (a very small suppressive effect, i.e., −6.58%, was present on the variance common to discrete pseudo-word reading and RAN).

This pattern of findings can be used to try to understand the nature of the suppressive effect. As this was unknown until recently, only tentative proposals can be advanced. For example, as their data indicated a suppressive effect over RAN, Logan et al. (2011) originally proposed that eye movements and parafoveal processing should be examined as possible targets of future research to explain the suppressive effect (for similar considerations see Logan and Schatschneider, 2014). In the present study the suppressive effect of discrete naming was on discrete pseudo-word reading, i.e., a condition with single, foveally presented orthographic stimuli; thus, Logan and co-workers' proposal would not easily fit the present data.

Another possibility is that what is being suppressed is naming speed. Within this idea, discrete naming taps the efficiency in the retrieval of phonological labels (whether directly linked to arbitrary mappings as in the case of digits or through semantic activation as in the case of colors). Efficient naming of a discrete digit (or color) under ASAP instructions shares variance with discrete pseudo-word reading, as both require quickly retrieving and activating a phonological label after stimulus onset. By contrast, discrete naming is not directly related to reading fluency; thus, efficient phonological retrieval is not the reason that pseudo-word decoding is related to text fluency. As stated above, one may envisage that the key factor for pseudo-word reading to predict reading is that it captures variance related to the processing of a (relatively long) string of graphemes.

Yet another, more general, alternative is that the portion of variance of the discrete naming variable which generates the suppressive effect is the requirement for a fast response to an externally triggered imperative stimulus under ASAP instructions. Indeed, this requirement is common to discrete naming and discrete pseudo-word reading, while it is not shared by discrete naming and text reading fluency (where it is the subject who sets his/her own pace in proceeding through the text). In this vein, what is being suppressed by the discrete naming of colors/digits can be seen as expressing individual "cognitive speed." While this term may appear overly general, Faust et al. (1999) specify rather precise conditions to define this dimension and we refer to their formulation here. Accordingly, cognitive speed expresses the commonality that is present across many speeded decision tasks and that indicates the overall information processing rate characteristic of a given individual. Typical within this frame are studies of the general slowing observed with aging (e.g., Cerella, 1990). Faust et al. (1999) showed that commonality emerges quite clearly in factor analyses of tasks requiring a response under time constraints, i.e., in conditions in which the subject must respond ASAP to an external stimulus that triggers the response. In this perspective, cognitive speed marks the individual information processing rate across many tasks and modalities.

The present data are consistent with this interpretation although they cannot prove it. For this reason in the scheme of **Figure 1** we use the neutral term "suppressive factor," even though we feel that the "cognitive speed" factor represents a coherent and comprehensive framework to interpret it. Further comments on the suppressive factor will be advanced in the general discussion.

## **STUDY 2: PREDICTING READING FLUENCY IN DYSLEXIC CHILDREN**

The development of reading progresses from early acquisition of orthographic decoding to a later ability to effectively integrate decoding with the other sub-components of reading. In the words of Buswell (1921): *"An immature reader . . . tends to keep the eye and voice very close together, in many cases not moving the eye from a word until the voice has pronounced it. Reading of this type becomes little more than a series of spoken words because there is no opportunity to anticipate the meaning in large units."*

This pattern of reading was confirmed experimentally by Protopapas et al. (2013) examining discrete and serial naming of digits, objects and words in Greek second and sixth graders. Discrete and serial word reading correlated very highly in Grade 2 but only moderately in Grade 6. Protopapas et al. (2013) concluded that "*word fluency tasks in Grade 2 are apparently accomplished largely as a series of isolated individual word naming trials even though multiple individual letters in each word may be processed in parallel. In contrast, specifically serial procedures are applied in Grade 6, presumably via simultaneous processing of multiple individual words at successive levels*."

Like young readers, dyslexic children may be expected to process stimuli in an isolated fashion, as indicated by their smaller eye-voice lead (Buswell, 1921; Fairbanks, 1937; De Luca et al., 2013). According to the proposed model, this can be captured in part by their (defective) performance on the multiple RAN tasks; furthermore, one may expect the orthographic decoding factor to be particularly important in these children as compared to typically developing readers. This may be expressed as greater weight of this factor in the prediction or, alternatively, as a dominant role of this factor over and above the moderating influence of the discrete naming variable.

### **MATERIALS AND METHODS**

### *Participants*

Twenty-five children with dyslexia (14 males and 11 females; mean age 11.8 ± 0.8 years) participated in the experiment. Children were comparable for age and gender to the typically developing readers in Study 1. To assess non-verbal IQ levels, we used the scores obtained by 12 children with dyslexia on Raven's Colored Progressive Matrices. All children scored well within the normal limits according to Italian norms (Pruneti et al., 1996). Mean raw score was 27.3 ± 2.6; mean z score was −0.66 ± 0.62. Wechsler Intelligence Scale for Children (WISC) data were available for the other 13 children with dyslexia; scores were well within the normal range for both performance and verbal subscales (mean total score 96.2 ± 10.1). All participants had normal or corrected-to-normal visual acuity.

The children with dyslexia scored at least 1.65 standard deviations below the norm for either speed or accuracy on the MT Reading test (Cornoldi and Colpo, 1995). As for raw data, the average reading time per syllable was 0.51 s (*SD* = 0.17), and mean number of errors was 21.7 (*SD* = 9.1). Based on normative values (Cornoldi and Colpo, 1995), mean z scores were −2.50 and −2.97, for reading time and accuracy respectively.

As for typically developing children, an inverse transformation was applied to the data so that item/s was considered in the statistical analyses. So, reading speed at the MT test (in terms of word/s) was the dependent measure to estimate text reading fluency.

### *Experimental conditions procedure*

All measures were computed as described above.

As for reading/naming time measures, the pseudo-word reading condition deviated appreciably from normal distribution (Chi-Square goodness-of-fit test = 12.78, *p* < 0.001) while data from all the other conditions did not deviate significantly (all *p*s > 0.05). Thus, inverse transformations for all measures were used, i.e., number of items/s, as for typically developing children. Normality tests indicated that none of these scores deviated from the normal distribution (all *p*s > 0.05 according to the Chi-squared goodness-of-fit test).

### *Data analysis*

As described above.

### **RESULTS**

**Table 4** presents the matrix of inter correlations between all predictors and the dependent variable (i.e., speed in text reading), for the sample of children with dyslexia. A 0.05 significance level was adopted; as we were interested in comparing this pattern of results with those of typically developing readers no correction for multiple comparisons was considered in this case.

The general pattern of correlations is similar to that observed with typically developing children. One main difference emerges: discrete naming (both digits and colors) is correlated with text reading. This is at variance with what occurs for typically developing readers where no correlation was detected.

**Table 5A** presents the commonality coefficients for the multiple RAN and discrete pseudo-word reading variables for the dyslexic children using the digit conditions. There is a detectable unique contribution of the multiple RAN variable (10.16% of explained variance). The unique contribution of the discrete pseudo-word reading variable is large (36.41%). Finally, the two predictors share 53.43% of the variance. Unique and common contributions of the two variables are summarized in **Table 5B** along with other parameters of the analysis, including the variance explained by the model and the standardized β coefficients. Note that the total variance explained by the model (referred to as "*Model 3*") is high (69%). Results of the color conditions are again similar (see Supplementary Materials).

An additional multiple regression was carried out by adding the discrete digit naming predictor. The results of this analysis are presented in **Tables 6A,B**. Notably, the proportion of explained variance was the same after adding this variable (69%; "*Model 4*"). In the analysis the discrete digit naming variable shares some variance with the multiple RAN and pseudo-word reading variables but, unlike in the sample of typically developing readers, does not exert a suppressive effect. The parallel results for the color conditions are reported in the Supplementary Materials.

**Table 4 | Matrix of correlations between all predictors and the dependent variable (text reading fluency), i.e., speed in text reading (MT test), for the group of dyslexic readers.**

*\*p* < 0.05.

**Table 5 | (A) Commonality coefficients and percentage of explained variance for predictors of text reading ("Multiple RAN" and "Discrete pseudo-word reading"): dyslexic readers (MODEL 3). (B) Unique and common contributions of "Multiple RAN" and "Discrete pseudo-word reading" to fluency measure: dyslexic readers (MODEL 3).**

*Adj., adjusted; St., standardized; Unique, predictor's unique effect; Common, predictor's common effects; Total, Unique* <sup>+</sup> *Common; % of R2, Total/R2.*

## **DISCUSSION**

In the case of children with dyslexia, the model with only two predictors, i.e., "multiple RAN" and "discrete pseudo-word reading," accounts for a large proportion of variance (69%) and no increase in explanatory power is obtained by adding the corresponding discrete naming variable. A note of caution in interpreting these data is in order given the relatively small sample size of dyslexic children, particularly considering the type of statistical analyses. It is therefore important that the pattern of results be replicated in a different, larger sample before definite conclusions are drawn. At any rate, results similar to those obtained considering the digit conditions were found using the color conditions. This finding points to the stability of the pattern observed, at least within the sample examined.

Notably, the general structure of the model is similar to that of typically developing readers (as schematized in **Figure 1**). In the case of children with dyslexia, however, no suppressive effect of the discrete digit (or color) naming variable was detected when this was added to the model. Thus, it appears that for these children discrete pseudo-word reading performance is so heavily loaded with orthographic decoding that no additional power can be obtained by considering the moderating effect of discrete naming, or individual "cognitive speed" as proposed above.

## **GENERAL DISCUSSION**

To predict individual differences in text reading fluency in typically developing and dyslexic readers, we chose to evaluate factors that, based on previous research, clearly distinguished children with and without a reading deficit. We reasoned that the two selected tasks would selectively measure two different basic processes of reading fluency, i.e., the ability of the child to process a letter string and the ability to integrate this processing with ongoing analysis of the text. For the time being, we have purposely ignored all higher-level linguistic processes, such as activation of lexical and semantic information and on-going syntactic processing, to determine how much individual reading rate depends on basic reading processing.

**Table 6 | (A) Commonality coefficients and percentage of explained variance for predictors of text reading ("Multiple RAN," "Discrete digit naming" and "Discrete pseudo-word reading"): dyslexic readers (MODEL 4). (B) Unique and common contributions of Multiple RAN, Discrete Naming and Discrete pseudo-word reading to fluency measure: dyslexic readers (MODEL 4).**

*Adj., adjusted; St., standardized; Unique, predictor's unique effect; Common, predictor's common effects; Total, Unique* <sup>+</sup> *Common; % of R2, Total/R2.*

### **PREDICTING SPEED IN READING MEANINGFUL TEXTS**

The main result of the study is that the ability to decode letter strings (measured by the pseudo-word reading variable) and the ability to integrate the various sub-components at work in reading (measured by the RAN variable) jointly account for a sizeable amount of variance in reading fluency on meaningful texts. The reliability coefficient for our dependent measure, i.e., the MT Reading test time (Cornoldi and Colpo, 1995), is reported to be ca. 0.90. Thus, the basic reading processes examined account for approximately two-thirds of the true variance in text fluency. This holds for both typically developing readers and dyslexic children although with a partially different pattern of predictors (see below). Notably, this high prediction occurs without considering higher level linguistic processes, which involve the activation of lexical, semantic and contextual information.
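The text does not spell out how the share of "true variance" was obtained. One plausible reading, stated here as an assumption, is that the explained variance is divided by the reliability of the dependent measure, since only the reliable part of the variance can be predicted. The sketch below applies this to the R² values quoted in the text for the three-predictor model in typically developing readers and for the models in dyslexic readers; the variable names are illustrative.

```python
# Assumption: share of true variance = model R^2 / reliability of the outcome.
reliability = 0.90          # MT Reading test time, as reported in the text

r2_typical_model2 = 0.52    # typically developing readers, Model 2
r2_dyslexic = 0.69          # dyslexic readers, Models 3 and 4

share_typical = r2_typical_model2 / reliability
share_dyslexic = r2_dyslexic / reliability
```

Under this assumption the two shares are roughly 0.58 and 0.77, values that bracket the "approximately two-thirds" figure given in the text.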

Below we discuss some specific, and partially open, questions related to the variables considered in the study; in the last section we speculate on the advantage of modeling reading deficits based on proximal rather than distal causes.

### **PSEUDO-WORD READING**

Orthographic decoding contributed importantly to the prediction of reading fluency. The pseudo-word reading task putatively captures the ability to process a letter string and produce an appropriate phonological output. In the introduction, we presented evidence that children with dyslexia show a selective deficit when they have to deal with a letter string presented visually, the deficit being very similar whether the stimulus is a word or a pseudo-word (Zoccolotti et al., 2008; De Luca et al., 2010; Marinelli et al., 2011). We proposed that this deficit marks a prelexical impairment in forming a graphemic description of the stimulus, i.e., a deficit in the abstract representation of a letter string (Zoccolotti et al., 2008). In neural terms, the LCD model proposes that this ability rests on the output of a hierarchy of detectors tuned to increasingly larger and more complex word fragments (Dehaene et al., 2005). In this hypothesis the underlying factor refers essentially to visual perception.

Alternative hypotheses can also be considered to interpret this ability. One idea is that the phonological component of the processing is essential for generating the difference between dyslexic and control readers and in mediating the relationship with reading. Against a strict phonological interpretation, it has been shown that dyslexic readers' deficit is selective for the visual modality and the same stimuli presented acoustically are responded to flawlessly (Marinelli et al., 2011). Furthermore, clear deficits are present also when children have to process strings of consonants in tasks that minimize the influence of phonological activation (i.e., visual span paradigm; Bosse et al., 2007; Valdois et al., 2012). In the same vein, we recently completed a lexical decision experiment in which we used as foils either pronounceable pseudo-words (such as DASU) or unpronounceable non-words made of consonants (such as RNGM). Group differences in responses to words, pseudo-words and non-words were all accounted for by the same (letter-string) global factor indicating that pronounceability of the foil was not critical in mediating the deficit of dyslexic children (Marinelli et al., under revision). A more advanced hypothesis is that the binding between orthographic and phonological information is crucial in generating the dyslexic deficit (Ziegler et al., 2010a; van den Broeck and Geudens, 2012). Some recent neuroimaging evidence points in this direction. In a fMRI study, van der Mark et al. (2011) detected a significant disruption of the functional connectivity between the VWFA and left inferior frontal and left inferior parietal language areas in children with dyslexia. Therefore, the possibility must be considered that the critical underlying factor in the pseudo-word reading task is the need to connect a string of graphemes to the corresponding phonological output.

The possibility should also be considered that lexical activation contributes to performance of the pseudo-word reading task. On the whole, this hypothesis seems unlikely on several grounds. It has been proposed that pseudo-words may generate lexical effects or that parts of pseudo-words may be recognized holistically (e.g., Moll et al., 2009). However, this generally occurs under very specific conditions, such as when pseudo-words are presented intermingled with words, which was not the case in the present experiment. Furthermore, lexical attempts at reading pseudo-words are much more frequent among children learning to read an irregular orthography such as English than a regular orthography such as German (e.g., Wimmer and Goswami, 1994).

Overall, orthographic decoding plays an important role in the prediction of fluency in reading a text in a regular orthography such as Italian. Whether this performance essentially marks the efficiency of the graphemic processor of letter strings or of a mechanism binding the output of this processor to phonological processing is beyond the aims of the present study and is a question open to future research.

### **RAN**

The finding that performance on the RAN tasks predicts reading fluency confirms much previous research (Wolf and Bowers, 1999; Wolf et al., 2000). Based on evidence summarized in the Introduction, we considered that RAN tasks selectively capture the ability to integrate the various sub-components necessary for effective reading but exclude orthographic decoding. Critical in this perspective is the finding that RAN correlates with reading only if the task requires serial processing and active production of specific names (Georgiou et al., 2013), as occurs in reading. The present results indicate that RAN tasks account for a sizeable amount of variance (more than 10% in both groups of children) over and above that accounted for by orthographic decoding and that accounted for in common by the two factors. This finding confirms previous observations by Protopapas et al. (2013), who found RAN to contribute unique variance over and above discrete word reading, at least in Grade 6 children. Overall, the RAN tasks capture individual variability linked to the ability to deal with multiple targets; note that this variability cannot be explained in terms of processing the same stimuli when presented in a discrete format (Georgiou et al., 2013; present data).
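The partition of explained variance into a part unique to each predictor and a part shared between them can be illustrated with a two-predictor commonality analysis. The following is only a minimal sketch on simulated data, not the analysis code used in the study; the variable names, the sample size, and the generating coefficients are illustrative assumptions.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

def commonality_two_predictors(x1, x2, y):
    """Split the full-model R^2 into unique and common components."""
    r2_full = r_squared(np.column_stack([x1, x2]), y)
    r2_x1 = r_squared(x1, y)
    r2_x2 = r_squared(x2, y)
    return {
        "unique_x1": r2_full - r2_x2,       # added by x1 over and above x2
        "unique_x2": r2_full - r2_x1,       # added by x2 over and above x1
        "common": r2_x1 + r2_x2 - r2_full,  # shared by the two predictors
        "total": r2_full,
    }

# Simulated (hypothetical) data: two correlated predictors and a criterion.
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)
decoding = shared + rng.normal(size=n)   # stands in for orthographic decoding
ran = shared + rng.normal(size=n)        # stands in for RAN
fluency = decoding + ran + rng.normal(size=n)

parts = commonality_two_predictors(decoding, ran, fluency)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

By construction, the two unique components and the common component sum to the total variance explained by the full model.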

It is not clear at present whether these individual differences can be ascribed to a single identifiable mechanism. One hypothesis proposes that slowness in RAN tasks depends on a multiple, or domain-general, temporal processing deficit in dyslexic children (Farmer and Klein, 1995). However, a systematic check of this hypothesis failed to reveal any indication that a deficit in temporal processing *per se* underlies the reading deficit of dyslexic individuals (Chiappe et al., 2002). Alternatively, one can speculate that individual differences in the fluency of dealing with multiple visual stimuli, such as digits or colors, with the aim of naming them rest on a more specific skill. At least in part, this represents an individual trait present prior to school experience, as it has been found that performance on RAN tasks at a pre-school stage significantly predicts later efficiency in reading (e.g., Bishop, 2003). However, this does not exclude that efficiency in RAN tasks is progressively tuned through reading itself (e.g., Torgesen et al., 1994). In fact, through reading training, children get much experience in integrating target identification with visual scanning, parafoveal pre-analysis and pronunciation. Thus, when we examine individual RAN speed in children who have already attended school for a number of years, we measure a skill that has received partial reinforcement from reading experience itself. In support of this view, the distinction between single and multiple stimuli processing becomes prominent in modulating the relationship with reading only after a number of years of schooling (Protopapas et al., 2013). Furthermore, while RAN tasks are correlated with reading across very different ages, the size of this correlation increases with reading experience (Kirby et al., 2003).
Thus, the link in **Figure 1** between "orthographic decoding" and "integration of reading sub-components" sketches a relationship between the two factors that is bidirectional and may presumably change with age and reading experience.

In the same vein, note that a much more complex model could be proposed following the (not unlikely) view that reading experience affects fluency, and fluency may affect both integration and decoding. Feedback links should integrate the model and different analyses may contribute to evaluating the direction and weight of each influence; however, we see the present study only as a first step in modeling individual variations in reading fluency in Italian typically developing readers and dyslexic children.

### **SUPPRESSIVE FACTOR**

The performance on the discrete digit or color naming task contributed as a third factor, and in a suppressive manner, to the prediction of reading fluency in typically developing but not in dyslexic readers. Above, we tentatively discussed a few alternative interpretations. Admittedly, the present data do not allow us to choose persuasively between a naming speed and a cognitive speed interpretation, and only speculative considerations can be advanced at this point. However, as stated in the comments of study 1, cognitive speed seems to provide a theoretically sound interpretation and one that is potentially worthy of further research.

Faust et al. (1999) define cognitive speed as the overall information processing rate characterizing a given individual across a variety of tasks. Indeed, in conditions with ASAP instructions the time measures of performance (RTs) on different tasks are always highly correlated. Due to this very large co-variation, if a standard factor analysis is applied to the data a single factor accounts for a large proportion of individual variability (Faust et al., 1999). At first glance, this finding contrasts with the well-known fact that RTs are particularly sensitive in picking up differences due to experimental manipulations. See, for instance, the effects of psycholinguistic variables (such as word frequency, orthographic neighbors, etc.) on word recognition that typically reveal significant effects with differences of a few milliseconds. When testing the effects of experimental manipulations, this large co-variation is controlled for by the use of repeated measures designs (which essentially partial out the correlation between measures across experimental conditions). However, if one wants to examine individual differences (rather than the effects of experimental manipulations) one must face the fact that the time measures will all be highly correlated, particularly when the general format of task and response is kept constant (as in RT tasks in which the subject has to respond ASAP to an external imperative stimulus). In these cases, the presence of correlation will substantially modulate the relationships between the specific factors investigated. This represents a problem if, as in the present case, no correlation is actually expected between reading fluency and cognitive speed *per se* (as shown by Bonifacci and Snowling, 2008); however, measures of cognitive speed will correlate with other predictors provided that they share the general task format, which may indeed be more important than the specific type of stimuli.
In this vein, it is interesting that the discrete digit/color naming task has a large suppressive effect on pseudoword reading with which it shares a general format (i.e., an ASAP response), but has no detectable effect on the RAN tasks with which it shares the type of stimulus (digits or colors) but not the general format.
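The logic of a suppressive effect can be made concrete with the standard two-predictor regression formulas expressed in terms of correlations. The sketch below uses purely illustrative correlation values that are assumptions, not estimates from the present data: a criterion (standing in for text reading fluency), a first predictor (standing in for pseudoword reading), and a second predictor (standing in for discrete naming) that is uncorrelated with the criterion but correlated with the first predictor.

```python
def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardized betas and R^2 for a criterion regressed on two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r2 = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r2

# Hypothetical correlations chosen only to illustrate classical suppression.
r_y1 = 0.6   # predictor 1 with criterion
r_y2 = 0.0   # suppressor with criterion (no direct relationship)
r_12 = 0.5   # predictor 1 with suppressor (shared task format)

beta1, beta2, r2 = two_predictor_regression(r_y1, r_y2, r_12)
r2_alone = r_y1 ** 2  # variance explained by predictor 1 on its own

print(f"beta1={beta1:.2f}, beta2={beta2:.2f}")
print(f"R^2 with predictor 1 alone: {r2_alone:.2f}")
print(f"R^2 with suppressor added:  {r2:.2f}")
```

With these assumed values, the variance explained rises from 0.36 to 0.48 even though the second predictor has no correlation with the criterion, and the second predictor receives a negative weight. It improves prediction by removing criterion-irrelevant variance from the first predictor.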

This framework may help to explain the lack of an effect in dyslexic children. Based on typically developing readers' data, we should expect the cognitive speed factor to modulate pseudoword reading. However, this influence was not significant because the dramatic slowness of dyslexic children in orthographic decoding also implies huge individual differences at this level and dominates over the cognitive speed factor.

In the introduction and above we have cited evidence indicating that a global factor marks individual performance in speeded reading tasks and effectively discriminates between dyslexic and typically developing readers. However, the global factor that marks dyslexics' performance (Zoccolotti et al., 2008) and the cognitive speed factor described by Faust et al. (1999) are clearly distinct constructs. Dyslexic children are slow across many tasks, but only if they require the processing of orthographic strings. By contrast, according to Faust et al. cognitive speed refers to a more general construct, spanning across different stimuli and modalities.

### **MODELING INDIVIDUAL DIFFERENCES IN READING FLUENCY: PROXIMAL vs. DISTAL CAUSES**

The present approach should be distinguished from several previous attempts to predict individual reading performance (e.g., Torgesen et al., 1997; Muter and Snowling, 1998; Compton et al., 2001; Kirby et al., 2003; Ziegler et al., 2010b; Landerl et al., 2012; Warmington and Hulme, 2012). In these studies, the authors aimed to predict reading using a variety of cognitive measures but without explicitly attempting to make a componential analysis of reading behavior. Characteristically, a spectrum of linguistic, meta-phonological, visual and also RAN measures are used jointly to examine which predictor(s) is (are) more strongly related to the reading dependent measure.

One way to distinguish this approach from the present one is to see it as focusing on distal (as opposed to proximal) causes of behavior. Within a proximal approach (such as in the present study), reading behavior is described in terms of the building blocks of the reading processes (see further comments below). By contrast, a distal approach has the more ambitious goal of searching for the ultimate origin of normal and disordered behavior. Thus, predictors are considered as inherently independent causes of behavior and, as such, the presence of uni-directional links between putative causes and effects is an essential tenet of this approach. However, this assumption is problematic when using cognitive markers as distal "causes" of individual variability in reading.

This point has been often discussed in relation to phonological awareness. One popular view considers phonological awareness as a critical ability for the beginning of reading and defective phonological awareness as a possible cause of dyslexia (for a review see Melby-Lervag et al., 2012). In this vein, phonological awareness is a distal cause of reading behavior, exerting a unidirectional relationship. However, this assumption is questioned by the observation that the critical learning for the conscious awareness of phonemes actually occurs during schooling (Morais et al., 1979). The influence of school experience is particularly clear in studies comparing later-schooled children (i.e., children who start school 1 or 2 years after the usual age) with children matched for age but differing for school experience, and children matched for schooling but differing for age (Alcock et al., 2010; Cunningham and Carroll, 2011; for similar data on Italian children see Scalisi et al., 2013). Thus, phonological awareness may be seen more as a consequence than a cause of reading. More complex interpretations have been advanced that propose the presence of a reciprocal relationship between phonological awareness and reading (e.g., Perfetti et al., 1987). However, even if this were appropriate, it would seriously undermine the validity of using phonological awareness as a distal, unidirectional predictor of reading.

Although this question has often been discussed in relation to phonological awareness, it presumably also applies to other general cognitive predictors, such as vocabulary breadth or visual scanning. Indeed, the same argument may well apply to RAN even though the change in performance after the beginning of schooling is not as abrupt as in the case of phonological awareness tasks (e.g., Scalisi et al., 2013). As stated above, it seems reasonable to envisage that children tune their ability to integrate scanning with identifying and naming visual targets mostly through reading experience, and it is with reading experience that individual differences in the fluidity of carrying out such complex behaviors come out most clearly (Kirby et al., 2003; Protopapas et al., 2013).

Overall, for a distal approach to be effective it is crucial that cognitive predictors be independent from the behavior to predict, i.e., that the direction of causality be unidirectional. However, this assumption seems very difficult to hold in view of the strict bidirectional relationship that most of the cognitive abilities typically entered in prediction studies (such as phonological abilities and integrating skills) hold with reading.

Another critical characteristic of the distal approach in the case of reading and dyslexia is that the nature of the relationship between the cognitive measures and reading is usually left under-specified in terms of actual processes. One can imagine, for example, various ways in which low short-term memory, small vocabulary or inefficient ability to segment or blend phonemes can indeed affect the acquisition of reading. Yet, no explicit relationship is typically formulated as to which specific cognitive deficiency should produce which selective effect on reading. Put in other terms, predictors do not have a specific place in the architecture of reading.

In the present study, we viewed the predictions about reading from the perspective of proximal causes. According to some influential authors, this approach has its own autonomy even in the absence of a full description of the distal causes of dyslexia, although these will eventually need to be investigated (e.g., Jackson and Coltheart, 2001, 2002). In the proximal approach, it is not critical that expertise in orthographic decoding and integration of reading sub-components are progressively tuned throughout schooling and, more generally, with reading experience, i.e., that they are not fully independent causes exerting a unidirectional influence on reading fluency. What is crucial is to spell out the building blocks of the reading process and evaluate their individual and interactive influence on individual reading fluency. In this view, note that here we qualify RAN performance as a measure of a specific component of the reading process, namely the integration of reading sub-components (see Georgiou et al., 2013), not as a general cognitive predictor. There is a long tradition of studies based on the proximal approach, particularly stemming from the dual route model (Coltheart et al., 2001). Most often, they have dealt with the analysis of single case studies. Here, we propose that a proximal approach may help to re-think the correlational studies predicting individual variability in text reading.

Finally, a novel methodological element of the approach used in the present study is the homogeneity of the measures adopted. By focusing specifically on reading speed, we used the same measure (i.e., total reading time per item) across both independent and dependent variables. Most previous studies on the prediction of reading used mixed measures and included reading accuracy as a dependent variable even in cases in which speed measures were used as predictors (Logan and Schatschneider, 2014). In these cases, variations in the format of the measures used might have unknown effects on the pattern of relationships found.

### **CONCLUSIONS**

The results of the present study indicate that fluency in reading texts depends heavily on basic reading processes, i.e., the ability to decode letter strings and to integrate the various reading subcomponents. This prediction occurs without considering the role of lexical, semantic and contextual information. Although these processes may also exert some influence, it seems that they can only complement the prediction in view of the large proportion of variance accounted for by basic reading processes. In typically developing readers, the prediction becomes more effective when the suppressive effect of stimulus-triggered naming speed under ASAP instructions is considered, suggesting a putative indirect role of individual cognitive speed.

### **ACKNOWLEDGMENTS**

This work was supported by grants from the Department of Health and Sapienza University. We would like to thank Drs. Claudio Barbaranelli, Kim Nimon and Silvia Primativo for their advice and help with the statistical analyses, Maria Pontillo for help in data collection, and Claire Montagna for the linguistic revision of the text.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.01374/abstract

### **REFERENCES**


and typical word and pseudoword reading in a transparent orthography. *Read. Writ.* 26, 721–738. doi: 10.1007/s11145-012-9388-1


of reading aloud. *Psychol. Rev.* 114, 273–315. doi: 10.1037/0033-295X.114.2.273


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 March 2014; accepted: 17 September 2014; published online: 18 November 2014.*

*Citation: Zoccolotti P, De Luca M, Marinelli CV and Spinelli D (2014) Modeling individual differences in text reading fluency: a different pattern of predictors for typically developing and dyslexic readers. Front. Psychol. 5:1374. doi: 10.3389/fpsyg.2014.01374 This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Zoccolotti, De Luca, Marinelli and Spinelli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX**

### **PSEUDO-WORDS**

acria; barta; carvo; cospa; curno; dribo; ersia; fergo; gorra; lispo; macca; natto; pesso; pocre; pucca; risbo; terpa; tuore; turra; vazio. aldirgo; ardesto; bilevio; candima; conzane; cunallo; dascone; finecia; guaspia; nattoga; pestora; podilla; rucchia; runazzo; tarenno; tembara; tigiala; tivarna; valtano; vamione.

## The locus of impairment in English developmental letter position dyslexia

## *Yvette Kezilas<sup>1</sup>\*, Saskia Kohnen<sup>1</sup>, Meredith McKague<sup>2</sup> and Anne Castles<sup>1</sup>*

*<sup>1</sup> Department of Cognitive Science, ARC Centre of Excellence in Cognition and its Disorders, Macquarie University, Sydney, NSW, Australia <sup>2</sup> Melbourne School of Psychological Science, The University of Melbourne, Melbourne, VIC, Australia*

### *Edited by:*

*Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

#### *Reviewed by:*

*Jackie Masterson, Institute of Education, UK Dror Dotan, Tel Aviv University, Israel*

#### *\*Correspondence:*

*Yvette Kezilas, Department of Cognitive Science, ARC Centre of Excellence in Cognition and its Disorders, Macquarie University, Australian Hearing Hub, Level 3, 16 University Avenue, NSW 2109, Australia*

*e-mail: yvette.kezilas@mq.edu.au*

Many children with reading difficulties display phonological deficits and struggle to acquire non-lexical reading skills. However, not all children with reading difficulties have these problems, such as children with selective letter position dyslexia (LPD), who make excessive migration errors (such as reading *slime* as "smile"). Previous research has explored three possible loci for the deficit – the phonological output buffer, the orthographic input lexicon, and the orthographic-visual analysis stage of reading. While there is compelling evidence against a phonological output buffer and orthographic input lexicon deficit account of English LPD, the evidence in support of an orthographic-visual analysis deficit is currently limited. In this multiple single-case study with three English-speaking children with developmental LPD, we aimed to both replicate and extend previous findings regarding the locus of impairment in English LPD. First, we ruled out a phonological output buffer and an orthographic input lexicon deficit by administering tasks that directly assess phonological processing and lexical guessing. We then went on to directly assess whether or not children with LPD have an orthographic-visual analysis deficit by modifying two tasks that have previously been used to localize processing at this level: a same-different decision task and a non-word reading task. The results from these tasks indicate that LPD is most likely caused by a deficit specific to the coding of letter positions at the orthographic-visual analysis stage of reading. These findings provide further evidence for the heterogeneity of dyslexia and its underlying causes.

**Keywords: phonological output deficit, orthographic input lexicon deficit, orthographic-visual analysis deficit, migration errors, substitution errors, developmental dyslexia**

### **INTRODUCTION**

The last three decades have seen an emphasis on the role that impaired phonological processing plays in developmental dyslexia. Various researchers have posited that at the core of dyslexia lies an impairment in the ability to represent, store, and retrieve speech sounds (Stanovich, 1988; Snowling, 1998, 2001; Ramus, 2003). This phonological deficit is proposed to be linked to the difficulty children with dyslexia experience in learning the mappings between letters and speech sounds, which is often remediated using phonics training (see Castles et al., 2009; McArthur et al., 2012). The phonological deficit account of dyslexia is supported by a multitude of correlational, longitudinal, and training studies that have found developmental dyslexia to typically be associated with poor phonological awareness (e.g., Høien et al., 1989), slow lexical retrieval skills (e.g., Denckla and Rudel, 1976), and poor verbal short-term memory (e.g., Mann et al., 1980; Mann and Liberman, 1984).

However, not all children with dyslexia have a phonological impairment. For example, children with surface dyslexia appear to have no difficulties with mapping letters onto speech sounds, as is evidenced by their ability to read non-words as proficiently as their peers (e.g., Castles and Coltheart, 1993; Broom and Doctor, 1995; Castles and Coltheart, 1996; Temple, 1997). Instead, surface dyslexics have been thought to have problems with orthographic processing, resulting in excessive reading errors where an irregular word is sounded out incorrectly using common letter-sound rules (e.g., *yacht* is read as if it rhymed with *matched*). The existence of cases of developmental dyslexia where phonological processing appears intact suggests that while some dyslexias may be attributed to an impairment in phonological processing, other dyslexias are not. Here, we provide further evidence for the heterogeneity of dyslexia and its underlying causes by furthering the investigation of the locus of impairment in English-speaking children with developmental letter position dyslexia (LPD).

The hallmark symptom of LPD is an elevated tendency to make "migration errors," where the order of letters within migratable words (more commonly known as anagrams) is confused, resulting in the misreading of a word as its migration partner (e.g., *slime* is read as "smile"). While migration errors are frequently made by beginning readers (Kohnen and Castles, 2013), English children with LPD have been found to make up to four times the number of migration errors made by their peers (Kohnen et al., 2012). Children with LPD have particularly high migration error rates when reading words where the transposition of letters in the middle of a word can lead to another word (e.g., *slime–smile, diary–dairy*). Intriguingly, cases of selective LPD have been documented, where all other reading processes appear intact (Friedmann and Rahamim, 2007; Kohnen et al., 2012). Children with selective LPD read as accurately and as fluently as their peers – except when they are asked to read migratable words.

There are four studies that have investigated the locus of impairment in developmental LPD – two in Hebrew (Friedmann and Rahamim, 2007; Friedmann et al., 2010a), one in Arabic (Friedmann and Haddad-Hanna, 2012), and most recently one in English (Kohnen et al., 2012). All four studies have used the cognitive model of reading aloud illustrated in **Figure 1** to identify the locus of impairment in LPD. Following this model, when a word is encountered in print, its visual properties undergo orthographic-visual analysis. This stage involves identifying the word's letters, coding the position of the letters within the word, and binding the letters to the word. Following these initial computations, the word is processed via three routes: (1) the lexical route (orthographic input lexicon to phonological output lexicon), (2) the lexical-semantic route (orthographic input lexicon to phonological output lexicon via the semantic system), and (3) the non-lexical route (grapheme–phoneme conversion). Typically, the lexical and lexical-semantic routes successfully process all words within a reader's orthographic input lexicon (storage for familiar words) but fail to process non-words. In contrast, the non-lexical route successfully sounds out non-words and words that follow typical letter to sound rules ("regular words" such as *surf*, *blame*, and *hand*), but fails to provide accurate pronunciation for irregular words (such as *yacht, come*, and *friend*). According to the model, after the written input has progressed through these routes, the phonemes that make up the word are assembled and held active in the phonological output buffer until a verbal response is made.

Using this model, previous research has proposed three possible loci for the migration errors seen in LPD (Friedmann and Rahamim, 2007; Kohnen et al., 2012). First, the migration errors may occur at the phonological output buffer as the phonological code is being prepared for pronunciation. Strong evidence against this hypothesis comes from the observation that children with LPD perform within the average range on standardized tests that draw heavily on the phonological output buffer (e.g., phonological awareness and verbal short-term memory assessments; Friedmann and Rahamim, 2007; Kohnen et al., 2012). Furthermore, Kohnen et al. (2012) reported that the majority of the migration errors made by their sample of English LPDs could not be attributed to the swapping of phonemes in the output buffer. For example, the swapping of the phonemes in *cloud* (/k/ /l/ /aw/ /d/) does not create the migration error "could" (/k/ /U/ /d/; Kohnen et al., 2012). Rather, the deficit causing this error must occur before the graphemes in the word have been converted into their appropriate phonemes.
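The *cloud*–"could" argument can be checked mechanically: the two items are rearrangements of the same letters, but their phoneme sequences are not rearrangements of each other. The sketch below is only an illustration of that reasoning; the helper name `is_rearrangement` is an assumption, and the phoneme transcriptions are copied from the example in the text.

```python
from collections import Counter

def is_rearrangement(a, b):
    """True if b contains exactly the same units as a, in a different order."""
    return Counter(a) == Counter(b) and list(a) != list(b)

# Letter level: the printed target and the erroneous response.
cloud_letters, could_letters = "cloud", "could"

# Phoneme level: transcriptions as given in the text's example.
cloud_phonemes = ["k", "l", "aw", "d"]
could_phonemes = ["k", "U", "d"]

letters_migrate = is_rearrangement(cloud_letters, could_letters)
phonemes_migrate = is_rearrangement(cloud_phonemes, could_phonemes)

print("letter-level migration possible: ", letters_migrate)
print("phoneme-level migration possible:", phonemes_migrate)
```

Because the error is a possible rearrangement at the letter level but not at the phoneme level, it is more naturally attributed to a stage that operates on letters than to the phonological output buffer.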

Second, migration errors may occur due to an orthographic input lexicon deficit. On this account, LPDs are proposed to have fewer lexical entries in their orthographic input lexicon (i.e., have a smaller sight-word vocabulary) than is typical for their age. When the lexical entry matching a target word cannot be found in the lexicon, a lexical guessing strategy is adopted resulting in an error that is visually similar to the target word. This possibility is unlikely however, as LPDs have been found to read non-migratable, irregular words (e.g., *yacht*) as proficiently as their peers, indicating that their orthographic input lexicon is intact (Friedmann and Rahamim, 2007; Kohnen et al., 2012). Furthermore, if the migration errors made by LPDs are the result of lexical guessing, they should also make other lexical similarity errors, such as substitution errors (e.g., reading *slime* as "slide"). This is not the case – their reading errors appear to be selective to the transposition of letters within words (Friedmann and Rahamim, 2007; Kohnen et al., 2012).

The third and final possibility following **Figure 1** is that LPD is caused by a deficit specific to the coding of letter positions within words at the orthographic-visual analysis stage of reading. Of the three possible deficits (phonological output buffer, orthographic input lexicon, and orthographic-visual analysis), an orthographic-visual analysis deficit currently provides the most parsimonious explanation for the available data. Two pieces of evidence suggest that LPD is caused by an orthographic-visual analysis deficit. First, in Hebrew, LPDs have been found to make excessive migration errors on a same-different decision task (e.g., responding "same" to *slime-smile;* Friedmann and Rahamim, 2007; Friedmann et al., 2010a). Two of the three cases of English LPD reported by Kohnen et al. (2012) also showed this effect. Because the same-different decision task is thought to tap prelexical processing (see Besner et al., 1984; Kinoshita and Norris, 2009), LPDs' poor performance on this task has been taken as evidence for an orthographic-visual analysis deficit (Kohnen et al., 2012). Second, in Hebrew, LPDs have been found to make more word responses to migratable items (e.g., reading *slime* as "smile," and *forg* as "frog") as well as non-word responses (e.g., reading *pilf* as "plif"), indicating that the cognitive mechanism that is defective in LPD is common to both lexical and non-lexical routes (Friedmann and Rahamim, 2007). There are two components of the model that are common to both routes: orthographic-visual analysis and the phonological output buffer. As previously outlined, there is strong evidence refuting a phonological output buffer deficit account of LPD. Therefore, the finding that LPDs in Hebrew make more word and non-word responses to migratable items has been taken as evidence for an orthographic-visual analysis deficit, which then has knock-on effects on both lexical and non-lexical reading.

There are, however, two pieces of data that appear inconsistent with an orthographic-visual analysis deficit account of English LPD. First, one of the three LPD cases reported by Kohnen et al. (2012) did not make excessive migration errors on a same-different decision task. As the same-different decision task should reveal an orthographic-visual analysis deficit, this finding may suggest that the migration errors made by this case (identified as EL) are not caused by this deficit. Second, while the LPDs in Kohnen et al.'s (2012) study made more word responses to migratable items (e.g., reading *slime* as "smile," and *forg* as "frog") than controls, they did not make more non-word migration responses than controls (e.g., reading *pilf* as "plif"). This finding proves problematic for an orthographic-visual analysis deficit account of English LPD, as a deficit at the initial, orthographic-visual analysis stage of reading should produce migration errors in both lexical and non-lexical reading. The aim of the present study was to follow up on these two unexpected findings to clarify the locus of impairment in English LPD.

One plausible reason why EL did not make excessive migration errors on the same-different decision task is that he was adopting a strategy during the task whereby he compared each letter across the two words. In Kohnen et al.'s (2012) task, participants were presented with two words side by side, and were given as much time as they needed to make their response. As Kohnen et al. (2012) have suggested, these task conditions give participants the opportunity to compare each letter across the two words, rather than comparing the two words to one another as is intended by the task. If attention is focused on each individual letter, each letter's position is no longer processed in relation to the position of the other letters within the word. This means that letter positions are less likely to be confused, and migration errors are less likely to be made.

Additionally, there are two plausible reasons why the LPDs in Kohnen et al.'s (2012) study may not have made excessive nonword migration responses, where the order of letters in a non-word stimulus is confused, resulting in a non-word response (e.g., reading *pilf* as "plif"). First, while letters in familiar words are thought to be processed in parallel via the lexical routes, letters in nonwords are thought to be processed serially via the non-lexical route (Rastle and Coltheart, 1998; Friedmann and Rahamim, 2007). The serial processes that underpin non-word reading might therefore reduce the likelihood that an LPD will make non-word migration errors (Friedmann and Rahamim, 2007; Kohnen et al., 2012). Second, research in both Hebrew and English has shown that there are specific variables that influence whether or not LPDs make word migration errors. For example, LPDs are most likely to make a word migration error when a low-frequency word can migrate into a higher frequency word via the transposition of two adjacent, internal letters [e.g., reading *trail* (frequency = 18) as "trial" (frequency = 58)]. It is plausible, therefore, to hypothesize that there is also a set of variables that influence whether a non-word migration error will be made, and that variation across item sets on such variables might account for differences in results.

### **THE PRESENT STUDY**

The aim of this multiple single-case study with three English-speaking LPDs was to replicate and extend previous research regarding the locus of impairment in LPD.

First, we aimed to replicate previous findings suggesting that LPD is not caused by a phonological output buffer deficit. We then sought to replicate the finding that the migration errors seen in LPD are not the result of lexical guessing due to an orthographic input lexicon deficit.

Following this, we aimed to address two findings that appear to be inconsistent with an orthographic-visual analysis deficit account of LPD. The first inconsistent finding is that EL, one of Kohnen et al.'s (2012) LPDs, did not make more migration errors on a same-different decision task than controls. The second finding that appears at odds with this account is that all three LPDs in Kohnen et al.'s (2012) study did not make more non-word migration responses (e.g., reading *pilf* as "plif") than controls. The present study therefore aimed to extend Kohnen et al.'s (2012) study by modifying the same-different decision task and the non-word reading task in an attempt to clarify the locus of impairment. Specifically, we extended Kohnen et al.'s (2012) work by (1) administering a sequential presentation variant of the same-different decision task, (2) including a consonant–string condition in the same-different decision task, and (3) manipulating the bigram frequency of the non-words presented in the reading aloud task.

A sequential variant of the same-different decision task was administered to eliminate a possible letter-by-letter matching strategy. That is, rather than presenting the words side by side, where a direct comparison between each word's letters can be made, we presented items one after the other. Under sequential presentation, we expected all three LPDs in the present study to be significantly poorer than controls at detecting when two migratable words are different. To provide a further test of the orthographic-visual analysis deficit account of LPD, we included a consonant–string condition in the task. If LPD is due to a letter position coding deficit at the orthographic-visual analysis stage of reading, then LPDs should be poorer than controls at identifying when two migratable items are different from one another, regardless of the lexicality of the items.

In the present study, we also manipulated the bigram frequency of the non-words in the reading aloud task. One plausible reason why Kohnen et al.'s (2012) LPDs did not make more non-word migration errors than controls when reading aloud non-words (e.g., reading *pilf* as "plif") is that there may be various factors that influence whether or not a non-word migration error will be made. Previous research has shown that the written frequency of a word's migration counterpart, relative to the item itself, influences whether or not a migration error will be made. For example, Friedmann and Gvion (2001) found that the most common migration error made by LPDs was the reading of a non-word (which by definition has a written frequency of 0) as a word (e.g., *coisun* read as "cousin"). The next most common migration error was the reading of a word as its higher frequency counterpart [e.g., *trail* (frequency = 18) read as "trial" (frequency = 58)]. Following these findings, it is plausible to hypothesize that the bigram frequency of the nonword migration counterpart, relative to the bigram frequency of the non-word itself, will influence whether or not a nonword migration error will be made. Our exploratory hypothesis was therefore that LPDs would be more likely to migrate a low bigram frequency non-word into its higher bigram frequency non-word counterpart [e.g., reading *plif* (BF = 180) as "pilf" (BF = 1251)].

### **MATERIALS AND METHODS**

Ethics approval for this project was granted by Macquarie University Human Research Ethics Committee. Participants and their parents gave verbal and written consent to their involvement in the study.

### **PARTICIPANTS**

Participants in this study were three children: LM, EL, and LL. LM was a 9-year 8-month-old girl in her second semester of grade 4 when we first met her and was homeschooled by her mother<sup>1</sup>. EL was a participant in Kohnen et al.'s (2012) study and was recruited for the present study when he was 9 years 8 months old and about to commence grade 5 at a mainstream school. Our third participant, LL, was an 11-year 9-month-old girl who had commenced grade 7 at a mainstream school two weeks before we met her.

All three children were initially referred to us because their parents were concerned about their spelling ability. Their reading skills were reported by their parents to be within the average range for their age. Both LM and LL's hearing and vision were reported as normal. EL had long-sightedness and astigmatism, which were corrected for with glasses. He had also been diagnosed with pendular nystagmus (involuntary repetitive rhythmic movement of eyes from side to side). All three children had no diagnoses of developmental delay or difficulties [e.g., AD(H)D, SLI].

Each LPD's performance on the standardized tests used to assess for a phonological output buffer deficit was compared to the test's age-appropriate normative data. Each LPD's performance on the experimental tasks was compared to a control group of average readers without LPD. We recruited two different grade-matched control groups. Six grade 4 controls were used as a control group for LM and EL (*M* age = 10 years 1 month, SD age = 2 months). Two grade 6 controls and three grade 7 controls were used as a control group for LL (*M* age = 12 years 3 months, SD age = 7 months).

### **PROCEDURE**

Participants were tested over multiple testing sessions at Macquarie University. Testing sessions lasted between 90 and 150 min, including breaks. All relevant property statistics for the experimental tasks were derived from N-Watch (Davis, 2005). All experimental reading aloud tasks and the visual lexical decision task were administered using flash cards. Unless otherwise specified, Crawford and Garthwaite's (2002) *t*-test was used to compare each LPD's task performance to controls, and Fisher's exact test was used to compare each LPD's performance on one condition to another condition.
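As an illustration of the second kind of comparison, a one-tailed Fisher's exact test on a 2 × 2 table of error and correct counts can be sketched as follows. The function name and the counts are hypothetical and are not data from this study; the sketch only shows the hypergeometric calculation that underlies the test, and the way counts were actually tabulated for each comparison is not specified here.

```python
from math import comb


def fisher_exact_one_tailed(a, b, c, d):
    """One-tailed Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are the two conditions; columns are errors and correct responses.
    Returns the probability of observing `a` or more errors in the first
    row, given fixed row and column totals.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)
    upper = min(row1, col1)
    p = 0.0
    for k in range(a, upper + 1):
        # Hypergeometric probability of exactly k errors in the first row.
        p += comb(row1, k) * comb(row2, col1 - k) / denominator
    return p


# Hypothetical counts: 3 errors out of 4 items in one condition,
# 1 error out of 4 items in the other.
p_value = fisher_exact_one_tailed(3, 1, 1, 3)
print(round(p_value, 4))
```

Because the test conditions on the table margins, it remains usable when one of the two conditions shows no errors at all.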

### **RESULTS**

### **TESTS DETERMINING ELIGIBILITY**

LM, EL and LL were identified as having LPD based on their scores on the Letter Position Test (LetPos: Kohnen et al., 2014). The LetPos is a reading aloud test consisting of 60 anagram words (30 anagram pairs, e.g., *slime – smile*), presented over two pages. There are three types of errors that can be made on this test: "migration errors" (reading a word as its migration partner, e.g., reading *slime* as "smile"), "word errors" (reading a word as any word other than its migration partner, e.g., reading *slime* as "slide"), and "other errors" (reading a word as a non-word, e.g., reading *slime* as "slome"). The normative data for the LetPos were collected in the final term of the school year. LPDs were selected on the basis that their LetPos performance was more than one standard deviation below the mean for "migration errors," and within one standard deviation of the mean for "word errors" and "other errors," when compared to the grade-appropriate normative data.

<sup>1</sup>Homeschooling for LM followed a strict and regulated curriculum matched to mainstream education. The work completed by home-schooled students has to be documented and monitored regularly.

LPD participants were also selected to have no obvious reading problems, other than the reading of migratable words. Specifically, they were selected only if they had normal irregular word and nonword reading, as assessed by the Castles and Coltheart Reading Test (CC2: Castles et al., 2009). Both LM and EL were within the average range for their age (a *z*-score between –1 and +1) on both the irregular word and non-word reading components of the test. While LL was within the average range on the irregular word component of the CC2, she was below average on the nonword reading component of the test<sup>2</sup>. She was included in the study, however, because her non-word reading errors appeared to stem from an underlying problem with reading letters in their correct order. For example, LL made non-word migration errors such as reading *borp* as "brop." When these migration errors were removed from her score, her non-word reading was within the average range.

Control participants were selected to be average on the irregular word and non-word reading subtests of the CC2 and to be within one standard deviation of the mean on each component (migration, word and other errors) of the LetPos.

### **ASSESSING THE PHONOLOGICAL OUTPUT BUFFER**

A phonological output buffer deficit should manifest itself in poor performance on tasks that require phoneme production and/or manipulation. To investigate whether LPD is caused by a phonological output buffer deficit, LM, EL and LL were assessed on phonological awareness, speed of lexical retrieval and verbal short-term and working memory. If their migration errors are caused by a phonological output buffer deficit, they should be below average on these tasks compared to age-appropriate normative data.

Phonological awareness was assessed using the Segmenting Non-words and Phoneme Reversal subtests of the Comprehensive Test of Phonological Processing (CTOPP; Wagner et al., 1999). In the Segmenting Non-words subtest, children are given a series of non-words, which they are asked to repeat, and then say one sound at a time (e.g., "dray, d–r–ay"). In the Phoneme Reversal subtest, children are asked to first repeat a non-word, and then to reverse the sounds to make it sound like a real word (e.g., "nus, sun").

Speed of lexical retrieval was assessed using the Rapid Naming subtests of the CTOPP. LPDs were assessed on their ability to rapidly name letters, digits, objects and colors, which were each assessed separately. In these subtests, LPDs were asked to name 36 items presented on a single page as quickly as they could.

The Repetition of Nonsense Words subtest of the NEPSY (Korkman et al., 1998) and the Digit Span subtest of the Wechsler Intelligence Scale for Children Fourth Edition (WISC-IV; Wechsler, 2003) were used to assess verbal short-term and working memory. In the Repetition of Nonsense Words subtest, children are asked to repeat non-words (e.g., *bu-lεks-tıs*). The Digit Span subtest has two components: Forward Digit Span and Backwards Digit Span. In the Forward Digit Span component, children are asked to repeat strings of digits in the same order as they heard them, and in the Backwards Digit Span component, children have to repeat strings of digits in reverse order.

**Table 1** shows that all LPD participants were within (or even above) the average range (*z*-score between −1 and +1) on all nine measures of phonological processing. In addition, LM, EL, and LL were asked to orally repeat, after the experimenter, the words on which they had previously made a migration error in a reading aloud task. Each LPD performed this task without making a single migration error.


**Table 1 |** *Z* **scores on standardized tests used to assess for a phonological output deficit (average range is between –1 and +1).**

<sup>2</sup>Note that there are currently no normative data published for children who are LL's age (11 years 9 months). LL's performance on the CC2, as well as that of her control group, was therefore compared to the normative data of children between the ages 11 years and 11 years 5 months.

Taken together, these findings indicate that the migration errors made by the three LPDs in the present study cannot be attributed to a phonological output buffer deficit.

### **ASSESSING THE ORTHOGRAPHIC INPUT LEXICON**

To investigate whether LM, EL and LL have an orthographic input lexicon deficit we administered a reading aloud non-migratable, irregular words task. Irregular words were used to ensure that access to the orthographic input lexicon was obligatory for a correct response to be made. If LPDs have an orthographic input lexicon deficit, they should be poorer at this task than controls.

To explicitly test whether their excessive migration errors are the result of lexical guessing, we administered two tasks: a reading aloud migratable and substitution words task, and a visual lexical decision task. If LPDs' migration errors are the result of lexical guessing, they should make more substitution errors than controls on a reading aloud task (e.g., reading *track* as "trick"), as well as more substitution errors on a visual lexical decision task (e.g., accepting *esho* (derived from *echo*) as a word).

### *Reading non-migratable, irregular words*

Participants were asked to read aloud 87 non-migratable words, which were selected to contain at least one letter-sound rule that was atypical (e.g., pearl, cousin) according to Regcelex (Baayen et al., 1995), a program used to compute the rule-based pronunciation of a letter-string (Coltheart et al., 2001). Because we were interested in each LPD's lexical reading skills, errors that appeared to stem from a difficulty in ordering letters in words (e.g., reading *chalk* as "chlak") were removed from the error analysis. Both LM and EL made 12.64% errors on this task, which was not significantly different from their control group, who made 9.58% errors (SD = 2.48%; *t* = 1.14, *p* = 0.15 one-tailed). LL made 6.90% errors on this task, which was not significantly different from her control group, who made 8.28% errors (SD = 2.36%; *t* = 0.53, *p* = 0.31 one-tailed).
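The single-case comparisons above can be sketched in code. The sketch assumes the standard form of the modified *t*-test for comparing one case with a small control sample, *t* = (*x* − *M*) / (SD × √((*n* + 1)/*n*)) with *n* − 1 degrees of freedom; the text does not spell out the formula, so this form is an assumption, although it gives values close to those reported for this task. The function names are hypothetical, and the *p* value is obtained by simple numerical integration rather than from a statistics library.

```python
from math import gamma, pi, sqrt


def crawford_t(case_score, control_mean, control_sd, n_controls):
    """Modified t statistic comparing one case to a small control sample."""
    standard_error = control_sd * sqrt((n_controls + 1) / n_controls)
    t = (case_score - control_mean) / standard_error
    return t, n_controls - 1


def one_tailed_p(t, df, steps=2000):
    """Area of Student's t distribution beyond |t|, via Simpson's rule."""
    def pdf(x):
        coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    # Half of the distribution lies above zero; subtract the part up to |t|.
    return 0.5 - area * h / 3


# Values reported above: LM and EL (12.64% errors) versus six controls.
t_lm, df_lm = crawford_t(12.64, 9.58, 2.48, 6)
p_lm = one_tailed_p(t_lm, df_lm)
print(round(t_lm, 2), df_lm, round(p_lm, 2))

# LL (6.90% errors) versus five controls.
t_ll, df_ll = crawford_t(6.90, 8.28, 2.36, 5)
print(round(abs(t_ll), 2), df_ll, round(one_tailed_p(t_ll, df_ll), 2))
```

The inflation of the standard error by √((*n* + 1)/*n*) is what distinguishes this test from a *z*-score: it treats the control mean and SD as estimates from a small sample rather than as population values.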

Eighteen of the 87 experimental words were items that had already been administered in the irregular word reading component of the CC2. We therefore conducted an additional analysis including only the irregular words that were not part of the CC2 (*N* = 69). All three LPDs made as many errors as controls in this additional analysis (all *p* > 0.15 one-tailed).

This finding suggests that LM, EL and LL have as many entries in their orthographic input lexicon as controls, and that they have no difficulty in accessing these entries.

### *Reading aloud migratable and substitution words*

Participants read aloud 58 migratable words, which were created from 29 word pairs that differed via the transposition of two internal letters (e.g., *slime-smile*). Migratable words were intermixed with 30 substitution words created from 15 pairs of words that differed via the substitution of a single internal letter (e.g., *track-trick*). Substitution words were matched as closely as possible to migratable words on length (migratable: *M* = 5.07, SD = 0.53; substitution: *M* = 5.07, SD = 0.69), relative written frequency between a word and its partner (migratable: *M* = 27.51, SD = 36.83; substitution: *M* = 36.61, SD = 36.62), and the number of substitution neighbors (migratable: *M* = 4.86, SD = 3.48; substitution: *M* = 4.90, SD = 3.18). The item pairs were presented over separate tasks such that participants did not read a word and its partner in the same task. These words were intermixed with 122 words, which were not used to address the research questions in the present study.
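To make the notion of a migratable word concrete, the following sketch enumerates the letter strings that result from transposing two internal letters of an item and checks which of them are words. The function names and the six-word lexicon are hypothetical and serve only as an illustration; they are not the procedure or the item set used to construct the stimuli.

```python
def internal_transpositions(word, adjacent_only=False):
    """Return strings formed by swapping two internal letters of `word`.

    The first and last letters are never moved.
    """
    letters = list(word)
    results = set()
    last_internal = len(letters) - 2
    for i in range(1, last_internal + 1):
        for j in range(i + 1, last_internal + 1):
            if adjacent_only and j != i + 1:
                continue
            if letters[i] == letters[j]:
                continue  # swapping identical letters changes nothing
            swapped = letters.copy()
            swapped[i], swapped[j] = swapped[j], swapped[i]
            results.add("".join(swapped))
    return results


def migration_partners(word, lexicon, adjacent_only=False):
    """Return the transpositions of `word` that are entries in `lexicon`."""
    return sorted(internal_transpositions(word, adjacent_only) & lexicon)


# Hypothetical mini-lexicon, for illustration only.
LEXICON = {"slime", "smile", "trail", "trial", "track", "trick"}

print(migration_partners("slime", LEXICON))                     # ['smile']
print(migration_partners("trail", LEXICON, adjacent_only=True))  # ['trial']
print(migration_partners("track", LEXICON))                     # []
```

An item such as *track* has a substitution partner (*trick*) but no migration partner, which is why it can serve as a comparison item for lexical guessing.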

Three error types were analyzed: (1) migration errors, where a migratable word was read as its partner, (2) substitution errors, where a substitution word was read as its partner, and (3) "*N*" errors, which included substitution errors (e.g., reading *slime* as "slide"), addition errors (reading *slime* as "slimes"), and deletion errors (reading *slime* as "slim") made on all migratable and substitution words. Incorrect reading responses that were potentially due to sounding the word out rather than one of these three error types (e.g., reading *bread* as "breed") were not included in the analysis.

The results are outlined in **Table 2**. All three LPDs made more migration errors than controls (LM: *t* = 21.95, *p* < 0.001 one-tailed; EL: *t* = 9.49, *p* < 0.001 one-tailed; LL: *t* = 4.81, *p* < 0.01 one-tailed). Because there was no variance in the number of substitution errors made by controls, a Fisher's exact test was used (instead of Crawford's *t*-tests) to compare LPDs' performance to their respective control groups. All three LPDs made as many substitution errors as controls (all *p* > 0.5 one-tailed). Both LM and EL made as many *N* errors as controls (both *t* = 1.11, *p* = 0.16 one-tailed). Because there was no variance in the number of *N* errors made by LL's control group, a Fisher's exact test was used instead of Crawford's *t*-test, which indicated that she made as many *N* errors on the task as controls (*z* = 0.71, *p* = 0.24 one-tailed).

The finding that LM, EL and LL's reading errors were selective to the migration of letters within words suggests that their LPD cannot be attributed to lexical guessing.


*Numbers in parentheses denote standard deviation of the mean for control groups.*

*\*\*p* < *0.01, \*\*\*p* < *0.001 compared to control group.*

### *Visual lexical decision*

A visual lexical decision task was also administered to determine whether migration errors made by LPDs were the result of lexical guessing. Forty non-migratable words formed the word condition in this task. Three non-word conditions were created by modifying the word items: a migratable non-word condition (*coisun* (derived from *cousin*); *N* = 16), a single-substitution non-word condition (*eamly* (derived from *early*); *N* = 12), and a double-substitution non-word condition (*provare* (derived from *private*); *N* = 12). Single and double substitution items were included because both have previously been used in research as a comparison condition for migratable items (e.g., Perea and Lupker, 2004; Perea and Fraga, 2006; Beyersmann et al., 2011, 2012, 2013).

Items in the migratable non-word condition were matched as closely as possible to items in the single- and double-substitution conditions on bigram frequency (migration condition: *M* = 719.04, SD = 415.91; single-substitution condition: *M* = 578.54, SD = 336.08; double-substitution condition: *M* = 713.68, SD = 553.36), and the written frequency of the words that they were derived from (migratable: *M* = 87.56, SD = 125.20; single-substitution: *M* = 96.89, SD = 116.29; double-substitution: *M* = 72.64, SD = 112.39). Words and non-words were intermixed with 112 additional items, which were not used to address the research questions in the present study. Items were presented over two separate tasks, such that a non-word and the word it was derived from were not presented in the same task.

So that we could be relatively certain that a "word" response to a non-word was due to the participant misreading the nonword as the word it was derived from, non-words in the migration condition and the double-substitution condition did not have a single substitution neighbor. Furthermore, the non-words in the single substitution condition did not have a single substitution neighbor other than the word that they were derived from. To further ensure that participants' "word" responses were due to their misreading of the non-word as its word partner, we removed non-words derived from words that participants did not know. We determined whether or not a participant knew a word based on their performance on the "word" condition of the visual lexical decision task, and their reading aloud of these words. If a participant could not read aloud the word *and* did not recognize the word in the visual lexical decision task, the word was defined as unknown, and hence its non-word counterpart was removed from their individual analysis. This comprised 5.00% of LM's data, 2.50% of EL's data, and 2.92% (SD = 3.68%) of their control group's data. For LL, 2.50% of her data was removed, and 1.00% (SD = 2.24%) of her control group's data was removed.

The results are outlined in **Table 3**. All three LPDs accepted more migratable non-words as words than controls (LM: *t* = 3.59, *p* = 0.01 one-tailed; EL: *t* = 2.59, *p* = 0.02 one-tailed; LL: *t* = 2.90, *p* = 0.02 one-tailed). Both EL and LL accepted as many single and double substitution non-words as words as controls (both *t* < 1.12, *p* > 0.16 one-tailed). LM, however, accepted more single and double substitution non-words as words than controls (single: *t* = 2.51, *p* = 0.03 one-tailed; double: *t* = 4.54, *p* = 0.003 one-tailed).

The finding that EL and LL's excessive errors on the visual lexical decision task were selective to the migration condition suggests that their migration errors are not the result of lexical guessing. In contrast, LM's excessive errors on the task were not selective to the migration condition – she also made more substitution errors on the task than controls. This finding suggests that a lexical guessing strategy may have been the cause of LM's migration errors on the visual lexical decision task.

### **ASSESSING THE ORTHOGRAPHIC-VISUAL ANALYSIS STAGE OF READING**

To investigate whether LPD is caused by an orthographic-visual analysis deficit, we administered a sequential same-different decision task and a reading aloud non-words task. If LPD is caused by an orthographic-visual analysis deficit, LPDs should make more migration errors than controls on tasks that tap prelexical processing (e.g., same-different decision) since orthographic-visual analysis is a prelexical process. Furthermore, if their migration errors are caused by an orthographic-visual analysis deficit, LPDs should make more migration errors than controls during lexical and non-lexical reading.

### *Sequential same-different decision*

The sequential same-different decision task consisted of 139 word pairs and 139 consonant–string pairs<sup>3</sup>, which were four or five letters in length. Half of the items were the same (e.g., *beard–beard*; *bfgsk–bfgsk*), and half were different (*beard–bread*; *bfgsk–bfsgk*). Half of the items in the different condition were different via the transposition of internal letters (e.g., *trial–trail*), and half were different via the substitution of a single letter (e.g., *chuck–check*). Items were included in both the same and the different condition (i.e., participants made responses to both *trial–trail* and *trial–trial*). Six versions of the task were created and presented over two sessions, such that participants only saw one version of the item (either in the same *or* in the different condition) in a single session. These 280 items were intermixed with an additional 280 items (half same, half different), which were not used to address the research questions in the present study.

Same-different decision trials were presented using DMDX software (Forster and Forster, 2003). A schematic of a single trial is outlined in **Figure 2**. The first item was both backwards masked and presented in a different case to the second item to ensure that participants could not match the items based on low-level perceptual overlap. Participants were instructed to press a button with their right hand if they thought the two items were the same, and to press a button with their left hand if they thought the two items were different. Participants were given eight practice trials before commencing the task. No performance-based feedback was given to participants at any stage during the task.

As LPDs have been found to have intact letter identification skills (Friedmann and Rahamim, 2007; Kohnen et al., 2012), the substitution condition was used as an indication of baseline performance on the task. If LPD is due to an orthographic-visual analysis deficit, LPDs should be poorer than controls at detecting a difference between two migratable items (e.g., *slime–smile*), relative to the baseline condition (e.g., *tiger–timer*).

<sup>3</sup>The task was designed to have 140 word pairs and 140 consonant–string pairs. However, one word pair in the same migration condition (e.g., *slime–slime*) and one consonant–string pair in the different migration condition (e.g., *dktlp–dltkp*) were removed from the analysis as they were not presented correctly.

**Table 3 | Percentage of migration errors, single-substitution (sub) errors and double-substitution (sub) errors on the visual lexical decision task.**

*Numbers in parentheses denote standard deviation of the mean for control groups.*

*\*p* < *0.05, \*\*p* < *0.01 compared to control group.*

**Table 4** displays participants' accuracy on the different conditions (i.e., their ability to detect that two items are different). Participants' *d*′ scores based on their hits (correctly responding "different" to two different items, e.g., *slime–smile*) and false alarms (incorrectly responding "different" to two same items, e.g., *slime–slime*) on the migration and substitution conditions are also included in **Table 4**.
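Assuming that the scores described above are the standard signal-detection sensitivity index (*d*′), they can be computed as the difference between the *z*-transformed hit rate and the *z*-transformed false-alarm rate. The sketch below is illustrative only: the function name and the counts are hypothetical, and the log-linear correction used to avoid rates of exactly 0 or 1 is an assumption rather than a detail reported for this task.

```python
from statistics import NormalDist


def d_prime(hits, n_different, false_alarms, n_same):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count and 1 to each total)
    keeps both rates away from 0 and 1. This correction is an assumption
    of the sketch, not a detail taken from the study.
    """
    hit_rate = (hits + 0.5) / (n_different + 1)
    fa_rate = (false_alarms + 0.5) / (n_same + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 25 "different" responses to 35 different pairs (hits)
# and 3 "different" responses to 35 same pairs (false alarms).
print(round(d_prime(25, 35, 3, 35), 2))

# Responding "different" equally often to same and different pairs
# yields no sensitivity.
print(d_prime(10, 35, 10, 35))
```

Unlike raw accuracy on the different condition, this index separates the ability to detect a difference from a general bias toward responding "same" or "different".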

All statistical analyses for the task were based on participants' accuracy on the different migration condition relative to their accuracy on the different substitution condition, using the Revised Standardized Difference Test (RSDT: Crawford and Garthwaite, 2005). All three LPDs were poorer than controls at detecting that two migratable words were different relative to the substitution condition; however, this difference only reached significance for EL and LL (EL: *t* = 4.68, *p* = 0.003 one-tailed; LL: *t* = 2.82, *p* = 0.02 one-tailed; LM: *t* = 1.74, *p* = 0.07). None of the three LPDs was significantly poorer than controls at detecting that two migratable consonant strings were different, relative to the substitution condition (all *t* < 1.10, *p* > 0.16).

The finding that all three LPDs were no poorer than controls at detecting a difference between two migratable consonant–strings seems inconsistent with an orthographic-visual analysis deficit account of LPD. If LPD is caused by an orthographic-visual analysis deficit, then LM, EL and LL should be poorer than controls at detecting a difference between two migratable items, regardless of their lexicality.

However, this result may have been due to participants not having enough time to process the entire consonant–string. Letters in words are thought to be processed in parallel as a single unit of information. In contrast, there is no higher-order representation for consonant strings, and therefore each letter needs to be processed serially as its own unit of information. The limited stimulus presentation time in the task (400 ms) may have therefore meant that children only had enough time to process the beginning letters of the items in the consonant–string condition. If only the beginning letters are processed, then a correct response to many of the items in the different migration condition would require intact letter identification skills, but not necessarily intact letter position coding skills. For example, if participants are presented with the consonant–string pair *stlkd-skltd*, but they only have enough time to process the first three letters of the consonant string, *stl-skl*, participants need only detect that the letter identities *t* and *k* are different from one another to make a correct response. If participants were only processing the beginning letters of the consonant–string pairs, then the finding that LPDs did not make more errors on the migration condition is not surprising, as LPDs have been found to have intact letter identification abilities (Friedmann and Rahamim, 2007; Kohnen et al., 2012).

**Table 4 | Percentage accuracy for the different migration (mig) and substitution (sub) conditions on the same-different decision task, and** *d*′ **scores.**

*Numbers in parentheses denote standard deviation of the mean for control groups.*

One way to investigate whether or not participants had enough time to process all letters in the consonant–string condition is to see whether there is a position effect. If participants did not have enough time to process the entire consonant–string, we should find that they are better at detecting a difference between two consonant strings if the letters are different at the beginning of the pair, than if the letters are different at the end of the pair.

In a *post hoc* analysis, we explored whether there was merit in this alternative hypothesis. Items that differed via the substitution of a single letter in the first internal position of the word (e.g., *nkdcg-njdcg*) were classified as having a "beginning difference," and items that differed via the substitution of a single letter in the final internal position of the word (e.g., *fkmzd-fkmtd*) were classified as having an "end difference." The substitution condition rather than the migration condition was used because many of the different migratable items had both a beginning and end difference (e.g., *xtkjd-xjktd*).

All participants were combined to form one group for this item analysis. We used a Wilcoxon matched pairs test to compare the proportion correct on the two groups of items. Participants identified significantly more beginning differences (74.60%) in the consonant–string condition than end differences (58.574%; *z* = 2.51, *p* = 0.006 one-tailed). In contrast, participants identified as many beginning differences (93.57%) in the word condition as end differences (94.76%; *z* = 0.51, *p* = 0.304 one-tailed).

Following this finding, we decided to administer a same-different decision task with orthographically legal non-words (e.g., *scirm-scrim*). While the letters in legal non-words are not thought to be processed in parallel like words, the letters can be mapped onto a higher-order representation. For example, the consecutive letters *i* and *r* in the non-word *scirm* can be mapped onto the digraph *ir*. That is, the letters in legal non-words can be "chunked" (*s*, *c*, *ir* and *m*) and, for this reason, are likely to be processed faster than consonant–strings, which cannot be chunked.

The non-word same-different decision task consisted of 96 non-word pairs. Forty-eight of the pairs were in the same condition, and 48 were in the different condition. Half of the items in the different condition were different via the transposition of two internal letters (e.g., *scirm-scrim*), and half were different via the substitution of a single letter (e.g., *froy-floy*). The same condition consisted of 48 non-word pairs. In contrast to the word and consonant string same-different decision task, non-words in the same condition were a new set of items, not derived from the items in the different condition (i.e., participants did not see *scirm-scrim* and *scirm-scirm*). Non-words were presented to participants during a single task, and under the same presentation conditions as described for the words and consonant–strings task.

By the time we assessed LM and EL on this task they were in the second semester of grade 5. Therefore, we compared their performance on this task to a new control group of 4 children in their second semester of grade 5.

**Table 4** displays participants' accuracy on the different conditions. Participants' *d*′ scores based on their hits (correctly responding "different" to two different items, e.g., *scirm-scrim*) and false alarms (incorrectly responding "different" to two same items, e.g., *garp-garp*) on each condition are also included in **Table 4**. False alarms were calculated from participants' performance on all 48 items in the same condition.
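As a rough illustration of how a sensitivity score can be derived from hits and false alarms, the sketch below computes the standard signal-detection index, d′ = z(hit rate) − z(false-alarm rate). It assumes that the scores reported here are of this kind. The text does not say how hit or false-alarm rates of 0 or 1 were handled, so the log-linear correction used below is an assumption, and the counts in the example are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits: int, n_different: int, false_alarms: int, n_same: int) -> float:
    """Signal-detection sensitivity from raw response counts.

    A log-linear correction (add 0.5 to each count and 1 to each total)
    keeps the rates away from 0 and 1, where the z-transform is undefined.
    This correction is an assumption, not something the text specifies.
    """
    hit_rate = (hits + 0.5) / (n_different + 1)
    false_alarm_rate = (false_alarms + 0.5) / (n_same + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


if __name__ == "__main__":
    # Hypothetical counts: 20 hits on 24 "different" items in one condition,
    # 4 false alarms on the 48 "same" items.
    print(round(d_prime(20, 24, 4, 48), 2))
```

Because false alarms are pooled over all 48 same items, the same false-alarm rate would enter the score for both the migration and the substitution condition.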

EL was significantly poorer than controls at detecting when two migratable non-words were different relative to the substitution condition (EL: *t* = 4.47, *p* = 0.01 one-tailed). LM and LL, however, did not show this effect (both *t* < 1.64, *p* > 0.10). We assessed for a position effect in the same way as we did for the consonant–string and word items. Participants correctly identified as many beginning differences (95.14%) as end differences (91.67%; *z* = 0.54, *p* = 0.30 one-tailed), indicating that they had enough time to process the entire letter string.

The finding that all three LPDs made more word migration errors than controls on a sequential same-different decision task is consistent with an orthographic-visual analysis deficit account of LPD, as is the finding that EL made more non-word migration errors on the task. The finding that LM and LL did not make more non-word migration errors on the sequential same-different decision task is, however, inconsistent with an orthographic-visual analysis deficit and will be followed up in the discussion.

*Reading aloud non-words.* Non-words were created from 25 non-word pairs which were migratable via the transposition of two internal adjacent letters (e.g., *torm–trom*). Pairs were selected to have a significant difference in bigram frequency between the two non-words (lower bigram frequency counterpart: *M* = 789.56, SD = 594.36; higher bigram frequency counterpart: *M* = 1389.80, SD = 841.41). Non-words were selected to match their migration partner as closely as possible on substitution *N* (lower bigram frequency counterpart: *M* = 2.44, SD = 2.38; higher bigram frequency counterpart: *M* = 3.00, SD = 2.65). Non-words were randomized and intermixed with 25 additional monosyllabic non-words that were not used to answer the research questions in this paper. Three versions of the task were created such that participants did not see a non-word and its migration partner in the same task. Participants were told that all items were non-words before commencing the task.
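To make the notion of a migration partner concrete, the following sketch lists every string that can be formed from an item by transposing two adjacent internal letters, leaving the first and last letters in place. The helper name is hypothetical, and the sketch does not check orthographic legality or bigram frequency, which the stimulus selection described above also required.

```python
def internal_adjacent_transpositions(item: str) -> list[str]:
    """Return all strings made by swapping two adjacent internal letters."""
    letters = list(item)
    results = []
    # Indices 1 .. len - 2 are internal; swap each adjacent internal pair.
    for i in range(1, len(letters) - 2):
        swapped = letters.copy()
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        candidate = "".join(swapped)
        if candidate != item:  # skip swaps of identical letters
            results.append(candidate)
    return results


if __name__ == "__main__":
    for item in ["torm", "kerm", "scirm"]:
        print(item, "->", internal_adjacent_transpositions(item))
```

A four-letter item such as *torm* has a single candidate partner (*trom*), whereas a five-letter item such as *scirm* has two, of which only *scrim* was used as its partner in the examples given in the text.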

The results from the non-word reading task are presented in **Table 5**. Both LM and LL made significantly more non-word migration errors on the task than controls (LM: *t* = 6.46, *p* < 0.001 one-tailed; LL: *t* = 2.96, *p* = 0.02 one-tailed) and made as many non-migration related errors as controls (LM: *t* = 0.04, *p* = 0.48 one-tailed; LL: *t* = 1.82, *p* = 0.07 one-tailed). EL did not make more non-word migration errors than controls (*t* = 0.18, *p* = 0.43 one-tailed) and made more non-migration related errors than controls (*t* = 2.95, *p* = 0.02 one-tailed).

Following the finding that EL showed the opposite effect to that displayed by LM and LL (i.e., as many migration errors as controls, but more non-migration related errors than controls), we decided to inspect EL's non-word reading data more closely. We found that 23% of EL's non-migration errors were what we have termed "over-sequential" errors. An "over-sequential" error was defined as an error that appeared likely to have occurred as a result of sounding out each letter in the non-word in isolation, and then blending these sounds together to form a spoken response. For example, EL read *kerm* as /k/ /E/ /r/ /m/. That is, instead of reading the letters *e* and *r* together to correctly form the sound /@r/, he sounded out these two letters separately. For two of these errors, EL first misread the non-word as its migration partner, and then self-corrected with an over-sequential error. Furthermore, for all but one of EL's over-sequential errors, EL demonstrated that he knew the sound associated with the multi-letter grapheme he over-sequentialized by correctly producing it on at least two other items within the list. EL's control group did not make a single "over-sequential" error on this task. This finding suggests that EL's limited migration errors on this task (compared to the other LPDs in the study) may have been the result of him sounding out each letter in isolation of the other letters within the word.

**Table 5 | Percentage of migration errors (mig error) and non-migration related errors (non-mig error) on reading aloud non-words.**

*Numbers in parentheses denote standard deviation of the mean for control groups.*

*\*p* < *0.05, \*\*\*p* < *0.001 compared to control group.*

The findings from the reading aloud non-words task suggest that LPD is most likely caused by an orthographic-visual analysis deficit. However, there appears to be variation in task performance among the three LPDs in the present study.

*Item variables influencing non-word migration errors.* In the present study, we also explored the possibility that there may be specific item variables that influence whether or not LPDs will make non-word migration errors. Specifically, we explored whether the bigram frequency of the non-word migration counterpart relative to the bigram frequency of the non-word itself, influenced whether or not a non-word migration error will be made.

We investigated the influence of bigram frequency on non-word reading by analyzing the migration errors made by LM and LL. Specifically, we compared the number of migration errors made on the lower bigram frequency partner (*N* = 25) to the number of migration errors made on the higher bigram frequency partner (*N* = 25). The other participants' results (EL and both control groups) were not investigated in this additional analysis as they made very few migration errors on the task. Both LM and LL read as many non-words as their higher bigram frequency migration partner (LM: 40%, LL: 8%) as they did non-words as their lower bigram frequency partner (LM: 40%; LL: 24%; both Fisher's exact *p* > 0.12 one-tailed).
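The comparison above can be illustrated with a small, self-contained implementation of a one-tailed Fisher's exact test. The cell counts are reconstructed from the reported percentages and *N* = 25 per item set (8% = 2/25, 24% = 6/25, 40% = 10/25); the layout of the 2 × 2 table and the direction of the tail are assumptions made for this sketch, and the function name is hypothetical.

```python
from math import comb


def fisher_lower_tail(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher's exact p for the table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, that the top-left
    cell is less than or equal to the observed value a.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    k_min = max(0, col1 - row2)  # smallest top-left count the margins allow
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(k_min, a + 1)
    )
    return numerator / comb(total, col1)


if __name__ == "__main__":
    # Rows: the two item sets; columns: migration error vs. no migration error.
    print("LL:", round(fisher_lower_tail(2, 23, 6, 19), 3))
    print("LM:", round(fisher_lower_tail(10, 15, 10, 15), 3))
```

Under these assumptions LL's table yields a lower-tail probability of about 0.123, which is in line with the reported *p* > 0.12, and LM's identical proportions yield a probability above 0.5.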

While bigram frequency was not found to mediate migration errors on this task, a *post hoc* analysis revealed that LM and LL's migration errors were influenced by the complexity of the graphemes that made up each non-word. LM and LL were more likely to migrate a two-letter grapheme into two single-letter graphemes (e.g., reading *kerm* as "k**re**m") than to migrate two single-letter graphemes into a two-letter grapheme (e.g., reading *krem* as "k**er**m"). Both LM and LL were found to migrate significantly more two-letter graphemes into single letter graphemes (LM: 66.67%, LL: 33.33%) than two single-letter graphemes into a two letter grapheme (LM: 11.11%, LL: 0%, both Fisher's exact *p* < 0.02 two-tailed).

An examination of the order of item presentation was conducted to investigate whether the errors where a two-letter grapheme migrated into two single-letter graphemes were due to participants being primed by the two single-letter graphemes. That is, we examined whether participants saw the two single-letter graphemes (e.g., *frempt*) prior to making an error where they migrated a two-letter grapheme into these two single letters (e.g., reading *kerm* as *krem*). Of the 18 errors made by LM and LL where a two-letter grapheme was migrated into two single letters (e.g., where *kerm* was read as "k**re**m"), only three errors were made directly after having seen a non-word that comprised the same two single letters (e.g., *frempt*).

### **DISCUSSION**

This study investigated the locus of impairment in three English-speaking children with developmental LPD. Previous research has used a cognitive model of reading aloud to identify three alternative processing components that may be the cause of LPD: the phonological output buffer, the orthographic input lexicon, and orthographic-visual analysis. First, we aimed to replicate previous findings that have ruled out a phonological output buffer deficit and an orthographic input lexicon deficit account of LPD. We then went on to extend previous findings that suggest LPD is caused by an orthographic-visual analysis deficit.

### **ASSESSING THE PHONOLOGICAL OUTPUT BUFFER**

It is plausible to assume that the excessive migration errors made by LPDs are due to the phonemes in the phonological output buffer being swapped around before the word is pronounced. Together with previous studies, our findings strongly refute this hypothesis (Friedmann and Rahamim, 2007; Kohnen et al., 2012; see also Collis et al., 2013). All three LPDs in the present study were either within or above the average range on various standardized tests that draw heavily on a functioning phonological output buffer to be completed successfully. Furthermore, LPDs were asked to repeat a subset of the migratable words that they had previously made a migration error on in a reading aloud task. Each LPD performed this task without making a single migration error, indicating that their reading aloud errors were not caused by an inability to produce the word's phonemes in the correct order.

In recent years, various researchers have suggested that underlying dyslexia is a phonological processing deficit (Stanovich, 1988; Snowling, 1998, 2001; Ramus, 2003). The findings from the present study indicate that, while some children with reading difficulties have phonological processing difficulties, other children's reading difficulties are likely to reflect an alternative processing deficit. For example, surface dyslexia is most likely caused by an orthographic processing deficit (e.g., Castles and Coltheart, 1993, 1996; Broom and Doctor, 1995; Temple, 1997), attentional dyslexia is most likely caused by a letter-to-word binding deficit (Rayner et al., 1989; Friedmann et al., 2010b), and LPD is most likely caused by a letter position coding deficit (for more discussion of heterogeneity within developmental dyslexia, see Castles et al., 2010; Zoccolotti and Friedmann, 2010; McArthur et al., 2013).

### **ASSESSING THE ORTHOGRAPHIC INPUT LEXICON**

It is also plausible to assume that the migration errors made by LM, EL and LL are the result of lexical guessing due to an impoverished orthographic input lexicon. The finding that all three LPDs read aloud non-migratable irregular words as well as controls indicates that this is not the case. Furthermore, EL and LL made more migration errors than controls during a reading aloud task and a visual lexical decision task but did not make more substitution and *N* errors than controls. These findings indicate that EL and LL's errors on these tasks were specific to the migration of letters within the word and were therefore not due to lexical guessing.

In contrast to EL and LL, LM made more migration errors than controls on the visual lexical decision task *and* more substitution errors on the task. This finding suggests that perhaps LM's tendency to make excessive migration errors is the result of lexical guessing. While this finding does not fall in line with our predictions, we believe that LM's lexical guessing was confined to this task, and that her broader tendency to make more migration errors than her peers cannot be attributed to a lexical guessing strategy. If LM's excessive migration errors are the result of lexical guessing, then she should have been found to make more errors that are visually similar to the target word when reading aloud (e.g., reading *slime* as "slide" or "slim") than controls. This was not the case. Like EL and LL, LM made more migration errors than controls when reading aloud, but the same amount of substitution and *N* errors.

### **ASSESSING THE ORTHOGRAPHIC-VISUAL ANALYSIS STAGE OF READING**

The first aim of the present study was to replicate the finding that LPD cannot be attributed to a phonological output buffer deficit or an orthographic input lexicon deficit. Our findings converge with previous research that has ruled out these two possible loci as the source of migration errors seen in LPD (Friedmann and Rahamim, 2007; Kohnen et al., 2012). Having addressed our first aim, we now turn to a discussion of our second aim: to extend the investigation of a possible orthographic-visual analysis deficit account of LPD.

The present study extended Kohnen et al.'s (2012) study in three ways: (1) administering a sequential same-different decision task, (2) administering consonant–strings and orthographically legal non-words in the sequential same-different decision task, and (3) manipulating bigram frequency in a non-word reading task. We hoped that making these changes would provide us with tasks that were more sensitive to an orthographic-visual analysis deficit, and hence enable us to draw stronger conclusions regarding the locus of impairment in English LPD.

In the present study, we administered a sequential same-different decision task to ensure that participants would be unable to adopt a strategy whereby they compare each letter in the pair to one another. We found that EL and LL made significantly more word migration errors on the task than controls. LM also showed this trend, although it did not reach significance. One key difference between the present study and Kohnen et al.'s (2012) study was EL's performance on the same-different decision task. While EL did not make more migration errors on Kohnen et al.'s (2012) simultaneous same-different decision task, he made significantly more migration errors on a sequential variant of the task in the present study. One interpretation of this finding is that EL was adopting a letter-by-letter matching strategy during Kohnen et al.'s (2012) simultaneous same-different decision task. When he was unable to adopt this strategy during the present study, due to the sequential presentation of words, he made significantly more migration errors than controls.

An alternative interpretation of EL's excessive migration errors on the same-different matching task in the present study is that a sequential variant of the task encourages participants to convert the word into a phonological form due to the limited presentation time of the items. It might therefore be that EL made excessive migration errors on the sequential task because he compared the words in each pair based on phonological form, whereas in Kohnen et al.'s (2012) simultaneous task, words were compared based on their orthographic form. We believe this alternative hypothesis to be unlikely for two reasons. Firstly, a wealth of research has shown that responses made on a same-different decision task are based on prelexical orthographic representations rather than phonological representations (e.g., Besner et al., 1984; Kinoshita and Norris, 2009). Secondly, EL was found to be within (or above) the average range on tests that assess phonological processing. It is therefore highly unlikely that EL's excessive migration errors on the sequential same-different decision task could be reflecting a difficulty in comparing phonological forms.

A consonant–string condition in the same-different decision task was included in the present study under the assumption that a letter position coding deficit should manifest itself in responses to all letter-strings, regardless of lexicality. Contrary to our prediction, LPDs did not make more migration errors than controls on the consonant–string condition. We believe that this finding was due to the different mechanisms underlying the processing of letters in words and in consonant–strings. While letters in words are thought to be processed in parallel as a single unit, each letter in a consonant–string needs to be processed serially as a single unit. This means that letters in consonant–strings are likely to take longer to process than letters in words. The *post hoc* finding that participants were significantly better at identifying a difference between two consonant strings when the difference occurred toward the beginning of the consonant pair (*fktzm-fltzm*) than when the difference occurred toward the end of the consonant pair (*fktzm-fktlm*) suggests that 400 ms was not enough time for participants to process the entire consonant–string. For this reason, we believe that participants' performance on the consonant–string condition cannot be taken as evidence for or against an orthographic-visual analysis deficit account of LPD.

Following this finding, we conducted a sequential same-different decision task with orthographically legal non-words. We found that while EL made significantly more migration errors than controls on this task, LM and LL did not. The finding that EL made more word *and* non-word migration errors on a same-different decision task strongly suggests that EL's excessive migration errors are caused by an orthographic-visual analysis deficit. In contrast, the finding that LM and LL made more migration errors on the word condition, but not on the non-word condition is not predicted by an orthographic-visual analysis deficit account of LPD. Rather, LM and LL should have been found to make more migration errors on a sequential same-different decision task, regardless of the lexicality of the items. However, LM and LL's data are still most consistent overall with an orthographic-visual analysis deficit. Further investigations may need to focus on the interaction between lexicality effects and orthographic-visual analysis deficits in LPD.

We also administered a non-word reading task in the present study. If LPD is caused by an orthographic-visual analysis deficit, we should find that LPDs not only make more word migration errors (e.g., reading *slime* as "smile") than controls, but also more non-word migration errors (e.g., reading *pilf* as "plif"), as a deficit at the orthographic-visual analysis stage of reading should impede both lexical and non-lexical reading. In the present study, we found that LM and LL made more non-word migration errors (e.g., reading *pilf* as "plif") than controls. This finding is in contrast with Kohnen et al.'s (2012) finding that all three LPDs made as many non-word migration errors as controls. Interestingly, the one LPD in the present study who did not make excessive non-word migration errors (EL) was one of the three LPDs in Kohnen et al.'s (2012) study who did not make excessive non-word migration errors when reading aloud. This finding is consistent with research in Hebrew that has found that while some LPDs make non-word migration errors, others do not (Friedmann and Rahamim, 2007). EL's over-sequential errors (where each letter was sounded out in isolation and then blended together to form a response) in the present study suggest that individual differences in strategy use might be one predictor of whether or not LPDs will make non-word migration errors.

Contrary to our exploratory hypothesis, we found non-word bigram frequency to have no influence over whether or not a migration error was made. That is, LM and LL were no more likely to read a non-word as its higher bigram frequency partner than they were to read a non-word as its lower bigram frequency partner. The majority of migration errors made by LPDs occur when two adjacent letters in the middle of a word can migrate to form a new word. Considering it is the internal letters of the nonword that are most prone to migration, it is perhaps not surprising that the bigram frequency of the entire letter-string (external letters included) did not influence whether or not a migration error occurred. Instead, it may be other factors specific to the letters that are most susceptible to migration that influence whether or not a migration error will be made.

This suggestion was supported by the *post hoc* finding that the complexity of the non-word's internal grapheme/s influenced whether or not a migration error was made. We found that LM and LL were more likely to swap the letters in a two-letter grapheme around to form two single-letter graphemes, than to swap two single letters around to form a two-letter grapheme (i.e., *kerm* was read as "k**re**m" more than *krem* read as "k**er**m")<sup>4</sup>. One plausible explanation for this finding is that children are likely to be introduced to the sounds that the letters of the alphabet make (single-letter graphemes) before they are introduced to the sounds that two letters of the alphabet make together (two-letter graphemes). What this finding might therefore reflect is an age of acquisition effect. When the non-lexical route is provided with ambiguous letter position information, the default may be to resort to the letter-sounds that were first learnt. Future studies may seek to further our *post hoc* finding by directly testing the hypothesis that some graphemes may be more susceptible to migration than others.

<sup>4</sup>Note that following this finding we also analyzed the influence of internal bigram frequency on migration errors using Solso and Juel's (1980) bigram frequency count database. That is, we analyzed whether or not LM and LL were more likely to migrate the lower frequency bigram *li* in the non-word *plim* into the higher frequency bigram *il* (resulting in the misreading of the non-word as "p**il**m"). We found no influence of internal bigram frequency on LM and LL's migration errors.

### **CONCLUSION**

The aim of this multiple single case study was to both replicate and extend previous findings regarding the locus of impairment in English LPD. Our findings converge with previous research by strongly suggesting that LPD cannot be attributed to a phonological output buffer or orthographic input lexicon deficit. Rather, our results suggest that LPD is most likely caused by a deficit specific to the coding of letter position at the orthographic-visual analysis stage of reading.

In line with previous studies, however, there was some variability in performance amongst the three children on the tasks designed to explicitly assess for an orthographic-visual analysis deficit. One thing that is becoming increasingly clear as research on LPD progresses is that localizing the source of the migration errors seen in LPD is no easy feat. While identifying what *does not cause* migration errors (i.e., a phonological output or orthographic input lexicon deficit) is relatively straightforward, identifying what *causes* migration errors is not as clear-cut. The findings from the present study suggest that variations in the manifestation of an orthographic-visual analysis deficit may be, at least in part, due to individual differences in strategy use. Therefore, to maximize the potential of localizing the deficit underpinning LPD, future research needs to ensure that the tasks used either eliminate or greatly reduce the opportunity for compensatory strategies to be adopted.

Finally, the finding that the three children in the present study were found to have great difficulty in reading migratable words, in the absence of any other obvious reading or spoken language difficulty, attests to the heterogeneity of dyslexia and its underlying causes. Our findings strongly suggest that not all children with reading difficulties have an impairment in phonological processing. Rather, our findings join a growing body of research in advocating the need to map this heterogeneity in developmental dyslexia, and to develop diagnostic tools that assess the variety of its underlying causes.

### **ACKNOWLEDGMENTS**

The authors thank LM, EL and LL, and their families for their participation in this study. They also thank Kristina Barisic, Danielle Colenbrander, and Erin Banales for their help with recruiting control participants for the study, and Professor John Crawford for his advice regarding statistical analyses. This study was supported by a Macquarie University Research Excellence Scholarship (MQRES) to the first author and an Australian Research Council Discovery Project grant (DP110103822) to the second author.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 January 2014; accepted: 09 May 2014; published online: 03 June 2014. Citation: Kezilas Y, Kohnen S, McKague M and Castles A (2014) The locus of impairment in English developmental letter position dyslexia. Front. Hum. Neurosci. 8:356. doi: 10.3389/fnhum.2014.00356*

*This article was submitted to the journal Frontiers in Human Neuroscience. Copyright © 2014 Kezilas, Kohnen, McKague and Castles. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Dissociations between developmental dyslexias and attention deficits

## *Limor Lukov, Naama Friedmann\*, Lilach Shalev, Lilach Khentov-Kraus, Nir Shalev, Rakefet Lorber and Revital Guggenheim*

*School of Education and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel*<sup>1</sup>

### *Edited by:*

*Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

#### *Reviewed by:*

*Max Coltheart, Macquarie University, Australia Roberta Daini, Università degli Studi di Milano - Bicocca, Italy*

#### *\*Correspondence:*

*Naama Friedmann, Language and Brain Lab, School of Education and Sagol School of Neuroscience, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel e-mail: naamafr@post.tau.ac.il*

We examine whether attention deficits underlie developmental dyslexia, or certain types of dyslexia, by presenting double dissociations between the two. We took into account the existence of distinct types of dyslexia and of attention deficits, and focused on dyslexias that may be thought to have an attentional basis: letter position dyslexia (LPD), in which letters migrate within words, attentional dyslexia (AD), in which letters migrate between words, neglect dyslexia, in which letters on one side of the word are omitted or substituted, and surface dyslexia, in which words are read via the sublexical route. We tested 110 children and adults with developmental dyslexia and/or attention deficits, using extensive batteries of reading and attention. For each participant, the existence of dyslexia and the dyslexia type were tested using reading tests that included stimuli sensitive to the various dyslexia types. Attention deficit and its type was established through attention tasks assessing sustained, selective, orienting, and executive attention functioning. Using this procedure, we identified 55 participants who showed a double dissociation between reading and attention: 28 had dyslexia with normal attention and 27 had attention deficits with normal reading. Importantly, each dyslexia with suspected attentional basis dissociated from attention: we found 21 individuals with LPD, 13 AD, 2 neglect dyslexia, and 12 surface dyslexia without attention deficits. Other dyslexia types (vowel dyslexia, phonological dyslexia, visual dyslexia) also dissociated from attention deficits. Examination of 55 additional individuals with both a specific dyslexia and a certain attention deficit found no attention function that was consistently linked with any dyslexia type. Specifically, LPD and AD dissociated from selective attention, neglect dyslexia dissociated from orienting, and surface dyslexia dissociated from sustained and executive attention. 
These results indicate that visuospatial attention deficits do not underlie these dyslexias.

**Keywords: developmental dyslexia, attention, letter position dyslexia, attentional dyslexia, dissociation, neglect dyslexia, surface dyslexia**

### **INTRODUCTION**

One of the paths that the research of developmental dyslexia takes is the quest for a cognitive underlying source for developmental dyslexia. In this research we examine whether attention deficits are a source for developmental dyslexia, by searching for dissociations between the two. In our investigation we applied a neuropsychological perspective that treats both reading and attention as multifaceted constructs and as a result differentiates between types of dyslexia and between types of attention difficulties. Namely—beyond examining whether double dissociations can be found between developmental dyslexia and attention deficits, we ask more specific questions: we take specific dyslexias, analyze their possible relations to specific attention functions, and ask whether they can be dissociated from deficits in the relevant attention functions.

Dyslexia is a reading impairment that can result from brain damage (acquired dyslexia) or be present already before reading acquisition (developmental dyslexia). More than 10 types of developmental dyslexia have been identified, each resulting from deficits in different components of the reading process, and each having different characteristics (Marshall, 1984; Castles and Coltheart, 1993; Temple, 1997; Castles et al., 1999, 2006; Jones et al., 2011; Coltheart and Kohnen, 2012; Friedmann and Haddad-Hanna, 2014). Similarly, the neuroscience literature treats attention as a multifaceted system composed of several different attention networks (Posner and Petersen, 1990; Parasuraman, 2000; Tsal et al., 2005; Petersen and Posner, 2012). Tsal et al. (2005) describe four attentional subsystems (or functions) that are independent to some degree and can be localized in different anatomical loci. Therefore, in the current research we wish to explore the nature of the relation between specific types of dyslexia and specific types of attention deficits and to learn about their shared and/or separate bases.

<sup>1</sup>This article is dedicated to the precious memory of Limor Lukov, our much-loved and appreciated student and teacher, an avid proponent of the dissociability of dyslexias from attention disorders and from phonological disorders. The article is mainly based on Limor's PhD research.

In what follows, we briefly describe the process of normal reading that we assume and the different types of dyslexia that stem from deficits in its different components, then discuss the different attention functions and different types of attention deficits that stem from deficits in its different components, and then discuss the relation between specific types of dyslexia and specific attention deficits.

### **THE PROCESS OF SINGLE WORD READING AND THE VARIOUS DYSLEXIA TYPES**

According to the dual-route model for single word reading (Patterson et al., 1985; Ellis and Young, 1996; Coltheart et al., 2001; Jackson and Coltheart, 2001; Castles et al., 2006; Coltheart and Kohnen, 2012, and others, see **Figure 1**), the early stage of reading is responsible for orthographic-visual analysis, including the identification of the abstract identity of letters in the word, the encoding of the relative position of letters within the word, and binding of letters to a word. The output of these components is held in a graphemic input buffer until it is processed in the next stages.

Each of the functions of the orthographic-visual analyzer is susceptible to a selective deficit, causing a different type of dyslexia, with different pattern of errors and effects on reading. A deficit in letter identification results in visual dyslexia, letter identification dyslexia, or letter agnosia (Nielsen, 1937; Marshall and Newcombe, 1973; Lambon Ralph and Ellis, 1997; Cuetos and Ellis, 1999; Brunsdon et al., 2006; Friedmann et al., 2012); a deficit in the encoding of letter position within words results in letter position dyslexia, characterized by letter migrations within the word (Friedmann and Gvion, 2001, 2005; Friedmann and Rahamim, 2007, 2014; Friedmann et al., 2010a; Friedmann and Haddad-Hanna, 2012, 2014; Kohnen et al., 2012).

A deficit in letter-to-word binding gives rise to attentional dyslexia, in which letters migrate between words (Shallice and Warrington, 1977; Price and Humphreys, 1993; Saffran and Coslett, 1996; Hall et al., 2001; Humphreys and Mayall, 2001; Davis and Coltheart, 2002; Friedmann et al., 2010b). A further type of dyslexia that results from a deficit at the visual analysis stage, neglect dyslexia at the word level, is characterized by neglect of one side of the word, resulting in omissions, substitutions, or additions of letters on one of the sides of the word, typically on the left side (Bisiach et al., 1986; Ellis et al., 1987, 1993; Caramazza and Hillis, 1990; Cubelli et al., 1991; Haywood and Coltheart, 2001; Arduino et al., 2002, 2003; Vallar et al., 2010).

From the orthographic-visual analysis stage, the information flows in two routes: a lexical route and a sublexical route. The lexical route starts with an orthographic input lexicon, which stores the orthographic form of words the reader is acquainted with. The information that arrives from the orthographic-visual analyzer activates an entry of a word in the orthographic input lexicon. This entry, in turn, activates an entry in the phonological output lexicon, where information about the phonology of the word is stored, including consonants, vowels, stress position, and number of syllables. This phonological information then activates the phonological output buffer, which constructs the phonological representation from the consonants, vowels, and their order, and holds the phonological information until the word is spoken. The lexical route is the fast and accurate route for reading aloud. Another branch of the lexical route arrives from the orthographic input lexicon to the semantic-conceptual system, where the information about the meaning of the written word is stored.

The other route, the sublexical route, allows the reading of unfamiliar words, by converting graphemes into phonemes. This route may cause regularization in reading irregular words (such as reading *love* to rhyme with cove and *listen* with a pronounced "t"). The correct reading aloud of such irregular words requires reading through the lexical route. Recent studies of dyslexia teach us that the sublexical route converts consonants and vowels separately (Khentov-Kraus and Friedmann, 2011), and converts graphemes with sensitivity to phonological features (Gvion and Friedmann, 2010).

Again, different types of dyslexia result from deficits in various loci in these two routes. A deficit in the lexical route results in surface dyslexia (Marshall and Newcombe, 1973; Newcombe and Marshall, 1981, 1984, 1985; Coltheart et al., 1983; Coltheart and Funnell, 1987; Howard and Franklin, 1987; Castles and Coltheart, 1993, 1996; Weekes and Coltheart, 1996; Ellis et al., 2000; Judica et al., 2002; Castles et al., 2006; Friedmann and Lukov, 2008). Because readers with surface dyslexia cannot use the lexical route to read aloud, they read via grapheme-to-phoneme conversion. As a result, their reading is slower (Zoccolotti et al., 1999), and, in the case of irregular words and words for which grapheme-to-phoneme conversion is ambiguous, also inaccurate.

An impairment in the sublexical route gives rise to phonological dyslexia, in which readers can read only via the lexical route, so they are only able to read correctly words that are already in their orthographic input lexicon, whereas they experience great difficulty in reading aloud nonwords and new words (Temple and Marshall, 1983; Glosser and Friedman, 1990; Coltheart, 1996; Friedman, 1996; Southwood and Chatterjee, 1999, 2001). This dyslexia can result from a deficit in grapheme-to-phoneme conversion or from a deficit in the phonological output buffer (Guggenheim and Friedmann, 2014). Given the special nature of the sublexical route described above, specific types of dyslexia can result from a selective deficit in reading vowel letters (Khentov-Kraus and Friedmann, 2011; Friedmann and Haddad-Hanna, 2014), or a selective deficit in some phonological features like voicing (Gvion and Friedmann, 2010).

A deficit to both the lexical and the sub-lexical reading routes results in *deep dyslexia*, which causes semantic errors in reading (reading *smile* as "laugh," and *swam* as "swimming"), and inability to read nonwords and function words.

Thus, various types of dyslexia exist, with different patterns of errors in reading, which result from damage to different components of the reading process. Most of the subtypes of dyslexia were initially identified only in acquired dyslexia. In recent years we see more and more studies that provide robust evidence for the existence of subtypes of developmental dyslexia, which show striking similarity to subtypes of acquired dyslexia. This has been reported for developmental *surface dyslexia* (Broom and Doctor, 1995a; Temple, 1997; Masterson, 2000; Castles et al., 2006; Friedmann and Lukov, 2008), developmental *phonological dyslexia* (Broom and Doctor, 1995b; Temple, 1997; Guggenheim and Friedmann, 2014), developmental *deep dyslexia* (Stuart and Howard, 1995; Friedmann and Haddad-Hanna, 2014), developmental *letter position dyslexia* (Friedmann and Rahamim, 2007, 2014; Kohnen et al., 2012; Friedmann and Haddad-Hanna, 2014), developmental *visual dyslexia* (McCloskey and Rapp, 2000), developmental *attentional dyslexia* (Rayner et al., 1989; Shvimer et al., 2009; Friedmann et al., 2010b), developmental vowel dyslexia (Khentov-Kraus and Friedmann, 2011), and developmental *neglect dyslexia* (Friedmann and Gvion, 2002; Friedmann and Nachman-Katz, 2004).

### **ATTENTION FUNCTIONING**

Within the neuropsychological perspective, attention is also treated as a multifaceted construct. In the present study we adopted the model of four functions of attention proposed by Tsal et al. (2005). This model is derived from Posner and Petersen's (1990) influential theory of attention networks. The four-functions-of-attention model refers to four distinct functions within the attention regime: (a) sustained attention, the ability to allocate attentional resources to a non-attractive task over time while maintaining a constant level of performance; (b) selective (spatial) attention, the ability to focus attention on a relevant target while ignoring adjacent distracters; (c) orienting of attention, the ability to direct attention over the visual or auditory field according to sensory input, and to disengage and reorient efficiently; (d) executive attention, the ability to resolve conflicts of information and/or responses.

In a study that compared the attention functioning of children with and without ADHD (Attention Deficit Hyperactivity Disorder), Tsal et al. (2005) reported that sustained attention deficits were the most frequent deficit, which characterized many of the participants in the ADHD sample, whereas each of the deficits in selective, orienting and executive attention characterized approximately half of participants in the ADHD sample. Importantly, Tsal et al.'s study showed that ADHD can entail deficits in any single (or combination of) attention function/s. Thus, different children with ADHD can have divergent clusters of attention deficits.

### **ON THE RELATION BETWEEN DYSLEXIA AND ATTENTION DEFICITS**

Attention disorders and reading disorders are often reported to co-occur (e.g., August and Garfinkel, 1990; Semrud-Clikeman et al., 1992; Snider et al., 2000; Willcutt and Pennington, 2000). Some researchers suggest that this co-occurrence is principled, and that attention lies at the basis of reading, and hence, attention deficits may underlie dyslexia. For example, Clark (1999) suggested that attention should be engaged at the word-target location before a saccade can be made to that location. Hoffman and Subramaniam (1995) have shown that spatial attention is a crucial mechanism in generating voluntary saccadic movements. Thus, visuospatial attention initiates the saccade, and the programming of the next saccade begins when visual attention shifts from the fovea toward the next word into the parafoveal area (Clark, 1999).

Other researchers provide less-specific approaches to the relation between attention and reading but claim that attention is crucial for reading. For example, Reynolds and Besner (2006) suggest that attention is critical for translating print into speech, and Shaywitz and Shaywitz (2008) claim that attention has a role in reading, and that deficient attention may cause reading difficulties. Several previous studies reported certain attention deficiencies in children with dyslexia compared with typically developed children (Slaghuis et al., 1993; Casco and Prunetti, 1996; Casco et al., 1998; Vidyasagar and Pammer, 1999). For instance, Facoetti and his colleagues found that children with dyslexia (without specifying the types of dyslexia) did not benefit from exogenous (peripheral) precues although they did demonstrate improved performance when endogenous (central) precues were introduced (Facoetti et al., 2000, 2003a,b). According to these studies, difficulties in spatial attention (orienting and/or selective) may serve a causal role in dyslexia.

However, comorbid occurrence of two deficits is still not necessarily indicative of a principled relation between them. In the current study we aim to examine whether co-occurrences of reading and attention difficulties indicate a causal relation between the two. The way neuropsychology usually approaches questions of relations between modules and functions is by searching for dissociations and double dissociations: if a double dissociation between dyslexia and attention deficits is found, then reading and attention are independent modules that can be selectively impaired, and an impairment in one does not result from an impairment in the other. Thus, this study searched, first, for double dissociations between developmental dyslexia and attention deficit in general. The next stage aimed to explore the finer relations between specific types of dyslexia and attention, asking whether in cases of comorbid impairments, specific dyslexias are linked to certain specific attention deficits. To the best of our knowledge, no study has tested the relations between subtypes of developmental dyslexia and subtypes of attention difficulties.<sup>2</sup>
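The double-dissociation logic described above can be illustrated with a minimal sketch. The `Participant` record, its field names, and the sample entries are hypothetical and serve only to make the reasoning concrete; they are not the study's data or procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Hypothetical record: one flag per domain, as decided by the assessments.
    pid: str
    has_dyslexia: bool
    has_attention_deficit: bool


def dissociation_summary(participants):
    """Group participants by impairment profile and check for a double dissociation."""
    dyslexia_only = [p.pid for p in participants
                     if p.has_dyslexia and not p.has_attention_deficit]
    attention_only = [p.pid for p in participants
                      if p.has_attention_deficit and not p.has_dyslexia]
    both = [p.pid for p in participants
            if p.has_dyslexia and p.has_attention_deficit]
    return {
        "dyslexia_only": dyslexia_only,
        "attention_only": attention_only,
        "both": both,
        # A double dissociation needs at least one case of each selective impairment.
        "double_dissociation": bool(dyslexia_only) and bool(attention_only),
    }


# Illustrative (invented) sample.
sample = [
    Participant("P1", True, False),
    Participant("P2", False, True),
    Participant("P3", True, True),
]
summary = dissociation_summary(sample)
```

Comorbid cases alone (such as `P3`) never yield a double dissociation in this sketch; only the two selective profiles together do.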

### **DYSLEXIAS SUSPECTED TO HAVE ATTENTIONAL BASIS**

Three types of peripheral dyslexia that affect many Hebrew readers with developmental dyslexia present characteristics that seem to be related to attention, and are hence the best candidates for having attention deficits of some sort at their basis. One is *letter position dyslexia* (LPD, Friedmann and Gvion, 2001, 2005; Friedmann and Rahamim, 2007, 2014; Friedmann et al., 2010a; Friedmann and Haddad-Hanna, 2012, 2014; Kohnen et al., 2012; Kezilas et al., 2014). LPD is characterized by transpositions of middle letters within words. According to some analyses, it results from a difficulty in attention allocation to letters, whereby attention is allocated to the first and final letters in the word, and then to all middle letters together. This creates illusory conjunctions between middle letters and their positions (Friedmann and Gvion, 2001). The question that immediately arises is whether this is a general visuo-spatial attention of the type that is measured in attention tasks, or whether this is an orthographic-specific attention function that is specifically harnessed to reading.

*Attentional dyslexia* is another dyslexia that might stem from a deficit in attention (as the title that Shallice and Warrington selected for the 1977 article in which they first reported this type of dyslexia already suggests: "The possible role of selective attention in acquired dyslexia"). Attentional dyslexia is an impairment in binding letters to words, which results in migrations of letters between words (Shallice and Warrington, 1977; Warrington et al., 1993; Saffran and Coslett, 1996; Hall et al., 2001; Humphreys and Mayall, 2001; Davis and Coltheart, 2002; Mayall and Humphreys, 2002; Friedmann et al., 2010b). A possible attentional approach for attentional dyslexia would ascribe it to a deficit in selective attention that hampers the ability to glue letters to words, or the ability to focus on the target word and attenuate neighboring words.

Another theoretically possible point of contact between dyslexia and attention is *neglect dyslexia*, in which the deficit is related to a specific difficulty in shifting attention to one of the sides of the word, usually its left side. The main types of errors in this dyslexia are omissions, substitutions, and additions of letters in the neglected side (Vallar et al., 2010; and see Friedmann and Nachman-Katz, 2004; Nachman-Katz and Friedmann, 2007, 2008, 2009, 2010; Friedmann and Haddad-Hanna, 2014, for the developmental form of this dyslexia). A natural place to look for an attentional source of this dyslexia would be in orienting of attention. And again, the question is whether this is a general visuo-spatial attention or an orthographic-specific one.

Finally, a different sort of relation between attention and dyslexia may characterize *surface dyslexia*. As explained above, surface dyslexia is a deficit in reading via the lexical route that results in reading via the sublexical route. One may imagine several mechanisms by which attention deficits may give rise to surface dyslexia errors. One is general: given difficulties in sustained attention during childhood and during the time of learning to read, children may not be able to attend to classes and devote resources to learning to read, reading, and doing homework. As a result, they might not be familiar with many written words, their lexicon would be impoverished, and their reading would have to rely on the sublexical route.<sup>3</sup> Similar indirect reduction of time allotted to reading may be caused by deficits in other attention functions such as selective attention, which affect the ability or motivation of a child to cope with the situation of reading in general. A more specific effect was suggested by Valdois and colleagues (Valdois et al., 2004; Bosse et al., 2007). According to Valdois et al., an impairment of visual attention that reduces the visual attention span (the number of elements that can be identified in parallel) could also lead to reading letter-by-letter in a way typical to surface dyslexia.<sup>4</sup> Finally, it is also possible to imagine a more specific mechanism related to executive attention, assuming that executive attention is responsible for keeping the reader on the lexical route and resolving conflicting inputs that come from the output of the parallel sublexical route.

In the second part of this research we therefore assessed these specific questions on the fine relations between different types of dyslexia and specific attentional functions.

### **SOME EVIDENCE TO THE DISSOCIABILITY OF DEVELOPMENTAL DYSLEXIA AND ATTENTION DEFICITS**

One source of evidence to the dissociability of dyslexia and ADHD comes from the differential effect that methylphenidate (MPH) has on the two. MPH is the most commonly used drug treatment for ADHD. Keidar and Friedmann (2011) assessed whether individuals with developmental dyslexia and ADHD whose attention deficits are relieved by MPH also show reduced rates of errors in reading with MPH. They tested 20 Hebrew-speaking participants with attentional-based dyslexia (mainly LPD and attentional dyslexia) and ADHD, once with and once without MPH. The results were that even though MPH positively affected their performance in at least one of the attentional functions (sustained, selective, orienting, or executive attention), it did *not* improve their reading accuracy. All of these participants had LPD, and many of them also had attentional dyslexia, but still their rate of migrations between words and within words was not affected by MPH. This study already provides some evidence that reading and attention systems are separate, and that the deficit that underlies LPD and attentional dyslexia is orthographic-specific rather than resulting from a general attentional deficit. Had the attention deficit been the source of the reading impairment, we would have expected improved attention to also reduce reading errors.

<sup>2</sup>In this study we focus on dyslexia, and hence on reading at the single word (and nonword) level. Another type of relation between reading and attention pertains to text reading and reading comprehension. Individuals with attention disorders may read an entire chapter and have no idea about its content (Levine, 1987), as a result they often have to read the same paragraph repeatedly in order to grasp its meaning (Robin, 1998, p. 284. See also Cherkes-Julkowski et al., 1995; Brock and Knapp, 1996; Stern and Shalev, 2013, for impairments of children with ADHD on comprehension at the text level).

<sup>3</sup>It is also possible that a deficit in sustained attention would induce a more general reading impairment, leading to garden variety of errors: letter substitution, addition, omission and migration, as well as possibly reading via grapheme-to-phoneme conversion.

<sup>4</sup>Notice that even if reduced attention span can account for surface dyslexia, it cannot account for other dyslexias, such as LPD, attentional dyslexia, and the others.

Another type of evidence suggesting that the deficit in LPD and in neglect dyslexia is orthographic-specific rather than resulting from a general attentional deficit comes from the differential effect of dyslexia on the reading of words and numbers. Friedmann et al. (2010a) reported on ten individuals with LPD who made many migration errors of letters within words, but their performance on multi-digit number reading was good: they read numbers without migration errors, and not differently from the control participants. Had attention been the source of the deficit in reading, we would expect all kinds of stimuli to be affected by it, including numbers, and not only words. Similarly, Friedmann and Nachman-Katz (2004) and Nachman-Katz and Friedmann (2008) reported on 21 individuals with developmental neglect dyslexia who made neglect errors on the left side of words, but not on the left side of multi-digit numbers. Such a dissociation between word and number reading is inconsistent with a general visuo-spatial attention deficit, which should have affected both types of stimuli.

Finally, Collis et al. (2013) recently examined the performance in a partial report task of adults with developmental dyslexia who make letter position errors and migrations between words (parallel to LPD and attentional dyslexia). They compared the participants' performance on strings of letters and symbols (as well as digits), and found that the participants with developmental dyslexia performed more poorly than the control participants, but their deficit was limited to letter strings, and did not affect symbol strings. These findings suggest that the dyslexic participants did not suffer from a general visuo-spatial deficit in the visual attentional window, but rather from a deficit that was limited to orthographic material.

In this study we examined the relation between developmental dyslexia and attention deficits from another perspective, by systematically examining developmental dyslexia types and specific attention difficulties. We aimed to identify, at the cognitive level, the bases of different types of reading difficulties and different types of attention deficits. We assessed the reading and attention of all the participants. Firstly, we asked whether reading and attention are separate cognitive modules. Then, we asked whether participants with specific types of dyslexia share a specific attention deficit. The rationale was that if we can identify cases of dissociation between dyslexia and attention deficits, and specifically, if we can identify individuals with dyslexia that has a suspected attentional cause who do not have a visuo-spatial attention deficit, attention cannot be the underlying cause for this dyslexia.

### **METHODS**

### **PARTICIPANTS**

The participants we report below are 110 Hebrew readers with dyslexia, an attention deficit, or both: 65 of them are children and adolescents (age ranges 10;0–17;0, *M* = 13;2, 35 girls and 30 boys) and 45 are adults (with ages ranging 18;1–42;0, *M* = 28;2, 23 women and 22 men). We only included children older than 10 years of age, to make sure that they have already had enough time to fully acquire reading and to establish an orthographic lexicon. For all of the participants, the reading and attention deficits were developmental: none of them had a history of brain lesion, neurological disease, or loss of consciousness. For all of them Hebrew was the first language, and the first language in which they learned to read.

Most of the participants responded to ads that invited volunteers with both reading and attention deficits and a few of them responded to ads that looked for individuals with difficulties in at least one of the above domains. The Tel Aviv University and the Ministry of Education Ethics Committees approved the experimental protocol.

All participants completed two extensive test batteries: a reading battery and an attention functioning battery. Because we were only interested in dissociations between reading and attention deficits, we only included in the study participants who had a deficit in at least one of the domains: dyslexia, or attention deficits, or both.

### **READING ASSESSMENT**

To evaluate the oral reading of each participant and to determine which type of dyslexia each participant had, we tested each of the participants using the TILTAN screening test (Friedmann and Gvion, 2003), which was developed to identify subtypes of dyslexia in Hebrew. The screening test includes oral reading of 136 single Hebrew words (2–11 letters long), 30 word pairs (3–6 letters long), and 40 nonwords (3–6 letters long). According to the error types in the screening test, we administered to each participant additional tests for the types of dyslexia that emerged from the reading aloud test. These tests are described below for each type of dyslexia.

The word list in the screening test included words of various types that can reveal the different types of dyslexia: 65 migratable words (words in which middle letter migration creates another existing word), for the identification of letter position dyslexia; 104 words for which omission, substitution, migration, or addition of a vowel letter creates another existing word, for the identification of vowel letter dyslexia; 136 words for which neglect of the left side of the word yields another existing word, for the identification of neglect dyslexia, and 108 words for which right neglect errors create an existing word; 84 irregular words and potentiophones for the identification of surface dyslexia; 57 morphologically complex words for deep dyslexia and phonological dyslexia; and 26 abstract nouns and 28 function words, for deep dyslexia. All the words were sensitive to visual dyslexia, as each word had more than six orthographic neighbors.

The 40 nonwords were included for the identification of impairments in the sublexical route, in phonological dyslexias or vowel dyslexia, and deep dyslexia, but also contained migratable nonwords and words that created existing words by substitution, omission, or addition of letters, and were hence also sensitive to various impairments at the orthographic-visual analyzer (letter position dyslexia, visual dyslexia, neglect dyslexia). The list of 30 word pairs was created so that between-word migrations created other existing words, for the identification of attentional dyslexia.

On the basis of this test, we determined whether a participant had normal reading or whether s/he had dyslexia, and if s/he had dyslexia, which types of dyslexia were suspected based on the error pattern s/he showed and the factors that affected her/his reading (frequency effect, word length effect, lexicality effect, etc.). Impaired performance in the screening task, as well as on each of the further reading tasks, was determined according to the comparison of the participant's reading to an age-matched control group. The control groups were collected in previous studies and throughout the development of the test batteries. The control group for the adult participants included 372 adults; the children control groups included at least 20 children in each age group. Skilled readers after 8th grade showed a reading pattern identical to that of the adult control group. Each participant's performance was compared to the control group using Crawford and Howell's (1998) *t*-test for the comparison of the performance of a single participant with a control group. An impaired performance was defined as performance that was significantly below the control, with *p* < 0.05. The type of dyslexia was determined using the same procedure and statistical test, applied to the various types of errors. We determined that a participant had a certain dyslexia if s/he made significantly more errors of the relevant type compared to the control group, and performed significantly poorer than the control group in the relevant reading tests. We only included in the no-dyslexia group individuals who performed within the normal range in all the reading tests. Unclear cases, with performance that was marginally different from that of the control group, and hence could not form a clear case of dissociation, were excluded from the study.
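As an illustration of the single-case comparison, the sketch below implements the Crawford and Howell (1998) statistic in its commonly cited form, t = (x − M) / (s·√((n + 1)/n)) with n − 1 degrees of freedom, where x is the participant's score and M, s, and n are the control mean, sample standard deviation, and sample size. The function names, the numerical integration used to obtain the one-tailed *p*, and the error counts in the example are assumptions made for illustration; they are not taken from the study.

```python
import math
from statistics import mean, stdev


def crawford_howell_t(case_score, control_scores):
    """Return (t, df) for one case compared with a small control sample."""
    n = len(control_scores)
    if n < 2:
        raise ValueError("at least two control scores are required")
    m = mean(control_scores)
    s = stdev(control_scores)  # sample SD (n - 1 in the denominator)
    t = (case_score - m) / (s * math.sqrt((n + 1) / n))
    return t, n - 1


def t_upper_tail(t, df, steps=2000):
    """One-tailed P(T >= t) for Student's t, via Simpson's rule (steps must be even)."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3          # integral of the density from 0 to |t|
    upper = 0.5 - area            # P(T >= |t|)
    return upper if t >= 0 else 1.0 - upper


# Invented example: error counts of five controls and one participant.
control_errors = [1, 2, 3, 4, 5]
t_value, df = crawford_howell_t(9, control_errors)
p_one_tailed = t_upper_tail(t_value, df)
impaired = p_one_tailed < 0.05
```

For error counts, impairment corresponds to the upper tail (more errors than controls); for accuracy scores the lower tail would be used instead, for example by passing `-t` to `t_upper_tail`.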

Letter position dyslexia was determined according to the number of letter position errors in reading migratable words (See Appendix C for Hebrew examples of the words of the various types and types of errors).

Attentional dyslexia was determined according to the number of between-word errors, including between-word migrations and between-word letter omissions, in reading migratable word pairs.

Left neglect dyslexia was determined according to letter errors (substitutions, omissions, and additions) that occurred predominantly on the left side of the words (see Friedmann and Nachman-Katz, 2004; Friedmann and Gvion, 2005; Nachman-Katz and Friedmann, 2007, 2008, 2009, 2010; Reznick and Friedmann, 2009, for a description of the manifestation of neglect dyslexia in Hebrew readers).

Surface dyslexia was determined according to the number of reading errors that resulted from reading via the sublexical route rather than via the lexical route, which caused regularization errors in irregular words and potentiophones.

Vowel dyslexia was determined according to the number of vowel letter errors (migrations, substitutions, omissions, and additions) in words and nonwords.

We then further tested the participants' reading in additional tests from the TILTAN battery that were specific to different types of dyslexia that emerged from the screening test, in order to establish decisively the type of dyslexia each participant had. In these additional tests, reading aloud was done without time limit, and the participants were requested to read aloud as accurately and as quickly as possible. The first responses were counted, even when they were later self-corrected. In the lexical decision and the comprehension tasks, the participants were requested to perform the tasks in silent reading, without sounding out the words they read.

The results of each participant in each of the further reading tasks were compared to those of age-matched controls. In the reading aloud tasks, the number of errors of each type (reading via the sublexical route, vowel omission, substitution, addition, migration, consonant omission, substitution, addition, migration, letter neglect on the left, migrations between words, voicing errors, semantic errors) was compared to the number of these errors in the control group. In the lexical decision and comprehension tasks, the percentage of correct responses was compared to that of the control group.

### *Letter position dyslexia*

To establish the diagnosis of letter position dyslexia, which is characterized by letter migrations within words, we used tasks that tested the participants' oral and silent reading of words that are most sensitive to this dyslexia—migratable words. These are words in which migration of middle letters within the words creates another existing word (such as cloud-could, parties-pirates, casual-causal).

The *reading aloud* task for LPD included 232 migratable words of 4–7 letters (*M* = 4.9, *SD* = 0.9). In 87 of these words, a middle migration that involves a vowel letter and a consonant letter creates another existing word, and in 163 words a middle migration that involves two consonant letters creates another word. (For an English example, the word *stops* has a potential for transposition of two consonant letters, *t* and *p*, creating the word *spots*, and the word *form* has a potential for migration that involves a vowel: a transposition of *o* and *r* would create the word *from*).
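The notion of a migratable word can be made concrete with a short sketch that checks whether reordering the middle letters of a word (keeping the first and last letters fixed) yields another word in a lexicon. The function name and the tiny English lexicon are hypothetical; the actual stimuli were Hebrew words selected by the authors.

```python
from itertools import permutations


def middle_migrations(word, lexicon):
    """Return the lexicon words reachable by reordering the middle letters of `word`."""
    if len(word) < 4:
        return set()  # fewer than two middle letters: nothing to migrate
    first, middle, last = word[0], word[1:-1], word[-1]
    candidates = {first + "".join(p) + last for p in permutations(middle)}
    candidates.discard(word)  # the target itself is not a migration error
    return candidates & set(lexicon)


# Toy lexicon built from the English examples given in the text.
toy_lexicon = {"form", "from", "stops", "spots", "cloud", "could", "table"}
migratable = {w for w in toy_lexicon if middle_migrations(w, toy_lexicon)}
```

In this toy lexicon every word except *table* is migratable, since no reordering of its middle letters produces another listed word.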

Additional tasks involved *same-different decision* in which the participant was presented with 60 word pairs, half of which differed in middle letter order (clam-calm), and was requested to determine whether the words in the pair are same or different; *lexical decision task*, in which the participants saw 60 items, half of them words and half migratable nonwords (pecnil) and were requested to determine whether the item was a word; and a *reading comprehension task* that included 50 triads. Each triad consisted of a target migratable word, and two words to choose from: one word that is semantically associated with the target word, and one that is semantically associated with a word that can result from a transposition of middle letter (dairy → milk, notebook). The participants were requested to circle the word that is semantically associated with the target word.

### *Attentional dyslexia*

To establish the diagnosis of attentional dyslexia, characterized by migrations of letters between neighboring words (and by omissions of an instance of a letter that appears in two neighboring words in the same position), the participants read aloud additional lists of word pairs and a list of nonword pairs.

The *word pair* list included 120 word pairs of 2–7 letters (*M* = 4.8, *SD* = 1). All these word pairs were migratable, namely, for each of them, migration of a letter from one word to the other, preserving the within-word position, creates another existing word (e.g., *mild wind* in which between-word migration can create *wild mind*). The *migratable nonword pair* list included 30 3-letter nonword pairs in which letter migration between words would result in existing words.
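A between-word migration of the kind described above can be sketched as follows: a letter moves from one word of the pair to the other while keeping its within-word position, and the pair counts as migratable if the result is an existing word. The function name, the toy lexicon, and the choice to align positions from the start of each word are assumptions for illustration only.

```python
def between_word_migrations(first, second, lexicon):
    """Return the existing words each member of a pair can turn into when it
    receives a letter from the other word in the same within-word position."""
    lexicon = set(lexicon)
    result = {"first": set(), "second": set()}
    for i in range(min(len(first), len(second))):
        if first[i] == second[i]:
            continue  # identical letters: a migration would be invisible
        new_first = first[:i] + second[i] + first[i + 1:]
        new_second = second[:i] + first[i] + second[i + 1:]
        if new_first in lexicon:
            result["first"].add(new_first)
        if new_second in lexicon:
            result["second"].add(new_second)
    return result


# Toy lexicon around the English example from the text.
pair_lexicon = {"mild", "wind", "wild", "mind"}
pair_result = between_word_migrations("mild", "wind", pair_lexicon)
```

For *mild wind*, both the first-letter and the third-letter positions produce existing words in this toy lexicon, so the pair is migratable in more than one position.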

### *Neglect dyslexia*

Identification of neglect dyslexia was based on an analysis of the position of consonant letter errors (substitutions, additions, and omissions) in the three subtests of the reading aloud screening task, as well as three additional tasks: reading aloud of words and nonwords that share the right side with other words, and lexical decision.

The *oral reading of words* for neglect dyslexia included 100 words in which substitution, omission, or addition of a letter on the left side created another existing word (*rice*→ nice, price, ice). The list for *oral reading of nonwords* for neglect dyslexia included 30 nonwords that differ from existing words in the left letter (*netter*). The *lexical decision* task included 50 nonwords that differ from existing words in the left letter (*diraffe*), as well as 40 existing words.
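The selection criterion for these words can be sketched in code: a word is sensitive to left neglect errors if substituting, omitting, or adding a letter on its left side yields another existing word. The sketch follows the English example, where the left side is the beginning of the word; in Hebrew, which is written from right to left, the left side is the end of the word. The function name and toy lexicon are hypothetical.

```python
import string


def left_neglect_neighbors(word, lexicon, alphabet=string.ascii_lowercase):
    """Return lexicon words created by a substitution, omission, or addition
    of a single letter on the left edge of `word` (English letter order)."""
    lexicon = set(lexicon)
    candidates = {word[1:]}                 # omission of the leftmost letter
    for ch in alphabet:
        candidates.add(ch + word[1:])       # substitution of the leftmost letter
        candidates.add(ch + word)           # addition of a letter on the left
    candidates.discard(word)
    candidates.discard("")
    return candidates & lexicon


neglect_lexicon = {"rice", "nice", "price", "ice", "rich"}
neighbors = left_neglect_neighbors("rice", neglect_lexicon)
```

Here *rich* is not returned, because it differs from *rice* on the right side rather than the left.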

### *Surface dyslexia*

*Surface dyslexia test: Reading aloud of potentiophones.* To establish surface dyslexia, which is characterized by reading via the sublexical route, the participants read aloud 78 potentiophonic words, 2–6 letters long (*M* = 3.7 letters, *SD* = 0.8). Potentiophones are words whose reading via grapheme-to-phoneme conversion creates another existing word (like *now*, which can be read via grapheme-to-phoneme conversion to sound like "know," Friedmann and Lukov, 2008). Such words are the most sensitive stimuli to detect surface dyslexia because, like irregular words, their correct reading requires the lexical route. They are more sensitive to surface dyslexia than other irregular words because reading them via the sublexical route results in another word, and hence the reader cannot know that the word was read erroneously.

*Pseudo-homophone lexical decision.* The lexical decision task for surface dyslexia contained 66 word pairs. Each pair included a word spelled correctly and its pseudo-homophone (e.g., *knife*-*nife*). For each pair, the participants were requested to circle the word that was spelled correctly.

*Homophone-potentiophone written word comprehension.* The reading comprehension task included 40 triads. Each triad consisted of a target word, and two words to choose from: one word that is semantically associated with the target word, and a homophone or a potentiophone of the associated word (e.g., bottle—bear beer). The participants were requested to circle the word that is semantically associated with the target word.

### *Vowel dyslexia*

To establish the diagnosis of vowel dyslexia, characterized by substitutions, omissions, additions and migrations of vowel letters, the participants performed two additional tasks of lexical decision and word comprehension.

*Lexical decision.* The vowel dyslexia lexical decision task contained 80 items: 45 nonwords in which a vowel error creates existing Hebrew words and 35 existing words—16 of which included a vowel letter and 19 without vowel letters. The items in the task were 2–8 letters (*M* = 4.8, *SD* = 1.13). The participants were requested to silently read each word and to circle the words that exist in Hebrew.

*Written word comprehension.* The reading comprehension task for vowel dyslexia included 52 triads. Each triad consisted of a target word (3–6 letters long, *M* = 4.4, *SD* = 0.75), and two to four words to choose from: one word that is semantically associated with the target word, and the rest are words that are semantically associated with words that can result from a vowel error in the target word (form → shape, to, ranch). The participants were requested to circle the word that is semantically associated with the target word.

### **ATTENTION ASSESSMENT**

Attention functioning was assessed by using four computerized neuropsychological tasks, serving as indicators of performance in each of the attention functions (Tsal et al., 2005). The four attention tasks enable us to assess the attentional profile of each participant. The performance of each participant in each of the above tests was compared to that of an age-matched control group. The control group for the adult participants included 300 adults, and the child control groups included at least 30 children in each age group, collected throughout the development of the test battery. A deficit in sustained, selective, or executive attention was defined in cases where an individual's performance was located in the lowest 5% of the distribution of her/his age-matched control group, that is, when *Z* ≤ −1.645. In orienting attention there are two different possible deficits: a deficit in disengagement of attention (when an invalid cue caused a large decrease in performance) and a deficit in automatic orienting of attention (when a valid cue was not effective and did not improve performance). The former was defined in cases where the performance was located in the lowest 5% of the distribution, that is, when *Z* ≤ −1.645, and the latter was defined when the performance was located in the highest 5% of the distribution, that is, when *Z* ≥ 1.645. Each attention test starts with a short practice block, and the test lasts approximately 12 min. The task that assessed sustained attention was always administered as the first task. The other three attention tasks were administered in a counter-balanced order.
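The cutoff rule described above can be sketched as follows. This is a minimal illustration, not the scoring code of the test battery: the function names are hypothetical, the use of the sample standard deviation of the control group is an assumption, and scores are assumed to be scaled so that lower Z values reflect poorer performance.

```python
from statistics import mean, stdev

Z_CUTOFF = 1.645  # one-tailed 5% cutoff stated in the text


def z_score(score: float, control_scores: list[float]) -> float:
    """Z score of one participant relative to an age-matched control group
    (assumption: sample SD of the controls is used)."""
    return (score - mean(control_scores)) / stdev(control_scores)


def classify(attention_function: str, z: float) -> str:
    """Apply the cutoffs described in the text to a single Z score."""
    if attention_function == "orienting":
        # Two possible deficits, one at each tail of the distribution.
        if z <= -Z_CUTOFF:
            return "disengagement deficit"
        if z >= Z_CUTOFF:
            return "automatic orienting deficit"
        return "normal"
    # Sustained, selective, and executive attention: lower tail only.
    return "deficit" if z <= -Z_CUTOFF else "normal"
```

For example, under these assumptions `classify("sustained", -1.7)` is labeled a deficit, whereas the same Z score with the opposite sign is not.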

### *Sustained attention*

For sustained attention, we used a Conjunctive Continuous Performance Test (CCPT). Participants were presented with a long series of stimuli but were instructed to respond to a single reoccurring pre-specified target (a red square) while withholding responses to all other, non-target stimuli. There were four possible shapes (square, circle, triangle, and star) and four possible colors (red, blue, green, and yellow). As soon as a target appeared the participant was requested to press the spacebar. Using a low rate of target stimuli (30%) and varying the inter-stimulus interval (ISI), this task maintains a high demand on sustained attention but minimizes the involvement of other cognitive factors. Standard deviation of mean RT of target trials served as the measure of sustained attention (Shalev et al., 2011; Stern and Shalev, 2013).
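As a rough sketch of this measure, the variability of response times on target trials could be computed as below. The trial format and function name are hypothetical, and the measure is read here, as an assumption, as the standard deviation of RTs over correctly detected targets.

```python
from statistics import stdev


def sustained_attention_measure(trials: list[dict]) -> float:
    """SD of response times (ms) on target trials that received a response.

    Each trial is a dict with hypothetical keys:
    'is_target' (bool), 'responded' (bool), 'rt_ms' (float or None).
    """
    hit_rts = [
        t["rt_ms"] for t in trials if t["is_target"] and t["responded"]
    ]
    if len(hit_rts) < 2:
        raise ValueError("need at least two target responses to compute an SD")
    return stdev(hit_rts)
```

Responses to non-target stimuli and missed targets are excluded, so only the consistency of responding to the red square enters the score.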

### *Selective (spatial) attention*

For selective attention, we used a conjunctive search task (Tsal et al., 2005). Participants were instructed to search for a target stimulus appearing among distractors. The displays varied in their set size (i.e., the number of distractors), enabling estimation of the effect of attentional load on performance. The participant was instructed to fixate on a central fixation point, which was followed by a display of items. The participant was requested to decide whether the display contained the target, a blue square among the distractors (blue circles and red squares). The target appeared in 50% of the displays. If a target was detected the participant had to press the "L" key, and if the target was absent s/he had to press the "A" key. The slope of the search graph reflected the efficiency of spatial selective attention.
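The search slope can be estimated with an ordinary least-squares fit of mean response time on set size. The sketch below is illustrative only: the function name and the set sizes and response times in the usage line are hypothetical, since the text does not report them.

```python
def search_slope(set_sizes: list[float], mean_rts: list[float]) -> float:
    """Least-squares slope of mean RT (ms) as a function of set size.

    A steeper slope means that each additional display item costs more
    time, i.e., less efficient search.
    """
    if len(set_sizes) != len(mean_rts) or len(set_sizes) < 2:
        raise ValueError("need matching lists with at least two set sizes")
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts))
    return sxy / sxx


# Hypothetical data: RT grows by 25 ms for every added item.
slope = search_slope([4, 8, 16], [600.0, 700.0, 900.0])
```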

### *Orienting attention*

For orienting of attention, we used a peripheral cueing paradigm (Posner et al., 1980) with an exogenous cue (Jonides, 1981). Participants had to discriminate a stimulus—a triangle or a circle—preceded by an abrupt onset at either the target's location (valid cue) or the opposite side of fixation (invalid cue). When the target was a triangle the participant had to press the "L" key and when the target was a circle s/he had to press the "A" key. The difference in performance between valid and invalid trials indicates the ability to orient attention and efficiently disengage from irrelevant locations (Tsal et al., 2005).

### *Executive attention*

For executive attention, we used a Location-Direction Stroop-like task (Stroop, 1935) with a spatial aspect. Participants had to respond either to the location or the direction of an arrow (in different blocks) appearing on the screen, while ignoring the other irrelevant dimension. Half of the stimuli were congruent trials (that is, the location on the screen and the direction of the arrow match; e.g., an arrow presented below fixation pointing downwards) and half of them were incongruent (e.g., an arrow presented above fixation pointing downwards). In the first two blocks of the task participants were requested to judge the location of the arrow (relative to the fixation point; if it is presented above the fixation they had to press "L" and if it is presented below the fixation they had to press "A") and in the last two blocks they were requested to judge its direction (Tsal et al., 2005). The widely-used interference effect in such tasks reflects the extent to which conflicting irrelevant information is being effectively suppressed.
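A common way to quantify the interference effect is the difference between mean response times on incongruent and congruent trials. The following sketch assumes that definition; the trial format and function names are hypothetical and are not taken from the test battery.

```python
from statistics import mean


def is_congruent(location: str, direction: str) -> bool:
    """True when arrow location ('above'/'below') matches its direction
    ('up'/'down'), as in the examples given in the text."""
    return (location, direction) in {("above", "up"), ("below", "down")}


def interference_effect(trials: list[tuple[str, str, float]]) -> float:
    """Mean RT on incongruent trials minus mean RT on congruent trials.

    Each trial is (location, direction, rt_ms). Larger values indicate
    poorer suppression of the irrelevant dimension.
    """
    congruent = [rt for loc, d, rt in trials if is_congruent(loc, d)]
    incongruent = [rt for loc, d, rt in trials if not is_congruent(loc, d)]
    return mean(incongruent) - mean(congruent)
```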

### **RESULTS**

### **PART A: DISSOCIATIONS BETWEEN DYSLEXIA AND ATTENTION DEFICITS**

One of the most fundamental tools in the neuropsychological toolbox is that of double dissociation. A pattern in which one person is impaired in cognitive ability A but performs normally on B, whereas another person shows the opposite dissociation (impaired in B but normal on A), suggests that A and B are separate modules. Thus, if we are able to identify a double dissociation between dyslexia and attention deficits, we can demonstrate that neither of them underlies the other. Specifically for this special issue, we will be able to answer the question of whether attention deficits underlie developmental dyslexia with a "no."
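The logic of a double dissociation can be expressed as a simple tally over individual profiles. The sketch below is illustrative: the record format and function names are hypothetical, and the records in any usage are toy data rather than participants from this study.

```python
from collections import Counter


def dissociation_counts(participants: list[dict]) -> Counter:
    """Tally participants by profile. Each participant is a dict with the
    hypothetical boolean keys 'dyslexia' and 'attention_deficit'."""
    counts = Counter()
    for p in participants:
        if p["dyslexia"] and not p["attention_deficit"]:
            counts["dyslexia_only"] += 1
        elif p["attention_deficit"] and not p["dyslexia"]:
            counts["attention_only"] += 1
        elif p["dyslexia"] and p["attention_deficit"]:
            counts["both"] += 1
        else:
            counts["neither"] += 1
    return counts


def is_double_dissociation(counts: Counter) -> bool:
    """Both single-deficit profiles must be observed at least once."""
    return counts["dyslexia_only"] > 0 and counts["attention_only"] > 0
```

Participants with both deficits do not enter the test: the argument rests only on the existence of each of the two single-deficit profiles.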

As shown in **Tables 1, 2**, we identified 55 participants who showed a double dissociation between reading and attention functions: As summarized in **Table 1**, 28 had dyslexia with normal attention functioning (12 children and 16 adults), and 27 had deficits in at least one attention function, with normal reading (10 children and 17 adults). Importantly, various types of dyslexia showed dissociations with attention: among the participants with dyslexia who had spared attention abilities there were 21 individuals with letter position dyslexia, 13 with attentional dyslexia, and 2 with neglect dyslexia, all dyslexias that have been linked by some to attention functions, as well as 12 participants with surface dyslexia, 11 with vowel dyslexia, and one woman with phonological buffer dyslexia. Appendix A details the relevant error rates in reading aloud for each of these participants: for each participant, errors of each type that occurred at a rate significantly (*p* < 0.05) higher than that of an age-matched control group appear under the relevant dyslexia type. Empty cells in Appendix A indicate no errors or a small percentage of errors within the normal range for the relevant error type (the average percentage and *SD* of each type of error for each control age group appear at the bottom of each dyslexia column).

These results suggest that each of these types of dyslexia can be dissociated from attention deficits, indicating that reading and attention are separate, and that attention deficits do not underlie these dyslexias.

As summarized in **Table 2**, it is also clear that each of the four tested attention functions could be impaired without giving rise to dyslexia. Among the 27 participants with attention deficit whose reading was intact there were 19 with deficient sustained attention, 7 with deficient selective attention, 12 with deficient orienting attention and 11 with executive attention deficit (the detailed Z-scores for each of these participants in each of the tasks are presented in Appendix B). Importantly, we demonstrated that individuals who suffer from a deficit in any of the four functions of attention (sustained, selective, orienting, and executive) may still show preserved reading. These findings further support the claim that attention deficits are not necessarily related to dyslexia.

**Table 1 | The types of dyslexia among individuals with intact attention and impaired reading (***n* **= 28).**

### **PART B: FINER GRAINED OBSERVATIONS: TYPES OF DYSLEXIA AND TYPES OF ATTENTION DEFICITS**

Considering the possible connections between specific dyslexias and specific attention deficits, one can ask whether, when an individual has both dyslexia and an attention deficit, there are consistent relations between the type of dyslexia and the type of attention function that is impaired.

As we explained in the Introduction, several possible specific relations between dyslexia and attention deficits can be inferred from the assumption that reading and attention deficits are the result of the same core deficit. The possible connections that we examined were between letter position dyslexia and selective attention, between attentional dyslexia and selective attention, between neglect dyslexia and orienting of attention, and between surface dyslexia and sustained or executive attention. We have already seen in Part A that dyslexia can be dissociated from attention deficits altogether, and hence, we can also conclude that these types of dyslexia can be dissociated from attention deficits. In the data summarized in **Table 3** we are able to explore, for individuals who have both dyslexia and an attention deficit, whether the impaired attention function was the one predicted by a general attentional hypothesis for each dyslexia with a possible attentional basis.

Starting with letter position dyslexia, where we look for relations to a selective attention deficit, **Table 3** shows that even in cases where both reading and attention are impaired, LPD does not necessarily appear with selective attention deficit. In our results, summarized in **Table 3**, 30 individuals had LPD but no selective attention deficit. A broader look at the other attention functions indicates that there was no single attention function that was impaired for all the individuals with LPD who also had an attentional deficit.

Similarly, attentional dyslexia can also be thought to stem from a deficit in selective attention. **Table 3**, however, reports on 18 individuals with attentional dyslexia who had an attention deficit but no selective attention deficit.

As for neglect dyslexia, the suspected attention function would be orienting of attention. However, the results in **Table 3** include three individuals with developmental neglect dyslexia, none of whom had a deficit in orienting of attention.

Finally, considering surface dyslexia, we suggested that a deficit in sustained attention can cause a chain of events following which children will have more limited exposure to reading, and read via grapheme-to-phoneme conversion, rather than via the lexical route. This mechanism as a basis for surface dyslexia is also not supported by our results: In **Table 3** we report 15 individuals with surface dyslexia who had an attention deficit but intact sustained attention, and 13 individuals with sustained attention deficit, who did not have surface dyslexia. As for the hypothesis according to which executive attention underlies surface dyslexia, there were 10 participants with executive attention deficit without surface dyslexia, and 21 participants with surface dyslexia without executive attention deficit.

Additionally, three participants had phonological dyslexia. It may be suggested that a deficit in grapheme-to-phoneme conversion may be related to a difficulty in the serial shift of attention from one letter to the next. Such attention function may be supported by orienting of attention. This suggestion is not supported by the findings, however. There were two phonological dyslexics who also had attention disorders: one had only selective attention deficit and one had only sustained attention deficit. More importantly, in Part A we reported a woman who had phonological (buffer) dyslexia with completely normal attention functions. Similarly, no specific attention deficit was found for vowel dyslexia or visual dyslexia.

## **DISCUSSION**

An important part of the quest into the nature of developmental dyslexia is the search for underlying causes for dyslexia. Often such causes are searched within the general cognitive abilities, and one such candidate is attention. In this research we explored the question of the relation between attention deficit and dyslexia from a neuropsychological perspective that takes into account the existence of various types of dyslexia and of various types of attention deficits. As a first step, we established a double dissociation between dyslexia in general and attention deficits in general in 55 individuals. We showed children, adolescents, and adults who had dyslexia (of any type) without attention deficits (all four attention functions were normally functioning). We then showed children, adolescents, and adults who had attention deficits (of any of the four types) without dyslexia (reading at the single word level was completely normal). Such double dissociation already indicates that attention cannot be the source of dyslexia.

**Table 3 | Continued**

*Each row represents a different participant (n = 55).*

*<sup>a</sup>In the attention tasks, Z scores smaller than −1.645 reflect deficient functioning. In the case of orienting attention Z scores smaller than −1.645 reflect deficient disengagement of attention and Z scores larger than +1.645 reflect deficient automatic orienting. That is, the colored cells represent performance level below the cutoff of 0.05.*

*<sup>b</sup>In the reading tasks, all the error rates of the various types presented in the table are significantly larger than the age-matched control groups and indicate a dyslexia of the relevant type. Empty cells indicate that there were no relevant errors at all or that the relevant error rate was within the normal range, in most cases less than 1%. The percentages of errors that appear in the table for each dyslexia are: for LPD, migrations in migratable words (except LEA, VIV, LNA, and ZYK, whose error rates refer to their migrations in nonwords); for attentional dyslexia, migrations between words in migratable word pairs; for neglect, consonant omission and substitution on the left of the word; for vowel dyslexia, vowel letter omission, substitution, migration, or addition in nonwords (except NML, ASH, VKK, ORF, AVA, ARD, and LDK, whose error rates refer to their vowel letter errors in words); for surface dyslexia, sublexical reading in irregular words and potentiophones; for phonological dyslexia, errors in nonwords; and for visual dyslexia, consonant omission, substitution, or addition (for NMR, who has visual-analyzer-output dyslexia, the error rates also included vowel letter errors and letter migrations).*

*<sup>c</sup>As can be seen when comparing the error rates of the dyslexics to those of the controls, although 4% neglect errors seem like a low error rate, unimpaired readers make almost no such errors, and hence 4% errors (5 or 6 consonant errors on the left out of the 136 words) already signifies a significant deficit, which is supported by the additional reading tests. The same holds also for visual dyslexia, where 3% visual errors (4 consonant omission, substitution, or addition errors out of the 136 words) already indicates a significant deficit.*

However, we sought to be more specific in this study, and looked for the fine relations, or lack thereof, between specific types of dyslexia and specific attention functions.

Starting with *letter position dyslexia*, an imaginable attentional source would be selective attention: if an individual cannot focus attention on a restricted area, several letters could be perceived together in the attended area, and as a result their positions may be misperceived. The results, however, do not support such an account. We have seen 21 individuals with LPD who had no attention deficits, and 7 individuals with a deficit in selective attention who had no dyslexia, including no LPD. Even in cases where both reading and attention are impaired, LPD does not necessarily appear with selective attention deficit. In our results, 30 individuals had LPD but no selective attention deficit. A broader look at the other attention functions indicates that there was no single attention function that was impaired for all the individuals with LPD who also had an attentional deficit.

Similarly, *attentional dyslexia* can be thought to stem from a deficit in selective attention. Here the suspected mechanism would be that a deficit in attenuation of the neighboring words would result in letters from the neighboring words being perceived with the target words. Here, again, our results do not support such an underlying basis for attentional dyslexia: firstly, in Part A we reported on 13 individuals who had attentional dyslexia without any attention deficit, and 7 individuals with selective attention deficit without attentional dyslexia. Secondly, we reported on 18 individuals with attentional dyslexia who did have an attentional deficit but no selective attention deficit.

In fact, these findings indicate that even the dyslexia that Shallice and Warrington (1980) termed "attentional dyslexia" is actually not attentional in nature, and can occur in individuals with no general visuo-spatial attention deficit.

Additionally, one may take the dissociations found between letter position dyslexia and attentional dyslexia as an additional indication that a general deficit in selective attention cannot be the source of these dyslexias. Had selective attention deficit been the source of both these dyslexias, we would expect them to always appear together. However, in this study there are 23 individuals with letter position dyslexia who did not have attentional dyslexia, and 4 individuals with attentional dyslexia who did not have letter position dyslexia. This double dissociation was also found in previous studies of these dyslexias: Friedmann and Rahamim (2007) and Keidar and Friedmann (2011) reported on individuals with LPD without attentional dyslexia and Friedmann et al. (2010b) reported on individuals with attentional dyslexia without LPD. Thus, this is additional evidence that these dyslexias cannot stem from the same attentional source<sup>5</sup>.

When we think of *neglect dyslexia*, the imaginable connections to attention are different. One can think of neglect dyslexia as resulting from a deficit in orienting of attention to the left visual field. In fact, data from adults with acquired dyslexia already show that neglect dyslexia at the word level can appear without visuo-spatial neglect (Kinsbourne and Warrington, 1962; De Lacy Costello and Warrington, 1987; Patterson and Wilson, 1990; Haywood and Coltheart, 2001; see a summary and discussion in Cubelli et al., 1991 and Young et al., 1991). Such a dissociation was also reported for Hebrew-speaking children and adolescents with developmental neglect dyslexia (Friedmann and Nachman-Katz, 2004; Nachman-Katz and Friedmann, 2007, 2010). The other direction of dissociation has also been reported: Primativo et al. (2013) reported seven patients with unilateral spatial neglect who did not have word-level neglect dyslexia. Such double dissociations already suggest that there are two different mechanisms underlying visuo-spatial neglect (and omissions of words on one side of text) and neglect dyslexia at the word level. The current results support these conclusions (as well as Haywood and Coltheart's, 2001 perception of word-level neglect dyslexia as separate from visuospatial neglect) from an additional angle: Part A reported two individuals with developmental neglect dyslexia with no attention deficits, and 8 individuals with a deficit in orienting of attention without any dyslexia, including no neglect dyslexia. The results of Part B include three additional individuals with developmental neglect dyslexia, none of whom had a deficit in orienting of attention, and eleven participants with deficient orienting of attention who suffer from different types of dyslexia, none of which is neglect dyslexia.

Finally, let us consider the non-specific effect that a deficit in sustained attention may have on reading. One hypothesis we raised in the Introduction was that a deficit in *sustained* attention would cause a chain of events following which children will have more limited exposure to reading. In this case, many words will not be represented in the orthographic lexicon, and they will be read via grapheme-to-phoneme conversion, leading to *surface dyslexia*-like errors. This mechanism as a basis for surface dyslexia is also not supported by our results: we saw in part A 12 individuals who had surface dyslexia but no attention deficits, and 27 individuals who had attention deficits (including 19 with sustained attention deficit) without surface dyslexia. Part B added to these results by showing 15 individuals with surface dyslexia who had an attention deficit but intact sustained attention, and 12 individuals with sustained attention deficit, but without surface dyslexia. We also found results that do not support *executive* attention as the basis for surface dyslexia: we suggested that executive attention may be responsible for keeping the reader on the lexical route and resolving conflicts in the output buffer between inputs from the lexical and sublexical routes. However, this mechanism is not borne out, as in part A there were 12 individuals who had surface dyslexia but no attention deficits, and 27 individuals who had attention deficits (including 11 with executive attention deficit) without surface dyslexia. Part B added to these results by showing 20 participants with surface dyslexia who had an attention deficit but intact executive attention, and 10 individuals with executive attention deficit and dyslexia, but without surface dyslexia.

<sup>5</sup>The double dissociation between LPD and attentional dyslexia indicates that it cannot be the case that selective attention underlies both dyslexias but only one of them is affected in some cases because it is more sensitive to the attention deficit.

The other hypothesis, according to which sustained attention deficit would lead to a garden variety of errors in reading, is also not supported by our results, especially given the 19 participants in Part A who had sustained attention deficit but no dyslexia at all.

In addition, it was found that the different attention deficits are dissociable from one another (i.e., there were participants who were impaired in a single attention function). Most importantly, it seems that selective attention and orienting of attention, two spatial attention functions that are sometimes treated as interchangeable, are separate. We reported (in **Tables 2, 3**) 22 participants who showed no significant orienting deficit yet demonstrated selective attention deficit and 15 participants who showed the opposite pattern.

Thus, we saw double dissociations between letter position dyslexia and attention, including a double dissociation with selective attention; between attentional dyslexia and attention, including a double dissociation with selective attention; between word-based left neglect dyslexia and attention, including a double dissociation with orienting of attention; and between surface dyslexia and attention, including a double dissociation with sustained attention and with executive attention, as well as dissociations between vowel dyslexia and phonological dyslexia and attention. These results show that each of these dyslexias can occur with intact attention, indicating that attention deficits do not underlie these dyslexias. These results may suggest that the impairment in dyslexias such as letter position dyslexia, attentional dyslexia, or neglect dyslexia lies in an attention component that is specific to reading, an *orthographic-attention*. Such an approach is also consistent with previous findings that describe letter migrations in words without digit migrations in numbers (Friedmann et al., 2010a); neglect of the left side of words without neglect of the left side of numbers (Friedmann and Nachman-Katz, 2004; Nachman-Katz and Friedmann, 2008); findings according to which MPH improves visuo-attention functions but not reading errors in letter position dyslexia or attentional dyslexia (Keidar and Friedmann, 2011); and findings according to which adults with developmental dyslexia perform more poorly than controls on a partial report task only with letter strings but not with symbol strings (Collis et al., 2013).

The differences between the findings of the current study and previous studies that reported comorbidities between reading and attention can be ascribed to several factors. Firstly, whereas previous studies focused on the group level and looked for correlations, we examined the question at the individual level and focused on the search for dissociations as a tool to examine whether attention deficit underlies dyslexia. Another difference relates to the level at which reading disorders were examined. We examined dyslexia and hence tested errors in reading at the single word (and nonword) level, whereas some of the previous studies that found a relation between attention and reading tested reading speed, which may be affected by attention, and reading comprehension at the text level.

This research is the first to assess the intricate relations between types of dyslexia and types of attention deficits, and it has demonstrated how important it is to assess the personal reading and attention profiles of children and adults with reading and/or attention deficits. We have demonstrated that the different types of dyslexia are dissociated from the different attention deficits and that individuals who suffer from a reading disorder, attention deficit or both can be characterized by various reading and attention profiles. The sensitive identification of detailed reading and attention profiles may significantly improve the ability to select personalized tailor-made interventions that will aim at facilitating reading as well as other everyday functioning.

## **ACKNOWLEDGMENTS**

This article is dedicated to the precious memory of Limor Lukov, our much-loved and appreciated student and teacher, an avid proponent of the dissociability of dyslexias from attention disorders and from phonological disorders. The article is mainly based on Limor's PhD research. This research was supported by the Israel Science Foundation (grant no. 837/04, Shalev; 1066/14, Friedmann), by the Lieselotte Adler Laboratory for Research on Child Development, and by the Australian Research Council Centre of Excellence for Cognition and its Disorders (CE110001021).

## **REFERENCES**


eye fixation duration in reading. *Neuropsychol. Rehabil.* 12, 177–197. doi: 10.1080/09602010244000002


Zoccolotti, P., De Luca, M., Di Pace, E., Judica, A., Orlandi, M., and Spinelli, D. (1999). Markers of developmental surface dyslexia in a language (Italian) with high grapheme–phoneme correspondence. *Appl. Psycholinguist.* 20, 191–216. doi: 10.1017/S0142716499002027

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 May 2014; paper pending published: 27 July 2014; accepted: 13 October 2014; published online: 12 January 2015.*

*Citation: Lukov L, Friedmann N, Shalev L, Khentov-Kraus L, Shalev N, Lorber R and Guggenheim R (2015) Dissociations between developmental dyslexias and attention deficits. Front. Psychol. 5:1501. doi: 10.3389/fpsyg.2014.01501*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Lukov, Friedmann, Shalev, Khentov-Kraus, Shalev, Lorber and Guggenheim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX A**

**Table A1 | Percentage of relevant errors in reading aloud of each of the 28 participants with dyslexia and with normal attention functions (summarized in Table 1).**


*<sup>a</sup>All the error rates of the various types presented in the table are significantly larger than the age-matched control groups for this error type (p < 0.05), and indicate a dyslexia of the relevant type. Empty cells indicate that there were no relevant errors at all or that the relevant error rate was within the normal range, in most cases less than 1%. The percentages of errors that appear in the table for each dyslexia are: for LPD, migrations in migratable words; for attentional dyslexia, migrations between words in migratable word pairs; for neglect, consonant omission and substitution on the left of the word; for vowel dyslexia, vowel letter omission, substitution, migration, or addition; for surface dyslexia, sublexical reading in irregular words and potentiophones; and for phonological buffer dyslexia, errors in non-words.*

*<sup>b</sup>The percentages of vowel errors of LGB, SHL, NLK, ALM, and NKV refer to their percentage errors in reading words; for the other vowel dyslexic participants the percentages are from nonwords.*

## **APPENDIX B**

**Table B1 | Percentage of relevant z-scores in the attention tasks of each of the 27 participants with attention disorders and with normal reading (summarized in Table 2).**


*LEB did not participate in a selective attention task; NHP did not complete the orienting attention task.*

## **APPENDIX C**

**Table C1 | Examples for the various types of Hebrew stimuli used in the reading tasks.**


*Each example shows the Hebrew target word and the word or words created by the relevant error, followed by orthographic transliteration, phonological transcription, and translation.*

## Visual processing of multiple elements in the dyslexic brain: evidence for a superior parietal dysfunction

## *Muriel A. Lobier<sup>1,2</sup>\*, Carole Peyrin<sup>1,3</sup>, Cédric Pichat<sup>1,3</sup>, Jean-François Le Bas<sup>4</sup> and Sylviane Valdois<sup>1,3</sup>*

*<sup>1</sup> Laboratoire de Psychologie et NeuroCognition, Université Grenoble Alpes, Grenoble, France*

*<sup>2</sup> Neuroscience Center, University of Helsinki, Helsinki, Finland*

*<sup>3</sup> CNRS, Laboratoire de Psychologie et NeuroCognition, UMR5105, Grenoble, France*

*<sup>4</sup> INSERM U836/Université Joseph Fourier – Institut des Neurosciences, Grenoble, France*

### *Edited by:*

*Donatella Spinelli, Università di Roma Foro Italico, Italy*

### *Reviewed by:*

*Kristen Pammer, The Australian National University, Australia; Fabio Richlan, University of Salzburg, Austria*

### *\*Correspondence:*

*Muriel A. Lobier, Neuroscience Center, P. O. Box 56, FI-00014 University of Helsinki, Finland e-mail: muriel.lobier@gmail.com*

The visual attention (VA) span deficit hypothesis of developmental dyslexia posits that impaired multiple element processing can be responsible for poor reading outcomes. In VA span impaired dyslexic children, poor performance on letter report tasks is associated with reduced parietal activations for multiple letter processing. While this hints towards a non-specific, attention-based dysfunction, it is still unclear whether reduced parietal activity generalizes to other types of stimuli. Furthermore, putative links between reduced parietal activity and reduced ventral occipito-temporal (vOT) activity in dyslexia have yet to be explored. Using functional magnetic resonance imaging, we measured brain activity in 12 VA span impaired dyslexic adults and 12 adult skilled readers while they carried out a categorization task on single or multiple alphanumeric or non-alphanumeric characters. While healthy readers activated parietal areas more strongly for multiple than single element processing (right-sided for alphanumeric and bilateral for non-alphanumeric), similar stronger multiple element right parietal activations were absent for dyslexic participants. Contrasts between skilled and dyslexic readers revealed significantly reduced right superior parietal lobule (SPL) activity for dyslexic readers regardless of stimuli type. Using a priori anatomically defined regions of interest, we showed that neural activity was reduced for dyslexic participants in both SPL and vOT bilaterally. Finally, we used multiple regressions to test whether SPL activity was related to vOT activity in each group. In the left hemisphere, SPL activity covaried with vOT activity for both normal and dyslexic readers. In contrast, in the right hemisphere, SPL activity covaried with vOT activity only for dyslexic readers. These results bring critical support to the VA interpretation of the VA Span deficit. In addition, they offer new insight into how deficits in automatic vOT-based word recognition could arise in developmental dyslexia.

**Keywords: developmental dyslexia, visual attention, reading, superior parietal lobes**

## **INTRODUCTION**

Developmental dyslexia is a severe, persistent reading disability: dyslexic children and adults do not acquire efficient, fluent reading despite adequate schooling and intelligence. A large body of research has supported difficulties with language processing (Bishop and Snowling, 2004) and more specifically with phonological processing of oral language as the core deficit in dyslexia (Ramus, 2003; Vellutino et al., 2004; Ramus and Szenkovits, 2008). Accordingly, numerous studies have reported links between phonological deficits and neural dysfunction of left hemisphere language areas in developmental dyslexia (see Démonet et al., 2004; Maisog et al., 2008; Richlan et al., 2009, 2011 for reviews). In addition, developmental dyslexia has been associated with disrupted activity in the left ventral occipitotemporal (vOT) cortex (Richlan et al., 2009, 2011; van der Mark et al., 2011), which is thought to subserve visual processing of letter strings (Dehaene and Cohen, 2011). However, in accordance with multifactorial accounts of dyslexia (Pennington, 2006; Menghini et al., 2010; Vidyasagar and Pammer, 2010), recent research has hinted at a possible visual component to the core deficit in dyslexia. Various deficits in visual attention (VA) and visual processing have been identified in dyslexic individuals, supporting different visual-attentional models of developmental dyslexia (Hari and Renvall, 2001; Facoetti et al., 2006, 2008; Boden and Giaschi, 2007; Bosse et al., 2007; Vidyasagar and Pammer, 2010). Most of these models assume the co-occurrence of VA and phonological deficits in dyslexic individuals. The exception is the VA span model, which posits that a deficit in multi-element (ME) visual processing can account for reading acquisition problems in a subset of dyslexic individuals who otherwise have preserved phonological skills (Valdois et al., 2004, 2014b; Bosse et al., 2007).

Indeed, according to both case studies (Valdois et al., 2003; Dubois et al., 2010) and group studies (Bosse et al., 2007; Lassus-Sangosse et al., 2008), a subset of dyslexic children suffers from a selective deficit in multiple letter report tasks, independently from any phonological deficit. Performance on report tasks is interpreted as indexing the number of individual elements that can be processed in parallel, i.e., the VA Span. Impaired performance is thus viewed as a consequence of a reduced VA Span: dyslexic children cannot process as many letters in parallel as normal reading children. Furthermore, within the theoretical framework of the MultiTrace Memory (MTM) model (Ans et al., 1998), a reduced VA Span also results in impaired reading performance. According to the MTM model, letters of a word are processed in parallel through a visual-attention window. In expert readers, the size of this window adapts to the length of the to-be-read word in order to encompass all of its letter string. If the to-be-read word is unfamiliar, the window's size is subsequently reduced to cover fewer letters and focus on the word orthographic units (letters, graphemes, or syllables). Reading then switches from a fast, parallel procedure to a slow, serial identification of successive orthographic units. If a deficit in visual processing capacity limits the ability of the visual-attention window to spread over a whole word, then words cannot be identified by a fast, parallel procedure resulting in impaired reading ability (for a more detailed and complete theoretical overview of the role of VA Span in impaired reading, see Valdois et al., 2004).

The VA Span definition places no constraints on the visual elements to which it refers: they may be letters or other visual elements. In turn, the VA Span deficit hypothesis posits that the ME processing deficit it evidences extends to any type of visual element, independently of its lexical nature. However, it has been suggested that low performance in letter report tasks using both verbal report and verbal stimuli (letters or digits) follows not from a deficit in visual processing but from impaired mapping of visual codes onto phonology (Hawelka and Wimmer, 2008; Ziegler et al., 2010). This hypothesis is supported by data suggesting that normal readers' performance on a two alternative forced choice partial report task is higher than dyslexic readers' for letters and digits but not symbols (Ziegler et al., 2010). However, other studies have brought forward evidence for a ME deficit that extends to non-verbal tasks and stimuli. Dyslexic adults and children are impaired on a symbol-string matching task requiring no verbal report (Pammer et al., 2004; Jones et al., 2008). A recent study used a non-verbal ME visual processing task to explore visual processing performance on non-verbal character strings in dyslexic children chosen to have a VA span disorder (Lobier et al., 2012b). In this task, a five element string made up of characters belonging to two different categories (e.g., pseudoletters/unknown geometrical shapes, letters/digits) was displayed for 200 ms and then masked. Participants were asked to identify how many characters in the displayed string belonged to a previously designated target category. VA span impaired dyslexic children showed lower performance than age-matched controls, regardless of target character category. 
Since this categorization task required no verbal response and since no visual to phonological code mappings exist for novel target characters, these results argue strongly for an underlying visual processing impairment in the VA Span deficit (see Valdois et al., 2012, for converging evidence against the visual to phonological code mapping hypothesis). The prevalence of the VA Span deficit in the dyslexic population has been previously estimated in cohorts of dyslexic children. Around a third of dyslexic children were found to exhibit an isolated VA Span deficit in French (Bosse et al., 2007; Zoubrinetzky et al., 2014), British (Bosse et al., 2007), and Brazilian Portuguese (Germano et al., submitted) cohorts.

Abnormal neural activity in brain areas associated with VA in VA Span impaired children has provided additional evidence for VA as a constraining factor of VA Span performance in dyslexia. Neural correlates of the VA Span deficit were first explored in a functional magnetic resonance imaging (fMRI) study comparing neural activity for a flanked letter categorization task between normal reading and VA Span impaired dyslexic children (Peyrin et al., 2011). VA mechanisms involved in multiletter processing were assessed using a task that minimized verbal report and phonological processing. Results showed that superior parietal lobule (SPL) activity was reduced bilaterally in dyslexic children compared to controls. Importantly, a recent case report (Peyrin et al., 2012) suggested that this SPL dysfunction is specific to the VA span deficit rather than to dyslexia. Neural activity for the same visual categorization task was assessed in two dyslexic adults with distinct neurocognitive profiles. SPL activity was normal for the patient with a phonological deficit but preserved VA span performance, whereas it was decreased for the patient with a VA span deficit but preserved phonological performance.

The co-occurrence of poor multiple letter report performance and SPL dysfunction is consistent with a visuo-attentional account of the VA span disorder. SPL activity has not only been associated with visuo-spatial attention (Wojciulik and Kanwisher, 1999; Corbetta and Shulman, 2002; Behrmann et al., 2004) but also, more specifically, with ME processing (Mitchell and Cusack, 2008; Xu and Chun, 2009; Scalf and Beck, 2010). Closer to the cognitive demands of reading, SPL activity relates to length effects in pseudo-word reading (Valdois et al., 2006) and is observed in proficient readers when parallel identification of a word's letters is compromised (Cohen et al., 2008; see also Gaillard et al., 2006). If SPL plays a role in reading acquisition, it should show different patterns of activation for different levels of reading proficiency. Indeed, less proficient readers have stronger bilateral (children vs. adults; see Church et al., 2008) or right-lateralized (adult ex-illiterates vs. literates; see Dehaene et al., 2010) posterior parietal activity than more proficient readers. In addition, activity in left SPL and right IPL/SPL clusters is negatively correlated with reading proficiency (Jobard et al., 2011). In line with this putative role of SPL in reading acquisition, Brem et al. (2010) report activity peaks in right SPL for visual word processing in children learning to read. In Chinese, Cao et al. (2010) show developmental increases in bilateral SPL activity during visuo-orthographic processing and stronger involvement of the right SPL during the visual comparison of two-character words than during phonological processing of the same words.

We recently showed stronger SPL involvement for pre-orthographic processing of multiple character strings than of single flanked characters, for both alphanumeric (AN) and non-alphanumeric (nAN) characters (Lobier et al., 2012a). However, reduced SPL activity in dyslexic individuals has so far only been reported for multiple letter processing, which cannot distinguish a general ME impairment from a more specific letter processing impairment. A stronger argument for a VA dysfunction as the underlying factor in VA Span impairment would be made by showing a similar SPL dysfunction in dyslexic participants on a non-verbal ME task using both verbal and non-verbal stimuli.

The main aim of this study is to use non-verbal categorization tasks to isolate the underlying neural dysfunction in the VA Span disorder in dyslexia using fMRI. VA span impaired dyslexic adults and healthy skilled adult readers carried out a visual categorization task with either alphanumeric, familiar characters or non-alphanumeric, unfamiliar characters. In order to isolate neural correlates specific to parallel processing of MEs, the task had two conditions: a ME categorization condition of interest and a single-element (SE) categorization control condition. Both conditions were carried out with either AN or nAN characters. While both the experimental and control conditions required visual categorization of the attended stimuli, only the experimental condition required processing of several elements. Contrasts between these conditions should highlight neural activations that are specific to ME processing demands.

Our central hypothesis is that the VA span deficit is associated with disrupted SPL activity for pre-orthographic multiple character processing regardless of character type. In line with previous studies, we expect to find abnormal parietal activations for multiple-element processing for the dyslexic group. More importantly, these abnormal brain activations should be found regardless of stimuli type. We first contrasted whole-brain neural activity between VA span impaired dyslexic adults and control normal-reading adults. In addition, we used regions of interest (ROIs) to compare activity in inferior parietal and superior parietal cortices more specifically between groups. Finally, since abnormal activity in the vOT cortex is commonly reported for dyslexic readers, we also used ROIs to test whether SPL activity was correlated with vOT activity.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

Twelve dyslexic (mean age 21.6 ± 4.2 years) and twelve healthy, skilled adult readers (mean age 23.8 ± 2.6 years) took part in this study. They were all right-handed and had normal or corrected-to-normal vision. All participants gave informed consent and received 60 Euros for their participation. Dyslexic participants were recruited through the university disabilities office. They had previously undergone a complete neuropsychological assessment to establish the diagnosis of developmental dyslexia and the presence of a VA span disorder while ruling out any co-morbid attentional disorders (e.g., ADHD). The diagnosis of developmental dyslexia was established using both inventories and testing procedures in accordance with the guidelines of the ICD-10 classification of Mental and Behavioral disorders. Reading speed was estimated for all participants using the "Alouette" text (Lefavrais, 1965), which requires reading a 265-word text as quickly and as accurately as possible within 3 min. Control participants had no reported learning or reading disability. Reading speed for dyslexic participants was significantly lower than for control participants (Dyslexic: Mean = 119 wpm, 95%CI = [103–135], Controls: Mean = 202 wpm, 95%CI = [185–219], *t*(22) = 7.9, *p* < 0.0001). This study was approved by the local ethics committee.

### **VISUAL ATTENTION SPAN ASSESSMENT**

All participants carried out a global letter report task in order to assess their VA span abilities. Ceiling effects are often observed for adults on the 5-letter report task used in previous studies with children (Valdois et al., 2003; Bosse et al., 2007). For this reason, a 6-letter report task was developed for testing adults (Peyrin et al., 2012). Stimuli were random 6-consonant strings presented in black upper-case letters on a white background. At the start of each trial, a central fixation point was displayed for 1000 ms followed by a 50 ms blank screen. A horizontal 6-letter string was then presented for 200 ms, centered on fixation. Participants were asked to report all the letters they had seen with no time pressure. Ten training and 24 experimental trials were carried out. Experimental stimuli were 24 consonant strings built-up from 10 consonants (BPTFLMDSRH). An additional 10 different letter strings were used for training. Score was the number of accurately reported letters, regardless of order (maximum score: 144).
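The scoring rule described above (letters counted regardless of order, 24 trials of 6 letters, maximum score 144) can be illustrated with a minimal Python sketch. The function names and example strings are hypothetical, and crediting each target letter at most once through a multiset intersection is an assumption; the text does not specify how repeated or extra responses were handled.

```python
from collections import Counter

def score_trial(target: str, response: str) -> int:
    """Count reported letters that appear in the target, ignoring order."""
    # Multiset intersection: each target letter can be credited at most once.
    overlap = Counter(target.upper()) & Counter(response.upper())
    return sum(overlap.values())

def score_session(targets, responses) -> int:
    """Sum trial scores over a session (24 trials x 6 letters gives a maximum of 144)."""
    return sum(score_trial(t, r) for t, r in zip(targets, responses))

# Hypothetical example strings built from the consonant set BPTFLMDSRH.
targets = ["BPTFLM", "DSRHBP"]
responses = ["PBTX", "HRSD"]
print(score_session(targets, responses))  # 7
```

Dividing a session score by the 24 experimental trials gives the mean number of letters reported per trial.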

The VA span performance of the participants was compared to normative data from the EVADYS diagnostic tool (Valdois et al., 2014a). Every control participant scored within 1 SD of the norm on the VA span task. The dyslexic participants' VA Span abilities were at least 1.65 standard deviations below adult norms. Performance on the 6-letter whole report task indexing ME processing capacity (VA Span) was significantly lower for dyslexic (3.5 letters per trial on average) than for control (5.3 letters) participants (Dyslexic: Mean score = 84, 95%CI = [74–94], Control: Mean score = 128, 95%CI = [123–133], *t*(16.4) = 9.0, *p* < 0.0001).

### **fMRI STUDY**

### *Stimuli*

Four different character categories were used: letters, digits, Japanese Hiragana, and pseudo-letters, with five different characters in each category. While participants had extensive multiple character processing experience with two categories (letters and digits), the other two were completely novel. The font used for letters and digits was Arial. Letters were drawn from the following set of five consonants: D, F, K, M, and V. Digits were drawn from the following set of five digits: 3, 5, 6, 8, and 9. Pseudo-letters were taken from a set created by Hawelka and Wimmer (2008) by cutting and rearranging letter visual features. The five characters created from consonants D, F, K, M, and V made up the pseudo-letter set. The five Hiragana characters were chosen amongst the 48 possible characters of the Hiragana syllabary so that their mean visual complexity, as defined by Majaj et al. (2002), was similar to that of the other character sets. Character perimetric complexity is a reliable predictor of character recognition efficiency (Pelli et al., 2006): character sets with similar average perimetric complexity are recognized with similar efficiency.

For the ME condition, strings of five characters were built up from these sets. There were 48 AN strings and 48 nAN strings. Out of the 48 AN strings, 24 were consistent and 24 were inconsistent. Consistent strings were made up exclusively of letters and digits. Twelve of the consistent strings contained three letters and two digits and the other 12 contained two letters and three digits. Inconsistent strings were made up of letters, digits and one distractor character, either Hiragana or pseudo-letter. Twelve of the inconsistent strings contained two letters, two digits and one distractor character and the other 12 contained three letters, one digit and one distractor character. The position and choice of the distractor character was controlled across trials. Similarly, individual character positions were counterbalanced across consistent and inconsistent trials. The 48 nAN strings were built up the same way as the AN ones, with pseudo-letters and Hiragana replacing letters and digits. Distractor characters were then letters and digits. For the SE condition, stimuli were made up of one central character surrounded by four pound (#) signs. There were 48 strings: 24 with a central AN character (12 letters, 12 digits) and 24 with a central nAN character (12 pseudo-letters, 12 Hiragana). For all stimulus strings, characters subtended a visual angle of 0.7°. To minimize visual crowding, the distance between adjacent characters was 0.57°. The entire string subtended a visual angle of 5.4° and was drawn in white on a black background.
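The composition of the 48 AN strings described above can be sketched as follows. This is only an illustration of the design counts: the placeholder distractor symbols, the function name, sampling characters without repetition within a string, and the random shuffling of positions are all assumptions (the study counterbalanced positions rather than randomizing them).

```python
import random

LETTERS = list("DFKMV")
DIGITS = list("35689")
# Placeholder symbols standing in for the Hiragana and pseudo-letter distractors.
DISTRACTORS = ["h1", "h2", "p1", "p2"]

# (letters, digits, distractors, number of strings) for the 48 AN strings.
AN_DESIGN = [
    (3, 2, 0, 12),  # consistent
    (2, 3, 0, 12),  # consistent
    (2, 2, 1, 12),  # inconsistent
    (3, 1, 1, 12),  # inconsistent
]

def build_an_strings(seed: int = 0):
    """Return 48 five-character AN strings following the design counts."""
    rng = random.Random(seed)
    strings = []
    for n_letters, n_digits, n_distractors, count in AN_DESIGN:
        for _ in range(count):
            chars = (rng.sample(LETTERS, n_letters)
                     + rng.sample(DIGITS, n_digits)
                     + rng.sample(DISTRACTORS, n_distractors))
            rng.shuffle(chars)  # random positions, not counterbalanced
            strings.append(chars)
    return strings

an_strings = build_an_strings()
n_consistent = sum(all(c in LETTERS + DIGITS for c in s) for s in an_strings)
print(len(an_strings), n_consistent)  # 48 24
```

Every string contains either two or three letters, which matches the two response options used in the ME condition.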

### *Procedure*

A task requiring visual categorization of characters was carried out in two conditions: ME and SE (see **Figure 1**). Stimuli were displayed for 200 ms to prevent useful ocular saccades and serial visual processing. Stimulus display was driven by E-Prime software (E-Prime Psychology Software Tools, Inc., Pittsburgh, USA). Synchronization between scanner and paradigm was ensured by a trigger pulse sent from the scanner to the computer on which E-Prime was running. The paradigm was presented using a video projector (Epson EMP 8200), a projection screen situated behind the magnet and a surface mirror centered above the participant's eyes. A response key was used to collect participant responses. Response accuracy and reaction times (RT, in milliseconds) were recorded.

In the ME condition, visual categorization of individual characters of a ME string was required. Performance was monitored by asking participants to report the number of target category characters present in the stimulus string. For AN strings, participants were asked to report the number of letters present in a letter and digit 5-character string. For nAN strings, participants were asked to report the number of Hiragana characters in a Hiragana and pseudo-letter character string. Participants pressed the index finger button for two target-category characters and the middle finger button for three target-category characters. They carried out 48 trials for each condition, half with two target characters and half with three target characters. Trial order was pseudo-randomized.

In the SE condition, visual categorization of a single character flanked by pound signs was required. Performance was monitored by asking participants to report whether or not the stimulus character belonged to either one of two target categories (AN: letters or digits, nAN: Hiragana or pseudo-letters). If the stimulus character belonged to a target category, participants pressed the index finger button. If it did not, they pressed the middle finger button. They carried out 48 trials for each condition, half of which contained a target category character. Trial order was pseudo-randomized. This condition was designed to control for three important task characteristics. First, low-level visual stimulation was similar to the ME condition: five characters were displayed (four pound signs and a central stimulus character). Second, motor response was the same for both tasks. Last, both conditions required character categorization, controlling for higher-order categorization processing.

Immediately before the scanning session, participants took part in a 45 min training session. Participants first performed two character-identification tasks in order to familiarize themselves with the two unfamiliar character types. During the second part of training, participants were familiarized with the experimental task. For each condition (ME and SE) and each character type (AN and nAN), they first carried out five training trials followed by a sequence of 48 trials with the same timing as the experimental sequence (but different stimulus strings).

### **EVENT-RELATED fMRI EXPERIMENTAL DESIGN**

Each participant carried out four event-related fMRI (ER-fMRI) sessions: two to assess ME processing (one for AN and one for nAN characters) and the other two to assess SE processing (one for AN and one for nAN characters). The order of fMRI sessions was counterbalanced across participants. Stimulus onsets were optimized using pseudo-randomized ER-fMRI paradigms (Friston et al., 1999). For each session, 48 stimulus strings were displayed: 24 consistent and 24 inconsistent. In order to provide an appropriate baseline measure (Friston et al., 1998), 27 null-events (three of them at the end of the session) were included in each session. These null-events comprised a black screen and a fixation dot displayed at the center of the screen. The stimulus onset asynchrony (SOA) between consecutive events was set to 3 s. SOAs between trial events were therefore 3, 6, or 9 s, depending on the presence of null-events. To reduce eye movements, participants were asked to fixate the fixation dot during null-events. In order to stabilize the magnetic field, each functional run started with five dummy scans that were discarded before analysis. After these dummy scans, 90 functional volumes were acquired for each run. Each functional session lasted 3 min 45 s.
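As a quick consistency check on the session timing, the short sketch below recomputes the session duration in two ways: from the number of volumes and the TR of 2.5 s reported in the MR acquisition section, and from the number of events and the 3 s SOA. The variable names are illustrative only.

```python
TR_S = 2.5          # repetition time in seconds (from the MR acquisition section)
N_VOLUMES = 90      # functional volumes kept per session
N_TRIALS = 48       # stimulus strings per session
N_NULL_EVENTS = 27  # null-events per session
SOA_S = 3.0         # onset asynchrony between consecutive events, in seconds

duration_from_volumes = N_VOLUMES * TR_S
duration_from_events = (N_TRIALS + N_NULL_EVENTS) * SOA_S

minutes, seconds = divmod(duration_from_volumes, 60)
print(duration_from_volumes, duration_from_events)  # 225.0 225.0
print(int(minutes), int(seconds))                   # 3 45
```

Both computations give 225 s, in agreement with the reported session length of 3 min 45 s.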

### **MR ACQUISITION**

A whole-body 3T MR scanner was used (Bruker MedSpec S300) with 41 mT/m maximum gradient strength and 120 mT/m/s maximum slew rate. For functional scans, the manufacturer-provided gradient-echo/T2\*-weighted EPI method was used. Thirty-nine adjacent axial slices parallel to the bi-commissural plane were acquired in interleaved mode. Slice thickness was 3.5 mm. The in-plane voxel size was 3 mm × 3 mm (216 × 216 mm field of view acquired with a 72 × 72 pixels data matrix; reconstructed with zero filling to 128 × 128 pixels). The main sequence parameters were: TR = 2.5 s, TE = 30 ms, flip angle = 80°. To correct images for geometric distortions induced by local B0 inhomogeneity, a B0 fieldmap was derived from two gradient echo data sets acquired with a standard 3D FLASH sequence (ΔTE = 9.104 ms). The fieldmap was subsequently used during data processing. Finally, a T1-weighted high-resolution three-dimensional anatomical volume was acquired using a sagittal magnetization-prepared rapid acquisition gradient echo (MP-RAGE) sequence (field of view = 256 × 224 × 176 mm; resolution = 1.333 × 1.750 × 1.375 mm; acquisition matrix: 192 × 128 × 128 pixels; reconstruction matrix = 256 × 128 × 128 pixels).
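The reported voxel sizes follow from dividing the field of view by the acquisition matrix along each axis. The sketch below reproduces that arithmetic, assuming that the functional in-plane field of view is expressed in millimetres; the helper name is hypothetical.

```python
def voxel_size(fov_mm, matrix):
    """Voxel size along each axis: field of view divided by matrix size."""
    return tuple(round(f / m, 3) for f, m in zip(fov_mm, matrix))

# Functional EPI: 216 x 216 field of view (assumed mm), 72 x 72 data matrix.
print(voxel_size((216, 216), (72, 72)))              # (3.0, 3.0)

# Anatomical MP-RAGE: 256 x 224 x 176 mm, 192 x 128 x 128 acquisition matrix.
print(voxel_size((256, 224, 176), (192, 128, 128)))  # (1.333, 1.75, 1.375)
```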

### **DATA PROCESSING**

Both preprocessing and statistical analyses of the data were performed using the Statistical Parametric Mapping software (SPM5, Wellcome Department of Imaging Neuroscience, London, UK; http://www.fil.ion.ucl.ac.uk/spm; Friston et al., 1994). Functional volumes were slice-time corrected using the 20th slice as reference. All volumes were then realigned using rigid body transformations to correct for head movement, using the first ER-fMRI session as the reference volume. The T1-weighted anatomical volume was co-registered to the realigned mean images and normalized to MNI space using a trilinear interpolation. The anatomical normalization parameters were then used for functional volume normalization. Finally, each functional volume was smoothed by an 8-mm FWHM (Full Width at Half Maximum) Gaussian kernel. Time series for each voxel were high-pass filtered (1/128 Hz cut-off) to remove low-frequency noise and signal drift.
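For readers who implement smoothing outside SPM, a kernel specified by its FWHM has to be converted into a Gaussian standard deviation. The sketch below shows the standard conversion for the 8-mm kernel; the function name is hypothetical and the snippet is not part of the SPM5 pipeline itself.

```python
import math

def fwhm_to_sigma(fwhm: float) -> float:
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(8.0)
print(round(sigma_mm, 2))  # 3.4

# Sanity check: the Gaussian falls to half its peak at a distance of FWHM / 2.
half_height = math.exp(-(4.0 ** 2) / (2.0 * sigma_mm ** 2))
print(round(half_height, 3))  # 0.5
```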

### **STATISTICAL ANALYSES**

### *Whole-brain analyses*

Statistical analyses were performed on the pre-processed functional images for each one of the four sessions. For each session (ME AN and nAN, SE AN and nAN), consistency (consistent and inconsistent character strings) was modeled as a regressor convolved with a canonical hemodynamic response function (HRF). Movement parameters computed during the realignment corrections (three translations and three rotations) were included in the design matrix of each session as additional regressors. Parameter estimates of activity were generated using the general linear model at each voxel for each condition and each participant. Linear contrasts between the HRF estimates for the different experimental sessions were used to generate statistical parametric maps. All analyses were carried out with consistent and inconsistent trials separately as well as together. Results did not differ qualitatively between analyses; however, all results presented here (behavioral and fMRI) were computed using consistent trials only.
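To make the modeling step concrete, the sketch below builds one event regressor by summing a hemodynamic response function triggered at each trial onset, which is equivalent to convolving a train of onset impulses with that function, and samples it once per volume. The double-gamma shape and its parameters are an assumption that follows a commonly used form; they are not taken from the article, and the onsets and function names are hypothetical.

```python
import math

def double_gamma_hrf(t: float) -> float:
    """Double-gamma HRF: a response peaking near 5 s minus a late undershoot."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)          # gamma density, shape 6
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)  # gamma density, shape 16
    return peak - undershoot / 6.0

def build_regressor(onsets_s, n_volumes=90, tr_s=2.5):
    """Sample the sum of HRFs triggered at each onset at every volume time."""
    times = [i * tr_s for i in range(n_volumes)]
    return [sum(double_gamma_hrf(t - onset) for onset in onsets_s) for t in times]

# Hypothetical onsets (in seconds) for a few consistent-string trials.
onsets = [3.0, 12.0, 18.0, 30.0]
regressor = build_regressor(onsets)
print(len(regressor))  # 90
```

In a full analysis, one such column per modeled condition, together with the six movement parameters and a constant term, would form the design matrix fitted at each voxel.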

At the individual level, statistical parametric maps were computed for several contrasts of interest. The entire cerebral network associated with ME processing was assessed by contrasting the ME condition to baseline (fixation point) conjointly for both character types (AN and nAN). The cerebral network associated with SE processing was assessed by contrasting the SE condition to baseline conjointly for both character types (AN and nAN). We identified brain regions involved more specifically in attention-demanding simultaneous processing by contrasting the ME to the SE condition for each character type. We then performed separate random-effect group analyses for control and dyslexic participants on the contrast images from individual analyses (Friston et al., 1998), using one-sample *t*-tests. Clusters of activated voxels were identified for each group, based on the intensity of the individual responses (contrasts against baseline: voxel-wise threshold: *p* < 0.001 uncorrected for multiple comparisons, *T* > 4.0, with a cluster extent threshold correction of *p* < 0.05; contrasts between conditions: voxel-wise threshold: *p* < 0.001 uncorrected for multiple comparisons, *T* > 4, with a cluster extent threshold of 20 voxels). Finally, two-sample *t*-tests were performed in order to statistically compare brain activity between controls and dyslexics on the relevant contrasts. Significance thresholds for between-group comparisons (voxel-wise threshold: *p* < 0.001 uncorrected for multiple comparisons, *T* > 3.5, with a cluster extent threshold of 20 voxels) were chosen by reference to previous studies reporting activation differences between skilled and dyslexic readers (Hoeft et al., 2007; van der Mark et al., 2009; Wimmer et al., 2010). For all analyses, brain regions were reported according to the Automated Anatomical Labelling SPM toolbox (Tzourio-Mazoyer et al., 2002).

### *A priori ROIs*

The analysis was completed by statistically comparing activity for skilled and dyslexic readers within *a priori* anatomical ROIs. A first set of four ROIs was defined using predefined masks from the Wake Forest University (WFU) PickAtlas (Maldjian et al., 2003). ROI masks were created with the automated anatomical labeling atlas, which uses an anatomical parcellation of the MNI MRI single-subject brain and sulcal boundaries to define each anatomical volume. In order to assess neural activity in the part of the vOT cortex usually associated with character string processing, a second set of two *a priori* ROIs was defined by rectangular boxes. These ROIs were designed in reference to previous research (Jobard et al., 2003; Cai et al., 2010) within the bilateral fusiform and inferior temporal gyri rather than by anatomical boundaries. Parameter estimates (percent signal change) of event-related responses were then extracted from all ROIs for each participant. We both compared ROI activity between groups and tested whether activity levels in SPL covaried with activity levels in vOT. All ROIs were constructed using the SPM MarsBaR toolbox (http://marsbar.sourceforge.net).

To investigate the presence of neural dysfunction in dyslexic participants, we first compared ROI activity between groups across different task conditions. To investigate putative links between neural activity in superior parietal cortex and in ventral occipito-temporal cortex for ME processing, we used multiple regression analyses to test whether percent signal change for the ME condition in SPL ROIs significantly predicted percent signal change in vOT ROIs while taking into account the putative effect of stimulus type. We ran separate regressions for each group (Dyslexic/Control) and hemisphere (Right/Left). The regression models tested were vOT ∼ SPL + stimulus Type [stimulus Type was numerically coded as 0 (AN) or 1 (nAN)].
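A regression of the form vOT ∼ SPL + stimulus Type can be sketched with ordinary least squares as shown below. The data are synthetic and noise-free, generated only to show that the fit recovers known coefficients; the solver, the variable names and the values are hypothetical. The sketch treats all observations as independent and does not reproduce the authors' software or any handling of repeated measures.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic percent signal change values for one group and hemisphere.
spl = [0.10, 0.25, 0.40, 0.55, 0.15, 0.30, 0.45, 0.60]
stim_type = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = AN, 1 = nAN
# Synthetic outcome generated as vOT = 0.05 + 0.8 * SPL + 0.1 * type.
vot = [0.05 + 0.8 * s + 0.1 * t for s, t in zip(spl, stim_type)]

X = [[1.0, s, float(t)] for s, t in zip(spl, stim_type)]  # intercept, SPL, type
intercept, b_spl, b_type = ols(X, vot)
print(round(intercept, 3), round(b_spl, 3), round(b_type, 3))  # 0.05 0.8 0.1
```

In this model, the coefficient on SPL estimates how strongly vOT activity covaries with SPL activity once the difference between AN and nAN stimuli is accounted for.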

### **RESULTS**

### **fMRI BEHAVIORAL RESULTS**

Reaction times and accuracy for consistent trials during the fMRI task are presented in **Table 1**. For each condition, RTs and accuracy were entered in a 2 × 2 mixed design ANOVA with Group (Dyslexic vs. Control) as a between-subjects factor and character type (AN vs. nAN) as a within-subject factor. ME condition accuracy data were transformed in order to meet parametric assumptions. For the SE condition, there were no significant main effects or interaction (Group: *F*(1,22) = 4.1, *p* = 0.054, η<sup>2</sup> = 0.11, Type: *F*(1,22) = 1.4, n.s., η<sup>2</sup> = 0.02, Group × Type: *F*(1,22) = 0.08, n.s., η<sup>2</sup> = 0.00). For ME RTs, the Type main effect was significant [*F*(1,22) = 7.5, *p* < 0.05, η<sup>2</sup> = 0.05], as well as the Group × Type interaction [*F*(1,22) = 9.1, *p* < 0.01, η<sup>2</sup> = 0.05] Type: [*F*(1,22) = 16.5, *p* < 0.001, η<sup>2</sup> = 0.19]. The main effect of Group was not significant [*F*(1,22) = 2.9, n.s., η<sup>2</sup> = 0.10]. Contrasts corrected for multiple comparisons showed that dyslexic participants were slower than control participants for AN character strings (*t*(22) = 2.8, *p* < 0.05) but not for nAN strings (*t*(22) = 0.5, n.s.). Accuracy for the SE condition was near ceiling for both groups. There were no significant main effects of Group [*F*(1,22) = 4.1, *p* = 0.053, η<sup>2</sup> = 0.11] or Type [*F*(1,22) = 1.4, n.s., η<sup>2</sup> = 0.02] and no significant Group × Type interaction [*F*(1,22) = 0.8, n.s., η<sup>2</sup> = 0.00]. For accuracy in the ME condition, control participants were significantly more accurate than dyslexic participants [*F*(1,22) = 8.3, *p* < 0.01, η<sup>2</sup> = 0.21], and participants were more accurate for AN strings than for nAN strings [*F*(1,22) = 16.5, *p* < 0.001, η<sup>2</sup> = 0.19]. The Group × Type interaction was not significant [*F*(1,22) = 2.0, n.s., η<sup>2</sup> = 0.03], suggesting that the accuracy difference between dyslexic and control participants is the same regardless of character type.
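
For readers unfamiliar with the 2 × 2 mixed design, the three *F* tests can be sketched from per-participant scores. With only two levels of the within-subject factor, the Type and Group × Type tests reduce to tests on AN − nAN difference scores, and the Group test to a comparison of per-participant means. The code below is a sketch under that standard equivalence, not the authors' analysis script; the function name `mixed_anova_2x2` and the input layout are assumptions.

```python
from statistics import mean, variance


def _pooled_variance(a, b):
    """Pooled within-group variance of two samples and its degrees of freedom."""
    df = len(a) + len(b) - 2
    s2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    return s2, df


def mixed_anova_2x2(group1, group2):
    """F tests for a 2 (Group, between) x 2 (Type, within) design.

    group1 and group2 are lists of (an_score, nan_score) pairs, one pair
    per participant. Every F has (1, N - 2) degrees of freedom, where N is
    the total number of participants.
    """
    means1 = [(a + b) / 2 for a, b in group1]
    means2 = [(a + b) / 2 for a, b in group2]
    diffs1 = [a - b for a, b in group1]
    diffs2 = [a - b for a, b in group2]
    scale = 1 / len(group1) + 1 / len(group2)

    s2_means, df = _pooled_variance(means1, means2)
    s2_diffs, _ = _pooled_variance(diffs1, diffs2)

    # Group: do the two groups differ in their per-participant mean score?
    f_group = (mean(means1) - mean(means2)) ** 2 / (s2_means * scale)
    # Group x Type: does the AN - nAN difference differ between groups?
    f_interaction = (mean(diffs1) - mean(diffs2)) ** 2 / (s2_diffs * scale)
    # Type: is the AN - nAN difference, averaged over groups, non-zero?
    f_type = ((mean(diffs1) + mean(diffs2)) / 2) ** 2 / (s2_diffs * scale / 4)

    return {"df": (1, df), "Group": f_group, "Type": f_type, "Group x Type": f_interaction}
```

With 12 participants per group, as the *F*(1,22) values above imply, `df` comes out as `(1, 22)`. The sketch returns *F* values only and does not compute the effect sizes reported in the text.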

## **fMRI RESULTS**

### *Within-group brain networks*

First, we used contrasts between our task and baseline to identify the main networks of brain regions involved in ME or SE processing in each group separately for AN and nAN character strings. Brain activations are illustrated in **Figure 2**. Relative to baseline (fixation), ME processing activated a broad and bilateral cortical network in control participants regardless of stimulus type. Visual areas included occipital extra-striate cortex bilaterally as well as fusiform and inferior temporal gyri bilaterally. Parietal activations extended over SPL and IPL bilaterally. Finally, cortical activations included the pre-supplementary motor area for AN characters as well as the right superior and middle frontal gyri for nAN characters. Dyslexic participants activated a more limited network. For AN characters, visual areas included the lingual gyrus. Parietal areas were limited to left IPL and postcentral gyrus. As with control participants, cortical activations included the pre-supplementary motor area. In addition, activation was present in the left rolandic operculum and supramarginal gyrus. The activation pattern was similar for nAN characters, save for the left rolandic operculum and supramarginal gyrus activity, which was absent. Relative to baseline, SE processing activated a mostly ventral cortical network in control participants. For AN characters, a very limited network included the left calcarine, lingual gyrus, and cuneus as well as the right fusiform gyrus. For nAN characters, visual areas included occipital gyri and fusiform gyri bilaterally. Activated parietal areas were limited to the left postcentral and precentral gyri. For dyslexic participants, there were no significant activations at our chosen threshold for AN characters (lowering the threshold revealed activation patterns similar to control participants). For nAN characters, activated visual areas included the right fusiform and bilateral lingual gyri.

For each group, brain regions specific to ME processing were identified by contrasting ME and SE conditions for each stimulus type (AN and nAN) separately. Brain areas showing stronger activations for the ME than the SE condition are listed in **Table 2** and illustrated in **Figure 3**. For control participants, the [*ME* > *SE*] contrast for AN strings activated a single right hemisphere parietal cluster. This cluster extended over parts of the superior and inferior parietal lobule as well as angular, superior occipital and mid occipital gyri. For nAN strings, control participants had stronger ME activations bilaterally in parietal cortex. A left hemisphere parietal cluster extended mainly over SPL (and over limited parts of precuneus and IPL) while the right hemisphere cluster extended exclusively over SPL. Increased activity was also found in the pre-supplementary motor area. For dyslexic participants, the [*ME* > *SE*] contrast for AN and nAN characters revealed pre-supplementary motor area clusters in both conditions. Neither contrast revealed any parietal activation at the chosen threshold. No brain areas showed significantly stronger activity for the SE condition than for the ME condition in either group.

### *Between-group differences in activation*

Two-sample *t*-tests were then performed to statistically compare brain activation in control and dyslexic readers on relevant contrasts. To identify brain areas significantly more activated in normal readers than in dyslexic participants in ME processing, we compared activations for the ME condition between each group for each character type separately. Brain areas showing stronger activations for the control group than for the dyslexic group are listed in **Table 3** (ME and SE conditions) and illustrated in **Figure 4** (ME condition). For AN characters, the right parietal cortex (including SPL and extending to the superior part of the occipital cortex and precuneus) and the left vOT cortex (including the inferior temporal and fusiform gyri) were more strongly activated in control than dyslexic readers. For nAN characters, there were stronger activations for control than dyslexic participants in the right parietal cortex (including SPL and precuneus) as well as in the right vOT cortex (including inferior temporal and inferior occipital gyri). The opposite comparison ([Dyslexic > Control]) revealed no areas more activated for dyslexic than for control participants for either character type.

#### **Table 1 | fMRI task performance of dyslexic and control participants for consistent trials.**

*Reaction times are reported in ms, accuracy in proportion correct.*

**element processing for AN and nAN conditions for control and dyslexic participants, overlaid on a surface-rendered single subject brain normalized to MNI template.** Top two rows: BOLD activation for the contrast [ME > Baseline] for each condition (AN and nAN) in control and dyslexic participants. For all contrasts: voxel-wise threshold of *p* < 0.001 uncorrected with an extent threshold correction of *p* < 0.05 at the cluster level.

We then compared activations for SE processing between each group by contrasting activation maps ([Control > Dyslexic]) for the SE condition separately for each character type (AN and nAN). There were no brain areas significantly more activated in control than in dyslexic participants for either character type. The opposite contrasts ([Dyslexic > Control]) showed that for AN characters, a single left middle/superior frontal gyri cluster was more strongly activated in dyslexic than control participants (see **Table 3**). For nAN characters, there were no brain areas significantly more activated in dyslexic than in control participants.

### **Table 2 | Cerebral regions significantly more activated for multiple element than for single element processing.**

*Statistical significance: voxel-wise threshold of p < 0.001 uncorrected (T > 4.02) with an extent threshold correction of p < 0.05 at the cluster level. For each cluster, peak MNI coordinates (x,y,z), cluster spatial extent k and peak z-value are indicated. Anatomical labels are based on the automated anatomical labeling (AAL) atlas. Labels represent anatomical regions with the largest percentages of overlap with the activation cluster.*

**surface-rendered single subject brain normalized to MNI template.** For all contrasts: voxel-wise threshold of *p* < 0.001 uncorrected with a cluster threshold of 20 voxels.

### *Regions of interest*

Previous research has linked behavioral deficits in simultaneous visual processing in dyslexia to lower activation in parietal brain areas, and more specifically in the SPL bilaterally and the left inferior parietal lobule (Peyrin et al., 2011; Reilhac et al., 2013). We compared parietal activations in dyslexic and skilled readers in four predefined and standardized neuro-anatomical ROIs using predefined masks from the WFU PickAtlas (Maldjian et al., 2003). The first two ROIs were defined as right and left SPL intersected with BA 7 and the next two as right and left IPL intersected with BA 40 (as defined by the automated anatomical labeling atlas, which uses an anatomical parcellation of the MNI single subject brain and sulcal boundaries to define anatomical volumes). The SPL/BA 7 ROIs comprised 139 (R) and 136 (L) voxels, and the IPL/BA 40 ROIs comprised 333 (R) and 367 (L) voxels (ROIs are illustrated in **Figure 5**). Parameter estimates (percent signal change) were extracted for each ROI and entered in a 2 × 2 × 2 mixed ANOVA with Condition (ME vs. SE) and Character Type (AN vs. nAN) as within-subject factors as well as Group (Dyslexic vs. Control) as a between-subject factor (see **Figure 5**). Concerning right SPL, there were significant main effects of Condition [*F*(1,22) = 21.3, *p* < 0.0001, η<sup>2</sup> = 0.13] and Group [*F*(1,22) = 12.5, *p* < 0.01, η<sup>2</sup> = 0.22] as well as a significant Group × Condition interaction [*F*(1,22) = 7.5, *p* < 0.05, η<sup>2</sup> = 0.05]. There was neither a significant main effect of character type nor any other significant interaction. The difference in activation between groups was affected by the number of elements to be processed. Contrasts indicated that the interaction was driven by a different effect of Group in each Condition. The effect of Group was significant for the ME condition [*F*(1,22) = 20.4, *p* < 0.001], but non-significant in the SE condition [*F*(1,22) = 3, n.s.]. Concerning left SPL, there were significant main effects of Condition [*F*(1,22) = 11.9, *p* < 0.01, η<sup>2</sup> = 0.09] and Group [*F*(1,22) = 8.4, *p* < 0.05, η<sup>2</sup> = 0.11]. No other effects were significant. The difference in activity between groups in left SPL was thus not affected by condition demands. Concerning IPL, results were similar for the right and left hemispheres. There were no significant main effects for either Group [RH: *F*(1,22) = 1.1, n.s.; LH: *F*(1,22) = 0.7, n.s.], Condition [RH: *F*(1,22) = 0.1, n.s.; LH: *F*(1,22) = 0.1, n.s.] or Character Type [RH: *F*(1,22) = 0.6, n.s.; LH: *F*(1,22) = 3.2, n.s.], suggesting that IPL is not specifically implicated in ME processing in either healthy or dyslexic readers.

**Table 3 | Overview of clusters significantly more activated for one group compared to the other [control > dyslexic and dyslexic > control; voxel-wise threshold of *p* < 0.001 uncorrected (*T* > 3.5) with a cluster extent *k* > 20].**

*For each cluster, peak MNI coordinates (x,y,z), cluster spatial extent k and peak z-value are indicated. Anatomical labels are based on the AAL (automated anatomical labeling) atlas (Tzourio-Mazoyer et al., 2002). Labels represent anatomical regions with the largest percentages of overlap with the activation cluster. Contrasts with no significant clusters are not presented.*

**FIGURE 4 | Brain areas more strongly activated in control participants than in dyslexic participants for ME processing and AN or nAN characters, overlaid on a surface-rendered single subject brain normalized to MNI template.** For all contrasts: voxel-wise threshold of *p* < 0.001 uncorrected with a cluster threshold of 20 voxels.

Abnormal brain activity for letter strings in the left vOT cortex in dyslexia is well documented (see Richlan et al., 2011 for a recent meta-analysis). We built a ROI covering the fusiform and inferior temporal gyri using a coordinate-delimited box (LH: *X* = −34 to −55, *Y* = −34 to −68, *Z* = −4 to −26, mirror-reversed for RH). This ROI was defined by Cai et al. (2010) according to activation peaks reported in a meta-analysis of normal word reading by Jobard et al. (2003). Parameter estimates were extracted and analyzed in the same way as for the SPL and IPL ROIs (see **Figure 6A**). In the right hemisphere ROI, there was a significant main effect of Group [*F*(1,22) = 7.5, *p* < 0.05, η<sup>2</sup> = 0.13] and no other effects were significant [Condition: *F*(1,22) = 1.5, n.s., η<sup>2</sup> = 0.01; Type: *F*(1,22) = 0.01, n.s., η<sup>2</sup> = 0.00]. The result pattern was similar in the left hemisphere with a significant main effect of Group [*F*(1,22) = 7.5, *p* < 0.05, η<sup>2</sup> = 0.14] and no other significant effects [Condition: *F*(1,22) = 1.3, n.s., η<sup>2</sup> = 0.01; Type: *F*(1,22) = 0.01, n.s., η<sup>2</sup> = 0.00]. Reduced brain activity in the vOT cortex for dyslexic participants is present for single or ME processing as well as for AN or nAN character strings.
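
A coordinate-delimited box ROI of the kind described above can be sketched as a simple membership test on MNI coordinates. The sketch assumes the usual MNI convention in which negative *X* values lie in the left hemisphere, so the box with *X* between −55 and −34 is treated as the left-hemisphere ROI and its mirror image as the right-hemisphere one. The names `VOT_BOX_LEFT`, `mirror_box` and `in_box` are hypothetical and are not part of the Marsbar toolbox.

```python
# Box limits in MNI millimetres, stored as (low, high) per axis.
VOT_BOX_LEFT = {"x": (-55, -34), "y": (-68, -34), "z": (-26, -4)}


def mirror_box(box):
    """Mirror a box across the mid-sagittal plane (flip the sign of x)."""
    low, high = box["x"]
    return {**box, "x": (-high, -low)}


def in_box(coord, box):
    """Return True if the (x, y, z) coordinate lies inside the box, limits included."""
    return all(box[axis][0] <= value <= box[axis][1] for axis, value in zip("xyz", coord))


VOT_BOX_RIGHT = mirror_box(VOT_BOX_LEFT)
```

A peak at (−44, −58, −15), for example, falls inside `VOT_BOX_LEFT`, while its mirror image at (44, −58, −15) falls inside `VOT_BOX_RIGHT` only.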

To investigate putative links between neural activity in superior parietal cortex and in ventral occipital cortex, we ran regressions for each group and hemisphere with percent signal change in vOT ROIs as the dependent variable and percent signal change in SPL ROIs as well as stimulus type as regressors (scatterplots of the data are shown in **Figure 6B**). The effect of stimulus type was non-significant in all regressions, suggesting that a putative link between vOT and SPL is independent of character type. In the right hemisphere, SPL predicted vOT for the dyslexic group [Full regression: *F*(2,21) = 10.1, *R*<sup>2</sup> = 0.49, *p* < 0.001, SPL regressor: β = 0.6, *t* = 4.5, *p* < 0.0001], but not for the control group [Full regression: *F*(2,21) = 0.5, *R*<sup>2</sup> = 0.05, n.s., SPL regressor: β = 0.3, *t* = 1.0, n.s.]. In the left hemisphere, SPL predicted vOT for the dyslexic group [Full regression: *F*(2,21) = 8.9, *R*<sup>2</sup> = 0.46, *p* < 0.01, SPL regressor: β = 0.8, *t* = 4.2, *p* < 0.0001], as well as for the control group [Full regression: *F*(2,21) = 4.3, *R*<sup>2</sup> = 0.29, *p* < 0.05, SPL regressor: β = 0.6, *t* = 3.0, *p* < 0.01].
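
The full-regression *F* values can be related to the reported *R*<sup>2</sup> values through the standard identity *F* = (*R*<sup>2</sup>/*k*) / ((1 − *R*<sup>2</sup>)/df), with *k* = 2 regressors and df = 21 residual degrees of freedom. The sketch below is only a consistency check on the reported figures; the function name `f_from_r2` is hypothetical.

```python
def f_from_r2(r2, n_predictors, df_resid):
    """Overall regression F statistic computed from R squared.

    n_predictors is the number of regressors excluding the intercept and
    df_resid the residual degrees of freedom.
    """
    return (r2 / n_predictors) / ((1.0 - r2) / df_resid)


# R squared values as reported in the text, all with F(2, 21).
reported_r2 = {
    "RH dyslexic": 0.49,
    "RH control": 0.05,
    "LH dyslexic": 0.46,
    "LH control": 0.29,
}
recomputed = {name: f_from_r2(r2, 2, 21) for name, r2 in reported_r2.items()}
```

The recomputed values for the three significant models agree with the reported *F* statistics to one decimal place. For the right hemisphere control model the identity gives roughly 0.55 against the reported 0.5, a gap that rounding of *R*<sup>2</sup> to two decimals can account for.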

### **DISCUSSION**

The present fMRI study compared character string processing in VA Span impaired dyslexic readers and healthy skilled readers. Reduced performance of dyslexic participants on a 6-letter global report compared to control participants is posited to index a general impairment of parallel ME processing. This VA Span impairment has been associated with reduced SPL activation for multiple letter processing in dyslexic children (Peyrin et al., 2011). The main purpose of this study was to extend these results to nAN character processing. We hypothesized that abnormal parietal activations should be found in dyslexic individuals with a VA span disorder regardless of character type for ME processing. In addition, we hypothesized that if parietal cortex is involved in visual processing and information extraction from multiple character strings, then parietal activity should correlate with vOT activity for character string processing. Participants carried out a visual categorization task in two conditions: SE or ME. The task was carried out with alphanumeric, familiar characters and non-alphanumeric, unfamiliar characters in order to investigate the stimulus specificity of the putative parallel ME processing deficit.

Dyslexic participants for this study were selected to present a VA Span deficit at the individual level. VA Span abilities were assessed outside the scanner, using a 6-letter whole report paradigm similar to the 5-letter paradigm used with children (Bosse et al., 2007; Bosse and Valdois, 2009). Dyslexic participants were not able to report as many letters from a briefly presented array of letters as normal-reading adults. This behavioral impairment is taken as indexing a reduced ability to attend to and process MEs simultaneously. Dubois et al. (2010) showed that a reduced VA Span co-occurred with reduced VA capacity for MEs in dyslexic children while Stenneken et al. (2011) provide similar evidence for reduced VA capacity in high achieving dyslexic adults. In our experimental fMRI task, dyslexic participants were expected to perform as well as control participants for the SE condition, but to perform significantly worse for the ME condition, in line with a specific ME processing deficit. Furthermore, the ME processing behavioral impairment has been associated with abnormal brain activations in the parietal cortex, and more specifically in SPL. Comparisons between activations for ME processing in control and dyslexic participants were expected to highlight abnormal parietal neural activity in dyslexia, regardless of to-be-processed character type.

Behavioral results are consistent with a specific ME processing deficit regardless of character type. Both groups performed at ceiling for SE categorization, although RTs were slower for nAN characters than for AN characters for both groups. For the ME condition, dyslexic participants were less accurate than control participants regardless of character type, but were slower only for AN characters. Reduced accuracy for both character types argues for a general inability to attend to and process all displayed elements in VA Span impaired dyslexics. The different pattern of results for RTs could be explained by accuracy and RTs indexing different processes in character recognition for short exposure durations (Santee and Egeth, 1982). While accuracy could be sensitive to early perceptual effects, RTs could be more sensitive to later processes such as response interference. Within such a framework, poor VA capacity (an early process) would lead to poorer accuracy for dyslexic participants regardless of character type. Interference by later processes could be stronger when the task is not performed at ceiling performance levels, resulting in slowed RTs for dyslexic participants for both character types and in slowed RTs for control participants only for nAN characters.

### **NEURAL CORRELATES OF SINGLE AND MULTIPLE ELEMENT PROCESSING IN HEALTHY, SKILLED READERS**

In control participants, ME processing recruited additional regions from a broad occipito-parietal network compared to SE processing (see **Figure 2**). ME processing activated vOT cortex, as expected for processing single (Flowers et al., 2004) or multiple letters and symbols (Tagamets et al., 2000; Turkeltaub et al., 2003; Brem et al., 2006). However, patterns of parietal activation differed. In SE processing, there were no significant parietal activations. In contrast, ME processing activated a broad parietal network, including SPL, IPL, and precuneus bilaterally. Involvement of IPL and SPL in VA processes is well documented (Behrmann et al., 2004), and could be related to the attentional demands of attending to several characters. Furthermore, activations of SPL and IPL for multiple character processing are consistent with reports of similar activations in adult healthy skilled readers for letter string processing (Levy et al., 2008; Valdois et al., 2009), a flanked character categorization task (Peyrin et al., 2008) or a visual matching task (Reilhac et al., 2013), and in typically reading children for the same flanked character categorization task (Peyrin et al., 2011).

Brain areas specifically involved in ME processing in healthy readers were identified by contrasting ME to SE conditions for each stimulus type (AN and nAN) separately. ME processing activated parietal cortex more strongly than SE processing for both character types. For nAN characters, additional increased activations were located in the right insula, as has been previously reported in VA tasks (Hahn et al., 2006), and in the pre-supplementary motor area, consistent with that area's putative role in cognitive processes (Picard and Strick, 2001). Increased SPL activity for ME processing was limited to the right hemisphere for AN characters but was bilateral for nAN characters. Similar recruitment of left-side homologues for VA tasks with high cognitive demands has been previously reported (Nebel et al., 2005). SPL activations are broadly consistent with our team's previous studies investigating neural correlates of ME processing (Peyrin et al., 2008, 2011), although specific activity seems to be more right lateralized in this study. As parietal activity has consistently been associated with visuo-spatial attention (Corbetta and Shulman, 2002; Behrmann et al., 2004), increased parietal activations for both conditions (AN and nAN) could index increased demands on VA for the processing of MEs.

## **NEURAL CORRELATES OF SINGLE AND MULTIPLE ELEMENT PROCESSING IN DYSLEXIC READERS**

Neural networks associated with single and ME processing were more limited in dyslexic participants. For SE processing, visual processing activity was limited to the occipital and occipitotemporal cortices. ME processing in dyslexic readers failed to elicit the broad parietal network present for control participants. Although similar pre-supplementary motor area activations were present for both groups, parietal activations for dyslexics were limited to the left supramarginal gyrus and post-central gyrus. This relative absence of parietal activation is consistent with previous assessments of neural activity for multiple letter processing in dyslexic participants with poor VA Span performance (Peyrin et al., 2008, 2011, 2012; Reilhac et al., 2013; Valdois et al., 2014b).

Further assessment of neural networks subserving ME processing was carried out by contrasting ME and SE processing for each character type. Similarly to control participants, ME processing led to increased pre-supplementary motor area activations in both conditions (AN and nAN). This pre-supplementary motor area activity, present for more demanding task conditions (ME > SE AN and nAN for dyslexic participants, but also ME > SE nAN for control participants), could reflect higher cognitive demands (Picard and Strick, 2001). However, a complete absence of parietal activation in either hemisphere, for either character type, is to be noted. This absence of parietal activations could reflect a failure to engage appropriate attentional mechanisms for processing MEs, a failure that would then lead to impaired behavioral performance.

### **MODULATION OF MULTIPLE ELEMENT PARIETAL ACTIVATIONS BY READING ABILITY**

To identify brain areas significantly more activated in normal readers than in dyslexic participants in ME processing, we compared activations for the ME condition between each group for each character type separately. For both character types (AN and nAN), control participants had larger activations in broadly similar areas in both ventral and dorsal cortices. Reduced activity in vOT cortex was present in the left hemisphere for AN characters and in the right hemisphere for nAN characters. Consistent with the difference in ME processing activity patterns between groups, dyslexic participants exhibit reduced activation in right hemisphere SPL regardless of character type. While previous studies have hinted towards a left SPL dysfunction in VA Span impaired dyslexics (Peyrin et al., 2008, 2011), the current findings seem to point to right SPL as the critical area subserving successful ME processing.

Taken together, results from these whole-brain analyses point towards a right hemisphere superior parietal lobule dysfunction in VA Span impaired dyslexic adults. This functional impairment of parietal cortex seems to be condition-related (present in ME but not in SE processing) but not stimulus-type related (equally large for AN and nAN characters). Furthermore, this pattern of dysfunction is localized to SPL. This account is supported by our *a priori* ROI analyses. For right hemisphere SPL, the difference in activation between groups was affected by the number of elements to be processed (the activation difference was present for ME processing but absent for SE processing). Interestingly, although whole-brain comparisons between groups did not reveal any left hemisphere activation differences, ROI analyses of left SPL showed stronger activations for normal readers for both ME and SE processing.

A possible confounding factor in these results is the difference in behavioral performance between groups. Differences in neuronal activity could reflect lower accuracy for dyslexic participants within a functional parietal network rather than a dyslexic parietal dysfunction. It, however, seems unlikely that between-group differences in neuronal activation only resulted from between-group differences in RTs, since between-group neuronal activity differences were present for the ME-nAN condition in the absence of between-group RTs differences.

The critical result of this study is that this parietal dysfunction is present regardless of character type. Whole-brain comparisons between groups for the ME-nAN condition revealed dyslexic under-activation in right hemisphere SPL clusters. Indeed, result patterns in SPL ROIs suggested that activations did not differ between character types, and this was true for both dyslexic and control participants. The activation difference between control and dyslexic participants is the same for AN, familiar, verbal characters, and nAN, unfamiliar, non-verbal characters. This strongly suggests the existence of abnormal neural function in dyslexia in non-language related processes.

Finally, this pattern of condition sensitive/stimuli non-sensitive deficit seems to be circumscribed to right SPL. Activation patterns in other parietal (left SPL, bilateral IPL) or upper visual areas (bilateral vOT) were explored in our *a priori* ROI analyses. Bilateral IPL is equally activated for ME or SE conditions, suggesting it plays no specific role in ME processing. This is supported by the absence of activation strength differences between dyslexic and control participants for either the ME or SE conditions. There were also stronger activations for control participants than dyslexic participants in vOT and left SPL. However, this activation difference between groups was similar for (1) SE and ME conditions and (2) AN and nAN character strings. Within the constraints of our experimental paradigm, vOT BOLD activity seems to be sensitive to neither VA demands nor character type.

### **IMPLICATIONS FOR THE VA SPAN HYPOTHESIS OF DYSLEXIA**

While previous studies had reported decreased activations in SPL for ME processing in VA Span impaired dyslexia (Peyrin et al., 2011, 2012; Reilhac et al., 2013; Valdois et al., 2014b), this is the first study to do so by using a non-verbal task requiring verbal and non-verbal stimuli processing. Our results bring forward new evidence for a visual-attention account of the VA Span deficit. Indeed, these data speak against two alternative explanations of poor dyslexic performance on the VA Span letter report tasks: impaired print tuning and impaired object-to-phonological code mapping. While our results do not rule out impaired print tuning as one of the contributing factors to poor letter report performance, they argue against it being the sole cause. If poor letter report performance only indexed reduced perceptual specialization for letter (Nazir et al., 2004) or letter-like character (Szwed et al., 2012) strings in dyslexia (Maurer et al., 2007; van der Mark et al., 2009), we would expect poor performance on our ME categorization task to be associated with activation differences in visual rather than parietal cortex. If poor letter report performance were a consequence of impaired visual-to-phonological code mapping (Hawelka and Wimmer, 2008; Ziegler et al., 2010; but see Valdois et al., 2012), we would expect dyslexic participants to perform as well as control participants on a non-verbal categorization task, even more so for non-verbal stimuli. In contrast and in line with similar behavioral results previously reported with typical reading children (Lobier et al., 2012b), dyslexic participants performed worse than control participants in the ME condition. Furthermore, impaired visual-to-phonological code mapping would not result in abnormal brain activity for dyslexic individuals for visual processing of non-verbal character strings, as is present in our data.
In contrast, decreased activation of right hemisphere SPL, a brain area consistently associated with space-based (Vandenberghe et al., 2001; Yantis et al., 2002) and object-based (Yantis and Serences, 2003) attention, could index impaired ability to properly attend to MEs simultaneously. SPL could subserve two necessary attentional mechanisms: chunking character strings into appropriate individual elements and allocating spatial attention to each individual element to allow further processing. This could be done by modulating lower level visual responses to spatial locations or features (Corbetta and Shulman, 2002). If all visual elements cannot be attended to in our ME categorization condition, target characters may be missed, leading to poor performance. Similarly, if dyslexic participants can attend to fewer letters than control participants in the VA Span letter report task, their performance will be worse. Poor performance or neurobiological dysfunction cannot be ascribed to different amounts of lifelong experience with characters between dyslexic and control participants. First, all participants had the same amount of limited experience with the nAN characters. Second, SPL parietal dysfunction is of similar magnitude regardless of stimulus type, consistent with similar parietal activation patterns for letter and non-letter stimuli (Nebel et al., 2005). In sum, abnormal parietal activations in VA Span impaired dyslexic participants for ME processing of both AN and nAN character strings support an ME visual processing disorder as the underlying cause of the VA Span deficit.

### **IMPLICATIONS FOR NEUROBIOLOGICAL MODELS OF DYSLEXIA**

Neurobiological accounts of dyslexia, in line with classic models of reading, usually highlight neural dysfunction of the left hemisphere reading network as a hallmark of dyslexia. These functional deficits are present in brain areas thought to subtend phonological processing (left inferior frontal and parieto-temporal gyri) and orthographic word processing (vOT cortex; see Shaywitz and Shaywitz, 2005 for a review). These abnormal brain activations are identified using reading or reading-related tasks (e.g., rhyming) and verbal visual stimuli, in line with a phonological account of dyslexia. The prevailing developmental model of this disruption of reading neural circuits is one where the vOT neural dysfunction systematically follows from frontal and temporoparietal dysfunction (McCandliss and Noble, 2003): impaired phonological processing impedes the acquisition of orthographic knowledge and the development of appropriate neural tuning for print (Maurer et al., 2007; van der Mark et al., 2009). However, this model fails to account for a number of empirical findings. First, there is mounting evidence that while a number of dyslexic children do in fact have a phonological deficit, a non-negligible number do not (White et al., 2006; Bosse et al., 2007; Vidyasagar and Pammer, 2010). In line with these behavioral results, a recent case study has reported not only normal phonological behavioral performance but also normal activation of the fronto-temporoparietal network associated with phonological processing (Peyrin et al., 2012). Second, a recent meta-analysis of brain imaging studies of dyslexic children and adults has failed to find unilateral evidence for a contrasted pattern of predominant left temporo-parietal dysfunction in children and predominant left vOT dysfunction in adults (Richlan et al., 2011).
These results suggest that reduced print tuning and orthographic specificity of left vOT cortex in dyslexia could follow from alternative disruption in the learning to read process.

Two aspects of our data are noteworthy. As expected from our hypotheses and appropriately highlighted earlier, VA Span impaired dyslexic adults display reduced parietal activations in tasks requiring visual processing of multiple characters, AN or not. More unexpectedly, task related activations were also reduced in vOT cortex bilaterally and for both character types. Previous accounts of reduced vOT activation in dyslexia have been associated with processing of letter strings (word or non-words) and restricted to LH vOT (Helenius et al., 1999; Maurer et al., 2007; van der Mark et al., 2009; Wimmer et al., 2010). Indeed, neural responses for non-alphabetic strings have usually been similar in dyslexic and control readers (Helenius et al., 1999; van der Mark et al., 2009 but see Maurer et al., 2007). However, an important caveat of these studies is that their experimental tasks required no explicit processing of individual elements of the non-alphabetic strings. In contrast, in our study, explicit processing of the individual characters composing strings is necessary for both character types. Therefore, if visual processing of individual elements in vOT is influenced by top-down VA related parietal activity, then a parietal dysfunction should result in abnormal vOT activity regardless of character type. In addition, while the difference in vOT activity between letter and non-letter string processing is present only in left vOT in expert readers, visual processing of both string types recruits vOT bilaterally (Tagamets et al., 2000; Vinckier et al., 2007). If at least part of this vOT activity is top-down driven by parietal cortex, then abnormal parietal function will result in abnormal vOT activity bilaterally. The presence of consistent correlations between SPL and vOT activity in each hemisphere further argues for this interpretation of our data. We posit not only that these two co-occurring neural dysfunctions (SPL and vOT) are related, but also that this relationship can explain disrupted vOT function in dyslexic readers independently from any phonological deficit.

How can impaired parietal function lead to decreased vOT activity in a ME processing task? Parietal areas are responsible for feature and spatial attention focus and shifts (Kanwisher and Wojciulik, 2000). Dorsal areas are thus involved in a fast feedforward/feedback loop with visual areas: early visual signals trigger parietal attention mechanisms and global analysis which then guides further processing in the ventral stream (Bullier, 2001). If attentional processes fail, the downstream ventral processing is also disrupted. In our task, failure to allocate attention appropriately to each element of the character string reduces feedback to ventral areas responsible for character recognition (Szwed et al., 2011) and thus leads to reduced occipito-temporal activations. How does this relate to impaired vOT specificity for print in dyslexia? When children learn to read, they cannot at first rely on fast, parallel processing of words as supported by vOT in expert readers (Dehaene and Cohen, 2011). Letter string processing is supported by attention-based processes as supported by parietal cortex. Development of orthographic knowledge in vOT is therefore dependent on appropriate attentional feedback from parietal areas for proper letter identification. Similar involvement of parietal areas in reading is seen when spatial layout of words is modified in order to disrupt automatic vOT processing (Mayall et al., 2001; Pammer et al., 2006; Cohen et al., 2008; Rosazza et al., 2009). If parietal function fails, vOT specialization cannot take place and fast, automatic visual word processing cannot be achieved. In line with such a model, Richlan (2012) has proposed that impaired general attention processes in dyslexic readers, indexed by abnormal left IPL activity, could result in lack of vOT specialization for print.

Recent connectivity studies in normal and dyslexic readers offer support for this account. Both resting-state and functional connectivity between parietal areas and vOT have been reported, and this connectivity is modulated by reading efficiency. Vogel et al. (2011) investigated resting state connectivity between the specific part of vOT cortex thought to subserve orthographic reading, namely the visual word form area (VWFA; Cohen and Dehaene, 2004; Dehaene and Cohen, 2011) and the dorsal attentional network. Not only did they find significant connectivity between the VWFA and superior parietal cortex bilaterally, but this connectivity was also significantly correlated with reading ability. Better readers had stronger connectivity between SPL and VWFA. van der Mark et al. (2011) investigated functional connectivity between five different seed regions of left vOT cortex (including the VWFA) and other brain regions in normal-reading and dyslexic children. In normal-reading children, bilateral SPL was significantly correlated to the middle, VWFA proper, seed area. This correlation between bilateral SPL and the VWFA seed area did not reach significance in dyslexic children (in that study, however, the difference in functional connectivity between normal-reading and dyslexic children did not reach significance for SPL-VWFA but did for left IPL-VWFA). Taken together, these results speak strongly for an important role of SPL in efficient reading.

In line with the VA span hypothesis of dyslexia (Bosse et al., 2007), VA Span impaired dyslexic adults are impaired in a nonverbal ME processing task. This impairment is associated with reduced specificity of SPL for ME processing, in support of a visual account of the VA span deficit. Co-occurring reduced vOT activation could be related to reduced connectivity between dorsal and ventral visual areas, in line with recent accounts of reduced SPL-vOT connectivity in dyslexia. Further research is needed to (1) investigate if and how the time-course of parietal and vOT activity in ME processing tasks deviates in dyslexic participants and (2) assess connectivity between SPL and vOT in both normal-reading and dyslexic readers with a VA span disorder.

### **REFERENCES**


using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. *Neuroimage* 15, 273–289. doi: 10.1006/nimg.2001.0978


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 January 2014; accepted: 13 June 2014; published online: 07 July 2014. Citation: Lobier MA, Peyrin C, Pichat C, Le Bas J-F and Valdois S (2014) Visual processing of multiple elements in the dyslexic brain: evidence for a superior parietal dysfunction. Front. Hum. Neurosci. 8:479. doi: 10.3389/fnhum.2014.00479 This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Lobier, Peyrin, Pichat, Le Bas and Valdois. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Reading the dyslexic brain: multiple dysfunctional routes revealed by a new meta-analysis of PET and fMRI activation studies

## *Eraldo Paulesu1,2,3\*†, Laura Danelli 1,2 † and Manuela Berlingeri 1,2*

*<sup>1</sup> Department of Psychology, University of Milano-Bicocca, Milan, Italy*

*<sup>2</sup> NEUROMI- Milan Center for Neuroscience, University of Milano-Bicocca, Milan, Italy*

*<sup>3</sup> fMRI – Unit, Istituto di Ricovero e Cura a Carattere Scientifico Galeazzi, Milan, Italy*

### *Edited by:*

*Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

### *Reviewed by:*

*Cyril R. Pernet, University of Edinburgh, UK Peter Klaver, University Zürich, Switzerland*

### *\*Correspondence:*

*Eraldo Paulesu, Dipartimento di Psicologia, Università degli Studi di Milano-Bicocca, Pzza dell'Ateneo Nuovo, 1, 20126 Milano, Italy e-mail: eraldo.paulesu@unimib.it*

*†These authors have shared first co-authorship.*

Developmental dyslexia has been the focus of much functional anatomical research. The main thrust of this work is that typical developmental dyslexics have a dysfunction of the phonological and orthography to phonology conversion systems, in which the left occipito-temporal cortex has a crucial role. It remains to be seen whether there is a systematic co-occurrence of dysfunctional patterns of different functional systems perhaps converging on the same brain regions associated with the reading deficit. Such evidence would be relevant for theories like, for example, the magnocellular/attentional or the motor/cerebellar ones, which postulate a more basic and anatomically distributed disorder in dyslexia. We addressed this issue with a meta-analysis of all the imaging literature published until September 2013 using a combination of hierarchical clustering and activation likelihood estimation methods. The clustering analysis on 2360 peaks identified 193 clusters, 92 of which proved spatially significant. Following binomial tests on the clusters, we found a left hemispheric network specific for normal controls (i.e., of reduced involvement in dyslexics) including the left inferior frontal, premotor, supramarginal cortices and the left infero-temporal and fusiform regions: these were preferentially associated with reading and the visual-to-phonology processes. There was also a more dorsal left fronto-parietal network: these clusters included peaks from tasks involving phonological manipulation, but also motoric or visuo-spatial perception/attention. No cluster was identified in area V5 for any task, nor did cerebellar clusters show a reduced association with dyslexics. We conclude that the examined literature demonstrates a specific lack of activation of the left occipito-temporal cortex in dyslexia particularly for reading and reading-like behaviors and for visuo-phonological tasks.
Additional deficits of motor and attentional systems relevant for reading may be associated with altered functionality of dorsal left fronto-parietal cortex.

**Keywords: developmental dyslexia, meta-analysis, fMRI, PET, ALE, hierarchical clustering**

## **INTRODUCTION**

Developmental dyslexia (DD), the inability to acquire fluent reading skills notwithstanding normal intelligence, adequate socio-cultural conditions, and preserved elementary sensory skills (DSM-IV, American Psychiatric Association, 1994; ICD-10, World Health Organization, 1993), often co-occurs with phonological deficits (Snowling, 2001) that persist in adult life (Paulesu et al., 2001; Ramus et al., 2003). While it is fairly clear which classes of phonological tasks are more sensitive in bringing about a deficient performance in dyslexics (e.g., spoonerism tasks; see for example, Pennington et al., 1990), the fine-grained nature of cognitive deficits underlying these faulty performances remains to be established fully (Frith, 1999).

Subjects with DD may present a more complex behavioral profile (Menghini et al., 2010), the reading and phonological difficulties being sometimes accompanied by attentional, visual- and auditory-magnocellular and/or motor-cerebellar impairments (Facoetti et al., 2000; Nicolson et al., 2001; Stein, 2001; Gaab et al., 2007); these are hereafter called "*additional deficits*"<sup>1</sup>. The prevalence of the *additional deficits* may vary from sample to sample, fuelling the debate on whether a core dyslexia syndrome exists together with a core underlying cognitive deficit. Indeed, the variable importance given to the *additional deficits* by different authors is one strong motivation for the presence of competing interpretations of dyslexia as a syndrome. The matter is complicated by the fact that the studies on co-morbidity in dyslexia have been run in groups selected with very different criteria: the range spans from studies on highly compensated adult university students in some cases (Ramus et al., 2003; Reid et al., 2007) to unselected young kids in other cases (Heim et al., 2008; Menghini et al., 2010). Studies in adult dyslexics have the advantage of permitting the assessment of a relatively stable neurocognitive system and of minimizing the observation of co-occurring deficits due to delayed maturation; studies in kids are more prone to the uncertainties due to the (not necessarily synchronous) development of the multiple systems involved in reading and to the changing neuropsychological patterns that may place a given kid in the dyslexic or in the normal population range, depending on the year of testing (see for example, Shaywitz et al., 1992). Of course, studies in kids have the advantage of giving information relevant for the developmental process while the reading skill is being acquired.

<sup>1</sup>This labeling is used for convenience only without ideological positions about the importance of these deficits in dyslexia.

There have been great hopes that functional anatomical studies of dyslexia could contribute to a better understanding of the disorder: it has been reasoned that if a well-defined malfunctioning brain system was identified, one could make stronger inferences on the nature of dyslexia at the cognitive level as well. This would have had obvious consequences in the field of rehabilitation (Demonet et al., 2004).

Indeed, brain imaging has had the merit of giving a demonstration that dyslexia has neurological bases. However, this demonstration has come sometimes in contrasting ways, giving further breath to the debate on the nature of dyslexia and on whether different forms of dyslexia exist and their relative weight.

By the time of the completion of the data collection for this paper, there have been more than 50 functional imaging papers on dyslexia that one could use for a formal review of the literature with a meta-analysis.

This previous literature can be grouped into a few broad classes of activation studies: studies with tasks involving primarily reading (including lexical decision tasks, phonological awareness tasks or semantic tasks); lexical retrieval for visual stimuli, as in picture naming; studies on auditory phonological processes; studies on motor tasks and motor learning; studies on visual perception (picture or face oriented) or on visuo-spatial attention; studies on early visual or auditory processes, including stimuli tackling the magnocellular systems.

After such a huge experimental effort in the field, any review of the data based on a conventional verbose discussion of what is nominally described by the authors would prove insufficient, confusing, and sometimes contradictory. This is also because a nominal reference to a given brain structure, and the ensuing discussions, is deprived of much value and sometimes misleading when the precise stereotactic location of a statistical effect may point to more specific cortical or subcortical regions: congruence and incongruence of different data may only appear such because of this impreciseness<sup>2</sup>.

In addition, the relative weight of a given study, based on the sample sizes and statistical thresholds adopted, is often impossible to deal with. Having clearly in mind the aforementioned limitations of *verbose*, that is, non-quantitative, reviews, we mention hereafter those that seem to be the most solid findings for reading related tasks. To make this illustration, we use some of the raw data that were entered into a formal meta-analysis in the paper. Much of this discussion will hopefully be superseded by the results of the present meta-analysis, whose aim was in fact to shed further light on the dyslexia imaging literature by showing findings that truly replicate across studies of the same class and perhaps across studies of different classes.

## **STUDIES ON THE CORE SYMPTOMS: READING AND PHONOLOGICAL PROCESSING**

The studies involving single-word reading in some form indicate dysfunction of either the left occipito-temporal (ventral) cortex (see Paulesu et al., 2001; Shaywitz et al., 2002 and more than 20 studies from those listed in **Table 1**) or the left temporo-parietal (dorsal) cortex (see for example Rumsey et al., 1992, 1997a). In particular, it has been proposed that the dorsal temporo-parietal cortex might be associated with an early dysfunction of phonological processing, emerging in the initial stage of the learning process (Turkeltaub et al., 2003; Sandak et al., 2004), while the ventral occipitotemporal region may be associated with perturbed maturation of the word recognition systems (Paulesu et al., 2001; Sandak et al., 2004), a finding that generalizes across different alphabetic orthographies (Paulesu et al., 2001) and even Chinese (Hu et al., 2010).

A first look at the highly replicated finding for the left ventral occipito-temporal cortex can be seen in **Figure 1A**, where the locations of the local maxima of significant hemodynamic response reduction are shown for dyslexics. In the figure we visualize the peaks of reduced activations in dyslexia for all tasks that involved reading from the papers listed in **Table 1**.

A further look at the distribution of the areas of reduced activations for any task involving reading in dyslexia (see **Figure 1B**), however, provides a more complex picture that clearly justifies the urge for a formal re-assessment of the data.

Are these patterns age dependent? Are some of them task dependent? What is the role of the right hemispheric hypoactivations for a behavior like reading that is highly dependent on a left-lateralized neural system (Cattinelli et al., 2013a; Taylor et al., 2013)? More importantly, what is the level of replication of the findings of any given paper? Is this seemingly highly distributed pattern of malfunction undermining our understanding of the biology and the cognition of dyslexia? These are all questions that are still in search of some formal answer.

In fact, the situation appears immediately more complex if one also considers phonological tasks, both visual and auditory. As one major theory of dyslexia predicates a phonological deficit it becomes logical to expect a great anatomical congruence between findings based on reading and findings based on phonological tasks. The way these focal effects (regional hypoactivations in dyslexia) overlap with the reading ones is illustrated by **Figure 1C** (dots in blue and dots in green). Clearly there is some degree of overlap between the three sets of findings.

However, there are also quite a few discrepancies. The same unsolved questions listed before apply here.

<sup>2</sup>One obvious historical example is the parietal region involved in phonological short-term memory: identified at the temporo-parietal junction by Paulesu et al. (1993, 1995), it was re-discovered, so to speak, two cm above by Smith et al. (1998). This is just one of the many obvious limitations of reviews based on qualitative approaches (for more discussions, see Fox et al., 1998; Cattinelli et al., 2013b).

### **Table 1 | List of papers included in the meta-analysis.**



**FIGURE 1 | Peaks of reduced activations in dyslexia for all tasks that involved reading (circles in red), for visual or auditory phonological tasks (circles in blue and in green, respectively) and for non-linguistic tasks (circles in yellow). (A)** Shows the highly replicated reduction in dyslexics at the level of the left ventral occipito-temporal peaks reported in the literature. In **(B)** all the peaks of reduced activations observed in dyslexics during reading tasks included in our meta-analysis are reported. Finally, in **(C)** all the peaks of reduced activations observed in dyslexics during reading, phonological and non-linguistic tasks included in our meta-analysis are reported.

### *ASHES TO ASHES, NOISE TO NOISE***: THE CONTRIBUTION OF THE IMAGING FINDINGS ON THE ADDITIONAL DEFICITS IN DYSLEXIA**

Studies on what we call the additional deficits of dyslexia investigated the neural dysfunction of more basic abilities such as those of the magnocellular (visual or auditory) system, of the spatial attention system and of motor control with particular attention to the functions of the cerebellum.

The case of the visual magnocellular and visuo-motion perception system is an exemplary one: a dysfunction of this system is suggested by evidence that dyslexics may have reduced contrast sensitivity at low spatial frequencies and low luminance levels (stimuli favored by the magnocells; Stein and Walsh, 1997) and reduced visual-motion sensitivity, in particular for coherent motion (Cornelissen et al., 1995), that correlates with impaired letter position encoding (Cornelissen et al., 1998); the same deficit may explain greater crowding effects in dyslexic subjects (Zorzi et al., 2012). In addition, subjects with dyslexia may have subtle signs of a dysfunctional visuo-spatial attentional system (Facoetti et al., 2000) that may be more severe for the left hemispace in a sort of mini-neglect (Hari and Renvall, 2001; Liddle et al., 2009). This evidence was supported initially by the ERP data of Galaburda and Livingstone (1993), who found reduced VEPs in 5 dyslexics for low contrast reversing checkerboard stimuli (a finding not replicated by others; see Johannes et al., 1996) and a disorganized magnocellular subdivision of the lateral geniculate nucleus.

Initial fMRI evidence pointing to a dysfunctional magnocellular system was provided by Eden et al. (1996) followed by Demb et al. (1998) in two small samples of subjects: they found reduced activation of the visual motion area MT/V5, a result that was not confirmed by MEG data, as Vanni et al. (1997) found normal MT/V5 activation for moving stimuli.

Notwithstanding that the contribution of the magnocellular and visuo-motion perception system in normal reading remains contentious (no involvement of MT/V5 is seen for single word reading in normal subjects), the aforementioned results have been seen as imaging evidence of the malfunction of the visual magnocellular system in dyslexia. Indeed, the magnocellular hypothesis remains a much pursued research avenue in dyslexia. Similar considerations may apply to the cerebellar hypothesis and its investigation.

To make this brutal introductory overview even more dismaying, **Figure 1C** (dots in yellow) shows how focal hypo-activations spread all over the brain if one considers non-linguistic tasks for either the visual modality or the motor one. This picture is quite similar to what would emerge if the scars and dyslaminations originally described by Galaburda et al. (1985) were superimposed onto the lateral surface of the brain in stereotactic space.

It should be noted that in these examples, we illustrate only voxels showing significant differences between groups. There is much more to be displayed if one also considers, as we did in this paper, within-group effects.

Clearly, such a body of data cannot be assessed and summarized by a mere discussion of what paper X has found as opposed to paper Y. The obvious alternative to qualitative reviews is provided by formal meta-analyses, as their quantitative approach makes them more rigorous and less prone to subjective bias. In brain imaging, meta-analyses are generally used to identify groups of regional effects that fall sufficiently close in stereotactic space to be interpreted as reflecting a common functional-anatomical entity (Fox et al., 1998; Wager et al., 2007; Cattinelli et al., 2013b). The functional significance of any of these entities then needs to be analyzed, on the basis of the background information about the experiments that generated the activation peaks constituting them. Several meta-analytic studies, differing in the specific technique employed and the investigated cognitive domain, have appeared in the literature in recent years (Salimi-Khorshidi et al., 2009; Kober and Wager, 2010; Radua and Mataix-Cols, 2012). Quantitative meta-analytic approaches were also recently used to determine consistency across neuroimaging studies and to identify regions reported as dysfunctional in developmental dyslexia (Maisog et al., 2008; Richlan et al., 2009). In particular, two studies, using the Activation Likelihood Estimation (ALE) method (Maisog et al., 2008; Richlan et al., 2009), analyzed the neural differences between controls and dyslexics during reading and reading-related tasks, i.e., letter matching, rhyming, semantic judgment, and lexical decision tasks. In both articles, the authors suggested that developmental dyslexia is associated with the hypoactivation of the left occipito-temporal, temporo-parietal, and inferior frontal regions. No evidence for a systematic hyperactivation in the dyslexics was found, either for the left inferior frontal cortex or for the cerebellum, as initially suggested by Shaywitz et al. (1998).

To provide information on the developmental progression of neural dysfunction in dyslexia, Richlan et al. (2011) performed a second meta-analysis and separated adult-related activations and children-related activations while comparing controls and dyslexics. They observed that the left occipito-temporal and temporo-parietal hypoactivation was present in the studies on adults. A hypoactivation was also observed in the anterior portion of the left occipito-temporal cortex for dyslexic children.

## **AIMS OF THE STUDY**

These previous meta-analyses were focused on the task of reading or on reading-like behaviors. The aim of this study was to further assess the dysfunctional anatomical correlates of dyslexia, to approach the issue of co-occurrence of neural dysfunction in dyslexia and to test the hypothesis that, beyond well replicated findings (the lack of commitment to reading of the left ventral occipito-temporal cortex), other functional anatomical deficits might be present. The usual logic used to test this hypothesis in previous studies has been to assess the presence of focal hypoactivations in non-reading tasks, for example, in motor learning (Nicolson et al., 1999; Menghini et al., 2006) or visual motion perception (Eden et al., 1996). Conversely, the logic behind our study is similar to the one of Danelli et al. (2013) for normal reading: given the vast literature supporting the involvement of multiple systems in dyslexia (Frith, 1999; Nicolson et al., 2001; Snowling, 2001; Stein, 2001; Reid et al., 2007; Pernet et al., 2009), and given that these systems normally intersect in the brain into higher order cortices (Danelli et al., 2013), we expected that, on top of differences in brain areas that are highly specific for reading, dyslexics may also show a more limited functional anatomical intersection between different systems normally overlapping in skilled readers. This would be revealed in the present meta-analysis by a reduced presence of regional effects from dyslexic groups in clusters showing a mix of peaks from reading-like and non-reading-like behaviors in normal controls.

In the present study this hypothesis was tested using a meta-analytical approach based on the optimized hierarchical clustering (HC) algorithm of Cattinelli et al. (2013b), complemented by the ALE algorithm (Turkeltaub et al., 2002; Eickhoff et al., 2009).

Hierarchical clustering has the advantage of permitting *post-hoc* statistical assessments of the functional or group assignations of individual clusters without the constraint of considering super-homogeneous tasks at the stage of cluster identification, as when using GingerALE alone.

However, hierarchical clustering does not provide a statistical test of the spatial significance of a given cluster against a random reference distribution of regional effects. This is permitted by the ALE approach (Turkeltaub et al., 2002; Eickhoff et al., 2009) that we used to complement our analyses. A schematic flowchart is reported in **Figure 2**. A previous example of this combined approach can be found in Crepaldi et al. (2013), where, in addition to the dual meta-analytical procedure, the clusters were assessed *post-hoc* not only for simple effects but also for interaction effects, as in the present study.

By considering all imaging studies on dyslexia, no matter the neurocognitive domain under investigation, we hoped to detect the existence of a systematic co-occurrence of dysfunctional patterns of different functional systems and to evaluate whether these involve different system specific brain regions or rather multimodal regions that normally show intersections of multiple systems. The face validity of the latter hypothesis was also assessed by comparison with the data of Danelli et al. (2013).

### **METHODS**

### **DATA COLLECTION AND PREPARATION**

Our meta-analysis is based on 53 neuroimaging articles, published up to September 2013, investigating the anatomofunctional dysfunction of developmental dyslexia using PET or fMRI in both children and adult subjects.

Studies were selected through the PubMed database (http://www.ncbi.nlm.nih.gov/pubmed/) by running five queries. The search keys were: "Dyslexia AND fMRI," "Dyslexia AND PET," "Dyslexia AND neuroimaging," "Dyslexia AND functional Magnetic Resonance Imaging," and "Dyslexia AND Positron Emission Tomography." These queries returned 544, 34, 462, 267, and 45 entries, respectively.

After removing duplicates, we included only studies that satisfied the following inclusion criteria: (1) sample population composed of both normal controls and subjects with developmental dyslexia; (2) imaging technique: PET or fMRI; (3) whole brain voxel based data-analyses using stereotactic conventions; neither region-of-interest analyses nor multiple single case analyses restricted to few regions (as, for example, in Eden et al., 1996) were considered; (4) presence of data for either within group comparisons, or between group comparisons or both.
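The pooling step that precedes these criteria can be sketched as follows. This is only a minimal illustration: the query strings and hit counts are those reported above, whereas the record identifiers and the choice of keying duplicates on a single identifier are assumptions made for the example, not a description of how the screening was actually carried out.

```python
# Hit counts per query, as reported in the text.
reported_hits = {
    "Dyslexia AND fMRI": 544,
    "Dyslexia AND PET": 34,
    "Dyslexia AND neuroimaging": 462,
    "Dyslexia AND functional Magnetic Resonance Imaging": 267,
    "Dyslexia AND Positron Emission Tomography": 45,
}
total_entries = sum(reported_hits.values())  # entries before de-duplication


def pool_unique(results_by_query):
    """Union of record identifiers across queries, in first-seen order."""
    first_seen = {}
    for query, record_ids in results_by_query.items():
        for record_id in record_ids:
            # Keep only the first query that returned each record.
            first_seen.setdefault(record_id, query)
    return list(first_seen)


# Hypothetical identifiers for illustration only, NOT real PubMed records.
toy_results = {
    "Dyslexia AND fMRI": ["id-001", "id-002", "id-003"],
    "Dyslexia AND PET": ["id-003", "id-004"],
    "Dyslexia AND neuroimaging": ["id-001", "id-004", "id-005"],
}
unique_ids = pool_unique(toy_results)
```

The five queries together returned 1352 entries before duplicates were removed; the number remaining after de-duplication is not reported in the text.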

For the suitable studies, in the meta-analysis we used data derived from (i) within-group *simple effects* and (ii) between-group *comparisons*. We also incorporated the within-group data to have a more complete survey on whether a given brain region was differentially activated across groups, while still being active in each group above a given conventional threshold, or whether the region, besides being significantly associated with one group, never reached statistically significant effects in the other group. In any event, for the group-by-task interaction effects we considered only first-order interactions.

Only data emerging from univariate statistical analyses were considered.

By applying such criteria, we included 2360 stereotactic activation loci, 1402 associated with controls and 958 associated with subjects with dyslexia. Thirty-nine foci were excluded from the analyses because they fell outside the boundaries of the MNI stereotactic space.
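A check of this kind can be sketched in a few lines of Python. The bounding box used here is an assumption: it corresponds to the field of view of the commonly used 2 mm MNI152 template grid, and the text does not state which boundaries (a box or a brain mask) were actually applied. The function names are likewise hypothetical.

```python
# ASSUMPTION: axis-aligned field of view of a 2 mm MNI152 template grid (mm).
# The boundaries actually used in the study are not given in the text.
ASSUMED_BOUNDS = {"x": (-90.0, 90.0), "y": (-126.0, 90.0), "z": (-72.0, 108.0)}


def in_bounds(focus, bounds=ASSUMED_BOUNDS):
    """Return True if an (x, y, z) focus in mm lies inside the bounding box."""
    x, y, z = focus
    return (
        bounds["x"][0] <= x <= bounds["x"][1]
        and bounds["y"][0] <= y <= bounds["y"][1]
        and bounds["z"][0] <= z <= bounds["z"][1]
    )


def split_foci(foci):
    """Separate foci into those kept and those excluded as out of bounds."""
    kept = [f for f in foci if in_bounds(f)]
    excluded = [f for f in foci if not in_bounds(f)]
    return kept, excluded
```

A box test of this kind is deliberately permissive; a template brain mask would be a stricter alternative, since it would also reject foci that fall inside the box but outside the brain.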

The main characteristics of the 53 experiments included in this meta-analysis are reported in **Table 1**.

### **CLASSIFICATION OF THE RAW DATA PRIOR TO CLUSTERING ANALYSES**

For each activation peak, we recorded all relevant information about the statistical comparison that generated it. We therefore determined a list of classification criteria to characterize each peak of activation included in the dataset (**Table S1** of the Supplementary Materials). These classifications were used for initial *post-hoc* statistical comparisons on the clusters that passed the GingerALE test for spatial significance.


For each peak we also completed our database with information about the variables listed below.


To allow the combination of data from studies based on different stereotactic spaces, the coordinates of studies in which activation peaks were reported in terms of the Talairach and Tournoux atlas (Talairach and Tournoux, 1988) were transformed into the MNI (Montreal Neurological Institute) stereotactic space (Mazziotta et al., 1995); the transformation was done with the GingerALE software, using Matthew Brett's script (http://imaging.mrc-cbu.cam.ac.uk/imaging/MniTalairach).
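The following is a minimal sketch of a piecewise-linear Talairach-to-MNI conversion of the kind described on the page linked above. The coefficients are the rounded values commonly quoted for Brett's MNI-to-Talairach approximation and are an assumption here; they should be checked against the linked page or the GingerALE documentation before any real use. The function names `mni_to_tal` and `tal_to_mni` are illustrative, not part of GingerALE.

```python
# Rounded coefficients commonly quoted for Brett's MNI -> Talairach
# approximation (assumption: verify against the linked page).
A, C = 0.9688, -0.0485          # shared y/z coefficients
UP = (0.0460, 0.9189)           # used when z >= 0 (above the AC plane)
DOWN = (0.0420, 0.8390)         # used when z < 0 (below the AC plane)


def mni_to_tal(x, y, z):
    """Forward approximation: MNI coordinates -> Talairach coordinates."""
    b, d = UP if z >= 0 else DOWN
    return (0.9900 * x, A * y + b * z, C * y + d * z)


def tal_to_mni(x, y, z):
    """Inverse approximation: invert the 2x2 (y, z) block analytically."""
    b, d = UP if z >= 0 else DOWN
    det = A * d - b * C
    return (x / 0.9900, (d * y - b * z) / det, (A * z - C * y) / det)


# Round trip on two example points (one above, one below the AC plane).
for point in [(-52.0, -75.0, 7.0), (30.0, 20.0, -20.0)]:
    back = tal_to_mni(*mni_to_tal(*point))
    print(point, "->", tuple(round(v, 3) for v in back))
```

The branch on the sign of *z* follows the original approximation, which applies a different scaling above and below the AC plane; points very close to that plane can change branch between the forward and inverse steps.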

### **CLUSTERING PROCEDURE**

First, we performed a hierarchical clustering analysis (HC) of the activation peaks as described in Cattinelli et al. (2013b): the analysis allowed us to extract the principal clusters of regional effects from the database. Hierarchical clustering was performed using functions implemented in MATLAB 7. After computation of the squared Euclidean distances between each pair of input data, clusters with minimal dissimilarity were recursively merged using Ward's (1963) criterion, which minimizes the total intracluster variance after each merging step. As described in Cattinelli et al. (2013b) and Crepaldi et al. (2013), "this procedure results in a tree, whose leaves represent singletons (i.e., clusters formed of a single activation peak), and whose root represents one large cluster including all the activation peaks submitted to the algorithm. Each level of the tree reports the clusters created by the algorithm at a specific processing step, as it progresses from individual activation peaks at the lowest level to the all-inclusive final cluster at the top of the tree". The procedure was continued for as long as the average standard deviation of the individual peaks around the cluster centroids, in the *x*, *y*, and *z* directions, remained below 7.5 mm. This measure roughly mimics the spatial resolution of fMRI studies. As hierarchical clustering may be sensitive to the order in which the individual data are processed, and may generate alternative clustering trees (Morgan and Ray, 1995), an optimal clustering solution was identified by accepting the solution that maximized the between-cluster error sum of squares (see Cattinelli et al., 2013b).
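The merging logic can be illustrated with a small pure-Python sketch. It is not the MATLAB implementation used in the study. It assumes that the stopping rule is evaluated as the mean, across clusters, of the population standard deviation along each axis, and it omits the step that compares alternative trees. The function names and the example peaks are hypothetical.

```python
from itertools import combinations
from statistics import pstdev


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b merge."""
    d2 = sum((u - v) ** 2 for u, v in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def mean_axis_sd(clusters):
    """Mean over clusters of the per-axis SD (singletons contribute 0)."""
    return tuple(
        sum(pstdev([p[i] for p in c]) for c in clusters) / len(clusters)
        for i in range(3)
    )


def ward_clustering(peaks, max_mean_sd=7.5):
    """Merge the cheapest pair until the next merge would breach the SD limit."""
    clusters = [[tuple(p)] for p in peaks]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        candidate = [c for k, c in enumerate(clusters) if k not in (i, j)]
        candidate.append(merged)
        if max(mean_axis_sd(candidate)) >= max_mean_sd:
            break  # assumed stopping rule: keep the last solution below 7.5 mm
        clusters = candidate
    return clusters


# Hypothetical peaks (mm): two tight groups in distant locations.
example_peaks = [
    (-40, -60, -10), (-42, -58, -12), (-38, -62, -8),
    (40, -50, 40), (42, -48, 42),
]
clusters = ward_clustering(example_peaks)
for c in clusters:
    print(len(c), "peaks, centroid", tuple(round(v, 1) for v in centroid(c)))
```

This naive version re-evaluates every pair at each step and is only practical for small examples; a dataset of a few thousand peaks would call for an optimized library routine.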

The mean coordinates of each cluster included in the final set were then passed as an input to a MATLAB script to automatically label the anatomical correspondence of the stereotactic coordinates of the centroids of each cluster. This procedure implied a query of the Automatic Anatomical Labeling (AAL) template available in the MRIcron visualization Software (Rorden and Brett, 2000).

HC analyses have the advantage of permitting *post-hoc* assessment of the functional meaning of a given cluster (see for example, Cattinelli et al., 2013a; Crepaldi et al., 2013) or, as in the present study, its assignment to a class of subjects (e.g., clusters specifically or preferentially associated with controls as opposed to clusters associated with dyslexics).

However, HC does not quantify the significance of each individual cluster with reference to the probability of a spatially distributed statistical process. To protect ourselves from considering clusters of limited biological significance, the spatial distribution of the clusters identified by HC was compared with the results of a different meta-analytical method, namely the Activation Likelihood Estimation technique as implemented in the GingerALE software (Eickhoff et al., 2009; Turkeltaub et al., 2012). Only clusters also present in the GingerALE analyses were further considered (the threshold was set at *p* < 0.05 with FDRpN correction).

### *POST-HOC* **STATISTICAL ANALYSES ON THE RESULTING CLUSTERS**

Group, age, or task preferential associations were assessed with the binomial test as follows.

For the group effect, we tested whether the distribution of control- and dyslexic-related peaks within each cluster was significantly different from the overall proportion of control- and dyslexic-related peaks included in the whole sample of coordinates (1382/2321 = 0.59543 for controls and 939/2321 = 0.40457 for dyslexics). To this end, we used the binomial distribution and computed the probability of observing a specific number of peaks associated with a given group as the number of successes in a series of independent randomly-distributed trials: when this probability was below 0.05, the cluster was considered to be associated with either the control or dyslexic groups.

<sup>3</sup>The rationale for using the two broad categories of reading-like and non-reading-like tasks was dictated by the need to statistically assess the meaning of the clusters along a parsimonious, admittedly reduced, number of categories. The finer grained description of the more numerous yet broad classes of studies was left to the descriptive assessment of the composition of the clusters that proved significant for the two main categories. For example, once a cluster proved to be significantly associated with reading-like tasks, a qualitative exploration of the individual tasks that contributed to that effect allowed us to make some additional finer grained statements. See for example the case of clusters L5 and L6 in the results section.
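A minimal sketch of this kind of binomial test is given below, assuming a one-tailed (upper-tail) probability. The original analyses were run in R; the function name and the example cluster (20 peaks, 18 of them from controls) are hypothetical, while the baseline proportion is the one reported in the text.

```python
from math import comb

# Overall proportion of control-related peaks reported in the text.
P_CONTROLS = 1382 / 2321


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical cluster: 20 peaks, 18 of which come from controls.
p_value = binom_upper_tail(18, 20, P_CONTROLS)
print(f"one-tailed p = {p_value:.4f}")
print("control-preferential" if p_value < 0.05 else "not significant")
```

The same function can be applied to the dyslexic peaks by passing their count and the complementary proportion `1 - P_CONTROLS`.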

Similar analyses were implemented in these clusters to test their association with either reading-related or non-reading-related tasks and with either the children or the adult group. The proportion of non-reading and reading-related peaks included in the whole sample of coordinates was 406/2321 (= 0.17492) and 1915/2321 (= 0.82508), respectively, while the proportion of children- and adult-related peaks included in the whole sample of coordinates was 1144/2321 (= 0.49289) and 1177/2321 (= 0.50711), respectively<sup>4</sup>.

We also assessed whether there were interaction effects within each cluster: the group-by-task and group-by-age interactions were tested with Fisher's exact test (Fisher, 1970); this estimates whether the distribution of one categorical variable (group) varies according to the levels of a second categorical variable (experimental task or age class), thus revealing clusters that were associated with one group in one task (e.g., reading-like tasks in controls), but with the opposite group in another task (e.g., dyslexics in non-reading-like tasks).

The odds ratio under the null hypothesis of Fisher's test on the individual clusters was corrected to reflect the distribution of the categories under examination in the entire dataset. The odds ratio for the group-by-task interaction was 1.09, for the group-by-age interaction 0.81, and for the task-by-age interaction 1.27.
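One way to run Fisher's exact test against a null odds ratio different from 1 is to use the conditional (noncentral hypergeometric) distribution of the top-left cell of the 2 × 2 table. The sketch below assumes this approach and the usual two-sided rule of summing all outcomes no more probable than the observed one. It is not the authors' R code, and the cluster counts are hypothetical; only the null odds ratio of 1.09 comes from the text.

```python
from math import comb


def fisher_exact_noncentral(table, null_odds_ratio=1.0):
    """Two-sided Fisher exact p-value for a 2x2 table under a given null odds ratio."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalized probabilities of each possible top-left count k,
    # conditional on the table margins.
    weights = {
        k: comb(row1, k) * comb(row2, col1 - k) * null_odds_ratio**k
        for k in range(lo, hi + 1)
    }
    total = sum(weights.values())
    p_obs = weights[a] / total
    # Sum every outcome that is no more probable than the observed table.
    return sum(w / total for w in weights.values() if w / total <= p_obs * (1 + 1e-7))


# Hypothetical cluster: rows = controls / dyslexics,
# columns = reading-like / non-reading-like peaks.
cluster_table = [[12, 1], [3, 6]]
p_interaction = fisher_exact_noncentral(cluster_table, null_odds_ratio=1.09)
print(f"group-by-task interaction: p = {p_interaction:.4f}")
```

With `null_odds_ratio=1.0` this reduces to the standard Fisher exact test. The weights are computed in floating point, which is adequate for the cluster sizes considered here but could overflow for very large tables.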

Some of the interaction effects were tested to replicate previous analyses published in other meta-analytical papers: for example, the age-by-group interaction described by Richlan et al. (2013). We believe that reporting these results, even if not all are discussed in detail later on, leaves a useful record for future assessments.

Finally, clusters that did not show a significant group-preferential association were assigned to a class called undifferentiated. Among these clusters, we attempted to highlight those with a higher probability of actually being completely non-specific, by performing binomial tests along the group axis. In particular, we assumed that clusters whose one-tailed *p*-value was greater than 0.5 for both groups have a high chance of being genuinely non-specific.
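The decision rule described here can be sketched as follows, assuming upper-tail binomial probabilities for each group and a 0.05 significance threshold. The labels, the function names and the example counts are illustrative only; the baseline proportion is the one reported for the whole sample.

```python
from math import comb

P_CONTROLS = 1382 / 2321  # overall proportion of control-related peaks


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def classify_cluster(n_controls, n_dyslexics, alpha=0.05):
    """Label a cluster from its peak counts (hypothetical helper)."""
    n = n_controls + n_dyslexics
    p_co = upper_tail(n_controls, n, P_CONTROLS)
    p_dy = upper_tail(n_dyslexics, n, 1 - P_CONTROLS)
    if p_co < alpha:
        return "control-preferential"
    if p_dy < alpha:
        return "dyslexic-preferential"
    if p_co > 0.5 and p_dy > 0.5:
        return "undifferentiated (likely non-specific)"
    return "undifferentiated"


# Hypothetical clusters given as (control peaks, dyslexic peaks).
for counts in [(18, 2), (2, 18), (6, 4), (8, 2)]:
    print(counts, "->", classify_cluster(*counts))
```

Both one-tailed probabilities can exceed 0.5 at the same time because the binomial distribution is discrete: a cluster whose composition sits at the expected split has a better than even chance of showing at least that many peaks from either group.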

All post-clustering statistical analyses were performed using the free statistical software *R* (the code is available upon request to Manuela Berlingeri).

### **COMPARISON WITH DANELLI ET AL. (2013) MAPPING OF READING AND SYSTEMS INVOLVED IN DYSLEXIA**

The results of the clustering analyses were also compared with the independent fMRI data described by Danelli et al. (2013). In that paper, the authors described fMRI patterns of intersection between the normal reading system and the auditory phonological system, or the visual motion/magnocellular system, or the motor/cerebellar system: they also reported reading per-se activations, that is, areas activated for single pseudoword reading, once any trend for the other aforementioned tasks was excluded by the analysis<sup>5</sup>. Comparison with this independent data-set helped in the interpretation of the functional relevance of the data of the present meta-analysis.

### **ADDITIONAL ANALYSES**

The paucity of data on MT/V5 due to the lack of group-based data would inevitably dismiss the MT/V5 finding: to avoid this, we identified the average MT/V5 of the normal controls in Eden et al. (1996) and looked for the closest cluster in the meta-analysis.

Finally, the data of the regional effects associated with non-reading-like tasks (e.g., motor tasks, attentional tasks, visual perception tasks etc.) were also submitted to a separate meta-analysis. This additional meta-analysis was motivated by the need to exclude the possibility that the overwhelmingly larger number of peaks from the reading-like experiments (# of peaks: 1915) could have masked the manifestation of specific clusters from the *non-reading-like* data (# of peaks: 406)<sup>6</sup>. As above, we tested whether the distribution of control- and dyslexic-related peaks within each cluster was significantly different from the overall proportion of control- and dyslexic-related peaks included in the whole sample of coordinates (0.58 for controls and 0.42 for dyslexics). To this end, we used the binomial distribution and computed the probability of observing a specific number of peaks associated with a given group as the number of successes in a series of independent randomly-distributed trials: when this probability was below 0.05, the cluster was considered to be associated with either the control or dyslexic groups. These analyses were performed only on clusters that showed a spatial congruence in the HC and ALE procedures.

### **RESULTS**

The hierarchical algorithm identified a total of 193 clusters (**Figure 3A**), 96 in the left hemisphere and 97 in the right hemisphere, with 2 to 51 peaks each, drawn from 2 to 18 different studies; the mean standard deviations along the three axes were 4.54 mm (x-axis), 4.83 mm (y-axis), and 4.76 mm (z-axis).

After the comparison of these results with ALE maps (**Figure 3B**), only 92 out of 193 clusters (**Figure 3C**) were considered for subsequent analyses.

### **GROUP-PREFERENTIAL CLUSTERS**

When we indicate a cluster as "related to" or "preferential for" a group, we imply that there was a significantly greater proportion of peaks in one group as opposed to the other. This would correspond to the terminology suggested by Pernet et al. (2007), the so-called "preferential response" for brain regions with a comparatively greater response in a given condition/group, but a non-zero response in the control condition/control group. Nine clusters were preferentially associated with controls, while five clusters were associated with dyslexics (**Table 2**). The distribution of peaks for each significant cluster is reported in the contingency tables in the Supplementary Materials (**Table S2**).

<sup>4</sup>To do justice to the fine-grained variability of the tasks that contributed to each cluster, beyond the two broad categories described above, each cluster was explored qualitatively (the raw data are available in the supplementary materials).

<sup>5</sup>In Danelli et al. (2013) reading was tested with a pseudoword silent reading task; the visual magnocellular/visual motion system with a visual motion perception task using Gabor low-frequency patches; the auditory phonological system was tested with a rhyming task for syllabic sounds; the motor cerebellar system was tested with a finger tapping sequence learning task.

<sup>6</sup>The idea here is that the cloud of peaks from the reading-like experiments may operate as geometrical attractors in the clustering procedure and mask spatial effects coming from the less numerous data-set from non-reading-like experiments. The separate meta-analysis should have protected us from this hypothetical confound.

**FIGURE 3 | Clusters identified with HC (A), clusters identified using ALE approach (B) and the final data-set of clusters, identified in both HC and ALE meta-analyses and considered for** *post-hoc* **statistical analyses (C).**

### *Clusters associated with normal readers*

There was a distributed left ventral occipito-temporal network involving the infero-temporal (clusters L6, L23) and fusiform (cluster L5) regions (areas in red in **Figure 4**). Within this network, peaks coming from lexical decision tasks were present in cluster L23 only, while all other reading-like behaviors were fairly evenly represented across the three left occipito-temporal clusters. When compared with the statistical maps of Danelli et al. (2013), cluster L5 fell in the reading network, while L6 fell in a region of activations shared by an auditory phonological task and reading. L23 fell in between these two regions.

Moving toward more dorsal regions, there were two areas in the middle temporal and supra-marginal gyri (L86 and L89; areas in blue in **Figure 4**) that were associated with the normal controls for a mixture of tasks including reading but also active phonological manipulation tasks, involving some working memory demands.

Even more dorsally, we also found a left hemispheric network involving the posterior part of the supplementary motor cortex, the superior parietal cortex, the dorsal portion of the inferior parietal lobule (areas in green in **Figure 4**): these clusters included peaks from tasks involving phonological manipulation (e.g., phonological short-term memory), but also motoric or visuo-spatial perception/attention. A comparison with Danelli et al. (2013) data confirmed the mixed nature of the functional properties of these regions, which were involved in motoric tasks and, for the superior parietal region, in the visual motion perception task as well.

### *Clusters associated with dyslexic readers*

These included the left basal ganglia (head of the caudate; pallidum), the right anterior cingulate, right precentral cortex and the right inferior parietal lobule (areas in cyan in **Figure 4**). While the left subcortical regions were brought about by reading-like tasks, the right hemispheric ones were associated with a variety of tasks, often of the non-reading-like kind.

Detailed description of group-related clusters is reported in **Table 2**.

### **TASK-PREFERENTIAL CLUSTERS**

Four clusters, located in the opercular parts of the left inferior frontal gyrus, in the left insula and in the posterior portion of the left inferior temporal gyrus, were preferential for reading-like tasks, while five clusters, located in the right superior and inferior parietal lobule, in the superior temporal cortex, bilaterally, and in the left middle temporal gyrus, were significantly related to non-reading-like tasks.

### **AGE-PREFERENTIAL CLUSTERS**

Fifteen clusters were preferentially associated with adults, while 10 clusters were associated with children. In particular, adult-related clusters were located in the left SMA, in the opercular part of the inferior frontal gyrus, bilaterally, in the left insula, in the left superior and inferior temporal gyrus, in the cerebellum, bilaterally, in the left pallidum and caudate nuclei, and in the left thalamus.

Children-related clusters were located in the pre-SMA, bilaterally, in the left middle frontal cortex, in the left superior temporal gyrus, in the left superior and right inferior occipital gyri, in the lingual gyri, bilaterally.


**congruence in the HC and ALE procedures.** The red dots represent the control-related clusters that fell in reading and phonological specific activations in Danelli et al. (2013), the blue dots represent the control-related clusters that were not observed in Danelli et al. (2013) and the green dots represent the control-related clusters that fell in visual motion and motoric activations in Danelli et al. (2013). The right yellow dots represent the control-related cluster identified by the meta-analysis restricted to the non-reading-like tasks. Finally, dyslexic-related clusters are reported in cyan.

## **INTERACTION EFFECTS**

## *Task-by-age interactions*

Three clusters, located in the left middle frontal and middle temporal cortex and in the left lingual gyrus, showed a task-by-age interaction effect (see **Table 3**). The first cluster was associated with reading-like tasks in children, the other two clusters with reading-like tasks in adults.

## *Group-by-task interactions*

Three clusters, located in the left superior and middle temporal gyri and in the right inferior parietal lobule, showed a group-by-task interaction effect (see **Table 3** and **Figure 5** for details). The first two were associated with the normal controls for reading-like tasks, the third with the dyslexics for the non-reading-like tasks.

## *Group-by-age interactions*

Three clusters, located in the opercular part of the left inferior frontal gyrus, in the triangular part of the right inferior frontal gyrus and in the right precentral gyrus, showed a group-by-age interaction effect (see **Table 3** and **Figure 5** for details). The first two clusters were associated with adult controls, the third with young controls.

## **UNDIFFERENTIATED CLUSTERS**

Seven clusters, located in the pre-SMA, bilaterally, in the triangular part of the right inferior frontal gyrus, in the right insula, in the right middle cingulum, and in the right inferior occipital gyrus, did not show a group-related preferential association (*p* > 0.5 in the binomial test) and were classified as undifferentiated activations (see **Table 4** for details).

**Table 2 | Clusters with significant group specificity.**

**Table 3 | Group-by-task, group-by-age, and task-by-age distribution of the activation peaks included in each of the four clusters showing significant interaction between two factors.**

*x, y, and z refer to stereotactic coordinates of the centroid of each cluster.*

### **ADDITIONAL ANALYSES**

Because of the historical importance of the theories behind the MT/V5 and cerebellar findings, *ad-hoc* analyses were carried out for these two sets of findings.

### *MT/V5*

We first identified "group" average stereotactic coordinates for the eight normal subjects of Eden et al. (1996). This was done by using the same hierarchical clustering software as in the meta-analysis. On the left, the centroid of the MT/V5 region was located at *X* = −52; *Y* = −75; *Z* = 7; the SDs were: 11, 8, 5 mm in the three directions; on the right, the stereotactic coordinates were *X* = 50; *Y* = −70; *Z* = 5; the SDs were: 8, 8, 3 mm (areas in orange in **Figure 6**). As expected, the clusters of Eden et al. (1996) fell within the statistical maps for visual motion perception described in Danelli et al. (2013).

We explored the anatomical congruence of these clusters with those that proved significant in the comparisons of controls and dyslexics in the meta-analysis. We also compared the MT/V5 location of Eden et al. (1996) with the distribution of the raw data of the activations that were significantly larger in controls than in dyslexics. None of these analyses showed a systematic overlap between the MT/V5 of Eden et al. (1996) and the data from other experiments on dyslexia.

We also compared the clusters associated with controls or dyslexics in the present study (see **Table 2**) with the mapping of the magnocellular system as identified by Danelli et al. (2013). There was one area of overlap (shared with an overlap for the motor learning task of Danelli et al., 2013) in one cluster located in the left superior parietal lobule (cluster L34). The experiments that generated this cluster in the data-set considered in this paper were based on phonological tasks, on a motor task in one case, and on a visuo-spatial attentional task.

### *Cerebellum*

There were five clusters identified by the general meta-analysis in the cerebellum (see **Table 5** and **Figure 7**). These regions were identified by a variety of reading-like tasks (66 peaks overall) and non-reading-like tasks (11 peaks), with no specific association with either normal controls or developmental dyslexics for any of these clusters. Of the 77 peaks, only 3 came from the comparison controls > dyslexics, 8 came from the comparison dyslexics > controls, 37 came from simple effects in the controls and 29 from simple effects in the dyslexics. Three of these clusters were significantly associated with data coming from adult volunteers.

*Co, controls; D, Dyslexics; NR, non-reading-like tasks; R, reading-like tasks; A, Adults; Ch, Children.*

**FIGURE 7 | Cerebellar clusters identified in both HC and ALE meta-analyses. In none of these there was a significant association with normal controls.**

### *Meta-analysis restricted to the non-reading-like tasks*

The hierarchical algorithm identified a total of 85 clusters with 2–10 peaks each, and had mean standard deviations along the three axes of 4.56 mm (x-axis), 5.21 mm (y-axis) and 5.09 mm (z-axis). After the comparison of these results with ALE maps, 40 out of 85 clusters were considered for subsequent analyses.

The *post-hoc* analyses identified only one cluster that could be associated with the normal controls to a statistically greater (*p* = 0.05) degree than with dyslexics: the cluster (centroid coordinates: *X* = 35; *Y* = −41; *Z* = 41; the SDs were: 5, 6, 7 mm) was located in the right inferior parietal cortex. The cluster included peaks from motor tasks (#3), auditory perception tasks (#3) and three-dimensional visual discrimination tasks (#3). The centroid of this cluster is very close to that of cluster R86 (centroid coordinates: *X* = 37; *Y* = −45; *Z* = 43; the SDs were: 5, 5, 7 mm) of the general analysis, in which specificity for controls (i.e., lack of activation for dyslexics) was again seen. This cluster was not observed in the ALE analysis.
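As a quick illustration of how close the two centroids are, their Euclidean distance can be computed directly from the coordinates reported above. This calculation is added here for clarity and is not part of the original analysis pipeline.

```python
import math

# Centroids (mm) as reported in the text.
non_reading_cluster = (35, -41, 41)
r86 = (37, -45, 43)

distance_mm = math.dist(non_reading_cluster, r86)
print(f"{distance_mm:.1f} mm")
```

The distance is the square root of 24, about 4.9 mm, and each per-axis difference (2, 4, and 2 mm) is smaller than the corresponding SD of either cluster.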

## **DISCUSSION**

The problem of the co-occurrence of neural dysfunctions in dyslexia has so far remained unexplored by meta-analytic studies; it remained to be seen whether, besides the well replicated finding of a left occipito-temporal hypoactivation, there is a systematic co-occurrence of dysfunctional patterns of different functional systems, perhaps converging on the same brain regions associated with the reading deficit. Such evidence would be relevant for theories such as the magnocellular or the cerebellar ones, which postulate a more basic and possibly more broadly distributed disorder in dyslexia.

In the present study this issue was tested by submitting to the meta-analysis all the suitable data<sup>7</sup> from the literature on dyslexia, published up to September 2013, independently of the nature of the task, the materials, and the age-groups. Functional interpretation of the regional effects was made by direct exploration of the cluster compositions, by looking at the group or task that generated one effect, and by appropriate *post-hoc* statistical tests, when possible, for broad categories. In addition, our evaluations of the results were also based on a direct comparison with the functional mapping of the reading, auditory phonological, visual magnocellular and visual motion, motor/cerebellar systems, and their intersections, as described by Danelli et al. (2013) for normal subjects using fMRI.

<sup>7</sup>See inclusion criteria for the raw data in the methods section.

Our meta-analysis confirms one major milestone of previous empirical imaging studies and previous meta-analyses on dyslexia: the commitment to reading in normal controls for left occipito-temporal cortex (Paulesu et al., 2000; Cohen et al., 2002; Price et al., 2003), and the lack of such commitment in the same region for dyslexics (Shaywitz et al., 1998; Paulesu et al., 2001; Maisog et al., 2008; Richlan et al., 2009).

Because of the finer grained analysis afforded by our method, and thanks to the comparison with the independent fMRI data of Danelli et al. (2013), the same left occipito-temporal region identified by previous meta-analyses using ALE (Maisog et al., 2008; Richlan et al., 2009) was fractionated into three different clusters preferentially associated with the normal controls (L5, L6, and L23): clusters L5 and L23 are most likely associated with initial visual processing of the orthographic strings, while cluster L6 is most likely associated with the integration of orthography with phonology.

By comparison with the data of Cohen et al. (2004) these corresponded to the visual-word form area (L5, VWFA), to the lateral inferior temporal multimodal region (L6; LIMA), and to an intermediate area (L23) that, to the best of our knowledge, has not been further characterized in the literature as yet.

There was more in our new data. The two more dorsal regions in the middle temporal and supra-marginal gyri (L86 and L89), were associated with the normal controls for a mixture of tasks including reading but also active phonological manipulation tasks, involving some working memory demands. These were not identified in the intersection paper of Danelli et al. (2013) most likely because the auditory phonological task had minimal demands in terms of manipulation and working memory processes.

Even more dorsally, there was a new set of left hemispheric regions with different functional associations in parietal and premotor cortices and the supplementary motor area: these were brought about by a mix of reading, motor, phonological manipulation and visual attention tasks. Interestingly, when compared with the maps of Danelli et al. (2013), some clusters overlapped with motoric regions (inferior parietal and SMA), while the left superior parietal lobule cluster overlapped with an intersection of motor learning and visual motion perception maps.

It is also worth noting that the meta-analysis restricted to the non-reading-like tasks revealed a right hemispheric inferior parietal cluster (R86; centroid coordinates: *X* = 37; *Y* = −45; *Z* = 43; the SDs were: 5, 5, 7 mm) preferentially associated with normal readers, that is not present in developmental dyslexics. However, a more lateral right parietal cluster was preferential for the dyslexics (see **Figure 4**). There is overwhelming evidence of a role of the right parietal cortex in spatial attention (for review see Vallar et al., 2003; Corbetta and Shulman, 2011). The disorganized response in the right inferior parietal cortex of dyslexics (in some cases "more active," in other cases "less active") may be evidence for an anatomically grounded dysfunctional right hemispheric spatial attentional system in dyslexia.

On the other hand, no evidence was found either for a cerebellar dysfunction or for a left inferior frontal cortex hyperactivation in dyslexics, as in the previous meta-analyses (Maisog et al., 2008; Richlan et al., 2009).

In addition, we could not find evidence for the visual/magnocellular hypothesis of dyslexia, if this was to be benchmarked by a reduced recruitment of area V5/MT (Eden et al., 1996).

These findings expand previous evidence on the presence of functional anatomical deficits in dyslexia and identify a ventral-to-dorsal functional gradient: the more ventral areas are normally involved in the decoding aspect of reading (from orthography to phonology), the intermediate middle temporal and supra-marginal areas are related to reading-like behaviors or phonological processing, and the more dorsal group is involved in reading but also in motoric or visual motion perception aspects of functional anatomy. We argue that the more dorsal left parietal and premotor cortex might be normally associated with eye-movement control or with visuo-spatial attention in language specific tasks. These would be functionally associated with the left hemispheric network of reading in normal controls but not in subjects with dyslexia.

This evidence brings new fuel for those believing in the existence of multiple dysfunctional systems in dyslexia without implying the need for focal and highly localized hypo-activations, preferentially associated with single classes of non-reading-like tasks. Rather, this new evidence speaks in favor of a distributed set of local malfunctions in "associative" regions normally involved in more than one behavior/cognitive domain. At a quantitative level, the number of peaks that contributed to the identification of these group-specific clusters was fairly balanced: 50 peaks for the occipito-temporal clusters, 31 for the intermediate network and 66 for the more dorsal network. It is worth noting that the more dorsal clusters appear group "specific", or preferentially associated with normal readers, only if one considers the entire data-set rather than the non-reading-like behaviors on their own (data not shown). This result speaks in favor of our strategy of merging the data from multiple classes of tasks to reach a critical mass of observations.

## **COMPARISON WITH THE PREVIOUS LITERATURE WITH PARTICULAR REFERENCE TO META-ANALYSES**

There are other quantitative meta-analyses on the neural bases of developmental dyslexia in the literature (Maisog et al., 2008; Richlan et al., 2009, 2011). In particular, these studies focused on the task of reading or on reading-like behaviors, excluding auditory-verbal or non-linguistic tasks. Moreover, they included only peaks derived from group-by-task comparisons and used the ALE method (Maisog et al., 2008; Richlan et al., 2009). The potential advantages of our approach have already been commented upon.

Our findings are only partially consistent with the meta-analytic work published in the literature. Indeed, control-related clusters emerged not only at the level of the left occipito-temporal and temporo-parietal regions, but also in the left middle temporal and parietal areas and in the supplementary motor cortex. Also consistent with previous meta-analyses are the more frequent subcortical effects for the dyslexics in the basal ganglia, for reading tasks.

Contrary to what was described by Maisog et al. (2008) and Richlan et al. (2009), we could not find a reduced recruitment for the dyslexic group at the level of the left inferior frontal gyrus. However, it is worth noting that a significant group-by-age interaction emerged in this area, showing an association of this region with adult-control activations (as also reported by Richlan et al., 2011). The same area shows a "difficulty effect in phonological retrieval" in the meta-analyses of reading by Cattinelli et al. (2013a) and Taylor et al. (2013), whereby the inferior frontal region is more active when reading non-words or low-frequency irregular words. An interaction effect also emerged for the right inferior frontal gyrus in the present study, while an association with control children was observed in the right precentral gyrus.

A final difference with Richlan et al. (2011) is the lack of group-by-age interactions in the left parietal and occipito-temporal cortex. While not technically significant at a statistical level, an age effect in the left ventral occipito-temporal cortex was present, and it was driven by adult normal readers (20 peaks for the controls, 1 peak for dyslexics) rather than by young readers (young controls: 6 peaks; young dyslexics: 0 peaks); it is worth recalling that overall the number of "adult" and "young" peaks is balanced across the entire data-set. This is consistent with the idea that time is needed before the occipito-temporal cortex develops a neural expertise for reading (Dehaene et al., 2010).

Recently, Richlan et al. (2013) described an ALE-based meta-analysis of VBM data from dyslexia studies. The paper was mainly concerned with gray matter effects, as only two papers reported white matter abnormalities (Eckert et al., 2005; Silani et al., 2005). The main thrust of the paper is that there are reproducible reductions of gray matter in the left superior temporal sulcus; one of the coordinates described by them is consistent with the centroid of our cluster L86. The fact that functional imaging data show broader differences between normal controls and dyslexics than the VBM data do is a further argument in favor of the hypothesis that an abnormally wired cortex, rather than a focally damaged one, may better explain the functional disorder of dyslexia (see Silani et al., 2005, for further discussion; see also Paulesu et al., 1995; Klingberg et al., 2000).

### **VISUAL MAGNOCELLULAR AND CEREBELLAR THEORIES: CHASING THE WRONG USUAL SUSPECTS?**

As discussed in the introduction, there is non-negligible evidence of a visuo-perceptual deficit in children with dyslexia and some evidence for motoric deficits. The neural counterpart of these deficits has been sought by using visual motion perception tasks or motor learning tasks. The visual motion perception experiment of Eden et al. (1996) is the one that sits least comfortably with our results, as we could not find a cluster in V5/MT, nor, of course, a group-specific effect there. This difficulty may in part arise from the fact that the testing of the visual-magnocellular/attentional hypothesis has received somewhat limited attention in the literature, or from the fact that the main replication of the V5/MT finding was made using region of interest analyses (Demb et al., 1998), which were not included in our study.

Our attempt to test the V5 hypothesis by all means (see the results section) failed to identify a congruence with any of the effects described in the dyslexia literature, Eden et al. (1996) and Demb et al. (1998) excluded. However, our finding is consistent with more recent evidence on area V5. In a recent study, again based on a region of interest analysis of the data (preceded by a localizer experiment), Olulade et al. (2013) were able to show that if the dyslexics and the controls are equated for reading age rather than for age per se, a significant difference in V5/MT cannot be found. A rehabilitation program on reading had a carry-over effect on the V5/MT response (Olulade et al., 2013).

However, the visual magnocellular V5/MT hypothesis could be reformulated as a spatial attentional hypothesis. If so, activity in V5/MT may not be the best benchmark as discussed elsewhere (see Danelli et al., 2013, p. 2682). If the magnocellular hypothesis may still give an account for oculomotor control difficulties, there are better anatomical targets to be explored, for example, the dorsal premotor and parietal areas that we found less frequently activated in dyslexia. In the same vein, the evidence for a disorganized recruitment of the right inferior parietal lobule in dyslexia in non-linguistic tasks is a potentially revealing finding for all the theorists of attentional hypotheses in dyslexia.

Similar considerations apply to the cerebellar hypothesis. This could be easily reformulated in terms of deficient fine-grained motor control/learning without an a priori commitment to the cerebellum. Indeed, none of the tasks whose deficit is attributed to the cerebellum by the believers in the cerebellar hypothesis can be univocally and exclusively attributed to that organ: posture tasks, walking tasks, subtle finger coordination or bimanual tasks, and motor learning tasks all depend on widely distributed neural systems in which the cerebellum is just one of the players (Kandel et al., 2012).

We found reduced recruitment of a series of motor regions in which there was a mixture of peaks derived from reading-like and non-reading-like tasks. Observation of these focal effects may contribute to a re-evaluation of motoric disorders in developmental dyslexia. Of course, fresh new experiments are needed to further address this hypothesis.

## **THE CONTRIBUTION OF FUNCTIONAL CONNECTIVITY APPROACHES AND THE DISCONNECTION HYPOTHESIS OF DYSLEXIA**

Finally, our data could be discussed in the context of a more network-based approach, such as that provided by functional or effective connectivity analyses. It has been repeatedly suggested that dyslexia could be associated with a failure of the functional interaction between distant brain regions that subserve diverse, perhaps elementary, cognitive operations needed for the task of reading and the like (Paulesu et al., 1995; Horwitz et al., 1998). These regions should have greater functional and effective connectivity in normal controls. Even though this disconnection hypothesis of dyslexia is particularly dear to us (Paulesu et al., 1995; Klingberg et al., 2000; Silani et al., 2005), the number of connectivity studies is still limited and therefore our analysis was concentrated on classical studies based on univariate assessments of regional effects. There are two classes of such connectivity studies: task-based and resting-state studies. Task-based studies reported reduced functional connectivity between reading-related areas like the left angular gyrus and the occipito-temporal cortex (Horwitz et al., 1998) or the occipito-temporal cortex and the frontal cortex (van der Mark et al., 2011; Finn et al., 2013; Schurz et al., 2014).<sup>8</sup> In one study (Finn et al., 2013), a stronger right hemispheric connectivity for dyslexics was described. In the lone dynamic causal modeling (DCM) study performed on developmental dyslexia to date (Cao et al., 2008), reduced modulatory effects and connectivity were demonstrated in a temporo-parietal network for visual rhyming trials with conflicting orthography/phonology.
It is worth noting that task-based connectivity studies have an important limitation: the connectivity patterns explored are task dependent, the number of connections explored is limited in some cases (e.g., when using DCM), and different patterns could be produced by different reading tasks (see for example Levy et al., 2009); as a consequence, different dysfunctional patterns could emerge from the comparison between controls and dyslexics depending on the task under examination (Pugh et al., 2000). Resting-state connectivity studies, independent component analysis (ICA) studies (Wolf et al., 2010) or the technique proposed by Finn et al. (2013) may be more task-independent<sup>9</sup> and better suited to test broader dysfunctions. While the ICA studies (Wolf et al., 2010) are difficult to interpret because one has to make assumptions on the functional meaning of the identified components and their comparability across different groups, the seed-based resting-state functional connectivity studies have shown a reduction of connectivity between reading-specific areas and regions not strictly involved in reading tasks, for example, between the left inferior parietal lobule and the left dorsal middle frontal areas (Koyama et al., 2013).

Taken together, these results are in line with the present findings, as they support the hypothesis that dyslexia could be the consequence of the co-occurrence of distributed dysfunctional patterns of different functional systems (see also Schurz et al., 2014). Our data, however, also suggest a more limited degree of convergence of the multiple systems on high-level regions involved in reading-like as much as in non-reading-like tasks, particularly for the dorsal network identified here. Similar conclusions have not been made on the basis of a single study, even if based on a connectivity analysis. However, a more explicit demonstration of this general principle in the same sample of subjects is still needed.

## **CONCLUSIONS**

Taken together, our results provide a partial reconciliation of different accounts of dyslexia: those more concerned with the decoding problem of dyslexia, the underlying phonological deficit and the deficit in the conversion from orthography to phonology, and those more focused on motoric and visuo-attentional problems. Interestingly, the more dorsally one moves within the system identified here, the more the contribution of non-reading-like tasks becomes relevant, with a mixture of phonological awareness tasks and motoric/attentional tasks.

In at least one cluster, it was possible to make an indirect reference to a likely component of the magnocellular cortical network thanks to its intersection with the visuo-motor perception maps and motor learning maps of Danelli et al. (2013). The same cluster was observed in an independent meta-analysis on reading by Cattinelli et al. (2013a), the cluster being associated with reading tasks that are more demanding (e.g., pseudoword reading) because the stimuli require greater visuo-attentional resources and a finer-grained control of eye movements. The right inferior parietal cluster also lends support to a multidimensional account of dyslexia. It would have been hard to draw these conclusions on the basis of a single experiment or with a conventional meta-analysis based on ultra-specific and similar tasks. Yet, as we value the original contributions of the colleagues who produced the 53 papers submitted to meta-analysis here, we urge readers to refer to that original work for further discussion of the functional anatomical patterns of dyslexia.

### **ACKNOWLEDGMENTS**

We are grateful to all the colleagues who performed the hard work of collecting the direct empirical evidence on normal and dyslexic readers in the 53 papers included in this meta-analysis, or, indeed, in previous meta-analyses. Even if it proved impossible to discuss their individual findings and their opinions in detail, at the very least their work is faithfully cited in the reference list of the manuscript. We are grateful to Silvia Brambilla for an initial contribution to this work for her master's degree dissertation.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00830/abstract

**Table S1 | For each peak we reported the MNI coordinates (MNIx,y,z), the name of the first author, the journal and the year of publication of the article, the technique (PET or fMRI) and the stereotactic space used, the mean age of participants and the composition of the sample, the nature of the task, the nature of the functional contrast from which the peak was extracted.** Moreover, we reported, for each peak, the corresponding cluster ID and the cluster label. In particular, for each cluster we reported the coordinates of the centroid (Mean MNIx,y,z) and the standard deviation on the three axes (SDx,y,z), the corresponding anatomical area. Finally, we reported, for each peak, the binomial classifications (non-reading or reading-like task; children or adults, controls or dyslexics) used for the *post-hoc* analyses.

**Table S2 | Peak distribution in group-related clusters.**

<sup>8</sup>Finn et al. (2013) is a task-based study in which the weight of the task was regressed out. There are studies, however, in which reduced connectivity was not found in dyslexics (Richards and Berninger, 2008). Here, an enhanced connectivity between the left inferior frontal gyrus and its right homolog was reported.

<sup>9</sup>The resting-state condition, with the vague instructions attached, remains a task in its own right.

## **REFERENCES**


Frith, U. (1999). Paradoxes in the definition of dyslexia. *Dyslexia* 5, 192–214.


'magnocellular' processing during normal and dyslexic reading: behavioural and fMRI investigations. *Dyslexia* 16, 258–282. doi: 10.1002/dys.409


from longitudinal ERP data supported by fMRI. *Neuroimage* 57, 714–722. doi: 10.1016/j.neuroimage.2010.10.055


and phonological processing in dyslexic men. *Arch. Neurol.* 54, 562–573. doi: 10.1001/archneur.1997.00550170042013


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 July 2014; accepted: 29 September 2014; published online: 11 November 2014.*

*Citation: Paulesu E, Danelli L and Berlingeri M (2014) Reading the dyslexic brain: multiple dysfunctional routes revealed by a new meta-analysis of PET and fMRI activation studies. Front. Hum. Neurosci. 8:830. doi: 10.3389/fnhum.2014.00830 This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Paulesu, Danelli and Berlingeri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Functional neuroanatomy of developmental dyslexia: the role of orthographic depth

## **Fabio Richlan\***

Centre for Neurocognitive Research and Department of Psychology, University of Salzburg, Salzburg, Austria

### **Edited by:**

Donatella Spinelli, Università di Roma Foro Italico, Italy

### **Reviewed by:**

Urs Maurer, University of Zurich, Switzerland; Maaike Vandermosten, KU Leuven, Belgium

#### **\*Correspondence:**

Fabio Richlan, Centre for Neurocognitive Research and Department of Psychology, University of Salzburg, Hellbrunnerstr. 34, 5020 Salzburg, Austria e-mail: fabio.richlan@sbg.ac.at

Orthographic depth (OD) (i.e., the complexity, consistency, or transparency of grapheme-phoneme correspondences in written alphabetic language) plays an important role in the acquisition of reading skills. Correspondingly, developmental dyslexia is characterized by different behavioral manifestations across languages varying in OD. This review focuses on the question of whether these different behavioral manifestations are associated with different functional neuroanatomical manifestations. It provides a review and critique of cross-linguistic brain imaging studies of developmental dyslexia. In addition, it includes an analysis of state-of-the-art functional neuroanatomical models of developmental dyslexia together with orthography-specific predictions derived from these models. These predictions should be tested in future brain imaging studies of typical and atypical reading in order to refine the current neurobiological understanding of developmental dyslexia, especially with respect to orthography-specific and universal aspects.

**Keywords: brain, developmental dyslexia, fMRI, language, neuroimaging, orthography, PET, reading**

In this Review Article I will discuss current advances and future directions in the neurobiological understanding of developmental dyslexia. For this purpose, I will focus on brain imaging studies and will elaborate on the question of whether different behavioral manifestations of dyslexia across languages may be associated with different functional neuroanatomical manifestations. This question was not dealt with in previous review articles in the field (e.g., Pugh et al., 2000; Temple, 2002; McCandliss and Noble, 2003; Démonet et al., 2004; Heim and Keil, 2004; Sandak et al., 2004; Shaywitz and Shaywitz, 2005; Schlaggar and McCandliss, 2007). Of main interest will be whether orthographic depth (OD), a well-known factor in reading acquisition, has an influence on the brain activation pattern during non-impaired and dyslexic reading. To this end, I will begin with a review of relevant studies, followed by a critique of some of these studies. Although the focus of the present paper is on OD in alphabetic writing systems, I will also cover cross-cultural studies comparing alphabetic with syllabic or logographic writing systems. This topic is of immediate interest as it can contribute to the understanding of universal and orthography-specific neurobiological manifestations of developmental dyslexia (Frost, 2012). Finally, I will put forward some model-based orthography-specific predictions of classical as well as newer functional neuroanatomical conceptions of developmental dyslexia, which may serve as a blueprint for future hypothesis-driven brain imaging studies.

## **ORTHOGRAPHIC DEPTH AND READING ACQUISITION IN ALPHABETIC WRITING SYSTEMS**

OD refers to the complexity, consistency, or transparency of grapheme-phoneme correspondences in written alphabetic language (Frost et al., 1987). A deep (or highly complex or inconsistent or opaque) orthography like English is characterized by multi-letter graphemes, context-dependent rules, and morphological effects resulting in a many-to-many mapping of graphemes to phonemes. In contrast, a shallow (or little complex or consistent or transparent) orthography like Finnish is characterized by consistent one-to-one mapping of graphemes to phonemes (Seymour et al., 2003).
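The contrast between one-to-one and many-to-many mappings can be made concrete with a small sketch. The grapheme-phoneme pairs below are a hand-picked toy sample (e.g., English "ea" as in *bead*, *head*, and *great*), not a corpus, and the summary statistics and the function name `mapping_stats` are illustrative assumptions rather than an established measure of OD.

```python
from collections import defaultdict


def mapping_stats(pairs):
    """Summarize how many-to-many a set of (grapheme, phoneme) pairs is."""
    g2p = defaultdict(set)  # grapheme -> phonemes it can spell
    p2g = defaultdict(set)  # phoneme -> graphemes that can spell it
    for grapheme, phoneme in pairs:
        g2p[grapheme].add(phoneme)
        p2g[phoneme].add(grapheme)
    return {
        "phonemes_per_grapheme": sum(len(v) for v in g2p.values()) / len(g2p),
        "graphemes_per_phoneme": sum(len(v) for v in p2g.values()) / len(p2g),
        "one_to_one": all(len(v) == 1 for v in g2p.values())
        and all(len(v) == 1 for v in p2g.values()),
    }


# Toy sample for a deep orthography (English): bead, head, great, see, bed, rain.
deep_toy = [("ea", "iː"), ("ea", "ɛ"), ("ea", "eɪ"),
            ("ee", "iː"), ("e", "ɛ"), ("ai", "eɪ")]
# Toy sample for a shallow orthography (Finnish): each letter has one sound.
shallow_toy = [("a", "ɑ"), ("i", "i"), ("u", "u"), ("k", "k")]

deep_stats = mapping_stats(deep_toy)
shallow_stats = mapping_stats(shallow_toy)
print("deep:", deep_stats)
print("shallow:", shallow_stats)
```

In this toy sample the deep set yields more than one phoneme per grapheme and more than one grapheme per phoneme, whereas the shallow set is strictly one-to-one. Real estimates of OD would require full lexica and frequency weighting.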

OD has been identified as one of the most important environmental factors influencing learning to read (e.g., Seymour et al., 2003; Landerl et al., 2013). It has a direct effect on how easy or difficult it is for children to translate a new letter string into a phonological code by which phonological word forms can be accessed. The idea is that in deep orthographies phonology has to be retrieved from stored memory representations (i.e., from an internal lexicon), whereas in shallow orthographies phonology can be derived relatively easily and directly from print. The ability to translate letter strings into a phonological code is called phonological recoding and was labeled the *sine qua non* of reading acquisition. It provides the prerequisite for a self-teaching mechanism that enables a young reader to autonomously establish an orthographic lexicon (Share, 1995).

It has been shown numerous times that children exhibit different behavioral performance according to the language they are learning to read. The usual finding is a marked word and pseudoword reading accuracy advantage of children learning to read in a shallow orthography (e.g., Dutch, Finnish, German, Greek, Italian, Spanish) over children learning to read in a deep orthography (English). This pertains to non-impaired (e.g., Wimmer and Goswami, 1994; Cossu et al., 1995; Frith et al., 1998; Aro and Wimmer, 2003; Seymour et al., 2003; Bergmann and Wimmer, 2008; Zoccolotti et al., 2009; Georgiou et al., 2012) as well as to impaired reading acquisition (i.e., developmental dyslexia) (e.g., Wimmer, 1993; Landerl et al., 1997; Landerl and Wimmer, 2000; Spinelli et al., 2005; Zoccolotti et al., 2005; Barca et al., 2006; Davies et al., 2007; Wimmer and Schurz, 2010).

Some accounts, however, emphasize the commonalities between reading in deep and shallow orthographies over their differences (e.g., Ziegler et al., 2003, 2010; Caravolas et al., 2012, 2013). For example, Ziegler et al. (2010) investigated whether the role of different cognitive predictors of reading development (phonological awareness (PA), rapid automatized naming (RAN), phonological short-term memory, vocabulary, and nonverbal IQ) varies with OD. They showed that, although its influence is weaker in shallow compared with deep orthographies, PA is a relatively universal predictor of reading performance in alphabetic languages. Likewise, developmental dyslexia was characterized by similar deficits (overall slow reading, increased difficulties with nonwords compared with words, and slow and effortful phonological decoding) in a shallow orthography (German) and a deep orthography (English) (Ziegler et al., 2003).

With respect to the developmental pattern of cognitive predictors, Vaessen et al. (2010) found a strong contribution of PA to reading fluency in beginning readers, followed by a gradual shift towards a stronger contribution of RAN in more proficient readers. Importantly, this general developmental shift was not influenced by the OD of the three studied languages (Hungarian, Dutch, Portuguese). The contribution of PA to reading fluency, however, was important for a longer period of time in deeper orthographies. Likewise, Moll et al. (2014) confirmed that PA and RAN both account for significant amounts of unique variance in literacy development across five orthographies (English, French, German, Hungarian, Finnish). In all studied languages, PA was the best predictor of reading accuracy and spelling, whereas RAN was the best predictor of reading speed. With respect to developmental dyslexia, Landerl et al. (2013) showed that both PA and RAN were strong concurrent predictors of reading problems. The influence of PA and RAN was larger in deeper orthographies, in which more participants were correctly classified as dyslexic. In sum, the results suggest that the same cognitive components underlie reading development in deep and shallow orthographies, but to a different degree that varies as a function of reading level.

One attempt to explain the differences in reading speed and reading accuracy across orthographies is psycholinguistic grain size theory (Ziegler and Goswami, 2005, 2006). This theory postulates that the behavioral differences can be attributed to differences in the size of the orthographic units on which phonological recoding is based. Specifically, readers in shallow orthographies can rely on small psycholinguistic grain size (i.e., single letters or letter clusters corresponding to single phonemes) because grapheme-phoneme correspondences are relatively consistent. In contrast, readers of deep orthographies additionally have to rely on larger psycholinguistic grain size (i.e., letter patterns corresponding to rimes, syllables, or even whole words), which are more consistent compared with the relatively inconsistent grapheme-phoneme correspondences. It was shown that especially the continuous switching between small unit recoding and large unit recoding strategies leads to the reading accuracy disadvantage in deep orthographies (Goswami et al., 2003). In addition, it is far more difficult for a beginning reader to remember the mapping from orthography to phonology based on the vast amount of letter pattern-rime/syllable correspondences compared with the limited number of grapheme-phoneme correspondences.

## **ORTHOGRAPHIC DEPTH AND BRAIN IMAGING IN ALPHABETIC WRITING SYSTEMS**

Cross-linguistic brain imaging studies are extremely laborious and difficult to conduct. They require well-matched designs and samples and face many practical problems (e.g., availability and comparability of assessment tools, differences in the school system, socio-economic factors, matching of stimuli, data acquisition protocols, etc.). Therefore, it is not surprising that to date only a few cross-linguistic brain imaging studies have been published. Another approach focuses on bilingual or biliterate participants and searches for the effect of OD on brain activation within participants. The findings from these two types of studies will be reviewed below. In addition, there are promising attempts to investigate the influence of OD by means of artificial language training studies (Mei et al., 2013; Taylor et al., 2014). These studies can potentially contribute to the understanding of language-related differences in reading development.

Motivated by findings on behavioral differences between readers in deep and shallow orthographies, Paulesu et al. (2000) conducted a seminal positron emission tomography (PET) study. They compared brain activation during word and nonword reading in Italian and English skilled adult readers. With a conjunction analysis, they identified a largely left-lateralized brain network showing common activation in both groups. Specifically, this network included left inferior frontal (IFG), precentral (PreG), fusiform (FFG), inferior (ITG) and middle temporal (MTG) regions as well as bilateral superior temporal gyri (STG).
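The logic of a conjunction analysis can be sketched as follows. This is a minimal illustration assuming a minimum-statistic conjunction over voxelwise statistics, which is not necessarily the exact procedure of Paulesu et al. (2000); the statistic values, the threshold, and the function name `conjunction_mask` are hypothetical.

```python
def conjunction_mask(stat_a, stat_b, threshold):
    """Mark voxels where BOTH groups exceed the threshold.

    Taking the minimum of the two statistics and thresholding it is
    equivalent to requiring each group to pass the threshold on its own.
    """
    return [min(a, b) > threshold for a, b in zip(stat_a, stat_b)]


# Hypothetical voxelwise statistics for four voxels in each reader group.
italian_stats = [4.2, 1.1, 3.8, 0.4]
english_stats = [3.9, 3.5, 0.9, 0.2]

common = conjunction_mask(italian_stats, english_stats, threshold=3.1)
print(common)  # only the first voxel is active in both groups
```

Voxels that pass in only one group (the second and third in this toy example) are candidates for the orthography-specific effects addressed by a direct between-group comparison.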

In addition, orthography-specific effects were investigated in a direct comparison between Italian and English readers. In general, orthography-specific effects were reflected in quantitative rather than qualitative differences in brain activation. That is, the very same brain regions were active in both languages, but to a different degree and spatial extent. Specifically, the direct comparison identified the left posterior STG at the junction to the parietal cortex with higher activation in Italian readers and the left posterior ITG and anterior IFG with higher activation in English readers. The STG activation was interpreted as reflecting enhanced involvement of phonological processing, whereas the ITG and IFG activation was interpreted as reflecting enhanced involvement of the orthographic lexicon. The left IFG region was also associated with semantic processing. In sum, the results were taken as evidence for the shaping of the neurobiological systems for reading through specific properties (i.e., OD) of the written language.

In a follow-up study, Paulesu et al. (2001) investigated whether such language-related brain activation effects would also pertain to developmental dyslexia. They acquired PET scans from non-impaired and dyslexic university students from Italy, France, and the UK during the same activation tasks used in their earlier study. The main finding was the identification of a large left hemisphere cluster comprising STG, MTG, and ITG as well as middle occipital gyri with higher activation in non-impaired readers compared with dyslexic readers, irrespective of orthography. Conversely, no regions were identified with higher activation in dyslexic readers compared with non-impaired readers.

With respect to orthography-specific effects, Paulesu et al. (2001) could confirm their earlier findings on non-impaired reading. That is, they replicated the findings on the English non-impaired readers with the French non-impaired readers (whom they classified as readers in a deep orthography). Crucially, however, no orthography-specific effects were found in the direct comparison of the dyslexic subsamples from the three languages varying in OD. Taken together with the fact that all of the dyslexic participants were selected based on a marked phonological deficit, the brain imaging results were interpreted as evidence for a universal neurocognitive basis of developmental dyslexia.

Subsequently, Silani et al. (2005) acquired structural magnetic resonance imaging (MRI) data in addition to the functional PET activation data of the Italian, French, and English participants of Paulesu et al. (2001). Their voxel-based morphometry (VBM) analysis sought to investigate the correspondence between regional dysfunctions (i.e., underactivation) and anatomical alterations (i.e., with respect to the cortical surface and underlying fibers) in developmental dyslexia. For gray matter density, consistent reduction in dyslexic readers across all three orthographies was identified in the left MTG (with consistent augmentation in an adjacent left posterior MTG region). For white matter density, consistent reduction in dyslexic readers across all three orthographies was identified underneath the left IFG, postcentral, and supramarginal cortices. Similar to the functional activation study, no orthography-specific effects of dyslexia were found.

A different approach to investigating orthography-specific effects in reading-related brain activation was recently put forward by Das et al. (2011). They studied mono- and biliterate English and Hindi adult readers during reading aloud of English and Hindi words. Furthermore, the group of biliterate readers was divided into those who learnt to read both languages at the age of 5 (simultaneous biliterate readers) and those who learnt to read Hindi at the age of 5 and English at the age of 10 (sequential biliterate readers). Crucially, only biliterate adults who learnt to read both languages simultaneously at the age of 5 showed an activation pattern similar to that of monoliterates, that is, left ITG activation for English (deep orthography) and left inferior parietal lobule (IPL) activation for Hindi (shallow orthography). During English word reading, the sequential biliterates did not exhibit the left ITG activation found in simultaneous biliterates and English monoliterates. The divergence between simultaneous and sequential biliterate readers speaks for early orthography-specific functional tuning of reading networks in the brain that persists into adulthood. This unique fMRI study is particularly impressive because it shows orthography-specific effects within participants rather than between participants.

Similarly, Bar-Kochva and Breznitz (2012) used a within-subjects design to investigate the effect of OD on brain activation during reading by means of event-related potentials (ERPs). They studied Hebrew speakers, who were familiar with two forms of script (pointed and unpointed) varying in OD. During a lexical decision task, the shallow pointed script evoked larger amplitudes around 165 ms over occipito-temporal (OT) electrodes, whereas the deep unpointed script evoked larger amplitudes around 340 ms over occipito-parietal electrodes. The same authors found these effects also in adult dyslexic readers, but with reduced and delayed amplitudes. These results were interpreted as a failure in dyslexic readers to fine-tune their reading strategies to the particular demands imposed by the deep and shallow orthographies (Bar-Kochva and Breznitz, 2014).

## **BRAIN IMAGING COMPARING ALPHABETIC WITH SYLLABIC OR LOGOGRAPHIC WRITING SYSTEMS**

In addition to comparing written alphabetic languages with varying OD, there were attempts to compare brain activation of proficient readers in alphabetic writing systems with Chinese (logographic), Japanese Kana (syllabic), and Japanese Kanji (morpho-syllabic). In their coordinate-based meta-analysis, Bolger et al. (2005) found convergence of reading-related activation of all four writing systems in left STG, IFG, and OT regions. As expected, the activation patterns of the different writing systems also differed to some degree, mainly with respect to extension of clusters. Specifically, divergence was identified in a posterior aspect of the left STG (with higher activation for Western and Kana writing systems), in an anterior aspect of the left IFG (with higher activation for Chinese), and in the right OT cortex (again with higher activation for Chinese). The higher activation for the alphabetic and syllabic writing systems in the left posterior STG was interpreted as reflecting the mapping of written symbols to fine-grained speech sounds (phonemes and syllables)—in contrast to mapping to whole-word phonology in the case of Chinese and Kanji. The higher activation for Chinese (logographic) in the left anterior IFG was interpreted as reflecting synchronous processing of semantic and phonological information that is—due to the high number of homophones in Chinese—required for unambiguous identification of written symbols. Finally, the higher activation for Chinese in the right OT cortex was interpreted as reflecting global and low spatial frequency processing of the spatial arrangement of the written symbols. In line with these meta-analytic findings, a newer near-infrared spectroscopy (NIRS) study comparing English and Chinese readers during a homophone judgment task identified higher activation in the left STG in English readers and higher activation in the left middle frontal gyrus (MFG) in Chinese readers (Chen et al., 2008).

Another study with biliterate participants in Korean Hangul (phonographic) and Hanja (logographic) showed that the level of reading proficiency in the logographic orthography modulated the reading strategy and, correspondingly, the brain activation pattern during processing of the phonographic orthography (Jeon, 2012). Specifically, highly skilled readers, relying on the lexical route, activated anterior cingulate, MFG, and OT regions, whereas less skilled readers, relying on the sublexical route, activated IPL and IFG regions.

A developmental difference between English and Chinese readers was recently observed in an fMRI study using a word pair rhyme judgment task (Brennan et al., 2013). A network of left hemisphere regions (including STG, IPL, and IFG) showed an increase of activation in English adults compared with children but not in Chinese adults compared with children. This finding was taken as evidence for reorganization of the left hemisphere phonological network in readers of alphabetic but not in readers of logographic writing systems, possibly as a result of the differences in psycholinguistic grain size, with smaller units in English compared with Chinese.

First evidence for writing system-related brain activation abnormalities in developmental dyslexia was reported by Siok et al. (2004). They found marked underactivation of the left MFG in dyslexic compared with non-impaired Chinese children during homophone judgment and lexical decision. Further indication of a crucial influence of the writing system on the neurobiological manifestation of developmental dyslexia was provided in a follow-up study (Siok et al., 2008). Chinese dyslexic children not only exhibited functional underactivation of the left MFG in response to a rhyme judgment task but also showed reduced gray matter volume of this region compared with their age-matched non-impaired peers. Interestingly, the Chinese dyslexic readers did not exhibit the left posterior underactivation, which is distinctive of dyslexic readers in alphabetic writing systems. The unique engagement of the left MFG in Chinese non-impaired reading (and failure of engagement in dyslexia) was explained by the strong involvement of motor processes during learning to read Chinese. Children in primary school spend a lot of time copying newly learned characters and this likely involves recruitment of the left MFG just anterior to the motor cortex. It was shown that handwriting skills are the best predictor of reading ability, with both supported by long-term graphomotor memories of characters (Tan et al., 2005).

Although these data seem to challenge the universality of neurocognitive explanations of dyslexia, Ziegler (2006) argues that the phonological deficit theory still accounts for the problems of Chinese dyslexic readers. Instead of the mapping of graphemes onto phonemes, the phonological deficit of Chinese dyslexics lies in the association of complex graphomotor programs of logographs to whole-word phonology. That is, the universality of the phonological deficit is still valid, but its putative association with a left STG dysfunction is not. Another possibility is that the left MFG—as part of the central executive—subserves a coordination and integration function of orthographic, phonological, and semantic information, which is particularly important for Chinese reading (Perfetti et al., 2006).

In a direct cross-linguistic comparison, it was recently shown that the brain activation differences between dyslexic and non-impaired readers of Chinese and English are not as large as previously thought. In a well-conceived fMRI study, Hu et al. (2010) found writing system-specific activation differences during a semantic word matching task between the two groups of non-impaired readers but not between the two groups of dyslexic readers. Specifically, Chinese non-impaired readers exhibited higher activation compared with English non-impaired readers in the left IFG sulcus and lower activation in left posterior superior temporal sulcus. Crucially, dyslexic readers of both languages showed reliable activation in these two regions, indicating the use of a similar reading strategy. The dyslexic readers shared, however, a common pattern of underactivation relative to non-impaired readers in the left MFG, left posterior MTG, left angular gyrus, and left OT sulcus. Thus, the functional neuroanatomical manifestation of dyslexia in English and Chinese is similar when a reading task with demands on semantic processing is used.

## **METHODOLOGICAL CONSIDERATIONS**

Cross-linguistic brain imaging studies of developmental dyslexia have also been the target of serious criticism. For example, Hadzibeganovic et al. (2010) questioned the biological unity account of dyslexia because of both conceptual and methodological issues in some of the above-mentioned studies. In particular, they focused their critique on the seminal studies by Paulesu et al. (2001) and Silani et al. (2005), which had the biggest impact on the field. The problems raised include (i) missing subtyping of dyslexia cases; (ii) differences in selection of participants across the three countries; and (iii) discounting of differences in cognitive demands for reading diverse orthographies.

Some of the points of criticism of Hadzibeganovic et al. (2010) are valid; however, I want to clarify the crucial aspects by providing some explanations for why the studies of the Paulesu group did not identify orthography-specific effects in the neurobiology of developmental dyslexia. First of all, and most importantly, Paulesu et al. (2001) and Silani et al. (2005) did not claim that all developmental dyslexics have the same brain abnormality. Rather, they showed that there is some shared component across the three alphabetic orthographies (Italian, French, English), namely left posterior underactivation as well as reduced gray and white matter density. Although their finding of an absence of orthography-specific effects is suggestive of a complete overlap of brain abnormality patterns, the much more probable scenario, based on evidence from studies comparing different writing systems (e.g., Bolger et al., 2005; Hu et al., 2010), is that there is some core dysfunction present in dyslexia in all writing systems with additional language-related variations and extensions. This means that there is a shared component (the core dysfunction), but with orthography-specific differences based on the particular properties of the language and the reader's experience. It is plausible to assume that the nature of orthography-specific differences is quantitative rather than qualitative. That is, differential weighting of cognitive components is reflected mainly in the degree and spatial extent of activation clusters rather than in variation of anatomical location (Paulesu et al., 2000). There are several reasonable explanations for why the Paulesu et al. (2001) and Silani et al. (2005) studies did not identify these fine-grained language-related variations. These possibilities will be spelled out in detail below.

The logic behind Paulesu et al.'s search for orthography-specific effects in developmental dyslexia was not ideal. Paulesu et al. (2001) directly compared the activation patterns of the dyslexic readers across the three languages. It would be, however, more sensible to compare the abnormality patterns of the dyslexic readers (relative to the non-impaired readers) across the three languages. As an example, imagine the following situation (illustrated in **Figure 1A**): Italian but not English non-impaired readers show strong activation of the left STG. Both Italian and English dyslexic readers show weak activation of the left STG. In the case of the Italian dyslexics, let us assume that this weak activation is the result of a specific deficit of a cognitive process supported by this region, whereas in the case of the English dyslexics, it is simply the result of little requirement by the deep orthography of English to engage this very same process (hence only weak activation in non-impaired readers as well). A direct comparison of Italian versus English dyslexic readers (the strategy used by Paulesu et al., 2001) would not identify this region with an orthography-specific deficit. In contrast, a comparison of the underactivation pattern (i.e., non-impaired > dyslexic) between Italian versus English would identify this region. The latter strategy is all the more sensible, given that there are known brain activation differences between Italian and English non-impaired readers (Paulesu et al., 2000), and dyslexia-related dysfunctions can be supposed to be associated with these very same regions.

Moreover, the strategy to search for orthography-specific differences in dyslexic under- or overactivation would yield reasonable results for the hypothetical situations illustrated in **Figures 1B–D**. Situation D deserves closer attention. Here, both non-impaired and dyslexic readers show a language-related effect, that is, higher reading-related activation in a shallow compared with a deep orthography. Crucially, however, there is no effect of dyslexia within a language. Therefore, this activation pattern should not be considered as showing an orthography-specific dyslexic abnormality pattern. Again, the proposed search strategy would yield a reasonable result because it would not identify a region with this activation pattern. With the knowledge from their previous study, that is, an orthography-specific effect in non-impaired readers (Paulesu et al., 2000), and the strategy to directly compare dyslexic readers across languages, it seems like Paulesu et al. (2001) searched for such a pattern of general language-related differences. This is, however, rather uninformative when it comes to orthography-specific dyslexic activation abnormality patterns.
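The difference between the two search strategies can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: the activation values, the threshold, and all names (`RegionMeans`, `direct_dyslexic_contrast`, `underactivation_contrast`, `flagged`) are hypothetical and are not taken from Paulesu et al. (2001) or from Figure 1. It encodes situation A (an orthography-specific deficit in the shallow orthography) and situation D (a general language-related difference without any effect of dyslexia) and applies both contrasts to each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionMeans:
    """Hypothetical mean activation of one region (arbitrary units)."""
    nonimpaired_shallow: float
    dyslexic_shallow: float
    nonimpaired_deep: float
    dyslexic_deep: float


def direct_dyslexic_contrast(r: RegionMeans) -> float:
    """Dyslexic readers compared directly across orthographies."""
    return r.dyslexic_shallow - r.dyslexic_deep


def underactivation_contrast(r: RegionMeans) -> float:
    """Underactivation (non-impaired > dyslexic) compared across orthographies."""
    shallow_deficit = r.nonimpaired_shallow - r.dyslexic_shallow
    deep_deficit = r.nonimpaired_deep - r.dyslexic_deep
    return shallow_deficit - deep_deficit


THRESHOLD = 0.5  # assumed cut-off, purely illustrative


def flagged(value: float) -> bool:
    """True if the contrast magnitude exceeds the illustrative threshold."""
    return abs(value) > THRESHOLD


# Situation A: strong activation only in shallow-orthography non-impaired readers.
situation_a = RegionMeans(1.0, 0.2, 0.2, 0.2)
# Situation D: shallow > deep in both groups, no effect of dyslexia within a language.
situation_d = RegionMeans(1.0, 1.0, 0.4, 0.4)

for name, region in (("A", situation_a), ("D", situation_d)):
    print(
        name,
        "direct:", flagged(direct_dyslexic_contrast(region)),
        "underactivation:", flagged(underactivation_contrast(region)),
    )
```

With these assumed values, the direct comparison misses the region in situation A and flags it in situation D, whereas the comparison of underactivation patterns does the opposite, which is the behavior argued for in the text.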

A further possible explanation for why the Paulesu et al. (2001) and Silani et al. (2005) studies did not find orthography-specific dyslexic abnormalities relates to the small number of participants (six participants per group, per language, and per activation task) resulting in low statistical power (Button et al., 2013). In addition, as already suggested by Paulesu et al. (2001), the dyslexic readers may have used idiosyncratic and inter-individually heterogeneous reading strategies resulting in less consistent group-level brain activation. This would be in line with evidence for the engagement of inter-individually diverse neuronal networks for reading (Seghier et al., 2008; Kherif et al., 2009; Richardson et al., 2011). Finally, PET imaging (Paulesu et al., 2001) and the VBM method (Silani et al., 2005) are subject to inherent limitations such as low temporal and spatial resolution and reliance on block designs (PET), and mis-registration of images, mis-classification of tissue, and a neuroanatomically unspecific measurement of local gray matter volume or density (e.g., Mechelli et al., 2005; Richlan et al., 2013b), which may obscure subtle and fine-grained orthography-specific differences.
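To give a rough sense of what six participants per group means for statistical power, the following sketch uses a normal approximation to the power of a two-sided, two-sample comparison. It is an illustration only: the effect size (Cohen's d = 0.8), the alpha level, and the function name `approx_power_two_sample` are assumptions, not values or methods from the studies discussed, and the normal approximation is somewhat optimistic for samples this small.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d is the assumed standardized mean difference (Cohen's d);
    n_per_group is the number of participants in each of the two groups.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = d * sqrt(n_per_group / 2)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Six participants per group with an assumed large effect (d = 0.8).
power_six = approx_power_two_sample(d=0.8, n_per_group=6)
# The same assumed effect with a hypothetical larger sample, for comparison.
power_thirty = approx_power_two_sample(d=0.8, n_per_group=30)

print(f"n = 6 per group:  power ~ {power_six:.2f}")
print(f"n = 30 per group: power ~ {power_thirty:.2f}")
```

Even under the assumption of a large effect, the approximate power with six participants per group stays well below the conventional 0.8 target, and a test of an orthography-by-group interaction would typically be less sensitive still.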

## **PREDICTIONS DERIVED FROM FUNCTIONAL NEUROANATOMICAL MODELS**

As argued above, it is not surprising that Paulesu et al. (2001) and Silani et al. (2005) did not find evidence for differences in the neurocognitive deficits between dyslexic readers in deep and shallow orthographies despite the documented orthography-specific effects in non-impaired readers (e.g., Paulesu et al., 2000; Das et al., 2011). As evident from the review of behavioral studies, the usual finding is that successful reading acquisition is based on the same cognitive components in deep and shallow orthographies. The general pattern of early contribution of PA and later contribution of RAN is independent of OD. What varies across languages is the degree to which these components contribute over time (Vaessen et al., 2010). In addition, it makes a difference whether reading accuracy or reading speed is assessed (Moll et al., 2014).

Functional neuroanatomical models of developmental dyslexia (e.g., Pugh et al., 2000; Richlan, 2012) provide a basis for testable hypotheses about different brain activation patterns in non-impaired and dyslexic readers between languages differing in OD. Although these models are not explicit in stating orthography-specific predictions, their architecture (e.g., which brain regions are engaged by certain cognitive processes) allows one to derive hypotheses about expected brain activation patterns. These model-based predictions will be described below. In line with the behavioral findings on the contribution of cognitive components to reading, activation of brain regions across orthographies is not a matter of all or none but rather a matter of degree. In addition, due to the usual spatial smoothness of functional brain imaging data, a higher level of brain activation can be expressed in larger spatial extent of activation clusters. Thus, the functional neuroanatomical models do not predict involvement of completely different brain regions across orthographies, but rather activation of the same brain regions to a different degree and extent (Pugh et al., 2005).

In addition to tuning of local brain activation, it was put forward that the development of skilled reading relies on systems-level plasticity (i.e., on changes in the interactions between brain regions) (Schlaggar and McCandliss, 2007). The idea is that brain regions that are already partially active at the beginning of learning to read become better connected over time (both structurally and functionally), thus providing the basis for the development of skilled reading. This interactive specialization concept is incorporated in the predictions of the newer functional neuroanatomical model (Richlan, 2012). It relies on the many neuroimaging studies from the last years that investigated reading-related structural connectivity by means of diffusion tensor imaging (DTI; e.g., Ben-Shachar et al., 2007; Hoeft et al., 2011; Vandermosten et al., 2012; Boets et al., 2013; Thiebaut de Schotten et al., 2014) or functional and effective connectivity by means of both task-based and resting-state fMRI (e.g., Richardson et al., 2011; van der Mark et al., 2011; Koyama et al., 2013; Vogel et al., 2014; Schurz et al., 2014). In addition to this MRI-based research, valuable information on inter-regional functional coupling can be gained from the time course of activation of relevant brain regions via temporally precise techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) (for a recent review see Carreiras et al., 2014).

## **THE CLASSICAL MODEL**

**Figure 2A** illustrates the predictions for reading-related brain activation in non-impaired and dyslexic readers of deep and shallow orthographies based on the classical model by Pugh et al. (2000). This seminal model and its subsequent variations (e.g., McCandliss and Noble, 2003; Démonet et al., 2004; Sandak et al., 2004; Pugh et al., 2005) propose engagement of the left dorsal temporo-parietal (TP) cortex (including the posterior STG and the supramarginal and angular gyri of the IPL) during phonology-based reading processes (i.e., grapheme-phoneme conversion, phonological assembly) in non-impaired readers and a corresponding dysfunction (reflected in absent or reduced activation) in dyslexic readers. Consequently, for shallow orthographies, one would predict reading-related activation in non-impaired children and adults and underactivation in dyslexic readers. For deep orthographies, in contrast, one would expect activation only in non-impaired children or in tasks requiring phonology-based reading or explicit phonological analysis. The dyslexic readers, due to their proposed primary phonological TP deficit, would exhibit underactivation. In sum, the left dorsal TP system dominates at the beginning of learning to read in typically developing children, irrespective of orthography. Dyslexic readers, however, fail to properly activate this system.

Furthermore, the classical model proposes engagement of the left ventral OT cortex (including posterior ITG and FFG) during memory-based orthographic word recognition. In skilled readers, this system becomes the critical support for fast and efficient reading. In dyslexic readers, a secondary deficit of the left ventral OT system is assumed to follow from a primary deficit in left dorsal TP regions. The predictions for shallow orthographies are intermediate activation in non-impaired readers irrespective of reading age (unless explicit orthographic tasks require high engagement of this region) and little activation in dyslexic readers. For deep orthographies the predictions are strong activation in non-impaired adults and advanced children (that is, as soon as orthographic representations are built up) and underactivation in dyslexics. Therefore, the universal reading speed deficit of dyslexic readers across languages is thought to be reflected in underactivation of the left ventral OT cortex (Pugh, 2006).

Finally, the classical model includes a third anterior reading circuit, which is located in the left IFG. Its function is assumed to include (among others) speech-gestural articulatory recoding of print. According to the model and regardless of orthography, dyslexic readers should exhibit overactivation of this region and of additional right hemisphere posterior regions compared with non-impaired readers in order to compensate for their dysfunction in the two left posterior regions. This overactivation, however, is not present at the beginning of learning to read, but increases with age (Shaywitz et al., 2002).

## **THE NEW MODEL**

## **LEFT INFERIOR PARIETAL LOBULE**

**Figure 2B** illustrates the predictions for reading-related activation based on a newer model by Richlan (2012). The involved regions are largely the same (with some subtle anatomical variations) but the assumed functions in non-impaired reading and the associated dysfunctions in dyslexic reading are crucially different. Importantly, the model by Richlan (2012) divides the left TP circuit of Pugh et al. (2000) into a more dorsal IPL part adjacent to the intra-parietal sulcus and a more ventral STG part around the posterior sylvian fissure. The former was proposed to be engaged by more general attentional mechanisms, which are not exclusively related to reading (Shaywitz and Shaywitz, 2008). In the left dorsal IPL, as evidenced by meta-analysis, the typical finding is increased task-negative activation in dyslexic compared with non-impaired children during reading or reading-related processes (Richlan et al., 2011). That is, non-impaired readers show weak (de)activation relative to a low-level visual baseline, whereas dyslexic readers exhibit marked deactivation relative to baseline. This task-induced interruption of baseline activation was interpreted as reflecting greater mental effort during reading in dyslexic readers.

Note, however, that the left dorsal IPL can also be activated by non-impaired readers depending on the task and stimulus requirements. In this case, the typical finding is reduced task-positive activation in dyslexic readers (e.g., Cao et al., 2006; van der Mark et al., 2009; Richlan et al., 2010; Wimmer et al., 2010) and disrupted functional connectivity between the left dorsal IPL and the left ventral OT cortex (Cao et al., 2008; van der Mark et al., 2011). It is possible that the left dorsal IPL is involved in shifting attention from letter to letter within a string (e.g., Behrmann et al., 2004; Wager et al., 2004; Rosazza et al., 2009; Cabeza et al., 2012) and thus subserves serial decoding. This function is needed during reading based on grapheme-phoneme conversion but is irrelevant during reading based on whole-word representations. Accordingly, the left dorsal IPL was consistently identified with higher activation in response to pseudoword reading compared with word reading in a meta-analysis of 36 neuroimaging studies of typical readers (Taylor et al., 2013).

[Figure 2: predictions for reading-related activation based on (A) the classical model by Pugh et al. (2000) and (B) the model by Richlan (2012). Abbreviations: PreG = precentral gyrus, STG = superior temporal gyrus, TP = temporo-parietal cortex.]

Assuming a functional role of the left dorsal IPL subserving serial decoding in non-impaired readers and a dysfunction in dyslexic readers, the prediction would be reduced dyslexic task-positive activation in shallow orthographies (with reliance on rule-based grapheme-phoneme conversion) or when serial processing is emphasized by task or stimulus demands (e.g., Cohen et al., 2008; Rosazza et al., 2009). In contrast, in deep orthographies (with reliance on memory-based word recognition) or when visual-orthographic whole-word processing is predominant, one would expect little engagement of the left dorsal IPL in non-impaired readers. Dyslexic readers, however, would exhibit increased task-negative activation (i.e., deactivation) in response to greater mental effort during reading, reflecting an interruption of the baseline activation of the left dorsal IPL (Richlan et al., 2011). Note that the orthography-specific predictions would be the same based on psycholinguistic grain size theory (Ziegler and Goswami, 2005, 2006). That is, non-impaired readers in shallow orthographies should rely more on the serial attention shifting mechanism in the left dorsal IPL compared with non-impaired readers in deep orthographies, due to the smaller size of the orthographic units. In order to distinguish whether dyslexic underactivation relative to non-impaired readers stems from differences in task-positive or task-negative activation, it is indispensable for future fMRI studies to include rest blocks (in the case of block-design fMRI) or appropriate interstimulus intervals and null-events (in the case of event-related fMRI).

With respect to neuroanatomy, it is important to note that the left dorsal IPL clusters found in the meta-analyses of dyslexic brain activation abnormalities (Richlan et al., 2009, 2011) correspond more to the supramarginal gyrus than to the angular gyrus. Among the functions discussed above, the former is thought to be involved in phonological processes, whereas the latter is thought to be involved in semantic processes (e.g., Vigneau et al., 2006; Binder et al., 2009; Cabeza et al., 2012; Carter and Huettel, 2013). This classical subdivision, however, might be too coarse (Seghier and Price, 2012). Recent evidence from studies on cytoarchitectonics (Caspers et al., 2006, 2008), receptor architectonics (Caspers et al., 2013), structural connectivity (Mars et al., 2011, 2012), and functional connectivity (Yeo et al., 2011; Bzdok et al., 2013) speaks for much more fine-grained parcellation of the parietal cortex into multiple subdivisions.

## **LEFT SUPERIOR TEMPORAL GYRUS**

The second left TP region (left posterior STG adjacent to the sylvian fissure) is not assigned a key role in grapheme-phoneme conversion in the new model. This assumption stands in marked contrast to the classical model, in which this region is proposed to dominate at the beginning of learning to read. Consequently, in the new model it is not assumed that a primary deficit in the left STG leads to a secondary deficit in the left ventral OT cortex in dyslexic readers. Instead, the left perisylvian TP region seems to be involved when explicit fine-grained phonological analysis is required (e.g., Griffiths and Warren, 2002; Hickok et al., 2011; DeWitt and Rauschecker, 2012; Price, 2012) or when information from auditory linguistic inputs and visual linguistic inputs (i.e., speech sounds and letters) has to be integrated (e.g., van Atteveldt et al., 2004; Blau et al., 2010). Hence, activation of this region is predicted when the task involves unimodal auditory or bimodal audiovisual processing. For unimodal visual processing, engagement of this region is typically not reported, unless the task involves demanding phonological analysis. In a recent meta-analysis the left STG was not considered to be part of the reading network (Taylor et al., 2013).

There is emerging evidence that the neural correlates of multisensory letter-speech sound integration might be modulated by OD. Specifically, a study with English adult readers (Holloway et al., 2013) did not find congruency effects for letter-speech sound pairs in the STG as were previously reported in a similar study with Dutch adult readers (van Atteveldt et al., 2004). With respect to brain plasticity, it was shown that reading development has an influence on activation in the left STG regions associated with phonological processing, and that this influence is stronger in alphabetic compared with logographic writing systems (Brennan et al., 2013). In addition, a recent meta-analysis on structural brain abnormalities in dyslexia identified reduced gray matter volume in bilateral perisylvian TP regions, possibly reflecting reduced tuning of the phonological network as a result of reduced reading experience in dyslexics (Richlan et al., 2013b). Therefore, one may speculate that, opposite to the developmental assumption of the classical model, a primary left ventral OT dysfunction results in a secondary left STG dysfunction in dyslexia. The influence of OD within alphabetic writing systems on this developmental effect, however, is still a blank spot on the map.

## **LEFT VENTRAL OCCIPITO-TEMPORAL CORTEX**

One may ask where in the brain, if not in the left perisylvian TP cortex, grapheme-phoneme conversion should be located in the new model. The idea is that the left ventral OT cortex is associated with both visual-orthographic whole-word processing and serial grapheme-phoneme conversion. Among others (e.g., Xu et al., 2001; Mechelli et al., 2003; Binder et al., 2005; Kronbichler et al., 2007, 2009; Bruno et al., 2008; Brem et al., 2010; Ludersdorfer et al., 2013), evidence comes from an fMRI study (Schurz et al., 2010), in which non-impaired German readers exhibited a length by lexicality interaction in the left ventral OT cortex (i.e., an increase of activation with increasing number of letters for pseudowords but not for words). German dyslexic readers exhibited overall lower activation and failed to show the modulation of activation by length of pseudowords (Richlan et al., 2010).

At least for German, other fMRI studies have also shown that dyslexic underactivation is more pronounced when orthographically unfamiliar stimuli (e.g., pseudowords or pseudohomophones) impose higher demands on phonological processing compared with orthographically familiar stimuli (words) (van der Mark et al., 2009; Wimmer et al., 2010). Future studies in other shallow orthographies (e.g., Dutch, Italian, or Spanish) are expected to yield similar results. Therefore, the prediction for shallow orthographies is intermediate activation for non-impaired readers with increasing activation when grapheme-phoneme conversion is required by task or stimulus demands (e.g., pseudowords). Dyslexic readers are expected to exhibit weak overall activation and failure to increase activation in response to unfamiliar stimuli.

Higher left ventral OT cortex activation for unfamiliar compared with familiar letter strings is a common finding also in the English-based literature and was interpreted as a reflection of sustained task-related top-down processing (Dehaene and Cohen, 2011). A different explanation was put forward by Price and Devlin (2011). In their interactive account, higher activation for unfamiliar compared with familiar letter strings in the left ventral OT cortex is explained by higher prediction error (i.e., the difference between bottom-up visual information and top-down predictions). The top-down predictions are generated automatically from prior experience in higher cortical levels that contribute to representing phonology, semantics, and actions. This view is in line with the role of the left ventral OT cortex in grapheme-phoneme conversion in the new model. In the interactive account, the left ventral OT underactivation exhibited by dyslexic readers is interpreted as failure to establish hierarchical connections and access top-down predictions. As top-down predictions from phonology and semantics play an important role in reading irrespective of OD, the left ventral OT activation pattern is expected to be similar in deep and shallow orthographies.

## **LEFT INFERIOR FRONTAL GYRUS**

A further difference between the classical model (Pugh et al., 2000) and the new model (Richlan, 2012) refers to the left anterior reading component. In contrast to the classical model, the new model—supported by findings from meta-analyses (Richlan et al., 2009, 2011)—proposes a subdivision of the left anterior system into an IFG region and a dorsal precentral region. The former was consistently identified with dyslexic underactivation, whereas the latter was consistently identified with dyslexic overactivation.

The left IFG underactivation is thought to reflect the problem of dyslexic readers to access phonological output representations (Ramus and Szenkovits, 2008). This notion was recently supported by a study combining multivoxel pattern analysis, and functional and structural connectivity analysis (Boets et al., 2013). The main finding was reduced functional coupling in an auditory phoneme discrimination task and reduced white matter integrity as measured by DTI between left IFG and STG regions in dyslexic readers. In addition, the left IFG is assumed to have strong reciprocal connections and to interact with the left ventral OT cortex during non-impaired reading (e.g., Catani et al., 2005; Ben-Shachar et al., 2007; van der Mark et al., 2011; Vandermosten et al., 2012; Yeatman et al., 2013; Schurz et al., 2014).

Up to now, there are no indications of essential differences in dyslexic underactivation in shallow versus deep orthographies in the left IFG. Some accounts speak for engagement of the left IFG in grapheme-phoneme conversion (e.g., Jobard et al., 2003) or lexical access (e.g., Heim et al., 2013). There is, however, room for speculation because the IFG is a heterogeneous region, which is not only characterized by anatomical subdivisions based on neurotransmitter receptor architectonics (Amunts et al., 2010), but has also been associated with various cognitive and emotional processes (e.g., Laird et al., 2011; Price, 2012; Richlan et al., 2013a).

## **LEFT DORSAL PRECENTRAL GYRUS**

In line with the classical model, the left dorsal PreG was consistently identified with overactivation in dyslexic children and adults (Richlan et al., 2009, 2011). This overactivation is assumed to reflect compensatory reliance on articulatory processes during reading from an early age on. The left dorsal PreG is part of the sublexical phonological decoding route and typically identified with higher activation for pseudowords compared with words in non-impaired readers (Price, 2012; Taylor et al., 2013). Interestingly, in our study with dyslexic adolescents and young adults (Richlan et al., 2010), it was the only region which showed higher activation for pseudowords compared with words together with a length effect for pseudowords in dyslexic readers. With respect to OD, no differences are assumed in left dorsal PreG overactivation between dyslexic readers in deep and shallow orthographies, because of universal overreliance on articulatory processes.

## **FUNCTIONAL INTEGRATION/INTERACTIVE SPECIALIZATION**

Following Schlaggar and McCandliss (2007), the new model incorporates the concept of interactive specialization, that is, the idea that the development of skilled reading relies on the functional integration of distributed brain regions. Changes through development are assumed to take place not only in local brain modules (reflected in tuning of regional activation patterns and structural cortical plasticity), but also at the systems level (reflected in alterations in functional coupling and white matter connectivity between brain regions). As already mentioned in Section Predictions Derived from Functional Neuroanatomical Models, a number of studies investigated the functional and structural neuroanatomy of reading from this systems-level perspective.

A main focus was on connectivity of the left ventral OT cortex with other language-related brain regions. There is good evidence from studies with non-impaired readers (e.g., Koyama et al., 2011; Vogel et al., 2012), developmental dyslexic readers (e.g., Shaywitz et al., 2003; van der Mark et al., 2011), and acquired dyslexic readers (e.g., Epelbaum et al., 2008; Seghier et al., 2012; Woodhead et al., 2013), that integration of the left ventral OT cortex with frontal and parietal regions is vital for fast and efficient reading (Price and Devlin, 2011). In addition to functional integration, it was shown that skilled adult readers show functional segregation (i.e., decoupling) of the reading network with the typically task-negative default mode network (Koyama et al., 2011).

The connections between brain regions in **Figure 2B** should be taken as illustration of the interactive specialization framework. For reasons of simplicity, all possible connections between brain regions are drawn, but the assumption is not that all of the brain regions interact with each other in any given situation. Instead, the idea is that different parts of the overall network interact in flexible and temporally dynamic ways depending on the required cognitive processes for a given task or stimulus.

Based on the evidence available up to now, it is impossible to reliably predict differences in connectivity patterns between deep and shallow orthographies. Studies aimed at these differences, however, are likely to shed light on the functional neuroanatomical reflection of OD, despite potentially very subtle differences in local brain activation profiles. Therefore, I look forward to innovative future studies investigating the effect of OD within alphabetic writing systems and differences between alphabetic and other writing systems by means of structural, functional, and effective connectivity in task-based as well as resting-state fMRI.

## **SUMMARY AND CONCLUSION**

To sum up, dyslexia-related differences between deep and shallow orthographies can be expected in a variety of left hemisphere brain regions, depending on task and stimulus demands and the age of participants. The two models (Pugh et al., 2000; Richlan, 2012) differ in many respects in how they predict the degree and extent of engagement in these regions. In addition, differences between deep and shallow orthographies are likely to be reflected in the dynamic interactions between brain regions.

Evidence from cross-linguistic brain imaging studies on developmental dyslexia is scarce. The different approaches of classical between-subjects designs, within-subjects designs (in the case of bilingual participants), and artificial orthography learning paradigms should be continued and expanded in the future. In addition, meta-analysis might provide a valuable tool to synthesize and compare a high number of original studies, which were conducted within a single language. A comparable strategy was already successfully applied in the investigation of child and adult studies of developmental dyslexia (Richlan et al., 2011).

The investigation of typical and atypical reading processes in different orthographies yields important implications for the neurobiological understanding of developmental dyslexia. The given variations in OD and the role of English as an "outlier" orthography (Share, 2008) should be considered as an opportunity to test the current neurocognitive models and to refine them. The present review article contributes to this endeavor by providing orthography-specific predictions derived from two distinct conceptions of the functional neuroanatomy of nonimpaired and dyslexic reading. These predictions should be tested in future brain imaging studies of reading.

## **ACKNOWLEDGMENTS**

I would like to thank Julia Sophia Crone, Benjamin Gagl, Stefan Hawelka, Florian Hutzler, Robin Litt, Anna Martin, Matthias Schurz, Sarah Schuster, Lorenzo Vignali, and Heinz Wimmer for their feedback during the preparation of this manuscript. This work was supported by the Austrian Science Fund (FWF P 23916-B18 and P 25799-B23).

### **REFERENCES**


in children and adults. *J. Neurosci.* 31, 8617–8624. doi: 10.1523/jneurosci.4865-10.2011


**Conflict of Interest Statement**: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 January 2014; accepted: 08 May 2014; published online: 20 May 2014*. *Citation: Richlan F (2014) Functional neuroanatomy of developmental dyslexia: the role of orthographic depth. Front. Hum. Neurosci. 8:347. doi: 10.3389/fnhum.2014.00347*

*This article was submitted to the journal Frontiers in Human Neuroscience*. *Copyright © 2014 Richlan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## What does the brain of children with developmental dyslexia tell us about reading improvement? ERP evidence from an intervention study

## *Sandra Hasko\*, Katarina Groth, Jennifer Bruder, Jürgen Bartling and Gerd Schulte-Körne*

*Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital Munich, Munich, Germany*

### *Edited by:*

*Donatella Spinelli, Università di Roma "Foro Italico," Italy*

#### *Reviewed by:*

*Donatella Spinelli, Università di Roma "Foro Italico," Italy Barbara Penolazzi, University of Padua, Italy*

#### *\*Correspondence:*

*Sandra Hasko, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital Munich, Pettenkoferstr. 8a, 80336 Munich, Germany e-mail: sandra.hasko@med.uni-muenchen.de*

Intervention is key to managing developmental dyslexia (DD), but not all children with DD benefit from treatment. Some children improve (improvers, IMP), whereas others do not improve (non-improvers, NIMP). Neurobiological differences between IMP and NIMP have been suggested, but studies comparing IMP and NIMP in childhood are missing. The present study examined whether ERP patterns change with treatment and differ between IMP and NIMP. We investigated the ERPs of 28 children with DD and 25 control children (CON) while they performed a phonological lexical decision (PLD) task before and after a 6-month intervention. After intervention children with DD were divided into IMP (*n* = 11) and NIMP (*n* = 17). In the PLD task children were visually presented with words, pseudohomophones, pseudowords, and false fonts and had to decide whether the presented stimulus sounded like an existing German word or not. Prior to intervention IMP showed higher N300 amplitudes over fronto-temporal electrodes compared to NIMP and CON, and N400 amplitudes were attenuated in both IMP and NIMP compared to CON. After intervention N300 amplitudes of IMP were comparable to those of CON and NIMP. This suggests that the N300, which has been related to phonological access of orthographic stimuli and integration of orthographic and phonological representations, might index a compensatory mechanism or precursor that facilitates reading improvement. The N400, which is thought to reflect grapheme-phoneme conversion or the access to the orthographic lexicon, increased in IMP from pre to post and was comparable to CON after intervention. Correlations between N300 amplitudes pre, growth in reading ability and N400 amplitudes post indicated that higher N300 amplitudes might be important for reading improvement and increase in N400 amplitudes. The results suggest that children with DD showing the same cognitive profile might differ regarding their neuronal profile, which could further influence reading improvement.

**Keywords: developmental dyslexia, intervention, treatment, improvement, non-improvement, electrophysiology, N400, N300**

### **INTRODUCTION**

Developmental dyslexia (DD) is characterized by severe problems in learning to read properly and is often accompanied by a comorbid spelling disorder. These difficulties arise unexpectedly, because affected children and adults possess the intelligence, motivation, and educational opportunities required for language acquisition and they do not suffer from neurological or sensory deficits (DSM-5: APA, 2013). With prevalence rates around 4–9%, DD is one of the most common specific developmental disorders (Shaywitz et al., 1990; Katusic et al., 2001; Esser et al., 2002). DD accompanies the individuals throughout their lifespan and interferes with academic achievement and professional success (Shaywitz et al., 1999; Daniel et al., 2006; Willcutt et al., 2007). In addition around 40% of children with DD suffer from comorbid psychiatric disorders, especially from externalizing disorders, low school-related self-esteem, and depressive symptoms, as a consequence of their failure in acquiring adequate reading and spelling skills (Willcutt and Pennington, 2000; Arnold et al., 2005; Daniel et al., 2006; Goldston et al., 2007; Willcutt et al., 2007; Mugnaini et al., 2009). Therefore, the attainment of sustainable intervention effects in children with DD is crucial.

In contrast, the empirical basis for the evidence-based evaluation of interventions for children with DD is weak. Current meta-analyses quantified the effectiveness of treatment approaches for reading and spelling disabilities and reported only marginal to average effect sizes (Ise et al., 2012; Galuschka et al., 2014). Because DD has a neurobiological basis (e.g., Shaywitz et al., 2007; Shaywitz and Shaywitz, 2008; Caylak, 2009; Richlan, 2012; Richlan et al., 2013), it is important to understand how interventions work on the neuronal level. Does intervention normalize neuronal activity of children with DD? Or does intervention lead to an enhancement of compensatory mechanisms? A better understanding of treatment-related changes on the neuronal level might help to refine intervention programs in order to make treatment more effective.

In addition, meta-analyses reported high heterogeneity between the effect sizes of different studies for both reading and spelling interventions (National Institute of Child Health and Human Development, 2000; Ise et al., 2012; McArthur et al., 2012; Galuschka et al., 2014). Weak and inconsistent effect sizes might arise, among other reasons, from the inclusion of participants who do not improve during intervention (non-improvers; NIMP). This assumption is supported by studies indicating that up to 30% of struggling readers do not benefit from intervention (Shanahan and Barr, 1995; Vaughn et al., 2003). A better understanding of neuronal differences between children who improve during intervention (improvers; IMP) and children who continue to struggle might help to predict treatment response and to further establish intervention programs adapted to the special needs of the latter.

Against this background, the aim of the present study was twofold. First, we were interested in investigating which neurophysiological changes occur during treatment. Second, we aimed to explore whether there might be any pre-existing neurophysiological differences between IMP and NIMP.

Over the past decade researchers began to focus on the neuronal processes related to inefficient reading and spelling abilities to understand the efficacy of reading and spelling interventions. Treatment-related functional changes have been observed in the neuronal reading network. Aberrant activation patterns in the subsystems of the neuronal reading network including posterior occipito-temporal and parieto-temporal regions as well as inferior-frontal areas in DD have been established (Shaywitz et al., 2007; Shaywitz and Shaywitz, 2008; Caylak, 2009; Richlan, 2012; Richlan et al., 2013). Compared to typically developing children, children with DD show a hypoactivation in the posterior subsystems of the left hemispheric reading network, which was found to be accompanied by an overactivation in homologous right hemispheric regions during language tasks (Simos et al., 2002; Demonet et al., 2004; Kronbichler et al., 2007; Shaywitz and Shaywitz, 2008; Richlan et al., 2009). With respect to the inferior-frontal subsystem results are less homogeneous. Some studies report hypoactivation (Paulesu et al., 1996; Wimmer et al., 2010; for meta-analyses see Richlan et al., 2009, 2011) whereas others observed hyperactivation in subjects with DD (Salmelin et al., 1996; Shaywitz et al., 1998; Brunswick et al., 1999; for review see Pugh et al., 2000; Sandak et al., 2004). Furthermore, disconnectivity between posterior and frontal subsystems (Paulesu et al., 1996) as well as between the two posterior subsystems (Shaywitz et al., 2002) of the neuronal reading network has been described. After intervention a normalization of activation in the neuronal reading network has been observed in English speaking children (Simos et al., 2002, 2006, 2007b; Aylward et al., 2003; Temple et al., 2003; Shaywitz et al., 2004; Richards et al., 2007; Meyler et al., 2008) and adults with DD (Eden et al., 2004).
Furthermore, it has been described that the connectivity between reading-related areas is normalized after treatment (Richards and Berninger, 2008; Keller and Just, 2009). Treatment-related changes have been also found using electrophysiology. Researchers observed changes in several reading-related event-related potential (ERP) measures (MMN: Kujala et al., 2001; Huotilainen et al., 2011; Lovio et al., 2012; P100: Mayseless, 2011; N170: Jucla et al., 2009; Spironelli et al., 2010; P300: Santos et al., 2007; Jucla et al., 2009) as well as in EEG frequency bands (Penolazzi et al., 2010; Weiss et al., 2010) after intervention.

It has been suggested that different neurobiological processing disorders might cause DD and that these differences in brain development within the group of children with DD might further influence improvement in literacy skills during treatment (Noble and McCandliss, 2005). However, studies examining whether there might be neurophysiological differences prior to receiving intervention between IMP and NIMP are less common. To the best of our knowledge only eight studies differentiated between IMP and NIMP (Simos et al., 2005, 2007a; Odegard et al., 2008; Davis et al., 2011; Farris et al., 2011; Rezaie et al., 2011a,b; Molfese et al., 2013).

Six out of these eight studies focused on neuronal differences between IMP and NIMP after intervention. In most studies this was the consequence of applying a cross-sectional design, which investigated neurophysiological activity only after intervention (Odegard et al., 2008; Davis et al., 2011; Farris et al., 2011; Molfese et al., 2013). These cross-sectional studies reported on normal activation patterns throughout the reading network in IMP after intervention or on brain mechanisms which are known to have a compensatory function (Odegard et al., 2008; Davis et al., 2011; Farris et al., 2011; Molfese et al., 2013). In contrast, NIMP who had persistent deficits in reading performance were marked by aberrant activation patterns throughout the reading network (Odegard et al., 2008; Davis et al., 2011), deficiencies in ERP measures (Molfese et al., 2013) and lower functional connectivity between reading-related brain areas (Farris et al., 2011). Furthermore, two longitudinal studies conducted by Simos et al. (2005, 2007a) reported on similar spatial and temporal brain activation patterns in normal developing children and 6–8-year-old (Simos et al., 2005) and 8–10-year-old (Simos et al., 2007a) IMP after intervention, which was not observed in NIMP. However, Simos et al. (2005, 2007a) did not report on pre-existing differences between IMP and NIMP. Small sample sizes and confounding variables such as wide age range probably mask pre-existing differences, which might be expected if different neurobiological processing disorders underlie DD and influence improvement during intervention (Noble and McCandliss, 2005). In line with this assumption, Rezaie et al. (2011a,b) reported on pre-existing differences between adolescent IMP and NIMP using MEG. 
In contrast to control children (CON) and IMP, children who did not improve in reading ability displayed reduced activity in left middle- and superior-temporal gyri, left supramarginal and angular gyrus and ventral occipito-temporal regions as well as in the right parahippocampal gyrus (Rezaie et al., 2011a,b). Furthermore, NIMP displayed reduced activity in the superior- and medial-temporal gyrus of both hemispheres compared to CON (Rezaie et al., 2011b). No differences in these areas were found between CON and IMP. Interestingly, the degree of activation in these regions predicted improvement during intervention, suggesting that pre-existing neuronal activity might influence improvement during treatment.

To summarize, neuronal differences between IMP and NIMP have been reported before (Rezaie et al., 2011a,b) and after intervention (Simos et al., 2007a; Odegard et al., 2008; Davis et al., 2011; Farris et al., 2011; Molfese et al., 2013). Even though these studies provide interesting information about IMP and NIMP, their informative value is limited due to methodological difficulties. First, the cross-sectional design of most studies (Odegard et al., 2008; Davis et al., 2011; Farris et al., 2011; Rezaie et al., 2011a,b; Molfese et al., 2013) makes clear interpretation of the results difficult. Second, the inclusion criterion for DD within most of the studies was not very strict (below the 25th percentile for Rezaie et al., 2011a,b; below the 30th percentile for Simos et al., 2007a) or DD was assessed by non-standardized tests (Davis et al., 2011). This suggests that normally developing children with somewhat poorer reading skills might also have participated in previous studies. Third, differentiation between IMP and NIMP was not strict in most studies, which used either a median split or performance above and below arbitrarily defined percentile ranges in order to group IMP and NIMP (Simos et al., 2005, 2007a; Davis et al., 2011; Rezaie et al., 2011a,b; Molfese et al., 2013). Moreover, small sample sizes, wide age ranges (Simos et al., 2007a; Odegard et al., 2008; Farris et al., 2011), differences in reading ability between IMP and NIMP before intervention (Simos et al., 2007a; Odegard et al., 2008; Davis et al., 2011; Farris et al., 2011), partly rehabilitated NIMP (average skills in phonological awareness but not in word reading) and a long time lag between completion of the intervention and participation in the experiments (Odegard et al., 2008; Farris et al., 2011) are further methodological problems which have to be taken into account.
In addition, to the best of our knowledge, nothing has so far been reported about pre-existing neurophysiological differences between IMP and NIMP in childhood. However, given the high number of children who do not improve during interventions (Shanahan and Barr, 1995; Vaughn et al., 2003; Groth et al., 2013) and the therapy costs involved (Georgii et al., in review), it is essential to better understand possible markers of improvement and non-improvement.

In order to investigate electrophysiological differences between IMP and NIMP before and after intervention, the present study took advantage of the phonological lexical decision (PLD) task. In this task subjects are presented with real words (W), pseudohomophones (PH), pseudowords (PW), and false fonts (FF) and indicate whether the visually presented stimulus sounds like a real word or not (Kronbichler et al., 2007; van der Mark et al., 2009, 2011; Schurz et al., 2010; Wimmer et al., 2010; Hasko et al., 2013). One major advantage of the PLD task is that it is a continuous reading task, which allows both orthographic and phonological processing to be studied in one experiment (Hasko et al., 2013). The PLD task taps orthographic processing on two levels. Firstly, comparing the letter string material (W; PH; PW) to the visual control stimuli (FF) examines print sensitivity. Secondly, the contrast between orthographically familiar (W) and unfamiliar (PH; PW) word material, while controlling for phonology in the case of the contrast between W and PH, provides information about the subjects' familiarity with orthographic representations. Furthermore, according to dual route models of reading (e.g., Coltheart et al., 1993, 2001), contrasting unfamiliar (PH; PW) with familiar (W) word material also taps phonological processing because grapheme-phoneme correspondence (GPC) rules need to be applied in order to sound out the orthographically unfamiliar word material (see Hasko et al., 2013).

Using this task we recently proposed a temporal model of reading processes in normally developing children (Hasko et al., 2013), based on the assumptions of dual route models of reading (Coltheart et al., 1993, 2001), and we found processing differences in children with DD. According to dual route models of reading (Coltheart et al., 1993, 2001) reading processes take place in a hierarchical manner. After identification of the visual features (contrast, color, spatial frequency) of a letter string, the first step of reading comprises the identification of letters (Coltheart et al., 1993, 2001). Our results show that the first component which is sensitive to print in contrast to non-orthographic stimuli (FF) is the N170 over occipito-temporal electrodes. At about 220 ms CON's N170 mean peak amplitudes are higher for orthographic material compared to FF, indicating that letters are identified in this time window. After the identification of letters, the phonology of a letter string can be accessed in two different ways depending on the orthographic familiarity of the letter string. Familiar words are read via the lexical route by accessing the orthographic representations in the orthographic lexicon and directly retrieving the corresponding phonological representations from the phonological lexicon. In contrast, unfamiliar word forms, such as pseudohomophones and pseudowords, or words for which the reader does not possess an entry in the orthographic lexicon, are read by applying GPC rules in order to access the phonological representation (Coltheart et al., 1993, 2001). According to dual route models of reading these processes proceed in a parallel manner (Coltheart et al., 1993, 2001) and they occur at about 400 ms (Hasko et al., 2013). In normally developing children N400 amplitudes over centro-parietal electrodes were comparably high for W, PH, and PW, suggesting that children rely on comparable reading processes for all letter strings.
Thus, with respect to dual route models of reading the N400 might index the process of GPC or the search process within the orthographic lexicon. Access to the phonological lexicon in the PLD task is indexed between 600 and 900 ms by a late positive complex (LPC) over left centro-parietal electrodes, which was higher for phonologically familiar W and PH in contrast to PW in normally developing children. Processing differences dependent on the linguistic material in CON were observed only in the LPC, suggesting that similar reading processes were adopted independent of orthographic familiarity. With respect to children with DD our results indicated deficits at all processing steps. Firstly, a diminished mean area under the curve for the word material vs. FF contrasts in the time window of the N170 indicated that the degree of print sensitivity was reduced in the brain of children with DD. Secondly, reduced N400 amplitudes in children with DD pointed to less specified orthographic representations or impairments in accessing the orthographic lexicon or applying GPC rules. Lastly, the difference between phonologically familiar and phonologically unfamiliar word material was not found in children with DD, suggesting an impaired access to phonological representations or an underspecification of phonological representations.

With respect to the first research question of the present study, namely which neurophysiological changes occur during treatment in children with DD, we expected to find effects on the N400. This was expected because the applied intervention programs worked on either orthographic knowledge or GPC, which are reflected by the N400. As found previously (see Hasko et al., 2013), we expected higher N400 mean peak amplitudes before intervention for CON in contrast to IMP and NIMP. After intervention we expected that IMP might show an increase in N400 mean peak amplitudes, with the result that differences in N400 mean peak amplitudes between IMP and CON are diminished. No changes in N400 mean peak amplitudes over time were expected for CON and NIMP.

To answer our second research question, whether there might be any neurophysiological differences between IMP and NIMP, our analysis strategy was exploratory, as to the best of our knowledge there is no study that allows specific hypotheses regarding ERPs to be derived. However, previous MEG studies suggest that differences between IMP and NIMP might be expected over temporo-parietal areas before intervention.

## **METHODS**

### **PARTICIPANTS**

As part of a longitudinal study 29 children without DD and 40 children with DD participated in the present study (for a detailed description of the recruitment procedure see Hasko et al., 2013). All children were tested regarding their reading and spelling abilities before and after intervention by means of German standardized tests. Common word and pseudoword reading fluency was assessed by using the one-minute reading fluency test (German: Ein-Minuten-Leseflüssigkeitstest [SLRT-II]; Moll and Landerl, 2010). In this measure, children are presented with a list of common words and pseudowords and are given one minute to read as many items as possible. Spelling was assessed with a basic vocabulary spelling test for grades 2–3 before intervention (German: Weingartener Grundwortschatz Rechtschreib-Test für zweite und dritte Klassen [WRT2+]; Birkel, 1994) and for grades 3–4 after intervention (German: Weingartener Grundwortschatz Rechtschreib-Test für dritte und vierte Klassen [WRT3+]; Birkel, 2007). In addition, reading comprehension was measured with a reading comprehension test for grades 1–6 (German: Leseverständnistest für Erst- bis Sechstklässler [ELFE 1-6]; Lenhard and Schneider, 2006). Moreover, measures of phonological awareness, rapid automatized naming (RAN) of numbers, letters, colors, and objects and working memory (digit span forwards and backwards from the Wechsler Intelligence Scale for Children IV; German: Hamburg-Wechsler-Intelligenztest für Kinder-IV [HAWIK-IV]; Petermann and Petermann, 2007) were taken.

In order to be included in the study the CON's common word reading fluency and spelling performance had to exceed the 25th percentile for both measures. Before intervention both the reading and the spelling score of children with DD had to diverge from the mean *T*-value by at least 1 SD (the cutoff criterion was therefore set to a *T*-value of 40) and by 1 SD from the IQ according to the regression criterion (Schulte-Körne et al., 2001). Thus, a discrepancy of reading and spelling abilities both from the class or age level and from the level expected on the basis of the child's intelligence was required for diagnosing DD. Children with DD were pseudorandomly assigned to one of two intervention programs. Three CON did not take part in the post-treatment measurement and one CON had to be excluded from further analyses due to technical problems during EEG recording, resulting in 25 CON. Of the children with DD, one child started another intervention before our intervention period began and therefore withdrew from study participation, resulting in a sample size of 39 children with DD. In the present study we were interested in the investigation of reading improvement during intervention. Therefore, children with DD were classified as IMP or NIMP after intervention according to their gain in common word reading fluency measured with the SLRT-II. Children were assigned to the group of IMP if their reading ability increased by at least half an SD from pre to post. We based our classification criteria on results from current meta-analyses reporting effect sizes of *g* = 0.31 and *g* = 0.33 for reading interventions (Ise et al., 2012; Galuschka et al., 2014). Children whose ability did not change at all over time or decreased from pre to post were classified as NIMP.
According to this classification 12 children were identified as IMP, 21 as NIMP and 6 could not be assigned to one of the groups because their gain in common word reading fluency was between 1 and 4 *T*-values. One child from IMP and a total of 4 children from NIMP were excluded from further analyses due to excessive EEG artifacts, resulting in a sample size of 11 IMP and 17 NIMP.
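The grouping rule described above can be illustrated with a minimal sketch. It assumes that reading fluency is expressed in whole *T*-value points (mean 50, SD 10), so that half an SD corresponds to a gain of 5 points; the function name `classify_response` and the example scores are hypothetical and are not taken from the study.

```python
from typing import Optional

T_SD = 10.0                      # T-values have a mean of 50 and an SD of 10
IMPROVER_MIN_GAIN = 0.5 * T_SD   # "at least half an SD" -> 5 T-value points


def classify_response(t_pre: float, t_post: float) -> Optional[str]:
    """Return 'IMP', 'NIMP', or None (unassigned) from pre/post T-values."""
    gain = t_post - t_pre
    if gain >= IMPROVER_MIN_GAIN:
        return "IMP"    # gain of half an SD or more
    if gain <= 0:
        return "NIMP"   # no change or a decrease
    return None         # gains of 1 to 4 T-value points stay unassigned


if __name__ == "__main__":
    # Invented example scores, not study data.
    for pre, post in [(32, 38), (35, 35), (30, 33), (36, 34)]:
        print(pre, post, classify_response(pre, post))
```

Leaving small positive gains unassigned, rather than forcing a median split, keeps the two groups clearly separated at the cost of a smaller sample.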

Before intervention all groups had an average age of about 8 years (see **Table 1**). Gender was distributed similarly in all groups [χ<sup>2</sup> = 1.35, *p* = 0.51] and apart from 1 IMP and 4 NIMP all subjects were right-handed [χ<sup>2</sup> = 6.56, *p* = 0.04; see **Table 1**]. As can be seen in **Table 1** all children had an IQ within the normal range (≥ 85 IQ points; as measured with the Culture Fair Intelligence Test; CFT 1; Cattell et al., 1997), but the IQ of CON was significantly higher than the IQ of IMP and NIMP (*p* < 0.05). Attention was assessed with the subscale "Attention Problems" of the Child-Behavior-Checklist (CBCL/1–4; Achenbach, 1991). The CBCL-score of all children was below the cut-off score (CBCL-score < 7 for girls and CBCL-score < 8 for boys, see **Table 1**). In all reading and spelling tests IMP and NIMP performed significantly worse than CON before and after intervention (*p* < 0.001; see **Table 1**). Furthermore, CON outperformed IMP and NIMP before and after intervention in phoneme deletion, all subtests of the RAN and working memory (*p* < 0.05). The only difference between IMP and NIMP was found in reading comprehension, where IMP performed significantly better than NIMP pre and post (*p* < 0.05). As expected due to group assignment the common word reading fluency increased significantly over time for IMP (*p* < 0.001) and IMP outperformed NIMP in this measure after intervention (*p* < 0.001). Reading comprehension increased in all groups over time (*p* < 0.001). In addition, all children improved their performance from pre to post (*p* < 0.05) in phoneme deletion and segmentation and all subtests of the RAN (apart from IMP in the subtest RAN objects).
In order to control for a confounding influence of IQ, handedness and text comprehension on the ERP results the groups were matched according to these variables, resulting in sample sizes of 20, 10, and 16 children for CON, IMP, and NIMP, respectively. The Analyses of Variance (ANOVAs) presented below were also run with matched groups and significant results reported below were also observed within these calculations.

### **Table 1 | Descriptive statistics of CON, IMP, and NIMP.**

*CON, control group; IMP, improvers; NIMP, non-improvers; n, sample size; T, T-values, T-values have a mean of 50 (SD ± 10); RS, raw scores; SS, standard scores, SS have a mean of 10 (SD ± 3); <sup>a</sup>CFT 1; <sup>b</sup>CBCL/1–4; <sup>c</sup>SLRT-II; <sup>d</sup>ELFE 1-6; <sup>e</sup>WRT 2+/WRT 3+; <sup>f</sup>number of correct items, max. 27; <sup>g</sup>number of correct items, max. 10; <sup>h</sup>items per minute; <sup>i</sup>HAWIK-IV.*

Parents and children were informed about the aim, purpose, and procedure of the study and gave their written consent prior to inclusion in the study. Before and after intervention children received a present as acknowledgement for their participation in the testing session. Experimental procedures were approved by the Ethical Committee of the Faculty of Medicine at the University of Munich, Germany.

### **INTERVENTION**

Children with DD received intervention twice a week for 6 months in an individual setting in our clinic. Intervention started at the beginning of the third grade. All children completed 40 units, each lasting 45 min. Both intervention programs (IP1 and IP2) were highly structured, thus assuring a consistent procedure across therapists. Furthermore, to ensure fidelity of treatment, therapists, mostly students of linguistics and speech therapy, were extensively trained before and regularly supervised during intervention by psychologists and speech and language therapists. In addition, video recordings as well as the observation of single treatment sessions were used to assure treatment fidelity.

As mentioned in the section Participants, children with DD were pseudorandomly assigned to the treatment groups. IP1 is based on orthographic knowledge and systematic, rule-based strategies (Schulte-Körne and Mathwig, 2007; Ise and Schulte-Körne, 2010; Schulte-Körne et al., 2012). It focuses on the transfer of correct phoneme discrimination and the corresponding orthographic knowledge (e.g., in German orthography long vowels are often marked by a following silent /h/ or another vowel, whereas short vowels are often marked by two following consonants; therefore perceiving the correct vowel length is important for deducing the right orthographic rule). IP2 belongs to the group of phonics training programs (Dummer-Smoch and Hackethal, 2007). Words are read aloud in syllables and phonemes are used instead of letter pronunciation. It focuses on the acquisition of GPC. For this reason only words with a 1-1 GPC are used (for further information see Groth et al., 2013). Six IMP and 8 NIMP received IP1, and 5 IMP and 9 NIMP participated in IP2.

### **ERP PARADIGM AND PROCEDURE**

All children underwent ERP recording before and after intervention (6 months later). During ERP acquisition children performed a PLD task (Hasko et al., 2013). In this task participants had to decide whether a visually presented stimulus sounded like a real word or not ("Does . . . sound like a real word?" see **Figure 1**). Children were presented either with W (orthographically and phonologically familiar forms of German nouns), PH (phonologically correct but orthographically unfamiliar forms of the same words) or PW (phonologically and orthographically unfamiliar forms). W and PH required a "yes" response and PW required a "no" response. For each item type (W; PH; PW) 60 stimuli were presented and every item was presented only once. To avoid a response bias toward "yes" responses we included a fourth condition, consisting of 60 FF and requiring a "no" response. FF were created by assigning a false-font character to each upper and lower case letter. To avoid effects due to item length and complexity all stimuli were matched for number of characters (3–7 characters). In addition W, PH, and PW were controlled for bigram frequency (see Hasko et al., 2013, for a complete list of all stimuli used in the PLD task and for further description of item selection).

**Figure 1** (caption, partially recovered): . . . (PW; e.g., Munk /mʊŋk/) and false fonts (FF; e.g., π λ) were presented . . .

All stimuli were presented in white font on black background in the center of a 17-inch screen using E-Prime® 2.0 software (Psychology Software Tools, Inc.). The computer screen was placed 70 cm in front of the children, resulting in a vertical visual angle of 1.23° and an average horizontal angle of 3.44°. The 240 stimuli were presented in pseudorandomized order in four blocks. After each block there was a short break. To ensure that the subjects fully understood the task, the experiment was preceded by a short practice block (24 trials). Trials utilized in the practice block did not occur in the experiment. The task was self-paced in order to make sure that even the poorest reader had enough time to read the letter string stimuli. However, all children were presented with the stimuli for a minimum of 700 ms to guarantee that all participants saw the same stimulus during the first milliseconds, which is important for ERP analysis. Participants had to decide by button press whether the presented stimulus sounded like a real word or not. Half of the children used their right hand for giving a "yes" response and the left hand for giving a "no" response; the other half used the left hand for "yes" and the right hand for "no" responses. Depending on whether the response was correct or incorrect, children were given feedback in the form of a happy or sad face (1500 ms). The next trial appeared automatically after a blank screen of 500 ms (see **Figure 1**).
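The reported visual angles can be related to physical stimulus size with the usual geometric formula, angle = 2·arctan(size / (2·distance)). The sketch below inverts that formula for the 70 cm viewing distance; the function names are illustrative, and the resulting sizes are derived values, not measurements reported by the authors.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by an object of the given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_from_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical size (in cm) that subtends the given visual angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


VIEWING_DISTANCE_CM = 70.0  # viewing distance stated in the text

# Sizes implied by the reported angles (derived here, not reported in the study).
height_cm = size_from_angle_cm(1.23, VIEWING_DISTANCE_CM)
width_cm = size_from_angle_cm(3.44, VIEWING_DISTANCE_CM)
print(f"implied stimulus height: {height_cm:.2f} cm, mean width: {width_cm:.2f} cm")
```

Under these assumptions the stimuli would have been roughly 1.5 cm high and, on average, about 4.2 cm wide.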

## **ERP RECORDING AND ANALYSIS**

EEG was recorded during the stimulus presentation with an Electrical Geodesic Inc. 128-channel-system (see **Figure 2**, for a schematic illustration of the electrode net). The impedance was kept below 50 kΩ. EEG data was recorded continuously with Cz as the reference electrode and sampled at 500 Hz. Further analysis steps were performed with Brainvision Analyzer (Brain Products GmbH).

stimulus sounded like a real word or not. Figure taken from Hasko et al. (2013).

After filtering (low cutoff: 0.5 Hz, time constant 0.3 s, 12 dB/octave; high cutoff: 40 Hz, 24 dB/octave; Notch filter: 50 Hz; filters were applied to the continuous raw data to avoid discontinuities and transient phenomena), removing EOG artifacts with Independent Component Analysis (Zhou et al., 2005; Hoffmann and Falkenstein, 2008) and excluding other artifacts (gradient criteria: more than 50 μV difference between two successive data points or more than 150 μV in a 200 ms window; absolute amplitude criterion: more than ±150 μV; low activity: less than 0.5 μV in a 100 ms time window), the EEG was re-referenced to the average reference.
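The artifact criteria listed above can be sketched for a single channel as follows. This is a minimal illustration, not the Brainvision Analyzer implementation: it assumes that the 150 μV and 0.5 μV criteria refer to the max-minus-min range within a sliding window, and the function names are hypothetical.

```python
import math


def window_ranges(samples, width):
    """Yield max - min for every run of `width` consecutive samples."""
    for start in range(len(samples) - width + 1):
        window = samples[start:start + width]
        yield max(window) - min(window)


def is_artifact(samples_uv, fs_hz=500):
    """Return True if one channel segment violates any of the four criteria."""
    win_200ms = round(0.200 * fs_hz)  # 100 samples at 500 Hz
    win_100ms = round(0.100 * fs_hz)  # 50 samples at 500 Hz
    # Gradient: more than 50 uV between two successive data points.
    if any(abs(b - a) > 50.0 for a, b in zip(samples_uv, samples_uv[1:])):
        return True
    # More than 150 uV within a 200 ms window (assumed: max - min range).
    if any(r > 150.0 for r in window_ranges(samples_uv, win_200ms)):
        return True
    # Absolute amplitude beyond +/-150 uV.
    if any(abs(v) > 150.0 for v in samples_uv):
        return True
    # Low activity: less than 0.5 uV within a 100 ms window (assumed: range).
    if any(r < 0.5 for r in window_ranges(samples_uv, win_100ms)):
        return True
    return False


# Synthetic 1100 ms segment: a 10 Hz, 20 uV sine wave sampled at 500 Hz.
clean_epoch = [20.0 * math.sin(2 * math.pi * 10 * i / 500) for i in range(550)]
print(is_artifact(clean_epoch))  # the sine wave violates none of the criteria
```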

The data was then segmented into 1100 ms epochs including 100 ms pre-stimulus baseline and the ERP data was baseline corrected. For inclusion in the statistical analysis a minimum of 20 artifact free trials was necessary. Only correct trials were analyzed. Grand averages of all conditions were computed by averaging separately for each subject group (CON; IMP; NIMP) and each point in time (pre; post).
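At the 500 Hz sampling rate, a 1100 ms epoch with a 100 ms pre-stimulus baseline corresponds to 550 samples, of which the first 50 form the baseline. The following sketch shows baseline correction and per-condition averaging with the 20-trial minimum; the helper names are illustrative and the actual processing was done in Brainvision Analyzer.

```python
FS_HZ = 500          # sampling rate stated in the text
EPOCH_MS = 1100      # epoch length, including the baseline
BASELINE_MS = 100    # pre-stimulus baseline
MIN_TRIALS = 20      # minimum number of artifact-free trials


def ms_to_samples(duration_ms, fs_hz=FS_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return round(duration_ms * fs_hz / 1000)


def baseline_correct(epoch_uv, n_baseline):
    """Subtract the mean of the first `n_baseline` samples from the epoch."""
    baseline_mean = sum(epoch_uv[:n_baseline]) / n_baseline
    return [v - baseline_mean for v in epoch_uv]


def average_trials(trials, min_trials=MIN_TRIALS):
    """Average epochs sample by sample; return None if too few trials remain."""
    if len(trials) < min_trials:
        return None
    n = len(trials)
    return [sum(column) / n for column in zip(*trials)]


print(ms_to_samples(EPOCH_MS), ms_to_samples(BASELINE_MS))
```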

Based on our hypothesis we were interested in changes of the N400, which reflects GPC or the searching process in the orthographic lexicon. Based on the electrophysiological activity for W in CON before intervention, the time window for the N400 was set to 330–460 ms using running *t*-tests against zero (*p* < 0.05) at each electrode, and the following centro-parietal electrodes were selected for the region of interest (ROI): 31, 37, 42, 53, 54, 55, 61, 62, 78, 79, 80, 86, 87, 93, 129 (see **Figure 2**, e.g., Deacon et al., 2004; Hasko et al., 2013; for review see Lau et al., 2008; Kutas and Federmeier, 2011).

The analysis run to answer our second research question (whether we could identify any pre-existing electrophysiological differences between IMP and NIMP) was exploratory. During the visual inspection of electrodes and unpaired *t*-tests comparing the electrophysiological activity of IMP and NIMP we observed a hyperactivation over left and right hemispheric (LH and RH) fronto-temporal electrodes starting around 300 ms (see **Figure 4**). According to the timing and the topography we identified an N300 in the time window of 300–400 ms. Based on the electrophysiological activity for W in CON before intervention, using running *t*-tests against zero (*p* < 0.05) at each electrode, we selected LH and RH ROIs. Electrodes included in the LH were 26, 27, 33, 34, 38, 39, 40, 44 and electrodes included in the RH were 2, 109, 114, 115, 116, 121, 122, 123 (see **Figure 2**).

Mean peak amplitude measures capturing data 20 ms before and 20 ms after the individual peak and peak latencies were exported for each electrode of the N400 and N300 ROIs using the defined time windows. The values of individual mean peak amplitudes and peak latencies were averaged after peak export for every ROI.
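A mean peak amplitude of this kind can be sketched for one electrode as shown below. The sketch assumes that the individual peak is the most negative sample inside the component window (N400 and N300 are negativities) and that the epoch starts 100 ms before stimulus onset; the function names and the synthetic waveform are illustrative only.

```python
import math

FS_HZ = 500            # sampling rate stated in the text
EPOCH_START_MS = -100  # epochs include a 100 ms pre-stimulus baseline


def ms_to_index(t_ms):
    """Sample index of a latency given relative to stimulus onset."""
    return round((t_ms - EPOCH_START_MS) * FS_HZ / 1000)


def index_to_ms(index):
    """Latency (relative to stimulus onset) of a sample index."""
    return EPOCH_START_MS + index * 1000 / FS_HZ


def mean_peak_amplitude(erp_uv, window_ms, half_width_ms=20):
    """Return (mean amplitude around the peak, peak latency in ms)."""
    lo, hi = ms_to_index(window_ms[0]), ms_to_index(window_ms[1])
    # Assumed peak definition: most negative sample inside the window.
    peak_idx = min(range(lo, hi + 1), key=lambda i: erp_uv[i])
    half = round(half_width_ms * FS_HZ / 1000)  # 10 samples for 20 ms
    start = max(peak_idx - half, 0)
    stop = min(peak_idx + half, len(erp_uv) - 1)
    segment = erp_uv[start:stop + 1]
    return sum(segment) / len(segment), index_to_ms(peak_idx)


# Synthetic ERP: a negative deflection of -3 uV centred on 400 ms.
erp = [-3.0 * math.exp(-((index_to_ms(i) - 400) / 50) ** 2) for i in range(550)]
amplitude, latency = mean_peak_amplitude(erp, (330, 460))
print(f"mean peak amplitude: {amplitude:.2f} uV at {latency:.0f} ms")
```

Averaging the values obtained this way over all electrodes of an ROI would give the ROI-level measures described above.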

### **STATISTICAL ANALYSIS**

To test for significant changes over time regarding the N400 mean peak amplitudes and peak latencies we computed ANOVAs. The ANOVAs included the within-subject factors *condition* (W; PH; PW) and *time* (pre; post) and the between-subject factor *group* (CON; IMP; NIMP). For clean ERP data at least 10–20 participants are recommended (Luck, 2005); therefore, a further subdivision of the groups by IP1 and IP2 was not reasonable. In order to test the main hypotheses, namely changes of the N400 during treatment, dependent and independent *t*-tests were calculated. Firstly, we hypothesized that CON show higher mean peak amplitudes compared to IMP and NIMP before intervention. Therefore, independent *t*-tests were one-sided. Furthermore, we hypothesized that N400 mean peak amplitudes should increase over time in IMP and should remain stable in CON and NIMP, which was also evaluated using a one-sided alpha level.

The expected effect that N400 mean peak amplitudes should increase over time for IMP was moderate to large but only marginally significant. The small sample size (*n* = 11) might be the main reason why the effect did not reach significance at the 5% level. Therefore, we decided to simulate the data for a larger group of IMP. The simulation was done in two steps. Firstly, we estimated the required sample size with G\*Power using the observed effect size of *d* = 0.54, an alpha of 0.05 and a power (1 − β) of 0.95. This estimation resulted in a sample size of 39 IMP. Secondly, the data of 39 IMP was generated with R using normal distribution sampling with the mean and SD of the original IMP group. For each simulated child, 1000 observations were randomly generated and the mean of these observations was calculated.
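The sample size estimate can be approximated without G\*Power. The sketch below uses a normal approximation with a small-sample correction term for a one-sided dependent *t*-test, reading the reported 0.95 as the desired power; G\*Power itself works with the exact noncentral *t* distribution, so this is an approximation under stated assumptions rather than the authors' computation.

```python
import math
from statistics import NormalDist


def approx_n_dependent_t(d, alpha, power):
    """Approximate n for a one-sided dependent-samples t-test.

    Normal approximation ((z_alpha + z_power) / d)^2 plus the correction
    term z_alpha^2 / 2 that accounts for using t instead of z.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


# Values taken from the text; the one-sided test is assumed from the hypotheses.
n_required = approx_n_dependent_t(d=0.54, alpha=0.05, power=0.95)
print(n_required)
```

With these inputs the approximation agrees with the sample size of 39 reported above.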

Similar ANOVAs for repeated measures were computed to analyze the N300 mean peak amplitudes and peak latencies including the additional within-subject factor *hemisphere* (LH; RH). The resulting fourfold interaction between group∗time∗condition∗hemisphere for the N300 mean peak amplitudes was analyzed by stratifying the data on time as we were interested in exploring pre-existing differences between IMP and NIMP. Therefore, two further ANOVAs for repeated measures were calculated separately for pre and post measures. Resulting threefold interactions were analyzed by combining two of the three factors in further ANOVAs for repeated measures. To interpret twofold interactions we ran *post-hoc t*-tests for independent and dependent samples.

The behavioral data (reaction times and accuracy on the PLD task) was analyzed using ANOVAs for repeated measures including the within-subject factors *condition* (W; PH; PW; FF) and *time* (pre; post) and the between-subject factor *group* (CON; IMP; NIMP). Trials were excluded from analysis if the response times were lower than 200 ms or deviated by more than 2.5 SD from the individual group mean within a condition type. This procedure resulted in a loss of 2.65 and 2.96% of the trials for pre and post, respectively. Furthermore, for the reaction time analysis only correct trials were included. Resulting threefold interactions were analyzed by combining two of the three factors in further ANOVAs for repeated measures. To interpret twofold interactions we ran *post-hoc t*-tests for independent and dependent samples.
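The trial exclusion rule can be sketched as follows for the reaction times of one condition. The sketch assumes that anticipations below 200 ms are removed first and that the mean and SD are then computed over the remaining trials; the order of these steps, the function name, and the example values are assumptions, not details given in the text.

```python
from statistics import mean, stdev


def trim_reaction_times(rts_ms, floor_ms=200.0, sd_cutoff=2.5):
    """Drop RTs below `floor_ms` or more than `sd_cutoff` SD from the mean."""
    plausible = [rt for rt in rts_ms if rt >= floor_ms]
    if len(plausible) < 2:
        return plausible
    centre = mean(plausible)
    spread = stdev(plausible)
    if spread == 0:
        return plausible
    return [rt for rt in plausible if abs(rt - centre) <= sd_cutoff * spread]


# Hypothetical reaction times (ms) for one condition.
example_rts = [800] * 10 + [900] * 10 + [5000, 150]
kept = trim_reaction_times(example_rts)
print(len(example_rts) - len(kept), "trials excluded")
```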

If sample sizes are equal, ANOVAs are robust against violations of homogeneity of variance. Given that the sample of CON was bigger than the samples of IMP and NIMP, the *F*max test was applied in case of violations of the homogeneity of variances (Bühner and Ziegler, 2009). According to the *F*max test an adjustment of the alpha level is necessary if the critical value of *F*max > 10 is exceeded (Bühner and Ziegler, 2009). The critical value was not exceeded for any of the variables. If necessary the Greenhouse-Geisser correction was applied to correct for violations of the sphericity assumption. The alpha level for all analyses was 0.05. In order to avoid alpha-error inflation due to multiple comparisons the alpha level of 0.05 for follow-up tests was corrected using the Bonferroni-Holm correction (Bühner and Ziegler, 2009). Bonferroni-Holm correction was applied separately for each set of dependent and independent *t*-tests and for each follow-up ANOVA.
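The Bonferroni-Holm procedure is a step-down method: the *m* *p*-values of one set of tests are sorted in ascending order, the smallest is compared with α/*m*, the next with α/(*m* − 1), and so on, and testing stops at the first non-significant result. A minimal sketch, with hypothetical *p*-values:

```python
def holm_reject(p_values, alpha=0.05):
    """Return, for each p-value, whether it is rejected under Holm's procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The rank-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values from one set of four follow-up tests.
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

In this example the two smallest *p*-values survive the correction, whereas 0.03 fails against its threshold of 0.025 and therefore 0.04 is not tested further.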

In addition to the *p*-values, effect sizes η<sub>p</sub><sup>2</sup> for ANOVAs with repeated measures and Cohen's *d* for independent and dependent *t*-tests are reported for significant results (Cohen, 1988; Bühner and Ziegler, 2009). Regarding the ERP data for follow-up tests detailed statistical values will be presented only for significant results, whereas non-significant results are indicated by *p* > 0.05. For the behavioral data significant and non-significant results of the follow-up analyses will be indicated by *p* < 0.05 and *p* > 0.05 without reporting detailed statistical values.
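Both effect sizes can be recovered from the reported test statistics with standard identities: partial eta squared equals *F*·df1 / (*F*·df1 + df2), and for dependent samples one common convention is *d* = *t* / √*n*. Whether the authors computed their values exactly this way is an assumption; the sketch only shows that several values reported in the Results are consistent with these identities.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def cohens_d_dependent(t_value, n_pairs):
    """Cohen's d for dependent samples via d = |t| / sqrt(n) (one convention)."""
    return abs(t_value) / math.sqrt(n_pairs)


# Statistics taken from the Results section.
print(round(partial_eta_squared(3.84, 4, 100), 2))  # reported: 0.13
print(round(partial_eta_squared(5.10, 2, 20), 2))   # reported: 0.34
print(round(cohens_d_dependent(6.99, 39), 2))       # reported: 1.12
print(round(cohens_d_dependent(-3.35, 17), 2))      # reported: 0.81
```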

Additionally, in order to better understand the significance of the N300 for improvement during treatment we computed correlations across the whole group of children with DD and for IMP and NIMP separately. Correlations were calculated between N300 mean peak amplitudes before intervention and the gain in common word reading fluency and the N400 after intervention. For common word reading fluency we used the post-minus-pre differences of raw scores (see **Table 1**). Raw scores were used in order to enhance variance. As we did not observe differences between W, PH, and PW in the N400 we decided to use mean values calculated across the three letter string types for the correlation analysis. Because of the small sample size in the IMP group Cook's *d* was calculated for significant correlations in order to check for undue influence of single cases. All cases had a Cook's *d* < 1, indicating that none of the participants had an excessive influence on the correlational results. The correlational analysis was exploratory; therefore, Bonferroni-Holm correction was not applied. Significant results at the 5% level and tendencies toward significance (10% alpha level) will be reported.
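Cook's distance quantifies how strongly a single case influences a fitted linear relationship. The sketch below computes it for a simple linear regression of one variable on another, which is an assumption about how the check on the correlations was set up; the data are hypothetical and chosen so that the last case is clearly influential.

```python
def cooks_distances(x, y):
    """Cook's distance of every case in a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n_params = 2  # intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - n_params)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        distances.append(e ** 2 / (n_params * mse) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: five well-behaved cases and one high-leverage case.
x_values = [1, 2, 3, 4, 5, 20]
y_values = [1.1, 1.9, 3.2, 3.8, 5.0, 5.0]
for case, distance in enumerate(cooks_distances(x_values, y_values), start=1):
    flag = "influential" if distance >= 1 else "ok"
    print(case, round(distance, 2), flag)
```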

## **RESULTS**

### **N400**

### *Mean peak amplitudes*

The analysis of the N400 mean peak amplitudes revealed only a main effect group. No main effect time or condition and no interactions could be observed (see **Table 2**, first column). As no effect of condition could be observed, the independent and dependent *t*-tests used to test our N400 hypotheses were computed across conditions (see **Table 3**, for N400 mean peak amplitudes).

In line with our hypothesis independent *t*-tests revealed higher N400 amplitudes for CON compared to IMP and for CON in contrast to NIMP before intervention (see **Figure 3A**). No difference was found between IMP and NIMP before intervention (see **Figure 3A**).

Consistent with our expectation a clear trend towards increased N400 mean peak amplitudes in IMP after 6 months of intervention could be observed (see **Figure 3B**). In agreement with our assumptions N400 mean peak amplitudes remained stable over time in CON and NIMP (see **Figure 3B**). Mean peak amplitudes were comparable between CON and IMP after intervention but still diminished for NIMP in contrast to CON (see **Figure 3C**). Even though **Table 3** and **Figure 3C** suggest higher N400 amplitudes in IMP in comparison to NIMP after intervention, this effect does not reach significance (see **Figure 3C**).

*Simulation of the intervention effect in IMP.* Although the increase of the N400 amplitude from pre to post in IMP was moderate to large (*d* = 0.54), this effect was only marginally significant (*p* = 0.052, see **Figure 3B**). The small sample size (*n* = 11) is probably the main reason why the effect did not reach significance on the 5% alpha level. Therefore, data was simulated for a larger sample size (*n* = 39). Dependent *t*-tests of the simulated data revealed a significant increase in N400 mean peak amplitudes from pre (−0.30μV ±1.36 SD) to post (−1.81μV ±0.77 SD), *t*(38) = 6.99, *p* < 0.001, *d* = 1.12.

### *Peak latencies*

The analysis of the N400 peak latencies revealed a main effect group (see **Table 2**, second column). No further effects were observed. Independent *post-hoc t*-tests showed shorter peak latencies for NIMP compared to CON, *t*(40) = 2.97, *p* = 0.005, *d* = 0.96, before and after intervention and no differences in peak latencies were observed between CON and IMP as well as between IMP and NIMP before and after intervention (*p* > 0.05; see **Table 3**).

### **N300**

### *Mean peak amplitudes*

The analysis of the N300 mean peak amplitudes revealed a main effect group, time, and condition, as well as an interaction condition∗hemisphere. Furthermore, the four-way interaction group∗time∗condition∗hemisphere reached significance (see **Table 4**, first column).

**Table 2 | Results of the ANOVAs for repeated measures with** *F***-values (df),** *p***-values, and effect sizes η<sub>p</sub><sup>2</sup> for the N400 mean peak amplitudes and latencies including the between-subject factor** *group* **(CON; IMP; NIMP) and the within-subject factors** *time* **(pre; post) and** *condition* **(W; PH; PW).**


*CON, control children; IMP, improvers; NIMP, non-improvers; pre, before intervention; post, after intervention; W, words; PH, pseudohomophones; PW, pseudowords. Significant results are indicated in bold.*



*W, words; PH, pseudohomophones; PW, pseudowords; CON, control children; IMP, improvers; NIMP, non-improvers; pre, before intervention; post, after intervention. \*IMP had significantly shorter peak latencies for all conditions before and after intervention (M = 397.38, SD = 16.40) compared to CON (M = 384.02, SD = 10.39).*

In order to explore this four-way interaction two separate ANOVAs were conducted for each point in time. The analysis of the N300 mean peak amplitudes before intervention revealed a significant interaction group∗condition∗hemisphere, *F*(4, 100) = 3.84, *p* = 0.006, η<sub>p</sub><sup>2</sup> = 0.13. No main effects and no further interactions could be observed (*p* > 0.05). In order to interpret this three-way interaction separate follow-up ANOVAs were run by combining two of the three factors.

*Follow-up ANOVAs for each hemisphere.* For the LH we found a main effect condition, *F*(1, 50) = 3.84, *p* = 0.015, η<sub>p</sub><sup>2</sup> = 0.08, and an interaction group∗condition, *F*(2, 50) = 3.05, *p* = 0.020, η<sub>p</sub><sup>2</sup> = 0.11. No main effect group could be observed (*p* > 0.05). Independent *post-hoc t*-tests revealed that IMP had higher amplitudes for PW in contrast to CON and NIMP in the LH (see **Figure 4A**). In CON and NIMP amplitudes for PW were comparably high (see **Figure 4A**). No group differences were found for W and PH (see **Figure 4A**). Mean amplitudes for W, PH, and PW did not differ within CON, IMP, and NIMP (*p* > 0.05).

For the RH the main effect group, *F*(2, 50) = 4.59, *p* = 0.015, η<sub>p</sub><sup>2</sup> = 0.16, was significant. No main effect condition and no interaction group∗condition could be observed (*p* > 0.05). Independent *post-hoc t*-tests calculated across conditions revealed higher mean peak amplitudes for IMP in contrast to CON and NIMP (see **Figure 4B**). No difference was found between CON and NIMP (*p* > 0.05, see **Figure 4B**).

*Follow-up ANOVAs for each condition.* As could be expected from the ANOVAs run separately for each hemisphere (see above) the analysis revealed a main effect group for PW, *F*(2, 50) = 5.99, *p* = 0.005, η<sub>p</sub><sup>2</sup> = 0.19. No hemisphere effect and no interaction group∗hemisphere could be observed (*p* > 0.05). Independent *post-hoc t*-tests revealed higher N300 mean peak amplitudes for IMP in contrast to CON, *t*(34) = 2.97, *p* = 0.005, *d* = 1.11, and NIMP, *t*(26) = −3.29, *p* = 0.003, *d* = 1.32, bilaterally, and no difference was found between CON and NIMP (*p* > 0.05, see **Figures 4A,B**). For W and PH no main effects and no interactions were found (*p* > 0.05).

*Follow-up ANOVAs for each group.* A twofold interaction condition∗hemisphere occurred within the IMP group, *F*(2, 20) = 5.10, *p* = 0.016, η<sub>p</sub><sup>2</sup> = 0.34, and no main effect condition or hemisphere was observed for the IMP group (*p* > 0.05). This interaction suggests that mean peak amplitudes are higher for PW in contrast to W and PH specifically in the LH (see **Figure 4A**). However, dependent *post-hoc t*-tests did not reveal amplitude differences between conditions in the LH and RH (*p* > 0.05). Furthermore, mean peak amplitudes were comparably high between the LH and RH for W, PH, and PW (*p* > 0.05). For CON and NIMP no main effects and no interactions were found (*p* > 0.05).

To summarize, IMP, in contrast to CON and NIMP, showed higher N300 mean peak amplitudes before intervention for all conditions in the RH and additionally for PW in the LH.

After intervention no significant main effect group, time, condition and no significant interactions between these factors could be observed for the N300 mean peak amplitudes (*p* > 0.05, see **Table 5** and **Figure 5**).

### *Peak latencies*

The analysis of the N300 peak latencies revealed a twofold interaction condition∗hemisphere and a threefold interaction group∗condition∗hemisphere (see **Table 4**, second column). Because the twofold interaction was modulated by the factor group follow-up ANOVAs were conducted for each group over both points in time by combining the factors condition and hemisphere.

The follow-up ANOVAs revealed a significant interaction condition∗hemisphere for the NIMP group, *F*(2, 32) = 7.59, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.32; the main effect condition and the main effect hemisphere were not significant (*p* > 0.05). In the LH NIMP had shorter peak latencies for PW in contrast to W, *t*(16) = −3.35, *p* = 0.004, *d* = 0.81, and PH, *t*(16) = −3.19, *p* = 0.006, *d* = 0.77; peak latencies between W and PH were comparable (*p* > 0.05, see **Table 5**). No difference between conditions was found in the RH, and peak latencies did not differ for any of the conditions between LH and RH (*p* > 0.05). No significant main effect condition or hemisphere and no significant interaction condition∗hemisphere could be observed for CON and IMP (*p* > 0.05).

### **BEHAVIORAL RESULTS**

### *Accuracy*

Performance on the PLD task revealed a main effect group, time and condition, as well as the twofold interactions group∗condition and time∗condition (*p* < 0.05, see **Table 6**, first column).

In order to better understand the two-way interaction between the factors time and condition, dependent *post-hoc t*-tests were calculated. Accuracy rates increased over time for W and PH (*p* < 0.05) and slightly decreased for FF (*p* < 0.05). No difference between pre and post was found for PW (*p* > 0.05; see **Figure 6A**). Furthermore, dependent *post-hoc t*-tests revealed that all children gave more correct answers to FF compared to the linguistic material (W, PH, and PW) before and after intervention (*p* < 0.05). In addition, accuracy rates were higher for W compared to PH and PW both pre and post (*p* < 0.05). Finally, all children had higher accuracy rates for PH compared to PW before and after intervention (*p* < 0.05, see **Figure 6A**).

Dependent *post-hoc t*-tests run to explain the twofold interaction between group and condition revealed the accuracy pattern FF > W > PH > PW (*p* < 0.05) as described above for IMP and NIMP. In CON, however, no difference between correct answers for PH and PW (*p* > 0.05) could be detected, resulting in an accuracy pattern with FF > W > PH = PW (see **Figure 6A**). Independent *post-hoc t*-tests revealed that, over both pre and post, CON's performance was better for all linguistic stimuli compared to IMP and NIMP (*p* < 0.05). No difference in any of the conditions was found between IMP and NIMP and no group differences were found for FF (*p* > 0.05, see **Figure 6A**).

**Table 4 | Results of the ANOVAs for repeated measures with** *F***-values,** *p***-values, and effect sizes η<sub>p</sub><sup>2</sup> for the N300 mean peak amplitudes and latencies including the between-subject factor** *group* **(CON; IMP; NIMP) and the within-subject factors** *time* **(pre; post),** *condition* **(W; PH; PW), and** *hemisphere* **(LH; RH).**

*CON, control children; IMP, improvers; NIMP, non-improvers; pre, before intervention; post, after intervention; W, words; PH, pseudohomophones; PW, pseudowords; LH, left hemisphere; RH, right hemisphere. Significant results are indicated in bold.*

### *Reaction times*

Performance on the PLD task revealed a significant main effect group, time and condition, as well as the significant interactions group∗time, group∗condition, time∗condition and group∗time∗condition (see **Table 6**, second column). In order to better understand the threefold interaction separate follow-up ANOVAs were run by combining two of the three factors.

*Follow-up ANOVAs for each point in time.* The analysis before and after intervention revealed a significant main effect group and condition as well as the interaction group∗condition (*p* < 0.05).

*Follow-up ANOVAs for each condition.* For W, PH, and PW the ANOVAs revealed a significant main effect group and time as well as the interaction group∗time (*p* < 0.05). No significant effects were found for FF (*p* > 0.05).

*Follow-up ANOVAs for each group.* For CON the analysis revealed a significant main effect condition as well as the interaction condition∗time (*p* < 0.05) but no main effect time (*p* > 0.05). For IMP and NIMP a significant main effect time and condition and the interaction condition∗time occurred (*p* < 0.05).

In the following the results of the independent and dependent *post-hoc t*-tests calculated in order to examine the twofold interactions will be summarized.

Independent *post-hoc t*-tests indicated that CON had shorter reaction times to W, PH, and PW compared to IMP and NIMP before intervention and after intervention (*p* < 0.05). No differences for W, PH, and PW were found for the comparison between IMP and NIMP before and after intervention (*p* > 0.05). For FF no group differences were found before and after intervention (*p* > 0.05, see **Figure 6B**).

Dependent *post-hoc t*-tests within each group revealed the same pattern of reaction times for all groups before and after intervention. CON, IMP, and NIMP had longer reaction times for all linguistic stimuli compared to FF before intervention and after intervention (*p* < 0.05). Furthermore, all groups showed shorter reaction times for W compared to PH and for W compared to PW before and after intervention (*p* < 0.05). And all groups responded slower to PW compared to PH before and after intervention (*p* < 0.05, see **Figure 6B**).

Reaction times did not change over time in CON for W, PH, PW, and FF (*p* > 0.05). However, IMP and NIMP had faster reaction times after intervention for W, PH, and PW (*p* < 0.05). No changes from pre to post were observed for FF in IMP and NIMP (*p* > 0.05, see **Figure 6B**).

### **CORRELATIONAL RESULTS**

When interpreting the correlation results, please note that N300 and N400 mean peak amplitudes have negative values. Larger increase in common word reading fluency was significantly correlated to higher N300 mean peak amplitudes before intervention for W and PH in the RH and PW in the LH and by trend for PW in the RH. Furthermore, a larger increase in pseudoword reading fluency was correlated significantly to higher N300 mean peak amplitudes for W in the RH and by trend for PW in the LH. The linear relationship between N300 before intervention in the RH and gain in common word reading fluency remained stable only in the group of IMP (please see **Table 7**). Even though only the correlation between N300 mean peak amplitudes before intervention for PH in the RH and increase in common word reading fluency reached significance in the IMP group, the resulting correlations were large, ranging from *r* = −0.54 to *r* = −0.59 (see **Table 7**). Furthermore, higher N400 mean peak amplitudes after intervention were related to higher N300 mean peak amplitudes before intervention for PW in the LH and by trend for W and PH in the LH in children with DD. In the IMP group higher N400 mean peak amplitudes after intervention were related to higher N300 mean peak amplitudes before intervention for PH and PW in the LH (see **Table 7**).

## **DISCUSSION**

The aim of the present study was twofold. On the one hand we wanted to clarify whether growth in common word reading fluency during treatment is related to changes in the N400. On the other hand we were interested in whether we could identify pre-existing differences on the neurophysiological level between IMP and NIMP. In order to achieve our aims we investigated a PLD task before and after children with DD were trained in literacy skills over 6 months. We investigated the ERPs of IMP, who improved in common word reading fluency by at least half an SD, of NIMP, who did not show any increase in common word reading fluency, and of normally developing children.

### **READING IMPROVEMENT IS REFLECTED IN AN INCREASE OF N400**

As both trainings worked on either orthographic knowledge or GPC we hypothesized to find changes in the N400 (see Introduction), which reflects GPC or the searching process in the orthographic lexicon (Hasko et al., 2013). In line with our previous study (Hasko et al., 2013) we were able to show that both groups of children with DD (IMP and NIMP) had reduced N400 mean peak amplitudes compared to CON before intervention. The reduced N400 amplitudes in IMP and NIMP point to less specified orthographic representations or impairments in accessing the orthographic lexicon or in applying GPC rules (Hasko et al., 2013). As hypothesized, a clear trend towards increased N400 amplitudes over time was observed in IMP only. This might indicate an alteration of the process reflected by this component. Thus, in line with previous electrophysiological (Kujala et al., 2001; Santos et al., 2007; Jucla et al., 2009; Penolazzi et al., 2010; Spironelli et al., 2010; Huotilainen et al., 2011; Mayseless, 2011; Lovio et al., 2012) and neuroimaging studies (Simos et al., 2002; Aylward et al., 2003; Temple et al., 2003; Eden et al., 2004; Shaywitz et al., 2004; Simos et al., 2006, 2007b; Richards et al., 2007; Meyler et al., 2008; Richards and Berninger, 2008; Keller and Just, 2009) we found evidence for neurophysiological changes during treatment. This suggests that specific deficient processes in DD, in our case processes related to the N400, are malleable in children with DD. The design of the present study does not allow testing which proportion of reading improvement is related to the applied treatments and which proportion is due to other factors not related to the treatment. Probably due to the small sample size in the IMP group (*n* = 11) the increase in N400 amplitudes, which was moderate to large, failed to reach significance. Simulation of the data for a larger sample of IMP revealed a significant increase in the N400, confirming our assumption that the small sample size is the main reason why the effect does not reach significance.

**Table 5 | N300 mean peak amplitudes in** μ**V (SD) and latencies in ms (SD).**

*W, words; PH, pseudohomophones; PW, pseudowords; LH, left hemisphere; RH, right hemisphere; CON, control children; IMP, improvers; NIMP, non-improvers; pre, before intervention; post, after intervention. \*In NIMP PW in the LH over pre and post (M = 333.64, SD = 11.31) are significantly smaller compared to W (M = 339.81, SD = 11.50) and PH (M = 339.91, SD = 11.28).*

Due to our classification criterion the common word reading fluency of IMP increased significantly but was still below average after intervention. Therefore, we expected to find increased N400 amplitudes for IMP and thus diminished differences between IMP and CON in N400 amplitudes. However, the differences between IMP and CON were not only diminished after intervention, but absent. N400 amplitudes of CON slightly decreased over time and thus contribute to the absence of differences between IMP and CON, even though this effect does not reach significance. Although no condition effect could be observed, **Table 3** shows that the slight decrease in N400 amplitudes is mainly the result of a reduction of the N400 component for W, whereas amplitude means remain stable for PH and PW. A decrease of N400 amplitudes for W in CON is what might be expected with maturation of the reading network. In line with this, it has been found that N400 amplitudes were smaller for orthographically familiar word forms compared to unfamiliar word forms in adults (e.g., Braun et al., 2006; Briesemeister et al., 2009). This suggests that adults, in contrast to children (Hasko et al., 2013), adopt different reading strategies for orthographically familiar and unfamiliar word material. In the framework of dual route models of reading (Coltheart et al., 1993, 2001) less effort is needed in order to find a fitting orthographic representation for familiar words in the orthographic lexicon, whereas the search in the orthographic lexicon is prolonged and GPC rules have to be applied in case of unfamiliar word forms, resulting in enhanced N400 amplitudes (Hasko et al., 2013). Thus, the observations in the present study might denote the beginning development of the orthographic familiarity effect for the N400, suggesting that some of the W already possess an entry in the orthographic lexicon and are read via accessing the phonological lexicon directly from the orthographic lexicon in typically developing children. It might be interesting to further investigate when the orthographic familiarity effect is fully developed, as it indicates the point in time when children steadily use orthographic representations to access phonological representations for familiar word forms.

As expected, children who continued to struggle with common word reading fluency after intervention in our study did not show neurophysiological changes over time. This is consistent with previous research reporting that NIMP continuously display abnormal activation patterns throughout the neuronal reading network (Simos et al., 2007a; Odegard et al., 2008; Davis et al., 2011; Farris et al., 2011; Molfese et al., 2013). One question that remains unanswered is why some children with DD improve during intervention, whereas others do not. This leads directly to our second research question, namely whether there might be any pre-existing differences between IMP and NIMP that could give insight into improvement and non-improvement.

## **PROFILING IMPROVER AND NON-IMPROVER**

Surprisingly, although the hypothesis of neurodiversity within DD has been raised several times (McCandliss and Noble, 2003; Shaywitz et al., 2004; Noble and McCandliss, 2005), neurobiological differences and their influence on improvement in literacy skills during treatment have been neglected in previous intervention studies. The analysis run to answer this question in the present study was therefore exploratory. During the inspection of single electrodes and t-maps comparing the topographical distribution between IMP and NIMP, we observed a hyperactivation distributed over left and right temporo-frontal electrodes starting around 300 ms after stimulus onset (see **Figure 4**). Based on its topographical distribution and latency, the negative potential was identified as the N300. The N300 has been investigated with different tasks and has been related to grapheme-phoneme conversion (Bentin et al., 1999; Penolazzi et al., 2006), phonological word analysis (Spironelli and Angrilli, 2007, 2009), and the integration of orthographic and phonological representations (Hasko et al., 2012).

In the present study, IMP showed higher N300 amplitudes than NIMP and CON before intervention for W, PH, and PW in the RH and additionally for PW in the LH. This suggests that enhanced N300 amplitudes might play an important role in improvement in common word reading fluency, an interpretation that was further strengthened by our correlational results. Correlations calculated across the whole group of children with DD largely reflected the group differences found for IMP and NIMP, i.e., children who improved in common word reading fluency were those who had higher N300 amplitudes for W, PH, and PW (only marginally significant) in the RH and for PW in the LH before intervention. Higher N300 amplitudes over the RH in particular seem to play an important role in reading improvement, as the same pattern of correlation between N300 amplitudes over the RH before intervention and improvement in common word reading fluency was found for IMP only. Children with the highest N300 amplitudes over the RH before intervention also displayed the strongest improvement in common word reading fluency. Even though only the correlation between N300 mean peak amplitudes before intervention for PH in the RH and increase in common word reading fluency reached significance in the IMP group, the resulting correlations were large, ranging from *r* = −0.54 to *r* = −0.59, and are therefore noteworthy.

In previous fMRI studies investigating the PLD task (Kronbichler et al., 2007; Wimmer et al., 2010), it has been found that this task induces activation throughout the neural reading network, including the inferior-frontal subsystem. As mentioned in the introduction, evidence for aberrant activation patterns in this subsystem in DD is not as clear as for the left hemispheric posterior subsystem, where hypoactivation has been reported repeatedly (Simos et al., 2002; Demonet et al., 2004; Kronbichler et al., 2007; Shaywitz and Shaywitz, 2008; Richlan et al., 2009). With regard to the inferior-frontal subsystem, some studies report hypoactivation (Paulesu et al., 1996; Wimmer et al., 2010; for meta-analyses see: Richlan et al., 2009, 2011), whereas others observed hyperactivation (Salmelin et al., 1996; Shaywitz et al., 1998; Brunswick et al., 1999; for review see: Pugh et al., 2000; Sandak et al., 2004) in subjects with DD. In line with these inhomogeneous results, children with DD in the present study varied in their N300 amplitudes over right and left fronto-temporal electrodes depending on reading improvement or non-improvement, with IMP showing significantly higher N300 amplitudes before intervention. It has been suggested that the inferior-frontal subsystem might be involved in articulation processes (Shaywitz and Shaywitz, 2008). IMP may try to adopt different, less efficient reading strategies via articulation processes in order to compensate for less specified orthographic representations or for impairments in accessing the orthographic lexicon or in applying GPC rules, as reflected by reduced N400 amplitudes. This strategy is probably not applied in the NIMP group; the reason for this remains unresolved.

The observation of pre-existing differences on the neurophysiological level between IMP and NIMP in the present study is in line with the results of Rezaie et al. (2011a,b), who also reported differences between adolescent IMP and NIMP prior to intervention. In contrast to the present study, however, activation profiles of IMP in the studies of Rezaie et al. (2011a,b) seemed to resemble the activation profile of CON. Whereas NIMP were marked by aberrant activation patterns throughout the reading network in contrast to CON, the only difference between IMP and CON was higher activity within the pars opercularis for CON than for IMP (Rezaie et al., 2011a,b). This suggests that poor reading skills in NIMP might be more strongly influenced by neurobiological factors, whereas for low reading skills in IMP environmental factors like home literacy or socioeconomic status might play an important role. In addition, our results contrast with the outcome of the studies by Simos et al. (2005, 2007a), which did not observe differences before intervention depending on improvement. One possible explanation for the absence of neurobiological differences in the study of Simos et al. (2007a) could be the wide age range, as children from 8 to 10 years were included. As this is a very sensitive age for reading development, this might mask pre-existing differences between IMP and NIMP. Furthermore, in the 2005 study of Simos et al. the NIMP group consisted of only three children, which allowed only descriptive comparisons between IMP and NIMP and may explain why no pre-existing differences were found.

Due to the cross-sectional design of the studies of Rezaie et al. (2011a,b), which assessed neurobiological activity only before treatment, no statement can be made about neurobiological differences between IMP and NIMP after intervention. Likewise, studies comparing IMP and NIMP only after intervention (Odegard et al., 2008; Davis et al., 2011; Farris et al., 2011; Molfese et al., 2013) are limited, as it cannot be resolved whether group differences between treatment IMP and NIMP are a cause or a result of improvement. An advantage of the present study is that we assessed electrophysiological correlates before and after treatment. Interestingly, the N300 amplitudes are higher in IMP compared to CON and NIMP only before intervention, that is, before the improvement in reading ability and the increase in the N400 component. This suggests that the N300 might index a compensatory mechanism or precursor, which facilitates reading improvement as well as the development of the N400 and is given up in favor of the more efficient process reflected by the N400. This is in line with a previous study by Shaywitz et al. (2004) showing that efficient activations throughout the neural reading network were enhanced and compensatory mechanisms were abandoned after a reading intervention. An important role of enhanced N300 amplitudes over the RH for improvement in common word reading fluency, as suggested by the correlational results, has been hypothesized above. Furthermore, the correlational results indicate that N300 amplitudes over the LH might be related to the increase in the N400. IMP with higher N300 amplitudes over the LH for PH and PW before intervention were those who had higher N400 amplitudes after intervention. Thus, the engagement of the LH seems to be of particular importance for the increase in the N400. At first sight this stands in contrast to our finding that especially the N300 amplitudes over the RH before intervention might be related to reading improvement.
In a previous study it has been found that IMP in contrast to NIMP were marked by significantly higher functional connectivity between left and right inferior frontal regions (Farris et al., 2011). The authors suggested that IMP might use the connectivity from LH to RH in order to engage the RH when tasks are difficult. Therefore, with respect to the present study we might hypothesize that enhanced N300 amplitudes over the RH are the result of higher connectivity from LH to RH allowing the engagement of the RH. Thus, it might be concluded that children with the highest amplitudes over the LH and the highest connectivity between LH and RH show the strongest improvement as indexed by enhanced N400 amplitudes and growth in common word reading fluency. Another explanation might be that the higher LH N300 amplitudes just reflect some additional compensatory mechanism, which is present in IMP only. Because all correlational analyses were exploratory, no definitive conclusions can be drawn about the relation between the N300 and the increase in common word reading fluency and N400 amplitudes. Future research should further investigate whether the N300 truly has a predictive quality for reading improvement.

*CON, control children; IMP, improvers; NIMP, non-improvers; pre, before intervention; post, after intervention; W, words; PH, pseudohomophones; PW, pseudowords; FF, false fonts. Significant results are indicated in bold.*

When interpreting the above-mentioned data it is important to control for group differences on a behavioral level, as these too might influence improvement in literacy skills. Previous studies have reported that word-reading skills before intervention, phoneme awareness, rapid naming, IQ, and attention in particular have an influence on improvement in literacy skills (Wise et al., 2000; Torgesen et al., 2001). However, in the present study IMP and NIMP had a very similar cognitive profile (see **Table 1**), suggesting that these factors might play a subordinate role for reading improvement in the present study. IMP differed from NIMP only with respect to reading comprehension, with the latter showing significantly lower reading comprehension skills before and after intervention. Lower performance in reading comprehension might point to deficits in oral language skills. It has been argued that reading comprehension deficits probably arise from poor vocabulary knowledge, weak grammatical skills, and difficulties in oral language comprehension (Snowling and Hulme, 2012a).


**Table 7 | Pearson correlations across the whole group with DD and within the group of IMP and NIMP between the N300 before intervention and the gain in word and pseudoword reading fluency and the N400 after intervention.**

*DD, developmental dyslexia; IMP, improvers; NIMP, non-improvers; pre, before intervention; post, after intervention; post – pre, difference between pre and post measures; W reading, common word reading fluency from the SLRT-II; PW reading, pseudoword reading fluency from the SLRT-II; W, words; PH, pseudohomophones; PW, pseudowords; LH, left hemisphere; RH, right hemisphere; \*\*p < 0.001; \*p < 0.05; (\*)p < 0.10.*

Furthermore, it has been found that general verbal ability predicts growth in reading ability (Torgesen et al., 2001). Thus, our results suggest that NIMP, in addition to deficits in common word reading fluency, are marked by stronger impairments in oral language skills than IMP, which impede reading improvement, and that NIMP might profit from training of oral language skills. Unfortunately, oral language skills were not assessed in this study; therefore, this assumption cannot be fully tested.

Previous studies reported that up to 30% of struggling readers do not benefit from intervention (Shanahan and Barr, 1995; Vaughn et al., 2003). With a proportion of 50%, our study shows that this number might be even larger. As reported above, several factors, including word-reading skills before intervention, phoneme awareness, rapid naming, IQ, attention, and general verbal ability, might influence improvement in literacy skills. Thus, depending on the cognitive profile of the children included in the respective studies, improvement rates might vary between studies. Furthermore, and most importantly, differences in improvement rates also depend on the operationalization of improvement in literacy skills. Improvement rates will differ depending on which ability (e.g., phonological awareness, reading fluency, reading comprehension, spelling, etc.) and which cut-off criteria (0.5 SD, 1 SD, median, observation of therapists) are used. So far there are no guidelines or suggested criteria for how to define improvement. In the present study we based our cut-off criteria on results from current meta-analyses reporting effect sizes of *g* = 0.31 and *g* = 0.33 for reading interventions (Ise et al., 2012; Galuschka et al., 2014).

## **LIMITATIONS**

One limitation of the present study was the rather small sample size of our IMP group, although it was larger (often twice as large) than in many previous studies. Probably due to the small sample size, some of the observed effects were only marginally significant. This limits the degree to which the results can be generalized, and interpretations have to be drawn cautiously. Therefore, the study needs to be replicated with larger sample sizes. Furthermore, due to the small sample sizes, splitting our groups according to type of intervention (IP1 vs. IP2) was not reasonable. Therefore, the present study does not allow discriminating intervention effects depending on the type of treatment. Future studies investigating treatment IMP and NIMP need to take into account that groups will be divided in two and that, depending on the definition of improvement in literacy skills, some children might be excluded from the study, meaning that very large sample sizes are needed.

## **CONCLUSION**

In the present study we attempted to investigate the ERPs related to reading improvement. To summarize, children who significantly improve in reading during intervention are marked by an increased N400 component, which reflects GPC or the search process within the orthographic lexicon. Children who continue to struggle in reading do not exhibit any neurophysiological changes over time. Furthermore, IMP and NIMP can be discriminated according to their neurophysiological profile already before intervention. Only IMP display higher N300 mean peak amplitudes over right fronto-temporal electrodes when processing W, PH, and PW and additionally over left fronto-temporal electrodes for PW. The importance of N300 amplitudes for reading improvement is strengthened by the correlational results in the IMP group. The higher the N300 amplitudes over the RH before intervention, the larger the improvement in common word reading fluency. Furthermore, IMP with higher N300 amplitudes over the LH before intervention have higher N400 amplitudes after intervention. After intervention the N300 of IMP is as high as the N300 of CON and NIMP, suggesting that the N300 might index a compensatory mechanism or precursor, which facilitates the development of the N400 as well as reading improvement.

Future research should concentrate on the examination of the special needs of NIMP. What are the factors that make them more resistant to environmental change? Do they exhibit a different type of DD and therefore have to be treated in a different way? If so, how can this be identified? What role do genetic differences play in reading improvement? With respect to the present study, NIMP seem to be a special group who might benefit from another type of training. Lower reading comprehension skills in NIMP in the present study point to more pronounced impairments in oral language skills than in IMP. Therefore, the NIMP in the present study might profit from additional training in oral language skills (Snowling and Hulme, 2011, 2012b). Answering these questions would help enormously to improve and adjust intervention for children with DD.

For all future studies, it is important to keep in mind that children with DD, even when matched with respect to their cognitive profile, might differ in their neuronal profile. In fact, it is extremely difficult to categorize children on the behavioral level when the underlying cause of their DD might be very different, with contributions from neurophysiology, neurobiology, genetics, and environment. Future intervention studies should carefully distinguish between IMP and NIMP, as mixing these children might distort the results.

One of the main future goals is to further examine the N300 effects and to verify whether they can be replicated and hold true for a large sample size. Furthermore, future research should investigate whether the N300 might be a predictor for reading improvement in response to treatment. If the N300 truly has a predictive quality for response to intervention then it would be possible to streamline therapies for certain children.

### **ACKNOWLEDGMENTS**

This research was supported by a grant from the Bundesministerium für Bildung und Forschung (Grant Number 01GJ1001). Special thanks to all of the children and their parents who were so kind and willing to participate in this study and who continue to take part in many important studies.

### **REFERENCES**


words: evidence for a length by lexicality interaction in the visual word form area (VWFA). *Neuroimage* 49, 2649–2661. doi: 10.1016/j.neuroimage.2009.10.082


dren, young adults and middle-aged subjects. *Biol. Psychol.* 80, 35–45. doi: 10.1016/j.biopsycho.2008.01.012


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 January 2014; accepted: 01 June 2014; published online: 26 June 2014. Citation: Hasko S, Groth K, Bruder J, Bartling J and Schulte-Körne G (2014) What does the brain of children with developmental dyslexia tell us about reading improvement? ERP evidence from an intervention study. Front. Hum. Neurosci. 8:441. doi: 10.3389/fnhum.2014.00441*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Hasko, Groth, Bruder, Bartling and Schulte-Körne. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Preliteracy signatures of poor-reading abilities in resting-state EEG

#### *Giuseppina Schiavone<sup>1,2</sup>, Klaus Linkenkaer-Hansen<sup>1</sup>\*, Natasha M. Maurits<sup>3</sup>, Anna Plakas<sup>3,4</sup>, Ben A. M. Maassen<sup>5</sup>, Huibert D. Mansvelder<sup>1</sup>, Aryan van der Leij<sup>4</sup> and Titia L. van Zuijen<sup>4</sup>*

*<sup>1</sup> Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, Netherlands*

*<sup>3</sup> Department of Neurology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands*

*<sup>4</sup> Research Institute of Child Development and Education, University of Amsterdam, Amsterdam, Netherlands*

*<sup>5</sup> Center for Language and Cognition Groningen and University Medical Center Groningen, University of Groningen, Groningen, Netherlands*

### *Edited by:*

*Donatella Spinelli, Università di Roma Foro Italico, Italy*

### *Reviewed by:*

*Satu Palva, University of Helsinki, Finland*

*Chiara Spironelli, Department of General Psychology, Italy*

### *\*Correspondence:*

*Klaus Linkenkaer-Hansen, Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Neuroscience Campus Amsterdam, De Boelelaan 1085, 1081 HV Amsterdam, Netherlands e-mail: klaus.linkenkaer@cncr.vu.nl*

The hereditary character of dyslexia suggests the presence of putative underlying neural anomalies already at preliterate age. Here, we investigated whether early neurophysiological correlates of future reading difficulties, a hallmark of dyslexia, could be identified in the resting-state EEG of preliterate children. The children in this study were recruited at birth and classified, on the basis of their parents' performance on reading tests, as being at risk of becoming poor readers (*n* = 48) or not (*n* = 14). Eyes-open rest EEG was measured at the age of 3 years, and the at-risk children were divided into fluent readers (*n* = 24) and non-fluent readers (*n* = 24) after reading assessment in third grade. We found that fluent readers and non-fluent readers differed in normalized spectral amplitude. Non-fluent readers were characterized by lower amplitude in the delta-1 frequency band (0.5–2 Hz) and higher amplitude in the alpha-1 band (6–8 Hz) in multiple scalp regions compared to control and at-risk fluent readers. Interestingly, across groups these EEG biomarkers correlated with several behavioral test scores measured in third grade. Specifically, performance on reading fluency, phonological and orthographic tasks, and the rapid automatized naming task correlated positively with delta-1 and negatively with alpha-1. Together, our results suggest that combining family-risk status, neurophysiological testing, and behavioral test scores in a longitudinal setting may help uncover physiological mechanisms implicated in neurodevelopmental disorders such as the predisposition to reading disabilities.

**Keywords: precursors of reading disabilities, resting-state EEG, reading fluency, delta and alpha oscillations**

### **INTRODUCTION**

Dyslexia is a learning disorder that specifically impairs a child's technical reading ability. It affects about 5–10% of all children, with higher prevalence in families with one or more members having dyslexia (Lyon et al., 2003). Dyslexia cannot be explained by low intelligence, low-level vision, hearing impairments, or poor education, but phonological deficits and problems with rapid automatized naming are commonly observed reading-related deficits (Snowling et al., 2003; van Bergen et al., 2012). These reading impairments affect the ability to read fluently and are often a cause of frustration and distress for a child, producing severe social and psychological consequences across the lifespan (Vellutino and Scanlon, 1987; Humphrey and Mullins, 2004).

Several genes involved in early brain development have been suggested to cause susceptibility to dyslexia (Scerri and Schulte-Korne, 2010). Thus, even though dyslexia manifests in school years, underlying neural anomalies may already be present in the preliterate brain. This is supported by a number of longitudinal studies using auditory event-related potentials (Molfese, 2000; Guttorm et al., 2001; Maurer et al., 2003, 2009; Lyytinen et al., 2004; Van Zuijen et al., 2012, 2013) and visual event-related potentials (Regtvoort et al., 2006; Schulte-Korne and Bruder, 2010; Araujo et al., 2012) reporting differences in brain responses to reading-related stimuli (e.g., speech sounds or visual contrast) between familial risk and control groups, or between control children and children that later become poor readers.

In adults and school-aged children, differences between dyslexics and typical readers have been reported in ongoing EEG activity. Comparisons of these two groups have pointed to higher delta and theta activity during phonological tasks (Rippon, 2000; Klimesch et al., 2001a; Spironelli et al., 2006; Penolazzi et al., 2008) and lower alpha and beta activity (Rumsey et al., 1989; Rippon, 2000; Klimesch et al., 2001b) during reading tasks in dyslexics. Several studies (Sklar et al., 1972; Colon et al., 1979; Ahn et al., 1980; Duffy et al., 1980; Pinkerton et al., 1989; Rumsey et al., 1989; Harmony et al., 1995; Clarke et al., 2002; Benasich et al., 2008; Gou et al., 2011; Babiloni et al., 2012) have also investigated EEG in the resting state. However, differences in participant cohorts (ranging from preschool or school-age children to adolescents or adults, with varying degrees of language disability) and in EEG biomarker definitions have prevented a consensus about possible neuronal signatures of dyslexia.

*<sup>2</sup> Body Area Network, imec/Holst Centre, Eindhoven, Netherlands*

Throughout development, neuronal oscillations play an important role in shaping the structural and functional neuronal connectivity that will support higher brain functions, such as language and cognition, later in life (Smit et al., 2011). Early impairments in resting-state EEG, reflecting underlying neuronal activation, might foreshadow future developmental problems. In line with these considerations and previous findings, we tested whether ongoing neuronal oscillations in preliterate children carry information about reading fluency later in life. We compared relative amplitude spectra of EEG measured at about 3 years of age in three groups of children: one control group of fluent readers and two at-risk groups of fluent and non-fluent readers. Reading fluency was assessed in third grade, when the children were about 9 years old. We identified two EEG biomarkers that correlated with reading performance and reading-related test scores collected in third grade.

## **MATERIALS AND METHODS**

### **SUBJECTS AND SELECTION CRITERIA**

The children who participated in this study are part of the Dutch Dyslexia Programme, a longitudinal research project. The study was approved by the medical ethics committee of the University of Nijmegen, the Netherlands. Informed consent was obtained from one of the parents of each child. Parents were recruited when expecting a baby. The children were first divided into control and at-risk groups; the at-risk group was then divided into fluent and non-fluent readers.

To assess whether the infants were at familial risk of becoming poor readers, the reading fluency of the parents was tested with a word reading task (Brus and Voeten, 1973) and a pseudoword reading task (Van den Bos et al., 1994). In addition, verbal reasoning was measured with the subtest Similarities of the Wechsler Adult Intelligence Scale (Wechsler, 1997; Dutch adaptation: Uterwijk, 2000). Children were included in the at-risk group when one of the parents (1) scored at or below the 20th percentile on both reading tests, (2) scored at or below the 10th percentile on one reading test and below the 50th percentile on the other, (3) scored below the 15th percentile on one reading test and not higher than the 40th percentile on the other, or (4) showed a discrepancy of 60 percentiles or more between the verbal reasoning test and either of the two reading tests, with the additional requirement that both reading scores were below the 50th percentile. Children were included in the control group when both parents scored at the 50th percentile or higher on both reading tests.
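
The inclusion rules above can be written out as a small decision function. The following is only an illustrative sketch, not the procedure used in the study: the names `ParentScores`, `parent_meets_risk_criterion`, and `classify_child` are hypothetical, scores are assumed to be percentiles, and the discrepancy in criterion (4) is assumed to mean that verbal reasoning is higher than the reading score.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ParentScores:
    """Percentile scores of one parent (hypothetical container)."""
    word: float        # word reading test
    pseudoword: float  # pseudoword reading test
    verbal: float      # verbal reasoning (Similarities subtest)


def parent_meets_risk_criterion(p: ParentScores) -> bool:
    """True if the parent satisfies at least one of criteria (1)-(4)."""
    lo, hi = sorted((p.word, p.pseudoword))
    c1 = hi <= 20                 # (1) both tests at or below the 20th percentile
    c2 = lo <= 10 and hi < 50     # (2) one at or below 10th, other below 50th
    c3 = lo < 15 and hi <= 40     # (3) one below 15th, other not above 40th
    # (4) assumed direction: verbal reasoning exceeds a reading score by
    #     60 percentiles or more, and both reading scores are below the 50th
    c4 = hi < 50 and (p.verbal - lo) >= 60
    return c1 or c2 or c3 or c4


def classify_child(parents: Sequence[ParentScores]) -> str:
    """Assign a child to 'at-risk', 'control', or 'not included'."""
    if any(parent_meets_risk_criterion(p) for p in parents):
        return "at-risk"
    if all(p.word >= 50 and p.pseudoword >= 50 for p in parents):
        return "control"
    return "not included"
```

Under these assumptions, a child whose parents meet neither the at-risk nor the control rule falls into neither group, which the sketch labels "not included".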

Reading in the children was assessed based on three measurements: at the beginning of second grade (2.4 months after starting second grade, SD 1.7 months; the children were 7 years and 6 months of age, SD 5.1 months) with two word-reading lists (1A and 1B from the 3-min test, Verhoeven, 1995), at the end of second grade (6.0 months after starting second grade, SD 0.9 months) with a word reading list (2A from the 3-min test, Verhoeven, 1995) and a pseudoword reading list (Van den Bos et al., 1994), and in the middle of third grade (3.5 months after starting third grade, SD 1.1 months) with a word reading list (Brus and Voeten, 1973) and a pseudoword reading list (Van den Bos et al., 1994). A child was classified as a "non-fluent reader" when he or she scored "poor" on two out of three measurements. A measurement was marked "poor" when the child scored at or below the 10th percentile on one of the reading lists and below the 50th percentile on the other reading list, or at or below the 25th percentile on both reading lists. Two children, initially selected as part of the control group, scored as "non-fluent readers" and were excluded from the analysis. One child who was selected to be part of the at-risk group showed general cognitive delay and was omitted as well. This resulted in three groups: a control group of fluent readers (C, 14 children, 9 boys), a group of at-risk fluent readers (RF, 24 children, 16 boys), and a group of at-risk non-fluent readers (RNF, 24 children, 14 boys).
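
The classification rule for the children can be sketched in the same way. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, each measurement is represented as a pair of percentile scores for its two reading lists, and "two out of three" is read as "at least two".

```python
def measurement_is_poor(list_a: float, list_b: float) -> bool:
    """A measurement is 'poor' if one list is at or below the 10th percentile
    and the other below the 50th, or if both are at or below the 25th."""
    lo, hi = sorted((list_a, list_b))
    return (lo <= 10 and hi < 50) or hi <= 25


def is_non_fluent(measurements) -> bool:
    """measurements: three (list_a, list_b) percentile pairs, one per test moment.
    Assumption: 'two out of three' means at least two poor measurements."""
    poor_count = sum(measurement_is_poor(a, b) for a, b in measurements)
    return poor_count >= 2
```

For example, a child with percentile pairs (10, 40), (20, 25), and (60, 70) would be scored "poor" on the first two measurements and therefore classified as a non-fluent reader under this reading of the rule.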

### **BEHAVIORAL EVALUATION**

Children completed a range of cognitive tests that were administered in the middle of third grade. Two subtests of the Wechsler Intelligence Scale for Children (WISC-III, Wechsler, 1992; Dutch adaptation: Kort et al., 2002) were administered: the subtest Block Design, measuring nonverbal visual-spatial skills, and the subtest Vocabulary, measuring expressive vocabulary. Two children from the at-risk fluent group and one child from the at-risk non-fluent group were absent during this evaluation; therefore, they were not included in the behavioral statistical analysis. Behavioral tests for assessing reading-related skills were the rapid automatized naming task (RAN, Van den Bos et al., 1994), a phoneme deletion task (Amsterdamse klankdeletietest, AKT, De Jong and Van der Leij, 2003), and an orthographic choice task (Horsley, 2005). RAN measures the speed of naming overlearned information. The child was requested to name 50 digits from a piece of paper. The naming time was measured and the score was then expressed as the number of digits the child could have named in a minute. The phoneme deletion task measures phonological awareness. The child was asked to repeat a pseudoword (e.g., "memslos"), subsequently to leave out a specific phoneme (e.g., the sound "l"), and to pronounce the resulting word ("memsos"). The score is the number of correct items out of 27. The orthographic choice task measures orthographic knowledge. The child had to decide on the correct spelling of a word presented together with two homophonic pseudowords (e.g., among "vurkeer, verkeer, verkir" the correct spelling is "verkeer"). The score was the number of correct answers out of 70 items that the child completed in 10 min.
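
The conversion of the RAN naming time into a digits-per-minute score amounts to a simple rescaling. The sketch below shows that arithmetic; the function name is hypothetical and the assumption is that the naming time was recorded in seconds.

```python
def ran_digits_per_minute(naming_time_s: float, n_digits: int = 50) -> float:
    """Convert the time needed to name n_digits into digits per minute."""
    if naming_time_s <= 0:
        raise ValueError("naming time must be positive")
    return n_digits * 60.0 / naming_time_s
```

A child who names the 50 digits in 40 s would thus receive a score of 75 digits per minute.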

## **EEG RECORDING**

Neurophysiological data were collected at 35.1 months of age (SD 0.4 months). EEG was recorded from 64 channels (positioned according to the International 10–20 system; 500 Hz sampling rate; filtered at 0.01–100 Hz), including mastoid references and vertical and horizontal electrooculogram (SynAmps2 64 Channel Quik-Cap, Neuroscan). Three to five minutes of eyes-open rest EEG were collected while the child was on the parent's lap; the child was awake and was encouraged to look at moving lines on a screen in order to keep sitting as still as possible. To ensure objectively similar artifact rejection across the different cohorts, the recorded EEG was filtered and cleaned offline using FASTER, a Matlab toolbox for automatic EEG artifact rejection (for details, see Nolan et al., 2010). In brief, after filtering (0.4–30 Hz band-pass), FASTER segmented the signals into 1-s epochs, detected and interpolated noisy channels, removed contaminated epochs, and ran Independent Component Analysis (ICA) for the identification and rejection of ICs associated with EOG and EMG artifacts. Finally, the signals were re-referenced to the common average, i.e., the average of all remaining scalp electrodes. The mean number of interpolated channels was 2 (± 1 SD, standard deviation), and in 86% of the cases the mean distance between interpolated channels was greater than the distance between neighboring channels. Thus, the influence of the interpolation on the spatial density of scalp EEG was negligible. The duration of the artifact-free epochs in each recording was 1.9 ± 0.1 min (mean ± standard error).
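
Two of the simpler steps named above, segmentation into 1-s epochs and re-referencing to the common average, can be illustrated in plain Python. This is a minimal sketch and not the FASTER toolbox: the function names are hypothetical, signals are assumed to be lists of samples, and filtering, channel interpolation, epoch rejection, and ICA are not covered.

```python
def segment_epochs(signal, sfreq=500, epoch_s=1.0):
    """Cut one channel into consecutive, non-overlapping epochs.
    Trailing samples that do not fill a whole epoch are dropped."""
    n = int(round(sfreq * epoch_s))
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def common_average_reference(channels):
    """Subtract, at every time point, the mean across all channels.
    channels: list of equal-length sample lists, one per scalp electrode."""
    n_ch = len(channels)
    n_samples = len(channels[0])
    means = [sum(ch[t] for ch in channels) / n_ch for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in channels]
```

After common-average re-referencing, the values across channels sum to zero at each time point, which is a quick sanity check on the result.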

### **SPECTRAL ANALYSIS OF EEG DATA**

Artifact-free epochs were submitted to spectral analysis. Power spectral analysis was computed using Fast Fourier Transform (Welch technique, Hamming windowing function, with 4096 FFT points, resulting in frequency bins of width 0.1 Hz). We observed multiple peaks in the 6–12 Hz range in all subjects and, therefore, we calculated the individual alpha frequency using the gravity frequency peak definition $\left(\sum_f a(f) \times f\right) / \sum_f a(f)$, with $a(f)$ denoting the power spectral density at frequency $f$, and $\sum_f$ the sum computed over the frequency bins in the interval 6–12 Hz (Klimesch, 1999). Mean gravity frequency peak across central electrodes (C1, Cz, C2, CP1, CPz, CP2) was obtained for each subject in each group (mean and standard error of mean for C: 8 ± 0.04 Hz; for RF: 8 ± 0.04 Hz; for RNF: 7.9 ± 0.07 Hz). No statistically significant differences were found between the groups [*F*(2, 59) = 1.17; *p* = 0.3, ANOVA]; similarly, no differences for gender [*F*(1, 60) = 1.06; *p* = 0.3, ANOVA] or for the interaction group × gender were found [*F*(2, 56) = 0.4; *p* = 0.7, ANOVA]. Given these results we considered a common individual alpha frequency of 8 Hz (consistent with literature findings in this age range; Stroganova et al., 1999; Marshall et al., 2002) and we defined the frequency bands accordingly (Klimesch, 1999; Babiloni et al., 2012): 0.5–2 Hz (delta-1), 2–4 Hz (delta-2), 4–6 Hz (theta), 6–8 Hz (alpha-1), 8–10 Hz (alpha-2), 10–13 Hz (alpha-3), 13–20 Hz (beta-1), 20–30 Hz (beta-2). The amplitude of EEG signals depends on several factors unrelated to neuro-electrical activity such as anatomical and physical properties of the brain and surrounding tissue (bone thickness, skull resistance and impedance). These parameters vary from one subject to another; however, their influence on the statistical analysis can be minimized by the use of relative amplitude spectra, because these factors equally affect all frequencies analyzed.
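
The gravity frequency defined above is a power-weighted mean of frequency over the 6–12 Hz interval. A minimal Python sketch follows; the function name, the inclusive interval boundaries and the toy spectrum are assumptions made for illustration, not details taken from the study.

```python
def gravity_frequency(freqs, psd, f_lo=6.0, f_hi=12.0):
    """Power-weighted mean frequency over the bins with f_lo <= f <= f_hi."""
    num = 0.0
    den = 0.0
    for f, a in zip(freqs, psd):
        if f_lo <= f <= f_hi:
            num += a * f
            den += a
    if den == 0.0:
        raise ValueError("no spectral power inside the requested interval")
    return num / den


# Toy spectrum, symmetric around 8 Hz; the 4 Hz bin lies outside 6-12 Hz and is ignored.
freqs = [4.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
psd = [100.0, 1.0, 2.0, 4.0, 2.0, 1.0, 0.0, 0.0]
print(gravity_frequency(freqs, psd))  # 8.0
```

Unlike a simple peak-picking rule, this weighted mean remains well defined when several peaks are present in the interval, which is the situation reported above.
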
Amplitude spectra for each electrode were computed as the square root of the power spectra. The relative amplitude spectra were obtained by normalizing as follows:

$$\frac{\left\langle P_{B_i} \right\rangle}{\sum_{i=1}^{n} \left\langle P_{B_i} \right\rangle}$$

where $\langle \cdot \rangle$ indicates the average of the amplitude in a specific frequency band, $P_{B_i}$, across frequency bins, and $i = 1, \ldots, n$, with $n = 8$, corresponds to the $i$th frequency band considered in the analysis. The relative amplitudes in these eight frequency bands were computed with the NBT toolbox (www.nbtwiki.net) (Hardstone et al., 2012), and are referred to as "biomarkers" following the broad definition: "A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention" (Frank and Hargreaves, 2003).
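
The normalization can be sketched in Python as follows. This is an illustrative reading of the formula, not the NBT toolbox code: the band limits are those listed in the text, whereas the half-open bin assignment (lower limit included, upper limit excluded), the function name and the flat toy spectrum are assumptions.

```python
import math

# Frequency bands (Hz) as defined in the text.
BANDS = {
    "delta-1": (0.5, 2.0), "delta-2": (2.0, 4.0), "theta": (4.0, 6.0),
    "alpha-1": (6.0, 8.0), "alpha-2": (8.0, 10.0), "alpha-3": (10.0, 13.0),
    "beta-1": (13.0, 20.0), "beta-2": (20.0, 30.0),
}


def relative_amplitude(freqs, power, bands=BANDS):
    """Mean amplitude per band divided by the sum of the band means."""
    amplitude = [math.sqrt(p) for p in power]  # amplitude = sqrt(power)
    band_mean = {}
    for name, (lo, hi) in bands.items():
        vals = [a for f, a in zip(freqs, amplitude) if lo <= f < hi]
        if not vals:
            raise ValueError(f"no frequency bins fall inside band {name}")
        band_mean[name] = sum(vals) / len(vals)
    total = sum(band_mean.values())
    return {name: m / total for name, m in band_mean.items()}


# Toy spectrum: flat power from 0.5 to 29.5 Hz in 0.5 Hz steps.
freqs = [0.5 * k for k in range(1, 60)]
power = [4.0] * len(freqs)
rel = relative_amplitude(freqs, power)
print(rel["delta-1"], rel["alpha-1"])
```

Because every band mean is divided by the same total, a subject-specific gain that scales all frequencies equally cancels out, which is the motivation given above for using relative rather than absolute amplitude.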

## **STATISTICAL ANALYSIS FOR BEHAVIORAL DATA**

The behavioral tests of word and pseudoword reading, the nonverbal and verbal intelligence subtests, phoneme awareness test, orthographic knowledge test and the RAN tasks were evaluated by One-Way ANOVAs in order to determine differences among the three groups. Tukey's *post-hoc* test was used for *post-hoc* analysis with multiple comparison correction. For correlation analysis with EEG biomarkers, reading fluency performance scores were defined as a composite measure of performance in the two reading tasks measured at the middle of the third grade: word reading and pseudoword reading. Reading fluency performance scores were computed as the z-scores of the average z-scores of the performance in two reading tasks.
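
The composite reading fluency score described above can be sketched as follows. The sketch assumes the sample standard deviation (n − 1 denominator) and standardization across all children; neither detail is stated in the text, and the function names and scores are hypothetical.

```python
from statistics import fmean, stdev


def zscores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m = fmean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def reading_fluency_composite(word_scores, pseudoword_scores):
    """z-score of the per-child average of the two task z-scores."""
    zw = zscores(word_scores)
    zp = zscores(pseudoword_scores)
    averaged = [(a + b) / 2 for a, b in zip(zw, zp)]
    return zscores(averaged)


# Hypothetical scores for four children on the two reading tasks.
word = [40, 55, 70, 62]
pseudoword = [20, 30, 45, 33]
print(reading_fluency_composite(word, pseudoword))
```

The second standardization is needed because the average of two z-scores generally has a standard deviation below 1 unless the two tasks are perfectly correlated.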

### **STATISTICAL ANALYSIS OF EEG DATA**

Distributions of amplitude spectra in each group for each channel and frequency band were tested for normality with the Lilliefors test. For about 25% of the channels the null hypothesis of normal distribution was rejected (*p* < 0.05); for this reason non-parametric methods were used for the statistical analysis. Statistical analysis of differences between groups was performed with non-parametric One-Way ANOVA (Kruskal-Wallis test) with group (C, RF, RNF) as main factor for each electrode and frequency band. Wilcoxon rank-sum test for each channel was used for between-group comparison and to compare means across significant electrodes. As an alternative to multiple comparison correction, we performed binomial testing to validate the statistical significance of the results (Maris and Oostenveld, 2007; Montez et al., 2009; Nikulin et al., 2012); i.e., differences were only considered significant if at least 10 electrodes would reach a *p*-value below 0.05. The likelihood of having this many channels out of 64 reach a *p*-value below 0.05 by chance is less than 0.1% (cf., binomial distribution). No correction was applied across frequency bands.
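
The logic of the binomial criterion can be sketched in Python: under the null hypothesis, each electrode is treated as an independent trial that reaches *p* < 0.05 with probability 0.05, and the chance of observing at least *k* such electrodes is the upper tail of a binomial distribution. The function name is hypothetical; the use of n = 64 trials and the independence between electrodes are simplifying assumptions, and the exact value obtained depends on the number of electrodes entered into the test.

```python
from math import comb


def binomial_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Chance probability of at least 10 (or 18) of 64 electrodes reaching p < 0.05.
print(binomial_tail(10, 64, 0.05))
print(binomial_tail(18, 64, 0.05))
```

Neighboring EEG electrodes are usually correlated, so the independence assumption should be read as an approximation rather than as an exact description of the data.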

### **CORRELATION ANALYSIS BETWEEN BEHAVIORAL AND NEUROPHYSIOLOGICAL DATA**

The behavioral measures (the reading fluency scores, the scores on phonological awareness, orthographic knowledge, and RAN) that were all assessed in the middle of third grade were correlated with the mean amplitudes across significant channels of the most significant spectral bands using Spearman correlation. Correlation analysis was also performed for each channel as an alternative approach to identify brain regions with activity associated with later reading and reading-related skills.

### **RESULTS**

### **BEHAVIORAL RESULTS**

The three subject groups (control, C; at-risk fluent readers, RF; at-risk non-fluent readers, RNF) were assessed on a variety of reading and reading-related tests (see **Table 1**).

### **Table 1 | Behavioral measures.**


*Means in the same row that do not share subscript differ in Tukey's post-hoc test at p* < *0.05. FR, reading fluency; Phon. A, phonological awareness; RAN, rapid automatized naming; Ortho.K, orthographic knowledge.*

The RNF children scored lower than the two fluent reading groups for phonological awareness (RNF vs. RF, *p* = 0.002; RNF vs. C, *p* = 0.001), RAN digits (RNF vs. RF, *p* = 0.022; RNF vs. C, *p* = 0.01) and orthographic knowledge (RNF vs. RF, *p* = 0.001; RNF vs. C, *p* = 0.001) tasks. This confirms that non-fluent reading is accompanied by reading-related deficits in phonological awareness, rapid naming and orthographic knowledge (De Jong and Van der Leij, 2003; Vellutino et al., 2004).

The groups did not differ on the intelligence subtest of nonverbal IQ; however, the RNF readers scored lower than the RF readers in the vocabulary test (*p* = 0.024). This is a well-known effect, and poor verbal abilities have been reported to be present already at preliteracy age in children later diagnosed with dyslexia (Snowling et al., 2003; van Bergen et al., 2013).

Performance on word and pseudoword reading fluency tests at the middle of the third grade differed between all groups with the control children performing best, followed by the at-risk fluent readers and the at-risk non-fluent readers (**Table 1**; for word reading: RNF vs. RF, *p* = 0.001; RNF vs. C, *p* = 0.001; RF vs. C, *p* = 0.007; and for pseudoword reading: RNF vs. RF, *p* = 0.001; RNF vs. C, *p* = 0.001; RF vs. C, *p* = 0.001). This indicates that the RNF children had subclinical reading deficits possibly as a result of a higher liability (van Bergen et al., 2012).

### **EEG CORRELATES OF FUTURE READING ABILITIES**

To investigate the presence of differences in preliteracy measures of brain function between those children who would later become fluent readers and those that would become non-fluent readers, we analyzed eyes-open rest EEG recorded at about 3 years of age using classical spectral analysis (**Figure 1**, and Materials and Methods). We performed an ANOVA to compare relative amplitude in eight different frequency bands among the three groups and observed marked differences in the delta-1 and alpha-1 bands (**Figure 2**). Electrodes with a significant effect formed spatially connected clusters over frontal and centro-parietal scalp regions for the delta-1 band and over central and parieto-occipital regions for the alpha-1 band (see *p*-value topography maps in **Figure 2**, fourth column). For delta-1 we found 10 electrodes with a significant difference between groups at *p* < 0.05, whereas 18 electrodes reached this level for alpha-1. According to binomial testing, the probability of obtaining an equal or larger number of electrodes than 10 was negligibly small (*p* < 1e−3; Materials and Methods). On the contrary, for the theta and alpha-2 bands the number of significant electrodes was too small to reject the null hypothesis of the binomial significance test. The ANOVA for delta-2 did not show a main group effect at any electrode. No effects were observed in the higher frequency bands of alpha-3, beta-1, or beta-2 (data not shown). We note that alpha-2 oscillations displayed peaks bilaterally over the sensorimotor regions for all groups as previously reported for this age (Marshall and Meltzoff, 2011), which suggests that the alpha-1 result is not the child equivalent of the sensorimotor mu rhythm.

To identify more specifically the group differences that gave rise to the significant effects in the overall group comparisons we performed rank-sum tests between groups for delta-1 and alpha-1 relative amplitude. The results indicated that the control group did not differ from the at-risk fluent group in either of the two frequency bands (**Figures 3A,B**, first row), whereas widespread differences were observed for the comparisons of the control and at-risk non-fluent groups as well as for the at-risk fluent and the at-risk non-fluent groups. Relative delta-1 activity was lower in the at-risk non-fluent group compared to both the control and at-risk fluent groups (**Figure 3A**, second and third row), whereas relative alpha-1 activity was higher in the at-risk non-fluent group compared to both the control and at-risk fluent groups (**Figure 3B**, second and third row). The number of electrodes with *p* < 0.05 in these comparisons varied between 7 and 33; the probability of obtaining an equal or larger number of electrodes for a binomial distribution was small (*p* < 0.03; Materials and Methods). Although one should be cautious in inferring the origin of the sources giving rise to the scalp topographies shown in **Figure 3**, they could correspond to brain reading circuits identified in several neuroimaging studies to include parieto-temporal, occipito-temporal and inferior frontal lobes (Eckert et al., 2005; Richlan et al., 2009; Raschle et al., 2011). Together, these results indicate that children who became poor readers exhibited already at the age of three a peculiar resting-state EEG activity compared to both control and at-risk fluent groups.

### **CORRELATION BETWEEN EEG BIOMARKERS AND BEHAVIORAL MEASURES**

Having identified relative amplitude of delta-1 and alpha-1 as putative biomarkers of future ability to read, we subsequently examined whether these EEG biomarkers could be related to performance in reading fluency and reading-related abilities (**Figure 4**, first row).

First, we computed the mean biomarker values of the delta-1 and alpha-1 across significant channels (see **Figure 2**, fifth column) and correlated these with the z-scores of the performance in the behavioral tests measured in the middle of the third grade. Within-group correlations did not reach statistical significance, which was expected given the restriction of range for the performance scores within groups. On the other hand, when considering all children together significant correlations were found between the EEG biomarkers and all the behavioral tests, i.e., Reading Fluency, Phonological Awareness, Orthographic Knowledge and RAN digits. Increasing performance on the different tasks correlated with increases in delta-1 amplitudes and decreases in alpha-1 amplitudes (correlation coefficients and *p*-values are reported in **Figure 4**, second row and third row). This result was to some extent expected given the group effects in the EEG and in the behavioral tests. Interestingly, however, when using the full range of correlations, scalp topographies revealed that also channels not showing a significant group difference were found to correlate with future performances (**Figure 5**).

Scalp topographies of correlations between the behavioral measures and the EEG biomarkers again suggested the involvement of multiple brain regions. In particular, delta-1's relative amplitude correlations with Reading Fluency, with Phonological Awareness and with RAN digits appeared in frontal, central, and parieto-temporal regions. Similar regions showed correlations with Orthographic Knowledge and, interestingly, also occipito-parietal regions showed a robust positive correlation (**Figure 5**, first row) with Orthographic Knowledge. For alpha-1's relative amplitude significant correlations were stronger in central sites and correlations with performance on the orthographic task were also observed in the occipital region (**Figure 5**, second row). Notably, correlations over occipital sites were more evident in the orthographic task where processing of visual information takes place. Additionally, the correlations we have found over frontal and parietal electrodes for the phonological task might be in agreement with previous studies associating phonological processing with activity in frontal lobes and parietal and temporal regions (Buchsbaum, 2001; Burton, 2009).

## **DISCUSSION**

The present study aimed at investigating neurophysiological correlates of later emerging reading abilities in the preliterate brain. Uniquely to our study, having longitudinal data covering about 9 years of development, we showed that a characteristic spectral pattern in the resting-state EEG activity of children at 3 years of age discriminates children who became poor readers from children who became fluent readers. The ability to read fluently develops gradually over time and through substantial practice but in dyslexics it is hampered by the presence of reading-related deficits in phonological processing, mapping phonemes to graphemes, and automatic word recognition. Our behavioral results confirm this effect in poor readers, showing that at-risk non-fluent children scored lower than both at-risk fluent and control children in all reading-related tests (Phonological Awareness, Orthographic Knowledge and RAN). Our data on intelligence measures were in agreement with previous findings showing that dyslexics score slightly lower on verbal tasks despite having adequate reasoning abilities compared to non-impaired readers (Snowling et al., 2003; van Bergen et al., 2013).


The comparison of EEG relative amplitude between the three groups of children, divided on the basis of their family-risk status and their reading fluency abilities, revealed several interesting findings. Our results show that two EEG biomarkers, delta-1 and alpha-1 relative amplitudes, emerged as putative preliterate discriminants of those children with a familial risk of dyslexia who did become non-fluent readers. Non-fluent readers exhibited significantly lower levels of delta-1 activity and significantly higher alpha-1 activity compared to fluent readers (both control and at-risk fluent readers). Notably, although the alpha-1 effect was prominent in central regions, we believe this does not reflect a sensorimotor mu rhythm deficit, because of the larger amplitude and clear bilateral topographic distribution of relative amplitude seen in the alpha-2 band (Marshall and Meltzoff, 2011). Topographic distributions of the main group effect and of the correlations between the EEG biomarkers and behavioral data suggested the involvement of several scalp regions, which is in line with neuroimaging studies that have identified reading circuits in parieto-temporal, occipito-temporal and inferior frontal lobes, albeit often with a left laterality (Eckert et al., 2005; Richlan et al., 2009; Raschle et al., 2011). Given that our statistical analysis was performed in sensor space, we are cautious with the interpretation of the source origin of the effects shown in **Figures 2**, **3**, and **5**; however, we note that both delta-1 and alpha-1 relative amplitude exhibited strong correlations with Orthographic Knowledge in occipital areas, which have previously been associated with orthographic processing (Samuelsson, 2000). In contrast, phonological processing is known to involve frontal areas (Burton, 2009), which could explain the prominent effects we have found in the frontal scalp regions, especially for delta-1. These results might suggest a broader role of delta activity and a specificity of alpha-1 oscillations for visual processing, present even before the reading onset.


## **THE FUNCTION OF DELTA AND ALPHA OSCILLATIONS IN BRAIN MATURATION**

It is well known that delta activity dominates the human EEG during early development and decreases over the course of normal development (John et al., 1980; Gasser et al., 1988; Harmony et al., 1990). Slow wave delta activity in development is believed to be important in pruning redundant cortical connections and supports brain maturation as reflected in the positive association between delta activity and gray matter volume (Whitford et al., 2007). If slow-wave delta activity is indeed a necessary mechanism for pruning and cortical development to take place, our finding of reduced delta-1 activity seems to be in agreement with the hypothesis of a cerebral maturation delay in 3-year-old children who later become poor readers. Later in life, delta waves remain dominant during slow-wave sleep; however, a relatively high delta activity in a wakeful state has been associated with pathological neuronal conditions (Spironelli and Angrilli, 2009; Babiloni et al., 2012) such as in adults with ADHD and dyslexia (Chabot et al., 2001; Penolazzi et al., 2008). Higher delta activity has also been observed in dyslexic school-age children (Spironelli et al., 2006; Penolazzi et al., 2008; Spironelli and Angrilli, 2010) and dyslexic young adults (Rippon, 2000), although these data were recorded during reading tasks. However, increased delta and theta activity in dyslexics or children with reading and writing disabilities has also been reported in resting-state EEG at the age of 9–18 years (Sklar et al., 1972; Colon et al., 1979; Pinkerton et al., 1989; Harmony et al., 1995). Thus, in line with the hypothesis of dysfunctional development (Spironelli et al., 2006, 2010, 2011; Penolazzi et al., 2008), it is plausible that failure to produce adequate delta at a young age is part of the mechanism causing a delay of cortical maturation, which in turn may be reflected in an increase of delta activity in at-risk non-fluent readers compared to fluent readers at school age.
To investigate this, future analyses will be done on resting-state EEG collected in the present cohort at the age of 11 years.


In the present study, we used relative amplitude measures to reduce the considerable genetic variance on oscillatory power (Linkenkaer-Hansen et al., 2007). Thus, it is plausible that our findings in the alpha-1 band are to some extent related to those in the delta band. On the other hand, increases of lower alpha (spectral component just below the IAF) activity have been associated with difficulty in sustaining attention and inhibiting distracting environmental stimuli (Klimesch et al., 2001b). Thus, we cannot rule out that the children in our study differed in the level of attention paid to the moving lines, with the non-fluent reading group exhibiting low sustained attention, as reflected in higher alpha-1 amplitude compared to the other two groups.

### **COMPARISON WITH OTHER FAMILIAL RISK STUDIES**

It is worth noting that our cohort resembles the cohort of the studies presented in Benasich et al. (2008) and Gou et al. (2011) where resting-state EEG of age-matched children (16, 24, and 36 months of age) with a family history of language-based learning impairment (LLI, FH+) and controls (FH−) were compared. Benasich et al. (2008) performed qualitative analysis among absolute power spectra in 9 frequency bins ranging from 5 to 50 Hz (excluding frequency components of the delta range) and across different scalp regions. They selected frontal and prefrontal regions and focused on two wide frequency ranges for their statistical analysis (5–30 Hz and 31–50 Hz, of which the latter was referred to as gamma). They reported lower frontal gamma power in FH+ compared to FH−, and found correlations of gamma power with attention, and expressive and receptive language skills measured at the same age. In a later study, using the same cohort of children, Gou et al. (2011) reported that resting frontal gamma power at 16, 24, and 36 months was associated with phonological memory and syntactical skill measured at the age of 4 and 5 years. A direct comparison with the present study is not possible for several reasons: (1) the definition of the family risk in our study is based on the parents' performance on reading tests, whereas in Benasich et al. (2008) it is based on the presence of at least one sibling or parent diagnosed with LLI (75% were siblings; Choudhury and Benasich, 2003); (2) none of the children in Gou et al. (2011) were themselves diagnosed with any language or learning disabilities, whereas our at-risk children were divided into groups that became fluent or non-fluent readers; (3) the EEG signal analysis performed in our study did not include frequency components higher than 30 Hz. Despite the differences in the cohort definition and in the EEG analysis, our results are congruent with the findings of Benasich et al. (2008) regarding the absence of a statistical difference between control and at-risk fluent readers (comparable with FH+) for spectral amplitude lower than 30 Hz.

## **CONCLUSIONS**

In conclusion, we showed that combining family-risk status assessment and resting-state EEG at preliterate age could provide preliminary indicators of future reading abilities. Specifically, our data suggest that delta and alpha oscillations are implicated in neurophysiological processes of importance for reading-related disabilities later in childhood. In particular, we confirm the role of delta activity as a physiological index of abnormal cerebral maturation. Further investigations are required to better understand the functional significance and the underlying mechanisms governing the dynamics of these oscillations in developmental dyslexia. For example, in this study we did not account for special training in addition to schooling that some poor-reading children have followed, which may have partially influenced their performance in the reading-related tasks measured at the middle of the third grade. In this regard, future studies may investigate whether reading-intervention programs (Connor et al., 2007) affect the dynamics of the brain as reflected in the resting-state EEG.

## **ACKNOWLEDGMENTS**

We thank the Netherlands Organization for Scientific Research (NWO) for funding the Dutch Dyslexia Programme (1997–2012) under project number 200-62-304. We also thank all children and their parents for their dedicated participation in this longitudinal research program.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 May 2014; accepted: 01 September 2014; published online: 19 September 2014.*

*Citation: Schiavone G, Linkenkaer-Hansen K, Maurits NM, Plakas A, Maassen BAM, Mansvelder HD, van der Leij A and van Zuijen TL (2014) Preliteracy signatures of poor-reading abilities in resting-state EEG. Front. Hum. Neurosci. 8:735. doi: 10.3389/fnhum.2014.00735*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Schiavone, Linkenkaer-Hansen, Maurits, Plakas, Maassen, Mansvelder, van der Leij and van Zuijen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## A similar correction mechanism in slow and fluent readers after suboptimal landing positions

## *Benjamin Gagl\*, Stefan Hawelka and Florian Hutzler*

*Department of Psychology, Centre for Neurocognitive Research, University of Salzburg, Salzburg, Austria*

### *Edited by:*

*Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

#### *Reviewed by:*

*Angela Heine, Hochschule Rhein-Waal, Germany Maria De Luca, Fondazione Santa Lucia, Italy Julie Ann Kirkby, Bournemouth University, UK*

### *\*Correspondence:*

*Benjamin Gagl, Department of Psychology, Centre for Neurocognitive Research, University of Salzburg, Hellbrunnerstrasse 34, Salzburg 5020, Austria*

*e-mail: benjamin.gagl@sbg.ac.at*

The present eye movement study investigated the optimal viewing position (OVP) and inverted-optimal viewing position (I-OVP) effects in slow readers. The basis of these effects is a phenomenon called corrective re-fixations, which describes a short saccade from a suboptimal landing position (word beginning or end) to the center of the word. The present study found corrective re-fixations in slow readers, which was evident from the I-OVP effects in first fixation durations, the OVP effect in number of fixations and the OVP effect in re-fixation probability. The main result is that slow readers, despite being characterized by a fragmented eye movement pattern during reading, nevertheless share an intact mechanism for performing corrective re-fixations. This correction mechanism is not linked to linguistic processing, but to visual and oculomotor processes, which suggests the integrity of oculomotor and visual processes in slow readers.

**Keywords: eye movements, reading, landing position, slow readers, corrective re-fixations**

### **INTRODUCTION**

The initial landing position of the eyes on a word affects the speed of word recognition (e.g., O'Regan and Jacobs, 1992; Hutzler et al., 2008). Typically, a fixation position slightly left of the word center, termed the optimal viewing position (OVP), allows fast word recognition. In contrast, landing on the initial or the final letters of a word increases word processing times (see **Figure 1**). This is a classical finding from visual word recognition research, which could be replicated in different languages (Finnish: Hyönä and Bertram, 2011; German: Hutzler et al., 2008; French: O'Regan and Jacobs, 1992; Vitu et al., 2007) and different age groups (e.g., Aghababian and Nazir, 2000). Eye movement evidence suggests that increased reading times at suboptimal landing positions are the result of a correction mechanism that precedes visual word recognition to provide high quality visual information to the reading system. **Figure 1A** (bottom) presents a schematic example of a corrective re-fixation. **Figure 1B** presents a schematic description of the OVP effect on reading time, together with the effects on eye movements that are described in the following.

In reading paradigms, for example, silent sentence or passage reading, word processing measures such as the number of fixations and the percentage of re-fixations typically show an OVP effect. The number of fixations and the percentage of re-fixations are lowest after landing at the word center when compared to suboptimal landing positions (**Figure 1B**; e.g., McConkie et al., 1989; Vitu et al., 2001; Nuthmann et al., 2005). Similar OVP effects were found for gaze duration. In addition, the study of Vitu et al. (2001) described the influence of landing position on the first fixation duration. The initial fixation duration showed an inverted-optimal viewing position effect (I-OVP) with the longest fixation durations at the center of the word and shorter fixations at suboptimal landing positions, that is, the word beginning or end (see **Figure 1B**). This finding was replicated in different languages (e.g., Vitu et al., 2001; Nuthmann et al., 2005; Hyönä and Bertram, 2011), age groups (Vitu et al., 2001; Huestegge et al., 2009; Joseph et al., 2009), and successfully modeled for adults (Reichle et al., 2003; Engbert et al., 2005) and children (Reichle et al., 2013).

At first sight, the I-OVP effect was counterintuitive since the initial fixation duration showed the opposite pattern compared to reading times (**Figure 1B**). To understand the I-OVP effect, the complementary OVP effects on the number of fixations and the probability of re-fixating a word have to be taken into account. The first line of **Figure 1A** shows a standard case where a word is fixated at a preferred central location (Rayner, 1979) and recognized by a single fixation. In this case the fixation durations are the longest and influenced by linguistic word characteristics (e.g., Vitu et al., 2001). In the second line of **Figure 1A** the word *Maler* is initially fixated at the word end. After such a suboptimal fixation position the duration is typically short and accompanied by a re-fixation at the word center. The initial short fixations are not influenced by word characteristics such as word frequency (Vitu et al., 2001; Vergilino-Perez et al., 2004) or the lexicality of the letter string (i.e., word vs. pseudoword; Hutzler et al., 2008). To summarize, when readers land optimally, the initial fixation durations are longer, influenced by the linguistic properties of the word and less likely followed by an additional fixation. In case of landing at the beginning or the end of a word, initial fixation durations are shorter, not influenced by linguistic word characteristics and highly likely followed by a re-fixation.

The mechanism underlying the I-OVP–OVP effect combination is a correction process that initiates a saccade from unfavorable landing positions to the center of a word (see **Figure 1A**).

**Figure 1 |** [...] after suboptimal landing at the end of the word *Maler.* The initial fixation on *Maler*, presented in a smaller red circle (indicating a shorter fixation duration), is followed by a saccade (blue arrow) to set up a fixation near the word center (Hutzler et al., 2008). **(B)** The optimal viewing position effects (OVP effect) for reading times and the number of fixations for landing positions at the beginning, the center and the end of the word *Maler*. The right panel depicts the inverted optimal viewing position effect (I-OVP effect) for first fixation durations. The OVP effect shows longer reading times and a higher number of fixations after landing at the word beginning or the word end. In contrast, the I-OVP effect on first fixation durations shows the shortest fixation durations at word beginnings or ends and the longest durations at the word center. Note that, after landing at the word center of a short to medium length word (e.g., smaller than eight letters) only one fixation is typically required for word recognition (e.g., Vitu et al., 2001).

This initial information processing and saccade programming is independent from linguistic processing (Hutzler et al., 2008). In other words, after landing at an unfavorable landing position the brain recognizes that the position is off target (i.e., word center) and corrects the position by means of a fast eye movement towards the preferred location near the word center. In a study by Hutzler et al. (2008), this mechanism was labeled as corrective re-fixations and might allow the investigation of non-linguistic processing, such as visual and oculomotor processing of slow readers.

Landing position effects on slow readers (e.g., dyslexic readers) were seldom reported and, to our knowledge, there is no existing report of an I-OVP effect in fixation durations and OVP effects in the number of fixations and the re-fixation probability. Two eye movement studies (MacKeben et al., 2004; Hawelka et al., 2010) reported landing position data of dyslexic readers. These readers tend to target the word beginning more often than fluent readers. In addition, the relation between initial landing position and word length was investigated. Here the classical finding is that a fluent reader tries to initially fixate a position near the word center (Rayner, 1979). This means that for a short word of four letters the preferred viewing location is near the second letter but on an eight-letter word the preferred viewing location would be around the fourth letter. In the dyslexic readers of Hawelka et al. (2010) the influence of word length on the initial landing position was smaller than in the fluent readers. They landed more towards the word beginning (at the second letter of words with four to seven letters) than the fluent readers (who landed on the third letter; see also MacKeben et al., 2004). To our knowledge, no further investigation of, for example, a reading time measure in relation to landing position is present in the literature.

The participants of the Hawelka study were German dyslexic readers. In German, dyslexia is mainly characterized by massively impaired reading speed (e.g., Wimmer, 1993). The speed impairment is reflected in prolonged fixation durations as well as in a higher number of fixations per word in comparison to fluent readers (Hutzler and Wimmer, 2004; Dürrwächter et al., 2010; Hawelka et al., 2010). Furthermore, the slow readers exhibited markedly increased word length and frequency effects on number of fixations. In combination, the strong word length effect (e.g., in gaze durations), the high number of fixations and the initial landing position at the word beginnings of slow readers were interpreted as a serial reading strategy. Serial reading is typically present in beginning readers and reflects letter-to-sound conversion (e.g., Share, 1995; Ziegler and Goswami, 2005). The converted sounds are then assembled to allow access to phonological representations. After the initial phase of literacy acquisition the length effect typically decreases, which is interpreted as reflecting the emergence of whole word recognition (see Rau et al., 2014, for a recent eye movement study).

Dyslexic and slow readers might stick to serial reading. From this perspective, word beginnings are reasonable targets for the initial fixation. The most prominent theory of dyslexia, the phonological deficit hypothesis, suggests that in dyslexic individuals the representation, storage, and retrieval of speech sounds is impaired (e.g., Snowling, 2000). Other hypotheses assume impairments in the process of connecting letters and orthographic information (e.g., an orthographic word unit) to the respective phonological representation (e.g., Wolf and Bowers, 1999; Wimmer and Schurz, 2010). These hypotheses are concerned with cognitive processes that are specific to linguistic processing during reading. Another type of deficit theory is concerned with processes during reading apart from the core linguistic processes. Examples would be deficits in visual processing (e.g., magnocellular vision; Stein and Walsh, 1997) or oculomotor processes (e.g., Pavlidis, 1981; Bucci et al., 2008; but see Kirkby et al., 2011). These processes are non-linguistic processes that accompany the core linguistic processes but are not exclusive to reading.

The main aim of the study is to investigate the phenomenon of corrective re-fixations in slow readers. In particular, it is investigated whether or not slow readers also show corrective re-fixations along with OVP and I-OVP effects. If slow readers correct for unfavorable landing positions, then they should show (1) an OVP effect in the re-fixation probability, (2) an OVP effect in the number of fixations and (3) an I-OVP effect on first fixation durations. In case the pattern of effects suggests such a correction procedure, it is of interest whether the first fixation durations at suboptimal landing positions are (4) comparable between groups and whether they are (5) influenced by word frequency or predictability. An absence of a correction mechanism, as well as differences from the normal corrective pattern such as longer fixation durations at suboptimal landing positions, would indicate an impairment in the non-linguistic visuo-oculomotor components of reading. For landing positions at the word center, slower readers are expected to show increased fixation durations and stronger effects of frequency and predictability in contrast to fluent readers. With regard to the deficit theories, a comparable correction mechanism would support theories which assume deficient linguistic processing. In contrast, differences in corrective re-fixations would support theories suggesting deficits in non-linguistic processes. Final analyses will investigate the OVP and I-OVP effects for each individual reader (see e.g., Ramus et al., 2003). These analyses will show whether deficits can be generalized across slow readers or whether there is a distinct subgroup of slow readers who exhibit evidence for a visuo-oculomotor deficit.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

We recorded eye movements from 46 slow readers (18 adolescent dyslexic readers were from Hawelka et al., 2010; 16 student dyslexic readers and 12 academic slow readers from previously unpublished datasets) and 99 fluent readers (18 adolescent fluent readers were from Hawelka et al., 2010 and 49 from Gagl et al., 2011; 28 fluent reading students, from previously unpublished datasets). All readers were either adolescents or adults (16–47 years old) and native speakers of German. All slow readers scored below a percentile of 10 on a reading speed test, which is an adaptation of a reading speed test for children (Auer et al., 2004). Our group is currently collecting the norming samples for this test; the preliminary norms are based on a sample of 309 students. In this test, readers are instructed to mark sentences as semantically correct or incorrect, and the number of correctly marked sentences within 3 min was used as a measure of reading speed. For example, sentences like "*People with pale skin and blond hair have an enhanced risk of sunburn*" or "*A weighing-machine measures the height of a person*" were included. In addition, all adult slow readers had achieved a high level of education and all adolescent dyslexics had a normal to high IQ. For 34 slow readers (i.e., the two dyslexic groups) two non-verbal subtests of the German version of the Wechsler Adult Intelligence Scale (WAIS-R; German Version: Tewes, 1991) were administered. The group scores (standard deviations) were 12.3 (2.7) and 13.3 (2.8) for block design and object assembly, respectively. Note that both scores were higher than the norm mean of 10 (3). This is a typical profile for a group of developmental dyslexics (e.g., Snowling, 2000). In addition, the adolescent dyslexic readers of Hawelka et al. stem from two large scale longitudinal studies from Salzburg (e.g., Wimmer et al., 2000) and most of the student dyslexic readers had previously been diagnosed with developmental dyslexia (10/16).
The third group of 12 adult slow readers achieved at least a higher education entrance qualification. They were students or already held an academic degree and therefore it is highly unlikely that their reading speed deficit is due to an intellectual handicap. To be conservative we refer to the whole group as slow (rather than as dyslexic) readers.

Fluent readers were included if they exhibited a reading score above percentile 35. As an additional measure for the group selection we calculated a words-per-minute measure (wpm) from the eye movement sentence reading task. Similar to Rayner et al. (2010) we set a *wpm* criterion, which was 200 wpm. Eight slow readers (seven from the student dyslexics and one slow reading academic) performed above the criterion and 17 fluent readers performed below it. Thus, they might have been inadequately assigned to their reading group. To be conservative, we discarded these participants from further analysis. As a result, 38 slow and 82 fluent readers (*M* = 139 wpm; SD = 38 and *M* = 277 wpm; SD = 55, respectively) were included in the final analysis.
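The exclusion rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `Reader` record, its field names and the treatment of a reader at exactly 200 wpm (counted here as fluent) are assumptions made for illustration only.

```python
from dataclasses import dataclass

WPM_CRITERION = 200  # criterion stated in the text


@dataclass
class Reader:
    reader_id: str  # hypothetical identifier
    group: str      # "slow" or "fluent", as assigned by the reading speed test
    wpm: float      # words per minute from the sentence reading task


def is_consistent(reader: Reader, criterion: float = WPM_CRITERION) -> bool:
    """Return True if the wpm measure agrees with the screening-test group.

    Assumption: a reader at exactly the criterion counts as fluent.
    """
    if reader.group == "slow":
        return reader.wpm < criterion
    return reader.wpm >= criterion


def filter_readers(readers):
    """Keep only readers whose wpm is consistent with their group."""
    return [r for r in readers if is_consistent(r)]
```

With this rule, a slow reader at 230 wpm and a fluent reader at 180 wpm would both be discarded, mirroring the conservative exclusion reported in the text.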

### **MATERIALS**

Participants read the 144 sentences with various grammatical structures of the Potsdam sentence corpus (PSC; Kliegl et al., 2004), which were presented in a mono-spaced, bold Courier New font (14 pt; 0.3° character width). Eye movements were analyzed on all words with four to seven letters (*n* = 495 words; *M* = 5.5 letters; SD = 1.1; four-letter words: *n* = 121; five-letter: 135; six-letter: 125; seven-letter: 114). We note that more than one word per sentence could be included in the analysis. For example, in the sentence "*Der **Gehilfe** des Gärtners sät **Kresse** und Radieschen.*" the words in bold letters met the criteria. However, the initial word of a sentence was not considered for analyses. Predictability measures were provided by the Potsdam group and word frequency values were obtained from the SUBTLEX frequency norms (Brysbaert et al., 2011). The mean log SUBTLEX frequency of the target words was 3.36 (SD = 1.17) per million and the mean predictability was 0.19 (SD = 0.29) in the whole set of the corpus sentences. The dyslexic student readers and their control group (*n* = 9 and *n* = 22, respectively) read a reduced set of the Potsdam sentence corpus (*n* = 36 sentences). In this short version, 157 words of four to seven letters were analyzed (*M* = 5.5 letters; SD = 1.1; *n* = 32, 48, 41, and 36 for the levels of word length, respectively). These words had a very similar frequency and predictability to those from the entire corpus (*M* = 3.36; SD = 1.10; *M* = 0.19; SD = 0.28, respectively).

### **APPARATUS AND PROCEDURE**

The eye movement recordings and the sentence reading paradigm are described in detail in Hawelka et al. (2010) and in Gagl et al. (2011). In short, an Eyelink 1000 tower mount system (SR-Research, Ontario, Canada) was used to record the eye movements of the right eye. The participants' heads were placed in a head and chin rest in front of a 21-inch cathode ray tube monitor (Belinea, Germany, 1024 × 768 screen resolution; 120 Hz refresh rate). The sentence reading task was preceded by a monocular calibration procedure and 10 practice sentences. The participants were instructed to read the sentences silently. The presentation of the sentences was triggered by a fixation at a fixation point at the left side of the screen (vertically centered). After the detection of a fixation at the fixation point, a sentence was presented in such a way that the fixation landed at the OVP of the first word of the sentence. To terminate the sentence presentation a cross at the bottom right corner of the screen had to be fixated whereupon a new trial was initiated. After about a quarter of the sentences a comprehension question was orally presented. These questions could mostly be answered with a single word. To ensure high eye tracking quality and a low number of recalibrations (in both groups on average four times) the chin rest was placed in such a way that the utterance was not hampered. Both groups had no problems comprehending the sentences, which was reflected in their nearly perfect performance on the comprehension questions (>98% in both groups).

### **DATA TREATMENT AND ANALYSIS**

First fixation durations, the re-fixation probability and the number of fixations were analyzed during the first pass reading of all 4–7 letter words in relation to the initial landing position. All fixation durations shorter than 80 ms and longer than 800 ms were removed from analysis (<2% of the fixation durations in each group). In sum 14,994 and 27,865 fixations were analyzed for slow and fluent readers, respectively, of which 4,597 and 6,043 landed at the beginnings of words and 1,028 and 5,054 landed at the ends of words. The re-fixation probability is estimated by setting the probability to one in case a word was re-fixated (i.e., more than one fixation) and to zero in single fixation cases. The re-fixation probability of, for example, a specific word would then be the mean of the probability values from each participant for this word excluding cases in which the word was skipped.
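The duration filter and the re-fixation probability described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the input layout (one `(word, n_fixations)` pair per participant and word, with 0 marking a skipped word) and the function names are assumptions.

```python
from collections import defaultdict

MIN_MS, MAX_MS = 80, 800  # durations outside this window were removed


def keep_fixation(duration_ms: float) -> bool:
    """Keep fixations that are neither shorter than 80 ms nor longer than 800 ms."""
    return MIN_MS <= duration_ms <= MAX_MS


def refixation_probability(first_pass_counts):
    """Mean re-fixation indicator per word.

    first_pass_counts: iterable of (word, n_fixations) pairs, one pair per
    participant and word (hypothetical layout). A word counts as re-fixated
    (1) if it received more than one first-pass fixation and as a single
    fixation case (0) otherwise. Skipped words (0 fixations) are excluded.
    """
    refixated = defaultdict(int)
    observed = defaultdict(int)
    for word, n_fixations in first_pass_counts:
        if n_fixations == 0:  # skipped: not part of the estimate
            continue
        observed[word] += 1
        if n_fixations > 1:
            refixated[word] += 1
    return {word: refixated[word] / observed[word] for word in observed}
```

For a word that one participant skipped, one read with a single fixation and two re-fixated, the estimate is 2/3, because the skipping case does not enter the denominator.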

Data analysis was performed with linear mixed effect models (LMM) with the *lme4* package (Bates et al., 2007) in *R*. The LMM analysis is suitable for investigating sentence corpus data since LMMs deal well with missing data and they allow treating items and participants as random effects in a single analysis. For fixation durations, we analyzed both untransformed and log-transformed durations. Such a transformation usually leads to a better fit between observed and predicted data (i.e., to smaller residuals). However, we did not observe differences in the pattern of effects for the two analyses. Thus, we report the coefficients and statistics of the untransformed data, from which one can more readily perceive the effect sizes. For the analyses of re-fixations and number of fixations the binomial and the Poisson distributions were used for modeling the LMMs, respectively.

To capture the parabolic shape of the I-OVP and the OVP effects we centered the first fixation position measure and added in the LMM the second order polynomial term of the centered position (i.e., the squared centered first fixation position). To illustrate, a centered initial landing position of zero relates to either the middle letter of a word or to the space between the two middle letters (in words with an even number of letters). Thus, zero would relate to the third letter in five-letter words or to the position in between the second and the third letter in four-letter words. The squared position for a first fixation at the second or the fourth letter of a five-letter word is one in both instances (the square of −1 and +1, respectively), and for a fixation at the first or the last letter it is four (the square of −2 and +2). This convention made it possible to capture the parabolic shape of the OVP/I-OVP effects by accounting for the decrease or the increase of the eye movement measures with increasing distance of the fixation position from the word center (see also McConkie et al., 1989; Nuthmann et al., 2005).
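The centering convention can be made concrete with a short Python sketch. The function names and the 1-based letter index are assumptions for illustration; the arithmetic follows the description above, with the word center at `(word_length + 1) / 2`.

```python
def centered_position(letter_index: int, word_length: int) -> float:
    """Landing position relative to the word center.

    letter_index is 1-based (1 = first letter). For words with an even
    number of letters the center falls between the two middle letters,
    so the centered positions are half-integers.
    """
    center = (word_length + 1) / 2
    return letter_index - center


def squared_centered_position(letter_index: int, word_length: int) -> float:
    """Second-order term: equal for positions equidistant from the center."""
    return centered_position(letter_index, word_length) ** 2
```

In a five-letter word the third letter maps to 0, the second and fourth letters both map to a squared value of 1, and the first and last letters to 4. In a four-letter word the second and third letters map to −0.5 and +0.5, so zero lies between them.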

To summarize, the main LMM analysis contains a first and second order polynomial of centered first fixation position (i.e., the linear and the squared effect of first fixation position) and the factor reading group. For these three fixed effects all possible interactions were modeled. In addition, we added word frequency, predictability and length as fixed effects to account for the effects of these word characteristics. Participants and items were treated as random effects.
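As a reading aid, the model structure summarized above can be written out as a linear predictor. The notation is ours, not the authors': $G_i$ codes the reading group of participant $i$, $p_{ij}$ is the centered first fixation position of participant $i$ on item $j$, $F_j$, $P_j$ and $L_j$ are the frequency, predictability and length of item $j$, and $u_i$ and $w_j$ are participant and item random effects. Random intercepts only are assumed here, since the text does not specify the random-effects structure of the main model.

```latex
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_1 G_i + \beta_2 p_{ij} + \beta_3 p_{ij}^{2} \\
          & + \beta_4 G_i p_{ij} + \beta_5 G_i p_{ij}^{2}
            + \beta_6 p_{ij}\, p_{ij}^{2} + \beta_7 G_i\, p_{ij}\, p_{ij}^{2} \\
          & + \beta_8 F_j + \beta_9 P_j + \beta_{10} L_j
            + u_i + w_j + \varepsilon_{ij}
\end{aligned}
```

For first fixation durations $y_{ij}$ is the duration itself; for the re-fixation probability and the number of fixations the same right-hand side (without $\varepsilon_{ij}$) would be the linear predictor on the link scale of the binomial and the Poisson model, respectively.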

## **RESULTS**

### **GLOBAL EYE MOVEMENT CHARACTERISTICS**

The global eye movement characteristics of the slow and the fast readers for all words of the sentence corpus are presented in **Table 1**. As evident from the table, the slow readers exhibited longer first fixation durations, longer gaze durations and a higher number of fixations. The higher number of fixations was due to a lower percentage of word skippings (7%; fast readers: 20%) and a higher percentage of instances in which words received multiple fixations (52 vs. 24%). Importantly, for the analysis of re-fixation cases, the total number of cases in which words were fixated more than once is similar in both groups (around 7,500 cases). The initial and second landing positions were closer to the word beginnings in the slow readers compared to the fast readers.

### **LANDING POSITION EFFECTS**

**Figure 2** presents the landing position distributions of the initial and the second fixation for the four to seven-letter words. Slow readers preferentially targeted the beginning of the words (**Figure 2A**). In contrast, the fluent readers' initial landing position was, on average, closer to the word center. However, the peak of the distribution was still slightly left of center. Fluent readers not only targeted the word beginning less often, they also showed a higher preference for landing positions between the center and the end of a word than slow readers. In **Figure 2B**, the distribution of the second fixation position is presented. Here the group differences are more subtle, but still reliable (see **Table 1**). A detailed inspection revealed that the slow readers still tended to fixate slightly more to the left of the center than the fluent readers. The latter fixated more often to the right of the center. Overall, however, the second fixation distribution of both groups has its maximum at a position slightly left of the center of the words. Thus, we may assume that the target of the first re-fixation in both groups is near the center of a word.

**Table 1 | Means (standard errors) of global eye movement measures and group comparisons.**

*Fixation positions are presented in absolute letter position and fixation probabilities in percentages.*

**Figure 3** shows all eye movement measures in relation to centered first fixation position for both groups and all four word length levels. Visual inspection of the I-OVP and OVP effects for each word length indicates that the shape of the largely overlapping effects was comparable between length levels. To decrease complexity, word length was only added as a fixed effect to the LMMs as a control variable, without interactions. The word length effect, on top of the effects of group, the linear and the squared effect of first fixation position, was not reliable for first fixation durations (*t* < 1), re-fixation probability and number of fixations (both *Z*'s < 1.9). **Figures 3A,B** show the re-fixation probability and the number of fixations in relation to the initial landing position. Increased re-fixation probabilities and number of fixations were found after landing at the word beginning or after landing at word ends. This OVP effect was present for slow and fluent readers. However, the slow readers exhibited more fixations and a higher probability for re-fixations than the fluent readers. The LMM analysis (**Table 2**) confirmed this observation for number of fixations and the re-fixation probability with a reliable effect of group and reliable effects of the linear and squared first fixation position. In addition, both measures showed a reliable interaction of group and linear landing position, which was due to more pronounced linear effects of landing position for the slow than for the fast readers. To be specific, the slow readers, in contrast to the fast readers, exhibited a higher number of fixations and a higher re-fixation probability at word beginnings than at word ends. Critically, no reliable interaction of group and the squared first fixation position was found, which indicates that the quadratic effects were similar in both groups.

The right panels of **Figure 3** show the first fixation durations in relation to landing position. Note that we distinguished between the first fixation durations of all cases and first fixation durations of cases in which the initial fixation was followed by at least one re-fixation, indicating corrective re-fixations at unfavorable landing positions. In general, fixation durations of the slow readers were prolonged when compared to fluent readers. In relation to the landing position, the fixation durations of both groups were shorter after landing at the beginning or the end of a word in contrast to landing at the word center. Thus we observed an I-OVP effect for both measures and both groups. The main finding in **Figure 3** is that landing at the word end resulted in short fixation durations and, most critically, that these durations did not differ between the slow and the fast readers, which was particularly the case for the fixation durations in multiple fixation cases.

**FIGURE 3 | Re-fixation probability (A), number of fixations (B), first fixation duration (C) and the first fixation duration of multiple fixation cases (D) in relation to centered initial landing position for both groups and all four word length levels.** The lines depict smoothed means; the gray areas depict confidence intervals as provided by the ggplot2-package (Wickham and Chang, 2013).

**Table 2 | Results from the LMM analysis for the percentage of re-fixations, number of fixations, first fixation durations and first fixation durations of multiple fixation cases.**

*Reliable effects are shown in bold. FE: fixed effect; SE: standard error; z relates to Wald z which is a non-parametric statistic for the binomial or Poisson distributions of the re-fixation probability and the number of fixations, respectively.*

The slow readers showed longer fixation durations than the fast readers for landing positions at the beginning and the center of the words. The LMM analysis (**Table 2**) confirms this observation. For both fixation duration measures, reliable effects of group and of the squared landing position were found. In addition, both first fixation duration measures showed a reliable interaction between the quadratic effect of landing position and group, which was due to a more pronounced I-OVP effect in the slow than in the fast readers. The interaction between the linear and the quadratic landing position was also reliable, indicating that an increase in the linear landing position effect was accompanied by an increase in the quadratic landing position effect. For first fixation duration only, a three-way interaction of reading group with the linear and the squared landing position was found. The interaction was due to the fact that the slow readers exhibited both a more pronounced linear reduction of fixation durations towards the word end and a more pronounced quadratic effect than the fast readers.

**Table 3 | Results from LMM analysis for first fixation durations of multiple fixation cases with reading group, word frequency and predictability as fixed effects and participants and items as random effects.**

*Reliable effects are shown in bold. FE: fixed effect; SE: standard error.*

In an additional LMM analysis (**Table 3**; **Figure 4**), we investigated the influence of word frequency and predictability on fixation durations of multiple fixation cases separately for landing position (beginning, middle, end). This analysis is concerned with the question whether the short fixations after landing on the end of words are influenced by word frequency and predictability. The focus of this analysis was on the durations of the first fixation of multiple fixation cases. The rationale is that these fixations are the most sensitive measure to investigate corrective re-fixations. The analysis was a combined one for all levels of word length. A landing position smaller than −2 was defined as landing at the word beginning and a landing position greater than +2 was defined as landing at the word end. Landing positions between −2 and +2 were defined as landing at the word center. As evident from the regression lines in **Figure 4** (right panel) and the LMM analysis in **Table 3**, no reliable main effects or interactions of group, word frequency and word predictability were found when readers landed on the end of the words. This is a strong indication that linguistic processes did not influence fixation durations after landing on word ends. In contrast, if fixations landed at the beginning or the center of a word (left panel of **Figure 4**), then the slow readers exhibited longer fixation durations and we observed a reliable interaction of word frequency with group. This interaction was due to a more pronounced effect of word frequency in the slow compared to the fast readers. The predictability of the word influenced fixation durations only after landing on the word beginnings. A reliable predictability by frequency interaction, which was due to a more pronounced effect of frequency in case of predictable words, was found at word beginnings.
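The three landing position categories used in this analysis follow directly from the centered landing position. The Python sketch below is illustrative only: the function names are hypothetical, and treating the boundary values −2 and +2 as part of the center category is an assumption based on the wording "between −2 and +2".

```python
from collections import defaultdict
from statistics import mean


def landing_category(centered_pos: float) -> str:
    """Classify a centered landing position as beginning, center or end.

    Assumption: the boundaries -2 and +2 belong to the center category.
    """
    if centered_pos < -2:
        return "beginning"
    if centered_pos > 2:
        return "end"
    return "center"


def mean_duration_by_category(fixations):
    """Mean first fixation duration per landing category.

    fixations: iterable of (centered_pos, duration_ms) pairs (hypothetical layout).
    """
    grouped = defaultdict(list)
    for centered_pos, duration_ms in fixations:
        grouped[landing_category(centered_pos)].append(duration_ms)
    return {category: mean(values) for category, values in grouped.items()}
```

Splitting the data this way yields one subset per category, so that frequency and predictability effects can be estimated separately for each landing position.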

### **INDIVIDUAL EFFECTS**

In addition to the group analysis, we conducted an individual estimation of the effect of squared landing position on first fixation durations of multiple fixation cases, number of fixations and re-fixation probability. Separately, for each group, a simplified model was computed that included the centered linear and squared landing positions as fixed effects and participants and items as random effects on the intercept. In addition, the random effect of participants on the slope of the linear and squared landing position was included in the model. The additional random effects made it possible to estimate the individual linear and quadratic effects of landing position. However, we will focus on the quadratic effects. The I-OVP effect on first fixation durations (i.e., a negative effect of the squared landing position) was found in 37 out of the 38 slow readers (97%) with a mean of −5.7 ms (SE = 1.9 ms; range: −10.1 to 1.6 ms). This effect was found in each of the fluent readers with a mean of −2.6 ms (SE = 1.0 ms; range: −4.9 to −0.1 ms). The OVP effect on number of fixations (i.e., a positive effect of the squared landing position) was found in 92% of slow (35/38) and 99% of fast readers (81/82) with means of 0.05 (SE = 0.5; range: −0.009 to 0.21 fixations) and 0.03 (SE = 0.2; range: −0.002 to 0.09 fixations), respectively. For the re-fixation probability, which showed the lowest quadratic effects of all three measures, only 71% (27/38) and 74% (61/82) of the slow and the fast readers showed an OVP effect. Mean values were 0.3% (SE = 1.3; range: −1.9 to 2.2 fixations) and 0.8% (SE = 1.3; range: −1.8 to 2.8 fixations), respectively.

A second analysis was conducted for each slow reader with respect to the first fixation durations of multiple fixation cases that landed on the word end. For these fixation durations the group analysis showed no reliable differences between the groups, but this result might mask individual slow readers who still show increased fixation durations. In this analysis, one-sample t-tests compared the individual means of the slow readers with the fixation durations of the group of fluent readers (*M* = 201 ms; SD = 30). This analysis showed that 27 of the 38 slow readers exhibited first fixation durations which were not reliably different from those of the fluent readers. In other words, 71% of slow readers showed fixation durations (i.e., corrective re-fixations) comparable to those of the fluent readers.
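For readers unfamiliar with the statistic, a one-sample t-test compares the mean of a sample with a fixed reference value. The Python sketch below computes the t statistic and its degrees of freedom. It is a generic illustration, not the authors' analysis: the text does not specify which values formed the sample and which the reference, and the choice of an individual reader's durations tested against the fluent group mean of 201 ms is an assumption.

```python
import math
from statistics import mean, stdev

FLUENT_MEAN_MS = 201.0  # fluent group mean reported in the text


def one_sample_t(sample, reference=FLUENT_MEAN_MS):
    """Return (t, degrees_of_freedom) for a one-sample t-test.

    sample: sequence of fixation durations in ms (assumed here to be one
    reader's first fixation durations at the word end).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    standard_error = stdev(sample) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("sample has zero variance")
    t = (mean(sample) - reference) / standard_error
    return t, n - 1
```

The resulting t value would then be compared with the critical value of the t distribution for the given degrees of freedom to decide whether the difference is reliable.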

## **DISCUSSION**

The present study investigated landing position effects in slow and fluent readers during sentence reading. We found (1) an OVP effect in re-fixation probability, (2) an OVP effect in the number of fixations and (3) an I-OVP effect on first fixation durations in slow readers. The main finding was (4) that there was no difference between slow and fluent readers in fixation durations at the word end. Furthermore, the fixation durations of both groups were (5) not influenced by linguistic word characteristics when the fixation landed at the end of a word. Thus, we can conclude that both groups exhibited a similar correction process (i.e., corrective re-fixations) after landing at unfavorable positions within words. However, the total number of fixations and the percentage of re-fixations were higher in slow readers compared to the fluent readers. Furthermore, we replicated the finding that slow readers initially fixate closer to word beginnings than fluent readers (MacKeben et al., 2004; Hawelka et al., 2010). A further group difference was that the I-OVP effect was stronger in the slow readers, which was reflected by prolonged fixation durations at the word center (and at word beginnings) accompanied by a steep decrease of fixation durations towards the word ends.

The I-OVP effect in fluent readers showed the expected pattern with the longest fixation durations at the center and shorter fixation durations at suboptimal landing positions, that is, at the ends and beginnings of words. In both groups the I-OVP effect was most pronounced in first fixation duration of multiple fixation cases and similar for all word lengths. Therefore, we speculate that in most of the multiple fixation cases at a suboptimal position the fixation durations reflect the visuo-oculomotor processes preceding a corrective re-fixation. These corrective re-fixations are the main objective of the study and hence we will focus on them from here on.

The group differences in fixation durations at the word center (i.e., the OVP) and word beginning (the preferred landing position in the slow readers) reflect linguistic processing. For these landing positions we found reliable effects of word frequency. The frequency effect was substantially more pronounced in the slow than in the fluent readers (replicating, e.g., Hawelka et al., 2010). After landing at the end of words, the fixation durations of both groups were similar and were not influenced by word frequency or word predictability. Therefore, fixation durations after landing at the end are not influenced by linguistic processes, and the increased number of fixations and the higher re-fixation probability indicate that these fixations are very likely to be followed by a corrective saccade towards the word center. When inspected in detail, these fixations were followed by a saccade towards the word center in 78 and 83% of the cases for fast and slow readers, respectively. Thus, one can assume that the pattern after landing at word end reflects corrective re-fixations, a mechanism which is intact in (the majority of) slow readers.

We observed group differences in fixation duration when the initial fixation was at the beginning of the words. The slow readers' fixation durations were prolonged and more affected by word frequency than those of the fluent readers. This finding suggests that slow readers habitually target word beginnings (MacKeben et al., 2004). In fluent readers the preferred viewing location is slightly left of the word center (Rayner, 1979). However, even in the slow readers the fixation durations at word beginnings were, on average, shorter than fixations at the word center. Thus, the fixations at the beginnings of words might comprise two cohorts of different type: one small cohort in which the slow readers correct for suboptimal landing positions and a second, larger cohort in which the process of visual word recognition is initiated instantaneously (i.e., serial decoding; see below), that is, without correction of the landing position.

The differences in the fixation pattern might stem from differences in cognitive processes that lead to word recognition. In the study by Hawelka et al. (2010), the landing positions of dyslexic readers at the word beginning, in combination with their higher number of fixations and their strong word length effect, were interpreted as a reflection of word processing by means of serial decoding. In fluently reading adults, only words of very low frequency and pseudowords elicit a word length effect (Weekes, 1997). The present finding of a tendency to initially fixate at word beginnings, accompanied by a second fixation at the word center, suggests that, at least in a considerable number of cases, serial decoding is still present in our adolescent and adult slow readers.

The pattern of group differences suggests that slow readers show corrective re-fixations comparable to those of fluent readers, especially at word end; however, when linguistic processing is present, as indicated by the frequency and predictability effects, their slow reading speed is reflected in massively prolonged fixation durations. On an individual level, the group pattern does not fit all of the slow readers. The individual I-OVP effects on fixation duration and the OVP effect on number of fixations showed that a small number of slow readers (a maximum of 8%) did not show I-OVP and OVP effects. In comparison, all fluent readers showed an I-OVP effect and only one out of 82 fluent readers did not show an OVP effect on number of fixations. Only for the re-fixation probability OVP effect, which was the weakest of the three effects, was a larger number of individuals found who did not show a positive OVP effect. Here 26% of the fluent and 29% of the slow readers did not show an OVP effect. Although these percentages are high, they were comparable between the groups. Thus, the main finding from the individual analysis is that the vast majority of the slow readers do not exhibit visuo-oculomotor deficits and that deficient linguistic processing is the cause of their impaired reading speed. Studies that assessed this issue with non-linguistic tasks came to similar conclusions. In particular, sophisticated search tasks that used stimuli very similar to reading stimuli (e.g., consonant strings) found that slow and fluent readers showed comparable eye movement patterns (Hutzler et al., 2006; Prado et al., 2007), indicating that visual and oculomotor processing of the slow readers is intact. For the few slow readers who exhibited deviant I-OVP and OVP effects, one could assume that non-linguistic processes are the proximal cause of, or an additional source for, their slow reading speed. Although the prevalence of this type of deficit was very low in the present sample of slow readers, it deserves attention, particularly with regard to individual diagnostics and individually tailored therapies.

To sum up, the present study on the I-OVP effect in first fixation durations and accompanying effects (e.g., the OVP effect on number of fixations) provides information on the influence of landing position on the eye movement characteristics of slow readers. In the case of suboptimal landing positions, both groups used a similar corrective mechanism: a fast corrective re-fixation to the word center. The similarity of corrective re-fixations in fluent and slow readers supports the conclusion that visual and oculomotor processes cannot be the primary cause of the reading speed impairment.

### **ACKNOWLEDGMENTS**

We thank Heinz Wimmer, Julia Crone, Christina Marx, Fabio Richlan, and Mathias Schurz for helpful discussions and comments on an earlier version of the article. This work was supported by the Austrian Science Fund (FWF P 25799-B23).

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 January 2014; accepted: 09 May 2014; published online: 03 June 2014. Citation: Gagl B, Hawelka S and Hutzler F (2014) A similar correction mechanism in slow and fluent readers after suboptimal landing positions. Front. Hum. Neurosci. 8:355. doi: 10.3389/fnhum.2014.00355*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Gagl, Hawelka and Hutzler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia

## *Victoria Leong\* and Usha Goswami*

*Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, UK*

### *Edited by:*

*Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

### *Reviewed by:*

*Fumiko Hoeft, University of California, San Francisco, USA; Roeland Hankock, University of California, San Francisco, USA (in collaboration with Fumiko Hoeft); Jenny Thomson, University of Sheffield, UK; Jarmo Hämäläinen, University of Jyväskylä, Finland*

### *\*Correspondence:*

*Victoria Leong, Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK e-mail: vvec2@cam.ac.uk*

Dyslexia is associated with impaired neural representation of the sound structure of words (phonology). The "phonological deficit" in dyslexia may arise in part from impaired speech rhythm perception, thought to depend on neural oscillatory phase-locking to slow amplitude modulation (AM) patterns in the speech envelope. Speech contains AM patterns at multiple temporal rates, and these different AM rates are associated with phonological units of different grain sizes, e.g., related to stress, syllables or phonemes. Here, we assess the ability of adults with dyslexia to use speech AMs to identify rhythm patterns (RPs). We study 3 important temporal rates: "Stress" (∼2 Hz), "Syllable" (∼4 Hz) and "Sub-beat" (reduced syllables, ∼14 Hz). 21 dyslexics and 21 controls listened to nursery rhyme sentences that had been tone-vocoded using either *single* AM rates from the speech envelope (Stress only, Syllable only, Sub-beat only) or *pairs* of AM rates (Stress + Syllable, Syllable + Sub-beat). They were asked to use the acoustic rhythm of the stimulus to identify the original nursery rhyme sentence. The data showed that dyslexics were significantly poorer at detecting rhythm compared to controls when they had to utilize multi-rate temporal information from *pairs* of AMs (Stress + Syllable or Syllable + Sub-beat). These data suggest that dyslexia is associated with a reduced ability to utilize AMs <20 Hz for rhythm recognition. This perceptual deficit in utilizing AM patterns in speech could be underpinned by less efficient neuronal phase alignment and cross-frequency neuronal oscillatory synchronization in dyslexia. Dyslexics' perceptual difficulties in capturing the full spectro-temporal complexity of speech over multiple timescales could contribute to the development of impaired phonological representations for words, the cognitive hallmark of dyslexia across languages.

**Keywords: amplitude modulation, envelope, speech rhythm, dyslexia, oscillations**

## **INTRODUCTION**

### **SPEECH RHYTHM AND PHONOLOGICAL AWARENESS IN DYSLEXIA**

Dyslexia is characterized across languages by difficulties in phonological processing (e.g., Snowling, 2000; Ziegler and Goswami, 2005). Phonological processing encompasses the encoding and representation of speech at a range of grain sizes, both segmental (i.e., phoneme) and supra-segmental (e.g., rime, syllable and stress). As simple decoding (word reading) requires the acquisition of phonology-orthography correspondences at different grain sizes (segmental for alphabetic languages, syllabic for some character-based scripts), this cognitive "phonological deficit" affects reading acquisition in dyslexia across languages. While an impairment in segmental processing in dyslexia has long been noted (e.g., Tallal and Piercy, 1974; Snowling, 1981), suprasegmental sensitivity has only recently been a focus of study, and then mainly in English (e.g., Wood and Terrell, 1998; Goswami et al., 2002, 2010). This is surprising, as children's phonological sensitivity to supra-segmental features of speech develops early in all languages, well before the onset of formal literacy instruction. Indeed, EEG studies reveal sensitivity to the dominant stress patterns in the native language within the first months of life (Friederici et al., 2007; Ragó et al., 2014).

For English-learning infants, this early sensitivity toward dominant syllable stress patterns such as the "Strong-weak" (S-w) trochaic motif has been shown to be important for word learning (Jusczyk et al., 1993; Echols et al., 1997). By the age of 7.5 months, English-learning infants are capable of using the trochaic stress pattern as a template for segmenting words from continuous speech (Jusczyk et al., 1999). During early childhood, pre-literate children across languages already exhibit an awareness for rime and syllable units in speech. Pre-readers are able to identify pairs of words that rhyme (e.g., "mat" rhymes with "hat" but not with "cut"), and to clap out the number of constituent syllables in a word (Bradley and Bryant, 1983; Treiman and Zukowski, 1991; Ziegler and Goswami, 2005). In fact, children's phonological awareness of rhyme, syllables and stress predicts their later success in learning to read (Bradley and Bryant, 1983; de Bree et al., 2006; Whalley and Hansen, 2006).

Sensitivity to supra-segmental features of speech, particularly speech rhythm and syllable stress, also appears to be impaired in children and adults with developmental dyslexia (e.g., Wood and Terrell, 1998; Kitzen, 2001; Goswami et al., 2010; Holliman et al., 2010, 2012; Leong et al., 2011; Mundy and Carroll, 2012). Acoustically, prosodic rhythm and stress in the speech signal are cued by a combination of amplitude, duration and frequency changes (Hirst, 2006). The amplitude-based cues to rhythm are contained within the slow-varying "amplitude envelope" of speech (Plomp, 1983; Howell, 1984, 1988a,b; Greenberg et al., 2003; Tilsen and Johnson, 2008; Leong, 2012; Tilsen and Arvaniti, 2013). These slowly-varying amplitude patterns also cue the location of the rhythmic "perceptual (P)-center" or *moment of occurrence* of a sound (Allen, 1972; Morton et al., 1976; Scott, 1993, 1998; Villing, 2010). The P-center forms the basis for the deliberate rhythmic timing of speech and for synchronization of speech between speakers (Cummins and Port, 1998; Cummins, 2003). The P-center is related perceptually to a particular rhythmic marker within the speech amplitude envelope: the envelope onset rise time. Perceptual sensitivity to rise time is impaired in children and adults with dyslexia in a range of languages (Goswami et al., 2002; Hämäläinen et al., 2005, 2009; Surányi et al., 2009; Poelmans et al., 2011; Goswami et al., 2011a; see Goswami, 2011, for a recent summary). The rise time or "attack" time of a sound refers to the rate at which its amplitude increases during its initial onset, and is closely related to its P-center and rhythmic "beat strength." For example, a trumpet note with a fast rise time and early P-center will typically be perceived as having a stronger beat than a bowed violin note with a slower rise time and later P-center (Gordon, 1987).
In speech, envelope onset rise times distinguish between stressed and unstressed syllables (Leong et al., 2011; Goswami and Leong, 2013), and provide phonetic cues to voice onset time and manner of articulation, for example aiding in phonetic distinctions such as between /b/ and /w/ (Goswami et al., 2011b). Dyslexics' difficulties in perceiving amplitude envelope rise times across languages have led to the theoretical suggestion that a deficit in neural rhythmic entrainment to amplitude modulation (AM) patterns in speech could underlie the phonological deficit in developmental dyslexia (Goswami, 2011; "temporal sampling theory").

### **NEURONAL OSCILLATORY ENTRAINMENT IN DYSLEXIA**

The speech amplitude envelope contains a spectrum of AM at different temporal rates, with certain key rates of AM associated with characteristic timescales of speech information. For example, the envelope is dominated by modulations that occur at around 3–5 Hz, corresponding to the average duration of the syllable (Greenberg et al., 2003; Greenberg, 2006). AMs at a slower rate of ∼2 Hz are associated with inter-stress intervals in speech, which have an average duration of 493 ms (Dauer, 1983). Toward the other end of the modulation spectrum, faster modulations immediately above the 'classic' syllable rate of 3–5 Hz correspond to more quickly-uttered unstressed syllables (∼10 Hz, Greenberg et al., 2003). Faster modulations up to 50 Hz are thought to provide phonemic cues to manner of articulation, voicing, and vowel identity (Rosen, 1992). Although the amplitude envelope has been the focus of many speech intelligibility studies (e.g., Drullman et al., 1994a,b; Shannon et al., 1995), the spectral fine structure also makes an important contribution to speech intelligibility, particularly under adverse listening conditions (Qin and Oxenham, 2003; Xu et al., 2005; Obleser et al., 2012).
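
The correspondence between the durations and the modulation rates quoted above is the reciprocal relation between period and frequency. As a minimal illustration (the function names are ours and hypothetical, not taken from any cited work), a mean inter-stress interval of 493 ms maps onto a rate of about 2 Hz, and the 3–5 Hz syllable range onto periods of roughly 200–333 ms:

```python
def rate_hz_from_period_ms(period_ms: float) -> float:
    """Convert a repetition period in milliseconds to a rate in Hz."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ms


def period_ms_from_rate_hz(rate_hz: float) -> float:
    """Convert a modulation rate in Hz to its period in milliseconds."""
    if rate_hz <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_hz


# Mean inter-stress interval reported by Dauer (1983), as quoted in the text.
stress_rate = rate_hz_from_period_ms(493.0)
print(round(stress_rate, 2))  # 2.03

# Periods spanned by the 3-5 Hz syllable-rate modulations.
shortest_syllable_ms = period_ms_from_rate_hz(5.0)  # 200 ms
longest_syllable_ms = period_ms_from_rate_hz(3.0)   # about 333 ms
```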

Recently, Poeppel and colleagues have proposed a neural account of speech processing based on multi-time resolution of the modulation patterns in the speech envelope (multi-time resolution models, e.g., Poeppel, 2003; Giraud and Poeppel, 2012). In multi-time resolution models, the brain is thought to track speech information at different timescales using neuronal oscillations at different frequencies. These neuronal oscillations *entrain* ("phase-lock") to speech modulation patterns on equivalent timescales, so that peaks and troughs in oscillatory activity align with peaks and troughs in modulations in the signal. According to Giraud and Poeppel (2012), neuronal oscillatory activity in the Theta band (3–7 Hz) tracks syllable patterns in speech, while slower oscillatory activity in the Delta band (1–3 Hz) tracks phrasal and intonational patterns, such as stress intervals. Fast oscillatory activity in the Gamma band (25–80 Hz) is thought to track quickly-varying phonetic information, such as formant transitions and voice-onset times, which have timescales in the order of tens of milliseconds. This convergence between characteristic timescales in speech and the dominant neuronal oscillatory bands in auditory cortex has been used to argue that oscillatory entrainment ("phase locking") may be an important neural mechanism for parsing the speech signal into appropriately-sized linguistic units for further lexical processing (Ghitza and Greenberg, 2009; Schroeder and Lakatos, 2009; Giraud and Poeppel, 2012; Zion Golumbic et al., 2012).

In line with dyslexics' difficulties in rise time perception, which are particularly evident for slower rise times (Richardson et al., 2004; Stefanics et al., 2011), Goswami (2011) proposed a "temporal sampling" framework to explain why the development of accurate phonological representation of speech is impaired across languages in developmental dyslexia. The temporal sampling framework proposed that impaired phonological representation in dyslexia could arise in part from impaired oscillatory entrainment to *slow* AMs (<10 Hz) that carry stress and syllable patterning in speech (i.e., involving delta and theta oscillations, see Goswami, 2011; Power et al., 2012, 2013; Soltész et al., 2013). As neuronal oscillations in the cortex exhibit hierarchical nesting across slow and fast timescales (e.g., theta-gamma phase-amplitude coupling; Lakatos et al., 2005), an impairment in slow oscillatory activity (e.g., delta, stressed syllable rate; theta, syllable rate) could also have consequences for speech encoding at faster timescales, such as the Gamma or other phonetic rate timescales. Indeed, recent studies using non-speech stimuli have indicated that the hemispheric lateralization of Gamma-rate oscillations (∼30 Hz) may be altered in dyslexia (Lehongre et al., 2011, 2013).

### **AM PERCEPTION IN DYSLEXIA**

Consistent with Goswami's (2011) proposal, several AM perception studies based on non-speech stimuli and psychoacoustic modulation thresholds indicate that dyslexics show poor AM sensitivity below 10 Hz (e.g., Lorenzi et al., 2000; Amitay et al., 2002; Rocheron et al., 2002; although note that Poelmans et al., 2012 observed no deficit at 4 Hz). Studies reporting on modulation thresholds for faster AM rates vary in whether they report dyslexic deficits. For example, while McAnally and Stein (1997), Witton et al. (1998), and Menell et al. (1999) all observed deficits in dyslexics' AM detection at ∼20 Hz, Hämäläinen et al. (2009) failed to find a deficit at the same rate. Meanwhile, although no dyslexic deficit at 80 Hz was reported by Hari et al. (1999), a study by Poelmans et al. (2012) found atypical laterality effects in EEG for 20 Hz AM speech-weighted noise, and a study by Lehongre et al. (2011) found atypical laterality effects in MEG for 35 Hz AM white noise. Similarly mixed results have been observed for dyslexics' perception of very slow "stress rate" AMs. While an early study by Witton et al. (1998) found that the perception of 2 Hz AMs was unimpaired in dyslexia, subsequent studies by Stuart et al. (2006) and Hämäläinen et al. (2012) have reported significant group differences in AM sensitivity at the 1 Hz and 2 Hz rates respectively. From the non-speech studies, it is currently unclear whether dyslexics have a *general* deficit in AM perception that affects all modulation rates, or whether their deficit is *specific* to the AM rates <10 Hz that are identified in temporal sampling theory (Goswami, 2011). It is also possible that a single auditory anomaly, impaired phonemic sampling in left auditory cortex, accounts for the impaired phonological processing found in dyslexia (Lehongre et al., 2011).

While AM studies are important for studying phase-locking, their implications for real-life speech perception are limited because the AM patterns used in these studies are artificial sinusoids and not real speech AMs. Real-speech AMs differ from artificial sinusoids in several important ways. First, unlike sinusoids, speech AMs are not perfectly periodically regular, but contain phase-advancements or delays that reduce their temporal predictability. Secondly, real-speech AMs differ in patterning at different acoustic frequencies. These temporal differences in modulation patterning across different "spectral channels" are crucial for speech intelligibility (e.g., Shannon et al., 1995). Finally, in real speech, AM patterns at all timescales (e.g., stress, syllable and phoneme) are *concurrently transmitted* to the listener, unlike artificial AM studies in which only one AM rate is presented at a time. During real-life speech processing, listeners probably extract speech information using *combinations* of AMs at different rates. For example, we have recently reported that listeners detect prosodic RPs by computing the *phase relationship* between two concurrent rates of speech AM: the "Stress" rate (∼2 Hz) and the "Syllable" rate (∼4 Hz, see Leong, 2012). This proposal is summarized in **Figure 1**. Dyslexics' ability to use such AM *combinations* in real speech has, to our knowledge, not been tested.
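
To make the notion of a phase relationship between two AM rates concrete, the toy sketch below samples a ∼2 Hz "Stress" modulation at the peaks of a ∼4 Hz "Syllable" modulation. It rests on strong simplifying assumptions that should be kept in mind: both AMs are idealized cosines (which, as noted above, real speech AMs are not), and the decision rule (a syllable counts as strong when the stress-rate AM is positive at the syllable peak) is hypothetical and is not the model of Leong (2012). The function name and its parameters are ours.

```python
import math

STRESS_HZ = 2.0    # approximate "Stress" rate from the text
SYLLABLE_HZ = 4.0  # approximate "Syllable" rate from the text


def rhythm_pattern(stress_phase: float, n_syllables: int = 8) -> str:
    """Label each syllable-rate peak as strong (S) or weak (w).

    Toy assumption: the syllable AM is cos(2*pi*4*t), so its peaks fall
    at t = k / 4 s, and the stress AM is cos(2*pi*2*t - stress_phase).
    """
    labels = []
    for k in range(n_syllables):
        t_peak = k / SYLLABLE_HZ
        stress_value = math.cos(2 * math.pi * STRESS_HZ * t_peak - stress_phase)
        labels.append("S" if stress_value > 0 else "w")
    return " ".join(labels)


print(rhythm_pattern(0.0))      # S w S w S w S w  (trochaic-like)
print(rhythm_pattern(math.pi))  # w S w S w S w S  (iambic-like)
```

Under these assumptions, shifting the stress-rate AM by half a cycle relative to the syllable-rate AM is enough to turn a "Strong-weak" sequence into a "weak-Strong" one, although neither AM on its own distinguishes the two patterns.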

One obvious difficulty is that the complexity of the speech signal makes the extraction of specific features like cross-frequency AM phase alignment at pre-determined rates very difficult. Accordingly, studies using "vocoded" (envelope-only) real speech are useful. In vocoder studies, the speech signal is split into different frequency channels (e.g., typically 2, 4, 8 or 16 channels), the envelopes from each channel are used to modulate noise or tone carriers, and are then recombined. The resulting speech sounds like a harsh whisper, and is initially difficult to recognize. Speech vocoder studies with dyslexic children consistently suggest that their ability to use envelope cues for speech perception is impaired (e.g., Lorenzi et al., 2000; Johnson et al., 2011; Nittrouer and Lowenstein, 2013). For example, Lorenzi et al. (2000) used 4-channel noise-vocoded VCV syllables (e.g., /aCa/) as stimuli, and found that both typically-developing and dyslexic 11-year-old children performed more poorly than adults when using envelope cues (<500 Hz) for speech intelligibility. However, while the speech recognition performance of control children improved significantly over the course of five training sessions during the experiment, the performance of dyslexic children did *not* improve with training. Johnson et al. (2011) and Nittrouer and Lowenstein (2013) found more direct evidence for impaired speech envelope perception in dyslexia. In their study using 4 and 8-channel semantically-unpredictable noise-vocoded monosyllabic sentences (e.g., "dumb shoes will sing"), Johnson et al. (2011) found that 10–11 year-old children with reading difficulties showed significantly poorer word recognition of vocoded speech than control children, for both 4- and 8-channel stimuli. 
Similarly, Nittrouer and Lowenstein (2013) used 4-channel noise-vocoded sentences and found that there were consistent differences in speech perception performance between typically-developing and dyslexic children, for both age groups tested (8–9 years and 10–11 years).
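
The generic vocoding procedure described above (split into channels, extract each channel envelope, impose it on a carrier, recombine) can be sketched as follows. This is an illustration under stated assumptions rather than the processing used in any of the cited studies: the filters are crude brick-wall FFT filters, the envelope is obtained by full-wave rectification followed by low-pass smoothing, the carriers are sine tones at the geometric center of each channel (the cited child studies used noise carriers), and the channel edges, cutoff and test signal are all hypothetical.

```python
import numpy as np


def fft_filter(x, fs, lo, hi):
    """Crude brick-wall filter: keep FFT bins with lo <= f < hi (Hz)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def tone_vocode(x, fs, edges, env_cutoff=50.0):
    """Envelope-only resynthesis with one sine carrier per channel."""
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        channel = fft_filter(x, fs, lo, hi)
        # Envelope: full-wave rectification, then low-pass smoothing.
        envelope = fft_filter(np.abs(channel), fs, 0.0, env_cutoff)
        # Carrier: tone at the geometric center of the channel.
        carrier = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)
        out += envelope * carrier
    return out


# Hypothetical test signal: a 1 kHz tone amplitude-modulated at 4 Hz.
fs = 8000
t = np.arange(fs) / fs
speech_like = (1 + 0.8 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
vocoded = tone_vocode(speech_like, fs, edges=[100, 500, 1500, 3500])
```

In this sketch the fine structure of each channel is discarded and only its slow envelope survives, which is why vocoded speech retains temporal patterning while losing much of its spectral detail.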

In each of these studies, the vocoded stimulus typically contained a very wide range of envelope AM rates rather than a single AM rate (e.g., the envelope was low-pass filtered under 500 Hz). Thus, a complication of these experiments is that a deficit in perceiving speech modulations at a *specific* rate (e.g., 4 Hz) would be masked if the dyslexic children were able to extract redundant speech information at other modulation rates (e.g., 20 Hz) to compensate for a slow AM deficit (see Drullman, 2006). Conversely, if a difference in performance is observed (as was the case in these studies), it is not clear whether this is caused by a general deficit in AM processing that affects all modulation rates, a specific deficit at certain AM rates (e.g., pertaining to stress, syllable or phoneme-rate information), or a deficit in combining AM information across different temporal rates. Therefore, to assess speech AM perception in dyslexia more closely, a combination of the two approaches (from AM studies and vocoding studies) is needed. Ideally, the stimuli should be created from the envelopes of real speech, but AMs at specific modulation rates (or combinations of modulation rates) should be systematically isolated from these real envelopes. Here, we present one such study.

### **EXPERIMENTAL RATIONALE AND HYPOTHESES**

Given the prior literature on the relationship between rhythmic awareness and reading (e.g., Thomson et al., 2006; Thomson and Goswami, 2008; Goswami and Leong, 2013; Tierney and Kraus, 2013), we were specifically interested in assessing dyslexics' ability to use different AM rates in speech for *rhythm perception* (rather than speech intelligibility *per se*). Accordingly, we devised a rhythm perception task using rhythmic sentences (nursery rhymes) that had been tone-vocoded using different AM rates. For normal adult listeners, speech rhythm perception relies on sensitivity to the phase-relationship between 2 key AM rates (stress ∼2 Hz and syllable ∼4 Hz; Leong, 2012). Furthermore, in prior work on rhythmic entrainment, we have shown that children and adults with dyslexia show "tapping to the beat" impairments at 2 Hz (Thomson et al., 2006; Thomson and Goswami, 2008), while when tapping to speech rhythms adults with dyslexia show impairment at the syllable rate (∼4 Hz; Leong and Goswami, 2014). Accordingly, here we presented dyslexic and control adult listeners with tone-vocoded (envelope-only) sentences that contained only a narrow range of AM rates under 20 Hz. In order that the modulation patterns in our stimuli would be realistically speech-like, these modulation bands did not contain only a single AM rate (i.e., a "4 Hz" sinusoid). Rather, each AM band contained a narrow range of AM rates centered around a target rate (e.g., 2.3–7 Hz, centered around 4 Hz), each of which we refer to in shorthand by the center rate (e.g., here as "∼4 Hz" or "Syllable-rate AMs").

Our dependent variable was the accuracy of speech rhythm perception. We created stimuli that contained modulations from either a single narrow AM band (i.e., Stress only ∼2 Hz, Syllable only ∼4 Hz, Sub-beat only ∼14 Hz), or from *paired combinations* of AM bands (Stress + Syllable and Syllable + Sub-beat). On the basis of the temporal sampling framework (Goswami, 2011), we predicted no dyslexic impairment at the sub-beat band rate of ∼14 Hz (included as a control frequency band), but significant impairment at both rates <10 Hz (Syllable and Stress rates). On the basis of our prior data on rhythmic entrainment to speech rhythms (Leong and Goswami, 2014), we also predicted that dyslexics would have difficulty in *combining* speech information across different temporal modulation rates. As Leong's modeling work (Leong, 2012) has shown that rhythm perception depends critically on the Stress + Syllable AM combination, it may be that particular dyslexic difficulty is found for this combination.

Note that in this experiment we used the "Sub-beat" rate (∼14 Hz) as a control AM band, not the "phoneme rate" (∼30 Hz) that is the theoretical focus of AM work by Lehongre et al. (2011, 2013). Our decision was motivated by the classic psychophysical studies of Drullman et al. (1994a,b). These studies indicated that AM rates up to 16 Hz are the most important for speech intelligibility, and that the inclusion of faster AM rates *above 16 Hz* results in little improvement to intelligibility. Furthermore, in a rhythmic context, we noticed that unstressed syllables are often compressed to a "sub-beat" length in order to fit within the standard "beat" length of one ordinary syllable. For example, in the nursery rhyme sentence "Humpty Dumpty sat on the wall," the syllables "sat" and "on" are compressed together, or reduced, to fit the space of one regular syllable like "Hum." Consequently, the overall trochaic rhythm of the sentence is not disrupted. Thus, the "Sub-beat" rate (∼14 Hz) is likely to correspond to speech modulations that are important for intelligibility, but which contribute little toward the overall rhythmic patterning of "Strong" and "weak" beats in a sentence, making this an ideal control modulation band. As the cited "phoneme" rate (∼30 Hz) commonly refers to the timescale of formant transition patterns in speech (e.g., Giraud and Poeppel, 2012), we plan to examine this rate in the context of frequency modulation (FM) perception in future studies.

## **METHODS**

### **PARTICIPANTS**

Twenty-one adults (9 M, 12 F) with developmental dyslexia and 26 control adults (7 M, 19 F) participated in the study. All dyslexic participants had received a formal diagnosis of developmental dyslexia and also showed significant reading and phonological deficits according to our own test battery. All participants had no other diagnosed auditory or learning difficulties, spoke English as a first language, and were aged under 40 years. As shown in **Table 1**, dyslexic and control participants were matched on IQ [2 subscales of the Wechsler Abbreviated Scale of Intelligence (WASI), Wechsler, 1999: A non-verbal subscale (Block Design) and a verbal subscale (Vocabulary)]. However, there was a significant age difference between dyslexic and control groups, where controls were slightly older on average [dyslexic mean age = 22.9 years; control mean age = 25.5 years; *F*(1, 45) = 5.66, *p* < 0.05]. To account for this age difference, all our subsequent statistical analyses include age as a covariate. As this statistical solution is impartial, we felt that it would be preferable to manually excluding certain participants on the basis of their age, which would entail subjectivity as to how many and which participants to exclude.



*\*p < 0.05; \*\*p < 0.01; \*\*\*p < 0.001.*

Consistent with their diagnosis, dyslexics performed significantly more poorly than controls in standardized tests for literacy [Wide Range Achievement Test (WRAT-III), Reading and Spelling scales, Wilkinson, 1993] and phonological awareness [Phonological Assessment Battery (PhAB), Spoonerisms task, Fredrickson et al., 1997; Wechsler Adult Intelligence Scale-Revised (WAIS-R) forward digit span subtest, Wechsler, 1981]. Thus, despite the relatively high IQ of both groups (reflecting the fact that these were high-performing students at a world-class university), dyslexic participants still lagged behind their peers in their reading, spelling and phonological awareness skills. Both control and dyslexic participants also took part in other studies on rhythm perception and production (see also Leong and Goswami, 2014). Ethical approval for the study was obtained from the Cambridge Psychology Research Ethics Committee, and all participants were given a modest payment for taking part in the experiments.

### **MATERIALS**

In line with our focus on rhythm, children's nursery rhymes were used as stimuli because these are a form of naturally-occurring, rhythmically-rich speech material, whose rhythm patterns (RPs) should be familiar to and easily identified by listeners. Four duple-meter nursery rhymes were used for the experiment, taking the first line of each nursery rhyme (8 syllables). The sentences fell into either of two RPs, as shown in **Table 2**. Two sentences had a "S-w" or trochaic pattern. These were "MA-ry MA-ry QUITE con-TRA-ry" and "SIM-ple SI-mon MET a PIE-man" (stressed syllables in CAPS). The other two sentences had a "w-S" or iambic pattern. These were "as I was GO-ing TO St IVES" and "the QUEEN of HEARTS she MADE some TARTS." We chose to use trochaic and iambic patterns because these are the dominant prosodic motifs found in children's nursery rhymes (Gueron, 1974), and were easily understood by our participants. A total of 4 sentences (2 per RP) were used to encourage participants to attend to the global "S-w" or "w-S" rhythm patterning that was common between the 2 exemplars of each pattern. Using two exemplars also prevented reliance on minor non-rhythmic variations (e.g., total stimulus length) to perform the task. We did not use more than 4 sentences as this would have unnecessarily increased the difficulty of the task (which was already high in difficulty). Each sentence was ∼2 s in length (Mary: 2.01 s; Simon: 2.12 s; St Ives: 2.37 s; Queen: 2.31 s). The nursery rhymes were spoken by a female native speaker of British English who was articulating in time to a 4 Hz (syllable rate) metronome beat. The speaker was instructed to produce the RP of each nursery rhyme as clearly as possible. Utterances were digitally recorded using a TASCAM digital recorder (44.1 kHz, 24-bit), and the metronome was not audible in the final recording.

**Table 2 | List of nursery rhyme sentences and their rhythm pattern.**

### **RHYTHM PERCEPTION TASK**

In each trial, participants heard one of four tone-vocoded nursery rhyme sentences. They were asked to indicate the target sentence (one of four) by selecting an appropriate response button. Participants were told to base their judgment on the *RP* of the stimulus. Given that the vocoded sentences had a clear rhythm but were unintelligible (see Section Signal Processing Steps for Tone Vocoding), we did not expect participants' sentence identification to exceed 50% in accuracy (i.e., we expected accurate discrimination *between* trochaic vs iambic sentences, but not *within* 2 trochaic or iambic sentences). All participants were first given 20 practice trials, during which they heard the four sentences as originally spoken, without any vocoding. This enabled participants to learn the RP of each sentence, and to become familiar with the response button mapping. Subsequently, participants performed the task with tone-vocoded stimuli only. The tone-vocoded stimuli retained the temporal pattern of each nursery rhyme sentence, but were completely unintelligible. Cartoon icons representing the four response options were displayed on the computer screen throughout the experiment to help to reduce the memory load of the task. Auditory stimuli were presented diotically using Sennheiser HD580 headphones at 70 dB SPL. The experimental task was programmed in Presentation and delivered using a Lenovo ThinkPad Edge laptop.
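
The 50% ceiling mentioned above follows from simple arithmetic on the four-alternative design. The sketch below assumes, purely for illustration, that a listener first identifies the RP class (trochaic or iambic) with some probability and then chooses between the two exemplars of that class independently; the function and parameter names are ours and hypothetical.

```python
def expected_accuracy(p_rhythm: float, p_within: float = 0.5) -> float:
    """Expected sentence-identification accuracy in the 4-alternative task.

    p_rhythm: probability of choosing the correct RP class (S-w vs. w-S).
    p_within: probability of choosing the correct sentence within that
              class (0.5 if the two exemplars cannot be told apart).
    """
    for p in (p_rhythm, p_within):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_rhythm * p_within


print(expected_accuracy(1.0))  # 0.5: perfect rhythm discrimination, guessing within class
print(expected_accuracy(0.5))  # 0.25: guessing among all four sentences
```

Under these assumptions, scores between 25% and 50% reflect partial discrimination of the two RPs, and scores above 50% would require some ability to distinguish the two exemplars within a pattern.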

### *Signal processing steps for tone vocoding*

AM bands were extracted from the amplitude envelope of the speech signal of each nursery rhyme sentence using two different methods. In the first method, the amplitude envelope was extracted using the Hilbert transform. This Hilbert envelope was then passed through a modulation filterbank (MFB) of band-pass filters, which effectively isolated speech AMs corresponding to the (1) "Stress" rate (0.8–2.3 Hz), (2) "Syllable" rate (2.3–7 Hz), and (3) "Sub-beat" (7–20 Hz) rate. Please see Stone and Moore (2003) for details of the spectral filterbank design, which was adapted to be used as a MFB here. It is possible that artificial modulations may be introduced into the stimuli by the MFB method, since band-pass filters can introduce modulations near the center-frequency of the filter, through "ringing." Therefore, a second AM-hierarchy extraction method was also used. This was Probabilistic Amplitude Demodulation (PAD; Turner and Sahani, 2011), and did not involve the Hilbert transform or filtering. Rather, the PAD method estimates the signal envelope using a model-based approach in which the signal is assumed to comprise the product of a positive slow envelope and a fast carrier. Bayesian statistical inference is used to invert the model, thereby identifying the envelope which best matches the data and the *a priori* assumptions (i.e., a positive-valued envelope whose mean is constant over time). This envelope extraction protocol can be run recursively at different timescales, yielding AMs at the same modulation rates as those derived from MFB filtering (Turner and Sahani, 2007; Turner, 2010). All participants heard both MFB-derived and PAD-derived vocoded stimuli in the same experiment. It was reasoned that if participants produced the same pattern of results with two methods of AM extraction that operate using very different sets of principles, the observed effects were likely to have arisen from real features in speech rather than filtering artifacts.
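As a small illustration of the three modulation-rate bands named above, the following sketch maps a modulation rate in Hz onto the band that contains it. The band edges are taken from the text; the function name `band_for_rate` and the treatment of each band as a half-open interval are assumptions made here for illustration, and the sketch does not implement either the MFB or the PAD extraction itself.

```python
# Modulation-rate bands as given in the text (Hz). Treating each band as a
# half-open interval [low, high) is an assumption made here so that the shared
# edges (2.3 Hz and 7 Hz) belong to exactly one band.
AM_BANDS = {
    "Stress": (0.8, 2.3),
    "Syllable": (2.3, 7.0),
    "Sub-beat": (7.0, 20.0),
}


def band_for_rate(rate_hz):
    """Return the name of the AM band containing rate_hz, or None if outside all bands."""
    for name, (low, high) in AM_BANDS.items():
        if low <= rate_hz < high:
            return name
    return None


if __name__ == "__main__":
    # The 4 Hz metronome (syllable rate) and its 2 Hz duple-meter stress beat.
    for rate in (2.0, 4.0, 10.0):
        print(rate, "Hz ->", band_for_rate(rate))
```

Under these assumptions, the 4 Hz metronome rate used during recording falls in the Syllable band, and the 2 Hz rate of every second syllable falls in the Stress band.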

The MFB- and PAD-derived AMs were used to modulate a 500 Hz sine-tone carrier in a single-channel vocoder. A multichannel vocoder was deliberately avoided to ensure that the sentences would be completely unintelligible. As the dependent variable in the experiment was how well participants could identify each sentence on the basis of its AM RP, all other cues to sentence identity needed to be removed. Therefore, the phonetic fine structure of the signal was intentionally discarded. In addition, the AMs derived from the amplitude envelope were used to modulate the sine-tone carrier, rather than being combined back with the fine structure of the signal. To create single-AM band stimuli (e.g., Stress only), the appropriate AM band was extracted and combined with the 500 Hz sine-tone carrier. A 30 ms-ramped pedestal at channel RMS power was added prior to combining with the carrier. To create double-AM band stimuli (e.g., Stress + Syllable), the two AM bands were first combined via addition (for MFB) or multiplication (for PAD) before combining with the carrier. All stimuli were equalized to 70 dB. These signal processing steps are illustrated in **Figure 2**.
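The core of the single-channel vocoding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain: the function names, the sample rate and the synthetic envelopes are hypothetical, and the 30 ms-ramped pedestal and the level equalization described above are omitted.

```python
import math


def combine_bands(band_a, band_b, method):
    """Combine two AM bands sample by sample.

    Following the text: addition for MFB-derived bands, multiplication for
    PAD-derived bands.
    """
    if method == "MFB":
        return [a + b for a, b in zip(band_a, band_b)]
    if method == "PAD":
        return [a * b for a, b in zip(band_a, band_b)]
    raise ValueError("method must be 'MFB' or 'PAD'")


def tone_vocode(envelope, sample_rate, carrier_hz=500.0):
    """Modulate a sine-tone carrier with an amplitude envelope."""
    return [
        e * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        for n, e in enumerate(envelope)
    ]


if __name__ == "__main__":
    fs = 8000  # hypothetical sample rate; the recordings themselves were 44.1 kHz
    t = [n / fs for n in range(fs)]  # 1 s of samples
    # Synthetic, strictly positive envelopes standing in for extracted AM bands.
    stress = [1.0 + 0.5 * math.sin(2.0 * math.pi * 2.0 * x) for x in t]
    syllable = [1.0 + 0.5 * math.sin(2.0 * math.pi * 4.0 * x) for x in t]
    stimulus = tone_vocode(combine_bands(stress, syllable, "PAD"), fs)
    print(len(stimulus), "samples")
```

Because only the envelope is carried over to the sine tone, the output preserves the temporal pattern of the chosen AM bands while containing none of the original spectral fine structure.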

The resulting tone-vocoded sentences had clear temporal patterns ranging from "Morse-code" to flutter, but were otherwise completely unintelligible (see **Audios 1**–**5** in Supplementary Material). **Figure 3** illustrates the different types of AM-vocoded stimuli used in the experiment, contrasting trochaic ("Mary Mary") and iambic ("the Queen of Hearts") sentences.

### *Design*

As explained in Section Experimental Rationale and Hypotheses, five different AM bands or band combinations were used for vocoding. This generated 3 types of single AM band stimuli (Stress only; Syllable only; Sub-beat only) and 2 types of paired AM band stimuli (Stress + Syllable; Syllable + Sub-beat). For each AM combination, each of the 4 nursery rhyme sentences was presented 10 times (5 MFB and 5 PAD stimuli) in a fully randomized order, giving 40 trials per AM type and 200 trials in total for the entire experiment. Participants were scored in terms of their sentence identification accuracy for each AM type (Accuracy scores), and their ability to discriminate more generally between trochaic and iambic RPs (RP scores). We had previously found that control participants showed no difference in listening accuracy for MFB and PAD stimuli (Leong, 2012). In our preliminary analysis of the current data, we likewise found that there was no difference in performance for PAD as compared to MFB stimuli [*F*(1, 44) = 2.74, *p* = 0.11]. Therefore, to simplify further analysis, the scores for the two types of stimuli in each condition were averaged into a single mean score for each participant.
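The trial counts in this design can be checked with a short sketch that builds the full trial list. The labels and the function `build_trials` are hypothetical, and shuffling across the whole list (rather than within blocks) is an assumption, since the text specifies only that presentation order was fully randomized.

```python
import itertools
import random

AM_TYPES = ["Stress", "Syllable", "Sub-beat", "Stress+Syllable", "Syllable+Sub-beat"]
SENTENCES = ["Mary", "Simon", "St Ives", "Queen"]
METHODS = ["MFB", "PAD"]
REPEATS_PER_METHOD = 5  # 5 MFB + 5 PAD = 10 presentations per sentence and AM type


def build_trials(seed=None):
    """Return a shuffled list of (am_type, sentence, method) tuples."""
    trials = [
        combo
        for combo in itertools.product(AM_TYPES, SENTENCES, METHODS)
        for _ in range(REPEATS_PER_METHOD)
    ]
    random.Random(seed).shuffle(trials)  # assumption: one shuffle over all trials
    return trials


if __name__ == "__main__":
    trials = build_trials(seed=0)
    per_am_type = sum(1 for am, _, _ in trials if am == "Stress")
    print(len(trials), "trials in total,", per_am_type, "per AM type")
```

The totals follow from 5 AM types × 4 sentences × 10 presentations = 200 trials, or 40 trials per AM type.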

### **RESULTS**

### **SENTENCE IDENTIFICATION ACCURACY**

**Figure 4** shows the mean Accuracy scores achieved by the control and dyslexic groups for each AM type. To check for floor effects in performance (which could obscure group differences), we assessed whether participants' scores for each AM type were significantly above the level of chance (25%). Accordingly, separate one-sample *t*-tests were conducted for control and dyslexic groups against the test value of 0.25. As this necessitated 10 *t*-tests in total, Holm's sequential Bonferroni correction was applied to the *p*-value threshold for significance (Holm, 1979). Holm's sequential Bonferroni correction entails a smaller reduction in statistical power than the standard Bonferroni correction, and is a widely-used alternative for controlling for Type 1 familywise error (Rice, 1989; Perneger, 1998). In the Holm-Bonferroni method, the threshold for significance is computed as 0.05/(10 − [rank of uncorrected *p*-value] + 1). Therefore, for the smallest (rank 1) *p*-value, the Holm-Bonferroni-corrected threshold for significance was 0.05/(10 − 1 + 1) = 0.005, whereas for the largest (rank 10) *p*-value, the threshold for significance was 0.05/(10 − 10 + 1) = 0.05. The results of the *t*-tests indicated that both controls and dyslexics performed significantly above chance for all 5 AM types. Accordingly, we investigated whether there were group differences across the 5 AM types.
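The rank-dependent thresholds described above can be reproduced with a few lines of code. This is a minimal sketch of the standard Holm step-down procedure; the function names are hypothetical and the *p*-values in the usage example are invented for illustration, not taken from the study.

```python
def holm_thresholds(m, alpha=0.05):
    """Thresholds alpha / (m - rank + 1) for ranks 1..m (rank 1 = smallest p-value)."""
    return [alpha / (m - rank + 1) for rank in range(1, m + 1)]


def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans marking which tests are significant under Holm's procedure.

    P-values are tested from smallest to largest; testing stops at the first
    p-value that exceeds its threshold, and all remaining tests are retained.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / (m - rank + 1):
            reject[idx] = True
        else:
            break
    return reject


if __name__ == "__main__":
    thresholds = holm_thresholds(10)
    print(thresholds[0], thresholds[-1])  # thresholds for rank 1 and rank 10
    print(holm_reject([0.001, 0.04, 0.03]))  # invented p-values
```

With 10 tests the thresholds run from 0.05/10 for the smallest *p*-value to 0.05/1 for the largest, matching the values given in the text.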

Two repeated measures ANCOVA analyses were conducted. In the first analysis, we compared group performance for the 3 *single* AM bands (Stress only, Syllable only, Sub-beat only). Single AM band (3 levels) was entered into the ANCOVA as the within-subjects factor, and Group (2 levels) was entered as the between-subjects factor. Age was entered as a covariate. The results of the first ANCOVA showed *no* significant main effect of Group [*F*(1, 44) = 0.14, *p* = 0.71], and no interaction between single AM band and Group [*F*(2, 88) = 0.37, *p* = 0.69]. This suggests that controls and dyslexics were performing equally well in their use of single AM-band information for rhythm perception.

In the second RM ANCOVA analysis, we investigated group differences in the ability to combine information across more than one AM band. The second ANCOVA entered double-AM band (2 levels, Stress + Syllable, Syllable + Sub-beat) as the within-subjects factor, and Group (2 levels) as the between-subjects factor. Age was again entered as a covariate. This second ANCOVA showed a significant main effect of Group [*F*(1, 44) = 4.51, *p* < 0.05], but the interaction between AM band and Group did not approach significance [*F*(1, 44) = 0.19, *p* = 0.66]. Therefore, our dyslexic participants were significantly less accurate than control participants when they had to combine AM information across different rates, that is, when combining the Syllable-rate AM with AMs at either the Stress rate or the Sub-beat rate.

### **RHYTHM PATTERN DISCRIMINATION**

Next, we wanted to ascertain whether participants were able to use these speech AMs to discriminate between the two major RPs that characterized the 4 nursery rhyme sentences [i.e., trochaic ("S-w") vs. iambic ("w-S")]. Accordingly, we re-scored participants' responses according to whether they had correctly identified the *RP* of each sentence as trochaic or iambic, disregarding whether they had identified the actual sentence correctly (i.e., for the stimulus sentence "Mary Mary," responses of "Mary Mary" and "Simple Simon" were both scored as the correct RP, as both were trochaic responses). The resulting mean RP scores for iambic sentences (Ives, Queen) and trochaic sentences (Mary, Simon) are shown in **Figure 5**. To check for floor effects in performance (which could obscure group differences), we assessed whether participants' scores for each AM type were significantly above the level of chance (50%). Accordingly, separate one-sample *t*-tests were conducted for control and dyslexic groups against the test value of 0.5. As this necessitated 20 *t*-tests in total, Holm's sequential Bonferroni correction was applied to the *p*-value threshold for significance (Holm, 1979). For the smallest (rank 1) *p*-value, the Holm-Bonferroni-corrected threshold for significance was 0.05/(20 − 1 + 1) = 0.0025, whereas for the largest (rank 20) *p*-value, the threshold for significance was 0.05/(20 − 20 + 1) = 0.05.
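The re-scoring rule described above can be made concrete with a short sketch. The sentence labels follow the shorthand used in the text; the function names and the example trials are hypothetical.

```python
# Rhythm pattern of each nursery rhyme sentence, as described in the text.
RHYTHM_PATTERN = {
    "Mary": "trochaic",
    "Simon": "trochaic",
    "St Ives": "iambic",
    "Queen": "iambic",
}


def score_trial(stimulus, response):
    """Return (accuracy, rp) for one trial, each scored 0 or 1.

    accuracy: 1 only if the exact sentence was identified.
    rp:       1 if the response has the same rhythm pattern as the stimulus.
    """
    accuracy = int(stimulus == response)
    rp = int(RHYTHM_PATTERN[stimulus] == RHYTHM_PATTERN[response])
    return accuracy, rp


def mean_scores(trials):
    """Mean Accuracy and RP scores over a list of (stimulus, response) pairs."""
    scored = [score_trial(s, r) for s, r in trials]
    n = len(scored)
    return sum(a for a, _ in scored) / n, sum(rp for _, rp in scored) / n


if __name__ == "__main__":
    # A listener who always hears "Mary" and gives each response once.
    guesses = [("Mary", r) for r in ("Mary", "Simon", "St Ives", "Queen")]
    print(mean_scores(guesses))
```

Spreading responses evenly over the four options yields an Accuracy score of 0.25 and an RP score of 0.5, which are the chance levels used as test values in the two sets of *t*-tests.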

As shown in **Figure 5** (∗), controls and dyslexics always performed significantly above chance when making a binary discrimination of the rhythm of trochaic (T) sentences (with the exception of controls in the Sub-beat AM condition). By contrast, for iambic (I) sentences, dyslexics *never* performed above chance in binary rhythm discrimination, whereas controls performed significantly above chance when listening to Stress-only and Stress + Syllable AM types. Given the presence of clear floor effects for binary rhythm discrimination of iambic sentences, we were unfortunately unable to draw further conclusions regarding group differences for these sentence types (as both controls and dyslexics were performing at chance in many conditions). However, both groups had performed significantly above chance for trochaic sentences when listening to Stress only AMs, Syllable only AMs, Stress + Syllable AMs and Syllable + Sub-beat AMs. Accordingly, we performed repeated measures ANCOVAs on these RP scores for trochaic sentences only.

In the first ANCOVA analysis, we compared group performance for the 2 single AM bands only, taking single AM band (2 levels) as the within-subjects factor, Group (2 levels) as the between-subjects factor, and Age as the covariate. Consistent with the previous Accuracy analysis, there was *no* significant main effect of Group [*F*(1, 44) = 0.16, *p* = 0.69], and no interaction between single AM band and Group [*F*(1, 44) = 0.11, *p* = 0.75]. This suggests that controls and dyslexics did not differ in their ability to use Stress only and Syllable only AM band information to make trochaic-iambic distinctions. We then analyzed double-AM band performance in a similar fashion. This time double-AM band (2 levels, Stress + Syllable, Syllable + Sub-beat) was the within-subjects factor, Group (2 levels) was the between-subjects factor, and Age was the covariate. Unlike the Accuracy analysis, the ANCOVA showed no significant main effect of Group [*F*(1, 44) = 1.90, *p* = 0.17]. There was also no interaction between double-AM band and Group [*F*(1, 44) = 0.17, *p* = 0.68]. Hence dyslexic participants appeared to recognize trochaic RPs based on pairs of AMs as well as controls did.

These results should be interpreted with caution, however. Firstly, only performance for trochaic sentences could be analyzed meaningfully (meaning that half the total dataset could not be analyzed). Secondly, the RP scores computed here reflect participants' rhythm discrimination *indirectly* rather than directly. The RP scores measure the *perceptual confusability* of sentences (i.e., how participants make guesses when they are unsure of the correct sentence identity). Perceptual confusability will depend in large part on the global RPs of the stimuli, but will also include other factors like total duration and perceptual grouping effects, as well as participants' own cognitive strategies. Nevertheless, the data show that perceptual confusability was maximal for trochaic sentences, for both groups.

### **CORRELATIONS BETWEEN AM PERCEPTION, PHONOLOGY, AND LITERACY**

By hypothesis, a perceptual deficit in using AM patterns to discriminate rhythmic sentences should be related to both phonological awareness and reading skills in our participants. Accordingly, we investigated the relationship between participants' sentence identification Accuracy for each AM band or combination, and their performance on memory, reading and phonological tasks. **Table 3** shows the partial correlation matrix between accuracy of performance in the rhythm perception task (by AM type) and participants' memory, reading, and phonological ability, with age and IQ controlled. Correlations were performed with both groups combined, as well as for each group separately. As shown in **Table 3**, there were several significant relationships between AM performance, literacy and phonology. Taking the group as a whole, the conceptually important Stress + Syllable speech AMs were significantly related to phonological awareness (*r* = 0.40, *p* < 0.01), as well as to auditory short-term memory (digit span, *r* = 0.35, *p* < 0.05). Performance with the Syllable + Sub-beat level was also significantly associated with spelling performance, which was not predicted (*r* = 0.32, *p* < 0.05). When considering the dyslexic group alone, the table shows that dyslexics' phonological awareness was significantly related to their sensitivity to Stress + Syllable speech AMs (*r* = 0.52, *p* < 0.01), while the relationship between Syllable AM performance and phonological awareness approached significance (*r* = 0.42, *p* = 0.074). Further, spelling skills were significantly related to Sub-beat AM sensitivity (*r* = 0.48, *p* < 0.05). Dyslexics also showed a significant relationship between their auditory short-term memory skills and their performance in the two combined AM conditions (*r* = 0.52, *p* < 0.05 for Stress + Syllable; *r* = 0.55, *p* < 0.05 for Syllable + Sub-beat). This may indicate that dyslexics' ability to use multiple patterns of temporal information to recognize speech rhythm in our experimental paradigm was constrained by their lower short-term memory capacity in comparison to controls. When considered as a group, controls showed no significant relationships between performance in the AM RP recognition task, phonology and reading, although there was a trend toward a correlation between Sub-beat AM sensitivity and spelling (*r* = 0.38, *p* = 0.07). Overall, therefore, the partial correlations show that the perceptual deficit in using AM patterns to detect speech rhythm was related to phonological awareness for the dyslexic participants only.

**Table 3 | Pearson's r partial correlation values between accuracy of performance in rhythm perception (by AM type), and general ability, literacy and phonology measures.**

*For each cell, correlations over both groups are shown on the top left, correlations for controls only are shown on the middle right, and correlations for dyslexics only are shown on the bottom right. Age and IQ are controlled in all the correlations.*

*\*p < 0.05; \*\*p < 0.01; <sup>\$</sup>p = 0.07; <sup>&</sup>p = 0.074; <sup>∧</sup>p = 0.096.*
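For readers unfamiliar with partial correlations, the following sketch shows one standard way to compute a Pearson correlation between two measures while controlling for two covariates (here standing in for age and IQ), using the recursive first-order formula. It is an illustration only: the function names are hypothetical and the source does not state which software or computational route was used.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_first_order(r_xy, r_xz, r_yz):
    """Correlation of x and y with a single covariate z partialled out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr_two_covariates(x, y, z, w):
    """Correlation of x and y controlling for z and w (e.g., age and IQ).

    Applies the first-order formula twice: first partial out z from every
    pair, then partial out w from the resulting first-order correlations.
    """
    r_xy_z = partial_first_order(pearson(x, y), pearson(x, z), pearson(y, z))
    r_xw_z = partial_first_order(pearson(x, w), pearson(x, z), pearson(w, z))
    r_yw_z = partial_first_order(pearson(y, w), pearson(y, z), pearson(w, z))
    return partial_first_order(r_xy_z, r_xw_z, r_yw_z)
```

The result equals the correlation between the residuals of the two measures after each has been regressed on both covariates, so any association that is attributable to age or IQ alone is removed.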

### **DISCUSSION AND CONCLUSION**

Here, we tested the hypothesis that perceptual difficulties in processing the AM patterns in speech that yield speech rhythm are associated with the development of impaired phonological representations for words by dyslexic individuals. The development of impaired phonological representations of speech is the cognitive hallmark of dyslexia across languages (Snowling, 2000; Ziegler and Goswami, 2005; Goswami, 2011). We tested the sensitivity of adults with dyslexia to AM patterning yielding speech rhythm for several different AM bands and band combinations below 20 Hz that are present within the amplitude envelope of speech. We found that dyslexic participants performed significantly more poorly than control adults when they were required to combine Syllable-rate AMs with AMs at other rates (Stress + Syllable or Syllable + Sub-beat). However, the dyslexic participants performed on par with controls when asked to utilize the temporal information at a single AM rate only (Stress only, Syllable only, or Sub-beat only). Accordingly, we conclude that dyslexics' difficulties with AM perception appear to occur across *more than one* speech timescale (particularly involving the Syllable rate). Moreover, as predicted by the temporal sampling framework, a perceptual deficit in utilizing AM patterns in speech is related to phonological development in dyslexia.

A deficit in Syllable-rate *combination* or *synchronization* with other rates would support the findings of Leong and Goswami (2014), in which the same group of adult dyslexics tested here showed differences in their *phase* of rhythmic entrainment at the Syllable rate in a rhythmic tapping task to nursery rhyme targets. A difference in Syllable *phase* of entrainment suggests that dyslexics have temporal differences in their processing of Syllable-rate information (e.g., they may perceive P-centers as occurring earlier in a speech sound as compared to controls). Here, participants with dyslexia were significantly poorer at recognizing the target nursery rhymes when they had to combine Syllable AM cues with prosodic stress AM cues (Stress + Syllable).

In fact, a circular-linear correlation analysis of the two datasets (Leong and Goswami, 2014 and the current study) revealed that there was a strong correlation between participants' Syllable AM phase of tapping in the rhythmic tapping (entrainment) task, and their sensitivity to Stress + Syllable AMs in the current task (*r* = 0.55, *p* < 0.01). An earlier Syllable AM phase of rhythmic tapping in Leong and Goswami (2014) was associated with poorer perception of Stress + Syllable AMs in the current study. No other AM band in the current study yielded significant correlations with tapping phase in the prior study. Others have argued that the perception and production of rhythm both rely on similar cognitive and neural mechanisms, such as the entrainment of neuronal oscillatory activity (Martin, 1972; Liberman and Mattingly, 1985; Kotz and Schwartze, 2010). In the current context, it is noteworthy that the common locus of dyslexic deficit across perception and production tasks involved the Syllable rate of temporal processing.
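A circular-linear correlation relates an angular variable (such as tapping phase) to a linear one (such as an accuracy score). The sketch below implements one commonly used coefficient, which combines the linear variable's correlations with the sine and cosine of the angle. Whether this exact formulation was used in the analysis reported above is an assumption, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def circular_linear_corr(phases, values):
    """Circular-linear correlation between angles (radians) and linear values.

    r = sqrt((r_xc^2 + r_xs^2 - 2 r_xc r_xs r_cs) / (1 - r_cs^2)), where
    r_xc and r_xs are the correlations of the values with cos(phase) and
    sin(phase), and r_cs is the correlation between cos(phase) and sin(phase).
    The coefficient is non-negative by construction.
    """
    cosines = [math.cos(p) for p in phases]
    sines = [math.sin(p) for p in phases]
    r_xc = pearson(values, cosines)
    r_xs = pearson(values, sines)
    r_cs = pearson(cosines, sines)
    numerator = r_xc ** 2 + r_xs ** 2 - 2.0 * r_xc * r_xs * r_cs
    return math.sqrt(max(0.0, numerator / (1.0 - r_cs ** 2)))
```

Because the coefficient carries no sign, the direction of the association (here, earlier phase going with poorer perception) has to be read from the data themselves, as the text does.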

Utilizing younger participants, Power et al. (2013) have shown in a rhythmic speech processing task that children with dyslexia also have a different preferred phase of entrainment in the *delta* band (2 Hz), both in response to auditory speech alone, and when speech information is audio-visual. The 'temporal misalignment' of both stress- and syllable-rate information in dyslexia found by Power et al. (2013) and the current study could explain why individuals with dyslexia develop phonological representations for words that are impaired (or specified differently) in comparison to those of unaffected individuals. If temporal processing of slower-rate information in speech is impaired, for example because oscillatory phase alignment is inaccurate, then this would affect the development of the entire mental lexicon of word forms, not simply of syllable-level and prosodic information. If syllable stress representation and syllabic parsing are different in dyslexia because of a perceptual deficit in utilizing AM patterns in speech, this would also affect phonetic-level information. Phonemes are perceived more accurately when they are in stressed syllables (Mehta and Cutler, 1988). Over the course of development, if dyslexic children consistently fail to capture rich, high-dimensional representations of the temporal patterns that occur on multiple timescales in speech (e.g., concurrently encoding Stress patterns, Syllable patterns and Phoneme patterns into an integrated representation of a word), this would yield the impoverished or atypical phonological representations that are developed by children with dyslexia across languages.

At first glance, our data appear to be inconsistent with the results of previous AM perception studies as summarized in the Introduction. These non-speech studies generally indicated that individuals with dyslexia had poorer AM perception at the 4 Hz rate (Syllable AM). Here, we find no differences in performance between controls and dyslexics when making rhythm judgments on the basis of the Syllable AM (4 Hz) only. However, it should be noted that the dependent variable being assessed in the current study is different from that of psychophysical AM studies. Whereas AM studies assess modulation detection *thresholds* based on just noticeable differences in modulation depth or rate (e.g., Lorenzi et al., 2000; Rocheron et al., 2002), here we assess nursery rhyme recognition using real-life speech AMs that contain strong (and likely supra-threshold) modulation patterns. As such, it is not surprising that no group differences were observed for our single AM rate stimuli. It is possible that significant group differences could have been observed at single AM rates if we had used sentences with weaker modulation patterns, such as whispered or mumbled speech. However, we *did* observe a significant difference in dyslexics' ability to *combine or integrate* speech modulation patterns across the Stress and Syllable rates, which is consistent with dyslexics' poorer speech perception performance in vocoder studies (e.g., Lorenzi et al., 2000; Johnson et al., 2011; Nittrouer and Lowenstein, 2013). This difference cannot be attributed to a general lack of attention or engagement by dyslexic participants, since they performed as well as controls with the single AM band stimuli. Rather, dyslexics appear to have a particular difficulty in making use of modulation information that is patterned at more than one timescale, here when Syllable-rate information has to be temporally synchronized with Stress-rate speech information or Sub-beat information. 
However, as we did not include paired AM combinations that did *not* involve the Syllable AM rate (e.g., Stress + Phoneme), we are not able to determine whether this difficulty is specific to Syllable AM combinations only, or whether it would also occur for other combinations of speech AMs.

It should also be observed that our participants found the rhythm judgment task very difficult. This high level of difficulty stemmed from the fact that the sentences were (deliberately) unintelligible, forcing our participants to rely solely on the acoustic modulations in the stimuli to perform rhythm judgments, without recourse to lexical factors. Consequently, accuracy scores for both controls and dyslexics (although significantly above chance) were relatively low (below 50%). In future studies, the issue of task difficulty may be ameliorated by using a tone vocoder with more than 1 spectral channel (e.g., 3 or 4 channels), which would have the effect of increasing speech intelligibility. However, increasing the intelligibility of the stimuli would also introduce a new confound: participants would now be able to use their lexical knowledge to augment their perceptual judgments of speech rhythm. Nonetheless, this trade-off might produce stronger effects. Lexical "boot-strapping" effects could be reduced by using semantically unpredictable sentences (following Johnson et al., 2011).

According to the temporal sampling framework (Goswami, 2011), the combination impairment for Stress + Syllable rate AMs found here should affect speech perception even when listening to *clear (i.e., fully intelligible) speech,* which has strong modulation patterns that are above the threshold for detection. Interestingly, this was exactly what Lorenzi et al. (2000) found in their study. They reported that dyslexic children performed significantly more poorly than adults and control children even when listening to clear, unprocessed (not-vocoded) VCV syllables (these syllables will contain significant Syllable-rate modulation, but not Stress-rate modulation). This controversial result might possibly be explained by other factors like memory or attention. Nonetheless, data like these suggest that speech AM perception in dyslexia clearly requires more investigation. Current data suggest that individuals with dyslexia are less sensitive to small changes in modulation depth and rate, particularly around the syllable and stress rates in speech. Future studies should explore how dyslexics' difficulties with processing slow modulations affect their ability to integrate and synchronize slow-varying stress and syllable information with more quickly-varying phoneme-rate information in speech. These perceptual difficulties could be one source of the impaired or atypical phonological representations stored in the mental lexicon of word forms by dyslexic individuals.

Finally, we note that, given recent proposals by Poeppel and colleagues regarding neural oscillatory phase-locking to speech modulation patterns (e.g., Ghitza, 2011; Giraud and Poeppel, 2012), the perceptual difficulties that we observe here could be underpinned by impaired phase alignment and cross-frequency phase synchronization between different neuronal oscillatory rates. For example, dyslexics could have poorer neuronal oscillatory synchronization between theta oscillations (syllable rate) and delta (stress rate) or gamma (phoneme rate) oscillations in the cortex. Similarly, the neural interplay between theta (syllable rate) and alpha (8–13 Hz, similar to the sub-beat rate here) oscillations during speech comprehension might be atypical in dyslexia as well (Obleser and Weisz, 2012). To date, such *cross-frequency neural synchronization* has not been studied in dyslexia (although see Leong and Goswami, 2014, for an assessment of cross-frequency *AM* synchronization in dyslexics' speech). Such studies could be very informative in the quest to identify cross-linguistic perceptual and neural deficits underpinning cognitive markers such as impaired phonology in developmental dyslexia.

## **ACKNOWLEDGMENTS**

This research was funded by a Harold Hyam Wingate Research Scholarship to Victoria Leong and by a Medical Research Council grant G0902375 to Usha Goswami.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00096/abstract

**Audio 1 | Stress-only AM (MFB, Trochaic).**

**Audio 2 | Syllable-only AM (MFB, Trochaic).**

**Audio 3 | Sub-beat only AM (MFB, Trochaic).**

**Audio 4 | Stress + Syllable AM (MFB, Trochaic).**

**Audio 5 | Syllable + Sub-beat AM (MFB, Trochaic).**

### **REFERENCES**

Snowling, M. J. (2000). *Dyslexia*, *2nd Edn.* Oxford: Blackwell Publishers.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 December 2013; accepted: 08 February 2014; published online: 24 February 2014.*

*Citation: Leong V and Goswami U (2014) Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia. Front. Hum. Neurosci. 8:96. doi: 10.3389/fnhum.2014.00096*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Leong and Goswami. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The relationship of phonological ability, speech perception, and auditory perception in adults with dyslexia

#### *Jeremy M. Law<sup>1</sup>\*, Maaike Vandermosten<sup>1,2</sup>, Pol Ghesquiere<sup>1</sup> and Jan Wouters<sup>2</sup>*

*<sup>1</sup> Faculty of Psychology and Educational Sciences, Parenting and Special Education Research Unit, KU Leuven, Leuven, Belgium <sup>2</sup> Laboratory for Experimental ORL, Department of Neuroscience, KU Leuven, Leuven, Belgium*

### *Edited by:*

*Peter F. De Jong, University of Amsterdam, Netherlands*

### *Reviewed by:*

*Elise De Bree, University of Amsterdam UvA, Netherlands Anne Castles, Macquarie University, Australia*

### *\*Correspondence:*

*Jeremy M. Law, Faculty of Psychology and Educational Sciences, Parenting and Special Education Research Unit, KU Leuven, Leopold Vanderkelenstraat 32, PO Box 3765, 3000 Leuven, Belgium e-mail: jeremy.law@ppw.kuleuven.be*

This study investigated whether auditory, speech perception, and phonological skills are tightly interrelated or independently contributing to reading. We assessed each of these three skills in 36 adults with a past diagnosis of dyslexia and 54 matched normal reading adults. Phonological skills were tested by the typical threefold tasks, i.e., rapid automatic naming, verbal short-term memory and phonological awareness. Dynamic auditory processing skills were assessed by means of a frequency modulation (FM) task and an amplitude rise time (RT) task; an intensity discrimination task (ID) was included as a non-dynamic control task. Speech perception was assessed by means of sentences and words-in-noise tasks. Group analyses revealed significant group differences in auditory tasks (i.e., RT and ID) and in phonological processing measures, yet no differences were found for speech perception. In addition, performance on RT discrimination correlated with reading but this relation was mediated by phonological processing and not by speech-in-noise. Finally, inspection of the individual scores revealed that the dyslexic readers showed an increased proportion of deviant subjects on the slow-dynamic auditory and phonological tasks, yet individual dyslexic readers did not display a clear pattern of deficiencies across the processing skills. Although our results support phonological and slow-rate dynamic auditory deficits which relate to literacy, they suggest that at the individual level, problems in reading and writing cannot be explained by the cascading auditory theory. Instead, dyslexic adults seem to vary considerably in the extent to which each of the auditory and phonological factors are expressed and interact with environmental and higher-order cognitive influences.

**Keywords: dyslexia, literacy, phonological processing, speech perception, auditory processing, amplitude rise time, frequency modulation**

### **INTRODUCTION**

Dyslexia is a neurological condition affecting 5–10% of the population. This specific learning disability impacts an individual's ability in learning to read and write despite adequate intelligence, education, and remediation (Vellutino et al., 2004). It has been well established in the literature that the major causes of the expressed literacy problems lie within a deficit in the phonological domain, specifically in the quality and accuracy of phonological representations (Snowling, 2000). In this paper the auditory temporal processing deficit theory of dyslexia, and its cascading effects on speech and phonological processing, will be examined. To this end, measures of slow-rate modulation and speech perception will be assessed along with phonological and literacy measures in a population of university level dyslexic and non-dyslexic adult readers.

A vital part in the development of phonological representations is the awareness of how speech sounds correspond to a written symbol. Findings of the past few decades have begun to suggest the existence of an underlying deficit in low-level auditory temporal processing within the dyslexic population (Farmer and Klein, 1995; Habib, 2000; Boets et al., 2006). Thus, if dyslexic readers perceive speech or related auditory cues inaccurately, the mapping of speech sounds onto their corresponding symbols will be problematic.

Beginning with Tallal's (1980) study of temporal order judgment of children with specific language impairments, research has explored the idea that the primary deficit of dyslexics could lie in deviant auditory processing skills. Early research related the interpretation of "temporal processing" restrictively to rapid succession or short durational cues (e.g., Tallal, 1980). However, recent studies have demonstrated that the deficits observed in dyslexic readers are not merely limited to the processing of short, rapidly presented stimuli, but also extend to slow-rate dynamic acoustic stimuli such as frequency modulations (FMs) and sound rise time discrimination (RT). Such a deficit has been theorized to produce a cascade ultimately disrupting an individual's reading and spelling abilities. If an individual were to be affected by poor auditory processing of slow-rate modulations (between 2 and 20 Hz), it would be expected that speech perception would ultimately be affected, since the identification of phonemes and syllables depends on changes in the amplitude that occur around 50 ms (i.e., 20 Hz) and 500 ms (i.e., 2 Hz), respectively. Such speech perception difficulties could impact the segmentation of aspects of the speech signal into smaller elements, thus hampering the development of phonological representations and ultimately disrupting the creation of accurate mapping schemes between speech sounds and their corresponding graphemes (Poelmans et al., 2011). Ultimately, these poor phoneme-grapheme representations will be expressed as poor coding and decoding abilities impacting word reading and spelling.

Slow-rate auditory modulations can be assessed by two different tasks: FM detection and rise time (RT) discrimination. FM detection assesses the individual's ability to detect frequency fluctuations in a carrier frequency at a certain modulation rate. Such FMs could be said to represent the fine structure found within the envelopes of the speech waveform (Rosen, 1992). Research on FM detection in dyslexics and controls has found significant group differences, with dyslexics showing reduced sensitivity compared to controls, thus demonstrating the FM task's ability to differentiate adult, school-aged, and pre-reading dyslexics from normal readers (Witton et al., 1998, 2002; Ramus et al., 2003; Boets et al., 2007). Yet, of the 12 papers examining FM perception in a review study by Hämäläinen et al. (2013), three were not able to replicate these group differences (Halliday and Bishop, 2006; Stoodley et al., 2006; White et al., 2006).

In addition to findings of group differences, a study by Witton et al. (1998) found phonological decoding skills of both dyslexics and controls to be significantly correlated with FM sensitivity at 2 and 40 Hz. The review paper by Hämäläinen et al. (2013) noted 8 separate studies that reported correlations between FM detection thresholds and reading and/or spelling skills. Yet, 3 studies were unable to replicate these results (Van Ingelghem et al., 2005; Heath et al., 2006; Dawes et al., 2009).

An alternative measure of auditory processing that taps into aspects of slow-rate dynamic processing mechanisms and that has been indicated to be a sensitive measure in discriminating between populations of dyslexic and normal readers is rise time discrimination (RT). Rise time, in comparison with FM tasks, measures the larger grain size of the speech waveform, focusing specifically on the speech envelope (Rosen, 1992). Specifically, the RT task assesses an individual's ability to detect subtle differences in the rate of change of an amplitude envelope. The perception of such cues is used in the segmentation of the speech signal into its base parts, such as syllables or onsets and rhymes, which is necessary for speech perception (Goswami et al., 2010). Detection of such cues has been shown to be significantly associated with reading, writing and phonological skills in an adult population (Hämäläinen et al., 2005). Goswami et al. (2002) demonstrated that 25% of unique variance in reading and spelling in children could be predicted by individual differences in rise time sensitivity, with IQ and age being controlled for. Findings demonstrating RT's relation to reading have also remained consistent across different orthographies (Goswami et al., 2011). When comparing persons with dyslexia to typical readers, child studies have demonstrated consistent group differences in RT perception across various measurement techniques (for a review see Hämäläinen et al., 2013; note the exception of Hämäläinen et al., 2009). On the other hand, adult studies have not been so clear. Despite some adult studies showing significantly poorer performance on RT tasks in adults with dyslexia (Hämäläinen et al., 2005; Thomson et al., 2006; Corriveau et al., 2007), findings vary between the different measurement techniques employed (see Thomson et al., 2006; Pasquini et al., 2007).
Traditionally, pure tone carrier signals are modulated in RT tasks, but such signals lack important frequencies of real speech. Hence, they do not activate as broad a frequency region in the auditory system as speech-weighted noise signals do. In an effort to mimic the demand of real speech within the RT detection measure, Poelmans et al. (2011) utilized a single ramp rise time discrimination task that consists of a speech-weighted noise with a linear amplitude rise time. They showed that the application of a speech-weighted noise signal resulted in reliable performance in children and did not produce any ceiling or floor effects, which differed from pilot studies of pure tone carrier signals.

However, not all auditory processing aspects seem to be impaired in dyslexic readers. In contrast to slow-rate dynamic auditory processing (RT, FM), intensity discrimination (ID) does not display group differences between typical and dyslexic readers (for a review see Hämäläinen et al., 2013). This suggests that related task demands, attention and cognitive aspects are not the driving factors of the observed auditory problems, since they are equal across RT, FM, and ID tasks. In addition, as the RT measure includes changes of intensity over time, the lack of group differences on the ID tasks suggests that poorer performance on the RT task is not a reflection of difficulties in ID ability but rather of difficulties in processing changes in intensity over time.

An understanding of slow-rate dynamic modulations such as RT and FM is important due to their prevalence in the speech signal, appearing at various grain sizes of phonological information ranging from intonation, onset and rhyme to the phoneme. If an individual has a deficit in processing these modulations, it is believed that it would be expressed in their ability to perceive speech.

Most often speech sound processing of dyslexics is assessed through the use of a categorical perception measure. Studies utilizing categorical perception tasks have demonstrated that subjects with dyslexia possess a reduced capacity for perception and categorization of phonemes (for a review see Vandermosten et al., 2010, 2011). However, results from such tasks are often restricted to a subset of the dyslexic population sampled (Manis et al., 1997; Adlard and Hazan, 1998) or to a specific speech condition or task (Maassen et al., 2001; Blomert and Mitterer, 2004). Typically, categorical perception tasks utilize optimal listening conditions. Such conditions allow for compensation of specific deficits in phoneme identification (Manis et al., 1997; Assmann and Summerfield, 2004; Ziegler et al., 2009). Although speech-in-noise tasks are influenced by higher-order cognitive processes such as lexical and phonotactic knowledge, they provide a more ecological and natural measure of speech sound processing than categorical perception. By presenting speech stimuli in the presence of a masking noise, a participant's ability to identify and comprehend real speech sounds under varying noise-masking scenarios is assessed. The ability to identify speech-in-noise requires the individual to separate out the background noise from the target speech signal. This isolation allows the individual to produce precise representations of the rapidly evolving spectral information. It has been shown that, although all listeners demonstrate some reduced capacity for perception under noisy background conditions, dyslexic children (Snowling et al., 1986; Wible et al., 2002; Bradlow et al., 2003; Ziegler et al., 2005, 2009; Boets et al., 2011) and dyslexic adults (Dole et al., 2012) exhibit pronounced difficulty with this task while often not demonstrating any impairment of speech perception in silent conditions (Brady et al., 1983; Bradlow et al., 2003). Yet, Hazan et al. (2009) were not able to replicate these findings in an adult population.

Although studies have independently demonstrated deficits in slow-rate dynamic processing and in speech-in-noise perception in individuals with dyslexia, only two studies have assessed both of these measures of signal processing in the same population (Boets et al., 2011; Poelmans et al., 2011). Boets et al. retrospectively explored this relationship in a population of preschool children who later developed dyslexia and showed that these children were already impaired in slow-rate FM sensitivity and speech perception prior to reading instruction. These pre-reading measures were also found to relate to each other and uniquely predicted later growth in reading. A more recent study by Poelmans et al. (2011), which followed up the same children studied by Boets et al. when they were in 6th grade, showed no clear evidence supporting relations between slow-rate dynamic auditory processing and speech perception itself. Given that this correlation was present at an earlier age (Boets et al., 2011), this might suggest that the link between auditory and speech perception skills disappears over the course of development. However, more validation in adult participants is needed.

Although studies such as that of Boets and colleagues have found support for the auditory temporal processing deficit theory of dyslexia, the theory is not without its controversy. Criticism has arisen from the heterogeneity of the found deficits. It has been suggested that differences between group means are a reflection of a small number of poor performing dyslexic subjects. Ramus et al. (2003) examined an adult population and noted that auditory deficits were limited to only 39% of the subjects with dyslexia and that auditory processing had only a weak correlation with phonology and reading. Other criticisms have suggested that general difficulties with task completion might underlie the poor performance of subjects with dyslexia in psychophysical studies and lead researchers to misinterpret non-sensory difficulties as sensory ones (Stuart et al., 2001; Roach et al., 2004).

Our study will investigate the different levels of processing skills (i.e., auditory, speech-in-noise perception, and phonological processing) in one and the same sample of dyslexic and normal reading adults. So far, such an integrative approach has not been applied to adults, despite being vital to understand the interrelations between auditory processing, speech perception, phonological processing, and reading (problems). Furthermore, in contrast to previous studies, our study will not only investigate the interrelation between these skills and compare performance between groups, but we will also examine the individual level deviance scores.

Given that dyslexia is a disability measured and defined as deviant performance, research should reflect this by demonstrating a substantial number of individuals whose performance significantly differs from normal performance (Ramus et al., 2003; Heath et al., 2006; Ziegler et al., 2008; Hazan et al., 2009). As noted in Hazan et al. (2009) group comparisons could potentially mask significant individual differences or highlight differences which may not essentially be deviant, hence it is not sufficient in dyslexia research to merely demonstrate significant group differences without investigating the individual deviance scores. In addition, according to the auditory deficit theory, dyslexic readers should show consistent deficiencies across each level of processing, otherwise phonological impairments are presumably not secondary to speech and lower-level auditory problems.

Given that performance in adults is more prone to compensational mechanisms, the slow-rate dynamic tasks (FM and RT) will be assessed together with a control measure for attention and task complexity (ID). Although the inclusion of such a well-matched control task helps in distinguishing effects of task demands from true effects, so far no study has included one as a control within all levels of statistical analyses. A few studies have included a control variable for attention and task-related demands in group matching (Hämäläinen et al., 2005; Thomson et al., 2006; Pasquini et al., 2007), yet this does not prevent individual variation within groups from playing a significant role in the relationships between psychophysical, phonological, and literacy measures.

In sum, this study will address three main questions: (i) Do adults with dyslexia demonstrate deficits in auditory processing, speech perception, and phonological abilities at the group level and at the individual level? (ii) Does a close relationship exist between the auditory processing, speech perception, and phonological skills or do they rather contribute independently to reading skills? (iii) Based on individual deviance analyses, do the same participants display deviant scores across the three skills (i.e., auditory processing, speech perception, and phonological processing)?

To achieve this, auditory processing skills will be assessed by two slow-rate modulation tasks, i.e., RT and FM, and by a control task, i.e., ID. Speech perception will be assessed by words-in-noise and sentences-in-noise tasks. Lastly, phonological processing will be assessed through the classical threefold of phonological awareness (PA), verbal short-term memory (VSTM), and rapid automatic naming (RAN) tasks.

## **MATERIALS AND METHODS**

## **PARTICIPANTS**

A total of 90 undergraduate students were recruited for this study, 54 (36 female and 18 male) non-dyslexic and 36 (26 female and 10 male) participants with dyslexia. In order to participate, the dyslexic students needed to have a diagnosis completed by a registered and qualified clinical psychologist in secondary school or earlier and had to be registered at the office of Student Development & Services. Because the adults with dyslexia were selected from a university population, a higher level of reading achievement is expected than in a general sample of individuals of the same age, owing to the selectivity of universities. This is reflected in some dyslexic students' normal reading and spelling scores, as seen in **Table 1**. Based on their higher than expected literacy scores, these participants may be considered "compensated" dyslexics. Research has shown that strengths in cognitive abilities, such as the use of contextual cues

### **Table 1 | Participant characteristics.**


*All p-values are Bonferroni adjusted for multiple comparisons. APM, Raven advanced progressive matrices; WRAT-III, Wide Range Achievement Test III. aScores are standardized (M* <sup>=</sup> *100, SD* <sup>=</sup> *15).*

(Frith and Snowling, 1983; Nation and Snowling, 1998), semantic knowledge (Snowling et al., 2000), visual memory (Campbell and Butterworth, 1985), and morphological knowledge (Elbro and Arnbak, 1996) help this group of individuals with dyslexia to minimize the expression of their reading difficulties.

The non-dyslexic population comprised students who had no documentation or history of reading difficulty and whose word reading scores did not fall in the bottom 5% of the WRAT norms (Wilkinson, 1993). Recruitment of the dyslexic population for the study was made through the University's Student Services, while the control population was recruited through class announcements and posters placed throughout each campus.

All participants were at least 18 years of age and attended one of three universities in Ontario, Canada. All participants were native English speakers without a history of brain damage, language problems, psychiatric symptoms or visual problems which could not be corrected for by a corrective lens. Additionally all participants had adequate audiometric pure-tone hearing thresholds for the test ear (i.e., 25 dB HL or less on 0.25–8.0 kHz) and adequate non-verbal IQ defined by a standard score greater than 85 on Raven's advanced progressive matrices. **Table 1** shows participant characteristics for the two groups. Groups did not differ in age, gender, and non-verbal IQ.

### **TASKS**

### *Literacy*

Literacy was assessed by the WRAT-III reading and spelling subtests (Wilkinson, 1993). The reading subtest required the subject to read aloud a list of 42 words. The subject received a single point for each correctly pronounced word to a maximum score of 42. The spelling subtest required the subject to accurately spell a series of dictated words. The words were presented orally by the test administrator preceding and following a sentence containing the target word. The test was scored by giving one point for each correctly spelled word to a maximum score of 40 points.

### *Phonological skills*

Each domain of one's phonological skills, as represented in Wagner and Torgesen (1987), was individually tested.

*Phonological awareness* (PA) was assessed through the use of the Spoonerism subtest from the Phonological Assessment Battery (PhAB) (Frederickson et al., 1997). Spoonerism tasks have been demonstrated to be able to significantly differentiate between an adult dyslexic population and control groups (Ramus et al., 2003). This test of PA targeted onset-rhyme awareness and required phoneme manipulation and deletion. The task involved two parts. The first required the participant to replace the first sound of a word with a new sound (e.g., cot with a /g/ gives "got"). In part two, word pairs were orally presented to the participant, who was then requested to transpose the onsets of the two words. For example, "plane crash" becomes "crane plash" and "King John" becomes "Jing Kon." Rate scores, measured in number of correct items per second, were calculated as the total correct responses divided by the total time to complete the task. Because ceiling level was reached within the control group, accuracy was not separately evaluated.

*Verbal short-term memory* was assessed by The Number Repetition (digit span forward) subtest from The Clinical Evaluation of Language Fundamentals 4th edition (CELF-4) (Semel et al., 2003). Digit span forward required the immediate serial recall of an orally presented series of digits. List length was incrementally increased from two to nine digits and presented orally at a rate of one digit per second. The test score was calculated as the total number of correctly recalled lists with a maximum score of 16.

Verbal short-term memory was also assessed by the non-word recall subtest from the Working Memory Test Battery (WMTB) (Pickering and Gathercole, 2001). For this task sequences of single syllable non-sense words were presented orally to the participants. Each participant was requested to repeat the sequence in the correct order. The list length was incrementally increased, from one to six words in length. Six trials were available for presentation at each list length. The task was discontinued when three errors were made in a given list length. The test score was calculated as the total number of correctly recalled lists with a maximum score of 36.

*Rapid Automatic Naming* (RAN) was assessed through two naming tasks. A color-naming test adapted from Boets et al. (2006) was selected. Five colors (black, yellow, red, green, and blue) were presented in 5 rows containing 10 color stimuli each. In addition, the object-naming subtest from The Phonological Assessment Battery (PhAB) (Frederickson et al., 1997) was used. Five line drawings of common objects (desk, ball, door, hat, box) were presented in 5 rows each containing 10 items. For both tasks participants were instructed to name aloud each of the objects or colors as quickly and as accurately as possible. A score of the number of symbols named per second was calculated.

### *Auditory processing and speech perception experimental setup*

All tasks were conducted on campus and were administered individually in a private room, with minimal background noise and distraction. All auditory and speech perception tasks were performed on a Dell Latitude D510 and controlled by APEX software (Laneau et al., 2005; Francart et al., 2008). Speech perception and auditory processing stimuli were presented through Sennheiser HDA 200 headphones to the right ear. Auditory processing procedure and tasks were adapted from those used and described by Poelmans et al. (2011).

### *Auditory processing tasks*

All auditory processing task thresholds were estimated by means of a one-up, two-down adaptive staircase procedure which is designed to target a threshold corresponding to 70.7% correct responses (Levitt, 1971). Tasks were presented within a three-alternative forced-choice, "odd-one-out" paradigm. In each trial three stimuli were presented requiring the participant to determine which sound differed from the others. An inter-stimulus interval of 350 ms was used. All tasks were terminated after ten reversals. Thresholds were the arithmetic mean of the last 4 reversals. Each participant completed two threshold runs of each task.
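The adaptive rule can be illustrated with a minimal sketch. Under a two-down, one-up rule the level becomes harder only after two consecutive correct responses, so the track converges on the level where p² = 0.5, i.e., p ≈ 0.707 (Levitt, 1971). The function name `run_staircase`, the fixed additive step, and the noise-free simulated listener below are assumptions made for illustration only; they are not the APEX implementation used in the study, whose step sizes differed per task.

```python
def run_staircase(respond, start_level, step, max_reversals=10, n_average=4):
    """Two-down, one-up adaptive track (illustrative sketch).

    respond(level) returns True when the (simulated) listener answers correctly.
    The track stops after `max_reversals` reversals and returns the arithmetic
    mean of the levels at the last `n_average` reversals.
    """
    level = start_level
    correct_in_a_row = 0
    last_direction = 0           # -1 = task getting harder, +1 = getting easier
    reversals = []
    while len(reversals) < max_reversals:
        if respond(level):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue         # two consecutive correct answers are needed
            correct_in_a_row = 0
            direction = -1
        else:
            correct_in_a_row = 0
            direction = +1
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)          # the track changed direction here
        last_direction = direction
        level = max(level + direction * step, step)  # assumed floor of one step
    return sum(reversals[-n_average:]) / n_average


# Hypothetical noise-free listener: correct whenever the level is at least 10.
threshold = run_staircase(lambda level: level >= 10, start_level=20, step=1)
print(threshold)  # prints 9.5
```

With a real listener, responses are probabilistic (including a one-in-three guessing rate in a three-alternative task), so the reversal levels scatter around the 70.7% point rather than alternating between two values as in this idealized run.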

The *FM-detection task* required participants to detect a 2 Hz sinusoidal FM of a 1 kHz carrier tone with varying modulation depth. The reference stimulus was a pure tone of 1 kHz. Modulation depth decreased by a factor of 1.2 from 100 to 11 Hz; below this point, modulation depth decreased in steps of 1 Hz. The length of both the reference and the target stimulus was 1000 ms including 50 ms cosine-gated onset and offset. The detection threshold was defined as the minimum depth of frequency deviation (in Hz) required to detect the modulation.
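The two-stage step rule for modulation depth can be sketched as a list of candidate depths. The function name `fm_depth_ladder`, the snapping of the track to exactly 11 Hz at the switch point, and the lowest depth of 1 Hz are assumptions for illustration; the text specifies only the factor of 1.2, the range of 100 to 11 Hz, and the subsequent 1 Hz steps.

```python
def fm_depth_ladder(start_hz=100.0, factor=1.2, switch_hz=11.0, linear_step_hz=1.0):
    """Return candidate modulation depths (Hz), from easiest to hardest."""
    depths = []
    depth = start_hz
    # Geometric stage: divide by the factor while the depth is above the switch point.
    while depth > switch_hz:
        depths.append(depth)
        depth /= factor
    # Linear stage: assumed to start at the switch point and descend in fixed steps.
    depth = switch_hz
    while depth >= linear_step_hz:
        depths.append(depth)
        depth -= linear_step_hz
    return depths


depths = fm_depth_ladder()
print([round(d, 1) for d in depths])
```

Large proportional steps move the track quickly through clearly audible depths, while the fixed 1 Hz steps give finer resolution near typical thresholds.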

The *sound rise time discrimination* task used a speech-weighted noise with linear amplitude rise times. Rise times varied logarithmically between 15 and 500 ms in 41 steps. The total duration of the stimulus was fixed at 800 ms, including a linear fall time of 75 ms. The stimulus with a 15 ms rise time was used as the reference stimulus for each trial. Discrimination thresholds were defined as the minimal difference in rise time required to discriminate between the reference and target stimulus.

*Intensity discrimination* task was identical to the FM and RT discrimination task in its presentation and procedure. Stimuli, of an 800 ms duration, consisting of a speech-weighted noise and a linear rise time and fall time of 75 ms were used. The stimulus of 70 dB SPL was utilized as a reference stimulus for each trial. Intensity was varied linearly between 70 and 80 dB SPL in 40 steps of 0.25 dB SPL each. Discrimination thresholds were defined as the minimal intensity difference (in dB SPL) required to discriminate between the reference and the target stimulus.

### *Speech-in-noise perception*

Speech-in-noise intelligibility was assessed for both words and sentences. During testing, the speech level was varied while the background noise level was fixed at 70 dB SPL. To assess the association of RT and FM discrimination with speech perception, two speech-in-noise tasks were administered. The first used words in noise, which would require less reliance on rise time processing and more on FM; the second used sentences in noise, which would rely more heavily on RT discrimination to accurately decompose and segment the sentence into finer-grained elements for processing.

*Words-in-noise* perception was assessed with The Computer Aided Speech Perception Assessment (CASPA) developed by Boothroyd (2006) (for application see McCreery et al., 2010). A random selection of 3 lists of 10 CVC words were presented orally by a female speaker against a competing speech weighted noise at varying signal-to-noise ratios (SNR) (−5, −10, and −13 dB). Each list contained a single occurrence of the same set of 30 phonemes (20 consonants and 10 vowels). A practice list of 0 dB SNR was first administered to the participant. Participants were instructed to repeat each target word after presentation; if the participant was unable to repeat the target word correctly they were instructed to repeat every perceived phoneme. The percentage of correctly perceived phonemes was calculated for each SNR. The Speech Reception Threshold (SRT) was calculated for each participant through fitting to the data a logistic function relating the percentage of correct responses to SNR level (for a similar approach see Poelmans et al., 2011).
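As an illustration of how an SRT can be read off a fitted psychometric function, the sketch below fits a two-parameter logistic, p = 1 / (1 + exp(−k (SNR − SRT))), by ordinary least squares on the logit scale. Several choices here are assumptions, not details from the study: defining the SRT as the 50% correct point, fitting on the logit scale rather than by maximum likelihood, clipping proportions to keep the logit finite, the function name `fit_srt`, and the example proportions.

```python
import math


def fit_srt(snrs_db, proportions_correct, eps=0.01):
    """Fit a logistic psychometric function and return (srt, slope).

    srt is the SNR (dB) at which the fitted function predicts 50% correct.
    """
    xs = list(snrs_db)
    ys = []
    for p in proportions_correct:
        p = min(max(p, eps), 1.0 - eps)      # clip so that the logit stays finite
        ys.append(math.log(p / (1.0 - p)))   # logit transform
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                        # k in the logistic
    intercept = mean_y - slope * mean_x      # equals -k * srt
    return -intercept / slope, slope


# Hypothetical phoneme scores at the three SNRs used in the words-in-noise task.
srt, slope = fit_srt([-5, -10, -13], [0.88, 0.38, 0.12])
print(round(srt, 2), round(slope, 2))
```

With only three SNR levels the fit has a single residual degree of freedom, so the estimate is sensitive to scores near floor or ceiling.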

Speech-in-noise intelligibility of sentences was assessed using stimuli adapted from The Hearing in Noise Test (HINT) (Nilsson et al., 1994). Speech material consisted of English sentences spoken by a male speaker. The HINT stimuli consisted of a 70 dB long-term average speech spectrum masking noise and 12 equivalent 20-sentence lists. Two lists were administered after one practice list was presented. Lists were randomly selected from the 12 available. In the HINT adaptive procedure, beginning at 58 dB, the presentation level of all sentences was adjusted in 2 dB steps. Speech-in-noise intelligibility thresholds for each participant were calculated by averaging the last 6 SNRs. Final values for each measure were inverted by multiplying by a factor of −1 to obtain a positive correlation matrix and for the creation of *z*-scores.

## **STATISTICAL ANALYSES**

All data were checked with Shapiro-Wilk's test for normality. The assumption of homogeneity of variance was assessed by Levene's Test for Equality of Variances.

### *Individual deviance analyses of composite scores*

A two-step process, as in Ramus et al. (2003) (also see Boets et al., 2006, 2007; Reid et al., 2007; Hazan et al., 2009), was used to create *z*-scores for each variable and to examine group differences in the proportion of deviant subjects on literacy tasks, phonological tasks, speech-in-noise perception, and dynamic auditory perception. As done in Ramus et al. (2003) a control mean and standard deviation were calculated for each measured variable based on the scores of the normal reading sample. However, any subject of the NR sample scoring below the set threshold of −1.65 *SD* (bottom 5% of the population) was removed to compute the final control mean and *SD*. This extra step was a means to prevent any inattentive or distracted control from exaggerating the normal range of performance. *Z*-scores for all subjects were then recalculated based on this new final control mean and *SD*. Individual deviance was calculated from these *z*-scores and defined as any subject falling below the −1.65 *SD* threshold. For the purposes of this paper the term deviancy score is referring only to those scores falling below this threshold. We do not imply any answer to the delay/deficit discussion concerning dyslexia. In acknowledgment of the possible exaggeration of the dyslexics' deficits by such a two-step method, the more strict threshold of −1.65 *SD* was chosen.
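The two-step procedure can be sketched as follows. The function name `deviance_z_scores` and the example scores are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption, and scores are assumed to be oriented so that higher values mean better performance (thresholds would first be inverted).

```python
from statistics import mean, stdev


def deviance_z_scores(control_scores, all_scores, cutoff=-1.65):
    """Two-step z-scoring: trim deviant controls, then re-standardize everyone."""
    # Step 1: provisional control norms from the full normal reading sample.
    m0, s0 = mean(control_scores), stdev(control_scores)
    kept = [x for x in control_scores if (x - m0) / s0 >= cutoff]
    # Step 2: final control norms without the controls falling below the cutoff.
    m1, s1 = mean(kept), stdev(kept)
    z = [(x - m1) / s1 for x in all_scores]
    deviant = [value < cutoff for value in z]
    return z, deviant


# Hypothetical scores: one inattentive control (30) inflates the provisional SD.
controls = [48, 50, 52, 49, 51, 50, 47, 53, 50, 30]
z, deviant = deviance_z_scores(controls, [50, 46, 44])
print(deviant)  # prints [False, True, True]
```

In this example the outlying control widens the provisional normal range; once that score is removed, the final standard deviation shrinks and scores of 46 and 44 fall below the −1.65 threshold.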

The resulting *z*-scores were used to create composite scores. For each participant a literacy score was calculated by averaging the *z*-scores of the WRAT reading and spelling subtests (Literacy); a phonological awareness (PA) score was calculated as the *z*-score of the Spoonerism task. The two RAN *z*-scores were averaged into one overall RAN score (RAN). Digit span and non-word recall tasks were averaged to create a verbal short-term memory score (VSTM). Because correlations within the auditory processing measures and within the speech perception measures were weak, no composite scores were created for these groups of variables.

### *Multiple comparison corrections*

In order to avoid the possibility of making a false positive conclusion in group comparisons, all reported *p*-values for *t*-tests and ANOVAs were adjusted using a Bonferroni correction, which entailed the multiplication of the given *p*-value by the total number of comparisons per question, to a maximum Bonferroni adjusted *p*-value of 1. If the adjusted *p*-value remained less than the original alpha of 0.05, the null hypothesis was rejected.
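The adjustment described above amounts to the following minimal sketch. The function name `bonferroni_adjust` and the example *p*-values are hypothetical, and the family of comparisons is assumed to be the list passed in.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


# Three hypothetical comparisons belonging to one research question.
adjusted = bonferroni_adjust([0.004, 0.03, 0.5])
significant = [p < 0.05 for p in adjusted]
print(significant)  # prints [True, False, False]
```

Comparing the adjusted *p*-value against the original alpha of 0.05 is equivalent to comparing the raw *p*-value against 0.05 divided by the number of comparisons.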

## **RESULTS**

## **PERFORMANCE OF DYSLEXIC vs. NORMAL READING ADULTS**

### *Literacy*

Literacy results are presented in **Table 1**. There was a statistically significant difference in the mean scores of reading and spelling between groups, with the dyslexic group performing significantly poorer, *t*(50.283) = 8.575; *p* < 0.005, and *t*(60.675) = 10.305; *p* < 0.005.

### *Phonological skills*

Each domain of one's phonological skills, as represented in Wagner and Torgesen (1987), was tested. Phonological awareness (PA) was tested by the spoonerism task of the PhAB, verbal short-term memory (VSTM) by digit span and non-word recall and RAN by object and color naming. Test scores are presented in **Table 2**.

Independent sample *t*-tests were run to determine differences between groups in measures of phonological skills. Scores of the non-word recall and Spoonerism tasks were not found to be normally distributed. In order to approach a normal distribution they were transformed by a square root transformation. Adults with dyslexia were found to perform significantly poorer than controls on all measures.

### *Speech perception and auditory processing*

In order to approach a normal distribution for more variables, the best score on the FM measure was transformed by a logarithmic transformation after the scores had been reversed, while the best score on the ID measure was transformed by the use of a square root transformation after the scores had been reversed, and the RT scores were transformed using a square root transformation (Field, 2009).

Since the aim of this research is to evaluate threshold estimations as an indicator of a subject's sensory capability, the two threshold trials were not averaged and instead the best score of each test was selected (for a similar approach see Boets et al., 2006). Threshold means and standard deviations of all auditory measures for each group can be found in **Table 3**.

Results demonstrated that dyslexic readers scored significantly poorer on measures of RT discrimination and ID, but not on FM-detection nor on the two tasks for speech-in-noise perception. Given the unexpected findings of a group difference in ID, ID was introduced as a control variable in order to determine whether a significant group difference on RT was due to general cognitive demands related to task design or intensity-related processes rather than dynamic-related processes. This confirmed the group difference for RT discrimination, *F*(1, 87) = 9.492, *p* = 0.012, partial η<sup>2</sup> = 0.098, while FM remained insignificant, *F*(1, 87) = 0.643, *p* = 1 (*p*-values are Bonferroni adjusted for multiple comparisons).
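The reported effect size can be checked against the reported *F* statistic using the standard identity partial η² = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies it to the values given above; the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Group effect on RT discrimination with ID as covariate: F(1, 87) = 9.492.
eta_rt = partial_eta_squared(9.492, 1, 87)
print(round(eta_rt, 3))  # prints 0.098
```

The result agrees with the reported partial η² of 0.098.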

### **RELATIONS BETWEEN LITERACY, PHONOLOGICAL, AND AUDITORY SKILLS**

To assess the relations between subjects' literacy skills, phonological abilities and auditory processing skills, Pearson's correlation coefficients were calculated between the subjects' scores on measures of literacy, phonology, slow-rate dynamic auditory processing and speech-in-noise perception (lower left portion of **Table 4**). Phonological awareness was related to all measures of literacy, verbal short term memory and RAN, as well as RT and ID. Although FM was only found to relate to RT and ID, RT significantly correlated with measures of reading, spelling and measures of PA (spoonerisms and both RAN tasks).


*All p-values are Bonferroni adjusted for multiple comparisons.*

**Table 3 | Auditory and speech-in-noise measures: descriptive statistics and** *t* **and** *p***-values from independent** *t***-tests.**


*All p-values are Bonferroni adjusted for multiple comparisons.*


**Table 4 | Correlations among measures for auditory processing, speech perception, phonology and literacy skills, with (upper part) and without (lower part) controlling for group.**

*Read, WRAT reading; Spell, WRAT spelling; DS, Digit Span; NWR, non-word recall; PA, Spoonerism; RANob, RAN object naming; RANcol, RAN color naming. \*p* < *0.05; \*\*p* < *0.01; \*\*\*p* < *0.001; ( \*) Approaching significance of 0.05.*

Since the correlational analyses showed that reading and spelling correlate with both PA and RT, the independent contribution of each was assessed through multiple regression analyses with both RT and PA as predictors of reading and spelling (see **Table 5**). Analyses showed that RT makes no unique contribution to either literacy measure beyond that made by PA.

The addition of ID to the model to control for attention mechanisms produced the same pattern of results for reading, *F*(3, 85) = 21.512, *p* < 0.001, *R*<sup>2</sup> = 0.432, and spelling, *F*(3, 85) = 27.258, *p* < 0.001, *R*<sup>2</sup> = 0.490, as did the addition of age and IQ together with ID, *F*(5, 83) = 13.802, *p* < 0.001, *R*<sup>2</sup> = 0.454, and *F*(5, 83) = 17.591, *p* < 0.001, *R*<sup>2</sup> = 0.514, respectively.
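The relation between the model *F* statistics and the *R*<sup>2</sup> values reported above can be checked with the standard formula for the overall test of a regression model. This is a sketch under that assumption; the function name is ours, and small discrepancies are expected because the published *R*<sup>2</sup> values are rounded.

```python
def f_from_r_squared(r_squared, n_predictors, df_error):
    """Overall model F statistic implied by R^2, the number of predictors and the error df."""
    explained = r_squared / n_predictors
    unexplained = (1.0 - r_squared) / df_error
    return explained / unexplained

# Models with RT, PA and ID as predictors (3 predictors, 85 error df)
f_reading = f_from_r_squared(0.432, 3, 85)    # reported F = 21.512
f_spelling = f_from_r_squared(0.490, 3, 85)   # reported F = 27.258

# Models with age and IQ added (5 predictors, 83 error df)
f_reading_full = f_from_r_squared(0.454, 5, 83)   # reported F = 13.802
f_spelling_full = f_from_r_squared(0.514, 5, 83)  # reported F = 17.591
```

The recomputed values agree with the reported statistics to within rounding error, which suggests the degrees of freedom and *R*<sup>2</sup> values are internally consistent.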

Yet further investigation of RT's relationship with literacy within the dyslexic and the normal reading population did not reveal the relationships reported above. More specifically, adding group as a control measure to the regression model produced a larger significant contribution of PA, and none of RT, to reading, *F*(6, 82) = 16.683, *p* < 0.001, *R*<sup>2</sup> = 0.550, and spelling, *F*(6, 82) = 23.392, *p* < 0.001, *R*<sup>2</sup> = 0.631. In a similar vein, the other significant relationships that RT showed across the entire population (lower left portion of **Table 4**) disappeared when controlling for group, with the exception of RAN object (upper right portion of **Table 4**).
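How a correlation across the whole sample can vanish once group is controlled for may be illustrated with a first-order partial correlation. The sketch below is not the authors' analysis: the data are constructed by hand so that two hypothetical measures differ between groups but are unrelated within each group, and the function names are ours.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation between x and y after partialling out z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: group 0 and group 1 differ in mean level on both
# measures, but within each group the two measures are uncorrelated.
group = [0, 0, 0, 0, 1, 1, 1, 1]
measure_a = [1, 2, 1, 2, 5, 6, 5, 6]
measure_b = [2, 1, 1, 2, 6, 5, 5, 6]

r_overall = pearson_r(measure_a, measure_b)
r_controlled = partial_r(measure_a, measure_b, group)
```

In this constructed case the overall correlation is strong while the group-controlled correlation is essentially zero, because the association is carried entirely by the difference between groups.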

## **INDIVIDUAL DEVIANCE ANALYSES**

### *Individual differences*

The examination of performance at the individual level in both the NR and DYS groups allows for a better understanding of the proportion of individuals within each group showing poor performance on each measured variable, even when group differences are not found. Such analyses also make it possible to determine whether any individual subject showed consistently deviant performance across all levels of processing, or whether deviant performance is a more random occurrence, indicating the involvement of influences other than an auditory perceptual deficit (Heath et al., 2006).

**Table 5 | Stepwise regressions showing the unique variance in word reading and spelling accounted for by PA and RT (***R***<sup>2</sup> change and standardized Beta).**


*\*\*\*p* < *0.001.*

The individual *z*-scores for RT, FM, ID, CASPA, HINT, PA, RAN, and VSTM were analyzed. A deviancy threshold of −1.65 was used: any *z*-score falling below this threshold was considered deviant performance, as described by Ramus et al. (2003) and subsequently used by Boets et al. (2006, 2007), Reid et al. (2007), and Hazan et al. (2009).
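The deviance criterion can be sketched as follows. This is a minimal illustration under two assumptions that the text does not spell out here: that *z*-scores are computed relative to the mean and standard deviation of the normal reading group, and that higher scores indicate better performance. All names and scores are hypothetical.

```python
import statistics

DEVIANCE_THRESHOLD = -1.65  # threshold stated in the text

def z_scores(scores, reference):
    """z-scores of `scores` relative to the mean and sample SD of `reference`."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(s - mean) / sd for s in scores]

def deviant_flags(scores, reference, threshold=DEVIANCE_THRESHOLD):
    """True for every score whose z-score falls below the threshold."""
    return [z < threshold for z in z_scores(scores, reference)]

# Hypothetical scores (not data from this study).
controls = [10, 12, 11, 13, 9, 11, 12, 10]
dyslexic = [11, 6, 8, 12, 5]

flags = deviant_flags(dyslexic, controls)
proportion_deviant = sum(flags) / len(flags)
```

Applying the same function to the control scores themselves gives the proportion of deviant normal readers, which is the comparison reported per variable in **Table 6**.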

The number and proportion of deviant subjects per group on each of the variables are presented in **Table 6**. All measures, with the exception of CASPA, HINT, ID, and FM, showed a significantly higher proportion of deviant subjects in the DYS group than in the NR group.
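For small cell counts, a comparison of deviant proportions between two groups can be carried out with Fisher's exact test on a 2 × 2 table. The sketch below is a plain two-sided implementation based on the hypergeometric distribution; it is offered as an illustration, not as the authors' procedure, and the function name and counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts: rows are groups, columns are deviant / not deviant.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

The small tolerance in the comparison guards against floating-point ties between equally likely tables.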

*Where cells have expected count less than 5, the Fisher's Exact test p-values are reported.*

An evaluation of subjects with at least one score deviating more than 1.65 *SD* on the various auditory, speech and phonological measures demonstrated that deficits appeared inconsistent, with some subjects deviating on only one task and others on two or three tasks. Given the high percentage of deviancy found on measures of RT (58%) and PA (72%) within the dyslexic group, the interrelation between deficiencies in these skills was explored. ID was included to address any questions about the influence of task-related demands and/or attention. **Figure 1** shows the number of subjects showing isolated vs. overlapping deficits. Results show that 28% of the dyslexic subjects possess a deficit in only PA (30% when controlled for ID), while 14% of dyslexic subjects were found to have only an RT deficit (19% when controlled for ID). Dyslexic adults with overlapping deficits represented nearly half of the dyslexic subjects, 44% (37% when controlled for ID). Although a large percentage of overlap is present, the proportion of shared PA and RT deficits does not exceed the proportion expected from the frequency of each deficit within the whole dyslexic group. Investigation of the normal reading individuals revealed no overlap between deviancy on RT and PA, yet this might be due to the low number of deviant subjects.
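The reasoning about expected overlap can be made concrete. If the PA and RT deficits occurred independently, the expected proportion of subjects showing both would be the product of the two marginal proportions. The sketch below uses the percentages reported in this section; the variable and function names are ours, and no significance test is performed.

```python
def expected_overlap(p_first, p_second):
    """Expected proportion with both deficits if the two occur independently."""
    return p_first * p_second

p_pa_deficit = 0.72      # proportion of dyslexic subjects deviant on PA
p_rt_deficit = 0.58      # proportion of dyslexic subjects deviant on RT
observed_overlap = 0.44  # proportion deviant on both PA and RT

expected = expected_overlap(p_pa_deficit, p_rt_deficit)
difference = observed_overlap - expected
```

Under independence the expected overlap is roughly 42%, which is close to the observed 44%. Whether such a small difference is meaningful would require a formal test on the underlying counts.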

## **DISCUSSION**

It has been well established in the literature that dyslexic readers struggle with a phonological processing deficit and that phonological skills are related to literacy development and achievement (Snowling, 2000). Yet debate surrounds the question of whether this phonological processing impairment stems from a more primary deficit, such as a deficit in the processing of speech sounds or a reduced sensitivity to slow-rate dynamic auditory information. The current study set out to investigate speech perception and slow-rate dynamic auditory processing, in the form of RT and FM detection, in relation to phonological processing and literacy measures in dyslexic and normal reading adults.

### **SLOW-RATE AUDITORY PROCESSING DEFICIT**

In line with the auditory temporal processing deficit theory of dyslexia, we had expected our auditory measures of RT and FM to differentiate between dyslexic and non-dyslexic students but not our non-temporal auditory ID task.

With regard to the slow-rate auditory processing tasks, group analyses revealed significant differences between adults with dyslexia and normal readers in RT, while for FM only the uncorrected *p*-value approached significance. The lack of a significant group difference for the FM measure was unexpected, since the majority of studies in dyslexic adults have demonstrated clear group differences (Witton et al., 1998, 2002; Ramus et al., 2003; Heath et al., 2006). With regard to RT, our results are in line with the bulk of previous studies demonstrating a lower performance in dyslexic children (e.g., Goswami et al., 2002; Fraser et al., 2010; Poelmans et al., 2011) and adults (e.g., Hämäläinen et al., 2005; Thomson et al., 2006; Pasquini et al., 2007), suggesting an RT deficit across development and languages.

Plausible hypotheses to explain the unexpected absence of a group difference for FM in the presence of an RT deficit are (1) low sensitivity of the behavioral measures used, (2) the influence of task demands and attention difficulties, or (3) specific characteristics of the auditory stimuli used.

Stoodley et al. (2006) suggested that in a population such as the one included in this study, psychophysical measures may not be sensitive enough to detect subtle auditory processing impairments due to possible compensation. They found dyslexic adults to be unimpaired in psychophysical FM discrimination tasks, yet group differences were found when electrophysiological recordings were used. In doing so, Stoodley and colleagues demonstrated that the inability to detect low level auditory processing deficits in some groups of high functioning dyslexics can be attributed to task sensitivity and the level of compensation achieved by the individual. The lack of group differences for FM discrimination in our adult population differed from behavioral studies in pre-schoolers (Boets et al., 2007) and children (Poelmans et al., 2011), which employed similar methodologies and stimuli. Yet the group difference on the RT measure was significant, which would not have been expected if the compensation influences proposed by Stoodley and colleagues were consistent across all psychophysical tasks, unless RT tasks offer greater sensitivity.

Criticism regarding the influence of task demand and complexity of psychophysical tasks (see Roach et al., 2004) could explain the inconsistency of these results and the unexpected group differences on the ID task. Of the 16 studies reviewed by Hämäläinen et al. (2013) that included a measure of ID, only two found a significant group difference between individuals with dyslexia and normal readers. In the only adult study which found a group difference in ID (Thomson et al., 2006), the authors attributed their findings to the task difficulty of their ID measure. Such findings of unexpected differences may support Roach et al.'s (2004) claim that poor performance and findings of group differences on psychophysical tasks are likely to be a function of attention and general task performance. In order to control for such task demand differences, ID was included in the statistical analyses as a control measure for all levels of analyses. After controlling for ID, group differences on RT remained present, indicating that this difference is rooted in processing stimuli-related properties differently rather than in attention differences.

Since our results do not clearly support the two explanations above, it is more likely that the pattern of results can be explained by a very specific deficit in slow-rate dynamic auditory processing. FM and RT tasks differ in how the auditory information is represented in the speech signal. As discussed by Rosen (1992), FM represents the fine structure of the speech waveform, while RT represents amplitude aspects of the speech envelope. The distinct pattern of results between RT and FM suggests that in adult dyslexics, the primary auditory dysfunction is more likely to be found in the perception of slow-rate dynamic auditory cues related to the speech envelope, as measured by RT, and not in the fine-structure, as measured by FM. Such findings reinforce previous studies in both child and adult populations (Goswami et al., 2002; Thomson et al., 2006; Fraser et al., 2010; Poelmans et al., 2011).

In sum, our results do not support a general deficit in slow-rate auditory processing in adults with dyslexia; yet a subgroup of the adult dyslexic population may possess a more specific slow-rate dynamic processing deficit confined to the envelope of the speech waveform.

### **SPEECH-IN-NOISE PERCEPTION DEFICIT IN INDIVIDUALS WITH DYSLEXIA**

Slow-rate dynamic auditory cues are found in abundance in speech. It is believed that a deficit in the processing of these auditory cues, such as RT and FM, would ultimately lead to a disruption in speech perception.

In contrast to the auditory processing results, the present study found no evidence that the speech-processing deficit observed in youth (Snowling et al., 1986; Wible et al., 2002; Bradlow et al., 2003; Boets et al., 2007; Ziegler et al., 2005, 2009) continues into adulthood, suggesting developmental or task-related influences. Although our speech masking stimuli were in line with previous studies with children, they may not have offered sufficient difficulty for use in an adult or a highly compensated population (Pennington et al., 1990). According to a recently published study by Dole et al. (2012), a stationary speech-weighted background noise, as used in the present study, is less effective in differentiating between dyslexic and normal reading adults than modulated noises and background speech masks. Under the masking conditions of background speech or modulated noise, an individual must rely on temporal dips in the masking noise to extract the target speech signal (Howard-Jones and Rosen, 1993). It is thought that individuals with dyslexia may have difficulty perceiving these temporal dips, which is in line with our finding of an RT deficit. Future studies should take into account the findings of Dole et al. (2012) to further assess the potential cascade of the RT difficulties observed in some dyslexics.

## **SLOW-RATE AUDITORY PROCESSING AND SPEECH PERCEPTION RELATIONSHIP**

Our findings showed significantly poorer performance in adult dyslexic readers on the RT task assessing slow-rate dynamic auditory processing, which relates to amplitude aspects of the speech envelope. If an indirect path of an RT deficit through speech perception existed, we would have expected to find a correlation with the sentence in noise measure that required a greater reliance on larger grain segmentation of the sentence stimuli. However, examination of the relationships between these variables could not clearly support this hypothesis. Yet, once controlled for group, CASPA was found to relate to phonological skills.

As discussed earlier, the use of stationary noise in our speech perception tasks may have limited our ability to find relationships with RT, which might be more closely related to speech perception in modulated noise. An alternative interpretation is that slow-rate auditory processing independently relates to reading related measures and not via speech perception measures. However, such a situation remains unlikely considering the prevalence of slow-rate dynamic auditory cues in the speech signal. Therefore one would expect to find a relationship between these two variables. Finally, Poelmans et al. (2011) offered an alternative explanation, stating that the lack of relationship could be a consequence of the fact that the developmental link between these variables diminishes over time and is no longer evident in later years.

Due to the lack of evidence found to support the relationship of auditory deficits and speech perception in adults, our results do not support the theoretical cascade effect of the auditory deficit through speech perception to one's phonological representations.

## **SLOW-RATE DYNAMIC AUDITORY PROCESSING, PHONOLOGICAL PROCESSING, AND LITERACY**

No significant correlations were found with FM or with the speech perception tasks. On the other hand, RT was found to correlate with measures of reading, spelling, phonological awareness and RAN, similar to the findings of Thomson et al. (2006). Taking the regression analyses into account, it appears that any relationship between RT and reading is mediated through phonological processing and not speech-in-noise perception. These findings are similar to those of Pasquini et al. (2007). As discussed by Hämäläinen et al. (2005), it is highly improbable that the lower level skills of RT discrimination could be influenced by an individual's poor phonological awareness. Therefore, it is reasonable to assume that either this relationship reflects the same underlying perceptual deficit, or the ability to detect rapid changes in the speech envelope has a causal role in the development of PA. However, once group was controlled for, these relationships could no longer be supported, indicating that RT is not a good predictor of reading abilities in dyslexic or in normal readers. Yet it is worth noting that a different pattern of findings might have emerged if a more direct assessment of decoding had been employed, such as a non-word reading measure (Hämäläinen et al., 2005).

Although the correlational analyses across all participants suggest interrelations between PA and RT, this finding should be nuanced at the individual level. When the prevalence and overlap of deviant performance on PA and RT were evaluated at the individual level, nearly half (45%) of the dyslexic population was found to possess a deficit in both, while 28 and 14% of the dyslexic population were found to have an isolated deficit in PA or RT, respectively (30 and 19% when controlled for ID). Despite co-occurrence in a large subsample of dyslexics, independence is suggested because the overlap between these variables is in proportion to what would be expected based on the frequency of each deficit in the total dyslexic group (i.e., 72% for a PA deficit and 53% for an RT deficit). Together with the lack of relationships once group was controlled for, this suggests that phonological deficits are not necessarily secondary to auditory problems, since the two deficits do not co-occur in every dyslexic subject. To increase our understanding, a longitudinal pre-reading study will be needed to assess the prevalence of the double deficit in RT and PA at earlier stages of reading development. In addition, training studies could help in verifying how one skill influences the other.

Given that in our adult study a large proportion of reading (problems) still remains unexplained, a multifactorial approach should be explored to fully identify the mechanisms underlying dyslexia. By investigating alternative cognitive factors, such as orthographic or morphological processing (Bekebrede et al., 2009), perceptual factors (Stein, 2001) and biological explanations (Nicolson et al., 2001), the variance and comorbid symptoms associated with the dyslexic population can be better understood.

### **LIMITATIONS AND IMPLICATIONS**

A limitation of this study was the sole inclusion of university students with dyslexia. It is reasonable to assume that, by mere virtue of the fact that these young adults have reached university level education, varying levels of compensation are present in this specific group. Research has shown that the presence of relatively stronger cognitive abilities in some children with dyslexia allows for the minimization of parts of their phonological deficit later in life, allowing for the attainment of normal reading ability (Shaywitz et al., 2003). For example, a reliance on or a strength in the use of contextual cues (Frith and Snowling, 1983; Nation and Snowling, 1998), semantic knowledge (Snowling et al., 2000), visual memory (Campbell and Butterworth, 1985), and morphological knowledge (Elbro and Arnbak, 1996) has been shown to help dyslexic readers minimize the impact of the deficit on their expressed reading abilities. Stoodley et al. (2006) also noted similar top-down compensation processes influencing results of slow-rate dynamic auditory processing tasks (for a description of possible top-down compensation processes see Pichora-Fuller, 2008). Therefore, percentages of observed deviant performance on slow-rate dynamic auditory processing tasks and phonological awareness measures could be underrepresented within our sample. Such potential levels of compensation limit our ability to extrapolate any findings to the general adult dyslexic population and could have limited our ability to establish clear group differences or correlations between variables. Having said this, our results do have implications for typifying the characteristics of dyslexic adults in higher education and for broadening our understanding of how compensation may be expressed. This is especially relevant since accommodations are offered to these students on the basis of a valid diagnosis.
Although the sensitivity of the RT task is lower than that of the phonological tasks, our results did demonstrate its potential for inclusion as an additional screening measure, for it was able to characterize a proportion of dyslexic adults not identified by a PA measure alone. Our data showed that relying purely on a PA task will result in missing a small subsample of dyslexics (in our study 14%).

A second implication is that a control task should be included. Our findings show the possible overestimation of the number of dyslexics when attention and task related demands are not accounted for. To avoid overestimation, future research should apply such a control task as presented in this paper, when designing a psychophysical testing battery and screening tools. Therefore, future development and study of this measure is still needed.

## **CONCLUSION**

In summary, our results suggest that the lower sensitivity to RT cues that was observed in dyslexic children is still observable in adulthood, while FM deficits are not. Hence, our results suggest that a general slow-rate dynamic auditory processing deficit may not be present within an adult dyslexic population, but may be confined to speech envelope cues rather than to fine structure. RT's influence on literacy outcomes was not direct and was found to be mediated through phonological processing (this relationship was lost once controlled for group). Unlike studies in younger children (Boets et al., 2006), the existence of speech-in-noise perception deficits and its mediating role in auditory processing and reading-related measures was not observed. Further research is needed in this area with attention to the selection of speech-in-noise masking stimuli and the sampling of a more diverse adult population, which does not primarily contain a university sample.

Although the RT deficit and its correlation with phonological skills are significant when examined across the entire population, many dyslexic subjects with a severe deficit in one of these skills were found to be unimpaired in the other. At best, conclusions regarding the primary deficit of dyslexia being a slow-rate dynamic auditory processing deficit should be restricted to the processing of RT cues and can only be generalized to a subgroup of adults with dyslexia. Such a lack of consistency could imply the need for a multifactorial model of dyslexia.

### **ACKNOWLEDGMENTS**

We would like to acknowledge the help and support of Hanne Poelmans and Sophie Vanvooren in the calibration and setup of the auditory processing and speech perception instruments and task. We are also grateful to Mike Walker for his help in participant recruitment and the arrangement of testing facilities. This research has been financed by the research fund of the KU Leuven (grants dBOF/12/014 and OT/12/044). Funding was also provided by the Science Foundation Flanders (grant G.0920.12).

### **REFERENCES**


*Teaching and Instruction* eds P. Ghesquiere and A. J. J. M. Ruijssenaars (Leuven: University Press), 47–63.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 December 2013; accepted: 15 June 2014; published online: 02 July 2014. Citation: Law JM, Vandermosten M, Ghesquiere P and Wouters J (2014) The relationship of phonological ability, speech perception, and auditory perception in adults with dyslexia. Front. Hum. Neurosci. 8:482. doi: 10.3389/fnhum.2014.00482*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Law, Vandermosten, Ghesquiere and Wouters. This is an openaccess article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Neuroimaging correlates of handwriting quality as children learn to read and write

## *Paul Gimenez<sup>1</sup>, Nicolle Bugescu<sup>1,2</sup>, Jessica M. Black<sup>3</sup>, Roeland Hancock<sup>1</sup>, Kenneth Pugh<sup>4</sup>, Masanori Nagamine<sup>1,5</sup>, Emily Kutner<sup>1,2</sup>, Paul Mazaika<sup>6</sup>, Robert Hendren<sup>1</sup>, Bruce D. McCandliss<sup>7</sup> and Fumiko Hoeft<sup>1,4,6,8</sup>\**


#### *Edited by:*

*Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

#### *Reviewed by:*

*Scott K. Holland, Cincinnati Children's Research Foundation, USA*

*Donatella Spinelli, Università di Roma "Foro Italico," Italy*

#### *\*Correspondence:*

*Fumiko Hoeft, Division of Child and Adolescent Psychiatry, Department of Psychiatry, University of California, 401 Parnassus Ave. Box 0984-F, San Francisco, CA 94143, USA*

*e-mail: fumiko.hoeft@ucsf.edu*

Reading and writing are related but separable processes that are crucial skills to possess in modern society. The neurobiological basis of reading acquisition and development, which critically depends on phonological processing, and to a lesser degree, of beginning writing as it relates to letter perception, is increasingly being understood. Yet direct relationships between writing and reading development, in particular with phonological processing, are not well understood. The main goal of the current preliminary study was to examine individual differences in neurofunctional and neuroanatomical patterns associated with handwriting in beginning writers/readers. In 46 5–6 year-old beginning readers/writers, ratings of handwriting quality were rank-ordered from best to worst and correlated with brain activation patterns during a phonological task using functional MRI, and with regional gray matter volume from structural T1 MRI. Results showed that better handwriting was associated negatively with activation and positively with gray matter volume in an overlapping region of the pars triangularis of the right inferior frontal gyrus. This region, in particular in the left hemisphere in adults and more bilaterally in young children, is known to be important for decoding, phonological processing, and subvocal rehearsal. We interpret the dissociation in the directionality of the association in functional activation and morphometric properties in the right inferior frontal gyrus in terms of neural efficiency, and suggest future studies that interrogate the relationship between the neural mechanisms underlying reading and writing development.

**Keywords: phonological processing, voxel-based morphometry, functional MRI, inferior frontal gyrus pars triangularis, writing, reading**

## **INTRODUCTION**

Writing by hand occupies 30–60% of a child's school day (Stewart, 1992; Simner, 1998; Feder and Majnemer, 2007; Sassoon, 2007) and correlates with self-esteem and future academic success. Children with deficient handwriting (10–30% of children; Karlsdottir and Stefansson, 2002) take longer to complete writing tasks such as homework, which can increase the difficulty of schoolwork and result in oppositional attitudes toward writing assignments that can generate problems both at school and at home (Racine et al., 2008). Crucially, handwriting performance also shares links with other language related skills. Of particular relevance, there are important associations between reading and learning to write. Studies have shown that learning to write can improve letter perception (Longcamp et al., 2005), pseudoletter learning (Richards et al., 2011), and word reading (e.g., Berninger et al., 2004, 2006a; James and Engelhardt, 2012). Correspondingly, children with learning disabilities such as developmental dyslexia, a specific reading impairment that is believed to have phonological deficits at its core, often display writing difficulties (O'Hare and Khalid, 2002).

With the increasing integration of computers into the education system, the implications of reduced handwriting practice have attracted growing interest from scientific investigators. Recent neuroimaging studies have concluded that while freeform handwriting practice clearly supports reading acquisition, typing (Longcamp et al., 2005) and even tracing (James and Engelhardt, 2012) do not. Impressively, James and Engelhardt (2012) showed that preliterate children recruit well established reading-related brain regions, such as the fusiform gyrus, posterior parietal cortex, and the inferior frontal gyrus, during letter processing exclusively after handwriting practice compared to typing or tracing. The emerging consensus is that the motor experience of manually creating letterforms helps children discriminate the essential properties of each letter, which leads to more accurate representations bolstering both skilled letter recognition and later reading fluency. Therefore, understanding the underlying neurological mechanisms that support handwriting development is important not only for its independent relevance to educational achievement, but also for its supportive role in successful acquisition of other language skills such as reading.

The neurological basis underlying handwriting is not well understood but converging evidence points to key regions including: (a) the fusiform gyrus, which has apparent selectivity to letter (James and Gauthier, 2006) and word stimuli (Cohen et al., 2000; Cohen and Dehaene, 2004) over other visual stimuli and may provide a perceptual component for deriving "word-form" representations that facilitate grapheme writing (Dehaene et al., 2005; James, 2010); (b) the superior parietal lobule (SPL), a region important for carrying out actions in space (Goodale and Milner, 2005) that is thought to be involved in both visuospatial and visuomotor processing (Petrides and Pandya, 1984; Morecraft et al., 2004; Segal and Petrides, 2012), and the execution of writing sequences (Otsuki et al., 1999); (c) the inferior frontal gyrus (IFG), implicated for its involvement in phonological processes (Eckert et al., 2003) and its associations with encoding letterforms and words (Grafton et al., 1997; Berninger and Winn, 2006b; Longcamp et al., 2008); and (d) Exner's area, thought to be the interface of orthographic or graphemic representations and the complex movement sequences necessary for generating letters and words (Anderson et al., 1990; Lubrano et al., 2004; Roux et al., 2009) and may also be involved in retrieving letter shapes from memory (James and Gauthier, 2006).

While advances have been made, a complete understanding of the brain's writing system remains elusive. The inherent complexities involved in the task of writing, coupled with the excessive variability of its definition in the existing literature, make it challenging to delineate the extent of neuronal specialization specific to handwriting from other inter-related aspects, such as spelling. In a recent neuroimaging meta-analysis of writing in adults, however, the authors dissociated linguistic input and motoric aspects of writing and identified IFG for processing linguistic input as it relates to writing, and left superior frontal sulcus/middle frontal gyrus (Exner's), left superior parietal lobule, and the right cerebellum as "writing-specific" regions (Planton et al., 2013). Another study has shown that the brain differentially recruits specialized regions based on a multiplicity of letter representations (e.g., motoric similarities "B" vs. "P," visual similarities "A" vs. "R," abstract similarities "A" vs. "a") (Rothlein and Rapp, 2014). What is lacking is a detailed examination of the emergence of "neural specialization" during the period when writing skills develop and of the brain basis of the underlying process (except see work by Karen James cited in this paper). Further, more investigations of the association between the brain basis of writing and other processes of written language such as reading are greatly needed. Findings from such studies may not only offer important insights to improve research methodology and educational instruction, but may also contribute to a fuller understanding of the development of written language processing in the human brain.

The present study sought to focus on the neural correlates of *handwriting quality* in children at the beginning of formal handwriting instruction. Operationally, handwriting quality refers to the legibility, form, slant, spacing, and general appearance of letters and words. Handwriting researchers have generally agreed on the relevance of these key features (Freeman, 1959; Kaminsky and Powers, 1981; Graham, 1982; Ziviani and Elkins, 1984; Graham and Weintraub, 1996). Given that handwriting exposure in preliterate children has been associated with reading-related processes such as letter perception and related brain activation (James and Engelhardt, 2012), it is plausible that, especially during early stages of development, handwriting also shares links with phonological processing, a skill that for decades has been causally linked to reading acquisition (Castles and Coltheart, 2004; Byrne et al., 2008). Therefore, we sought to investigate the unproven idea that handwriting and reading may rely on a common neuroanatomical mechanism at an early developmental stage of reading/writing. We hypothesized that handwriting quality would be associated with neuroanatomical patterns in one or more of the following: (a) IFG if phonological decoding is relevant to handwriting quality, (b) Exner's area if successful integration of orthography and motor programs is relevant to handwriting quality, (c) SPL if sequential motor movements and/or kinesthetic modulation are relevant to handwriting quality, and/or (d) fusiform gyrus if visual letter or word recognition is relevant to handwriting quality. Then, in order to investigate the direct relationship between handwriting and reading abilities, in particular phonological processing, we associated handwriting quality with brain activation during a task aimed at engaging the brain's phonological processing circuit. We hypothesized that, if handwriting is associated with the development of reading, and phonological processing has a causal and reciprocal relationship with reading acquisition, brain activation patterns associated with phonological processing may also be associated with handwriting skills in these emergent readers/writers.

## **MATERIALS AND METHODS**

Our data come from a study focused on examining brain activation during phonological processing and its relationship with reading-related behavioral measures. Although the original study was not focused on the brain basis of handwriting, and hence the behavioral measures and fMRI tasks were not necessarily optimized for the purpose of the current study, these data provided an opportunity to investigate whether neuroanatomical patterns and brain activation during phonological processing are associated with handwriting in beginning readers and writers.

### **PARTICIPANTS**

A total of 51 (29 boys, 22 girls) healthy, native English-speaking 5- and 6-year-old children (aged 5.59 ± 0.42 years) toward the beginning of formal schooling participated in this study. Standard behavioral assessments of the children, along with MRI data, were collected. We later excluded five left-handed children, leaving 46 right-handed children who were included in all analyses unless there were missing data, excessive motion, or severe scanner artifacts (fMRI analyses, *N* = 41). While we did not exclude children based on attention deficit hyperactivity disorder (ADHD), for example, the children in this study did not have any parental report of formal diagnosis of neurological or psychiatric disorders besides specific learning disabilities; they were not on medication and had no contraindications to MRI. The Behavioral Assessment System for Children-2 (Reynolds and Kamphaus, 2006) showed that all children were within the typical range.

To help prepare participants for imaging, parents received a packet of informational material, including a CD of common scanner sounds and a DVD of a child going into the scanner. Parents were instructed to review these supplemental materials with their children to familiarize and desensitize participants to the scanner environment. In addition, children participated in a guided MRI simulation at the center where they practiced lying still in the bore and underwent training to minimize motion related artifacts. Participants with excessive, uncorrectable motion were eliminated from the study.

The Stanford University Panel on Human Subjects in Medical Research and the University of California, San Francisco Human Research Protection Program approved the study and informed consent and assent were obtained from parents/guardians and participants, respectively.

### **BEHAVIORAL MEASURES**

We administered a standard battery of neuropsychological assessments, which included the Woodcock-Johnson III (WJ-III) Spelling (Woodcock et al., 2001), an untimed real-word spelling test, in order to assess spelling accuracy and handwriting quality (see below); the Beery Visual-Motor Integration (BVMI; Beery and Beery, 2004), where children copied and traced a series of moderately complex geometric figures; and the Oromotor Sequences subtest from the Developmental Neuropsychological Assessment (NEPSY-II; Korkman et al., 2007) to assess oral-motor praxis, or the ability to sequence oral-motor movements without articulation difficulty and without visual demands. Additionally, the Home Literacy Inventory (Marvin and Ogden, 2002) was used to investigate differences in the exposure to and practice of reading activities at home.

### **HANDWRITING QUALITY**

To ensure that participants were unaware that their handwriting was under investigation, handwriting samples drawn from the WJ-III Spelling subtest were used as the basis for assessing and defining handwriting skills. Two blinded investigators, who were trained to score handwriting quality holistically based on letterform, slant, spacing and general appearance irrespective of spelling errors and speed, each rank-ordered participants' writing samples (1 = poor handwriting, 51 = best handwriting) three times. Since spelling inaccuracies can inadvertently bias rankings, writing samples included both letters and small words. Intraclass correlation coefficients were calculated to examine intra-rater reliability (Cronbach's alpha = 0.994 for rater 1; 0.989 for rater 2), and inter-rater reliability (Cronbach's alpha = 0.980) was calculated after the three sets of scores were averaged across raters. The final ranking used was based on the mean of each investigator's scores.
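As an illustration of the reliability statistic reported above, the following is a minimal sketch of Cronbach's alpha computed over repeated rankings. It is not the authors' procedure or software; the function name `cronbach_alpha` and the toy rankings are hypothetical, and each inner list stands for one ranking pass over the same participants.

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha for k sets of scores over the same n participants.

    ratings: list of k lists, each holding one score per participant.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two sets of ratings")
    n = len(ratings[0])
    if any(len(r) != n for r in ratings):
        raise ValueError("every set must rate the same participants")
    # Sum of the variances of the individual rating sets.
    item_variance = sum(pvariance(r) for r in ratings)
    # Variance of each participant's total score across the sets.
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    total_variance = pvariance(totals)
    return k / (k - 1) * (1.0 - item_variance / total_variance)


# Hypothetical example: three ranking passes over five writing samples.
passes = [
    [1, 2, 3, 4, 5],
    [1, 3, 2, 4, 5],
    [2, 1, 3, 4, 5],
]
alpha = cronbach_alpha(passes)
```

Values close to 1 indicate that the ranking passes order the samples almost identically, which is the pattern the coefficients above describe.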

### **VISUOMOTOR (COPYING) SKILLS**

A subset of test items (items 17–19) from the BVMI task was selected by the investigators to evaluate visuomotor skills; these items were developmentally appropriate, yet were also sufficiently difficult. Specifically, these were the most difficult items (non-letter objects) that all participants were able to complete. According to the manual, the validity and reliability of the task are sufficient for the age of our participants (Beery, 1997). Following the same rank-ordering procedures as for handwriting quality, two blinded investigators rated participants' reproductions, which were based on copying geometric shapes (1 = poor reproduction, 51 = best reproduction). Intraclass correlation coefficients were calculated to examine intra-rater reliability (Cronbach's alpha = 0.993 for rater 1; 0.974 for rater 2) and inter-rater reliability (Cronbach's alpha = 0.969) was calculated after the three sets of scores were averaged across raters. The final ranking used was based on the mean of each investigator's scores.

### **FUNCTIONAL MRI TASKS**

Three tasks measuring a range of cognitive abilities were used to investigate neurological associations with handwriting. The first was a phonological processing task in which participants were asked to determine if the first sound of the names of two pictures of common objects matched (**Figure 1A**). This task was adapted from a sound-matching subtask of the Comprehensive Test of Phonological Processing (Wagner et al., 1999) and is well established as reliable in phonological processing investigation (e.g., Katzir et al., 2005). The second task was a non-verbalizable visual-symbol matching task in which participants were presented with unfamiliar Japanese hiragana (no participants knew that they were letters from another language). Visually similar hiragana were presented to try to maximize difficulty (**Figure 1B**). This task was used to at least partially account for the visual input and motor response often associated with fMRI tasks that require processing of letters and explicit motor responses (Henson et al., 2000). Finally, the third task was a color-matching task in which participants were asked to determine whether two colors matched (**Figure 1C**). The paired stimuli were of the same hue but of different lightness, with the closeness in value optimized in a pilot study to prevent participants from using the names of the colors to perform the task and to maximize difficulty. Although there is no assumed relationship between color-matching and handwriting, this task was included as another task to help account for some of the confounds, such as the color dissimilarities in the stimuli used in the phonological task and the decision-making nature of all three tasks. Data from these latter two tasks were obtained in only a portion of the children (*N* = 18). We therefore performed a secondary analysis of the phonological fMRI task restricted to those participants who also completed the visual-symbol matching and color matching control tasks when comparing between tasks. 
The results of the phonological fMRI task were unchanged regardless of the sample-size and were specific to the phonological task.

**FIGURE 1 | Schematic diagram of the functional magnetic resonance imaging (fMRI) tasks. (A)** Phonological processing fMRI task. In this block design fMRI task, participants were asked to determine whether the names of the pictured stimuli begin with matching sounds. Condition A: required working memory (WM+), stimuli presented one after another for 2 s each with 1 s intervals (must be retained across a delay). Condition B: no working memory required (WM−), stimuli presented side-by-side for 5 s. Conditions were collapsed for the purposes of this study. **(B)** A visual-symbol matching block design fMRI task. Design was the same as the phonological task except that Japanese hiragana symbols were presented instead of pictured objects, and participants were asked to determine whether these unrecognized symbols matched. Note: working memory was pertinent to the parent study, but it was not crucial to this study, and for the purposes of this study WM+ and WM− conditions were collapsed into one condition. **(C)** A color discrimination block design fMRI task. Design was the same as the other tasks except that colors were presented instead of pictured objects or symbols, and participants were asked to determine whether these colors were the same. Note: working memory was pertinent to the parent study, but it was not crucial to this study, and for the purposes of this study WM+ and WM− conditions were collapsed into one condition.

All three tasks utilized the same procedure. Each required participants to determine whether two visually presented stimuli matched for either the first syllables of the names of pictures, visual symbols or color. Stimuli were presented simultaneously in one condition (without enabling working memory, WM−) and after a small delay in the other (enabling working memory, WM+) (**Figures 1A–C**). In this study, we report results collapsing the two conditions. The exploration of the role of working memory in reading and writing may answer important theoretical questions and should be examined in future studies. It is not, however, explored further in the current study because of the non-significant difference between conditions, which may have been due to a number of factors such as the short interstimulus interval. Rest was used as the control condition because our preliminary study showed that children had difficulty performing a phonological task (sound matching of first syllable) alternating with a control condition (such as visual shape matching). We therefore opted to use the two other visual fMRI tasks to show specificity of the effects to the sound-matching task. Participants completed two runs of each task. Each of the two runs began with a 6 s countdown and a 2 s rest period. In the WM− condition, stimuli were presented side-by-side continuously for 3.5 s (followed by a 2.5–3.5 s jitter with a mean of 3 s), whereas the WM+ condition displayed stimuli at the center one at a time for 2 s each with a jitter of 2.5–3.5 s (mean 3 s) between stimuli (paired stimuli were also followed by a 2.5–3.5 s jitter with a mean of 3 s). There were 5 trials per block. The 4 task blocks in each run were 32 s in duration and the order of the conditions was varied between Run1 (WM−→WM+→WM+→WM−) and Run2 (WM+→WM−→WM−→WM+), with 5, 15, and 5 s intervals between blocks. Participants (*N* = 41) completed 2 runs, with each run being 170 s in length (174 s total with the first 4 s of the scans in each run being discarded to establish equilibrium in MR signal). All stimuli were presented against a plain, white background and participants responded with their right finger if the stimuli matched and with their left finger if they did not match. 
Since participants may think of different words than intended for the pictured stimuli used in the phonological task, *post-hoc* testing asking names of each picture was performed for each child to verify whether there were discrepancies between potentially ambiguous images that may have alternative, yet still correct, pronunciations. This was necessary to ensure accurate task performance calculation tailored for each subject. Due to the young age of participants, data were used if their task accuracy total was greater than chance. Overall accuracy as well as reaction times for all correctly answered trials are shown in **Table 1**.

### **STRUCTURAL AND FUNCTIONAL MRI DATA ACQUISITION**

Imaging was conducted at the Lucas Center for Imaging at the Stanford University School of Medicine. Imaging data were acquired using a GE Healthcare 3.0 Tesla 750 scanner and an 8-channel phased array head coil (GE Healthcare, Waukesha, WI). Images acquired included an axial-oblique 3D T1-weighted sequence (fast spoiled gradient recalled echo [FSPGR] pulse sequence, inversion recovery preparation pulse [TI] = 400 ms; repetition time [TR] = 8.5 ms; echo-time [TE] = 3.4 ms; flip angle = 15°; receiver bandwidth = ±32 kHz; slice thickness = 1.2 mm; 0.86 × 0.86 mm in-plane resolution; 128 slices; number of excitations = 1; field-of-view [FOV] = 22 cm; acquisition matrix = 256 × 192). The total scan time was 4:54.

Functional MRI (fMRI) data were acquired using an axial 2D GRE Spiral In/Out (SPRLIO; Glover and Law, 2001) pulse sequence (*TR* = 2000 ms; *TE* = 30 ms; flip angle = 80°; receiver bandwidth = ±125 kHz; slice thickness = 4.0 mm; number of slices = 31, descending; 3.44 × 3.44 mm in-plane resolution; number of temporal frames = 85; FOV = 22 cm). The total duration of each task was 5:12.

### **REGIONS OF INTEREST (ROIs)**

Bilateral regions-of-interest (ROIs) used in this study were: (a) pars triangularis and pars opercularis of the IFG (IFGtri and IFGop, respectively) based on previous studies of language development, literacy, and handwriting in IFG (Longcamp et al., 2003, 2008), (b) Exner's region based on its role in generating graphemic-motor commands (Exner, 1881; Ritaccio et al., 1992; Roux et al., 2010; Planton et al., 2013), (c) SPL based on its involvement with complex motor sequences that contribute to the accuracy of written expression (Alexander et al., 1992; Sakurai et al., 2007), and (d) fusiform gyrus based on its role in letter (James and Gauthier, 2006) and word processing (Cohen et al., 2000). Automated Anatomical Labeling (AAL) (Tzourio-Mazoyer et al., 2002) in the WFU PickAtlas toolbox (Maldjian et al., 2003) was used to generate ROIs (a), (c), and (d). Exner's area ROI (b) was selected based on a neuroimaging study (Matsuo et al., 2003) as a region of the left precentral gyrus (PreCG, BA 6), adjacent to BA 9 and BA 44 (Talairach coordinates [TAL]: −46, 3, 27). A sphere with a diameter of 10 mm centered around these coordinates was used as the Exner's area ROI.
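The spherical Exner's area ROI described above can be sketched as a simple distance test: a point belongs to the ROI if it lies within 5 mm (half of the 10 mm diameter) of the reported centre. The code below is only an illustration under stated assumptions. The function names and the 2 mm grid spacing are hypothetical, and the grid is assumed to be aligned to the coordinate origin; the authors' ROI was presumably built with their imaging toolbox rather than in this way.

```python
import math

# Centre and size taken from the text; everything else is illustrative.
EXNER_CENTER_TAL = (-46.0, 3.0, 27.0)
ROI_DIAMETER_MM = 10.0


def in_sphere(point, center=EXNER_CENTER_TAL, diameter=ROI_DIAMETER_MM):
    """True if `point` (mm) lies within the sphere around `center`."""
    return math.dist(point, center) <= diameter / 2.0


def sphere_voxels(center=EXNER_CENTER_TAL, diameter=ROI_DIAMETER_MM,
                  voxel_size=2.0):
    """List grid points (multiples of voxel_size, in mm) inside the sphere.

    The grid spacing and its alignment to the origin are assumptions.
    """
    radius = diameter / 2.0
    axes = []
    for c in center:
        first = math.ceil((c - radius) / voxel_size)
        last = math.floor((c + radius) / voxel_size)
        axes.append([i * voxel_size for i in range(first, last + 1)])
    return [(x, y, z)
            for x in axes[0] for y in axes[1] for z in axes[2]
            if in_sphere((x, y, z), center, diameter)]


roi = sphere_voxels()
```

Restricting the enumeration to the bounding box of the sphere keeps the search small; only the final distance test decides membership.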

### **PREPROCESSING OF fMRI IMAGES**

Processing of fMRI data was performed with statistical parametric mapping software (SPM8; Wellcome Department of Cognitive Neurology, London, UK) in the MATLAB computing environment (The MathWorks, Natick, MA). After image reconstruction, each participant's data were slice time corrected, realigned to a reference volume and corrected for motion and artifacts using both SPM and in-house tools (http://www.nitrc.org/projects/art\_repair/). Data were spatially normalized to Montreal Neurological Institute (MNI) space using normalization parameters obtained from the children's segmented gray matter images of high resolution T1 MRI normalized to standard template and applied to the mean functional image. Resultant images were resampled to 2 × 2 × 2 mm voxels in MNI stereotaxic space. Spatial smoothing was done with an 8-mm isotropic Gaussian kernel. Each participant's data were high pass filtered at 128 s, and analyzed using a fixed effects model examining task; rest was not modeled and was included as implicit baseline. Data from five of the 46 participants were not included (final *N* = 41) because of excessive motion (inclusion criterion: relative motion <1.0 mm), at or below chance task performance (exclusion criterion: accuracy ≤50%), and/or scanner artifact.
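For readers unfamiliar with how a smoothing kernel is specified, the 8-mm figure above is a full width at half maximum (FWHM). The standard relation to the Gaussian standard deviation is $\sigma = \mathrm{FWHM} / (2\sqrt{2\ln 2})$. The short sketch below applies it to the 8 mm kernel and the 2 mm voxels mentioned in the text; the function name is hypothetical and the snippet is not part of the authors' pipeline.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(8.0)      # about 3.40 mm for an 8 mm FWHM kernel
sigma_voxels = sigma_mm / 2.0      # expressed in units of 2 mm voxels
```

In other words, an 8 mm FWHM kernel corresponds to a standard deviation of roughly 1.7 voxels at the 2 mm resampled resolution.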

### **STATISTICAL ANALYSES OF fMRI DATA: MAIN ANALYSES OF INTEREST**

Statistical analysis was performed first using a fixed effects analysis for each participant modeling each condition. Task vs. rest contrasts were used for further group analysis for the purposes of this study as stated in the Functional MRI Tasks section above. Using random effects analysis, a one sample *t*-test was performed to examine brain regions that were active during the phonological fMRI task [*p* = 0.05 family-wise error (FWE) corrected, at the whole brain level].

Next, simple correlation analysis was performed between brain activation during the fMRI tasks and handwriting skills in the ROIs using a statistical threshold of *p* = 0.05 family-wise error (FWE) corrected for height using small volume correction. We also examined voxel-by-voxel associations in the whole brain at a more lenient threshold of *p* = 0.001 uncorrected for height to examine whether there were any clusters outside the ROIs that showed significant effects at this more lenient threshold.

### **Table 1 | Demographics and correlations.**

*\*p* < *0.05 level (2-tailed).*

*\*\*p* < *0.01 level (2-tailed).*

*aWriting samples derived from Woodcock-Johnson III Spelling (subtest from Test of Cognitive Abilities).*

*bFull Scale Intelligence Quotient.*

*cBrief Intellectual Ability.*

*dPeabody Picture Vocabulary Test.*

*eWoodcock Reading Mastery Tests.*

*fComprehensive Test of Phonological Processing (Phonological Awareness = Elision + Blending).*

*gRapid Automatized Naming.*

*hBeery-Buktenica Developmental Test of Visual-Motor Integration.*

*iDevelopmental Neuropsychological Assessment.*

*jTotal Gray Matter Volume.*

*kTotal White Matter Volume.*

### **STATISTICAL ANALYSES OF fMRI DATA: CONTROL ANALYSES**

Control analyses were performed in several ways. First, we examined associations between handwriting quality and brain activation during the phonological task while regressing out non-handwriting motor and writing abilities such as visuomotor skills (rank order of BVMI), oromotor skills (NEPSY-II oromotor subtest), and spelling (WJ-III Spelling subtest), as well as correlated demographic variables [age (there was a trend for older age correlating with better handwriting) and gender (males had significantly poorer handwriting than females)]. Second, ROI-based regression analyses between brain activation during the phonological task and these aforementioned non-handwriting motor and writing abilities were performed. The statistical threshold was set as in the main analysis at *p* = 0.05 FWE corrected for the ROIs (and *p* = 0.001 uncorrected for the whole brain to examine whether there were any clusters outside the ROIs that showed significant effects at this more lenient threshold). Third, whole-brain and ROI analyses were performed correlating brain activation during the supplemental visual-symbol matching and color matching tasks with handwriting skills (rank order of WJ-III Spelling writing samples). Since we only had data from these tasks in half of the participants, in order to show that finding a significant effect in the meta-phonological task but not in the supplementary tasks was not due to power issues, we repeated the main correlation analysis (between brain activation during the meta-phonological processing task and handwriting skills) using the smaller sample with data from both the meta-phonological and supplementary tasks.
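The idea of "regressing out" a nuisance variable can be illustrated with the first-order partial correlation, $r_{xy\cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$, which gives the correlation between two measures after the linear influence of a third has been removed from both. The sketch below is a deliberate simplification: it handles a single covariate, whereas the analyses described above controlled for several covariates jointly within the imaging software. All names and numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical toy data: activation, handwriting rank, and one covariate.
activation = [0.9, 0.7, 0.8, 0.4, 0.5, 0.2]
handwriting = [3, 10, 8, 30, 22, 41]
covariate = [5.1, 5.3, 5.9, 5.6, 6.2, 6.0]
r_controlled = partial_r(activation, handwriting, covariate)
```

When the covariate is uncorrelated with both measures, the partial correlation reduces to the ordinary correlation, which is a convenient sanity check on the formula.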

### **PREPROCESSING AND STATISTICAL ANALYSIS OF T1 STRUCTURAL MR IMAGES**

Voxel-based morphometry (VBM) analysis of T1-weighted MRIs was performed using Statistical Parametric Mapping, version 8 (SPM8) (http://www.fil.ion.ucl.ac.uk/spm). After alignment to the AC-PC axis, T1-weighted images were bias-corrected and segmented into gray matter, white matter, and cerebrospinal fluid, using SPM8 default tissue probability maps and the "New Segment" tool, which also included an affine regularization to warp images to the included International Consortium for Brain Mapping (ICBM) template, producing rigidly aligned tissue class images. Inter-subject registration was achieved with Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL), using default settings. Jacobian-scaled ("modulated"), warped tissue class images were created with DARTEL's "Normalize to MNI Space" tool, which spatially normalized images to MNI space, converted voxel sizes to 1.5 × 1.5 × 1.5 mm³ to match the DARTEL template, and smoothed images with a standard Gaussian filter of full-width at half-maximum (FWHM) equal to 8 mm. For each participant, segmentation and normalization accuracy were manually inspected. Only 41 of 46 participants were included in this analysis; the remaining scans were unusable because of artifacts or excessive motion. Statistical analyses were performed similarly to the fMRI analyses using the same statistical thresholds but additionally controlling for total gray matter volume. Finally, associations between regional gray matter volume and brain activation were examined where the spatial locations at least partially overlapped. The reported Talairach coordinates were converted from MNI space using the mni2tal function (http://www.mrc-cbu.cam.ac.uk/Imaging/Common/mnispace.shtml). 
Talairach Daemon (Research Imaging Center, University of Texas Health Science Center; Lancaster et al., 1997, 2000) and the atlas by Talairach and Tournoux (1988) were initially used to identify Brodmann Areas. The final anatomic locations are reported according to their anatomic location overlaid on the custom template.
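To make the coordinate conversion concrete, the following is a minimal sketch of the piecewise-linear MNI-to-Talairach transform commonly distributed under the name mni2tal. The coefficients are the widely cited ones for that transform; that they match the exact version of the function used here is an assumption, and the Python function below is an illustrative re-expression, not the original MATLAB code.

```python
def mni2tal(x, y, z):
    """Approximate Talairach coordinates (mm) for an MNI coordinate.

    Uses one linear map above the AC plane (z >= 0) and another below it.
    Coefficients are the commonly cited mni2tal values (assumed here).
    """
    tx = 0.9900 * x
    if z >= 0:
        ty = 0.9688 * y + 0.0460 * z
        tz = -0.0485 * y + 0.9189 * z
    else:
        ty = 0.9688 * y + 0.0420 * z
        tz = -0.0485 * y + 0.8390 * z
    return (tx, ty, tz)


# Hypothetical MNI coordinate, used only to show the call.
tal = mni2tal(42.0, 28.0, 16.0)
```

The origin is left unchanged by the transform, and the stronger scaling below the AC plane reflects the larger size difference between the MNI template and the Talairach brain in inferior regions.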

## **RESULTS**

## **BEHAVIORAL RESULTS**

**Table 1** shows demographic and behavioral characteristics as well as associations between these measures and handwriting quality. Age, handedness, and maternal education, often used as a proxy for environment, did not yield any significant associations with handwriting performance (all *p*'s > 0.05). However, as one might expect based on the fact that the handwriting measure was rank-ordered and not standardized, even though the range of ages in these children was narrow (5–6 years of age), age showed a trend toward a significant positive association with handwriting [*r*(44) = 0.27; *p* = 0.075], and gender effects were found [*t*(44) = 2.64, *p* = 0.012], with boys demonstrating significantly weaker handwriting performance than girls. Further, while handwriting performance was not significantly correlated with spelling standard scores [**Table 1**, *r*(44) = 0.18, *p* = 0.24], it was significantly related to spelling raw scores [*r*(44) = 0.36, *p* = 0.013]. (Since the ranking of handwriting quality was not a standardized measure, this was expected.) Visuomotor skill ratings (see above for definition) were also significantly correlated with BVMI standard scores, which is expected since visuomotor integration skill was the construct being evaluated [*r*(44) = 0.658, *p* < 0.001]. We also found, as anticipated, that the raters' rankings of handwriting and visuomotor skills were associated with one another [*r*(44) = 0.45, *p* = 0.002].
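The notation *r*(44) above gives the degrees of freedom, df = *n* − 2, so these correlations are based on 46 children. A correlation can be converted to the *t* statistic on which its significance test rests using $t = r\sqrt{df / (1 - r^2)}$. The sketch below applies this standard formula to one of the reported values; the function name is hypothetical and the snippet is illustrative rather than a reproduction of the authors' analysis.

```python
import math


def r_to_t(r, df):
    """t statistic for a Pearson correlation r with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r ** 2))


# Reported correlation between handwriting rank and spelling raw scores.
t_spelling = r_to_t(0.36, 44)   # roughly 2.56
```

Because the same df applies to every correlation in the paragraph, larger absolute values of *r* map directly onto larger *t* values and hence smaller *p* values.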

### **fMRI RESULTS**

First we examined brain regions that showed significant activation during the reading-related phonological processing task in all participants. We found that these emerging readers elicited significant activation at *p* = 0.05 corrected in bilateral (left > right) IFG, left superior, middle frontal gyrus and PreCG, left inferior parietal lobule and bilateral occipito-temporal region (**Figure 2**, **Table 2**). It is important to note that the behavioral profiles of participants included in this study are not representative of a normal population (see **Table 1**), so the results presented here are not yet generalizable.

Activation during the phonological task was negatively associated with handwriting quality in the right IFG within Broca's Area/Brodmann Area 45/pars triangularis [TAL: *X* = 44, *Y* = 24, *Z* = 15; peak *T* = 3.74; *p* = 0.033 corrected; mean cluster *r*(39) = −0.43; **Figure 3A**]. Even when performing whole-brain analysis at a lenient threshold of *p* = 0.001 uncorrected, a cluster in the right IFGtri was the only region that showed a significant effect (TAL: *X* = 40, *Y* = 27, *Z* = 17; peak *T* = 3.77; *p* < 0.001 uncorrected). Exner's area (TAL: *X* = 48, *Y* = 7, *Z* = 22), although non-significant, also showed a trend in the same direction (*p* = 0.054 corrected). Given Exner's well-documented involvement in handwriting, this trend was included in **Figure 3**. No significant positive correlations were observed either at *p* = 0.05 corrected or *p* = 0.001 uncorrected. Activity in the right IFGtri cluster during the phonological task was also negatively correlated with CTOPP phonological memory composite scores (*r* = −0.31, *p* = 0.049) and the memory for digits subtest (*r* = −0.37, *p* = 0.017).

**FIGURE 2 | Brain activation during the phonological processing fMRI task.** Clusters in warm colors indicate those significant at *p* < 0.05 family wise error (FWE) corrected. Those significant at *p* < 0.001 uncorrected, cluster extent = 10 are also included to show the extent of these clusters at a more lenient threshold. Note: Left Hemisphere is shown on left side.

### **Table 2 | Regional brain coordinates.**


Control analyses were performed in three ways. First, the negative correlation in the right IFGtri remained significant using whole-brain regression analysis of the phonological fMRI task even after regressing out variables that correlated with handwriting quality as well as other motor and writing skills such as age, gender, visuomotor skill (rank ordered BVMI responses), oromotor skills, BVMI (dominant/right hand) raw scores, and WJ-III Spelling raw scores (*r* = −0.369, *p* = 0.029).

Second, control analyses were then performed using ROI-based (IFGtri and IFGop from AAL) and whole-brain regression between activation during phonological processing and motor and writing skills other than handwriting skills. Correlations between right IFG activation and unstandardized visuomotor skills (see Methods for definition) (peak *T* = 2.51; *p* = 0.19 corrected; *p* = 0.008 uncorrected; **Figure 4**), oromotor skills (peak *T* = 3.03; *p* = 0.071 corrected; *p* = 0.002 uncorrected) and spelling (peak *T* = 0.42; *p* = 0.85 corrected; *p* = 0.29 uncorrected) were not significant, controlling for age (either by regressing age out or by using normed scores).

Third, no significant positive or negative correlation was observed between handwriting quality and brain activation during either the visual-symbol matching or color matching tasks, suggesting that the association is likely to be specific to the phonological processing task. Since we only had data from half the sample for both the visual-symbol and color matching tasks (in what we called Cohort 1), we repeated the main correlation analysis between brain activation during phonological processing and handwriting skills using the participants included in this control analysis and still found significant results in the right IFGtri (*r* = −0.49, *p* = 0.024).

**FIGURE 3 | Brain regions associated with handwriting quality. (A)** Clusters that show negative association with brain activation during a phonological processing fMRI task are shown. Pink circles indicate a cluster that is significant at *p* = 0.05 corrected (right inferior frontal gyrus pars triangularis, IFGtri), and cyan circles indicate a cluster that shows a trend *p* = 0.054 corrected (right Exner's area). Clusters indicate voxels significant at a lenient threshold of *p* = 0.05 uncorrected to show greater extent of activation. XYZ coordinates are in Talairach coordinates. Panel on the right shows a scatter plot representation of the cluster that shows significant negative association at *p* = 0.05 corrected (pink cluster). Brain activation is defined as contrast estimates, which are based on combined beta estimates of the phonological condition vs. rest. **(B)** Clusters that show positive association with regional gray matter volume are shown. Pink circles indicate clusters that are significant at *p* = 0.05 corrected in the right IFGtri. Clusters indicate all voxels significant at *p* = 0.001 uncorrected, cluster extent = 10 as reference to show the extent of these clusters at a more lenient threshold. A small cluster in the left IFGtri is observed at this threshold. XYZ coordinates are in Talairach coordinates. **(C)** Voxels that show overlap in fMRI activation from **(A)** and VBM gray matter volume from **(B)** in the right inferior frontal region. Note: Left Hemisphere is shown on left side.

### **VBM RESULTS**

**FIGURE 4 | Brain regions associated with visuomotor (copying) skill. (A)** Clusters that show a negative association (*p* = 0.01 uncorrected) between visuomotor (copying) skill and brain activation during a phonological processing fMRI task are shown. **(B)** Non-significant correlation between visuomotor (copying) skill and activation in the right IFG during phonological processing.

We specifically examined whether there were structural correlates of the functional finding by evaluating whether there were significant associations between right IFG regional gray matter volume and handwriting quality, controlling for total gray matter volume. We found a significant *positive* correlation between handwriting quality and regional gray matter volume in the right IFGtri, spatially overlapping with the fMRI results (TAL: *X* = 40, *Y* = 27, *Z* = 17; peak *T* = 3.66; *p* = 0.027 corrected; **Figures 3B,C**). The association was, however, in the opposite direction to the fMRI findings. Even when the whole brain was examined rather than the a priori hypothesized ROIs, four clusters (right IFGtri/middle frontal gyrus, left IFGtri, right middle temporal gyrus, and right postcentral gyrus/intraparietal sulcus [inferior/superior parietal lobule]) were the only regions that showed a significant effect at a lenient threshold of *p* = 0.001. There were no brain regions that showed a significant negative association with gray matter volume or a significant positive or negative association with white matter. The positive correlation in the right IFGtri remained significant even after regressing out variables that correlated with handwriting skills, such as age, gender, visuomotor skill (rank ordered BVMI responses), BVMI (dominant/right hand) raw scores, and WJ-III Spelling raw scores (*r* = 0.536, *p* = 0.001). Further, regional gray matter volume significantly correlated with functional activation from the main analysis (*r* = −0.323, *p* = 0.043).
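The covariate-adjusted association reported above can be illustrated with a minimal sketch, assuming an ordinary least-squares residualization of both variables followed by a Pearson correlation of the residuals (a partial correlation). The function names and the synthetic data are hypothetical; this is not the authors' voxel-wise VBM pipeline, only an illustration of what "regressing out" covariates means.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after an OLS fit on the covariates plus an intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def partial_correlation(x, y, covariates):
    """Pearson correlation between x and y after regressing the covariates out of both."""
    rx = residualize(x, covariates)
    ry = residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])


# Synthetic illustration only (hypothetical values, not study data).
rng = np.random.default_rng(0)
n = 200
age = rng.normal(size=n)
total_gm = rng.normal(size=n)
covariates = np.column_stack([age, total_gm])
handwriting = 0.5 * age + rng.normal(size=n)
regional_gm = 0.4 * handwriting + 0.5 * total_gm + rng.normal(size=n)

r_adjusted = partial_correlation(handwriting, regional_gm, covariates)
```

Under these assumptions, `r_adjusted` reflects the association between the two measures that is not linearly explained by the listed covariates.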

## **DISCUSSION**

We have presented results examining the association between handwriting quality and brain activation in beginning writers/readers. Our preliminary results showed that poorer handwriting quality was associated with stronger activation of the right IFGtri when children judged whether the names of a pair of pictures start with the same sound. Furthermore, these results overlapped spatially with reduced regional gray matter volume in the right IFGtri in children with less proficient handwriting. Brain activation during supplementary fMRI tasks, where children judged visual similarities between pairs of unfamiliar symbols and discriminated between colors, was not associated with handwriting quality. Regional gray matter volume associations were also significantly correlated with the functional associations specific to the right IFG during the phonological task. These findings show the significance of the IFG for handwriting quality in beginning writers, demonstrating that increased activation in the right IFGtri during a task likely related to the phonological processes involved in reading is associated with reduced handwriting quality, which in turn showed structural brain correlates. While the control condition in our phonological fMRI task was rest because of the young age of our participants (see Methods: Functional MRI Tasks above), we believe the task taps at least partially into phonological processing. This is because other studies using comparable tasks, as well as our own study, have successfully shown phonological processing related reading networks to be active during the task (see Methods). Additionally, we have included two supplementary tasks to show that the findings were not due to more non-specific aspects of the task such as visual perception, judgment and motoric responses. The results of this study show that the neuroanatomical properties and phonologically related neurofunctional properties of the IFG may be essential in the development of the complex motor skills required in handwriting.

The IFG is a heterogeneous region with many functions. Existing literature on the IFG suggests its involvement in an extensive list of abilities, including syntactic processing (Embick et al., 2000), accessing orthographic long-term memories in the form of stored motor plans (Hillis et al., 2002; Rapp and Dufor, 2011), coordinating orthographic lexical selection and retrieval (Purcell et al., 2011), verbal working memory (Paulesu et al., 1993), letter perception and letter transcription (James and Gauthier, 2006), speech generation (Liotti et al., 1994), grasping and manipulating objects (Rizzolatti et al., 1988), silent naming and observation of manipulable objects (Grafton et al., 1997), and handwriting of novel letterforms (Longcamp et al., 2008). Regarding its purported function in relation to writing, a recent meta-analysis of handwriting studies (Planton et al., 2013) found evidence for IFG involvement in writing, in particular when writing was contrasted against a control motor task (e.g., vocalization), but not for contrasts that controlled for linguistic input processing. This supports the role of the IFG in processing linguistic input during writing rather than motoric output (Planton et al., 2013). In our study, we additionally show that handwriting quality correlated not only with IFG volume, but also with activation during a task that was at least partially related to phonological processing. This suggests that at the beginning stage of reading and writing, there is a tight coupling between the IFG (albeit right lateralized) and handwriting, possibly via phonological processing. It is interesting to note that handwriting quality also correlated with a behavioral measure of phonological encoding (spelling). We interpret our predominantly right hemisphere results (left hemisphere involvement was present, but only at a subthreshold level) in terms of neuronal efficiency, which we discuss below.

Although there is evidence for IFG involvement in a variety of tasks, its robust associations with phonological processing and lexical retrieval are likely the most relevant with respect to reading. Many aspects of language processing show leftward functional asymmetry in the IFG in most adults (Price, 2010). Although children show some indication of frontal left hemisphere asymmetry, the degree of asymmetry increases into adulthood (Holland et al., 2001; Szaflarski et al., 2006). Increased left functional asymmetry for language production has been linked to increased vocabulary and non-word reading scores in children (Groen et al., 2012), and more bilateral or right hemisphere IFG activations have been found in impaired populations, such as individuals with dyslexia (Calvert et al., 2000; Pugh et al., 2001; Hoeft et al., 2011). Larger activation extents in the IFG have also been reported in children during linguistic tasks (Gaillard et al., 2000; Hoeft et al., 2011). This suggests a developmental reorganization and refinement of frontal language circuits through young adulthood. Our finding of a negative correlation between children's handwriting performance and right IFG activation is consistent with a common maturational process affecting handwriting and phonological processing. Children with high activation in the right IFG during phonological processing may be developmentally delayed with respect to adult-like patterns of functional asymmetry for language processing and consequently be delayed in the development of handwriting performance, either via a direct link between phonological skills and handwriting or via a more general, domain-independent delay. However, the specificity of our findings argues against a general delay.

An alternative account, which does not assume functional homology between the left and right IFG, is that improved handwriting is associated with increased computational efficiency or neural coding (and hence a reduced BOLD signal increase) in the right IFG for reading-related functions. This account, known as the neural efficiency hypothesis, posits that brighter individuals use their brains more efficiently, and it is often used to explain the inverse relationship between brain activation and task performance (Haier et al., 1988). A recent study by Holland et al. (2011) has shown that greater recruitment of the IFG is associated with slower naming (reduced proficiency) during a picture-naming task. Further, a decrease in right IFG activation during an orthographic processing task has been shown with orthographic training, a process known to contribute uniquely to handwriting, spelling, and composition (Richards et al., 2006a). Training-induced reduction in right IFG activation has also been shown to correlate with improved phonological decoding (Richards et al., 2006b). The positive association between handwriting performance and gray matter volume may be compatible with this interpretation. Morphometric studies have found that increased regional gray matter volume may result in less energy consumption when that area is employed (Haier et al., 2004), and it is generally accepted that increased volume denotes increased cognitive capacity. This interpretation is further supported by the negative correlation between behavioral measures of phonological memory and right IFG activation during the phonological task. In our study, while both age and gender showed associations with handwriting quality (see **Table 1**), our findings persisted even when these factors were regressed out. Moreover, there were no significant correlations with environmental measures (e.g., Home Literacy Inventory) used as proxies to control for differential exposure to reading/writing materials. Thus, there is some indication that the observed differences are not related to age or environmental differences, but instead to differences in the maturational development of language related processes or in neural efficiency.

Recent studies of handwriting in children have found differences in activation within the fusiform gyrus (Longcamp et al., 2008; Richards et al., 2009a,b), an area known to be critical for orthographic processing and implicated in both letter and word perception, which are critical components of both reading development and handwriting acquisition (James and Engelhardt, 2012). Other studies note the importance of Exner's area and the SPL. Exner's area has been implicated for its role in bridging the gap between orthography and the motor programs necessary for handwriting (Roux et al., 2010; Planton et al., 2013), and the emerging consensus regarding the SPL posits that this region is involved in the abstract representation, sequential selection, and production of letter shapes (Rapp and Dufor, 2011; Planton et al., 2013; Rothlein and Rapp, 2014). We did not demonstrate a significant association between handwriting quality and neuroanatomical structure or activation in ROIs other than the IFG, such as Exner's area, the fusiform gyrus and the SPL. The absence of significant results in Exner's area (though there was also a trend toward significance in the right hemisphere) and the SPL may be explained by the fact that most studies that have reported these regions have used adult participants. Research has shown that in adults specific neural substrates correspond to differing letter representations (Rothlein and Rapp, 2014), but this cerebral organization is likely very different in early development. It may be that the phonological processing subserved by the IFG becomes less necessary for writing as language skills become more automatic. Once this occurs, regions such as Exner's area and the SPL, which are important for the motoric and visuo-spatial components and are thought to be specialized for fluent, automatic handwriting, may become more involved. It may also be that significant effects would have been observed in these regions had a different fMRI task been used that emphasized motoric and visuo-spatial components, though this would not explain the lack of neuroanatomical associations. Another possible explanation is that the inverse correlation with activation in the IFG may correspond with the emergence of neural circuits in posterior writing areas in better readers. It is possible that this was not detected in our study due to the small, age-limited sample, in which case the IFG activation may relate not to letter formation, but rather to its well-established role in motor planning and executive function. Further, while the activation observed in our study is assumed to be essential for the phonological task, some studies have shown that activation does not necessarily correspond to what is necessary for the particular tasks being administered (Rothlein and Rapp, 2014). Future studies will need to dissociate these possibilities.

The lack of association between handwriting quality and activation and neuroanatomical patterns in the fusiform gyrus is more difficult to explain, especially as a significant association with handwriting and reading (letter perception) has been found in beginning writers/readers. While this again requires further investigation (as described in the limitations), it is possible that the lack of significant findings in the fusiform gyrus is related to the nature of the phonological task we used: our task requires no orthographic processing, and hence no interaction with handwriting quality was found in the fusiform gyrus. Thus, it is plausible that, had we included another fMRI task related to letter perception or orthographic processing as in James and Engelhardt (2012), we would also have seen associations with handwriting in the fusiform gyrus (and SPL), even though this would not account for the lack of neuroanatomical findings in this region.

Our study provides insights into why some children with dyslexia have been found to have poorer handwriting as well (Berninger et al., 2008). Previous literature has indicated that children with dyslexia who were taught both word decoding and handwriting showed improvement in reading as well as orthographic decoding (Berninger et al., 2008; Berninger and Richards, 2011). It has also been shown in adults with pure alexia that reading performance can be improved through handwriting practice (Seki et al., 1995; Bartolomeo et al., 2002). Recently, a related study on the relationship between handwriting experience and neurological development in beginning readers showed that those with more experience printing and tracing activated the IFG during letter perception more than children with experience typing or copying (James and Engelhardt, 2012). Accepting past literature showing the IFG as important for linking features together to construct an organized whole, these researchers proposed that the IFG may be important for motor planning, control and execution. At a minimum, our study differs from that of James and Engelhardt (2012) in that, rather than investigating letter perception, we used tasks that did not include stimuli related to written language (e.g., letters and words) and still found significant associations. Further, we find neuroanatomical evidence of associations between the IFG and writing. Our results hence provide novel evidence for the important role of the process of writing in reading development.

## **FUTURE DIRECTIONS AND LIMITATIONS**

Future studies investigating handwriting quality and development may assess the role of maturation, lateralization and neural efficiency related to handwriting by following children longitudinally, and by examining lower level visual and motor processing, spelling and writing compositions. Attention also plays a role in successful handwriting (McCutchen, 1996) and while we did incentivize and encourage attention, future studies may examine better ways to control or account for attention.

There are several limitations to our study that will need to be addressed in future research. First, our phonological processing task, in which children judged whether the initial sounds of the names of pictures matched, did not have a well-matched control condition such as a picture matching condition. Although we included the supplementary fMRI tasks we had available (e.g., visual matching and color matching tasks), these may have been inadequate to serve as control tasks. This determination was based on our preliminary study in young children (see Methods: Functional MRI Tasks for details). Second, while it is unrealistic to keep children in kindergarten in the scanner for long periods of time, future studies may include fMRI tasks specifically related to writing, orthographic, visual and motor processing in addition to phonological processing to examine task-induced differences in activation patterns as they relate to handwriting. Third, while qualitative/holistic approaches remain the most common way to assess handwriting quality (Wagner et al., 2011), there is a need to find more quantitative methods, such as using computer algorithms to interpret handwriting quality and errors. Fourth, the working memory condition during fMRI was not significantly different from the non-working memory condition, and hence we were unable to address the issue of working memory in writing. Fifth, the participants included in this study were gifted compared to normative populations, with standardized behavioral profiles well above average (see **Table 1**), potentially reducing the extent to which our results are generalizable. Finally, we compared a copying task (BVMI) to a spelling task (WJ-III), and there were differences in task requirements, such as encoding differences and letters vs. symbols, as well as other potential differences such as verbal short-term memory and visual long-term memory (remembering shapes of letters); these should be dissociated in future studies. Despite these limitations, our study is an important step in identifying the neural substrates of handwriting quality in beginning writers.

## **CONCLUSIONS**

In the current study, we provide evidence of direct neural links between handwriting quality, a skill that has been strongly associated with higher level writing skills and reading, and neural processing underlying phonological processing, which is thought to be causally related to reading acquisition. In contrast to studies focused on neurologically impaired individuals (e.g., Benson, 1979; Exner, 1881; Kaplan and Goodglass, 1981), we took a dimensional approach to investigate handwriting and have provided preliminary but novel evidence that the IFG may be a key link between phonological processing and handwriting quality during early phases of language development. The findings in the current study indicate that during early development of reading and writing skills, successful handwriting quality, measured by one's ability to shape and form letters coherently, relies on the right IFG, and that this efficiency corresponds to successful phonological processing.

### **ACKNOWLEDGMENTS**

This study was supported by grants from the National Institute of Child Health and Human Development (K23HD054720), Child Health Research Program (CHRP), Lucile Packard Foundation for Children's Health (LPFCH), Spectrum Child Health, Clinical and Translational Science Award, and the Dyslexia Foundation (TDF)'s Extraordinary Brain Series to FH. We thank Anne Sawyer and Gary Glover for assistance with MRI data collection.

### **REFERENCES**


Bartolomeo, P., Bachoud-Lévi, A. C., Chokron, S., and Degos, J.-D. (2002). Visually- and motor-based knowledge of letters: evidence from a pure alexic patient. *Neuropsychologia* 40, 1363–1371. doi: 10.1016/S0028-3932(01)00209-3

Beery, K. E. (1997). *The Beery-Buktenica VMI: Developmental Test of Visual-Motor Integration with Supplemental Developmental Tests of Visual Perception and Motor Coordination: Administration, Scoring, and Teaching Manual.* Parsippany, NJ: Modern Curriculum Press.

Beery, K. E., and Beery, N. A. (2004). *The Beery-Buktenica Developmental Test of Visual-Motor Integration with Supplemental Developmental Tests of Visual Motor Integration and Motor Coordination and Stepping Stones Age Norms From Birth to Age Six: Administration, Scoring and Teaching Manual, 5th Edn.* Minneapolis, MN: NCS Pearson, Inc.

Benson, D. F. (1979). *Aphasia, Alexia, and Agraphia*. New York, NY: Churchill Livingstone.

Berninger, V. W., Dunn, A., Lin, S. C., and Shimada, S. (2004). School evolution: scientist-practitioner educators creating optimal learning environments for all students. *J. Learn. Disabil.* 37, 500–508. doi: 10.1177/00222194040370060401

Berninger, V. W., Nielsen, K. H., Abbott, R. D., Wijsman, E., and Raskind, W. (2008). Writing problems in developmental dyslexia: under-recognized and under-treated. *J. Sch. Psychol.* 46, 1–21. doi: 10.1016/j.jsp.2006.11.008

Berninger, V. W., and Richards, T. (2011). Inter-relationships among behavioral markers, genes, brain and treatment in dyslexia and dysgraphia. *Future Neurol.* 5, 591–617. doi: 10.2217/fnl.10.22

Berninger, V. W., Rutberg, J. E., Abbott, R. D., Garcia, N., Anderson-Youngstrom, M., Brooks, A., et al. (2006a). Tier 1 and Tier 2 early interventions for handwriting and composing. *J. Sch. Psychol*. 44, 3–30. doi: 10.1016/j.jsp.2005.12.003

Berninger, V. W., and Winn, W. (2006b). "Implications of advancements in brain research and technology for writing development, writing instruction, and educational evolution," in *Handbook of Writing Research,* eds C. MacArthur, S. Graham, and J. Fitzgerald (New York, NY: Guilford Press), 96–114.

Byrne, B., Coventry, W. L., Olson, R. K., Hulslander, J., Wadsworth, S., DeFries, J. C., et al. (2008). A behaviour-genetic analysis of orthographic learning, spelling and decoding. *J. Res. Read.* 31, 8–21. doi: 10.1111/j.1467-9817.2007.00358.x

Calvert, G. A., Brammer, M. J., Morris, R. G., Williams, S. C., King, N., and Matthews, P. M. (2000). Using fMRI to study recovery from acquired dysphasia. *Brain Lang.* 71, 391–399. doi: 10.1006/brln.1999.2272

Castles, A., and Coltheart, M. (2004). Is there a causal link from phonological awareness to success in learning to read? *Cognition* 91, 77–111. doi: 10.1016/S0010-0277(03)00164-1


Dehaene, S., Cohen, L., Sigman, M., and Vinckier, F. (2005). The neural code for written words: a proposal. *Trends Cogn. Sci.* 9, 335–341. doi: 10.1016/j.tics.2005.05.004

Eckert, M. A., Leonard, C. M., Richards, T. L., Aylward, E. H., Thomson, J., and Berninger, V. W. (2003). Anatomical correlates of dyslexia: frontal and cerebellar findings. *Brain* 126(Pt. 2), 482–494. doi: 10.1093/brain/awg026

Embick, D., Marantz, A., Miyashita, Y., O'Neil, W., and Sakai, K. L. (2000). A syntactic specialization for Broca's area. *Proc. Natl. Acad. Sci. U.S.A.* 97, 6150–6154. doi: 10.1073/pnas.100098897

Exner, S. (1881). *Untersuchungen über die Localisation der Functionen in der Grosshirnrinde des Menschen.* Vienna: Wilhelm Braumüller.

Feder, K. P., and Majnemer, A. (2007). Handwriting development, competency, and intervention. *Dev. Med. Child Neurol.* 49, 312–317. doi: 10.1111/j.1469-8749.2007.00312.x

Freeman, F. N. (1959). A new handwriting scale. *Elementary Sch. J.* 59, 218–221. doi: 10.1086/459718

Gaillard, W. D., Hertz-Pannier, L., Mott, S. H., Barnett, A. S., LeBihan, D., and Theodore, W. H. (2000). Functional anatomy of cognitive development: fMRI of verbal fluency in children and adults. *Neurology* 54, 180–185. doi: 10.1212/WNL.54.1.180

Glover, G. H., and Law, C. S. (2001). Spiral-in/out BOLD fMRI for increased SNR and reduced susceptibility artifacts. *Magn. Reson. Med.* 46, 515–522. doi: 10.1002/mrm.1222

Goodale, M., and Milner, D. (2005). *Sight Unseen.* Oxford: Oxford University Press.

Grafton, S. T., Fadiga, L., Arbib, M. A., and Rizzolatti, G. (1997). Premotor cortex activation during observation and naming of familiar tools. *Neuroimage* 6, 231–236. doi: 10.1006/nimg.1997.0293

Graham, S. (1982). Measurement of handwriting skills: a critical review. *Diagnostique* 8, 32–42.

Graham, S., and Weintraub, N. (1996). A review of handwriting research: progress and prospects from 1980 to 1994. *Educ. Psychol. Rev.* 8, 8–87. doi: 10.1007/BF01761831

Groen, M. A., Whitehouse, A. J. O., Badcock, N. A., and Bishop, D. V. M. (2012). Does cerebral lateralization develop? A study using functional transcranial Doppler ultrasound assessing lateralization for language production and visuospatial memory. *Brain Behav.* 2, 256–269. doi: 10.1002/brb3.56

Haier, R. J., Jung, R. E., Yeo, R. A., Head, K., and Alkire, M. T. (2004). Structural brain variation and general intelligence. *Neuroimage* 23, 425–433. doi: 10.1016/j.neuroimage.2004.04.025

Haier, R. J., Siegel, B. V., Nuechterlein, K. H., Hazlett, E., Wu, J. C., Paek, J., et al. (1988). Cortical glucose metabolic rate correlates of abstract reasoning and attention studied with positron emission tomography. *Intelligence* 12, 199–217. doi: 10.1016/0160-2896(88)90016-5

Henson, R. N. A., Burgess, N., and Frith, C. D. (2000). Recoding, storage, rehearsal and grouping in verbal short-term memory: an fMRI study. *Neuropsychologia* 38, 426–440. doi: 10.1016/S0028-3932(99)00098-6

Hillis, A. E., Kane, A., Tuffiash, E., Beauchamp, N. J., Barker, P. B., Jacobs, M. A., et al. (2002). Neural substrates of the cognitive processes underlying spelling: evidence from MR diffusion and perfusion imaging. *Aphasiology* 16, 425–438. doi: 10.1080/02687030244000248

Hoeft, F., McCandliss, B. D., Black, J. M., Gantman, A., Zakerani, N., Hulme, C., et al. (2011). Neural systems predicting long-term outcome in dyslexia. *Proc. Natl. Acad. Sci. U.S.A.* 108, 361–366. doi: 10.1073/pnas.1008950108

Holland, R., Leff, A. P., Josephs, O., Galea, J. M. M., Desikan, M., Price, C., et al. (2011). Speech facilitation by left inferior frontal cortex stimulation. *Curr. Biol.* 21, 1403–1407. doi: 10.1016/j.cub.2011.07.021

Holland, S. K., Plante, E., Byars, A. W., Strawsburg, R. H., Schmithorst, V. J., and Ball, W. S. (2001). Normal fMRI brain activation patterns in children performing a verb generation task. *Neuroimage* 14, 837–843. doi: 10.1006/nimg.2001.0875

James, K. H. (2010). Sensori-motor experience leads to changes in visual processing in the developing brain. *Dev. Sci.* 13, 279–288. doi: 10.1111/j.1467-7687.2009.00883.x

James, K. H., and Engelhardt, L. (2012). The effects of handwriting experience on functional brain development in pre-literate children. *Trends Neurosci. Educ.* 1, 32–42. doi: 10.1016/j.tine.2012.08.001

James, K. H., and Gauthier, I. (2006). Letter processing automatically recruits a sensory-motor brain network. *Neuropsychologia* 44, 2937–2949. doi: 10.1016/j.neuropsychologia.2006.06.026

Kaminsky, S., and Powers, R. (1981). Remediation of handwriting difficulties, a practical approach. *Acad. Ther.* 17, 19–25.

Kaplan, E., and Goodglass, H. (1981). "Aphasia-related disorders," in *Acquired Aphasia, 3rd Edn.,* ed M. T. Sarno (San Diego, CA: Academic Press), 309–333.

Karlsdottir, R., and Stefansson, T. (2002). Problems in developing functional handwriting. *Percept. Mot. Skills* 94, 623–662. doi: 10.2466/pms.2002.94.2.623

Katzir, T., Misra, M., and Poldrack, R. A. (2005). Imaging phonology without print: assessing the neural correlates of phonemic awareness using fMRI. *Neuroimage* 27, 106–115. doi: 10.1016/j.neuroimage.2005.04.013

Korkman, M., Kirk, U., and Kemp, S. I. (2007). *NEPSY II, 2nd Edn.* San Antonio, TX: PsychCorp/Pearson Clinical Assessment.

Lancaster, J. L., Rainey, L. H., Summerlin, J. L., Freitas, C. S., Fox, P. T., Evans, A. C., et al. (1997). Automated labeling of the human brain: a preliminary report on the development and evaluation of a forward-transform method. *Hum. Brain Mapp.* 5, 238–242. doi: 10.1002/(SICI)1097-0193(1997)5:4<238::AID-HBM6>3.0.CO;2-4

Lancaster, J. L., Woldorff, M. G., Parsons, L. M., Liotti, M., Freitas, C. S., Rainey, L., et al. (2000). Automated Talairach Atlas labels for functional brain mapping. *Hum. Brain Mapp.* 10, 120–131. doi: 10.1002/1097-0193(200007)10:3<120::AID-HBM30>3.0.CO;2-8

Liotti, M., Gay, C. T., and Fox, P. T. (1994). Functional imaging and language: evidence from positron emission tomography. *J. Clin. Neurophysiol.* 11, 175–190.


spelling treatment in child dyslexics. *J. Neurolinguistics* 22, 58–86. doi: 10.1016/j.jneuroling.2005.07.003


Sassoon, R. (2007). *Handwriting of the Twentieth Century*. Bristol: Intellect Books.


Woodcock, R. W., McGrew, K. S., and Mather, N. (2001). *Woodcock-Johnson Tests of Achievement (WJ III)*. Itasca, IL: Riverside Publishing.

Ziviani, J., and Elkins, J. (1984). An evaluation of handwriting performance. *Educ. Rev.* 36, 249–261. doi: 10.1080/0013191840360304

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 December 2013; paper pending published: 02 February 2014; accepted: 02 March 2014; published online: 19 March 2014.*

*Citation: Gimenez P, Bugescu N, Black JM, Hancock R, Pugh K, Nagamine M, Kutner E, Mazaika P, Hendren R, McCandliss BD and Hoeft F (2014) Neuroimaging correlates of handwriting quality as children learn to read and write. Front. Hum. Neurosci. 8:155. doi: 10.3389/fnhum.2014.00155*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Gimenez, Bugescu, Black, Hancock, Pugh, Nagamine, Kutner, Mazaika, Hendren, McCandliss and Hoeft. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Does pronounceability modulate the letter string deficit of children with dyslexia? A study with the rate and amount model

## *Chiara V. Marinelli<sup>1\*</sup>, Daniela Traficante<sup>2,3</sup> and Pierluigi Zoccolotti<sup>1,4</sup>*

*<sup>1</sup> Neuropsychology Research Centre, IRCCS Santa Lucia, Rome, Italy*

*<sup>2</sup> Department of Psychology, Catholic University of Milan, Milan, Italy*

*<sup>3</sup> NeuroMI – Milan Center for Neuroscience, Milan, Italy*

*<sup>4</sup> Department of Psychology, Sapienza University of Rome, Rome, Italy*

### *Edited by:*

*Peter F. De Jong, University of Amsterdam, Netherlands*

### *Reviewed by:*

*Sylviane Valdois, Université Pierre Mendes, France; Marc Brysbaert, Ghent University, Belgium*

### *\*Correspondence:*

*Chiara V. Marinelli, Neuropsychology Research Centre, IRCCS Santa Lucia, Via Ardeatina 306, Rome, Italy e-mail: chiaravaleria.marinelli@uniroma1.it*

The locus of the deficit of children with dyslexia in dealing with strings of letters may be a deficit at a pre-lexical graphemic level or an inability to bind orthographic and phonological information. We evaluate these alternative hypotheses in two experiments by examining the role of stimulus pronounceability in a lexical decision task (LDT) and in a forced-choice letter discrimination task (Reicher–Wheeler paradigm). Seventeen fourth grade children with dyslexia and 24 peer control readers participated in the two experiments. In the LDT, children were presented with high- and low-frequency words, pronounceable pseudowords (such as DASU) and unpronounceable non-words (such as RNGM) of 4, 5, or 6 letters. No sign of a group by pronounceability interaction was found when over-additivity was taken into account. Children with dyslexia were impaired when they had to process strings, not only of pronounceable stimuli but also of unpronounceable stimuli, a deficit well accounted for by a single global factor. Complementary results were obtained with the Reicher–Wheeler paradigm: both groups of children gained in accuracy in letter discrimination in the context of pronounceable primes (words and pseudowords) compared to unpronounceable primes (non-words). No global factor was detected in this task, which requires the discrimination between a target letter and a competitor but does not involve simultaneous letter string processing. Overall, children with dyslexia show a selective difficulty in simultaneously processing a letter string as a whole, independent of its pronounceability; however, when the task involves isolated letter processing, these children can also make use of the ortho-phono-tactic information derived from a previously seen letter string. This pattern of findings is in keeping with the idea that an impairment in pre-lexical graphemic analysis may be a core deficit in developmental dyslexia.

**Keywords: developmental dyslexia, lexical decision, Reicher–Wheeler paradigm, pronounceability, global factor, letter string**

## **INTRODUCTION**

In lexical decision tasks (LDTs) participants are required to discriminate between real words and foils. The difficulty of the discrimination varies according to the characteristics of the foils: as orthographic and phonological overlap between words and foils increases, the LDT becomes progressively more difficult. Participants are faster and more accurate at rejecting unpronounceable illegal non-words (i.e., letter strings such as GLDT) than pronounceable pseudowords (i.e., nonsense strings of letters that respect the orthographic rules of a given language but have no semantic content such as RINAFO; e.g., Holcomb and Neville, 1990; Forster et al., 2003; Ratcliff et al., 2004). Evans et al. (2012) reported that, as foils became increasingly word-like, non-word ("no") responses became significantly slower and less accurate: reaction times (RTs) were shorter, and accuracy was higher, for consonant strings compared to pseudowords. Moreover, real word ("yes") responses in the context of increasingly word-like foils were also slower and less accurate. Passing from non-word to pseudoword foils, there is a progressive increase in pronounceability as well as in orthographic and phonological similarity to real words. Moreover, as foils become more word-like and orthographic and phonological overlap increases, foils produce more activation of similar words (Harm and Seidenberg, 2004) and the discrimination becomes more difficult, indicating that a higher level of activation is required for a real word ("yes") decision to avoid false alarms.

Some authors investigated whether pronounceability influences the depth of processing required for lexical decision. James (1975) found that this task involves retrieval of semantic information (as highlighted by the concreteness effect) only in the presence of pronounceable distractors, while the use of unpronounceable distractors, by decreasing the similarity between words and foils, makes semantic retrieval unnecessary. Similarly, Evans et al. (2012) reported smaller effects of imageability and semantic priming as decision difficulty and RTs decreased from pseudohomophone, to pseudoword, and non-word foil contexts, with semantic effects minimized with unpronounceable foils. Moreover, semantic effects increased significantly as decisions became harder (and slower) with more word-like foils. Dimitropoulou et al. (2010) examined the masked onset priming effect (MOPE) by manipulating the primes' lexicality, frequency, and pronounceability. The MOPE indicates faster naming latencies when a target word (e.g., BREAK) is preceded by a briefly presented masked prime that shares its initial sound with the target (e.g., belly) compared to when it does not (e.g., merry) or when it rhymes with it (e.g., stake; Forster and Davis, 1991). This effect has been interpreted as an advantage in speech planning of the response or as evidence of prime processing by the non-lexical route. Dimitropoulou et al. (2010) found the MOPE for all types of stimuli except unpronounceable non-words, a result favoring the speech planning hypothesis. Finally, pronounceability has also been examined in terms of the facilitation present for repeated stimuli. In an LDT, repetition priming was larger in experiments with pseudowords than in experiments with non-words (Ratcliff et al., 2004).

The effect of pronounceability has been evaluated not only in LDTs, but also with other experimental paradigms. Seidenberg et al. (submitted) examined whether pronounceability influences the reading aloud process. Participants had greater difficulty in naming non-words containing grapho-tactically illegal sequences of letters (e.g., JULBZ) as compared to grapho-tactically legal non-words containing digraphs (i.e., multi-letter graphemes that map onto a single phoneme, e.g., the "ee" in NEESH). These findings indicate that pronounceability is a key factor in determining naming latencies, with pronounceable pseudowords being named faster than unpronounceable non-words. Differences are also observed when participants only have to identify a single letter in the stimulus in a post-cued letter-in-string identification task. Thus, letter recognition is more accurate in the context of a pronounceable pseudoword than in the context of a consonant string, the so-called pseudoword superiority effect (PSE; e.g., Baron and Thurston, 1973; Spoehr and Smith, 1975; Grainger and Jacobs, 1994, 2005).

It is worth noting that when reading proficiency is also taken into account, the picture becomes more complex. In fact, if two groups (e.g., dyslexic and proficient readers) vary in general speed of processing (hereafter referred to as the global factor), group differences in latencies would depend on both the difficulty of a given task and the general group differences in processing speed (Faust et al., 1999). Then, for groups showing global differences in performance, one should expect to find over-additivity effects; i.e., the absolute group differences in performance would tend to grow as a function of task difficulty over and above the characteristics of the specific experimental manipulations (Faust et al., 1999). The presence of over-additivity may induce overestimation or underestimation of the contribution of specific variables modulating reading performance. Therefore, it may not be easy to identify the presence of a deficit in processing unpronounceable stimuli because of differences in task difficulty across experimental conditions and their interaction with basic group differences in rate of information processing. Models such as the rate and amount model (RAM; Faust et al., 1999) reveal the presence and characteristics of the global factor in information processing by distinguishing between the performance of dyslexic and proficient readers and isolating the conditions in which children with dyslexia show specific deficits not ascribable to over-additivity (Zoccolotti et al., 2008).
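The over-additivity argument can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that one group is slower than the other by a constant multiplicative factor in every condition; the condition means and the factor of 1.5 are hypothetical values, not data from this study, and the function name is ours.

```python
# Toy illustration of over-additivity (all numbers are hypothetical).
# Assumption: the slower group's mean RT is a constant multiple of the
# faster group's mean RT in every condition (a pure global factor).

SLOWING_FACTOR = 1.5  # hypothetical global slowing of the slower group

# Hypothetical control-group mean RTs (ms), ordered from easy to hard.
control_means = {"easy": 1000.0, "medium": 1500.0, "hard": 2000.0}


def group_differences(means, factor):
    """Return the absolute group difference (ms) for each condition."""
    return {cond: factor * rt - rt for cond, rt in means.items()}


differences = group_differences(control_means, SLOWING_FACTOR)
for cond, diff in differences.items():
    print(f"{cond}: group difference = {diff:.0f} ms")
```

Although no condition is specifically harder for the slower group in this toy case, the absolute group difference grows with condition difficulty, which is the pattern that can surface as a spurious group by condition interaction in raw RT analyses.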

In previous studies, we found that a single global factor accounted for a very large proportion of the impaired performance of children with dyslexia in making lexical decisions and reading words and pseudowords (Di Filippo et al., 2006; Zoccolotti et al., 2008; Marinelli et al., 2011). Other studies showed that the global factor was present for orthographic but not pictorial stimuli (Zoccolotti et al., 2008) and in the visual, but not the auditory, modality (Marinelli et al., 2011). The global factor did not emerge in a variety of letter and bigram tasks even though their general difficulty was made similar to that of letter strings (i.e., both words or pseudowords; De Luca et al., 2010). In fact, tasks mapping letter (and bigram) recognition loaded on a separate factor other than that accounting for words and non-words.

These studies indicate that children with dyslexia are selectively impaired in processing visually presented strings of letters with or without lexical value. We have proposed that this deficit has a pre-lexical graphemic locus (e.g., De Luca et al., 2010), i.e., that it marks an impairment in forming a graphemic description of the letter string (Zoccolotti et al., 2008). This idea is in keeping with other proposals based on imaging and lesional studies of the so-called "visual word form area" (VWFA; Cohen et al., 2000, 2002). The local combination detector (LCD) model (Dehaene et al., 2005) posits that written words are encoded by a hierarchy of detectors tuned to increasingly larger and more complex word fragments (visual features, single letters, bigrams, quadrigrams and, possibly, words). At the neural level, information from letter features and single letters converges on the VWFA; here, a posterior-to-anterior gradient is present with a progression in selectivity to increasingly word-like stimuli (e.g., Dehaene et al., 2004; Vinckier et al., 2007). Over years of practice, frequent combinations of letters are selected to be represented by dedicated neurons (Cohen et al., 2008), and the VWFA becomes attuned to the regularities of the writing system, yielding fast parallel processing in reading (Vinckier et al., 2007; Cohen et al., 2008). Importantly, several studies found that dyslexic individuals show selective hypo-activation of the VWFA (for a review see Richlan et al., 2009). In a similar vein, Marsh and Hillis (2005) proposed that this area is involved in the computation of a prelexical "grapheme description" independent of case, font, location, or orientation. Notably, such a graphemic description does not require stored knowledge of spelling or spelling-sound correspondences. In this perspective, the reading impairment of children with dyslexia might be ascribed to a deficit at the level of graphemic analysis.

Another interpretation of the dyslexic deficit is that the impairment is related to the inability to bind orthographic and phonological information (Ziegler et al., 2010; van den Broeck and Geudens, 2012). Evidence in this direction comes from imaging studies indicating a close association between letter and speech sounds early in development (for a review see Blomert, 2011). Accordingly, effective letter–speech sound integration is an emergent property of learning to read supported by an interrelated network of visual, auditory, and heteromodal brain areas. There is evidence that dyslexic individuals are impaired in letter–speech sound integration. For example, in a functional magnetic resonance imaging (fMRI) study, Blau et al. (2009) reported that adult dyslexic readers showed underactivation of the superior temporal cortex for the integration of letters and speech sounds. In a further study, Blau et al. (2010) reported that, unlike control readers, cortical responses to speech sounds of dyslexic individuals were not modulated by letter–speech sound congruency. In a complementary line of research, it has been reported that proficient readers activate orthographic representations in a phonological LDT while children with dyslexia do not (van der Mark et al., 2011). Thus, dyslexic individuals fail to activate orthographic representations during spoken language processing. Finally, the previously described results on the VWFA are also not necessarily incompatible with an orthographic–phonological binding perspective. The presence of interactions between orthographic and phonological processing is suggested by evidence indicating connections between the VWFA and language areas (Cai et al., 2008; Greve et al., 2013). First, it has been noted that the VWFA shows a clear lateralization, with only the left, linguistic, hemisphere becoming specialized for reading (Cai et al., 2008); furthermore, it has been reported that asymmetries in the VWFA are correlated with the ear advantage in a dichotic listening task (Greve et al., 2013). Overall, it has been proposed that a deficit in orthographic–phonological binding may represent a proximal cause of the reading slowness in dyslexia and may also help in understanding the deficit in reading fluency of these individuals (Blomert, 2011).

In the present research, we tested these alternative hypotheses by evaluating the role of stimulus pronounceability on accuracy and latency in two different tasks: an LDT (Experiment 1), and a two-alternative forced-choice task (the so-called Reicher–Wheeler paradigm: see Reicher, 1969; Wheeler, 1970; Experiment 2).

In the LDT we are interested in assessing whether the impairment shown by children with dyslexia in processing letter strings is present only for pronounceable words and pseudowords or whether it is also detectable (and of a similar size) with unpronounceable letter strings. The first outcome would favor an orthographic–phonological binding interpretation, while the second would favor a pre-lexical graphemic locus of the reading deficit. In fact, strings of consonants are not pronounceable and, as such, would not activate orthographic–phonological binding to the same extent as words or pseudowords. Therefore, based on the orthographic–phonological binding hypothesis we would expect a smaller deficit for non-words as compared to words and pseudowords (once the effect of over-additivity is controlled for). Based on the hypothesis of a letter string graphemic deficit, no interaction between groups and item pronounceability would be expected. All condition means in the LDT (for words, pseudowords, and non-words) are expected to be fit, in the RAM, by the same letter string factor.

The Reicher–Wheeler paradigm uses the same types of stimuli used in the LDT, i.e., words, pseudowords and non-words, but does not require the simultaneous processing of several letters. In fact, the decision is to be made on the discrimination between a target letter and a competitor in the context of a previously displayed letter string. Assuming that the main impairment of children with dyslexia has to do with simultaneous letter string processing and not with single letters, the lack of a global factor in the RAM model might be expected. However, by examining the context effects (i.e., the lexicality and the pronounceability of the prime) it is possible to test whether children with dyslexia can gain advantage from lexical activation and pronounceability in graphemic processing as much as control children.

In particular, testing for the PSE (i.e., letter identification is more accurate in the context of a pseudoword than in the context of an unpronounceable non-word) would allow evaluating whether children can take advantage of a pronounceable letter string which forms a typical orthographic context in Italian. Furthermore, testing for the word superiority effect (WSE; i.e., letter identification is more accurate in the context of a word than in the context of a pronounceable pseudoword) would allow evaluating whether children can take advantage of the lexical activation triggered by a word context in the successive letter recognition. According to the orthographic–phonological binding hypothesis we would expect, for children with dyslexia, a lack of both PSE and WSE, while for skilled readers a role of lexicality and pronounceability of the context is expected. On the other hand, the letter string graphemic deficit hypothesis would predict no differences in the effect size of the WSE and PSE in relation to reading proficiency, as the task does not require any decision on the lexicality of a specific letter combination, but only the discrimination between a single target letter and a competitor. Furthermore, as no selective deficit in orthographic–phonological interaction is envisaged, one would expect children with dyslexia to be able to take advantage of the ortho-phonotactic regularities of the language (i.e., they are expected to show a PSE).

In order to apply the RAM model to the data from both the LDT and Reicher–Wheeler paradigms, we examined the speed of processing of children with dyslexia in responding to these tasks. There is evidence that, even though most studies on the Reicher–Wheeler paradigm focused on accuracy, parallel effects have also been reported with time measures (RTs and visual evoked potentials; Ziegler et al., 1997; Martin et al., 2006). To test the global factor it is important to have a sizeable spread of performances across conditions (Faust et al., 1999). To this aim, in the LDT we presented high-frequency words, low-frequency words, pseudowords and non-words, varying in length within each category (from 4 to 6 letters), for a total of 12 different conditions. Previous data on Italian children indicate that children with dyslexia show frequency (e.g., Barca et al., 2006) and lexicality (e.g., Zoccolotti et al., 2008) effects both in reading and LDT (Paizi et al., 2013). These effects tend to be greater in children with dyslexia than in typically developing children in raw data analyses but, typically, this group interaction disappears when over-additivity is controlled for (Di Filippo et al., 2006; Zoccolotti et al., 2008; Paizi et al., 2013). In the Reicher–Wheeler paradigm, primes were 4-letter high-frequency words, pseudowords and non-words, with the target letter in first, second, or third position, for a total of nine different conditions. Note that no data are yet available on this paradigm for Italian children.

### **EXPERIMENT 1**

The first experiment examined the performance of children with dyslexia and typically developing readers in an LDT with non-words, pseudowords, low-frequency words, and high-frequency words presented intermixed.

### **METHOD**

### *Participants*

Participants were 41 fourth grade children with normal intelligence (according to the Raven's Coloured Progressive Matrices; Pruneti et al., 1996) and adequate socio-educational conditions. In particular, there were 17 children with dyslexia (10 male and 7 female; mean age = 9.50 years, *SD* = 0.30) and 24 control readers (9 male and 15 female; mean age = 9.50 years, *SD* = 0.30). Children were selected by our psychology unit during a screening for learning disabilities carried out in local public schools of Rome. Parents were informed of the screening procedure and authorized their child's participation.

Children with dyslexia were selected for a marked reading delay (at least 2 *SD*s below normative data) in accuracy and/or speed in reading a text passage (MT reading test, Cornoldi and Colpo, 1998). None of the children had received treatment for their reading impairment. Criteria for inclusion in the control group included normal reading speed and accuracy on the MT reading test (Cornoldi and Colpo, 1998). Control participants were comparable to children with dyslexia for sex (χ<sup>2</sup> = 1.06, n.s.), age (*t*(40) = 0.08, n.s.), and Raven's test performance (*F*(1,41) = 0.04, n.s.). On the MT reading test (Cornoldi and Colpo, 1998), mean *z* scores of control participants were near zero for all parameters (accuracy: –0.03, *SD* = 0.80; speed: –0.10, *SD* = 0.57). By contrast, children with dyslexia performed worse than control readers in reading accuracy (mean *z* score = –3.00, *SD* = 1.20; *t*(40) = 9.45, *p* < 0.0001) and speed (mean *z* score = –1.75, *SD* = 1.25; *t*(40) = 5.71, *p* < 0.0001). As a group, children with dyslexia showed only mildly defective performance in reading comprehension (mean *z* score = –0.27, *SD* = 0.43), but they scored lower than the control readers (mean *z* score = 0.04; *SD* = 0.26; *t*(40) = 2.84, *p* < 0.01).

A number of other tests were used to qualify the reading and cognitive profile of children with dyslexia. **Table 1** reports the performance on the *Word and Non-word Reading Test* (Zoccolotti et al., 2005), a standard test of word and non-word reading (for basic characteristics of this and other tests please refer to the legends of the Tables). As can be seen from the table, performance of the group of children with dyslexia was significantly lower than that of control readers across all conditions. The table also reports the proportion of children performing at least 2 *SD*s below normative data; 16 out of 17 children showed a deficit in at least one subset of the test (the odd one out had a moderate impairment in this test, about –1 *SD*, across all conditions). Impairment appeared more marked for low-frequency (both short and long) words and long high-frequency words.

On the *Test for the Diagnosis of Orthographic Deficit in Childhood* (Angelelli et al., 2008; **Table 2**), children with dyslexia, as a group, showed impaired performance on all subtests, except for that on spelling regular words. About half of the children with dyslexia showed severely impaired performance in spelling. **Table 3** reports the performance of the two groups of children on phonological and visual attention span tests (for information on these tests please refer to the Table legend). As a group, children with dyslexia showed lower performance than control readers in several of these tests, i.e., the *Visual Attention Span* (Bosse et al., 2007), the *Repetition of Non-words Series* (Marinelli, 2010), and the *Blending test* (Di Filippo et al., 2005) in the pseudoword condition (only a trend was present for the word condition). No difference was present on the *Digit Span* test (Wechsler, 2006). At any rate, it may be noted that, in most cases, the impairment was mild and (with the exception of the pseudoword condition of the *Blending test*) very few children showed frankly impaired performance in these tests.

### *Materials*

Ninety-six 4-, 5-, and 6-letter words were selected from the *EPOS 2* database (Baldi and Traficante, 2005), based on the ease of recognition (words recognized by more than 90% of subjects) and high familiarity (familiarity estimated higher than 6 on a 7-point scale). Half of the words were of high frequency (mean = 215.4; *SD* = 142.2; range = 70–794) and half of low frequency (mean = 14.5; *SD* = 5.6; range = 7–28), according to the children's word frequency corpus (Marconi et al., 1993). Both high- and low-frequency word subsets were made of 16 stimuli for each length (4, 5, and 6 letters). Subsets were matched for bigram frequency, contextual rules, presence of orthographic complexity (double consonants and clusters of consonants), familiarity (based on *EPOS 2*, Baldi and Traficante, 2005) and percentage of recognition (Baldi and Traficante, 2005). Words were also matched for neighborhood-size (Baldi and Traficante, 2005), but only within the subsets with the same number of letters (due to the high covariance between length and neighborhood-size characteristic of Italian).

*Values indicate z scores as compared to normative values (negative values indicate lower performance). Path. perf., pathological performance; HF, high-frequency; LF, low-frequency. The test is made of six A4 sheets, one for each subset of stimuli. There are 30 stimuli in each subset. Short stimuli are 4-, 5-letter long, while long stimuli are 8-, 10-letter long. Pseudo-words were generated from high-frequency words.*

**Table 2 | Performance of children with dyslexia and control readers in the** *Test for the Diagnosis of Developmental Dysgraphia* **(Angelelli et al., 2008).**

*Values indicate z scores as compared to normative values (negative values indicate lower performance). Path. perf., pathological performance. Regular words (N* = *70) were with complete one-sound-to-one-letter correspondence; Non 1:1 regular words (N* = *10) were words containing sounds that require syllabic conversion rules {i.e., the orthographic transcription of a consonant is determined by the vowel that follows it; for example* [*k*] *followed by /a/o/u/* = *CA, CO, CU [e.g., CASA (* =*home) and CUBO (* =*cube)]; while* [*k*] *followed by /i/e/* = *CHI, CHE [e.g., CHIESA (* =*church)]}; ambiguous words (N* = *55) were words with unpredictable transcription along the phonological-to-orthographic conversion route (e.g.,* [*kwo*] *in* [*kwota*]*, the quota: QUOTA but not \*CUOTA); pseudo-words (N* = *25) were stimuli without lexical status but with one-sound-to-one-letter correspondence.*

**Table 3 | Performance of children with dyslexia and normal readers on visual attention and phonological/metaphonological tests.**

*For all tests, values indicate z scores based on normative values (negative values indicate lower performance). Path. perf., pathological performance.*

*The Visual Attention Span (Bosse et al., 2007) is a task in which children see on the PC screen for 200 ms an unpronounceable string of five consonants (e.g., R H S D M) that, as such, cannot be recoded phonologically, and must report as many letters as possible. Each letter is presented 10 times, appearing twice in each of the five positions. The task includes twenty items and was implemented with the E-Prime 2 software. Normative data on Italian children are presented in Marinelli (2010). Phonological span was assessed with the Digit span task of the WISC III (Wechsler, 2006).*

*In the Repetition of Non-word Series, ten lists of three bi-syllabic, 5-letter pseudo-words are read aloud by the examiner at a pace of about one every 2 s. The child is asked to repeat each list as accurately as possible immediately after presentation. Each correctly repeated non-word was awarded one point, for a maximum score of 30. Normative data on Italian children are presented in Marinelli (2010).*

*The Blending Test (Di Filippo et al., 2005) is a measure of phonological awareness. Words (or pseudo-words) are presented phoneme-by-phoneme through an audiotape at a rate of one per second. At the end of the sequence, the child has to repeat aloud the whole stimulus. Nineteen (five- to six-letter) words/pseudo-words are presented. For each stimulus, the correctly blended pairs of phonemes are counted, irrespective of whether repetition of the entire target is achieved. The maximum score is 83.*

For each subset, pronounceable strings such as DASU (16 stimuli for each length for a total of 48 items) were generated from half of the words, and unpronounceable stimuli such as RNGM (16 stimuli for each length for a total of 48 items) from the other half. Although unpronounceable, the non-word stimuli were built only from bigrams that actually occur in Italian orthography. Usually, studies compare a pronounceable non-word such as STRENG with a consonant string such as STPFM. However, in this example, not only is the first type of stimulus pronounceable and more orthographically similar to a real word than the second one, but it is also orthographically and phonologically regular, whereas the second is not. In the present study, we used only orthographically and phonologically regular bigrams in order to control for this aspect, at least at the bigram level. For example, the bigrams in VRSN occur, respectively, in the words *aVRemo* ("we will have"), *oRSo* ("bear"), *SNello* ("slim"). The digraphs SC, GL, GN, CH, which correspond to a single sound, were avoided.

Pronounceable strings were obtained by replacing vowels with other vowels, while unpronounceable strings were obtained by replacing vowels with consonants. Overall, there were equal numbers of "yes" and "no" responses; i.e., 96 real words ("yes" responses) matched with 96 non-words ("no" responses: 48 pronounceable and 48 unpronounceable). Items were presented in four blocks of 48 stimuli. Words and their respective derived pseudowords or non-words did not appear in the same block. The order of presentation of the stimuli was randomized for each subject.

### *Procedure*

Tests were carried out individually in a quiet room at the children's school. Children performed an LDT, in which they had to decide whether or not a string of letters formed a legal Italian word.

Stimuli were displayed in upper-case Courier New font, size 18, in white on a gray screen. Each item was preceded by a fixation point, which disappeared after 500 ms. After the appearance of the stimulus (which remained on the screen until the subject responded), there was a 1000 ms inter-trial interval. When the letter string appeared at the center of the PC screen, children had to push the right button on the keyboard as quickly and as accurately as possible if the stimulus was a word and the left one if it was not a word. The other buttons of the keyboard were hidden with a piece of cardboard. A brief practice with 12 stimuli preceded the experiment. No feedback was provided. Children were allowed brief pauses between blocks.

Stimulus presentation and data recording were controlled with the E-Prime 2 software. The program recorded RTs and errors.

### *Data analysis*

Invalid trials (due to technical problems), RTs below 250 ms and outliers (i.e., RTs exceeding the individual mean plus or minus 3 *SD*s) were excluded from the analyses. The percentage of excluded RTs was very small (for control readers: 0.47, 2.18, and 1.06% for unpronounceable non-words, pronounceable pseudowords and words, respectively; for children with dyslexia: 0.29, 1.82, and 1.06%, respectively). The RTs corresponding to errors were excluded from the analyses.
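The exclusion rules above can be sketched in a few lines of code. This is a minimal illustration rather than the procedure actually run on the data: the `(rt_ms, is_correct)` trial format and the function name are hypothetical, and the order of operations (computing the individual mean and *SD* on correct trials after the 250 ms cut-off has been applied) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev

MIN_RT_MS = 250.0  # anticipation cut-off stated in the text
SD_CUTOFF = 3.0    # outlier criterion stated in the text


def trim_rts(trials):
    """Return the RTs of one participant that survive trimming.

    `trials` is a list of (rt_ms, is_correct) tuples (hypothetical format).
    Assumption: the individual mean and SD are computed on correct trials
    after the 250 ms cut-off has been applied.
    """
    kept = [rt for rt, correct in trials if correct and rt >= MIN_RT_MS]
    if len(kept) < 2:
        return kept  # too few trials to estimate a standard deviation
    m, sd = mean(kept), stdev(kept)
    low, high = m - SD_CUTOFF * sd, m + SD_CUTOFF * sd
    return [rt for rt in kept if low <= rt <= high]


# Hypothetical participant: 20 ordinary trials, one very slow response,
# one anticipation, and one error trial.
example_trials = [(1000.0, True)] * 20 + [
    (5000.0, True),   # slow outlier
    (200.0, True),    # faster than 250 ms
    (1200.0, False),  # error trial
]
print(len(trim_rts(example_trials)), "trials retained")
```

Note that with very few trials a single extreme value cannot exceed 3 *SD*s of its own distribution, so a criterion of this kind is only informative when each participant contributes a reasonable number of responses.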

RTs were examined in order to check for the presence of the global factor in the data. In particular, the RAM (Faust et al., 1999) makes a number of testable predictions to detect the presence of global factor(s) in the data (see Results). When two groups vary for some general processing speed factor, larger group differences are expected in more difficult conditions (and smaller ones in easier conditions) over and above the specific effect of a given experimental manipulation; this is referred to as the over-additivity effect (Faust et al., 1999). Over-additivity may modulate the group by condition interactions when two groups differ in general ability (Faust et al., 1999), as is the case for dyslexic and control readers. According to Faust et al. (1999), this effect can be controlled for by using various data transformations, including a *z* score transformation. For each participant, *z* scores are obtained by taking the RTs in each condition, subtracting their overall mean, and dividing them by the standard deviation across conditions (therefore, each individual has an average of 0 and an *SD* of 1 across conditions). This transformation rescales individual performance to a common reference; hence, it allows controlling for global differences in information processing (Faust et al., 1999) while preserving the information regarding individual variability across experimental trials and conditions. Note that this transformation is appropriate only for open-scale measures, such as time, but not for closed-scale measures such as accuracy. Interactions that are significant in both the raw score and *z*-transformed score analyses indicate the selective influence of a given parameter; in contrast, interactions that are significant only in the raw data analyses, but not in those with the *z*-transformed values, indicate the presence of spurious interactions (due to over-additivity effects; Faust et al., 1999).
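As an illustration of the *z* score transformation described above, the following sketch rescales the condition means of two hypothetical participants, one of whom is uniformly 1.5 times slower than the other. All values, labels, and function names are illustrative assumptions; in particular, the use of the population standard deviation is our choice, as the text does not state which formula was applied.

```python
from statistics import mean, pstdev


def z_transform(condition_rts):
    """Rescale one participant's condition mean RTs to mean 0 and SD 1.

    `condition_rts` maps a condition label to that participant's mean RT.
    Assumption: the population SD (pstdev) across conditions is used; the
    text does not say whether the sample or population formula was applied.
    """
    values = list(condition_rts.values())
    m, sd = mean(values), pstdev(values)
    return {cond: (rt - m) / sd for cond, rt in condition_rts.items()}


# Hypothetical participants: the second is uniformly 1.5 times slower.
fast_reader = {"non-word": 1000.0, "pseudoword": 1600.0, "word": 1200.0}
slow_reader = {cond: 1.5 * rt for cond, rt in fast_reader.items()}

z_fast = z_transform(fast_reader)
z_slow = z_transform(slow_reader)
for cond in fast_reader:
    print(f"{cond}: z_fast = {z_fast[cond]:+.2f}, z_slow = {z_slow[cond]:+.2f}")
```

In this toy case the two participants obtain identical *z* profiles although their raw differences between conditions are not equal, which is why an interaction that survives the transformation can be read as a selective effect rather than as a by-product of global slowing.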

Three separate analyses of variance (ANOVAs) were carried out to examine the effect of pronounceability (non-words vs. pseudowords), frequency (high- vs. low-frequency words) and lexicality (pseudowords vs. words), respectively. In each of these analyses, group (dyslexic vs. control children) was entered as a between-subject factor and length (4-, 5-, and 6-letter stimuli) as a repeated measure. Separate ANOVAs were carried out on percentages of errors, RT raw data (r) and RT *z*-transformed data (z). For the sake of presentation, the two latter types of analyses will be presented together (using raw RT means to illustrate effects); this will allow highlighting which group by condition interactions are genuine and which can be ascribed to the over-additivity effect. Whenever appropriate, means were compared with the *a posteriori* Tukey HSD test.

### **RESULTS**

### *Analysis of global factor(s)*

The RAM (Faust et al., 1999) predicts a linear relationship between: (i) the condition means of two groups of children (e.g., dyslexic and control readers) who vary in overall information processing rate; (ii) the condition means of the overall group and the standard deviation in the same conditions (i.e., that more difficult conditions will generate greater variability).

As can be seen in **Figure 1**, condition means for the dyslexic group were linearly related to those of control readers. This pattern indicates that a global factor (which explains a large proportion of variance, i.e., *r*<sup>2</sup> = 0.89) accounts well for the slowness of children with dyslexia across all experimental conditions; namely, condition means for high- and low-frequency words, pronounceable pseudowords and unpronounceable non-words were all well fit by the same regression line. The slope is 1.52, indicating that children with dyslexia were 52% slower than control readers in performing the task.
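The first prediction amounts to regressing the condition means of one group on those of the other. The sketch below shows such a fit with ordinary least squares; the condition means are hypothetical values chosen for illustration (the slope of 1.52 and the *r*<sup>2</sup> of 0.89 reported here come from the actual data, not from this code), and the function name is ours.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical condition means (ms), one value per experimental condition.
control_means = [1200.0, 1400.0, 1700.0, 2100.0]
dyslexic_means = [1750.0, 2100.0, 2500.0, 3150.0]

slope, intercept, r_squared = fit_line(control_means, dyslexic_means)
print(f"slope = {slope:.2f}, intercept = {intercept:.0f} ms, r^2 = {r_squared:.2f}")
```

A slope above 1 with a high *r*<sup>2</sup> is the signature of a global factor: the slower group's means are a roughly constant linear function of the faster group's means across conditions. The same function could be applied to the second prediction by passing overall condition means as `x` and the corresponding standard deviations as `y`.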

The test of the second prediction is presented in **Figure 2**; the means of the overall sample of children (dyslexic and control readers) for all experimental conditions are plotted against the respective standard deviation in the same conditions. A linear relationship between means and standard deviation (with a 0.40 slope) was present, accounting for a substantial amount of variance (*r*<sup>2</sup> = 0.83).

**FIGURE 1 | Experiment 1.** Dyslexics' condition means in the lexical decision task are plotted as a function of the control readers' means (symbols as described in the figure; the three symbols per condition represent word lengths). The diagonal line (slope = 1) represents equal RTs for dyslexic and control readers. Note that all data points lie above the diagonal line indicating that children with dyslexia were slower than controls in all conditions. All data points are well fit by a single regression line.

Due to the presence of a global factor in the data, in accordance with the RAM, the ANOVAs were performed also on *z*-transformed RTs in order to determine whether stimulus pronounceability, as well as frequency and lexicality, have a specific role in modulating group differences over and above the variance accounted for by the global factor.

### *Analysis of variance*

*Pronounceability.* **Figure 3** shows the relevant means of the pronounceability effect in terms of errors, raw RTs and *z*-transformed RTs, separately for dyslexic and control readers.

The ANOVA on errors showed the main effect of pronounceability (*F*(1,39) = 45.9, *p* < 0.001), with higher percentages of errors for pseudowords (15.8%) than non-words (3.4%). The length and group main effects, as well as the interactions between these variables, were not significant (all *F*s about 1).

The ANOVAs on RTs showed the significance of the main effects of group (*F*r(1,39) = 28.0, *p* < 0.001; the group effect is by definition nil in the analyses on *z*-transformed data), pronounceability (*F*r(1,39) = 100.1, *p* < 0.001; *F*z(1,39) = 45.0, *p* < 0.001), and length (*F*r(2,78) = 13.9, *p* < 0.001; *F*z(2,78) = 8.9, *p* < 0.001), with shorter RTs for control readers with respect to children with dyslexia (1671 vs. 2532 ms), for non-words compared to pseudowords (1546 vs. 2656 ms) and for shorter stimuli compared to longer stimuli (1938, 2101, and 2264 ms for 4-, 5-, and 6-letter stimuli, respectively). Pronounceability interacted with group in the raw data analysis (*F*r(1,39) = 5.1, *p* < 0.05), but the effect was not significant in the analysis with *z*-transformed data (*F*z(1,39) = 1.9, n.s.), indicating that the interaction in the raw data was due to the influence of over-additivity. All other interactions with the group factor were not significant. Pronounceability interacted with length (*F*r(2,78) = 8.5, *p* < 0.001; *F*z(2,78) = 7.3, *p* < 0.001): length effects were present for pseudowords (mean increase per letter = 301 ms), but not for non-words (mean increase per letter = 25 ms).

*Frequency.* The ANOVA on errors showed the significance of the main effects of group (*F*(1,39) = 23.8, *p* < 0.001) and frequency (*F*(1,39) = 79.8, *p* < 0.001), with higher percentages of errors for children with dyslexia (15.7%) than control (5.8%) readers and for low- (16.1%) than for high-frequency (5.4%) words. Frequency interacted with length (*F*(2,78) = 4.2, *p* < 0.05), with a larger frequency effect for shorter than longer words: the difference between high- and low-frequency words was 13.9, 10.5, and 8.2% for 4-, 5-, and 6-letter words, respectively. Frequency also interacted with group (*F*(1,39) = 25.2, *p* < 0.001), with a larger frequency effect for dyslexic than control readers (difference between low- and high-frequency words = 16.7 and 4.7% in the two groups, respectively), and a significant group difference for low- (15.9%, *p* < 0.001) but not high-frequency words (3.9%, n.s.).

The ANOVAs on RTs showed the significance of the main effects of group (*F*r(1,39) = 32.3, *p* < 0.001), frequency (*F*r(1,39) = 65.9, *p* < 0.001; *F*z(1,39) = 67.7, *p* < 0.0001) and length (*F*r(2,78) = 20.6, *p* < 0.001; *F*z(2,78) = 15.7, *p* < 0.001), with shorter RTs for control readers than for children with dyslexia (1406 vs. 2310 ms), for high- than low-frequency words (1695 vs. 2021 ms) and for shorter than longer words (1682, 1867, and 2024 ms for 4-, 5-, and 6-letter words, respectively). Group interacted with length and frequency in the raw data (respectively: *F*r(2,78) = 5.9, *p* < 0.01; *F*r(1,39) = 17.3, *p* < 0.001), but the interactions disappeared in the analysis on *z*-transformed data, indicating that they were due to over-additivity in the data. All other interactions with the group factor were not significant.

*Lexicality.* The ANOVA on errors showed the significance of the lexicality factor (*F*(1,39) = 5.2, *p* < 0.05), with higher percentages of errors for pseudowords (15.8%) than words (10.8%). The group main effect approached significance (*F*(1,39) = 4.0, *p* = 0.053) with a tendency for children with dyslexia to make more errors (15.6%) than control readers (10.9%). The lexicality by group interaction was significant (*F*(1,39) = 5.6, *p* < 0.05) indicating the presence of a lexicality effect in control readers (difference between words and pseudowords = 10.1%, *p* < 0.01), but not for children with dyslexia (difference between words and pseudowords = –0.2%, n.s.). Groups had a similar performance in the case of pseudowords (16.0% of errors for control readers and 15.5% for children with dyslexia), while, in the case of words, control children produced fewer errors than children with dyslexia (5.8 vs. 15.7% respectively; *p* < 0.05).

The ANOVAs on RTs showed the main effects of group (*F*r(1,39) = 31.9, *p* < 0.001), lexicality (*F*r(1,39) = 80.9, *p* < 0.001; *F*z(1,39) = 1.79, n.s.) and length (*F*r(2,78) = 23.6, *p* < 0.001; *F*z(2,78) = 33.5, *p* < 0.001). RTs were shorter for control (1753 ms) than dyslexic (2761 ms) readers, for words (1858 ms) than for pseudowords (2656 ms), and for shorter than longer stimuli (2019, 2261, and 2491 ms for 4-, 5-, and 6-letter stimuli, respectively). Group interacted with length (*F*(2,78) = 3.8, *p* < 0.05), but the interaction disappeared in the *z* score analysis (*F*z(2,78) = 0.56, n.s.), in keeping with the idea that it was due to over-additivity in the data. Group did not interact with lexicality (*F* about 1) in either the raw or the *z*-transformed data analyses, with similar lexicality effects for dyslexic and control readers (mean effect = 693 and 902 ms in the two groups, respectively).

### **DISCUSSION**

Both groups of children were faster and more accurate at rejecting unpronounceable non-words than pronounceable pseudowords. This finding is consistent with previous studies (e.g., Holcomb and Neville, 1990; Forster et al., 2003; Ratcliff et al., 2004; Evans et al., 2012). As foils become more word-like (as in the case of pseudowords vs. non-words), the LDT proves more difficult. This might depend on several factors, such as: (i) the increase of orthographic and phonological overlap between words and foils; (ii) foils producing more activation of similar words in the orthographic input lexicon; (iii) fewer sources of information being available to solve the task. In fact, for non-words, all procedures (semantic, lexical, and sub-lexical) are in favor of a "non-word" response (similarly to what happens for words, for which all procedures are in favor of a "word" response). By contrast, in the case of pseudowords, the lexical and semantic routes favor a "no" response, while the sub-lexical procedure favors a "yes" response.

Pronounceability did not interact with group in the case of errors, indicating a similar pattern in the two groups. With regard to RTs, pronounceability interacted with group in the raw data, but not when over-additivity was taken into account in the *z*-transformed analysis: the larger effect of pronounceability among children with dyslexia was due to the presence of over-additivity in the data and the two groups showed a similar disadvantage in rejecting pseudowords compared to non-words.

Critical for the purpose of this study was to examine whether performance on unpronounceable strings maps onto the global factor that accounts for the differences in performance between dyslexic and control readers. Consistent with the predictions of the RAM, condition means for the dyslexic group were linearly related to those of control readers. This pattern indicates that a single global factor accounts quite well for the slowness of children with dyslexia across all experimental conditions. Condition means for pronounceable pseudowords and unpronounceable non-words essentially showed the same result (i.e., they were all well fit by the same regression line). The impairment of children with dyslexia was evident when they had to process strings not only of pronounceable stimuli (such as words and pseudowords) but also of unpronounceable stimuli (i.e., when they had to decide that a string of consonants was not a word), a deficit well accounted for by a single global factor.

Consistent with previous studies of Italian children with dyslexia using reading (e.g., Barca et al., 2006) and LDT (e.g., Marinelli et al., 2011), results highlighted a frequency effect among both children with dyslexia and control readers. This pattern indicates that children with dyslexia benefit from lexical activation in performing the task even in a highly regular orthography, such as Italian. The size of the frequency effect was actually larger for dyslexic than control readers, but this difference disappeared when over-additivity was taken into account. The effect of lexicality did not interact with group in RTs (in both raw and *z*-transformed data), but only in errors, because group differences were present for words but not for pseudowords. This finding also confirms previous evidence on Italian children with dyslexia (Zoccolotti et al., 2008).

### **EXPERIMENT 2**

The first experiment supports the hypothesis that a difficulty in processing strings of letters accounts for a large part of the impairment of children with dyslexia, irrespective of the pronounceability of the strings. The second experiment, based on the Reicher–Wheeler paradigm, tests the ability of children with dyslexia to discriminate a target letter from a competitor in the context of strings of letters similar to the stimuli used with the LDT, i.e., words, pseudowords, and non-words. This task makes it possible to assess how sensitive the two groups of children are to the pronounceability and lexical information of the prime in supporting graphemic processing, while responding is limited to the forced-choice discrimination of a target letter. If the difficulty of children with dyslexia is specifically linked to the ongoing simultaneous processing of a letter string (as in the LDT), no deficit should be present in this condition. At the same time, the possibility of testing the sensitivity to context provides a further test of the distinction between a pre-lexical graphemic interpretation and an orthographic–phonological binding interpretation. Based on the orthographic–phonological binding hypothesis, the PSE (and the WSE) is expected for control children but not for children with dyslexia. Based on the letter string graphemic hypothesis, no difference in these effects is expected between the two groups of children.

### **METHOD**

*Participants* Same as Experiment 1.

### *Materials*

Three groups of 4-letter stimuli were presented: 48 words (e.g., VISO, "face"), 48 pseudowords (e.g., VESI), and 48 letter strings (e.g., VRSN). Each derived pseudoword or non-word maintained two of the letters of the original word.

All words had a CVCV structure and were selected from the Elementary lexicon by Marconi et al. (1993) and were high-frequency words (*M* = 181/1 million, *SD* = 261), with high rate of familiarity (Baldi and Traficante, 2005; *M* = 6.8/7 rating scale points, *SD* = 0.08), easy to recognize as real Italian words (wordlikeness; Baldi and Traficante, 2005; *M* = 99.35% of correct lexical judgment by adult proficient readers, *SD* = 0.7), and with a mean of four orthographic neighbors (Baldi and Traficante, 2005; *M* = 3.9, *SD* = 1.7).

Pseudowords were made from words by changing the two vowels of the base stimulus. As mentioned above, letter strings were made of legal digrams, i.e., sequences of two letters that can be found in real Italian words. Target letters in first and third position were minimal phonological pairs (i.e., phonemes that differ in only one phonological feature, such as P–B, L–R, N–M) in order to emphasize the role of phonological decoding. The competitor letter was never in the multi-letter string and, in the case of substitution in the string, the competitor did not produce a lexical orthographic neighbor of the target itself. The number of visually similar competitors (53%; e.g., P–B, N–M) was matched in each position. Target letters in second and fourth positions were vowels; so, in this case, it was not possible to use minimal pairs. Moreover, due to the ortho-phonotactic structure of the Italian language, in which the ending is always a vowel (a, e, i, o), stimuli with targets in fourth position were presented but considered as fillers because, in the case of words, they often produced other words, unlike the condition of consonant targets. However, they were presented to prevent children from focusing their attention only on the first three letter positions.

For each stimulus type there were 16 targets in first, 16 in second, and 16 in third position, respectively, for a total of 48 stimuli per group and a grand total of 144 stimuli. Filler stimuli with target in fourth position were eight in each group for a total of 24 stimuli. The overall number of stimuli was 168. Three blocks of stimuli were made, separated by a brief pause, in order to avoid a decrease in attention. The three blocks were matched for word frequency, familiarity, wordlikeness, and number of orthographic neighbors. In each block there was the same number of words, pseudowords, and non-words, equally distributed for each target letter position; base-words and their derived pseudowords and non-words were never presented in the same block.

### *Procedure*

Children performed the task in a quiet room, sitting about 54 cm from the screen. Stimuli were presented in Courier New, size 18 pt, in upper-case, in white on a gray background.

The trial sequence started with a get-ready display (500 ms), followed by the presentation of the multi-letter string (either word, pseudoword, or non-word) for 350 ms<sup>1</sup> and by the target letter display, which lasted until the forced-choice discrimination between target and competitor was made (**Figure 4**). The response was given by pressing one of two buttons of the keyboard: the "Up" button to choose the letter in the upper part of the display, the "Down" button for the letter in the bottom part. Correct responses were in half of the cases the "Up" choice.

Ten training stimuli were presented at the beginning of the experimental session. Three blocks of stimuli followed in a fully randomized order between blocks and within each block.

The program automatically recorded the responses of the participant; percentages of errors and RTs (only to correct responses) were used as dependent measures. Outliers (i.e., RTs 3 *SD*s below the mean) and invalid responses (i.e., responses faster than 250 ms or RTs not recorded correctly because of technical problems) were excluded from the analysis.
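The exclusion rules stated above can be sketched as a small filtering function. Only the two thresholds (250 ms and 3 *SD*s) come from the text; everything else is an assumption made for illustration: the function name, the use of `None` for unrecorded trials, computing the mean and *SD* per participant after removing invalid responses, and the optional `two_sided` switch, which is not described in the text.

```python
from statistics import mean, stdev

MIN_VALID_RT_MS = 250  # from the text: faster responses count as invalid
N_SD = 3               # from the text: outlier cutoff of 3 SDs


def clean_rts(rts, two_sided=False):
    """Return one participant's RTs (ms) after exclusions.

    rts: list of RTs; None marks a trial whose RT was not recorded.
    two_sided: hypothetical option (NOT in the text) that also drops
               RTs more than N_SD SDs above the mean.
    """
    # Step 1: drop invalid responses (unrecorded or faster than 250 ms).
    valid = [rt for rt in rts if rt is not None and rt >= MIN_VALID_RT_MS]
    if len(valid) < 2:
        return valid  # an SD cannot be computed from fewer than two values

    # Step 2: drop outliers relative to the mean of the valid RTs.
    m, s = mean(valid), stdev(valid)
    low, high = m - N_SD * s, m + N_SD * s
    if two_sided:
        return [rt for rt in valid if low <= rt <= high]
    return [rt for rt in valid if rt >= low]


# Hypothetical trials (illustrative only): one anticipation, one lost trial.
trials = [100, None, 900, 1000, 1100]
print(clean_rts(trials))
```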

### **RESULTS**

Invalid responses and outliers were about 2.52% in children with dyslexia and 2.04% for typically developing children.

### *Analysis of global factor(s)*

Before proceeding to the analysis of specific effects, we examined data for the possible presence of global components in the differences between the two groups of children (Faust et al., 1999). We first tested the prediction of a linear relationship between the means of the two groups for conditions that varied in overall information processing rate. Dyslexics' and skilled readers' condition means are plotted against each other in **Figure 5**, separately for each experimental condition in the Reicher–Wheeler paradigm.

Note that all data points are above the diagonal line (which indicates the benchmark for identical performance of the two groups); thus, children with dyslexia tended to be slower than typically developing readers across all conditions. In the Reicher–Wheeler paradigm, the percentage of variance accounted for by the regression line was moderate (59%) and the slope was less than unity (*b* = 0.75), indicating no over-additivity effect. Thus, in this case, the group differences appear entirely due to the intercept value (i.e., to a constant value).

<sup>1</sup>The exposure time of the multi-element strings was chosen on the basis of a previous pilot study, in which SOAs from 200 to 450 ms were tested. The duration of 350 ms was the presentation time that yielded 75% accuracy in children with dyslexia.

Next, we tested the prediction of a linear relationship between overall group means and standard deviation in the same conditions for the group as a whole. **Figure 6** reports the mean of the overall sample against the standard deviation for each corresponding experimental condition. The regression line was not very steep (0.35) and the percentage of variance explained for the conditions of the Reicher–Wheeler paradigm was moderate (69%).

As a global factor was not detected for the conditions of the Reicher–Wheeler paradigm, the *z* score transformation was not used and only standard RT analyses were carried out.

A mixed ANOVA with group (children with dyslexia vs. typically developing children) as a between-subject factor and context (words, pseudowords, and non-words) and position (first, second, and third position) as repeated measures was carried out on the percentages of errors in letter recognition. Significant interactions were explored with the *a posteriori* Tukey HSD.

The ANOVA showed the main effects of group (*F*(1,39) = 5.71, *p* < 0.05), context (*F*(2,78) = 122.46, *p* < 0.001), and position (*F*(2,78) = 26.75, *p* < 0.001), as children with dyslexia made more letter recognition errors (17.9%) than typically developing children (10.6%), letters were recognized less well in the context of non-words (23.1%) than in the pseudoword (11.1%) and word (8.5%) contexts, and errors increased from first (9.5%) to second (15.3%) and third (18%) position.

All two-way interactions were significant. The group by context interaction (*F*(2,78) = 16.26, *p* < 0.001) indicated the presence of both the PSE (non-word context: 29.7%; pseudoword context: 14.3%; Tukey test: *p* < 0.001) and the WSE (word context: 9.5%; *p* = 0.02) in children with dyslexia. In typically developing children only the PSE reached the significance level (non-word context: 16.5%; pseudoword context: 7.8%; *p* < 0.001), while letter recognition errors in the word context (7.6%) did not differ from those in the pseudoword context. As for the group by position interaction (*F*(2,78) = 5.40, *p* < 0.01), in children with dyslexia there was an increasing amount of letter recognition errors from the first (11.2%) to the second (18.9%) position (*p* < 0.001) and from the first to the third (23.5%) position (*p* < 0.001). For typically developing children the only significant difference was between the first (7.9%) and third position (12.4%; *p* = 0.04), while the second position (11.7%) did not differ from the others. The context by position interaction (*F*(4,156) = 9.36, *p* < 0.001) indicated that in the non-word context there was an increasing amount of errors from the first (13.6%) to the second (25.2%) position (*p* < 0.01) and from the first to the third position (30.6%, *p* < 0.01), while the difference between the second and the third position did not reach significance level. In the pseudoword context, the first position was associated with a lower amount of errors (6.7%) than the third position (14%, *p* = 0.02), while the percentage of errors in second position (12.5%) did not differ from the others. In the word context, letter recognition errors were low and similar in every position: 8.2% in first, 8.1% in second, and 9.3% in third position, respectively.
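As a worked illustration of the group by context pattern, the two superiority effects can be computed directly from the error percentages reported above: the PSE as the non-word minus pseudoword difference, and the WSE as the pseudoword minus word difference. The function and variable names below are hypothetical; the percentages are those given in the text, and the sketch only reproduces the arithmetic, not the significance tests.

```python
# Error percentages reported in the text for the group by context interaction.
errors = {
    "dyslexic": {"non-word": 29.7, "pseudoword": 14.3, "word": 9.5},
    "control": {"non-word": 16.5, "pseudoword": 7.8, "word": 7.6},
}


def superiority_effects(by_context):
    """Return (PSE, WSE) in percentage points of letter recognition errors.

    PSE: advantage of a pseudoword context over a non-word context.
    WSE: advantage of a word context over a pseudoword context.
    """
    pse = by_context["non-word"] - by_context["pseudoword"]
    wse = by_context["pseudoword"] - by_context["word"]
    return round(pse, 1), round(wse, 1)


effects = {group: superiority_effects(ctx) for group, ctx in errors.items()}
for group, (pse, wse) in effects.items():
    print(f"{group}: PSE = {pse} points, WSE = {wse} points")
```

In this arithmetic the PSE is sizeable in both groups (15.4 and 8.7 points), whereas the WSE is 4.8 points for children with dyslexia and 0.2 points for typically developing children, in line with the pattern of Tukey tests reported above.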

The ANOVA on RTs showed the main effects of group (*F*(1,39) = 6.01, *p* = 0.019) and position (*F*(2,78) = 42.33, *p* < 0.0001), with longer RTs for children with dyslexia (1681 ms) than typically developing children (1323 ms), and for letters in third (1676 ms) compared to second (1550 ms, *p* < 0.001) and first (1281 ms, *p* < 0.01) positions, but no main effect of context (*F*(2,78) = 2.19, *p* = 0.119). Group interacted with context (*F*(2,78) = 4.24, *p* < 0.05): there were smaller group differences in the non-word context (difference = 235 ms) compared to the pseudoword (difference = 427 ms) and word (difference = 411 ms) contexts. However, none of these differences reached significance in the *post hoc* analyses. In typically developing readers there was a detectable PSE (difference between the non-word and pseudoword contexts = 147 ms; *p* = 0.05), but no WSE (difference between pseudoword and word context = 15 ms). For children with dyslexia neither the PSE (difference between non-word and pseudoword contexts = –51 ms) nor the WSE (difference between pseudoword and word context = 39 ms) was present.

### **DISCUSSION**

In the case of accuracy data, the results indicated a robust PSE in both groups of children, while the WSE was present only among children with dyslexia. Thus, accuracy in letter discrimination in young Italian readers, and remarkably also in children with dyslexia, was influenced by the ortho-phono-tactic regularity of the letter string. The results were generally less clear-cut in the case of RTs where the main effect of context was not significant. However, a significant PSE was detected in the case of typically developing children.

The present pattern of findings shares a number of similarities to the previous results on French children reported by Grainger et al. (2003). They found a large PSE effect in both dyslexic and reading-matched control children but no WSE for either group of children (while the WSE was present with the same type of stimulus materials in a group of adult readers). They proposed that the joint presence of PSE and absence of WSE favors a sublexical–orthographic interpretation, based on the greater familiarity of letter combinations in pseudowords compared to non-words. Pseudowords provide letter clusters which represent typical orthographic contexts for a given letter in a given position. Within this interpretation, children with dyslexia show a spared ability to use such sublexical–orthographic information to shape their performance in letter recognition. This pattern is at odds with the orthographic–phonological binding interpretation while it is consistent with a pre-lexical graphemic interpretation.

It is worth noting that in our study the facilitating role of lexical activation producing the WSE emerged only in children with dyslexia. This is in keeping with studies that found, also in Italian, a language with a very consistent orthography, evidence of the activation of lexical representations in young readers. Several Italian studies (Barca et al., 2006; Paizi et al., 2013) showed lexical involvement in reading of children with and without dyslexia. The authors suggested that children might rely more on the lexical route when the non-lexical route is not automatized yet. Results observed in children with dyslexia through the Reicher–Wheeler paradigm, in the present study, seem consistent with this hypothesis.

A general question addressed by Experiment 2 was whether performance in the Reicher–Wheeler paradigm would generate global group differences as reported for LDTs. The LDT used in Experiment 1 clearly yielded global differences in performance as previously reported with similar materials (e.g., Di Filippo et al., 2006; Paizi et al., 2013). In the case of the Reicher–Wheeler paradigm, group differences were present although generally much smaller than those observed in the case of the LDT. Critically, when the RAM was applied to the time measures, group differences in RTs did not grow as a function of condition difficulty as expected in the case of a global factor (and an over-additivity effect). Indeed, the slope of the linear regression was smaller than unity. Thus, the small group differences were expressed by a constant value (intercept). How can this effect be explained? De Luca et al. (2010) noted that children with dyslexia have less practice with orthographic materials and proposed that this factor may be sufficient to explain the small deficit in letter-bigram tasks. The role of familiarity has been systematically tested by Valdois et al. (2012), who examined the performance on letter string, digit string, and color string processing; dyslexic children were impaired in the first two tasks but performed like controls in the color report task. This pattern is consistent with a familiarity account while it is inconsistent with a visual-to-phonological-code interpretation. Overall, this pattern of findings is in keeping with the idea that a selective deficit in children with dyslexia is present only when the task requires the entire string of orthographic stimuli to be simultaneously processed.
By contrast, it has repeatedly been shown that children with dyslexia are not (or minimally) impaired in the processing of single letters or bigrams (e.g., Bosse et al., 2007; Martelli et al., 2009; De Luca et al., 2010) or when the set of target letters is presented sequentially (Lassus-Sangosse et al., 2008).

### **GENERAL DISCUSSION**

The results from the LDT in Experiment 1 indicated that a single global factor explained the performance with orthographic strings, independent of stimulus pronounceability, as well as frequency and lexicality. The impairment of children with dyslexia was evident (and of a comparable size) when they had to process strings, not only of pronounceable stimuli (such as words and pseudowords) as already reported in previous studies (e.g., Di Filippo et al., 2006; Marinelli et al., 2011; Paizi et al., 2013), but also of unpronounceable stimuli, a deficit well accounted for by the same global factor. Thus, the present study adds a new piece of information to the understanding of the nature of the global component affecting performance of children with dyslexia. Previous studies indicated that a single global factor explains the deficit of children with dyslexia in making lexical decisions and reading words and pseudowords, i.e., independent of word frequency and lexicality (Di Filippo et al., 2006; Zoccolotti et al., 2008; Marinelli et al., 2011), but not in dealing with pictorial stimuli (Di Filippo et al., 2006; Zoccolotti et al., 2008) or stimuli in the auditory modality (Marinelli et al., 2011). The present study adds to this picture that the global factor is independent not only of the lexical status of the stimulus, but also of the pronounceability of the letter string: when the over-additivity effect was controlled for, the deficit of children with dyslexia in the LDT was detectable, and of a comparable size, when rejecting pronounceable pseudowords or unpronounceable non-words. Therefore, the present findings are consistent with the proposal that an impairment in pre-lexical graphemic analysis (i.e., in forming a graphemic description of the letter string) is a core deficit in developmental dyslexia (Zoccolotti et al., 2008); by contrast, they do not support the idea that the deficit in dyslexia is due to an inability to bind orthographic and phonological information (Ziegler et al., 2010; van den Broeck and Geudens, 2012), not even in Italian, a language with very consistent orthography.

The RT data from the Reicher–Wheeler paradigm indicated that group differences in this task did not generate global differences between children with dyslexia and control readers. These data are generally in keeping with previous observations by De Luca et al. (2010), indicating that children with dyslexia were only mildly affected in tasks requiring the naming or matching of individual letters, bigrams or two-letter syllables and no over-additivity effect was present for these tasks. Therefore, it appears that the global factor accounting for the impairment of children with dyslexia is present when the child processes a (relatively long) string of letters in parallel, not when the task concerns isolated letters. The present results add to this picture that, even if the processing of a letter string is slowed down in these children, they can take advantage of the ortho-phono-tactic information deriving from such processing in discriminating a subsequent isolated target letter from a competitor; i.e., they showed a clear PSE (at least in the case of accuracy) in the Reicher–Wheeler paradigm. This differentiation can be appreciated most clearly by comparing the performance in making a lexical decision on pseudowords with that of recognizing a target letter in the presence of a pseudoword context. In the first condition, children with dyslexia were severely impaired in both accuracy and speed; in the second, they were more accurate than in the case of a four-letter non-word context (i.e., they had a PSE) and the group difference with typically developing children was quantitatively quite small. Therefore, when cognitive tasks (e.g., lexical decision, naming, semantic categorization, etc.) are to be applied to letter strings as a whole, children with dyslexia are in difficulty. On the contrary, when tasks involve isolated letter processing, these children can also make use of the ortho-phono-tactic information derived from a previously seen letter string.
This spared ability appears inconsistent with the idea that children with dyslexia suffer from a selective deficit in orthographic–phonological binding. By contrast, it is consistent with a pre-lexical graphemic interpretation; according to this view, online simultaneous processing of multi-letter elements is generally impaired. However, if sufficient time is given for processing a letter string, children with dyslexia may effectively use its ortho-phono-tactic information to modulate orthographic processing of isolated letters.

The present findings are in keeping with the available information on the characteristics of the VWFA. Thus, neuroimaging studies indicate that the VWFA is activated not only by orthographically legal stimuli, such as words and pronounceable pseudowords, but also by illegal letter strings (e.g., Cohen et al., 2002). There is clear evidence that event-related potentials (ERPs) recorded at posterior sites within the 150–250 ms time window at fronto-central, central, and temporo-parietal sites (typically in the form of the N200) are modulated by orthographic information. This finding is consistent with the hypothesis that the word form system analyzes visual linguistic stimuli at a pre-lexical level while information concerning lexical status and meaning is processed through additional neural systems (Bentin et al., 1999). In ERP studies, the VWFA does not generally differentiate between pseudowords and words (Hagoort et al., 1999) and no difference in N200 amplitude for these two stimuli is found (e.g., Tagamets et al., 2000 for an fMRI study). With regard to non-words, in some reports, the N200 was larger for non-words than for words (Compton et al., 1991; McCandliss et al., 1997), whereas, in others, the opposite pattern was reported (Cohen et al., 2000; Dehaene et al., 2002; Grossi and Coch, 2005) or no difference between legal and illegal orthographic letter strings was detected (e.g., Bentin et al., 1999). Some inconsistencies between studies may depend on differences in the experimental task. In fact, studies used several experimental paradigms: a letter search task (Ziegler et al., 1997), a letter-in-string identification task (Coch and Mitra, 2010), an LDT (Rosazza et al., 2009; Massol et al., 2011), and a rhyme judgment task (Bentin et al., 1999).
In general, research comparing the processing of consonant strings and pronounceable pseudowords reports a divergence in the ERP waveforms as a function of target type only starting at around 200–250 ms post-target onset (Ziegler et al., 1997; Rosazza et al., 2009; Massol et al., 2012). This is in line with Grainger and Holcomb's (2009) proposal that processing up to around 200 ms post-target onset is largely identical for these two types of stimuli mostly involving parallel independent letter processing. Overall, it seems that the VWFA is activated by orthographic stimuli, independent from the lexical status or the pronounceability of the stimuli. In pinpointing a parallel between the present results and the characteristics of the VWFA it is important to observe that several studies reported a marked underactivation of this area in dyslexic individuals (for a review see Richlan et al., 2009).

A recent proposal that helps to situate the letter string deficit shown by children with dyslexia is the dual-route approach to orthographic processing proposed by Grainger and Ziegler (2011). According to this model, the initial mapping of visual features onto abstract letter identities operates in parallel and simultaneously for all the letters in the stimulus (e.g., see also Grainger and van Heuven, 2003; Adelman et al., 2010): "...*the alphabetic array codes for the presence of a given letter at a given location relative to eye fixation along the horizontal meridian. It does not say where a given letter is relative to other letters in the stimulus*...*. Thus, processing at the level of the alphabetic array is insensitive to orthographic regularity of letter string*" (Grainger and Ziegler, 2011, p. 2). The distinction between non-words, pseudowords, and words would emerge only later in the pathway, when the letter identity is referred to a specific position within the word (defined as a string of letters separated by spaces). Two different types of sublexical word-centered orthographic representations develop in the reading acquisition process, according to the frequency of occurrence of given combinations of letters: (a) coarse-grained representations (open-bigram representations) that code for the presence of "ordered pairs of letters independently of their contiguity" (e.g., for the string WORD open-bigram representations are WO, WR, WD, OR, OD, RD); (b) fine-grained representations that code for clusters of frequently co-occurring letter combinations (e.g., multi-letter graphemes, syllables, morphemes, rhymes, etc.). The coarse-grained code offers diagnostic features for a rapid bottom-up activation of whole-word representations. However, correct identification of the target word requires top-down activation from the whole-word orthography level to the coarse-grained orthography level. Only real words can activate this interactive process. In the case of pseudowords, the absence of top-down constraints makes processing via the slower fine-grained route the only way to get disambiguating information on the letter string. The present findings highlight that the global factor explaining the deficit of children with dyslexia is independent of the pronounceability and lexicality of the stimulus. Thus, according to the dual-route model (Grainger and Ziegler, 2011), the deficit appears to arise at an early stage of processing, i.e., when the initial mapping of visual features onto abstract letter identities is performed. In the subsequent stages of processing, children with dyslexia do not appreciably differ from control readers, as highlighted by the absence of the group by pronounceability or the group by lexicality interactions, once over-additivity was controlled for.
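The coarse-grained code described above can be illustrated with a minimal Python sketch. It assumes that every ordered letter pair counts, whatever the gap between the two letters, which is what the WORD example shows; the function name `open_bigrams` is hypothetical and is not part of the Grainger and Ziegler (2011) model specification.

```python
from itertools import combinations


def open_bigrams(letter_string):
    """Return all ordered letter pairs of a string, contiguous or not.

    Assumption for illustration: no limit on the number of letters
    intervening between the two members of a pair.
    """
    # combinations() keeps the left-to-right order of the letters,
    # so each pair preserves relative position but not contiguity.
    return ["".join(pair) for pair in combinations(letter_string, 2)]


print(open_bigrams("WORD"))
```

For a four-letter string this yields six pairs, matching the list given in the text for WORD.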

It is interesting to speculate on which mechanism may underlie the selective deficit in processing letter strings shown by children with dyslexia. As stated above, the deficit is confined to the simultaneous processing of several letters while it is much smaller or absent when the task regards single letters or bigrams (e.g., Bosse et al., 2007; Martelli et al., 2009; De Luca et al., 2010) or when the target letters are presented sequentially (Lassus-Sangosse et al., 2008). The present results indicate that children with dyslexia can actually use information from a letter string provided that responding is limited to a single letter presented subsequently to the letter string prime. Thus, the requirements for targets to be multiple and input to be simultaneous seem at the core of the group difference. One reasonable candidate to accommodate these characteristics is visual crowding. Crowding refers to the decrease in recognizability of a letter surrounded by other letters placed closer than a critical distance (e.g., Pelli et al., 2004, 2007). In the case of letter strings, crowding affects the central letters much more than the initial or final ones (Bouma, 1970); thus, it seems to explain well the single-multiple dimension, as crowding between letters is only expected in the case of multiple displays and not with isolated letters. Further, as a perceptual mechanism, crowding can also easily account for the simultaneity requirement. Early evidence that children with dyslexia show enhanced sensitivity to crowding was presented by Bouma and Legein (1977). In recent years, several studies have shown results compatible with this interpretation (Spinelli et al., 2002; Martelli et al., 2009; Callens et al., 2013; Collis et al., 2013). For example, Martelli et al. (2009) found critical spacing to increase as a function of eccentricity with a greater proportionality for children with dyslexia than for typically developing readers.
Furthermore, particularly in the dyslexic group, the degree of crowding appears to correlate significantly with reading (Martelli et al., 2009; Callens et al., 2013).

It is important to keep in mind that we examined the reading performance of children speaking a very regular language. It is well-known that orthographic consistency modulates the reliance on holistic reading processes (e.g., Ziegler et al., 2001; Ziegler and Goswami, 2006). For this reason, the present findings cannot be directly generalized to inconsistent orthographies, such as English or Hebrew. At any rate, it is interesting that several investigations based on children speaking French, a moderately irregular language, are in keeping with a visual-orthographic, as compared to a visual-to-phonology, impairment (e.g., Lobier et al., 2012a,b; Valdois et al., 2012). For example, Lobier et al. (2012a) reported that, in a visual categorization task with verbal and non-verbal stimuli, children with dyslexia were impaired independently of stimulus type, in keeping with the idea that the impairment was visual and not verbal. These findings suggest that a deficit in pre-lexical graphemic analysis may be present also in inconsistent orthographies, although this possibility certainly deserves further examination.

Overall, children with dyslexia were impaired when they had to process strings, not only of pronounceable stimuli but also of unpronounceable stimuli, a deficit well accounted for by a single global factor. By contrast, they were much less affected when they had to recognize an isolated letter (and no global factor was present) and could take advantage of a pronounceable context, effectively using the ortho-phono-tactic information derived from a previously seen letter string. Therefore, the present findings are in keeping with the proposal that an impairment in pre-lexical graphemic analysis is a core deficit in developmental dyslexia at least in a regular orthography (such as Italian) while they are inconsistent with the alternative view that orthographic–phonological binding may represent a proximal cause of dyslexia.

### **ACKNOWLEDGMENT**

This work was supported by a grant from the Department of Health.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.01353/abstract

### **REFERENCES**


psycholinguistic levels: time-course and scalp distribution. *J. Cogn. Neurosci.* 11, 35–60. doi: 10.1162/089892999563373


Callens, M., Whitney, C., Tops, W., and Brysbaert, M. (2013). No deficiency in left-to-right processing of words in dyslexia but evidence for enhanced visual crowding. *Q. J. Exp. Psychol. (Hove)* 66, 1803–1817. doi: 10.1080/17470218.2013.766898


Zoccolotti, P., De Luca, M., Judica, A., and Spinelli, D. (2008). Isolating global and specific factors in developmental dyslexia: a study based on the rate and amount model (RAM). *Exp. Brain Res.* 186, 551–560. doi: 10.1007/s00221-007-1257-9

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 May 2014; accepted: 21 October 2014; published online: 02 December 2014.*

*Citation: Marinelli CV, Traficante D and Zoccolotti P (2014) Does pronounceability modulate the letter string deficit of children with dyslexia? A study with the rate and amount model. Front. Psychol. 5:1353. doi: 10.3389/fpsyg.2014.01353*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Marinelli, Traficante and Zoccolotti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Order short-term memory is not impaired in dyslexia and does not affect orthographic learning

## *Eva Staels and Wim Van den Broeck\**

*Department of Clinical and Lifespan Psychology, Faculty of Psychology and Educational Sciences, Vrije Universiteit Brussel, Brussels, Belgium*

### *Edited by:*

*Donatella Spinelli, "Università di Roma Foro Italico," Italy*

#### *Reviewed by:*

*Robin Litt, University of Oxford, UK Martine Marie Poncelet, University of Liège, Belgium*

#### *\*Correspondence:*

*Wim Van den Broeck, Faculty of Psychology and Educational Sciences, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium e-mail: willem.van.den.broeck@vub.ac.be*

This article reports two studies that investigate short-term memory (STM) deficits in dyslexic children and explore the relationship between STM and reading acquisition. In the first experiment, 36 dyslexic children and 61 control children performed an item STM task and a serial order STM task. The results of this experiment show that dyslexic children do not suffer from a specific serial order STM deficit. In addition, the results demonstrate that phonological processing skills are as closely related to item STM as to serial order STM. However, non-verbal intelligence was more strongly involved in serial order STM than in item STM. In the second experiment, the same two STM tasks were administered and reading acquisition was assessed by measuring orthographic learning in a group of 188 children. The results of this study show that orthographic learning is exclusively related to item STM and not to order STM. It is concluded that serial order STM is not the right place to look for a causal explanation of reading disability, nor for differences in word reading acquisition.

**Keywords: dyslexia, short-term memory, serial order, reading acquisition, orthographic learning, phonological processing**

### **INTRODUCTION**

Developmental dyslexia is commonly defined as a disability characterized by low reading achievement and deficiencies in learning to spell and write (Snowling, 2012). Since the beginning of the research into dyslexia, a number of causal hypotheses have been formulated. The most dominant theory attributes the specific problems associated with dyslexia to a phonological processing deficit (for reviews, see Stanovich and Siegel, 1994; Vellutino et al., 2004; Ziegler and Goswami, 2005). However, based on the observation that dyslexic persons often show impairments on a wider variety of cognitive tasks, some researchers believe that the underlying cause of dyslexia should be situated in a more general process. Recently several new causal hypotheses have been formulated. Ahissar and co-authors proposed the anchoring-deficit hypothesis (Ahissar et al., 2006), which suggests that dyslexics have a general difficulty in automatic extraction of stimulus regularities from auditory inputs. Also recently formulated is the visual attention span hypothesis, which proposes that a difficulty in processing visual elements simultaneously is at least one cause of dyslexia (Bosse et al., 2007), and according to the visual crowding hypothesis, dyslexics are impaired in recognizing a target due to the presence of neighboring objects in the peripheral visual field (Spinelli et al., 2002). Yet another hypothesis attributes the problems of dyslexics to a deficit in the perceptual experience of rhythmic timing (Goswami et al., 2002). Each of these hypotheses about the underlying deficit in dyslexia has mainly been investigated within a single research group. On the one hand, this proliferation of causal theories is an exciting and positive feature of contemporary dyslexia research, but on the other hand there is also a dire need for critical replication studies.
One of the most recent causal hypotheses of dyslexia attributes the specific problems of dyslexics to a general problem with learning serial order information, or at least to an additional serial learning problem. As learning to read words can be understood as the acquisition of grapheme and phoneme sequences, these researchers suggest that people with dyslexia have a specific deficit in serial order learning. This idea has been investigated by two groups of researchers. First, there is a group of researchers who consider that dyslexics experience difficulties with the consolidation or transfer of serial order information, initially stored in short-term memory (STM), into a stable long-term memory trace. Szmalec et al. (2011) reported empirical evidence for this hypothesis by showing a deficient Hebb repetition effect in dyslexic individuals, even for non-verbal modalities. However, these results could not be confirmed in a recent replication of this study including some methodological improvements (Staels and Van den Broeck, in press). As the results of this replication study show that dyslexics do not suffer from a specific deficit in the consolidation of serial order information in long-term memory, we may wonder whether the problem with storing serial order information is actually not situated in long-term memory but rather in STM. This idea has already been investigated by Martinez Perez and co-authors (Martinez Perez et al., 2012b, 2013) and will also be the main research question of the current study. We will first discuss recent studies and the underlying theoretical assumptions regarding STM deficits in dyslexia. Afterwards, we will also focus on the relationship between STM and reading development.

Martinez Perez and co-authors (Martinez Perez et al., 2012b, 2013) investigated whether the verbal STM deficits often reported in dyslexia can be explained exclusively by the poor phonological processing abilities that characterize dyslexia or whether dyslexics in fact suffer from an additional deficit at the level of serial order STM. Previously, verbal STM deficits in dyslexia had been mainly investigated using tasks that confounded item and serial order information recall (Kramer et al., 2000; Tijms, 2004). Hence, it was not clear whether the poor performance of dyslexic children on these tasks is due to a specific deficit in item STM, order STM, or both. Some recent STM models (Henson, 1998; Brown et al., 1999; Burgess and Hitch, 1999; Gupta, 2003) suggested that verbal item information is stored via temporary activation of phonological and lexo-semantic representations in the language network. Hence, storage of item information would depend directly on the quality of phonological representations in long-term memory and it would only be logical that this is impaired in dyslexia. On the other hand, storage of serial order information would occur via a language-independent system and should therefore be less sensitive to verbal long-term memory representations. Martinez Perez et al. (2012b) argued that if dyslexics showed impairments not only in item STM but also in serial order STM, this deficit could not be explained by exclusively referring to poor phonological processing abilities but would be the result of a specific deficit of serial order STM. To investigate this hypothesis they used the distinction between STM for item information and STM for serial order information. In their first seminal study (Martinez Perez et al., 2012b), they administered two tasks designed to maximize either serial order or item retention abilities in a group of dyslexic children, a chronological age-matched group and a reading-level matched group.
To assess item retention capacity a non-word delayed repetition task was constructed. In every trial a non-word was presented auditorily to the participants. Participants had to repeat the non-word after a period of time during which they had to perform a distractor task. To assess serial order retention capacity a serial order reconstruction task was administered. For this task in every trial participants had to remember sequences of two to seven real words that were also presented auditorily. Afterwards they were instructed to arrange pictures of these real words in the exact same order as they were presented. The researchers observed that children with dyslexia showed not only impairments on STM for item information but also on STM for serial order information. They concluded that the impairment on STM for serial order information was the most severe since the dyslexic group showed significantly lower performance on the serial order STM task relative to both the age-matched and the reading-level matched control groups, whereas the item STM impairment was only apparent relative to the chronological age-matched control group. In a second study, Martinez Perez et al. (2013) conducted a similar study as the one described previously, but this time they selected a group of adult dyslexics and a chronological age-matched control group without any reading problems. After observing item and serial order STM deficits in the dyslexic group in their first experiment by using the same tasks as in their first study, in a second and third experiment they assessed item and serial order STM retention capacities within the same STM task trying to make a more direct comparison. Additionally, in the third experiment they attempted to equate task sensitivity (difficulty) of item and serial order memory assessments. 
Again, the authors reported item and serial order STM deficits in the dyslexic group and most importantly they observed that the deficit was stronger at the level of order retention capacities.

As Martinez Perez and co-authors suggest that dyslexia is characterized by a specific deficit in serial order STM, they argue that this impairment in serial order retention capacity could have a negative effect on reading acquisition because learning to read new words can be understood as the acquisition of grapheme and phoneme sequences (Martinez Perez et al., 2012a). In a number of recent studies these researchers have explored the relation among item STM, order STM and language development. They observed that serial order STM capacity is a critical determinant of (oral) vocabulary knowledge and acquisition relative to item STM (Majerus et al., 2006a,b, 2008a,b; Leclercq and Majerus, 2010). Therefore, they argue that serial order STM capacity not only depends on a language-independent system but also appears to be important for the acquisition of new phonological representations (Martinez Perez et al., 2012a). In a recent longitudinal study, Martinez Perez et al. (2012a) investigated if this idea could also be extended to the acquisition of reading and, more precisely, to the acquisition of decoding processes. They investigated the relationship between item STM, order STM and reading development by administering an item and a serial order STM task at the age of kindergarten, and reading decoding ability was assessed 1 year later using a non-word reading task. They reported that serial order STM but not item STM predicted independent variance in reading decoding abilities. Based on the results of this study, the authors argue for a causal role of order STM capacity in reading acquisition.

The current study consists of two experiments. The first experiment will investigate STM deficits in dyslexic children by conceptually replicating the study of Martinez Perez et al. (2012b). However, the method they used will be modified as we believe that some adjustments can improve our study. The second goal of this study is to investigate the relationship between STM and reading acquisition. As STM for order information seems to play a specific role in reading decoding acquisition, order STM capacity could also be important for the acquisition of new long-term orthographic representations as Martinez Perez et al. (2012b) suggest. For that reason, in our second experiment we will use Share's (1995, 1999) self-teaching paradigm to assess reading acquisition. More information about the purpose and the theoretical background of the second experiment will be given in the introduction of Experiment 2. We will first continue by discussing our concerns about a number of methodological issues we encountered in recent studies. Afterwards, we present the methodological improvements we will introduce in our study to address these issues.

First of all we are concerned about the use of a reading-level match (RLM) design. Although this design is still used in some recent studies, it was formally proven that this method often entails methodological problems as it typically confounds diagnostic status with age (cf. Van den Broeck et al., 2010; Van den Broeck and Geudens, 2012; but see Zhou et al., 2014, for a notable exception in which a retrospective RLM-design is used comparing groups when they are at the same age).

In the RLM design of the Martinez Perez study, individuals with reading disabilities were matched with younger typical readers on a measure of reading ability (a text reading test). After this match both groups were compared on the two STM tasks and the researchers concluded that the dyslexic group had a specific deficit for serial order STM. However, Van den Broeck and Geudens (2012) have shown that a RLM design is likely to create processing deficit findings that may in fact be the result of the age differences between groups. One plausible scenario is that the group of older dyslexic readers reached the same reading score in the text reading test as the younger typical readers because they could rely on better word specific knowledge simply because they are older (for evidence see Van den Broeck et al., 2010). The younger normal readers on the other hand probably depended more on their decoding ability in order to reach the same performance level as the older dyslexic readers on the text reading task. This reasoning implies that the RLM matching procedure created an imbalance in decoding ability between both groups. As decoding ability is plausibly associated with the ability to remember order information, it is possible that the younger control group of normal readers only performed better on the serial order STM task as a result of the created imbalance by the design. To be sure that impaired serial-order learning in STM is a genuine characteristic of reading disability, a more direct comparison between typical and disabled readers of the same age is required.

Another methodological problem that occurs in many studies is the fact that researchers rely on the presence of a statistical interaction as evidence for a group-related difference. In both studies of Martinez Perez et al. (2012b, 2013), they interpret their results in terms of an interaction effect between task (item memory vs. serial order memory) and group (dyslexic vs. age control group or RL control group). Although this interaction effect was not tested statistically, they took the fact that the dyslexic group showed a significantly lower performance than the reading-level matched control group on the serial order but not on the item STM task as an indication that the serial order STM deficit was the most severe. The problem with this kind of interpretation is that researchers are usually unaware of the precise form of the relationship between the observed measures and the underlying constructs (Dunn and James, 2003). Therefore, it has been argued that relying on the presence of a statistical interaction as evidence for a qualitative group-related difference is not without problems (Loftus, 1985; Loftus et al., 1987, 2004). Even non-ordinal interaction effects can be made to disappear or reverse by applying a suitable monotonic non-linear transformation to the dependent variable (Bogartz, 1976; Loftus, 1978). This scale-dependency problem is further exacerbated in research where non-experimental variables such as age or pathology are involved because in such situations it is likely that an unspecific general factor influences performance (Kliegl et al., 1994). In the study of Martinez Perez et al. (2012b) one can easily imagine that an overall STM deficit could influence both STM tasks in an unequal manner (for an example of the effects of a general factor, see Van den Broeck and Geudens, 2012, p. 425). As a consequence, an observed interaction effect would be fictitious.
This scale-dependency problem also arises when floor or ceiling effects occur in the data (Loftus, 1985).
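The scale-dependency argument can be made concrete with a small numerical sketch. The cell means below are hypothetical values chosen for illustration only and are not data from any of the cited studies; the helper name `interaction_contrast` is likewise an assumption. The sketch shows a group by task interaction that is present on the raw scale and vanishes after a monotonic (logarithmic) transformation of the dependent variable.

```python
import math

# Hypothetical cell means (illustrative only, not taken from any study).
means = {
    ("control", "item"): 8.0,
    ("control", "order"): 4.0,
    ("dyslexic", "item"): 4.0,
    ("dyslexic", "order"): 2.0,
}


def interaction_contrast(cells, transform=lambda x: x):
    """Group difference on the item task minus group difference on the order task."""
    diff_item = transform(cells[("control", "item")]) - transform(cells[("dyslexic", "item")])
    diff_order = transform(cells[("control", "order")]) - transform(cells[("dyslexic", "order")])
    return diff_item - diff_order


# Raw scale: (8 - 4) - (4 - 2) = 2, so the group difference depends on the task.
raw_contrast = interaction_contrast(means)

# Log scale: (3 - 2) - (2 - 1) = 0, so the same means show no interaction.
log_contrast = interaction_contrast(means, math.log2)

print(raw_contrast, log_contrast)
```

Because the dyslexic means are a constant proportion of the control means, the interaction exists only on the additive raw scale. Which scale reflects the underlying construct is exactly what is usually unknown.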

A last methodological concern in the studies of Martinez Perez et al. (2012b, 2013) is the fact that they did not match their dyslexic and control groups on attentional functioning. Although the authors mention attentional functioning as a potential confounding factor, they refute this possibility by arguing that the order STM task was attentionally not more demanding than the item STM task because error rates were larger in the item STM task than in the order STM task. Furthermore, they convey that dyslexic participants with associated attentional impairment were excluded from the study and therefore they find it unlikely that attentional difficulties could explain the serial order STM impairment in the dyslexic group. However, as the comorbidity of developmental dyslexia and attention deficit disorders (ADHD) is a well-known fact (Araujo, 2012; Boada et al., 2012), and the serial order STM task is very demanding on sustained and focused attention, a serial order STM effect is not necessarily the result of a deficit in serial order retention, but may be attributed to the differential impact of comorbid attention problems on the two memory tasks (see also Wimmer's critique on the automatization deficit hypothesis, Wimmer et al., 1999). For this reason, any research aiming to compare a dyslexic group with a control group on cognitive processing should always make sure that both groups are matched on, or at least controlled for, attentional functioning.

As a result of these three major concerns we will adjust the method used by Martinez Perez et al. (2012b, 2013). To investigate whether dyslexics do suffer from a specific serial order learning deficit in STM it is crucial to make a direct comparison of serial order STM retention capacity when item STM retention capacity is equated between the dyslexic group and a control group of the same age. Indeed, a specific problem in serial order retention can only be proven by directly comparing dyslexic and typical individuals who score equally on the item retention task. When there is considerable overlap between the item retention scores of both groups, state trace analysis (STA) used as an equivalence method is an excellent technique to perform this comparison (see Van den Broeck and Geudens, 2012). In general, STA as a matching technique could be effectively adopted whenever a group showing a particular disorder has to be matched with a typical group, in order to test for a hypothesized specific deficit.

## **MATERIALS AND METHODS**

## **EXPERIMENT 1**

In Experiment 1 we investigated item and serial order STM capacities in a group of dyslexic and a group of control children matched on IQ and age.

## *Method*

*Participants.* A total of 97 children of fourth and fifth grade participated in this study. Thirty-six children had an official diagnosis of dyslexia (20 boys and 16 girls) and 61 were IQ-matched control children without any reading problems (29 boys and 32 girls). Dyslexic participants were either diagnosed by an individual speech therapist or by a specialized center. The diagnoses were all based on three criteria which are used by the Stichting Dyslexie Nederland (2008) (Foundation Dyslexia Netherlands): (1) reading and/or spelling abilities are significantly below the level of performance expected for their age, that is below percentile 10; (2) resistance to instruction despite effective teaching; (3) impairment cannot be explained by extraneous factors, such as sensory deficits. For further validation two norm-referenced Dutch word reading tests that are diagnostic for dyslexia were administered. The first test is the One Minute Test (OMT; Brus and Voeten, 1973), a word reading test in which participants are instructed to read aloud as many words correctly as possible within 1 min. The test consists of 116 real words (nouns, verbs, adjectives, etc.). These words are ordered by increasing reading difficulty. The second test is the Klepel (Van den Bos et al., 1994), a non-word reading test in which participants are instructed to read aloud as many non-words correctly as possible within 2 min. This test consists of 116 non-words of increasing difficulty. For both reading tests the raw score is the number of words read correctly. In addition to the reading tests, we administered several phonological processing tasks to characterize reading-related skills of both groups. These tests consisted of a phonological awareness task, a phonemic discrimination task and a rapid automatized color and digit naming task (Van den Bos and lutje Spelberg, 2007).

To match the dyslexic and the control groups on IQ, a short-form IQ measure was used including a verbal comprehension subtest (Vocabulary) and a perceptual reasoning subtest (Block design) of the Wechsler Intelligence Scale for Children III (Dutch version) (Wechsler et al., 2005). We also included the Dutch ADHD questionnaire (AVL) (Scholte and Van der Ploeg, 2005) to examine attentional functioning. The questionnaire results in two partial scores: a measure of attentional functioning and a measure of impulsiveness and hyperactivity. As we were only interested in attentional functioning, we only used the partial score on attentional functioning. The questionnaire was completed by the teacher of the participant. **Table 1** shows that the experimental group and the control group only differed on the two measures that are diagnostic for dyslexia and on two of the phonological processing tasks. The dyslexic group also showed higher scores on the attentional functioning questionnaire but this difference just missed statistical significance.

All children attended regular elementary schools, located in Flanders (Dutch-speaking part of Belgium). Most children were from indigenous families (60%) and children from foreign origins were mainly of Moroccan descent. All children were checked and had sufficient command of the Dutch language to be able to study the Dutch curriculum. Two test assistants were instructed to perform this study.

*Experimental design and procedure.* Testing took place on an individual basis in a quiet classroom at the participant's school. The experimental procedure consisted of two test phases. Each test phase lasted approximately 40 min. All tasks were administered in a fixed order to ensure that the test situation was the same for every participant. During the first session the Block design subtest of the WISC, the OMT, the Klepel, the Serial order STM task and the phonological awareness task were administered. During the second session the Vocabulary subtest of the WISC, the item information STM task, the phonemic discrimination task and the rapid automatized color and digit tasks were administered. All computerized experiments were programmed and presented on a laptop computer using Microsoft Office PowerPoint (2007).

### **Table 1 | Characteristics of the dyslexic and control groups (means and standard deviations).**

*Values in bold are statistically significant at p = 0.05.*

### *Materials*

### *Phonological processing tasks.*

*Phonemic discrimination task.* Phoneme discrimination abilities were measured using a minimal pair discrimination task. One hundred pairs of nonsense CCV or CCCV syllables were constructed. Fifty pairs of syllables were identical (e.g., sta-sta), 25 pairs differed in one phonetic feature (e.g., dra-pra) and 25 pairs contained a phoneme transposition (e.g., spo-pso). Stimuli were digitally recorded by a female speaker and presented auditorily through headphones. Immediately after presenting a syllable pair, participants were asked to indicate whether both nonsense syllables were identical. The score was the total number of correct answers. Unidimensionality was tested by fitting a one-factor model on categorical data with MPlus 7.11 (Muthén and Muthén, 1998-2012). This model fitted the data well (chi square = 3809.11, *df* = 3827, *p* = 0.585; CFI = 1.00, RMSEA = 0.000). Cronbach's alpha was 0.795, indicating good reliability of the test scores.
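As an illustration of the reliability index reported for this and the following tasks, the minimal Python sketch below computes Cronbach's alpha from a participants-by-items matrix of item scores. The function name `cronbach_alpha` and the toy data are hypothetical and are not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-participant item-score lists.

    item_scores[p][i] is the score of participant p on item i
    (e.g., 1 = correct, 0 = incorrect). All rows must have equal length.
    """
    n_items = len(item_scores[0])
    # Variance of each item across participants.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(n_items)]
    # Variance of the total test score across participants.
    total_var = pvariance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 participants, 3 dichotomous items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(toy), 3))
```

Population variances are used for both the items and the total score; using sample variances throughout would give the same ratio and hence the same alpha.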

*Phonological awareness task.* Phonological awareness abilities were assessed using a position analysis task. For this task, a list of 24 non-words was constructed as stimuli. Every non-word consisted of two syllables and had a length of six or seven letters. Stimuli were digitally recorded by a female speaker and presented auditorily through headphones. Immediately after presenting a non-word, participants were asked to repeat the sound that came immediately before or after a target phoneme in the non-word indicated by the experimenter. Half of the items involved identifying the sound before and half the sound after a target phoneme (e.g., which sound comes before "r" in "pristak"?; which sound comes after "f" in "dreflo"?). The score was the total number of correct answers. A one-factor model fitted the data (chi square = 278.94, *df* = 252, *p* = 0.117; CFI = 0.943, RMSEA = 0.034). Cronbach's alpha was 0.837, indicating good reliability of the test scores.

*Rapid automatized naming.* To assess the speed of lexical access, we used two tasks from the CB and WL test (Van den Bos and lutje Spelberg, 2007), automatic color naming and automatic digit naming. The color naming task involved five colors (black, yellow, red, green, and blue), each presented 10 times. The digit naming task involved five digits (2, 4, 8, 5, 9), each presented 10 times. Each test card contained 50 items of the five colors/digits in random order presented in five columns. In both tasks participants were asked to name the colors/digits as quickly as possible. The score was the time participants needed to name all colors/digits irrespective of response accuracy. Reliability estimates offered by the authors of the test are very good (split-half reliability for colors is 0.88 for 4th grade and 0.93 for 5th grade; for digits 0.80 for 4th grade and 0.89 for 5th grade).

### *Short-term memory tasks.*

*Item short-term memory task.* As a measure of STM for item information we used a task similar to the delayed item repetition task of Martinez Perez et al. (2012b) and Leclercq and Majerus (2010). A list of 30 CVC non-words was constructed as stimuli (see Appendix A). To maximize the phonological processing demands of this task, stimuli were new and diphone frequency and phonological neighborhood were significantly lower relative to a representative sample of word stimuli. Stimuli were digitally recorded by a female speaker and presented auditorily through headphones to the participant. Each non-word was presented separately. Immediately after the presentation of an item, participants were asked to repeat the non-word to confirm that they had correctly perceived the item. After repeating the item, participants had to count in steps of 2 for 6 s. Afterwards participants were asked to repeat the item again. No feedback was given to the participants. The score was the number of correctly repeated items. A one-factor model fitted the data (chi square = 422.18, *df* = 405, *p* = 0.268; CFI = 0.931, RMSEA = 0.021). Cronbach's alpha was 0.783, indicating good reliability of the test scores.

The task was presented to the child as a game (Leclercq and Majerus, 2010):

You are on an adventure in a castle. The castle has many doors which you have to open. In order to do so, you have to remember a password. You will hear the password through the headphones. The password is a word from a magic language you don't know. Pay close attention to the word and repeat the word out loud. Immediately afterwards start to count out loud by steps of two (0, 2, 4, 6, 8,. . . ) until I say stop and ask you to repeat the password again. Okay?

*Serial order short-term memory task.* As a measure of STM for serial order information we used a serial order reconstruction task similar to that of Martinez Perez et al. (2012b) and Leclercq and Majerus (2010). Seven names of highly familiar animals (kat, hond, vis, beer, aap, leeuw, kip [cat, dog, fish, bear, monkey, lion, chicken]) were chosen to form lists with lengths ranging from two to seven items. All items were monosyllabic words and every item could appear only once in a trial. The trials were presented by increasing list length, with four trials for each length. The trials were digitally recorded by a female speaker and presented auditorily through headphones to the participant. At the end of each trial, participants received cards of the mentioned animals in random order and were asked to rearrange them in the same order as they were presented. In this task, retention requirements for serial order information were maximized by offering the participant only the cards showing the animals that were named in that trial, and retention requirements for item information were minimized by using stimuli that were highly frequent and well known in advance. All participants completed all trials and sequence lengths. Since items within a series are correlated, a one-factor model with correlated errors for items belonging to the same series was fitted to the data (chi square = 796.89, *df* = 700, *p* = 0.006; CFI = 0.956, RMSEA = 0.038). Cronbach's alpha was 0.874. However, with correlated errors this index may underestimate or overestimate reliability (Raykov, 1998, 2001). A more conservative estimate is given by the Spearman-Brown coefficient, which was 0.702, indicating at least reasonable reliability of the test scores.
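The Spearman-Brown coefficient mentioned above steps a split-half correlation *r* up to full test length as 2*r*/(1 + *r*). The sketch below illustrates this in Python. The odd/even split and all function names are assumptions made for illustration, since the text does not state how the halves were formed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Split-half reliability with an odd/even item split (an assumption)."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# A half-test correlation of 0.5 corresponds to a full-test estimate of about 0.667.
print(round(spearman_brown(0.5), 3))
```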

The experimenter presented the task as follows (Leclercq and Majerus, 2010):

Every year, animals from all over the world gather to have a huge race. This year, seven animals are participating: a cat, a dog, a chicken, a lion, a fish, a bear, and a monkey [the experimenter shows the cards of the corresponding animals]. Several races take place. Sometimes only two animals are participating. Sometimes there are three, four, or five animals. At other times, there are big races with six or seven animals. Through the headphones, you will hear someone announce the animals' order of arrival at the finish line, from the first to the last animal. Immediately after I give you the cards with the animals, you have to put the pictures of the animals on the podium in their order of arrival. The animal arriving first has to be put on the highest step and the last one on the lowest step. Okay?

## *Results*

First we analyze our data exactly as Martinez Perez et al. (2012b) did in their study. Afterwards we will address the methodological issues we mentioned before. For the item STM task we determined the proportion of items correctly repeated as the dependent variable (**Figure 1**). The mean proportion of items correctly repeated was significantly higher in the control group (67%) than in the dyslexic group (53%), *t*(95) = 4.192, *p* < 0.001. For the serial order STM task we determined the proportion of correctly placed items by pooling over all trials as the dependent variable (**Figure 2**). The mean proportion of items correctly placed over all trials was significantly higher in the control group (71%) than in the dyslexic group (64%), *t*(95) = 3.200, *p* = 0.002.
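The pooled serial order score described above can be sketched as follows: every item placed in its presented position counts as correct, and the proportion is taken over all items of all trials. The function name and the two example trials are hypothetical; the real task comprised four trials at each list length from two to seven.

```python
def proportion_correctly_placed(trials):
    """Pooled proportion of items placed in their presented serial position.

    trials is a list of (presented, reconstructed) pairs, each a list of
    item names of equal length.
    """
    correct = 0
    total = 0
    for presented, reconstructed in trials:
        # An item is correct when it sits in the same position in both lists.
        correct += sum(p == r for p, r in zip(presented, reconstructed))
        total += len(presented)
    return correct / total


# Hypothetical responses: the first trial is perfect, the second swaps two items.
example = [
    (["kat", "hond"], ["kat", "hond"]),
    (["vis", "beer", "aap"], ["beer", "vis", "aap"]),
]
print(proportion_correctly_placed(example))
```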

We also analyzed performance on the serial order STM task as a function of serial position to obtain a qualitative view of the serial order retention process. As we noticed that all participants obtained a maximum score on all trials with list lengths of 2, 3, and 4, we restricted our analyses to list lengths of 5–7 to avoid ceiling effects. To increase the sensitivity of the analysis we combined serial positions 4 and 5 of list length 6 and serial positions 3 and 4 as well as 5 and 6 of list length 7. For list length 5, we used the scores on the five positions. For list length 6, five scores were assembled (score on items in position 1, score on items in position 2, score on items in position 3, mean score on items in positions 4 and 5, and score on items in position 6). For list length 7, five scores were likewise constructed (score on items in position 1, score on items in position 2, mean score on items in positions 3 and 4, mean score on items in positions 5 and 6, and score on items in position 7). Consequently, scores on five serial positions were entered into the analysis. **Figure 2** shows the proportion of correct responses as a function of group and serial position. A repeated-measures ANOVA revealed significant main effects of group, *F*(1, 95) = 8.610, *p* = 0.004, and serial position, *F*(4, 92) = 176.532, *p* < 0.001. No group by serial position interaction effect was found, *F*(4, 92) = 1.372, *p* = 0.250.
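The recoding of list lengths 5, 6 and 7 into five serial positions can be written out as a small Python sketch. The function name `collapse_positions` and the example scores are hypothetical; the grouping of positions follows the description above.

```python
# Zero-based positions that are averaged into each of the five analysis positions.
POSITION_GROUPS = {
    5: [(0,), (1,), (2,), (3,), (4,)],
    6: [(0,), (1,), (2,), (3, 4), (5,)],   # positions 4 and 5 combined
    7: [(0,), (1,), (2, 3), (4, 5), (6,)],  # positions 3+4 and 5+6 combined
}


def collapse_positions(scores):
    """Reduce per-position scores of a list of length 5, 6 or 7 to five scores."""
    groups = POSITION_GROUPS[len(scores)]
    return [sum(scores[i] for i in group) / len(group) for group in groups]


# Hypothetical per-position accuracy for one list of length 7.
print(collapse_positions([1, 1, 1, 0, 1, 0, 1]))
```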

In order to verify whether reading disability affected one STM task after statistically controlling for the other memory task, we conducted analyses of covariance (ANCOVA). By entering a covariate into an ANOVA the covariance of this variable with the other independent variable(s) is removed before the influence on the dependent variable is determined. For the item STM task the effect of group remained significant when the performance on the serial order STM task was entered as a covariate, *F*(1, 94) = 8.587, *p* = 0.004. This means that even if the reading groups are statistically equated on the performance on the serial order STM task, the effect of group on the item STM task still remains significant. This result was in line with the results of Martinez Perez et al. (2012b). However, in contrast with their results, for the serial order STM task the effect of group disappeared when the performance on the item STM task was entered as a covariate, *F*(1, 94) = 1.906, *p* = 0.171. This implies that the difference between the dyslexic and control group on the serial order STM task is no longer statistically significant when differences on the item STM task are taken into account. This result demonstrates that the item STM task and the serial order STM task do not measure entirely independent processes.

Martinez Perez et al. (2012b) also predicted that item STM but not order STM should be related to phonological processing measures. In order to investigate their prediction, we performed a set of correlation analyses. We only report the correlations observed in the total group (virtually the same results were observed for the dyslexic and control groups when analyzed separately). The results of these analyses not only reveal significant correlations between item STM and both phonological tasks (phonemic discrimination and phonological awareness), but also between serial order STM and the phonological tasks (see **Table 2**). In fact, the latter were even somewhat larger. No significant correlations were observed between item STM or serial order STM and rapid automatized naming tasks. In contrast to the results of Martinez Perez et al. (2012b), our data show clearly that both item STM and serial order STM are related to phonological processing measures. Remarkably, serial order STM was significantly related to both IQ-subtests, especially to Block design, whereas item STM was not.

**Table 2 | Correlations and partial correlations controlling for age (between brackets) between short-term memory tasks, phonological processing tasks, reading tests and IQ-subtests for all participants (***N* **= 97).**

*Values in bold are statistically significant at p = 0.05.*

**Table 3 | Characteristics of the dyslexic and control groups after matching on attentional functioning (means and standard deviations).**

*Values in bold are statistically significant at p = 0.05.*

### *State trace analysis*

We now analyze our data using STA as an improved matching design. In the analysis we present here, another methodological improvement is introduced. The dyslexic group and the control group were not only matched on intellectual functioning but on attentional functioning as well. By discarding 20 control subjects and no dyslexic subject from the initial sample, we obtained similar distributions for both groups on the attention questionnaire. As **Table 3** indicates, after this additional matching, the newly formed groups of dyslexic children and control children only differed on the two measures that are diagnostic for dyslexia and on the two measures of phonological processing.

Using STA, serial order STM performance can be compared directly between the two groups at each level of item STM performance. Compared to the traditional method of interpreting interaction effects by comparing group differences across tasks, STA is more sensitive in detecting a specific serial order STM deficit because, by matching dyslexic and control subjects on item STM performance, both groups are equated on STM processing without involving the crucial serial order information. After verifying that both groups show substantial overlap in item STM performance, serial order STM is regressed on item STM performance separately for the dyslexic group and the control group. It is then tested whether a single line is suitable to explain the data (the null model, not including reading group) or whether two different lines (one for dyslexic children and one for control children) are needed to describe the relation between serial order STM performance and item STM performance (the full model). If a single line fits the data, this implies that the relation between serial order STM performance and item STM performance is not affected by dyslexia. If, on the other hand, two lines fit our data better, and the one for dyslexic children is situated lower than the one for control children, this would be direct evidence for a specific serial order STM deficit in dyslexic children<sup>1</sup>.

In this analysis, for each participant item and serial order scores were averaged and then plotted against each other (see **Figure 3**). Then, we tested in a hierarchical regression analysis whether group contributed significantly to serial order STM after including item STM performance in the regression equation. This analysis showed that adding group as a predictor doesn't significantly improve fit [*R*<sup>2</sup><sub>null model</sub> = 0.271; *R*<sup>2</sup><sub>full model</sub> = 0.285; Δ*R*<sup>2</sup> = 0.014; *F*<sub>change</sub>(1, 74) = 1.488; *p* = 0.226].
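The *F* change statistic of such a hierarchical regression can be recovered from the two *R*<sup>2</sup> values. The sketch below is a worked check rather than the original analysis. The function name is hypothetical, and the sample size of 77 is inferred from the 97 participants minus the 20 discarded control subjects, which matches the reported denominator degrees of freedom of 74. With the rounded *R*<sup>2</sup> values it yields *F* ≈ 1.45, close to the reported 1.488; the small gap is consistent with rounding of the inputs.

```python
def f_change(r2_null, r2_full, n, k_full, k_added=1):
    """F test for the R-squared increase when k_added predictors are added.

    n is the sample size and k_full the number of predictors in the full model.
    Returns the F value and its (numerator, denominator) degrees of freedom.
    """
    df1 = k_added
    df2 = n - k_full - 1
    f_value = ((r2_full - r2_null) / df1) / ((1 - r2_full) / df2)
    return f_value, (df1, df2)


# Null model: item STM only. Full model: item STM plus group (two predictors).
f_value, dfs = f_change(r2_null=0.271, r2_full=0.285, n=77, k_full=2)
print(round(f_value, 3), dfs)
```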

Hence, the null hypothesis (that the state trace curves for the typical and for the disabled readers do not differ) could not be rejected. This means that if in reality there was no difference between the curves of the two groups (if H0 is true), the probability of finding a difference as large as or even larger than in our sample is 0.23. As STA entails that the null hypothesis is in fact a substantive hypothesis, this value is not very convincing. Note that a non-significant result due to a lack of power must not be confused with support for the null hypothesis. What we really want to know is the probability that the null hypothesis or the alternative hypothesis is true given the observed data (the inverse probability). To this end, a Bayesian analysis was performed in which the probability of the null model was compared to the probability of the full model given the data and given the assumption that neither model was preferred a priori. For this comparison Bayesian Information Criteria (BIC) were calculated for both models. The BIC has been proposed by Raftery (1996) as an index to assess the overall fit of a model and allows a comparison of models (see also Long, 1997). Given that BIC assesses whether the model fits the data sufficiently well to justify the number of parameters that are used, the model with the lowest BIC is the best fitting, yet parsimonious model. The BIC-values indicated that the null model fitted the data best (BIC<sub>null</sub> = −144.88 and BIC<sub>full</sub> = −142.13). Based on the difference between these BIC-values the "Bayesian factor" could be calculated (Kass and Raftery, 1995). The Bayesian factor is equal to the posterior odds in favor of the most likely hypothesis. The posterior odds of the null model (M<sub>N</sub>) relative to the full model (M<sub>F</sub>) equal:

<sup>1</sup>Although STA as a matching design is conceptually very similar to the idea of controlling for item STM capacity in an ANCOVA, ANCOVA and hierarchical linear regression analysis are only two of the data-analytical techniques that can be used to apply the broader concepts of STA (see Prince et al., 2012).

$$\frac{\Pr(M_N \mid \text{Observed Data})}{\Pr(M_F \mid \text{Observed Data})}$$

The Bayesian factor favoring the null model was 2.75. According to the criteria proposed by Kass and Raftery (1995), the data provided "positive" evidence for the null hypothesis.
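For readers who wish to retrace this step, the sketch below applies the commonly used BIC approximation, in which the Bayes factor for one model over another is exp(ΔBIC/2). The function names are hypothetical and the approximation is an assumption; the text does not state the exact formula that was used. With the reported BIC values the difference is 2.75, which under this approximation equals 2 ln(Bayes factor) and falls in the range of 2 to 6 that Kass and Raftery (1995) label "positive". The corresponding Bayes factor is about 3.96, and with equal prior odds the posterior probability of the null model is about 0.80.

```python
import math


def bayes_factor_from_bic(bic_preferred, bic_other):
    """Approximate Bayes factor for the preferred model: exp((BIC_other - BIC_preferred) / 2)."""
    return math.exp((bic_other - bic_preferred) / 2)


def posterior_probability(bayes_factor):
    """Posterior probability of the preferred model, assuming equal prior odds."""
    return bayes_factor / (1 + bayes_factor)


bic_null, bic_full = -144.88, -142.13
delta_bic = bic_full - bic_null                      # 2.75, i.e. 2 * ln(BF)
bf_null = bayes_factor_from_bic(bic_null, bic_full)
print(round(delta_bic, 2), round(bf_null, 2), round(posterior_probability(bf_null), 2))
```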

### **EXPERIMENT 2**

The results of a longitudinal study of Martinez Perez et al. (2012a) indicate that order STM abilities in kindergarten predict later reading decoding abilities. They make the stimulating suggestion that besides this specific contribution of order STM in reading decoding processes, order STM capacity might also be important for the acquisition of orthographic representations of new written words in long-term memory, and hence could be a major factor in reading development. Therefore, in this second experiment we will focus on the prediction of a differential role of item STM and serial order STM capacity in the orthographic learning of primary school readers over the entire range of reading ability. Additionally, we test the hypothesis that serial order STM would be impaired in a group of relatively poorer readers.

Item STM and order STM abilities were measured using the same tasks as used in Experiment 1. We also administered the same phonological awareness task as used in Experiment 1 to investigate whether item STM and serial order STM are related to phonological processing abilities. To assess reading and spelling ability, two word reading tests and a spelling test were administered. In addition, orthographic learning was assessed using Share's (1999) self-teaching paradigm. According to the self-teaching hypothesis (Share, 1995, 1999, 2004) children are able to acquire orthographic representations independently from an external teacher. Orthographic learning, the process through which orthographic representations are formed, consists of two independent processes. First, an unfamiliar written word is phonologically recoded into its spoken form by using known grapheme-phoneme associations. If this step succeeds, the phonological code of the word will be mapped onto its orthographic counterpart, establishing word-specific knowledge of the spelling of the new word. The self-teaching hypothesis was supported in a number of studies using an experimental paradigm adapted from Reitsma (1983). In these studies, target words were presented several (four or six) times in a natural text (Share, 1999). These targets were pseudowords representing a fictitious place, animal or fruit. Every pseudoword (e.g., yait) had an alternative homophone spelling (e.g., yate) and in each case only one spelling, the target spelling, was presented to the participant. Each participant was asked to read aloud the stories and to answer some questions about the content of the stories afterwards to ensure that they understood the text. Following Reitsma's (1983) procedure, orthographic learning was assessed 3 days after text reading using three types of measures: an orthographic choice task, a naming task and a spelling task. 
For the first measure, orthographic choice, children were asked to select the correct spelling of the target among the two homophone spelling alternatives. Secondly, children were instructed to read a list of words appearing on a computer screen as quickly and accurately as possible. The list of words contained all targets and their homophone spellings. Finally, the last test of orthographic learning required children to reproduce the target spelling in writing. The general outcome of studies based on this paradigm was that 3 days after independently reading the stories aloud, target spellings were recognized more often, named faster and spelled more accurately than their alternate homophone spellings. Relatively few successful identifications of an unfamiliar word appeared to be sufficient to acquire orthographic representations for young children (Hogaboam and Perfetti, 1978; Reitsma, 1983; Manis, 1985) and also for poor readers (Staels and Van den Broeck, 2013). Although most evidence for the self-teaching hypothesis is based on oral reading, recent studies have shown that orthographic learning occurs in silent reading as well (Bowey and Muller, 2005; Bowey and Miller, 2007; de Jong and Share, 2007; de Jong et al., 2009). These findings provide important support for orthographic learning occurring in independent daily reading.

## *Method*

*Participants.* One hundred and eighty-eight third (38), fourth (93), and fifth (57) grade children participated in this study (96 boys, 92 girls). All children of entire classes were selected to participate in this study. Their age ranged from 7 years 11 months to 11 years 10 months, with a mean age of 9 years 6 months. All children attended regular elementary schools, located in several regions in Flanders and in urban and rural areas. Most children were from indigenous families (73%) and for 83% of the children their home language was Dutch. All children were checked to ensure that they had sufficient command of the Dutch language to follow the Dutch curriculum. Four test assistants were instructed to carry out the testing.

*Experimental design and procedure.* The experimental procedure consisted of two phases. In the first test session a spelling test, based on the PI-dictee (Geelhoed and Reitsma, 1999) was administered for the entire class group. Afterwards the reading phase of Share's (1999) self-teaching paradigm was carried out. All students in the class were instructed to read all eight stories once in silence. They were encouraged to read the texts very attentively as they were warned that immediately after each text, two questions would be posed about the content of the stories to check text comprehension. All students were given enough time to read the texts and answer the questions at their own pace. This session lasted approximately 30 min. The second test phase took place on an individual basis in a quiet classroom at the participant's school. Two Dutch reading tests (OMT; Brus and Voeten, 1973 and the Klepel; Van den Bos et al., 1994), two measures of orthographic learning (orthographic choice task and orthographic spelling task), two STM tasks (serial order STM task and item STM task), the phonological awareness task and the Vocabulary subtest of the WISC-III (Dutch version) (Wechsler et al., 2005) were administered. All tasks were run in the indicated fixed order, except for the order of the two orthographic learning tasks which was counterbalanced across participants. All computerized experiments were programmed and presented on a laptop computer using Microsoft Office PowerPoint (2007).

We also included the Dutch ADHD questionnaire (AVL) (Scholte and Van der Ploeg, 2005) to examine attentional functioning in our experimental procedure. As we were only interested in attentional functioning, we only used the partial score on attentional functioning of the questionnaire. The questionnaire was completed by the teacher of the participant.

## *Materials*

*Spelling task.* A spelling task was constructed based on the Dutch spelling test PI-dictee (Geelhoed and Reitsma, 1999). As children of the third, fourth and fifth grade participated in this study fifteen words were selected varying in difficulty. Five words to assess spelling in every grade were chosen from the PI-dictee. For every word a sentence in which the word occurs was read aloud by the test assistant. The word the participants had to write down was repeated afterwards. The score was the total number of correctly written words. A one-dimensional model fitted the data well (chi square = 97.73, *df* = 90, *p* = 0.271; CFI = 0.993, RMSEA = 0.021). Cronbach's alpha was 0.827, indicating good reliability of the test scores.

*Phonological awareness task.* The same phonological awareness task as in Experiment 1 was used. A one-factor model did not fit the data well (chi square = 303.49, *df* = 252, *p* = 0.0145; CFI = 0.929, RMSEA = 0.033). After inspection of the modification indices, a two-factor model was fitted in which all items requiring the phoneme(s) before the target phoneme loaded on one factor and all items requiring the phoneme(s) after the target phoneme loaded on another factor (chi square = 256.70, *df* = 251, *p* = 0.389; CFI = 0.992, RMSEA = 0.011). Because the correlation between the two factors was quite high (*r* = 0.672) and because both factors showed very similar correlations with all other tests, we decided to treat this test as measuring one concept. Cronbach's alpha was 0.835, indicating good reliability of the test scores.

## *Short-term memory tasks.*

*Item short-term memory task.* The same item STM task as in Experiment 1 was used. Although a one-factor model with all items included fitted the data reasonably well, inspection of the factor loadings revealed that one item, "pob", did not load significantly on this factor. This is probably because this item is phonetically not a non-word in Dutch. After removing this item, a one-factor model showed a good fit (chi square = 380.38, *df* = 377, *p* = 0.442; CFI = 0.995, RMSEA = 0.007). Cronbach's alpha was 0.803, indicating good reliability of the test scores.

*Serial order short-term memory task.* The same serial order STM task as in Experiment 1 was used. Since items within a series are correlated, a one-factor model with correlated errors for items belonging to the same series was fitted to the data (chi square = 802.84, *df* = 700, *p* = 0.004; CFI = 0.955, RMSEA = 0.028). Cronbach's alpha was 0.90. However, with correlated errors this index may underestimate or overestimate reliability (Raykov, 1998, 2001). A more conservative estimate is given by the Spearman-Brown coefficient, which was 0.639, indicating at least reasonable reliability of the test scores.

*Self-teaching phase.* The self-teaching phase of this study is based on Share's (1999) self-teaching paradigm. Eight short Dutch texts, similar to Share's (1999) stories, were composed for this study. All texts were adjusted to the overall reading level of the participants and ranged in length from 65 to 148 words (mean length 94). Targets were eight novel letter strings (pseudowords) representing a fictitious animal or person. Each target included two phonemes that could be represented by two alternate graphemes. These alternate letters occurred at various positions across target strings. The eight designed target quadruplets had a length of one or two syllables and ranged from five to seven letters. Four versions of each story were created, each employing one of the four homophone spellings of the following target quadruplets: Bleip/Blijp/Bleib/Blijb; Traug/Trauch/Troug/Trouch; Drouft/Droufd/Drauft/Draufd; Reilt/Reild/Rijlt/Rijld; Weipsik/Wijpsik/Weipzik/Wijpzik; Plijmap/Pleimap/Plijmab/Pleimab; Kauwand/Kouwand/Kauwant/Kouwant; Hichtop/Higtop/Hichtob/Higtob. Each target appeared six times in one of the eight texts and once in one of the two comprehension questions. Texts were presented separately on A4 paper.

*Orthographic learning tasks.* Orthographic learning was assessed 1–7 days after the self-teaching phase with an orthographic choice task and a spelling task (Share, 1999).

*Orthographic choice task.* Participants were first asked a question to recall the target word (e.g., "Do you remember the name of the monkey who wanted to move to the zoo in the story?"). Each participant was then shown the four alternatives of the target word. The examiner presented a sheet of paper to the participant with the four alternate spellings of the target words written next to each other. The words were written in a random order. Participants were asked to choose the spelling of the pseudoword they had read in the story. The score on this task was the total number of items correctly chosen with a maximum score of eight.

*Spelling task.* Participants were asked to spell the target spelling of the animal or person they had read about in the story. If the participant could not recall the name of the target, the name was provided by the examiner. The score was the sum of the number of target graphemes written correctly within all pseudowords. This means that for every target word a score of 0, 1, or 2 was given with a maximum score of 16 on the entire task.

## *Results*

*Item and serial order STM.* For the item STM task we determined the proportion of items correctly repeated. The mean proportion of items correctly repeated was 72%. For the serial order STM task the proportion of correctly placed items was determined by pooling over all trials. The mean proportion of items correctly placed over all trials was 73%. As in our first experiment, we performed an analysis on performance as a function of serial position to obtain a qualitative view of the serial order retention process. Again, we restricted our analyses to list lengths 5–7 and we combined serial positions 4 and 5 of list length 6 and serial positions 3 and 4 as well as 5 and 6 of list length 7. Consequently, five serial positions were entered into the analysis (see **Figure 4**).

*Orthographic learning.* Because we used a silent reading procedure, it was not possible to determine the proportion of correctly decoded pseudowords. On the comprehension questions the mean proportion of correct answers was 93%, indicating that these questions were simple, yet effective to check whether the children read the texts carefully.

The dichotomous (success/failure) data from the orthographic choice task were tested against the predetermined chance-level proportion of 25% using a one-tailed *t*-test. We set the chance level at 25% for the orthographic choice task because participants were forced to choose one of the four presented homophone spellings; a random pick would therefore yield a score of 25%. For the orthographic spelling task participants were asked to write down the target pseudoword. As the pseudoword was provided auditorily by the experimenter, participants had a chance of 50% to write each of the two homophone graphemes that occurred in the pseudoword correctly. Hence, they had a chance of 25% to write both homophone graphemes correctly in every pseudoword. The overall proportion of correct choices on the orthographic choice task was 0.43 (*SD* = 0.20), which was significantly larger than the chance level proportion correct of 0.25, *t*(186) = 12.374, *SE* = 0.015, *p* < 0.001, one-tailed. The proportion of correctly spelled target graphemes in the spelling task was 0.64 (*SD* = 0.15), which was also significantly larger than the proportion correct of 0.25, *t*(186) = 34.704, *SE* = 0.011, *p* < 0.001, one-tailed. In summary, these results demonstrate that target spellings were recognized and spelled correctly more often than expected by chance.
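The comparison against chance level can be retraced from the reported summary statistics with a one-sample *t*-test. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the sample size of 187 is inferred from the reported 186 degrees of freedom. Because the inputs are rounded means and standard deviations, the result for the orthographic choice task (about 12.3) only approximates the reported value of 12.374.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """One-sample t statistic from summary statistics.

    Returns the t value, the standard error of the mean and the degrees of freedom.
    """
    se = sd / math.sqrt(n)
    return (mean - mu0) / se, se, n - 1


# Orthographic choice task: observed proportion 0.43 (SD = 0.20) against chance 0.25.
t_value, se, df = one_sample_t(mean=0.43, sd=0.20, n=187, mu0=0.25)
print(round(t_value, 2), round(se, 3), df)
```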

To investigate the relationship between STM and reading acquisition we performed a number of correlation analyses. **Table 4** shows that of the two orthographic learning measures only the orthographic choice task is related to item STM capacity. Serial order STM is clearly not related to orthographic learning. We even found a small but significant negative effect of serial order STM on the orthographic spelling task after controlling for item STM (beta = −0.168, *t* = −2.17, *p* = 0.031). **Table 4** also reveals significant correlations between both STM tasks and the phonological awareness task. The correlation between the phonological awareness task and serial order STM is almost as high as the correlation between phonological awareness and item STM.

*Item and serial order STM in relatively poor readers and typical readers.* We divided the participants into two groups based on their reading level. To define these groups, the mean of the standard scores of the two Dutch reading tests (OMT; Brus and Voeten, 1973 and the Klepel; Van den Bos et al., 1994) was taken. Participants who scored one standard deviation below this mean were assigned to the group of poor readers. All other participants were assigned to the group of typical readers. **Table 5** shows that the two groups differ on the two reading tests, on the attention questionnaire, on the phonological awareness task, the spelling task and on both STM tasks.

**Table 4 | Correlations and partial correlations controlling for age (between brackets) between short-term memory tasks, orthographic learning tasks, spelling and reading tasks, vocabulary knowledge, and phonological awareness task for all participants (*N* = 188).**

*Values in bold are statistically significant at p = 0.05.*

**Table 5 | Characteristics of the group of poor readers and the group of typical readers (means and standard deviations).**

*Values in bold are statistically significant at p = 0.05.*

The mean proportion of items correctly repeated in the item STM task was significantly higher in the typical readers group (75%) than in the poor readers group (58%), *t*(186) = −5.371, *p* < 0.001. For the serial order STM task, the mean proportion of items correctly placed over all trials was significantly higher in the typical readers group (74%) than in the poor readers group (69%), *t*(186) = −2.541, *p* = 0.012.

In order to verify whether reading ability affected one STM task after statistically controlling for the other memory task, we conducted analyses of covariance (ANCOVA). For the item STM task as the dependent variable, the effect of group remained significant when the performance on the serial order STM task was entered as a covariate, *F*(1, 185) = 22.339, *p* < 0.001. In contrast, for the serial order STM task as the dependent variable, the effect of group disappeared when the performance on the item STM task was entered as a covariate, *F*(1, 185) = 0.728, *p* = 0.395. These results are similar to the results of our first experiment and demonstrate that the item STM task and the serial order STM task do not measure entirely independent processes. More specifically, when differences on the item STM task are taken into account, serial order STM differences between the reading groups disappear.

To directly compare serial order STM performance in both groups for each level of item STM performance, we analyzed our data again using STA. First, the poor readers group and the typical readers group were matched not only on intellectual functioning but on attentional functioning as well. By discarding 91 normal readers and no poor readers from the initial sample, we obtained similar distributions for both groups on the attention questionnaire. As **Table 6** shows, after this additional matching, the newly formed groups of poor readers and typical readers differed on the two measures that are diagnostic for dyslexia, on the phonological awareness task and on the spelling task.

After verifying that both groups showed substantial overlap in item STM performance, serial order STM was regressed on item STM performance separately for the poor reading group and the typical reading group. In this analysis, item and serial order scores were averaged for each participant and then plotted against each other (see **Figure 5**). Then, we tested in a hierarchical regression analysis whether group contributed significantly to serial order STM after including item STM performance in the regression equation. This analysis showed that adding group as a predictor did not significantly improve fit [*R*<sup>2</sup> null model = 0.146; *R*<sup>2</sup> full model = 0.147; Δ*R*<sup>2</sup> = 0.001; *F* change (1, 94) = 0.104; *p* = 0.747]. Hence, the null hypothesis (that the state trace curves for the typical and for the disabled readers do not differ) could not be rejected. A Bayesian analysis was performed in which the probability of the null model was compared to the probability of the full model given the data and given the assumption that no model was preferred above the other. The BIC values indicated that the null model fitted the data best (BIC<sub>null</sub> = 759.57 and BIC<sub>full</sub> = 764.04). Based on the difference between these BIC values the "Bayesian factor" could be calculated (Kass and Raftery, 1995). The Bayesian factor favoring the null model was 4.47. According to the criteria proposed by Kass and Raftery (1995), the data provided "positive" evidence for the null hypothesis.
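For readers who wish to retrace the model comparison, the sketch below recomputes the *F* change statistic and the BIC difference from the values reported above. It is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the *R*<sup>2</sup> values are rounded to three decimals (so the recomputed *F* only approximates the reported 0.104), and the conversion from a BIC difference to a Bayes factor uses the common approximation 2 ln *B* ≈ ΔBIC.

```python
import math


def f_change(r2_null, r2_full, added_predictors, residual_df):
    """F statistic for the R-squared increase when predictors are added."""
    numerator = (r2_full - r2_null) / added_predictors
    denominator = (1.0 - r2_full) / residual_df
    return numerator / denominator


def bic_bayes_factor(bic_null, bic_full):
    """Approximate Bayes factor favoring the null model.

    Assumption: 2 * ln(B) is approximated by BIC_full - BIC_null.
    """
    return math.exp((bic_full - bic_null) / 2.0)


f_value = f_change(0.146, 0.147, added_predictors=1, residual_df=94)
delta_bic = 764.04 - 759.57
bf_null = bic_bayes_factor(759.57, 764.04)

print(f"F change ~ {f_value:.3f}")
print(f"BIC difference = {delta_bic:.2f}")
print(f"approximate Bayes factor for the null ~ {bf_null:.1f}")
```

Note that the reported value of 4.47 equals the BIC difference. On the 2 ln *B* scale of Kass and Raftery (1995), values between 2 and 6 count as "positive" evidence; under the approximation assumed here the corresponding Bayes factor is roughly 9, which falls in their 3–20 band and carries the same label.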

## **DISCUSSION**

The main aim of Experiment 1 was to try to replicate the specific serial order STM deficit in dyslexic readers as reported by Martinez Perez and co-authors (Martinez Perez et al., 2012b, 2013). As our results showed, we were unable to detect a specific deficit in serial order STM capacity in dyslexic children. However, a potential limitation of this experiment is that many participants were in fact bilingual. Although all children were proficient in Dutch, as they typically attended Flemish (Dutch speaking) schools from first grade on, it is possible that the reading problems of the bilingual dyslexic children were partly affected by their bilingual status. To examine this potential confound of language background, the interaction effect of diagnostic category with language spoken at home (coded 1 for Dutch speaking children and 0 for all other children) was tested. This effect was not significant (*p* = 0.33), implying that the effect of being dyslexic or not on serial order STM performance was not different for bilingual children and monolingual Dutch speaking children. Moreover, when only the Dutch speaking children were included in the hierarchical regression analysis (13 dyslexic and 13 control), the group factor (being dyslexic or not) did not contribute significantly after controlling for item STM (*p* = 0.14). Likewise, in Experiment 2 no specific serial order STM deficit was detected in poor readers. Given that both experiments were designed with more power (larger sample sizes) than those of Martinez Perez et al. and with a direct comparison between item and serial order STM performance, using STA, we can confidently conclude that the impairments in STM often reported in dyslexia are not due to a specific impairment in the retention of serial order information.

Although it would be difficult to speculate on why the results of Martinez Perez and co-authors (Martinez Perez et al., 2012b, 2013) did show such a dyslexic deficit and our experiments did not, it seems that the use of STA enabled a more direct comparison of serial order STM performance after equating item STM performance between groups. The lack in their studies of a more stringent match on attentional functioning does not seem to be an important factor, as the conclusions of our own studies were not different when we analyzed the entire original group of participants without matching on attention (not reported). Probably the match on item STM performance which is inherent in STA is already sufficient to match the groups on attentional functioning. The apparent instability of a serial order STM deficit in dyslexic individuals is also evident from two recent studies. Hachmann et al. (2014) found evidence for such a deficit in dyslexic adults whereas Binamé and Poncelet (2014) reported an item STM deficit in adult poor spellers as well as a serial order STM deficit. Based on the data of these authors we found that the serial order STM deficit disappeared after controlling for item STM performance<sup>2</sup>. Moreover, these authors could not find a deficient Hebb repetition effect in their sample of poor spellers. All things considered, if a serial order STM deficit in dyslexics were a robust phenomenon, one would have expected a more consistent pattern of results.

A second important conclusion that follows from both reported experiments is that the measurement of serial order STM is at least as strongly related to phonological processing as is the measurement of item STM. What do these results tell us about theories assuming the separability of both STM components, and about the role of phonology as a basis of reading (dis)ability? First, when the serial order STM task bears a phonological component (names of animals), even if the phonological demands are minimized, the relationship with phonological abilities proves to be at least as strong as is the case for item STM. This implies that the assumed disconnection between item and serial order STM processes does not coincide with the phonology/non-phonology distinction. On the other hand, our results are in agreement with the idea of Martinez Perez and co-authors that both STM processes are partly independent. As the correlations between item STM performance and serial order STM performance are far below the reliability estimates of both tasks (*r* = 0.50 in Experiment 1 and *r* = 0.34 in Experiment 2), it is clear that serial order STM scores contain some unique variance that is not accounted for by item STM scores. There is both behavioral and neurological evidence that item information and sequence information are coded distinctly in STM. The recall of verbal item information is shown to be affected by psycholinguistic properties such as word frequency and semantic content, while recall of the order of the items is not (e.g., Saint-Aubin and Poirier, 1999; Nairne and Kelley, 2004). Moreover, neuroimaging studies have shown that the retention of item memory during STM tasks is associated with activation of phonological and semantic processing areas in the bilateral temporal lobes, whereas non-linguistic brain areas in the right intraparietal sulcus are activated when processing order information in STM (Majerus et al., 2006b, 2008a, 2010). Quite interestingly, in Experiment 1 we found that serial order STM and not item STM shows a substantial correlation with block design of the WISC-III (*r* = 0.38 vs. *r* = 0.06). Martinez Perez et al. (2012a) reported a similar result (*r* = 0.48 vs. *r* = 0.28) for a group of kindergarten children. Taken together, the evidence seems to indicate that the partial independence of both STM processes is not attributable to a difference in phonological involvement, but is a result of the influence of non-verbal intelligence processes in serial order STM. Apparently, reconstructing the serial order of a number of elements is aided by active higher order restructuring of the material, possibly involving a visuospatial component mediated by the right intraparietal sulcus (Van Dijck et al., 2013; for a review see Majerus, 2009). Theoretically, we think that item STM could be considered as a necessary condition for serial order STM, but not as a sufficient condition, since it needs an additional non-verbal intelligence component.
To conclude this issue, serial order STM, as it is involved with higher-order intelligent processes, seems not to be the right place to look for an explanation of a deficiency in the acquisition of the "modular" word reading process (Stanovich, 1988, 1990).

<sup>2</sup>We thank the authors for kindly providing their data for further analysis.

Our finding of an equal contribution of phonological processes in both STM tasks could be interpreted as support for the phonological deficit hypothesis of dyslexia, as it indicates that verbal STM deficits (for item or order information) often reported in dyslexia can be explained by the poor phonological processing abilities that characterize dyslexia. However, our findings do not prove that serial order STM *per se* is phonological in nature. It remains to be seen in further empirical research whether the relationship between serial order STM and phonological abilities has to be attributed to serial order processing as such, or to the phonological nature of the stimulus material. As we adopted the serial order STM task from the study of Martinez Perez et al. (2013) we used the same phonological stimuli as they did. Although this task minimizes phonological demands, it would be worthwhile to investigate the role of phonological processes in serial order STM by using a task with non-phonological stimuli. In a study on Hebb learning (Staels and Van den Broeck, in press), we observed a substantial correlation of serial order learning of abstract visual forms that could not be verbalized with both pseudoword reading and real word reading. Although this finding is suggestive of a role of serial order learning *per se* in reading ability, further research has to determine whether this relationship persists if item STM for visual abstract forms is tested. We suggest using a design in which item vs. serial order STM tasks, and phonological vs. non-phonological item material are bifactorially manipulated.

The main purpose of Experiment 2 was to investigate which STM process, item memory or serial order memory, is most closely related to orthographic learning, the process by which a beginning reader stores the orthographic details of specific words. As the ability to decode unfamiliar written words into their spoken equivalent is the central means by which orthographic representations are acquired, Martinez Perez et al. (2012a) suggested that STM for serial order information could also be important for the acquisition of orthographic representations. Again the results were unambiguous. Only item STM was significantly related to orthographic learning, and only in the most sensitive test of orthographic learning, i.e., orthographic choice (see Staels and Van den Broeck, 2013). Serial-order STM capacity, on the contrary, did not show any positive relationship at all with orthographic learning. Congruent with these findings is the robust observation in both experiments that word reading ability is more strongly related to item STM than to serial order STM. Again, more research is needed to find out whether the nature of the stimulus material (phonological or not) influences this relationship.

It is important to note that in Experiment 2 we tested a novel prediction made by Martinez Perez et al. (2012a) concerning the role of serial order STM in orthographic learning, but no attempt was made to replicate their longitudinal study. Hence, our differing conclusions about the role of serial order STM may stem from the fact that we measured orthographic learning and serial order STM concurrently in already literate children, while Martinez Perez et al. (2012a) measured serial order STM in kindergarten and followed up the children for their reading ability at the end of first grade. Although there is ample evidence that (serial) phonological recoding constitutes the first step in orthographic learning (Share, 1995, 2008), it is possible that the role of serial order STM in orthographic learning is less pronounced in literate children than in beginning readers, because with increasing reading ability orthographic learning may depend more on already existing orthographic structures. However, the results of the two studies are probably more in accordance than appears at first sight. In the study of Martinez Perez et al. (2012a) serial order STM was not a stronger unique predictor of later non-word reading after controlling for item STM than item STM was after controlling for serial order STM (equal betas). Only after additionally controlling for phonological awareness did serial order STM predict somewhat more unique variance in non-word reading than item STM did, although the difference in the betas (0.31 vs. 0.22) was not statistically tested and the proportion of explained unique variance (8%) was rather small. Clearly, more convincing empirical evidence is needed to sustain the hypothesis that serial order STM plays a substantial role in explaining reading (dis)ability.

### **REFERENCES**

Ahissar, M., Lubin, Y., Putter-Katz, H., and Banai, K. (2006). Dyslexia and the failure to form a perceptual anchor. *Nat. Neurosci.* 9, 1558–1564. doi: 10.1038/nn1800


Microsoft Office PowerPoint 2007©. (2007). *Microsoft Corporation. Redmond, WA*.


*Forms A and B. A test for readability of pseudowords. Justification manual diagnostics and treatment]*. Nijmegen: Berkhout.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 December 2013; accepted: 31 August 2014; published online: 23 September 2014.*

*Citation: Staels E and Van den Broeck W (2014) Order short-term memory is not impaired in dyslexia and does not affect orthographic learning. Front. Hum. Neurosci. 8:732. doi: 10.3389/fnhum.2014.00732*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Staels and Van den Broeck. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX A**

Examples of items used in the Item short-term memory task.

/g u k/ /z i l/ /r a k/ /r i s/ /s o t/ /b i m/ /k e p/

## Age, dyslexia subtype and comorbidity modulate rapid auditory processing in developmental dyslexia

## *Maria Luisa Lorusso\*, Chiara Cantiani and Massimo Molteni*

*Unit of Neuropsychology of Developmental Disorders, Department of Child Psychopathology, Scientific Institute IRCCS "E. Medea", Bosisio Parini, Italy*

### *Edited by:*

*Pierluigi Zoccolotti, Sapienza University of Rome, Italy*

### *Reviewed by:*

*Paula Tallal, Rutgers University, USA; Daniela Brizzolara, Stella Maris Developmental Neuroscience Institute, Italy*

### *\*Correspondence:*

*Maria Luisa Lorusso, Unit of Neuropsychology of Developmental Disorders, Department of Child Psychopathology, Scientific Institute IRCCS "E. Medea", Via Don Luigi Monza 20, Bosisio Parini, 23842, Italy*

*e-mail: marialuisa.lorusso@bp.lnf.it*

The nature of Rapid Auditory Processing (RAP) deficits in dyslexia remains debated, together with the specificity of the problem to certain types of stimuli and/or restricted subgroups of individuals. Following the hypothesis that the heterogeneity of the dyslexic population may have led to contrasting results, the aim of the study was to define the effect of age, dyslexia subtype and comorbidity on the discrimination and reproduction of non-verbal tone sequences. Participants were 46 children aged 8–14 (26 with dyslexia, subdivided according to age, presence of a previous language delay, and type of dyslexia). Experimental tasks were a *Temporal Order Judgment* (TOJ) (manipulating tone length, ISI and sequence length), and a *Pattern Discrimination Task*. Dyslexic children showed general RAP deficits. Tone length and ISI influenced dyslexic and control children's performance in a similar way, but dyslexic children were more affected by an increase from 2 to 5 sounds. As to age, older dyslexic children's difficulty in reproducing sequences of 4 and 5 tones was similar to that of normally reading younger (but not older) children. In the analysis of subgroup profiles, the crucial variable appears to be the advantage, or lack thereof, in processing long vs. short sounds. Dyslexic children with a previous language delay obtained the lowest scores in RAP measures, but they performed worse with shorter stimuli, similar to control children, while dyslexic-only children showed no advantage for longer stimuli. As to dyslexia subtype, only surface dyslexics improved their performance with longer stimuli, while phonological dyslexics did not. Differential scores for short vs. long tones and for long vs. short ISIs predict non-word and word reading, respectively, and the former correlate with phonemic awareness. 
In conclusion, the relationship between non-verbal RAP, phonemic skills and reading abilities appears to be characterized by complex interactions with subgroup characteristics.

**Keywords: developmental dyslexia, subgroups, rapid auditory processing, language impairment, dyslexia subtypes**

## **INTRODUCTION**

Developmental Dyslexia (DD) is defined as a specific disability in learning to read adequately despite at least normal intelligence, adequate instruction and socio-cultural opportunities, and the absence of sensory defects in vision and hearing (American Psychiatric Association, 1994). The prevailing views concerning the etiology of DD point to a deficit in encoding, representing and processing speech sounds (Snowling, 2001; Ramus et al., 2003; Ramus and Szenkovits, 2008). However, the question whether these difficulties reveal the core deficit of dyslexia or whether they are manifestations of a more general and basic auditory deficit is controversial.

According to Tallal's (1980) hypothesis, children with DD would be impaired in their ability to perceive auditory stimuli that have short duration and occur in rapid succession. Such a deficit at the auditory level could compromise the temporal analysis of speech at the phoneme level, and thus the building of correct phoneme representations. With such constraints, the development of language skills, both oral and written, would be difficult. Tallal and Piercy (1973a,b) revealed that children with Specific Language Impairment (SLI) have difficulties in discriminating between rapidly presented non-speech auditory stimuli, and in reproducing their order (discrimination and repetition tasks). This hypothesis was later generalized to children with DD (Tallal, 1980). The procedure usually employed involves tasks that require discriminating between, or reproducing the order of, complex tones of varying frequency, manipulating both Inter-Stimulus Interval (ISI) and sound length. The difficulty in performing these tasks was interpreted by Tallal and colleagues as a deficit in extracting temporal information from short and rapid auditory stimuli, and it was referred to as deficient Auditory Temporal Processing (ATP) (Tallal, 1980). This interpretation was questioned by several researchers, who presented conflicting evidence as to whether the deficit is linked to timing issues or rather to the analysis of complex stimuli (Rosen, 2003), spectral analysis and discrimination (Studdert-Kennedy and Mody, 1995), processing of stimulus streams and sluggish attentional shifting (Hari and Renvall, 2001; Lallier et al., 2012) or perceptual learning (Banai and Ahissar, 2009).
As suggested by Mauk and Buonomano (2004), it is possible that in these tasks learning occurs as a result of interval-specific cognitive processes other than temporal processing *per se*; "for example, because interval discrimination tasks require comparing the test interval and a standard interval, improvement could rely on better representation of the standard interval or improved storage or retrieval from working or short-term memory" (Mauk and Buonomano, 2004, p. 318). A prominent role of short-term memory in producing the results has also been proposed by Share et al. (2002) and Tallal and Piercy (1973a,b). Heiervang et al. (2002), for instance, found that in longer trials (requiring the reproduction of 3, 4, and 5 tones) children with DD made more errors compared to normally reading children. The temporal nature of the tasks has subsequently been toned down, and the hypothesis has been reworded as Rapid Auditory Processing (RAP), a definition leaving more room for different interpretations of the deficit. Despite extensive research effort, however, the specific nature of RAP problems remains ill-defined.

A first controversial point concerns the *speech-specific nature* of the deficits, as claimed by Studdert-Kennedy and Mody (1995), and Mody et al. (1997). Nowadays, the growing number of studies showing deficits that also concern the discrimination and reproduction of non-speech stimuli points toward a more general and basic auditory problem. Vandermosten et al. (2010, 2011) recently found a clear pattern in children and adults with DD, showing a temporal-specific deficit in both speech and non-speech categorization tasks. Nonetheless, the relationship between RAP deficits, phonemic awareness and reading is still a matter of debate (see Johnson et al., 2009; Malenfant et al., 2012).

A second controversy concerns the *selectivity* of the auditory processing deficit, i.e., its being restricted to brief and rapidly presented stimuli. Tallal (1980), employing the Temporal Order Judgment task (TOJ), showed that children with DD performed worse than the control group in the identification of brief sounds (75 ms), but only for short Inter-Stimulus Intervals (ISIs) (8–305 vs. 428 ms). In support of the selectivity of the deficit in DD, Gaab et al. (2007) brought evidence of a disruption of cerebral regions specifically devoted to rapid auditory processing. Several results unequivocally consistent with the "restricted" RAP hypothesis have been reported, though the emphasis alternates between ISI and tone length, or both (Tallal, 1980; Reed, 1989; Heiervang et al., 2002; Cohen-Mimran and Sapir, 2007). On the other hand, the findings of a second group of researchers support the hypothesis of a general auditory deficit, not restricted to short and rapid sounds (Marshall et al., 2001; Waber et al., 2001; Share et al., 2002; Bretherton and Holmes, 2003; Cantiani et al., 2010). As pointed out by Rosen (2003), group differences at long ISIs may often fail to emerge only because of ceiling performance. Other types of auditory processing have also been called into play, including processing of dynamic features of auditory stimuli, such as amplitude and frequency modulations (AM, FM) in the speech signal (Witton et al., 1998; Talcott et al., 2000) or sensitivity to longer time-scale patterns of intonation, rhythm and stress (Goswami et al., 2002; Pasquini et al., 2007; Thomson and Goswami, 2008; see Hämäläinen et al., 2012 for a review).

A last issue concerns the *predictive value* of measures of RAP with respect to reading and reading-related skills. Several studies found general correlations between different measures of impaired auditory processing and reading and/or phonological difficulties (Tallal, 1980; Witton et al., 1998; Ahissar et al., 2000; Marshall et al., 2001; Share et al., 2002; Hood and Conlon, 2004; Cohen-Mimran and Sapir, 2007). In many studies, however, correlations and/or predictive power were weak (Marshall et al., 2001; Share et al., 2002; Hood and Conlon, 2004) or non-significant (Reed, 1989; Heiervang et al., 2002). A recent study (Johnson et al., 2009) found that phonemic awareness predicts later RAP performance to a greater degree than the reverse. On the other hand, longitudinal studies in which behavioral and ERP responses to auditory stimuli had been recorded in newborn children with and without familial risk for language and reading disorders (Benasich and Tallal, 2002; Benasich et al., 2002; Leppänen et al., 2010) show that the infants' ability to discriminate temporal characteristics of the stimuli differs in the two groups (with and without risk) and predicts later language and reading-related skills: these findings at the very least rule out the hypothesis that RAP deficits are a consequence of reduced phonemic awareness. As a viable compromise, based on data from a longitudinal study, Boets et al. (2011) suggest a bidirectional relationship between auditory processing of non-speech stimuli and speech perception.

## **HOW SHOULD THIS WIDE HETEROGENEITY OF RESULTS BE EXPLAINED?**

Although the exact nature of the processes tapped by RAP tasks is a primary issue for research, the origin of the extreme variability in research findings, as described above, remains an interesting and still unanswered question. Various hypotheses have been proposed pointing to differences within the dyslexic population (McArthur and Bishop, 2001). In fact, only a subgroup of children with DD has often been found to be impaired in RAP tasks: Tallal (1980) found that only 8 (out of 20) children with DD had a clear deficit on the TOJ task. Similar within-group differences were found by Marshall et al. (2001; 4 of 17), Bretherton and Holmes (2003; 20 of 42), Ramus et al. (2003; 9 of 16), Banai and Ahissar (2004; 15 of 46), and Cohen-Mimran and Sapir (2007; 4 of 12).

First, *Age* has often been suggested to provide variability within the dyslexic group. Tallal (2000) claimed that only younger children with DD have RAP deficits, which may be explained by a maturational lag in the development of the auditory system (McArthur and Bishop, 2001; Wright and Zecker, 2004). The magnitude of this deficit is expected to diminish as children grow older: older children with dyslexia could have compensated for the deficit, but only after it has permanently compromised the quality of phoneme representations. Results in line with this hypothesis were found by Hautus et al. (2003), through a test of auditory temporal acuity (a gap-detection task).

Second, the *presence of language impairments* was hypothesized by several authors to be related to RAP performance (Tallal and Stark, 1982; Heath et al., 1999; Joanisse et al., 2000). In particular, Tallal and Stark (1982) did not find any tone processing deficits in reading-impaired children without concomitant oral language delay. Similarly, Heath et al. (1999) compared disabled readers with and without concomitant oral language delay in a TOJ task, and found a deficit only in the first group.

Finally, it was supposed that the RAP deficit affects only a subgroup of children with DD, based on *type of dyslexia*. Several studies suggest the existence of various subtypes of DD characterized by different cognitive and neuropsychological profiles and by different reading strategies (e.g., Bakker, 1973; Boder, 1973; Castles and Coltheart, 1993). More recently, the existence of markedly different cognitive profiles within the dyslexic population has been further confirmed (Ramus et al., 2003; Heim et al., 2008; Menghini et al., 2010). The main classification systems distinguish between dyslexic individuals with predominant difficulties in non-word reading and phonological tasks (these subtypes may be classified as L-types, phonological, dysphonetic dyslexics in Bakker's, Coltheart's and Boder's taxonomies, respectively), and dyslexic individuals who are mostly impaired in the access to the visual lexicon, as shown by their difficulties in whole-word recognition needed for reading irregular words (classified as P-types, surface or dyseidetic dyslexics). Consistent with Tallal's findings of a correlation between tone processing and non-word reading, it was assumed that only (or especially) phonological dyslexics would have a deficit in RAP (Cestnick, 2001). However, not all reports are consistent with this hypothesis: Lachmann et al. (2005) even found greater anomalies in children with dyseidetic DD compared to children with dysphonetic DD in a temporal processing task using event-related brain potentials (ERP).

The aim of the present study was to test the hypothesis that differences in age, dyslexia subtype and comorbidity with language impairments can be linked to different patterns of performance on RAP tasks. Although any subgrouping procedure may be seen as a reductive simplification of a complex, multi-factor picture (with much variability due to the specific tests and cut-offs used), our hypothesis was that a number of distinctions can highlight some crucial differences in the population. Specifically, based on previously reported findings, we expected that (a) the level of difficulty and thus of sensitivity of the different tasks would be modulated by age for children with DD, possibly in a different way than for control children; (b) the presence of an additional language impairment would further hamper RAP performance; and (c) children with a phonological type of dyslexia would have worse RAP performance than children with non-phonological dyslexia. The study is the first one, to our knowledge, to take into account all these variables in the same sample of dyslexic children. In order to avoid introducing new sources of variability in the results, only tasks that have been previously employed and well-described in the literature were used.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

Forty-six children aged between 8 and 14 years participated in the study: 26 children with DD and 20 normally reading control children. The participants in the two groups were matched for gender and age. Parental consent was obtained after the purpose and procedures of the study had been explained. The study had been approved by the Ethics Committee of the Institute according to standards of the Helsinki Declaration (1964).

Children with DD included in the sample had been referred to the Unit of Cognitive Psychology and Neuropsychology of the institute because of learning difficulties. All children had been diagnosed as dyslexic based on standard inclusion and exclusion criteria (ICD-10; World Health Organization, 1992). Their performance in reading was two (or more) standard deviations below the mean in at least one of the age-standardized Italian reading tests included in the battery (word, non-word and text reading), and their non-verbal or performance IQ was above 85. Performance IQ was estimated by the Italian adaptation of the Wechsler Intelligence Scale for Children-revised (WISC-R; Wechsler, 1994) (*n* = 15), or Cattell's "Culture Free" test (Cattell, 1979) (*n* = 11). All children attended mainstream schools (as is usual in the Italian educational system), and none of them had started remediation programs at the time of participation in the study. Comorbidity with ADHD or other psychopathological conditions was excluded, based on standard diagnostic criteria (DSM-IV; American Psychiatric Association, 1994).

Control children were recruited in local schools. They all performed normally in a text reading task, and their performance IQ (Cattell's "Culture Free" test) was above 85. Participants' characteristics compared with unpaired *t*-tests are shown in **Table 1**. No group difference emerged in age, but there was a significant difference in performance IQ.

### *Subgroups*

Within the dyslexic sample, two subgroups based on the presence or absence of a previous language delay were created after a careful analysis of clinical records: all children had been diagnosed at the Institute following the same diagnostic protocols; thus, a detailed anamnestic record was available, including in-depth information about language development. Inclusion criteria were previous diagnoses of Language Impairment (LI) (a diagnosis of LI is made when at least two scores on a standardized battery of receptive and expressive language are below 2 SDs with respect to age norms) and/or reports of significant delays in early vocabulary and syntactic development (reported delays were considered significant if the main linguistic milestones were acquired with at least one year delay with respect to normal development), with or without a history of speech and language therapy. Transient phonetic/articulatory difficulties without any additional linguistic problem were not considered sufficient for inclusion in the DD+LI group, even if speech therapy had been delivered. Based on these elements, it was ascertained that 10 children had had a clear previous language impairment (LI) and 11 children had never presented linguistic difficulties (noLI). For five of the children, available information was not sufficient to decide on the presence of a previous linguistic impairment, so they were not classified and not included in the analysis. Hearing tests had been performed for all children with a former diagnosis of SLI as part of the diagnostic procedure. For dyslexic participants, hearing tests were performed whenever there was a reason to suspect that a hearing problem might be present (based on parents' reports or on the clinicians' assessment). Only children for whom no report of a hearing loss was recorded were included in the study.

**Table 1 | Participant characteristics (***p***-values indicating significant group differences are marked in bold).**

*<sup>a</sup>Scores on the WISC-R or Cattell's "Culture Free" test; <sup>b</sup>Scores are expressed as Z-scores in the text reading task.*

Further, two subgroups based on type of dyslexia were created. This division was based on the difference in accuracy (z-scores) between word and non-word reading (with at least 0.5 difference in z-scores)<sup>1</sup>. This procedure is similar, although not identical, to the regression procedure suggested by Castles and Coltheart (1993) and followed by Ziegler et al. (2008) and Peterson et al. (2013), to select "relative phonological" and "relative surface" dyslexics. Children with "phonological DD" performed worse when reading non-words, while children with "surface DD" performed worse when reading words. Accordingly, a total of 10 children were assigned to the subgroup of phonological DD, 12 were assigned to the subgroup of surface DD and 4 could not be classified.
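The classification rule described above can be illustrated with a minimal sketch. It assumes that z-scores are signed so that lower values mean worse accuracy and that the 0.5 threshold is inclusive; neither detail is stated in the text, and the function and parameter names are hypothetical.

```python
from typing import Optional


def classify_subtype(word_acc_z: float, nonword_acc_z: float,
                     min_gap: float = 0.5) -> Optional[str]:
    """Assign a 'relative' dyslexia subtype from accuracy z-scores.

    Assumption: lower z-scores indicate worse reading accuracy.
    Returns None when the word/non-word gap is below the threshold.
    """
    gap = word_acc_z - nonword_acc_z
    if gap >= min_gap:
        # Non-words are read clearly worse than words.
        return "phonological"
    if gap <= -min_gap:
        # Words are read clearly worse than non-words.
        return "surface"
    return None  # unclassified


# Hypothetical participants (not data from the study).
print(classify_subtype(-1.0, -2.0))   # phonological
print(classify_subtype(-2.5, -1.0))   # surface
print(classify_subtype(-1.2, -1.0))   # None
```

Under this rule a child can be impaired on both kinds of stimuli and still receive a subtype label, which is what the "relative" classification implies.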

**Table 2** shows the combinations of z-scores expressing accuracy and speed in reading words and non-words, for each participant. It can be easily seen that the great majority of children had difficulties with both kinds of stimuli, and that a "relative" rather than a "pure" subtype classification is the best choice. As to accuracy vs. speed scores as the basis for classification, it can be seen that both variables would allow the identification of (largely but not completely overlapping) subgroups with similar numbers of participants. It was decided to use accuracy rather than speed scores based on previous studies in which a subdivision according to accuracy scores highlighted strong and reliable differences in visual and auditory attention (Facoetti et al., 2010; Franceschini et al., 2012), i.e., in low-level processing skills.

Subgroup characteristics and one-way ANOVA comparisons are shown in **Table 3**. Subgroup comparisons reflected the inclusion criteria in each subgroup (see comparison for non-word reading accuracy in the type-of-dyslexia subgroups). All subgroups were comparable for IQ. Generally, children with DD-noLI had lower performances in reading and reading-related tasks than children with DD+LI (these differences reached significance for reading accuracy and short-term memory scores). No significant differences emerged in overall reading and reading-related tasks when comparing subgroups based on type of dyslexia (except for the difference in word reading accuracy scores). However, an interesting pattern emerged for the phonemic awareness scores, which was further explored. A repeated-measures ANOVA was performed, entering type of phonemic awareness task (see following section for a description) as within-subject factor (phoneme deletion vs. phonemic blending) and type of dyslexia as between-subject factor. A significant interaction between type of phonemic awareness task and type of dyslexia was found, *F*(1, 20) = 5.36, *p* < 0.05; $\eta_p^2$ = 0.211, suggesting different phonemic awareness difficulties in the two subgroups. Namely, children with phonological dyslexia had similar performances in the two phonemic awareness tasks, whereas children with surface dyslexia were more impaired in phonemic blending than in phoneme deletion [*F*(1, 11) = 21.47, *p* = 0.001; $\eta_p^2$ = 0.661].

The distribution of dyslexia subtypes in the two groups with/without previous language delay was not significantly different, $\chi^2$(1, *N* = 19) = 1.269, *p* > 0.05.

### **TASKS**

### *Reading and reading-related tasks*

*Reading skills* were assessed through two different tasks:


Two *phonemic awareness* tests were taken from an unpublished battery (Cossu et al., 1988):


Scores for each test are expressed as number of errors on 20 words. Only age means and cut-off scores are provided as normative data.

*Short-term memory* was assessed by a digit span subtest, comprising Digits Forwards and Digits Backwards. For the children who had undergone intelligence testing with the WISC-R, the weighted score of the Digit Span subtest was recorded. For the other participants, digit span was assessed by a subtest of TEMA (Test di Memoria e Apprendimento, an Italian adaptation of TOMAL, Reynolds and Bigler, 1994), and recorded as z-scores.

### *Experimental tasks*

The experimental tasks were created based on well-established protocols described in the literature. Moreover, they had already been used in a previous study by the authors (Cantiani et al., 2010). Only nonspeech processing skills were addressed, in order to avoid direct influences from linguistic or phonological deficits. All stimuli were digitally generated using Praat software (www.praat.org) and were presented to each child on an ASUS computer by means of E-prime Experiment Generator and Controller software (Schneider et al., 2002).

<sup>1</sup>Although it is sometimes argued that mistakes are very rare in transparent orthographies (e.g., Landerl et al., 1997), it was clearly shown that children with DD do make mistakes (though less numerous than is described for opaque languages), even in a language with an orthography as consistent as Italian (Brizzolara et al., 2006; Menghini et al., 2010).

**Table 2 | z-scores for word and nonword reading for each participant (DD group).**

The *Temporal Order Judgment (TOJ) task* (Tallal and Piercy, 1973a,b; Tallal, 1980) was constructed using two complex tones composed of frequencies within the speech range. The two tones differed in the fundamental frequency (F<sub>0</sub> = 100 Hz for the low tone and F<sub>0</sub> = 305 Hz for the high one), and tone duration for both tones was either 75 or 250 ms. Children were instructed to indicate the order of the tones after each trial by pressing a yellow key for the "low" tone and a blue key for the "high" tone. The same experimental paradigm was used in two different tasks: a Rapid Temporal Order Judgment (Rapid-TOJ) task, in which ISI was manipulated, and a Temporal Order Judgment Memory (TOJ-Memory) task, in which the number of elements to keep in memory was manipulated.

In the *Rapid Temporal Order Judgment (Rapid-TOJ) task*, stimulus pairs were created by pairing the two stimuli in all four possible combinations (AA, AB, BB, BA) with different interstimulus intervals (8, 15, 30, 60, 150, 305, and 428 ms), and presented randomly. A short training with visual and verbal feedback was given to familiarize the children with the task. First, each tone was demonstrated separately seven times, and participants had to answer by pressing the corresponding key. Then, single tones were presented in random order. This training was continued for a maximum of 48 trials or until a criterion of 20 correct responses in a series of 24 consecutive stimuli was reached (*p* < 0.001, Binomial Test). In the last phase of training, participants were trained to respond to each of the four possible stimulus patterns by pressing the keys in the correct order. There were four demonstrations by the experimenter, followed by eight trials in which participants responded independently. An inter-stimulus interval (ISI) of 428 ms was employed during training. After the training session, 24 similar trials were given without feedback. Children were then tested on two-element stimulus patterns with ISIs of 8, 15, 30, 60, 150, and 305 ms. Each subject received a total of 24 two-element patterns, four for each ISI, with random presentation order. This training and testing procedure was carried out twice, once for each of the two stimulus durations: 75 and 250 ms (presentation order for the two blocks was balanced across participants).
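The training criterion (20 correct out of 24 consecutive single-tone trials) can be checked against chance with a short calculation. The sketch below assumes a one-tailed binomial test with a chance level of 0.5, since there are two response keys; the text reports only the resulting *p* < 0.001, and the function name is hypothetical.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_chance: float = 0.5) -> float:
    """Probability of at least `successes` correct responses in `trials`
    attempts when responding at chance (one-tailed binomial test)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Criterion used in training: 20 correct out of 24 consecutive stimuli.
p_value = binomial_tail(20, 24)
print(f"{p_value:.6f}")  # 0.000772
```

Under these assumptions the probability of reaching the criterion by guessing is below 0.001, in line with the value reported in the text.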

In the *Temporal Order Judgment Memory (TOJ-Memory) task,* stimulus sequences consisted of four and five elements created as random combinations of the two complex tones, with a fixed ISI of 428 ms. Two different blocks were presented in a counterbalanced order: one included 10 four-tone sequences and one included 10 five-tone sequences. Both blocks were preceded by training including one trial demonstrated by the experimenter, and three trials in which children responded independently and feedback was given. The whole training and testing procedure was carried out twice, once for each of the two stimulus durations: 75 and 250 ms (order counterbalanced across participants). Data from 2-stimulus series with 428 ms ISI from the Rapid-TOJ task were included in the analyses of the TOJ-Memory task, so as to increase the range of sequence lengths and analyze memory effects on performance (stimulus sequences of two, four and five elements).

**Table 3 | Subgroup characteristics (***p***-values indicating significant group differences are marked in bold).**

*<sup>a</sup>Age in months; <sup>b</sup>Scores on WISC-R or Cattell's "Culture Free" test; <sup>c</sup>Global scores were created considering both scores on word and non-word reading and on text reading. Mean scores expressed as Z-scores were calculated separately for accuracy and speed; <sup>d</sup>Z-scores; <sup>e</sup>Raw scores (number of errors); <sup>f</sup>Z-scores in the word and non-word reading tasks.*

In the *Pattern Discrimination Task* (adapted from Kujala et al., 2000) a simple behavioral procedure was adopted, requiring the children to discriminate four-tone rhythmic patterns. The stimulus patterns consisted of four synthetically generated tones (500 Hz in frequency and 30 ms in duration) separated by different ISIs (50 ms; 150 ms; 200 ms). Two different stimulus patterns (rhythms) were created by changing the order of the ISIs, and separately recorded on audio files:



The two rhythms were paired in all four possible combinations (AA, AB, BB, BA) with 700-ms intervals. The children listened to the pairs of rhythms and were requested to say whether the two rhythms were equal (50%) or different (50%). The answers were recorded by the experimenter by pressing different keys on the computer keyboard. During the testing phase a fixation point was shown on the computer screen. The task, composed of 24 trials, was preceded by two different training phases: a passive training including 8 trials (two for each combination) demonstrated by the experimenter, and an active training including 4 trials (one for each combination), in which participants responded independently and feedback was given.

### *Apparatus and procedures*

All testing was conducted individually in a quiet room. Experimental and reading tasks were presented in a single session, with a total duration of about one and a half hours. Task sequence was counterbalanced within participants to control for fatigue effects. The stimuli of the RAP tasks were presented binaurally through headphones (Sennheiser HD270) with an intensity of approximately 60 dB. All the responses were recorded via the computer keyboard.

## **RESULTS**

### **DATA ANALYSES**

In the first part of this section, the results of the whole sample of dyslexic participants on the three tasks are compared with those of control participants. Two separate ANOVAs (repeated measures GLM) were performed for the two different TOJ tasks (Rapid-TOJ, TOJ-Memory), and a univariate ANOVA was performed for the Pattern Discrimination task, considering the mean percentage of correct answers. In all ANOVAs, Group (dyslexic vs. control participants) was entered as between-subject factor and Age as a covariate, while within-subject factors differed according to the specific task. For the Rapid-TOJ task, accuracy on trials with short ISIs (from 8 to 30 ms inclusive) was compared with accuracy on trials with longer ISIs (60–428 ms inclusive). The cut-off was set at 40 ms as this time frame was suggested to be crucial for speech discrimination (Fitch et al., 1997). This subdivision was similar to that used in previous studies (e.g., Heath et al., 1999; Cohen-Mimran and Sapir, 2007). The adequacy of this 40 ms cut-off was empirically confirmed by a preliminary analysis on the single ISI values. Due to the significant difference in performance IQ between groups (control children had higher IQs), Pearson's bivariate correlations were first computed between performance-IQ measures and all experimental variables. Since significant correlations (*ps* < 0.05) were found for the TOJ-Memory task, IQ was entered as a covariate for this task. The "Delaney-Maxwell" method was applied to both IQ-scores and Age, in order to center the mean of the covariates, thus avoiding distortions of the main effects (Delaney and Maxwell, 1981). Specifically, the measure used as a covariate was the deviation of each individual score with respect to the mean score in the whole sample.
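The covariate centering described above amounts to subtracting the whole-sample mean from each individual score. A minimal sketch, using hypothetical ages in months rather than the study's data and a hypothetical function name:

```python
from statistics import mean


def center_covariate(values):
    """Express each score as its deviation from the mean of the whole
    sample (both groups pooled), as described for Age and IQ."""
    grand_mean = mean(values)
    return [v - grand_mean for v in values]


# Hypothetical ages in months for a pooled sample (illustration only).
ages_months = [100, 120, 140, 160]
centered = center_covariate(ages_months)
print(centered)  # [-30, -10, 10, 30]
```

Because the centered covariate has a mean of zero over the pooled sample, the main effects in the model are evaluated at the average covariate value.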

The second part of this section will focus on the dyslexic group, subdivided according to the presence/absence of a previous language impairment, and to type of dyslexia. Again, separate ANOVAs were performed for the three tasks, considering the mean percentage of correct answers. For each task, two different ANOVAs were performed, first with Language (presence vs. absence of language delay) and then with Type of Dyslexia (surface vs. phonological dyslexia) as between-subject factors. The results of control participants will be shown in the graphs as a reference point, but will not be included in the analyses. Due to the high correlations between the three tasks (all *ps* < 0.001), no statistical corrections were employed to adjust for multiple analyses. One-tailed *p*-values are reported (as specified in the text) when clearly unidirectional hypotheses were considered. *p*-values are two-tailed unless otherwise specified. Due to the limited number of participants in each subgroup, all analyses showing significant differences were repeated with nonparametric statistics, and only the results that were confirmed by nonparametric tests are reported here.

Finally, correlations between measures of RAP variables and reading, phonemic awareness and short-term memory scores will be illustrated, both concerning the whole sample and the dyslexic group.

### **COMPARING CHILDREN WITH/WITHOUT DYSLEXIA**

### *Rapid-TOJ task*

In addition to the described between-subject factor Group and the covariate Age, two within-subject factors were considered: Stimulus Length (75 vs. 250 ms) and Interstimulus Interval (ISI) (short ISIs: 8–30 ms vs. long ISIs: 60–428 ms). The main effect of Group reached statistical significance, *F*(1, 42) = 6.13, *p* < 0.01 [1-tailed]; $\eta_p^2$ = 0.127, with fewer correct responses for the dyslexic group compared to the control group. Significant effects were found for Stimulus Length, *F*(1, 42) = 18.80, *p* < 0.001; $\eta_p^2$ = 0.309, and ISI, *F*(1, 42) = 29.77, *p* < 0.001; $\eta_p^2$ = 0.415. A close-to-significance interaction (Stimulus Length × ISI: *F*(1, 42) = 3.97, *p* = 0.053; $\eta_p^2$ = 0.086) indicated a generally greater difficulty associated with the processing of short and rapid sounds. Finally, an ISI × Group × Age interaction emerged, *F* = 6.70, *p* < 0.05, $\eta_p^2$ = 0.138. This interaction is illustrated in **Figure 1**. No further interactions were found of any variable with either Group or Age (all *ps* > 0.05).
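The effect sizes reported for the main effects above appear consistent with the usual relation between an *F* ratio and partial eta squared, $\eta_p^2 = (F \cdot df_{effect}) / (F \cdot df_{effect} + df_{error})$. The text does not state how the values were computed, so the sketch below is only a plausibility check under that assumption, with a hypothetical function name.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the Rapid-TOJ main effects.
reported = [("Group", 6.13), ("Stimulus Length", 18.80), ("ISI", 29.77)]
for label, f_value in reported:
    print(label, round(partial_eta_squared(f_value, 1, 42), 3))
```

With these inputs the function returns 0.127, 0.309 and 0.415, matching the effect sizes given in the paragraph.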

### *TOJ-Memory task*

In addition to Group and Age, two within-subject factors were considered: Stimulus Length (75 vs. 250 ms) and Sequence Length (2 vs. 4 vs. 5 elements). A significant main effect emerged for Group, *F*(1, 41) = 14.93, *p* < 0.001 [1-tailed]; $\eta_p^2$ = 0.267. Moreover, a significant interaction between Group and Sequence Length was found, *F*(2, 82) = 5.53, *p* < 0.005 [1-tailed]; $\eta_p^2$ = 0.119, due to a steeper drop in performance from the 2-tone sequences to the 5-tone sequences for children with DD as compared to control children. A further interaction with Age (Group × Age × Sequence Length, *F*(2, 82) = 2.78, *p* < 0.05 [1-tailed]; $\eta_p^2$ = 0.064) indicates different performance patterns within the dyslexic group, as shown in **Figure 2**.

### *Pattern discrimination task*

Similarly to the results obtained in the TOJ tasks, the main effects of Group, *F*(1, 45) = 28.42, *p* < 0.001 [1-tailed]; $\eta_p^2$ = 0.404, and Age, *F*(1, 45) = 5.25, *p* < 0.05, $\eta_p^2$ = 0.111, reached statistical significance, whereas the interaction between Group and Age did not.

### **COMPARING SUBGROUPS OF CHILDREN WITH DD**

No main effects of the subgroup divisions (Language and Type of Dyslexia) emerged in any task (all *ps* > 0.2). However, significant interactions were found, which are presented separately for the Rapid-TOJ task and the TOJ-Memory task. **Figure 3** shows the distribution of the main variables in the subgroups, and outliers for each group.

### *Rapid-TOJ task*

Concerning the subdivision *Presence/absence of a previous language delay*, a significant interaction emerged between Language and Stimulus Length, *F*(1, 19) = 4.85, *p* < 0.05; $\eta_p^2$ = 0.204. As **Figure 4** shows, children with DD+LI, similarly to control children, performed worse when sounds were shorter (*M* = 0.585; *SD* = 0.188) and better when sounds were longer (*M* = 0.755; *SD* = 0.208), *F*(1, 9) = 7.09, *p* < 0.05; $\eta_p^2$ = 0.441, while children with DD-noLI had similar performances in the two conditions (shorter sounds: *M* = 0.708; *SD* = 0.226; longer sounds: *M* = 0.717; *SD* = 0.213), *F*(1, 10) = 0.062, *p* > 0.05. As compared to controls, a difference approaching statistical significance was found for children with DD+LI in the 75-ms-tone condition, *F*(1, 30) = 3.919, *p* = 0.058, $\eta_p^2$ = 0.123, while a significant difference for children with DD-noLI was found only in the long-tone condition, *F*(1, 31) = 6.803, *p* < 0.05, $\eta_p^2$ = 0.190. No other significant interactions emerged. Concerning the subdivision *Type of dyslexia*, no significant interactions were found (all *ps* > 0.1).

### *TOJ-Memory task*

Concerning the subdivision *Presence/absence of a previous language delay*, a significant interaction between Language and Stimulus Length was found, *F*(1, 19) = 9.00, *p* < 0.01; $\eta_p^2$ = 0.321. As **Figure 5** shows, children with DD+LI, comparably to control children, performed worse when sounds were shorter (*M* = 0.365; *SD* = 0.218) and better when sounds were longer (*M* = 0.517; *SD* = 0.218), *F*(1, 9) = 9.58, *p* < 0.05, $\eta_p^2$ = 0.516, while children with DD-noLI had similar performances in the two conditions (shorter sounds: *M* = 0.562; *SD* = 0.206; longer sounds: *M* = 0.538; *SD* = 0.208), *F*(1, 10) = 0.51, *p* > 0.05. Compared to the control children, significant differences were found in both conditions for children with DD+LI [75-ms-tone condition, *F*(1, 30) = 15.239, *p* = 0.001, $\eta_p^2$ = 0.352; 250-ms-tone condition, *F*(1, 30) = 12.90, *p* = 0.001, $\eta_p^2$ = 0.315], while a significant difference for children with DD-noLI was found only in the long-tone condition, *F*(1, 31) = 12.15, *p* < 0.01, $\eta_p^2$ = 0.295. No other significant interactions emerged. As shown in **Figure 3**, two outliers may be identified in the DD-noLI subgroup processing short sounds. Yet, after excluding these subjects from the main ANOVA, the interaction Language × Stimulus Length remains significant, *F*(1, 17) = 8.45, *p* = 0.01; $\eta_p^2$ = 0.332.

Concerning the subdivision *Type of dyslexia*, the only significant interaction concerned Type of Dyslexia and Stimulus Length, *F*(1, 20) = 4.49, *p* < 0.05; $\eta_p^2$ = 0.184. In this case, children with surface DD (similarly to children with DD+LI and to controls) performed worse with shorter (*M* = 0.449; *SD* = 0.193) than with longer tones [*M* = 0.572; *SD* = 0.210, *F*(1, 11) = 7.041, *p* < 0.05, $\eta_p^2$ = 0.390], while children with phonological DD performed similarly in the two conditions (shorter sounds: *M* = 0.470; *SD* = 0.277; longer sounds: *M* = 0.453; *SD* = 0.229), *F*(1, 9) = 0.127, *p* > 0.05, as **Figure 6** shows. Although *post-hoc* analyses do not show any significant differences between subgroups, both subgroups differ significantly from controls in both conditions (all *ps* < 0.05, $\eta_p^2$ ranging between 0.189 and 0.401). Also in this case, two outliers may be identified in the subgroup of children with surface DD in the processing of long sounds (see **Figure 3**). Again though, when repeating the main ANOVA without these participants, the interaction Type of Dyslexia × Stimulus Length remains significant, *F*(1, 18) = 5.69, *p* < 0.05; $\eta_p^2$ = 0.240.

### **CORRELATIONS BETWEEN MEASURES OF RAP AND READING/READING-RELATED VARIABLES**

Correlations were calculated separately for the two groups of dyslexic and normally reading children, in order to avoid spurious effects of reading ability (see also Rosen, 2003). To reduce the number of correlations to be computed, compound scores were considered for the RAP variables, including: average of all Rapid-TOJs; average of all Memory-TOJs; average of all TOJs (irrespective of sequence length); and average of all TOJs subdivided by tone length (75 and 250 ms). Considering the results of the previous analyses, two new variables were purposely computed, expressing the difference between accuracy scores with 250 and with 75 ms tones (i.e., the advantage for processing long tones) in all the TOJ tasks (Long-short tones) and the difference between long and short ISIs (Long-short ISIs)<sup>2</sup>. Pattern discrimination scores were also included in the analysis. A first interesting result is the absence of correlations in the control group between RAP variables and age, whereas strong correlations emerged in the group with dyslexia. Since no correlation was found in the DD group between RAP variables and IQ, nor did any correlations emerge between age or IQ and reading scores expressed as z-scores (all *rs* < 0.3), Pearson's bivariate correlations were computed including z-scores. The "Long-short tones" difference variable showed the strongest correlations with reading variables (but no correlations with age and IQ), namely with Text and Nonword reading accuracy (*r* = 0.402 and 0.399 respectively, *p* < 0.05) and with overall Nonword reading ability (average of speed and accuracy z-scores) (*r* = 0.554, *p* < 0.005). This variable also showed a correlation with Phoneme deletion (raw score, *r* = −0.438, *p* < 0.05), which was confirmed also when partialling out the effect of Age (*r* = −0.413, *p* < 0.05). Phoneme deletion, in turn, correlated with Pattern discrimination scores, albeit at a very moderate level (*r* = −0.380, *p* = 0.07). Significant correlations emerged also between phonemic blending and word reading speed and accuracy (*r* = 0.451, *p* < 0.05 and *r* = 0.507, *p* < 0.01, respectively) and between phoneme deletion and nonword reading speed and accuracy (*r* = 0.445, *p* < 0.05 and *r* = 0.433, *p* < 0.01, respectively). Correlations with the ISI-related variable (Long-short ISIs) did not reach significance, but it is noteworthy that almost all correlations with reading variables are negative, i.e., contrary to what happens with tone length, high sensitivity to differences in ISI predicts lower reading performance. A moderate correlation between the two differential variables was found only for control children (*r* = 0.446, *p* = 0.048).
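The two differential variables can be sketched as follows. The "Long-short tones" score is a plain difference between accuracies; for "Long-short ISIs", footnote 2 is read here as a within-subject least-squares regression of performance on ln(ISI), with the prediction at the minimum ISI subtracted from the prediction at the maximum ISI. This reading, the function names and the example values are assumptions for illustration, not the authors' code or data.

```python
from math import log


def long_short_tones(acc_250ms: float, acc_75ms: float) -> float:
    """Advantage for long tones: accuracy with 250 ms minus 75 ms tones."""
    return acc_250ms - acc_75ms


def long_short_isis(isis_ms, accuracies) -> float:
    """Fit accuracy = intercept + slope * ln(ISI) for one participant and
    return predicted accuracy at the longest ISI minus the shortest ISI."""
    x = [log(isi) for isi in isis_ms]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(accuracies) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, accuracies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(isi: float) -> float:
        return intercept + slope * log(isi)

    return predict(max(isis_ms)) - predict(min(isis_ms))


# Hypothetical participant: proportion correct at each ISI used in the task.
isis = [8, 15, 30, 60, 150, 305, 428]
acc = [0.50, 0.55, 0.60, 0.70, 0.80, 0.85, 0.90]
print(long_short_tones(0.75, 0.60))
print(long_short_isis(isis, acc))
```

Using the fitted line rather than the raw scores at the two extreme ISIs makes the difference score depend on all ISI conditions, which is presumably why the regression step was used.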

Multiple linear regressions were additionally performed to further explore the relationship between RAP, reading and reading-related measures. A regression analysis (backward method) based on the results of the correlation analysis made it possible to predict Nonword reading ability (average of speed and accuracy z-scores) through Phoneme deletion, total Pattern discrimination and the "Long-short tones" difference (entered together with Age and Phonemic blending, which showed no effect on the dependent variable). The model was highly significant, *F*(3, 22) = 5.583, *p* = 0.006, and explained 42% of the variance (30% was explained by the "Long-short tones" difference, 12% by Phoneme deletion, and 1% by Pattern discrimination scores). The best predictive model (*F* = 3.506, *p* = 0.047, $R^2$ = 0.234) for Word reading scores included phonemic blending and the difference between short and long ISIs (both with negative coefficients), which explained 14.5 and 8.9% of the variance, respectively. A general reading score expressing the average of word, nonword and text reading, including speed and accuracy, was best predicted (*F* = 5.346, *p* = 0.012, $R^2$ = 0.317) by both differential scores concerning ISIs (15.8%) and stimulus length (15.9%). Phonemic blending was nonsignificant and added less than 4% to the variance. On the other hand, Phoneme deletion scores could be predicted by the "Long-short tones" difference (accounting for 19% of variance) and Pattern discrimination scores (accounting for 13% of variance), whereas Age had no effect on the model, *F*(2, 23) = 5.195, *p* = 0.014, $R^2$ = 0.31. No predictive model was found for Phonemic blending scores.

<sup>2</sup>This variable was computed within each subject by regressing ln(ISI) onto performance and using predicted performance for the maximum and the minimum ISI. Then, the score related to the minimum ISI was subtracted from the score related to the maximum ISI.

**FIGURE 4 | Mean proportion of correct answers in the Rapid-TOJ task for the subgroups divided according to Language.**

### **DISCUSSION**

The aim of the present study was to investigate Rapid Auditory Processing (RAP) abilities in Italian children (reading a regular orthography) with DD, subdivided according to age, presence/absence of a previous language delay, and subtype of dyslexia, following the hypothesis that differences in performance among subgroups could explain part of the heterogeneity of results described in the literature concerning RAP.

First of all, the presence of a general auditory processing deficit concerning non-verbal stimuli in Italian children with DD was confirmed, extending the finding obtained in a smaller sample of Italian children with DD (Cantiani et al., 2010), and replicating recent findings concerning other languages with consistent orthographies (Georgiou et al., 2010; Landerl and Willburger, 2010). Children with DD performed worse than their matched controls both in reproducing the order of pairs and sequences of tones, and in judging the equality of 4-tone rhythms. The task that seems to discriminate best between control children and children with DD is the Pattern Discrimination task. In order to exclude the possibility that the reading difficulties themselves could be the cause of a suboptimal development of these skills (see Rosen, 2003; Ziegler et al., 2009), a further comparison with a Reading Level (based on Text reading speed) matched group was performed, confirming the presence of a significant difference between children with DD (*n* = 9) and (younger) control children (*n* = 9), *F*(1, 17) = 5.40, *p* < 0.05 [2-tailed]; *η*<sub>p</sub><sup>2</sup> = 0.268 (with IQ as covariate).

Specifically concerning the TOJ-Memory task, all participants showed a drop in performance with sequences of 5 sounds compared to sequences of 4 sounds, but children with DD performed worse with the longest sequences, compared to controls. A similar result was obtained by Heiervang et al. (2002), and might point to the role of auditory short-term memory. The influence of short-term memory on reading acquisition is largely supported (Ackerman et al., 1990; Kibby et al., 2004). Moreover, some authors consider short-term memory, and in particular working memory, as a crucial factor influencing performance in RAP tasks (Banai and Ahissar, 2004), also when only two sounds are presented.

*Age* plays a relevant role in the Rapid-TOJ task (see **Figure 1**). Its effects can be seen in control children only for the most difficult condition, i.e., with the shortest ISIs. By contrast, in children with DD performance improves with age in both conditions, but particularly so in the easier one (longer ISIs) where children with DD finally reach the level of control children. Similarly, in the TOJ-Memory task (see **Figure 2**) control children show improvements with age in the most difficult conditions (with 4 and 5-tones), but not in the easiest condition (2-tones), probably due to a ceiling effect. Conversely, in children with DD age affects performance in the 2 and 4-tone conditions, but not in the most difficult one (5 tones), which shows a relative floor effect. These results, similar to those found by Hautus et al. (2003) and Tallal (2000), may suggest a relative compensation of RAP deficits in children with DD, with difficulties appearing only when the complexity of the task increases, either through short-term memory load or through faster presentation rates. This may reflect an anomalous or slowed development of short-term memory functions in children with DD. For instance, Nicolson et al. (1992) found in 15-year-old children with DD only a lack of fluency in articulation and slight deficits in memory span, while deficits in phonological processing were no longer detectable.

The results obtained by subdividing children with DD on the basis of the *presence/absence of a language delay* are partly in line with previous findings such as those reported by Tallal and Stark (1982), Heath et al. (1999), and Joanisse et al. (2000), who found auditory processing deficits only in reading-impaired children with a concomitant oral language delay. Indeed, in our sample children with DD and a previous language delay did obtain the lowest scores in RAP measures. However, they showed a pattern of performance similar to that of normally reading children, yielding lower accuracy scores with shorter stimuli. On the other hand, children with DD-only had a more mildly impaired, but more anomalous performance pattern, not showing the advantage for longer stimuli found in the other subgroups. A possible explanation calls into play the effect of language impairment on the use of cognitive strategies during RAP tasks. Indeed, children with DD and a previous language impairment are often characterized by problems with lexical access (Bishop et al., 2009; Chilosi et al., 2009). Following Bretherton and Holmes (2003), performance on RAP tasks may be facilitated by the use of verbal labels to characterize and more easily distinguish or recognize the different sequences, but children with impaired lexical access may have difficulties in establishing and retrieving verbal labels for the tone sequences quickly and accurately. By contrast, dyslexic-only children seem to be characterized by a phonemic awareness deficit: indeed, their performance in phoneme deletion is almost significantly worse than that of children with a previous language impairment (see **Table 2**). This hypothesis is in line with the findings in English-speaking children with DD with and without oral language impairments, showing that the latter are characterized by more severe phonological deficits and the former by impairment in broader language abilities (Bishop and Snowling, 2004).
Nonetheless, the greater impairments in phonemic awareness and verbal memory found in DD-only children do not produce greater impairments in RAP tasks (the hypothesis that RAP deficits are simply a consequence of phonemic awareness deficits also contrasts with data from longitudinal studies such as Benasich and Tallal, 2002; Leppänen et al., 2010). This suggests that other deficits, not related to phonemic awareness and to verbal memory, and also not expressing an effect of worse reading skills (DD+LI children have better reading scores than DD-noLI), must constitute the basis for the significantly lower performance on RAP tasks. Such deficits might have to do with lexical skills, whose impairment is characteristic of SLI children, with and without dyslexia, as shown in many studies (e.g., Chilosi et al., 2009; Nation, 2014).

The present results from children subdivided according to presence/absence of a previous language delay need further explanation. First of all, the language-impaired subgroup does not show lower performances in phonemic awareness and reading scores as usually described in the literature concerning opaque languages such as English (Catts et al., 2005; Ramus et al., 2013). Instead, these children show similar or even better performances with respect to the subgroup without previous language delay. This finding is consistent with other studies on Italian children (Brizzolara et al., 2006; Scuccimarra et al., 2008; Chilosi et al., 2009) where language-impaired children with DD showed no clear disadvantage in reading measures compared to dyslexic-only children, and it may suggest a reduced impact of linguistic deficits on learning regular orthographies. As to the relationship with RAP, it may be hypothesized that a milder but more pervasive deficit such as that observed in dyslexic-only children has a greater impact on reading (possibly, as shown by correlation scores, through its effects on phonemic awareness) than a more severe but more "normally modulated" deficit such as that observed in children with DD and a previous language delay. The inclusion of children for whom a language disorder was present in the past but was then resolved or compensated may also have led to unexpected results. Indeed, the presence of early delays in language development that are compensated or resolved before school age is commonly reported for many children with dyslexia (e.g., Scarborough and Dobrich, 1990; Stothard et al., 1998), and it may well be associated with less severe reading impairment as compared to children with persistent (and probably more pervasive) language disorders.
Additionally, published studies on RAP deficits in children with language impairment mostly fail to distinguish children with/without a concomitant reading disorder (e.g., Tallal and Piercy, 1973a,b), and the same is true for studies investigating dyslexic children (e.g., Marshall et al., 2001; Bretherton and Holmes, 2003), which do not extend the analysis to (previous and/or concomitant) language abilities, so that previous characterizations of the various subgroups may have been confounded.

Results from the subdivision according to *type of dyslexia* partially support Tallal's findings of a correlation between tone processing and non-word reading (Tallal, 1980). However, we did not find that RAP impairment was specific to children with phonological DD, as was reported by Cestnick (2001). In our sample, the main difference between children with phonological and surface DD relates to sound length (processing longer vs. shorter sounds). In fact, only children with surface DD (similarly to children with DD and previous language delay, and to a certain extent to controls) improved their performance with longer stimuli, showing a more severe difficulty with short sounds. By contrast, children with phonological DD performed similarly in the two conditions. The lack of advantage in recognizing words, which was the criterion for defining these children as "surface" dyslexic, may thus be related to less efficient lexical access, similar to what was suggested for dyslexic children with previous language impairment. In both cases, generally reduced performance on RAP tasks with an otherwise "normal" performance pattern may result from impaired use of cognitive, lexical strategies to facilitate the task. What remains to be explained is the anomalous performance pattern in so-called "phonological dyslexic" children, who have more difficulties in reading nonwords.

Indeed, the interaction found in the present sample between type of phonemic awareness task and type of dyslexia (see the paragraph on the characterization of the subgroups) suggests that the equation between impaired nonword reading and low phonemic awareness skills is too simplistic. The direct link between RAP scores and nonword reading (correlations and regression models), and the indirect correlations emerging between nonword reading and phoneme deletion, and between phoneme deletion and pattern discrimination, point to a specific bridge between RAP (the advantage in processing long over short tones and rhythmic pattern analysis), the ability to analyze and manipulate (but not to blend) phonemic strings, and nonword reading ability, the latter ability being relatively preserved in surface dyslexic children. In spite of the label of Phonological dyslexia, thus, poor nonword reading does not imply lower performance on phonemic awareness tasks [overall performance on phonemic awareness tasks in this group does not differ at all from that of surface dyslexic children, *p* > 0.99; see Ziegler et al. (2008) and Sprenger-Charolles et al. (2000) for similar findings] but rather a *different* type of impairment on phonemic tasks. The role of phoneme deletion as opposed to phonemic blending and its relationship with nonword reading is an interesting one: the ability to analyze and manipulate single phonemes producing nonwords from words (the result of the deletion process) is more relevant to the ability to handle non-lexical phonemic sequences such as nonwords. Phoneme blending, by contrast, appears more strictly related to the ability to recognize strings of sounds as meaningful units, namely as lexical entries such as words. The importance of phoneme deletion in explaining nonword reading had already been highlighted (e.g., Pasquini et al., 2007).
Nonetheless, one could have expected phonemic blending to predict nonword reading as well, being one of its constituent processes: the absence of such an effect calls into play the twofold nature of nonword reading as a phonological and visual task. Indeed, strong relationships have been demonstrated between visual-spatial attentional deficits and nonword reading in Italian children with DD (Facoetti et al., 2006, 2010; Franceschini et al., 2012), suggesting that reading ordered strings of letters requires not only phonological ability, but also visual-spatial attentional skills. Furthermore, attentional deficits in the visual and auditory modality seem to follow similar pathways (Facoetti et al., 2003, 2010). It may thus be hypothesized that the children showing more problems in the auditory modality correspond to DD+LI (language-impaired, here showing the lowest scores on RAP tasks), while children with phonological dyslexia (who show both the lowest scores and the most anomalous pattern on RAP tasks) may suffer from problems involving both the auditory and the visual modality (thus possibly more pervasive and with a more severe expression).

In conclusion, the present findings are best interpreted within a multifactor model of dyslexia (see Pennington, 2006; Boets et al., 2007), which takes into account the effects of variables related to developmental trends and specific neuropsychological profiles. Further, they may suggest that other variables, especially those related to verbal memory, lexical processing and attentional functions, play a role in modulating the relationship between auditory temporal processing of nonverbal stimuli, phonemic skills and reading abilities. Since lexical access and attention have not been directly measured in the present study, their involvement rests on more speculative grounds and deserves further investigation. The lack of advantage for processing longer stimuli seems to be a crucial issue, possibly indicating a more pervasive deficit in RAP (whatever the exact nature of this process and the mechanisms implied), which can produce detrimental effects on reading, especially when interacting with concomitant impairments at the visual-attentional level. Last but not least, the correlations between RAP variables, phonemic awareness skills and reading support the idea that processing auditory stimuli is not simply an associated problem, but contributes to determining both the quality and quantity of the reading deficit, in strict interplay with other variables. These findings thus do not support simplified versions of the RAP deficit, which focus only on deficits in processing short and rapid sounds, and rather depict a far more complex model of the interrelations between the different variables. Crucially, the RAP variables that do not express absolute performance levels, but rather differential scores describing the level of sensitivity to changes in tone length or ISI, are the best predictors of reading abilities in their various aspects, and better predictors than phonemic awareness and verbal memory skills.
Furthermore, such differential variables contribute to the prediction of specific forms of phonemic awareness itself. Interestingly, while higher sensitivity to changes in tone length is associated with better reading performance, higher sensitivity to ISI changes is associated with worse reading performance. A closer inspection of correlation patterns in DD children suggests, in fact, that increased ISI-related differences depend on better performance with the longest ISIs (*r* = 0.630, *p* = 0.001, i.e., they express greater ability to take advantage of increases in ISI) whereas increased differences with respect to tone length depend on lower performances with short tones (*r* = −0.371, *p* = 0.062, i.e., they express more severe impairments).

The limited number of participants in each of the various subgroups calls for caution in generalizing the results. Further limitations of the study are the absence of concomitant language and attention measures, and the use of previous clinical reports (in a few cases, parents' reports) to identify children with comorbid language impairments. Nonetheless, the relatively homogeneous profiles within each group and the replication of some of the results in reading-level matched comparisons support their validity, and may offer stimulating hints as to the range and type of variables that need to be taken into account when investigating sensory processing in developmental disorders and their relationship with reading skills. The crucial role played by RAP variables (related to length of the stimuli, ISI, and to sensitivity to their modulations) in predicting specific aspects of reading performance (with differential effects on word and nonword reading) suggests that addressing such skills in intervention programs and choosing specific RAP targets according to the specific reading patterns may be an effective and innovative rehabilitation strategy. Furthermore, the present results suggest that linguistic variables (possibly at the lexical level) other than memory and phonemic awareness influence RAP and reading performance in children with comorbid DD and (even if compensated) LI. Further research is needed to identify and characterize such variables, which could shed more light on the complex relationships between low- and high-level processing of language.

### **REFERENCES**


Hood, M., and Conlon, E. (2004). Visual and auditory temporal processing and early reading development. *Dyslexia* 10, 234–252. doi: 10.1002/dys.273


towards a multidimensional model. *Brain* 136, 630–645. doi: 10.1093/brain/aws356


for evaluation of learning disorders. *Child Dev.* 72, 37–49. doi: 10.1111/1467-8624.00264


Ziegler, J. C., Pech-Georgel, C., George, F., and Lorenzi, C. (2009). Speech-perception-in-noise deficits in dyslexia. *Dev. Sci.* 12, 732–745. doi: 10.1111/j.1467-7687.2009.00817.x

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 December 2013; accepted: 28 April 2014; published online: 19 May 2014. Citation: Lorusso ML, Cantiani C and Molteni M (2014) Age, dyslexia subtype and comorbidity modulate rapid auditory processing in developmental dyslexia. Front. Hum. Neurosci. 8:313. doi: 10.3389/fnhum.2014.00313*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Lorusso, Cantiani and Molteni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The effect of morphology on spelling and reading accuracy: a study on Italian children

## **Paola Angelelli <sup>1</sup>\*, Chiara Valeria Marinelli <sup>2</sup> and Cristina Burani <sup>3,4</sup>**

<sup>1</sup> Department of History, Society and Human Studies, University of Salento, Lecce, Italy

<sup>2</sup> Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Santa Lucia, Rome, Italy

<sup>3</sup> Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche, Rome, Italy

<sup>4</sup> Department of Life Sciences, University of Trieste, Trieste, Italy

### **Edited by:**

Peter F. De Jong, University of Amsterdam, Netherlands

### **Reviewed by:**

Séverine Casalis, Université de Lille Nord de France, France

Judith Rispens, University of Amsterdam, Netherlands

#### **\*Correspondence:**

Paola Angelelli, Department of History, Society and Human Studies, University of Salento, 45/47 Stampacchia Street, 73100 Lecce, Italy e-mail: paola.angelelli@unisalento.it

In opaque orthographies, knowledge of morphological information helps in achieving reading and spelling accuracy. In transparent orthographies with regular print-to-sound correspondences, such as Italian, the mappings of orthography onto phonology and phonology onto orthography are in principle sufficient to read and spell most words. The present study aimed to investigate the role of morphology in the reading and spelling accuracy of Italian children as a function of school experience to determine whether morphological facilitation was present in children learning a transparent orthography. The reading and spelling performances of 15 third-grade and 15 fifth-grade typically developing children were analyzed. Children read aloud and spelled both low-frequency words and pseudowords. Low-frequency words were manipulated for the presence of morphological structure (morphemic words vs. non-derived words). Morphemic words could also vary in the frequency (high vs. low) of roots and suffixes. Pseudowords either were made up of a real root and a real derivational suffix in a combination that does not exist in the Italian language or had no morphological constituents. Results showed that, in Italian, morphological information is a useful resource for both reading and spelling. Typically developing children benefitted from the presence of morphological structure when they read and spelled pseudowords; however, in processing low-frequency words, morphology facilitated reading but not spelling. These findings are discussed in terms of morpho-lexical access and successful cooperation between lexical and sublexical processes in reading and spelling.

**Keywords: morphology, spelling, orthography, typically developing readers/spellers, transparent orthographies**

### **INTRODUCTION**

Analysis of the corpora and databases of several languages revealed that about 60% of less familiar words are either derived by affixation or compounds (see, e.g., Nagy and Anderson, 1984; Thornton et al., 1997, for American English and Italian, respectively). Thus, a large proportion of the unfamiliar words read and spelled by children in primary school are likely to be morphologically complex (Nagy and Anderson, 1984). In the last few decades it has been frequently shown that familiarity with morphemic patterns helps children to reasonably guess the meanings of unfamiliar words and is a powerful tool in vocabulary acquisition (see, e.g., Bertram et al., 2000). Morphology also provides reading strategies for correctly decoding and spelling unfamiliar words (Verhoeven and Perfetti, 2003, 2011). Knowledge of word morphology develops early in children, confirming that morphological structure is one of the main organizing principles of the mental lexicon. Morphological awareness improves with age and in each subsequent grade is more predictive of both reading and spelling achievement in children exposed to different orthographies (e.g., Mann and Singson, 2003; Berninger et al., 2010; Casalis et al., 2011). Morphological awareness usually predicts unique variance in addition to phonological awareness and has different degrees of association with word recognition (and spelling) in different scripts (McBride-Chang et al., 2003, 2005). Knowledge of derived words, in particular, may contribute to both reading and spelling achievement in older children, such as 6th graders (see, e.g., for Dutch, Rispens et al., 2008). Overall, access to morphemic constituents of words fosters reading and spelling performance in the course of literacy acquisition in several languages that vary for morphological richness and orthographic transparency (Verhoeven and Perfetti, 2011).

The effect of morphology on reading in Italian has received a good deal of attention in the last decade. A series of studies demonstrated that morpheme-based reading is available and efficient in Italian developing readers. In Italian, morphology has been found to have a main effect on reading fluency. Both typically developing Italian children ranging from 2nd to 7th grade and children with dyslexia read aloud pseudowords made up of a root and a derivational suffix (e.g., *donn-ista*, "woman-ist") faster than pseudowords that did not include morphemes (e.g., *dennosto*, Burani et al., 2002, 2008; Traficante et al., 2011). Analogously, words with a morphological structure (e.g., *cass-iere*, "cashier") were read faster than simple words (e.g., *cammello*, "camel") of the same length and frequency. Morphological facilitation of word reading speed was found in the youngest readers (2nd graders) and in children with dyslexia, but it was limited to low-frequency words in older skilled children (Burani et al., 2008; Marcolini et al., 2011; see also, for a review, Burani, 2010). The facilitation induced by morphology on the reading performance of less skilled readers may reflect access to lexical reading units (morphemes) that are shorter than the whole word when this reading unit is too long and complex for the reader. Morphemes (specifically roots and suffixes) can be efficient reading units because they have an intermediate grain size between single letters—which entail extremely slow and analytical sublexical processing—and the word—which for beginning readers and children with dyslexia is usually too large a unit to be processed as a whole. By contrast, for skilled readers who master lexical reading of familiar word units, recourse to morphemic units is beneficial only for low frequency words. In this case, morphemes (roots and affixes) usually have a higher frequency than the word in which they occur. 
Therefore, access to morphemes may facilitate lexical reading for a low-frequency word that otherwise would probably not be represented as a whole in the mental lexicon.

Italian has a transparent orthography and knowledge of morphology is not necessary for assigning the correct pronunciation to print or for correct spelling. This is different from opaque orthographies, such as English, Danish and French, in which word spelling is to some degree morphologically governed and knowledge of morphemes may help to assign the correct pronunciation or to make the appropriate choice between graphemic alternatives for spelling a word (see, e.g., Seymour, 1997; Verhoeven and Perfetti, 2003; Pacton and Deacon, 2008 and see below for examples). As a consequence, in Italian the impact of morphological structure on reading accuracy is weaker than that on reading speed. However, the presence of morphemes in a stimulus positively affected reading accuracy in the case of novel (pseudo-) words (Burani et al., 2002, 2008; Traficante et al., 2011), but not in the case of words, irrespective of word frequency or reading skill (Burani et al., 2008; Marcolini et al., 2011).

As to word spelling, the effect of morphological structure in opaque orthographies has been studied in relation to the existence of two different spelling procedures, one based on phonology-to-orthography conversion rules (Patterson, 1986; Tainturier and Rapp, 2000) and the other relying on access to word-specific memories in the orthographic lexicon (e.g., Barry, 1994). Within this framework, morphosyntax has been considered a third source of information for spelling. In opaque orthographies, morphological information may contribute to spelling accuracy in several ways. For example, in English the use of morphological information allows: (1) choosing between several possible spellings of a given sound (e.g., the vowel /e/ in *health* can be spelled correctly if one knows the spelling of its root, *heal*); (2) spelling certain words for which phonology is misleading (the past tense of regular verbs ends in –*ed* although their pronunciation can be /d/, /t/ and /Id/); and (3) spelling cases that are morphologically distinct although phonologically identical, as in the case of the apostrophe (*boys*, *boy's* and *boys'* sound the same but are spelled differently based on their morphological structure). Especially in the latter case, morphosyntactic information is necessary to spell correctly because the use of markers cannot be retrieved from memory but depends on the syntactic context in which they occur. Evidence of the useful role of morphology in spelling mainly regards English and French. Many studies highlighted that children learning to spell in these two languages use various sources of morphosyntactic information to spell correctly (for a review see Pacton and Deacon, 2008). However, there is no agreement about the timing (early vs. late) of children's use of morphological information.

In a longitudinal study, Nunes et al. (1997) examined the appreciation of morphological conventions such as those required for the correct spelling of regular past tense verbs in 6- to 9-year-old English children. The authors used a dictation task with regular past tense verbs (ending in –*ed* as in *called* and *dressed*), irregular past tense verbs (endings spelled phonetically as in *found* and *felt*), and non-verbs ending with /d/ and /t/ (such as *bird* and *soft*). Four developmental stages were identified: in the first stage children wrote /d/ and /t/ endings phonetically, irrespective of stimulus type; then, they generalized the –*ed* ending to grammatically inappropriate words, such as adjectives (e.g., *sofed* for *soft*). Later these generalizations applied to grammatically appropriate words (verbs) but incorrectly to irregular verbs (e.g., *keeped* for *kept*), and finally they were properly confined to regular past tense verbs. The authors concluded that children grasp the morphological principles of spelling only at late stages of literacy.

Other studies (e.g., Treiman et al., 1994; Kemp, 2006) found that very young children (5–8 years old) were able to use the principle of root consistency, although not to its full extent. Kemp (2006) examined whether young children used their knowledge of the spelling of base words to spell inflected and derived forms. The author examined how children spelled the /z/ sound in one- and two-morpheme words. In the case of two-morpheme words, the different alternatives to represent the word-medial /z/ sound (e.g., *S, Z, ZZ*) can be determined by knowledge of the base form spelling. The author found that 5- to 9-year-olds were more accurate in representing the medial /z/ sound of words derived from base forms (e.g., *noisy* from *noise*) with respect to one-morpheme control words (e.g., *busy*). These findings support the view that English-speaking children identify and represent links of meaning between words relatively early and that morphological information is a resource used also by relatively young learners. Similar conclusions come from studies in French-speaking children. Sénéchal (2000) found that children even in the first year of formal schooling spelled words that have morphologically related words better than words that do not. More recently, in a study in 8-year-old children Pacton et al. (2013) found that a facilitation due to morphological relatedness was present also when they learned to spell new words.

In transparent orthographies with consistent phoneme-to-grapheme correspondences, such as Spanish, Finnish and Italian, the mapping of phonology onto orthography is in principle sufficient to spell most words correctly. However, there is evidence that morphological knowledge may have a role in spelling also in transparent orthographies. One piece of evidence comes from the study of Defior et al. (2008) in Spanish first- to third-grade children, in which the recourse to morphological information in spelling was investigated by capitalizing on one of the few conditions of non-transparency in oral-to-written mapping. In Andalusia, the region where the experiment was conducted, the final /*s*/ of words is not pronounced. Since the final /*s*/ marks plurals and the second person singular of verbs, its presence in the children's spellings was considered an index of their adequate use of morphosyntactic knowledge. The study included two morphological conditions (high- and low-frequency plural nouns and second person verbs), and a lexical control condition (high- and low-frequency singular nouns ending in /*s*/, e.g., *martes* "Tuesday"). Results showed that although Spanish spelling relies mainly on phonology, morphological information is also a spelling resource: with low-frequency words, children's spelling accuracy on verbs and plural nouns (items with morphologically motivated /*s*/ endings) was greater than that on control words with a final /*s*/ (not morphologically motivated). However, the results also showed that the children did not use morphology systematically: in high-frequency words, they used fewer /*s*/ endings in plural nouns than in uninflected control words.

To our knowledge, no study on Italian has investigated the role of morphology in spelling. However, a recent study in first- to eighth-grade Italian typically developing readers (Notarnicola et al., 2012) reported several findings of interest in this respect: (*i*) in agreement with the hypothesis that reliance on the different procedures depends on the degree of regularity of an orthography (for reviews, see Sprenger-Charolles, 2003; Caravolas, 2004), in Italian main reliance on the sublexical phoneme-to-grapheme spelling procedure was found in all grades; (*ii*) data also showed very early reliance on the lexical procedure, with a lexicality effect (regular words spelled better than pseudowords) and an early positive influence of a lexical-semantic variable, such as word age-of-acquisition, on ambiguous word spelling found in first graders; and (*iii*) data generally supported the view of an interaction between lexical and sublexical spelling processes in Italian children. Results showed a pattern of correlations that was generally consistent with the view that spelling regular words benefits from the cooperation of both spelling procedures, with sublexical processing assisting accuracy in spelling lexical items.

In the cited study on Italian, no morphologically complex words were used. Consequently, the impact of morphology on spelling could not be estimated. However, it can be conjectured that, similarly to what happens in reading long unfamiliar stimuli, access to morphemes might help Italian children recognize lexical chunks of information and use them for (morpho-) lexical spelling, thus bypassing the use of single phoneme-to-grapheme correspondences. Thus, the spelling of long and complex words might benefit from the possibility of segmenting the phoneme-grapheme array into units, such as morphemes, that are meaningful and more coherent than single phonemes or syllables. Some support for the view that morphemes may provide an efficient principle for stimulus segmentation comes from an interesting study by Lehtonen and Bryant (2005) on Finnish, a richly inflected language with highly transparent orthography. The authors used two-morpheme words in which target clusters of letters (the sequences LL and SS) appeared in different morphemes of the words, either in the root (unbound morpheme) or in the inflection for case. In Finnish, case inflections are a more prominent part of morphology than derivation, because they occur in nouns, adjectives, pronouns and numerals. The authors tested children at two different times during the first year of school and found that by the end of the year they began to spell target clusters better in case inflections than in word roots, which suggested emerging sensitivity to the morphological structure of words in spelling. Similar results were found for pseudowords: letter clusters occurring in endings corresponding to case inflections were spelled with greater accuracy than those occurring in pseudo-roots, suggesting that case-like endings prompted morphological parsing during spelling.
According to the authors, the facilitation arises because the children's mental lexicon is organized in morphemes and case inflections are solidly acquired and represented in the mental lexicon due to the high frequency with which they occur. This in turn helps the subsequent parsing of words into their constituent morphemes, favoring the oral-to-written transcription process.

In the present study we investigated the effects of morphology on both reading and spelling accuracy of pseudowords and words in typically developing children in different grades, i.e., in third and fifth grade. For both reading and spelling, we expected that pseudowords made up of familiar morphemes (roots and derivational suffixes) would be read and spelled better than matched pseudowords that did not include any morphemic constituent. The expected findings would confirm those already obtained for reading (see Burani et al., 2002, 2008; Traficante et al., 2011) and would extend them to spelling. As to words, in preceding studies on Italian no evidence was found of an effect of morphological structure on reading accuracy. However, preceding studies on word reading either involved words of medium frequency (Burani et al., 2008), or, when low-frequency words were investigated, readers were in 6th–7th grade. Thus, we still do not know whether the presence of familiar morphemes in a low-frequency word favors reading (and spelling) accuracy in children as young as 3rd and 5th graders who might not yet possess a lexical representation for low-frequency words. In the present study, children's reading and spelling performance on low-frequency morphologically complex words was compared to their performance on words with no derivational structure. In order to better qualify the effect of morphology, two types of morphologically complex words were investigated: words made up of high-frequency morphemes (root and suffix) and words with low-frequency morphemes. Some studies have found that English-speaking children in 3rd to 6th grades read aloud derived words with a high frequency base more accurately than derived words with a low frequency base matched for surface frequency (Mann and Singson, 2003; Carlisle and Stone, 2005; Deacon et al., 2011).
The new experimental contrasts adopted here, in which both types of morphologically complex words (i.e., including either high-frequency or low-frequency morphemes) were compared to words that did not include morphemes, allowed us to investigate an issue that has never been studied in Italian children. Higher accuracy was expected for morphological words including high-frequency morphemes as compared to words including low-frequency roots and suffixes and words not decomposable into morphemes. The advantage for words including high-frequency morphemes was expected to hold for both 3rd and 5th graders.

Administration of the same pseudowords and words for both reading and spelling allowed us to directly compare the children's performance on both tasks. Overall, we expected that morphological knowledge would enhance not only reading but also spelling performance in Italian by facilitating the parsing process of the stimulus by retrieving lexical units smaller than the whole stimulus. Similarly to what has been observed for morpheme-based reading, morphological facilitation in spelling was expected to be evident for pseudowords, irrespective of the children's reading ability. For low-frequency words, we expected that only those made up of high-frequency morphemes would result in a morphological benefit in spelling, with no substantial differences between children in different grades.

### **MATERIALS AND METHOD**

### **PARTICIPANTS**

Participants were selected during screening activities, as part of a research agreement between the University of Bari and a local public primary school. The study was conducted according to the principles of the Helsinki Declaration and was approved by the school authority (Teaching body). Parents were informed of the screening activities and had to approve their child's participation. All data concerning individual performances were analyzed strictly for research purposes.

Participants were typically developing readers and spellers selected according to the following criteria: *(i)* normal reading speed and accuracy on a standard reading test (MT reading test, Cornoldi and Colpo, 1998; see paragraph Reading Assessment), *(ii)* normal spelling performance on a standard spelling test (DDO Test for the Diagnosis of Developmental Dysgraphia, Angelelli et al., 2008; see paragraph Spelling Assessment); and *(iii)* normal performance on a nonverbal general intelligence test (Raven's Colored Progressive Matrices, CPM; i.e., above the 10th percentile for age range according to normative Italian data by Pruneti et al., 1996). Participants included 15 children in 3rd grade (7F, 8M; mean age = 8.65 years, SD = 0.27) and 15 children in 5th grade (7F, 8M; mean age = 10.34 years, SD = 0.38), matched one-to-one for gender and performance on Raven's CPM intelligence test (*z* scores; *F*(1,29) = 0.00, ns).

Data pertaining to the 3rd and 5th grade children's performance on Raven's CPM, the MT reading tasks (speed and errors) and the spelling test are summarized in **Table 1**. As reported in the Table, 5th grade children, compared to 3rd grade children, performed better in terms of reading speed and reduced error rates in reading as well as in all spelling subsets. Both groups of children performed close to normative data (*z* scores about zero) for Raven's CPM (3rd grade children: *z* = −0.38; 5th grade children: *z* = −0.40), reading speed (3rd grade children: *z* = −0.22; 5th grade children: *z* = −0.43), reading accuracy (3rd grade children: *z* = 0.28; 5th grade children: *z* = −0.44) and for the total spelling task (3rd grade children: *z* = −0.32; 5th grade children: *z* = 0.07), indicating only marginal deviations from the same-age normative sample.

### **READING ASSESSMENT**

Reading level was assessed using a standard reading achievement test (i.e., the MT Reading test, Cornoldi and Colpo, 1998). Participants read aloud a meaningful text passage within a 4-min time limit; speed (time in seconds per number of syllables read) and accuracy (number of errors, adjusted for the amount of text read) were computed. Stimulus materials and related reference norms varied depending on school grade. Raw scores were converted to *z* scores according to standard reference data. Normative data for third and fifth graders were based on 285 and 305 children, respectively (Cornoldi and Colpo, 1998).
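The two scoring steps described above (speed as seconds per syllable, and conversion of raw scores to *z* scores) can be sketched in a few lines of Python. This is a minimal illustration, not the scoring procedure of the MT test: the function names and the normative mean and SD in the usage lines are hypothetical placeholders, and the actual reference values are those of Cornoldi and Colpo (1998).

```python
def seconds_per_syllable(reading_time_s: float, syllables_read: int) -> float:
    """Reading speed as described in the text: time in seconds per syllable read."""
    if syllables_read <= 0:
        raise ValueError("syllables_read must be positive")
    return reading_time_s / syllables_read


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw score against a normative mean and SD."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw - norm_mean) / norm_sd


# Hypothetical values, for illustration only (not taken from the test norms).
speed = seconds_per_syllable(reading_time_s=240.0, syllables_read=480)
z = z_score(raw=speed, norm_mean=0.4, norm_sd=0.1)
```

Whether a positive *z* indicates better or worse performance depends on the scoring convention adopted for each measure, which this sketch does not assume.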

### **SPELLING ASSESSMENT**

The participants' spelling abilities were tested with a standard *spelling to dictation test* (DDO test, Angelelli et al., 2008), which consisted of four sections:

Section A: regular words with full one-sound-to-one-letter correspondence (*N* = 70).

Section B: regular words requiring the application of context-sensitive sound-to-spelling rules (*N* = 10). In Italian, context-sensitive rules are required when the orthographic transcription of a phoneme depends on the following letter. For example, the phoneme /k/ is spelled C when followed by a consonant (e.g., in *clima* /klima/ "climate") or by A, O, U (e.g., in *casa* /kaza/ "home"; *cono* /kono/ "cone"; and *cubo* /kubo/ "cube") and CH when followed by E or I (e.g., in *chilo* /kilo/ "kilogram").

Section C: ambiguous words, i.e., words with two or more possible transcriptions along the phonology-to-orthography conversion routine (*N* = 55), e.g., words containing the syllables /tʃe/, /ʃe/ and /dʒe/, which may or may not require an I (e.g., /ʃentsa/ "science" is spelled *scienza* and not *scenza*, while /ʃena/ "scene" is spelled *scena* and not *sciena*).

Section D: pseudowords with one-sound-to-one-letter correspondence (*N* = 25).

Words with one-sound-to-one-letter correspondence and pseudowords were controlled for orthographic complexity (i.e., number of consonant clusters, double consonants) and length.

Normative data are available for first- to eighth-grade children (Angelelli et al., 2008). Reference data for third and fifth graders are based on 95 and 105 children, respectively. Raw scores were converted to *z* scores.
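The context-sensitive rule given as an example for Section B (the spelling of /k/) can be written as a small function. This is a sketch covering only the rule as stated in the text; the function name is hypothetical, and other spellings of /k/ in Italian that the text does not discuss are deliberately left out.

```python
def spell_k(next_letter: str) -> str:
    """Return the grapheme for /k/ given the letter that follows it.

    Rule as stated in the text: CH before E or I; C before a consonant
    or before A, O, U. Cases not covered by the text are not handled.
    """
    if len(next_letter) != 1 or not next_letter.isalpha():
        raise ValueError("next_letter must be a single letter")
    return "CH" if next_letter.upper() in {"E", "I"} else "C"


# The examples from the text: clima, casa, cono, cubo, chilo.
examples = {"l": "clima", "a": "casa", "o": "cono", "u": "cubo", "i": "chilo"}
graphemes = {word: spell_k(letter) for letter, word in examples.items()}
```

The point of the example is that the choice of grapheme cannot be made from the phoneme alone, which is what distinguishes Section B items from the one-sound-to-one-letter items of Section A.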

### **EXPERIMENTAL LISTS**

Different sets of low-frequency words and pseudowords were created.

Words: Three sets of 15 low-frequency words (Istituto di Linguistica Computazionale, CNR, unpublished) were used. Words in the first set (e.g., bruttezza, "ugliness") consisted of a root (brutt- "ugly") and a derivational suffix (-ezza, "ness"), which were both of high frequency (HD). Words in the second set (e.g., agrumeto, "citrus grove") consisted of a root (agrum-, "citrus plant") and a derivational suffix (-eto, indicating a place where trees or flowers grow), which were both of low frequency (LD). The third set of words included simple non-derived words (ND) (e.g., aragosta, "lobster"). The three sets of words were matched for word frequency and did not differ for relevant psycholinguistic variables such as length (number of letters), consonant clusters, geminate letters, number of contextual rules and bigram frequency (all *p*s > 0.05). As expected, the first and the second set were different for root frequency (*F*(1,28) = 17.73, *p* < 0.001) and suffix frequency (*F*(1,28) = 15.79, *p* < 0.001). All words (with frequency values) are reported in Appendix A.

**Table 1 | Mean (and SD) of 3rd and 5th grade children on the intelligence test (Raven's Colored Progressive Matrices), the MT Reading test (Cornoldi and Colpo, 1998), and the writing task (Angelelli et al., 2008)**.

Note: Regular words 1:1 = words with one-sound-to-one-letter correspondence; context-sensitive words = words requiring the application of context-sensitive sound-to-spelling rules; ambiguous words = words with unpredictable transcription along the phonology-to-orthography conversion routine.

Pseudowords: Two sets of 16 pseudowords of three to four syllables (length range: 8–10 letters) were generated: pseudowords in the first set were morphologically complex (root + suffix) and consisted of a root and a derivational suffix (R+S <sup>+</sup>) in a combination that does not exist in the Italian language (e.g., lampadista, constituted by the bound root lampad-, meaning "lamp" and the suffix –ista, "-ist"). Pseudowords in the second set (non-root + non-suffix) were made up of orthographic sequences that did not correspond to any existing Italian root or suffix (R−S <sup>−</sup>) (*e.g*., livonosto). Analogously to the ND words, the pseudowords in the latter set had no morphological structure. The two sets of morphemic and non-morphemic pseudowords were matched for number of contextual rules, consonant clusters, geminate letters, length (in letters) and bigram frequency (all *F*s < 1). The two sets of pseudowords were also matched for the frequency of the final orthographic sequence, which corresponded either to a real suffix in the R+S <sup>+</sup> set or a non-suffix in the R−S <sup>−</sup> set. All pseudowords with frequency values of constituent parts are reported in Appendix B.

We added 33 filler stimuli to the list, that is, 15 non-morphologically complex words and 18 pseudowords; half were morphologically complex and half were simple. A total of 110 stimuli were presented to each child for dictation; half were words and half were pseudowords, half were morphologically complex and half were simple. This list of words and pseudowords was intended to favor lexical reading without explicitly inducing morphological decomposition.
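The composition of the stimulus list can be checked with simple arithmetic. The sketch below uses only the counts itemized in the text (three sets of 15 words, two sets of 16 pseudowords, 15 filler words and 18 filler pseudowords) and the block sizes mentioned in the Procedure section; the variable names are illustrative.

```python
from itertools import combinations_with_replacement

# Counts itemized in the Experimental Lists section.
experimental_words = 3 * 15        # HD, LD and ND sets
experimental_pseudowords = 2 * 16  # morphemic and non-morphemic sets
filler_words = 15
filler_pseudowords = 18

fillers = filler_words + filler_pseudowords
total = experimental_words + experimental_pseudowords + fillers

# Three blocks of either 36 or 37 items: which splits add up to the total?
block_splits = [
    sizes
    for sizes in combinations_with_replacement((36, 37), 3)
    if sum(sizes) == total
]

print(fillers, total, block_splits)
```

With these counts, the itemized fillers and the experimental sets add up to the 110 stimuli presented to each child, and the only compatible split is one block of 36 items and two blocks of 37.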

## **PROCEDURE**

For the reading condition words and pseudowords were randomized and presented in three blocks of either 36 or 37 items each, using different random orders. Stimuli were displayed at the center of the computer screen; they were printed in black lower case (Arial font, 24 pt). Each trial consisted of the following sequence: a fixation point for 500 ms; a blank stimulus for 250 ms; the stimulus, which remained visible until the onset of pronunciation. Participants read each stimulus aloud as accurately as possible. Mispronunciation errors were recorded and noted by two experimenters, who verified their annotations at the end of the experimental sections. The experimental sections were preceded by a training block of 10 stimuli, that is, five words and five pseudowords.

For the spelling condition, words and pseudowords were randomized and administered in a spelling-to-dictation task. The examiner read each item aloud in a neutral tone without emphasizing the presence of possible orthographic difficulties. To ensure that the children had correctly perceived the items, the examiner asked them to repeat each one before they wrote it down in capital letters. No feedback was provided on the correctness of the written response. Pauses were allowed if requested. Spontaneous corrections were accepted.

The reading and spelling tests were administered about 20 days apart. The order of the tasks was balanced in the experimental sample: half of the children performed the reading task first and then the spelling test, and the other half performed the tasks in reverse order; children were randomly assigned to the first or second sub-group. They were tested individually in a quiet room at their school.
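The counterbalancing of task order described above amounts to a random split of the sample into two equal sub-groups. A minimal sketch follows; the function name, the child identifiers and the fixed seed are assumptions made for illustration, and the text does not say how the random assignment was actually carried out (for example, whether it was done within each grade).

```python
import random


def assign_task_order(child_ids, seed=0):
    """Randomly split children into two equal sub-groups by task order."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    ids = list(child_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {
        "reading_first": sorted(ids[:half]),
        "spelling_first": sorted(ids[half:]),
    }


# Hypothetical identifiers 1..30 for the 30 participants.
groups = assign_task_order(range(1, 31), seed=1)
```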

### **DATA ANALYSIS**

Reading and spelling accuracy were analyzed with Logistic Mixed Effect Models (Guo and Zhao, 2000; Quené and van den Bergh, 2008) by means of SPSS 22.0 statistics software. Logistic Mixed Effect Models were used to control for the presence of a floor effect as well as for item and participant variability. In this analysis the dependent variable was accuracy on each item of each participant in each experimental condition/sample; thus, the number of observations was very high.

Data on words and pseudowords were analyzed separately. In both analyses, *Task* (reading vs. spelling), *Grade* (3rd vs. 5th grade) and *Morphology* were entered as fixed factors, and *Items* and *Participants* were entered as random factors. Note that in the case of words the effect of *Morphology* refers to words made up of high-frequency roots and high-frequency derivational suffixes [HD], low-frequency roots and low-frequency derivational suffixes [LD] and non-derived words [ND]; in the case of pseudowords, *Morphology* refers to pseudowords made up of real roots and derivational suffixes [R+S <sup>+</sup>] and pseudowords including orthographic sequences that did not correspond to any existing Italian root or suffix [R−S <sup>−</sup>]. Interactions were explored by means of pairwise *post-hoc* tests.
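For readers less familiar with this class of models, the analysis described above can be written in a generic form. The formulation below is a sketch under stated assumptions: it takes crossed random intercepts for participants and items, and all symbols are introduced here for illustration. The exact specification used in SPSS (for example, whether random slopes were included) is not reported in the text.

$$
\operatorname{logit} P\left(Y_{pit} = 1\right) = \beta_0 + \beta_T\,\mathit{Task}_t + \beta_G\,\mathit{Grade}_p + \boldsymbol{\beta}_M^{\top}\,\mathbf{Morph}_i + (\text{interaction terms}) + u_p + w_i
$$

$$
u_p \sim \mathcal{N}\left(0, \sigma_u^2\right), \qquad w_i \sim \mathcal{N}\left(0, \sigma_w^2\right)
$$

Here $Y_{pit} = 1$ if participant $p$ responded correctly to item $i$ in task $t$, $\mathbf{Morph}_i$ codes the morphological condition of the item, and $u_p$ and $w_i$ are the random effects of *Participants* and *Items*. Modeling each single response on the logit scale is what allows accuracy close to floor or ceiling to be handled without averaging over items or participants.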

Although comparisons between word sets for word frequency were non-significant (see paragraph Experimental Lists), words in the LD condition showed some rather unbalanced word frequencies relative to the other two sets. Therefore, to ensure that the results obtained were not a by-product of some word frequency differences between sets, a second analysis was performed in which word frequency was entered as a covariate.

### **RESULTS**

### **Words**

The Logistic Mixed Effect Model showed a significant effect of *Task* (*F*(1,2732) = 20.64, *p* < 0.0001), *Grade* (*F*(1,2732) = 11.80, *p* < 0.001) and *Morphology* (*F*(1,2732) = 4.47, *p* < 0.01), with a higher error rate in reading with respect to spelling (9.3% vs. 4.3%, respectively), in 3rd compared to 5th grade children (9.5% vs. 4.2%), and in ND and LD words with respect to HD words (8.8% and 8.5% vs. 3.3%, respectively). The *Morphology* × *Task* interaction (*F*(1,2732) = 10.76, *p* < 0.0001) was significant, showing an effect of morphology in reading (*p* < 0.0001) but not in spelling. Exploration of means revealed that HD words were read significantly better than LD and ND words (*p* < 0.001 and *p* < 0.01, respectively) and that LD words were read worse than ND words (*p* < 0.05). Furthermore, *post-hoc* analysis showed that the HD condition led to comparable error percentages in reading and spelling, but the LD and ND words had a significantly higher error rate in reading than in spelling (for LD, difference between reading and spelling = 17.5%, *p* < 0.0001; for ND, difference between reading and spelling = 4.2%, *p* < 0.05).

The *Task* × *Morphology* × *Grade* interaction (*F*(1,2732) = 5.57, *p* < 0.01) was significant. **Figure 1** shows how morphology modulates reading and spelling performance for words in 3rd and 5th grade children. **Table 2** reports mean error percentages (and standard error values) as a function of task and stimulus type. The effect of morphology was significant in reading for both 3rd and 5th graders (*p* < 0.01 and *p* < 0.0001, respectively), but not in spelling (either for 3rd or 5th graders). Exploration of means showed that in reading both 3rd and 5th grade children made more errors on LD than HD words (difference = 14.6% and 19.3% in 3rd and 5th grade, respectively, at least *p* < 0.01) and on ND compared to HD words (difference = 9.0% and 6.5% in 3rd and 5th grade, respectively, at least *p* < 0.05); only 5th graders showed a difference also between LD and ND (difference = 12.8%, *p* < 0.05), indicating that the LD condition was the most difficult one. In spelling, both groups had very low and comparable percentages of errors on HD and LD words; the only significant effect was in 5th graders, who spelled ND words less correctly than morphologically complex stimuli (ND vs. LD diff. = 4.5%, *p* < 0.05). Finally, progressing from 3rd to 5th grade, errors decreased for HD words (*p* < 0.05) and ND words (*p* = 0.06) in reading and for LD words (*p* < 0.01) in spelling. A comparison between reading and spelling performances showed significantly lower accuracy in reading than in spelling only for the LD condition in both 3rd and 5th graders (*p* < 0.01 and *p* < 0.0001, respectively).

The random effects of *Items* (*Z* = 3.52; *p* < 0.0001) and *Participants* (*Z* = 2.15; *p* < 0.05) were significant.

When word frequency was added as a covariate in the analysis it approached significance (*F*(1,2732) = 3.45, *p* = 0.06). However, the effects of *Task* (*F*(1,2732) = 20.64, *p* < 0.0001), *Grade* (*F*(1,2732) = 11.80, *p* < 0.001), *Morphology* (*F*(1,2732) = 4.47, *p* < 0.05), as well as the second level interaction *Task* × *Morphology* × *Grade* (*F*(1,2732) = 5.57, *p* < 0.01) remained unchanged.

### **Pseudowords**

**Figure 2** shows the effect of morphology on the reading and spelling accuracy performance of 3rd and 5th grade children on pseudowords. **Table 3** reports mean error percentages (and standard error values) as a function of task and stimulus type. The analyses indicated significant effects of *Task* (*F*(1,1944) = 35.71, *p* < 0.0001), *Grade* (*F*(1,1944) = 14.94, *p* < 0.0001) and *Morphology* (*F*(1,1944) = 16.66, *p* < 0.0001). Exploration of the main effects showed higher error rates in reading (15.9%) than in spelling (3.1%), in 3rd graders compared to 5th graders (12.6% vs. 4.6%, respectively), and in R−S <sup>−</sup> with respect to R+S <sup>+</sup> pseudowords (12.9% vs. 4.0%, respectively).

Note: HD = words with high frequency roots and suffixes; LD = words with low frequency roots and suffixes; ND = non-derived words.

The *Grade* × *Task* interaction was significant (*F*(1,1944) = 3.71, *p* < 0.05), showing higher percentages of errors in reading compared to spelling in both grades (3rd grade difference = 13.4%; 5th grade difference = 10.7%, at least *p* < 0.0001), with a larger difference between reading and spelling in 3rd grade children. Moreover, 3rd graders were less correct than 5th graders in both spelling (difference = 6.0%, *p* < 0.01) and reading (difference = 8.7%, *p* < 0.0001). The *Morphology* × *Grade* interaction was marginally significant (*F*(1,1944) = 2.98, *p* = 0.08), indicating a significant effect of morphology in both groups, which was larger for 3rd graders whose error rates decreased from 17.4% to 8.9% passing from the R−S <sup>−</sup> to the R+S <sup>+</sup> conditions (difference = 8.5%, *p* < 0.0001); 5th graders' errors decreased from 9.4% to 1.7% for R−S <sup>−</sup> and R+S <sup>+</sup>, respectively (difference = 7.7%, *p* < 0.0001).

**Table 3 | Mean percentage of errors (and SD) of 3rd and 5th grade children in reading and spelling experimental pseudowords**.


Note: R+S <sup>+</sup> = pseudowords made up of existing roots and suffixes; R−S <sup>−</sup> = pseudowords without any existing roots or suffixes.

The random effects of *Items* and *Participants* were not significant (*Z*s about 1).

## **DISCUSSION**

In the present study, we aimed to investigate whether there is evidence of an early use of morphological information in both reading and spelling in languages with transparent orthography, such as Italian, and whether the frequency of morphemes modulates the use of morphology in both tasks, an issue not yet investigated in Italian children. Results confirmed that morphological information is a useful resource in children's reading and partially extended the evidence for morphological facilitation to the spelling process. For both tasks, they also indicated the conditions in which this facilitation occurs.

Morphology was helpful for both 3rd and 5th graders when they read and spelled pseudowords. Both younger and older children benefitted from the presence of morphological constituents when processing newly encountered stimuli; indeed, pseudowords made up of existing morphemes were read and spelled more accurately than non-morphemic pseudowords, irrespective of school level (with a somewhat higher advantage for younger than for older children). These results can be interpreted as a genuine morphological effect, rather than a generic "wordlikeness" effect, because we carefully controlled for familiarity of the chunks constituting the pseudowords. For instance, suffixes could not be considered more familiar chunks than orthographic sequences in the non-morphological set, because for the latter set of stimuli we selected final orthographic sequences that were as frequent as suffixes in the root + suffix pseudoword set. Furthermore, pseudowords in the two sets (morphological and non-morphological) were matched exactly for mean bigram frequency.

As to low-frequency words, morphological facilitation was present in reading but not in spelling. Words made up of high-frequency roots and suffixes were read better than non-derived ones by both 3rd- and 5th-grade children. However, a difference between groups emerged in reading words composed of low-frequency morphemes: while third graders read words with low-frequency morphemes at a comparable level of accuracy as non-derived words, 5th graders read words composed of low-frequency morphemes even worse than non-derived words. These data indicate that younger children do not rely on morphological parsing when morphemes are of low frequency, because these morphemes are unknown to them. Consequently, younger children treat words with low-frequency morphemes similarly to words that include no morphemes. By contrast, 5th graders may attempt morphological parsing also when reading words composed of low-frequency morphemes, but this attempt may actually result in more errors than in the case of non-derived words (which cannot be decomposed into morphemes). It can be speculated that in reading a word made up of low-frequency morphemes, the older children might occasionally succeed in accessing the root; however, after accessing the root the children may expect the higher frequency suffix which is present in the base word rather than the lower-frequency suffix actually present in the derived word. As a consequence, the combination of morphemes might lead to uncertainty or to the erroneous production of the base word. Thus, the worse performance of 5th graders on words with low-frequency morphemes than on all other words confirms the tendency of older children to rely on morpho-lexical reading.

The present data are consistent with the literature showing that morphology plays a role in reading in transparent orthographies, where in principle the regularity of the phoneme-to-grapheme mapping is sufficient to correctly process most words. Pseudoword data replicate those that emerged in several studies on Italian children. Burani et al. (2002) found an advantage in reading pseudowords composed of morphemes (root + suffix) compared to pseudowords without morphological structure in 3rd and 5th grade typically developing readers. Similar results were reported for 2nd grade and 6th–7th grade typically developing readers (Burani et al., 2008). Regarding words, in a first study Burani et al. (2008) used medium-frequency morphologically complex words and found that only children with dyslexia and younger typically developing children (2nd graders) benefitted from the presence of morphemes in reading words relative to simple words. By contrast, 6th–7th graders and adult skilled readers showed no difference in reading morphologically complex words vs. simple words (Burani et al., 2008). However, in a second study, Marcolini et al. (2011) showed that word frequency can modulate morpheme-based reading in skilled readers (6th–7th graders), facilitating the reading of low- but not high-frequency morphologically complex words. According to the authors, when a unit larger than the morpheme (i.e., the whole word) is available because it has a high frequency, morphemic parsing does not necessarily facilitate processing. Parsing a word into morphemes entails both benefits and costs, and costs may prevail over benefits when there is the alternative possibility of reading the word as a whole (Schreuder and Baayen, 1995; Traficante and Burani, 2003). Consequently, for skilled readers morphemic parsing may be an efficient strategy only with new or unfamiliar words, for which no whole-word representation is available. 
The present reading data are consistent with this interpretation: all words used in our study were of low frequency and we observed morphological facilitation. Moreover, a negative effect of morphological parsing in skilled readers also emerged, with worse performance on morphemic words made up of low-frequency roots and suffixes than on simple non-derived words.

A new finding of this study was the presence of morphological facilitation in the spelling of stimuli with regular transcription; however, the facilitation was limited to pseudowords. In fact, even if morphological parsing was attempted in spelling words, it did not produce appreciable effects; however, some facilitation was present in 5th grade children for morphemic words. We interpreted these findings as follows: The facilitation for novel stimuli may arise from parsing and subsequent access to smaller (than the whole stimulus) and more manageable lexical units. For developing readers, in fact, exposure to these frequently occurring chunks of sound and meaning in speech and their corresponding orthographic patterns in writing could allow morphemes to become relatively independent spelling units. This would enable children to process them correctly, avoiding time-consuming and error-prone phoneme-grapheme analysis. The different results obtained for pseudoword and word spelling (with morphological facilitation present only in the former) give some indications. In spelling a word, morphological parsing may be less influential than in pseudoword spelling because the whole-word spelling procedure, together with the sublexical phoneme-to-grapheme conversion routine (and their mutual interaction), may have a relevant role. This could explain the absence of a significant modulation of morphology on word spelling. Consistent with this hypothesis is the finding of very early signs, that is, from the first years of schooling, of lexical spelling (Notarnicola et al., 2012), with first graders already able to spell correctly 60% of the words that require reliance on lexical orthographic representations. Overall, the morphological facilitation found in spelling, although prevalent for non-lexical stimuli, is consistent with the conclusions reached by the few studies conducted in transparent orthographies (Lehtonen and Bryant, 2005; Defior et al., 2008).
In those studies, morphological knowledge was found to be exploited in different experimental conditions by children learning to spell.

A final result of our study deserves some comment. In both 3rd and 5th grade children we found higher accuracy in spelling than in reading when the same sets of stimuli were compared. We believe that this difference in error rates may be due to task-specific processes. It is worth noting that, unlike reading, in spelling under dictation there is enough time to activate a word representation in the mental lexicon because the word is fully available to the speller before starting the process of writing it. Thus, an additional locus of facilitation is the activation of the spoken lexical form (see, e.g., Chua and Liow, 2014). In addition, the spelling response is produced without time pressure. In other words, in spelling the decoding phase is separate from the transcoding phase and usually neither process is under time pressure. Therefore, especially with regular stimuli, in lexical and sublexical processes (which may produce converging information) there is enough time for successful integration in spelling, thus leading to high accuracy. Conversely, reading is an online task in which the time lapse between stimulus recognition and response is very short (thus the stimulus decoding has to be done rapidly) and online corrections become reading errors. This could explain the lower number of errors in spelling with respect to reading, especially in those conditions in which the morphemic strategy is riskier, such as the case of words with low-frequency constituents. In the latter condition, that is, in the only condition that showed a significant difference between reading and spelling, online corrections led to errors in reading but not in spelling, where the response could be delayed with respect to the decoding phase and online corrections could be successfully incorporated.

The data that emerged from the present study have clear empirical implications. The facilitatory effect of morphology in reading and in spelling new words could be used to enrich standard teaching methods and rehabilitation strategies in the case of learning disabilities. Regarding reading, in our study only accuracy was considered. However, previous reports showed that morphology enhanced reading fluency (see Burani, 2010, for a review) in Italian children with dyslexia who are characterized by a prevalent deficit of reading speed (Zoccolotti et al., 1999). In the present study we found small but reliable effects in the spelling of regular stimuli of 3rd and 5th grade children. Studies in larger populations are needed to confirm the present data. However, considering that the transcription of regular stimuli is optimized very early in Italian (see Notarnicola et al., 2012), larger facilitatory effects might be found in younger learners. Furthermore, considering that some errors on ambiguous words are still present in 8th grade typically developing children and that a selective impairment of ambiguous word transcription characterizes the writing deficit of Italian children with learning disabilities (Angelelli et al., 2004, 2010), we believe that recourse to morphology is particularly helpful in situations of spelling ambiguity (e.g., knowledge of the spelling of SCIENZA "science" may facilitate the spelling of SCIENZIATO "scientist", FANTASCIENZA "science fiction", etc.). In this sense the introduction of morphemes in teaching materials and an emphasis on morphemic strategies could be particularly useful in the early phases of literacy acquisition as well as in children with learning disabilities (see, e.g., Elbro and Arnbak, 1996; Traficante, 2012). 
Explicit training using morphological strategies might induce children to identify patterns of letters that are consistent among several words and foster the processing of units that are larger than single phonemes/graphemes. However, future research is needed to further explore the possible benefits of morphological training, especially in transparent orthographies.

Overall the present study extends the role of morphology from reading to the spelling of newly encountered stimuli in a language with transparent orthography (Italian) and highlights the possible role of morphological knowledge in promoting literacy acquisition.

### **REFERENCES**


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 February 2014; accepted: 10 October 2014; published online: 19 November 2014*.

*Citation: Angelelli P, Marinelli CV and Burani C (2014) The effect of morphology on spelling and reading accuracy: a study on Italian children. Front. Psychol. 5:1373. doi: 10.3389/fpsyg.2014.01373*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*.

*Copyright © 2014 Angelelli, Marinelli and Burani. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## The intergenerational multiple deficit model and the case of dyslexia

## *Elsje van Bergen<sup>1</sup>\*, Aryan van der Leij<sup>2</sup> and Peter F. de Jong<sup>2</sup>*

*<sup>1</sup> Department of Experimental Psychology, University of Oxford, Oxford, UK*

*<sup>2</sup> Research Institute of Child Development and Education, University of Amsterdam, Amsterdam, Netherlands*

### *Edited by:*

*Pierluigi Zoccolotti, University of Rome La Sapienza, Italy*

### *Reviewed by:*

*Arthur M. Jacobs, Freie Universität Berlin, Germany; Angela Heine, Hochschule Rhein-Waal, Germany; Marialuisa Martelli, University of Rome La Sapienza, Italy*

### *\*Correspondence:*

*Elsje van Bergen, Department of Experimental Psychology, University of Oxford, 9 South Parks Road, Oxford OX1 3UD, UK; e-mail: elsje.vanbergen@psy.ox.ac.uk*

Which children go on to develop dyslexia? Since dyslexia has a multifactorial etiology, this question can be restated as: what are the factors that put children at high risk for developing dyslexia? It is argued that a useful theoretical framework to address this question is Pennington's (2006) multiple deficit model (MDM). This model replaces models that attribute dyslexia to a single underlying cause. Subsequently, the generalist genes hypothesis for learning (dis)abilities (Plomin and Kovas, 2005) is described and integrated with the MDM. Next, findings are presented from a longitudinal study with children at family risk for dyslexia. Such studies can contribute to testing and specifying the MDM. In this study, risk factors at both the child and family level were investigated. This led to the proposed intergenerational MDM, in which both parents confer liability via intertwined genetic and environmental pathways. Future scientific directions are discussed to investigate parent-offspring resemblance and transmission patterns, which will shed new light on disorder etiology.

**Keywords: intergenerational multiple deficit model, generalist genes hypothesis, dyslexia, comorbidity, family risk, developmental disorders, intergenerational transmission**

## **PROBLEMS FOR SINGLE DEFICIT ACCOUNTS OF DYSLEXIA**

Research into dyslexia has been dominated by the quest for the Holy Grail: the single cognitive deficit that is necessary and sufficient to cause all behavioural characteristics of the disorder. The dominant hypothesis of this kind has been the phonological-deficit hypothesis (e.g., Wagner, 1986; Snowling, 1995). However, a single cognitive deficit model of dyslexia, like single deficit models of developmental disorders in general (see Pennington, 2006, for a comprehensive overview), has a number of shortcomings. First, no single cognitive deficit has been found that can explain all behavioural symptoms of all cases with dyslexia (e.g., Ramus and Ahissar, 2012). For example, not all individuals with dyslexia show a phonological deficit (e.g., Valdois et al., 2011; Pennington et al., 2012). Conversely, not all individuals with a phonological deficit have dyslexia (e.g., Snowling, 2008; van der Leij et al., under review). This calls a one-to-one mapping into question and points to the possibility that various constellations of underlying cognitive deficits can lead to the behavioural symptoms of dyslexia.

In addition, a single deficit model cannot readily explain the phenomenon of comorbidity. For instance, dyslexia co-occurs more often than expected by chance with other developmental disorders, including dyscalculia, specific language impairment (SLI), speech-sound disorder, and attention-deficit/hyperactivity disorder (ADHD). To illustrate this point, suppose disorders A and B each have a prevalence of 5% in the general population. If disorders A and B were independent, then 5% of the cases with A would also have B. However, comorbidity rates for developmental disorders are commonly in the order of 30%, for example between dyslexia and speech-sound disorder or dyslexia and ADHD (Pennington, 2006). The huge discrepancy between these figures (5 vs. 30%) implies that developmental disorders are not independent.
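The arithmetic behind this comparison can be sketched in a few lines. The prevalence and comorbidity figures are those quoted in the text; the function and variable names are illustrative assumptions, not part of the original analysis.

```python
def expected_comorbidity_if_independent(prev_a: float, prev_b: float) -> float:
    """Return P(B | A) for independent disorders: P(A and B) / P(A) = P(B)."""
    joint = prev_a * prev_b  # independence: joint probability is the product
    return joint / prev_a


prev_a = 0.05         # prevalence of disorder A (figure from the text)
prev_b = 0.05         # prevalence of disorder B (figure from the text)
observed_rate = 0.30  # typical observed comorbidity rate (figure from the text)

expected_rate = expected_comorbidity_if_independent(prev_a, prev_b)
excess_ratio = observed_rate / expected_rate

print(f"expected under independence: {expected_rate:.2f}")
print(f"observed rate is about {excess_ratio:.0f} times the expected rate")
```

Under independence the conditional rate simply equals the base rate of B, so an observed rate several times higher points to shared risk factors.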

The single deficit model requires a distinct account for each comorbidity (pair of disorders). Pennington (2006) discusses as an example the comorbidity between dyslexia and speech-sound disorder. Speech-sound disorder is defined by difficulties in the development of spoken language, especially problems with the intelligible production of speech sounds. Approximately 30% of children with early language or speech problems go on to develop dyslexia. A parsimonious single deficit model to explain this comorbidity is the severity hypothesis. The severity hypothesis states that speech-sound disorder and dyslexia have the same underlying phonological deficit, with speech-sound disorder being an earlier developmental manifestation of this deficit than dyslexia. Comorbid cases will have the most severe phonological deficit. If the phonological deficit is less severe, speech-sound disorder will not reach clinical boundaries but dyslexia will. To account for cases with early speech-sound disorder but without later dyslexia, the model must posit a subtype of speech-sound disorder that is caused by a phonological deficit distinct from the phonological deficit seen in cases with dyslexia. Alternatively, the phonological deficit in such cases must be resolved by the time they come to the task of learning to read. However, Snowling et al. (2000) followed a group of former language-impaired children into adolescence. Those with early speech-sound disorder (isolated phonological impairments at 4 years of age) had normal reading skills at age 15, but continued to show phonological deficits. Similar results were obtained by Peterson et al. (2009). In their study many children with early speech-sound disorder went on to learn to read normally despite a lasting phonological deficit. Thus, in both studies the children with early speech-sound disorder had a phonological deficit similar to children with dyslexia. This conclusion is inconsistent with the single cognitive deficit severity hypothesis.

While research at the cognitive level of explanation was still searching for a single deficit, studies at the genetic level converged on the conclusion that the aetiology of dyslexia, as of other developmental disorders, is genetically complex (Pennington, 2006). So instead of a single gene determining dyslexia, many genes act probabilistically (i.e., polygenicity), each having only a small contributory effect to the etiology of dyslexia (Bishop, 2009). Moreover, behavioural genetic studies have shown for certain developmental disorders that the relation between two traits (like reading ability and inattention) is larger in monozygotic (MZ) twin pairs than in dizygotic (DZ) twin pairs (Willcutt et al., 2000). Such a bivariate heritability supports genetic overlap between the conditions, in this example between dyslexia and ADHD. The partly shared etiology of dyslexia and ADHD does not yet rule out the possibility of a distinct single cognitive deficit for each disorder. However, studies have demonstrated that a processing speed impairment is not only a characteristic of dyslexia, but also of ADHD (e.g., Willcutt et al., 2005), suggesting that processing speed is a shared cognitive risk factor (McGrath et al., 2011). The accumulating evidence for etiological and cognitive overlap between dyslexia and ADHD speaks against a single deficit model for explaining their frequent co-occurrence. Also for other dyslexia comorbidities, shared cognitive deficits are found, for example a phonological deficit in SLI (e.g., Bishop et al., 2009) and a processing-speed deficit in dyscalculia (e.g., van der Sluis et al., 2004).

### **THE MULTIPLE DEFICIT MODEL**

It seems that single deficit models are untenable and must give way to multiple cognitive deficit models for understanding developmental disorders. The multiple cognitive deficit model proposed by Pennington (2006) is depicted schematically in **Figure 1**. In his model, multiple genetic and environmental risk factors operate probabilistically by increasing the liability to a disorder; conversely, protective factors decrease the liability. These etiological factors produce the behavioural symptoms of developmental disorders by influencing the development of relevant neural systems and cognitive processes. Importantly, there is no single etiological or cognitive factor that is sufficient to cause a disorder. Instead, multiple cognitive deficits (each due to multiple etiological factors) need to be present to produce a disorder at the behavioural level. Some of the etiological and cognitive risk factors are shared with other disorders. As a result, comorbidity among developmental disorders is to be expected, rather than something that requires additional explanations. Finally, from Pennington's multiple deficit model (MDM) it follows that "the liability distribution for a given disease is often continuous and quantitative, rather than being discrete and categorical" (Pennington, 2006, p. 404). Therefore, the threshold between affected and unaffected is rather arbitrary.

Note that other models have been proposed to explain the co-occurrence of neurodevelopmental disorders or the covariance of their associated continuous traits. For instance, Pennington et al. (2005) and Pennington (2006) set out several models differing not only on the dimension of a shared or distinct cognitive deficit, but also on that of a shared or distinct etiology. The severity model discussed in the first section is among them; others are pleiotropy, cognitive phenocopy, synergy, and assortment. Disconfirmatory data for each of the models are given, leading to the proposed MDM.

Pennington (2006) concludes his paper by remarking that – in contrast to single deficit models – it remains challenging to test the multiple cognitive deficit model. The model is much more complex than single deficit models, which are attractively parsimonious, but this complexity is needed to account for the observations at the different levels of analysis. The model is universally applicable to developmental disorders, but therefore remains abstract. It is not specified which etiological factors, neural systems, and cognitive processes interact to produce a given disorder.

### **TESTING THE MDM**

We argue here that family risk studies are a line of inquiry that can contribute to testing and specifying the MDM. In family risk studies, children are followed who are at risk of dyslexia by virtue of having an immediate dyslexic family member (usually a parent). Such studies have shown that 34–66% of them develop dyslexia (Scarborough, 1990; Elbro et al., 1998; Pennington and Lefly, 2001; Snowling et al., 2003; Torppa et al., 2010), depending on the stringency of the dyslexia criteria. The much higher prevalence of dyslexia among offspring of parents with dyslexia is consistent with twin studies showing moderate to strong heritability of dyslexia (e.g., Olson et al., 2014c).

From the MDM it follows that children at family risk experience at least some of the etiological risk factors: they inherit genetic risk factors and might experience a less rich literacy environment. Hence, it is hypothesized that at-risk children have a higher genetic and environmental liability than children without a family history of dyslexia (labeled control children). Furthermore, the at-risk children who go on to develop dyslexia are expected to show cognitive deficits (to varying degrees) in several processes. Some of these cognitive processes are expected to be affected even before the onset of reading instruction, as a consequence of etiological risk factors and deficient neural systems.

A key prediction of the MDM for family risk studies concerns the at-risk children who do not develop dyslexia. If liability to dyslexia were discrete (as would happen if only one factor, say a gene, were involved), at-risk non-dyslexic children would not differ from controls. However, according to the MDM, liability is continuously distributed. This also follows from the fact that reading ability is influenced by many genes of small effect, producing normal distributions of phenotypes (Plomin et al., 2008, p. 33). Consequently, the MDM predicts that at-risk children without dyslexia also inherit at least some disadvantageous gene variants from their dyslexic parents, giving them a higher liability than control children, although still lower than at-risk dyslexic children. At the behavioural and cognitive level this should translate into mild deficits in literacy skills and some of its cognitive underpinnings. When plotting mean performances of the three groups, a step-wise pattern (i.e., at-risk dyslexic < at-risk non-dyslexic < controls) would support a continuum of liability, one of the characteristics of the MDM.
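The step-wise prediction can be expressed as a simple descriptive check on group means. The sketch below is illustrative only: the scores are invented, the function name is hypothetical, and a real analysis would test the group differences statistically rather than merely compare means.

```python
from statistics import mean


def shows_stepwise_pattern(at_risk_dyslexic, at_risk_non_dyslexic, controls):
    """True when the group means are strictly ordered as the MDM predicts:
    at-risk dyslexic < at-risk non-dyslexic < controls."""
    return mean(at_risk_dyslexic) < mean(at_risk_non_dyslexic) < mean(controls)


# Hypothetical standardized scores, invented purely for illustration
at_risk_dyslexic = [-1.6, -1.2, -1.9, -1.4]
at_risk_non_dyslexic = [-0.4, -0.1, -0.6, -0.3]
controls = [0.2, 0.5, -0.1, 0.3]

print(shows_stepwise_pattern(at_risk_dyslexic, at_risk_non_dyslexic, controls))
```

A discrete-liability account would instead predict that the at-risk non-dyslexic group is indistinguishable from controls, so the middle inequality is the informative one.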

Comparing the three groups of children on behavioural measures sheds light on cognitive deficiencies and behavioural symptoms, the bottom two levels in **Figure 1**. These three groups have also been compared on neural processing of visual and auditory stimuli (e.g., Regtvoort et al., 2006; Leppänen et al., 2010; Plakas et al., 2013), the second level of the MDM. Some family risk studies (e.g., Snowling et al., 2007; Torppa et al., 2007; van Bergen et al., 2011) have also examined aspects of the home environment, which belong to the etiological level. However, specific genetic risk factors remain hidden in family risk studies. As genetic screening of children for their dyslexia susceptibility is still far away, we propose an indicator of their genetic risk. Since reading ability is moderately to highly heritable and children receive their genetic material from their parents, we argue that cognitive abilities of parents can partly reveal their offspring's liability. One, but maybe even both parents of at-risk children will have weaker reading skills than those of control children, reflecting selection criteria in family risk studies. However, the key issue is whether the reading skills of parents of at-risk children *with* dyslexia differ from the reading skills of parents of children *without* dyslexia. Based on the MDM it is expected that at-risk children who develop dyslexia have inherited more genetic risk variants than at-risk children without dyslexia and that this difference can be revealed by lower reading performance of parents of the at-risk dyslexic children. In Section "Parents' literacy skills" we will elaborate upon parental effects.

Finally, the MDM predicts that some of the cognitive processes related to dyslexia are specific to dyslexia (or reading ability in general) and others are shared with comorbid neurodevelopmental disorders (and their accompanying continuous phenotype). In the following two paragraphs we will pursue the specificity matter, after which we return to the predictions for a family risk study laid out above.

### **THE GENERALIST GENES HYPOTHESIS**

One of the aims of dyslexia research is to identify cognitive processes playing a role in the developmental pathways that lead to dyslexia. The MDM states that some cognitive deficits are shared among disorders. This raises the question of *which* cognitive precursors of dyslexia are distinct and which are shared with other disorders. With regard to learning abilities, like reading ability, there is a hypothesis that addresses this specificity issue: the generalist genes hypothesis (Plomin and Kovas, 2005; Kovas and Plomin, 2007).

The generalist genes hypothesis states that the same set of genes is largely responsible for individual differences in learning abilities (i.e., pleiotropy). It stems from behavioural genetic studies employing the twin design. The twin design is the major method to quantify genetic and environmental influences on a trait. If for a certain trait MZ twins are more similar than DZ twins, genetic factors must play a role. If there is no difference in resemblance, heritability is negligible. Estimates for the heritability of reading ability are in the range of 0.47–0.84 (Taylor et al., 2010; Byrne et al., 2009, respectively).
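The logic of the twin comparison can be made concrete with Falconer's formula, a classical approximation that is not named in the text and is used here only as an illustration (the cited studies rely on more elaborate model fitting). The twin correlations below are hypothetical values, and the function name is an assumption.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Approximate variance components from twin correlations (Falconer's formula).

    r_mz, r_dz: trait correlations within MZ and DZ twin pairs.
    """
    h2 = 2 * (r_mz - r_dz)  # heritability: doubled MZ-DZ difference
    c2 = 2 * r_dz - r_mz    # shared environmental influence
    e2 = 1 - r_mz           # non-shared environment (plus measurement error)
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, chosen only for illustration
estimates = falconer_estimates(r_mz=0.80, r_dz=0.50)
print({name: round(value, 2) for name, value in estimates.items()})

# Equal MZ and DZ resemblance gives a heritability estimate of zero
no_difference = falconer_estimates(r_mz=0.50, r_dz=0.50)
print(round(no_difference["h2"], 2))
```

The second call mirrors the statement in the text: when MZ and DZ pairs resemble each other equally, the estimated heritability is negligible.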

As a side note, it should be borne in mind that the high heritability of reading performance does not imply at all that educational improvements are pointless. Instead, they positively impact on almost all children's reading achievement and raise the *average* of standardized scores of a class receiving effective reading intervention. Nonetheless, it is likely that individual *differences* among children remain largely genetically driven (Olson et al., 2009). This suggests that children with a genetic constraint on their reading development need increased reading instruction (as investigated by Zijlstra et al., under review).

Recently, the field of behavioural genetics has moved beyond quantifying genetic and environmental influences on a trait to studying genetic and environmental overlap between traits. For the three learning abilities reading, arithmetic, and language, empirical data have shown that the genes important for one learning ability largely overlap with the genes important for the other learning abilities. The genetic correlation is the measure that quantifies this: it indexes the extent to which genetic influences on one trait overlap with the genetic influences on another trait (independently of the heritability of the traits). The genetic correlation between learning abilities is about 0.70 (Plomin and Kovas, 2005; Kovas and Plomin, 2007). This suggests that roughly 70% of the genes associated with reading ability are generalists: they also influence other learning abilities. Hence Plomin and Kovas (2005) named their hypothesis the "generalist genes hypothesis." As genetic correlations are not 1.0, there are also specialist genes: genes that contribute to dissociations among learning abilities.

Observed differences in learning abilities among individuals are also partly due to differences between the environments in which individuals were born, were brought up and live. Behavioural genetics subdivides environmental influences into those that make family members similar (called shared environmental effects) and those that do not contribute to resemblance among family members (called non-shared environmental effects). Also for these environmental components statistics exist analogous to genetic correlation. Shared environmental correlations among learning abilities are as high as genetic correlations, so shared environmental effects are also largely general effects (Kovas and Plomin, 2007). In contrast, non-shared environmental correlations are low. This indicates that these effects primarily act as specialists, contributing to performance differences in learning abilities within a child (Kovas and Plomin, 2007).
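For readers who prefer a formal statement, the genetic correlation and its environmental analogues can be written as follows. This is the standard formulation for a bivariate twin model with standardized traits, given here as background rather than taken from the cited papers; the symbols are introduced for illustration.

$$
r_g = \frac{\operatorname{cov}_g(X, Y)}{\sqrt{V_g(X)\,V_g(Y)}}, \qquad
r_P = h_X h_Y\, r_g + c_X c_Y\, r_c + e_X e_Y\, r_e
$$

Here $V_g$ and $\operatorname{cov}_g$ are the genetic variance and covariance of traits $X$ and $Y$; $r_P$ is their phenotypic correlation; $h$, $c$, and $e$ are the square roots of the proportions of variance due to genetic, shared environmental, and non-shared environmental influences; and $r_c$ and $r_e$ are the shared and non-shared environmental correlations. Because $r_g$ is scaled by the genetic variances themselves, it can be high even when the heritabilities of the two traits are modest, which is why it is described as independent of heritability.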

## **THE HYBRID MODEL**

The generalist genes hypothesis and the multiple cognitive deficit model complement each other well. The MDM is more general because it holds for all common developmental disorders, while the generalist genes hypothesis specifically pertains to learning abilities and disabilities. Furthermore, the MDM includes four levels of explanation, whereas the generalist genes hypothesis only explicitly models the etiological level. Although the MDM also comprises polygenicity and pleiotropy, the generalist genes hypothesis *quantifies* for learning abilities the degree of overlapping and unique influences in each of the three etiological components (genetic, shared environmental, and non-shared environmental influences). We have visualized the generalist genes hypothesis and incorporated it into the MDM, yielding the hybrid model depicted in **Figure 2**. In this model only the first and the fourth layer are further specified because the generalist genes hypothesis only deals with these two levels. The etiological factors of the first level influence the behavioural manifestations at the fourth level by acting through the second and third level.

The hybrid model quantifies the overlap in etiological factors between learning abilities: genetic and shared environmental effects are largely shared by the three learning domains, whereas the non-shared environmental effects are largely distinct. These differential overlaps are visualized in the hybrid model as the degree of overlap between the circles. Despite this quantification of etiological overlap, the hybrid model does not specify which etiological factors are relevant. Regarding genetic factors, molecular genetic studies will ultimately inform us which genes are implicated in dyslexia. Knowledge of specific genes contributing to dyslexia susceptibility promises to help bridge the gap from genes to neural systems, cognitive processes, and behavioural outcomes (Fisher and Francks, 2006).

Insight into which specific neural systems, cognitive skills, and behavioural symptoms are implicated in dyslexia can be gained from family risk studies. The hybrid model points to the opportunity to study reading in combination with arithmetic or language to increase insight into shared and distinct factors. We chose to focus on reading and arithmetic, both basic school skills that are central during early primary school. As the model suggests, their disorders, dyslexia and dyscalculia, indeed often co-occur (Landerl and Moll, 2010). Moreover, this comorbidity is under-researched compared to the comorbidity of dyslexia with ADHD or language disorders. We aimed to study the comorbidity issue at the cognitive level of explanation. We investigated whether known precursors of reading are specific for reading or are shared between the development of reading and arithmetic.

## **FINDINGS FROM A FAMILY-RISK STUDY**

As argued above, a study with children with and without a family history of dyslexia is valuable in relation to the MDM (or hybrid model), because specific testable hypotheses follow from the model. To reiterate, the following four hypotheses followed from Section "Testing the MDM":


The first three hypotheses pertain to the children, whereas the fourth hypothesis concerns the parents.

As an illustrative example we will present a family risk study that speaks to all four hypotheses. The family risk study is part of the Dutch Dyslexia Programme, abbreviated DDP (for an overview, see van der Leij et al., 2013). The study employs a prospective design, in which the progress of children (*N* = 212) at high and low family risk is followed. Children were considered at high family risk if (at least) one of their parents and another family member had dyslexia. After 2 and subsequently 3 years of reading instruction they were categorized as either dyslexic or non-dyslexic (below or above the 10th percentile cut-off on word-reading fluency, respectively). Subsequently, they were compared concurrently and retrospectively with each other and with typically developing children without such a family background. In the present paper, we will focus on the findings regarding reading and reading-related skills (cognitive precursors and correlates of dyslexia) in parents and children (van Bergen et al., 2012, 2014a,b). We investigated the cognitive profile characteristic of the three groups of children, as well as the impact of the parents' cognitive profile and of the literacy environment parents create on children's reading outcome. An overview of the findings is given in **Table 1**.
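The categorization rule described above (a 10th percentile cut-off on word-reading fluency) can be sketched as follows. The norm sample, the function names, and the choice to treat a score exactly at the cut-off as non-dyslexic are all assumptions made for illustration; the study's actual norms and scoring procedure are not reproduced here.

```python
from statistics import quantiles


def percentile_cutoff(norm_scores, percentile=10):
    """Score at the given percentile of a norm sample (inclusive method)."""
    cut_points = quantiles(norm_scores, n=100, method="inclusive")
    return cut_points[percentile - 1]  # cut_points[0] is the 1st percentile


def classify(fluency_score, cutoff):
    """Label a child by word-reading fluency relative to the cut-off.

    Assumption: a score exactly at the cut-off counts as non-dyslexic.
    """
    return "dyslexic" if fluency_score < cutoff else "non-dyslexic"


# Hypothetical norm sample: fluency scores 0..100, for illustration only
norm_sample = list(range(101))
cutoff = percentile_cutoff(norm_sample)

for score in (9, 10, 55):
    print(score, classify(score, cutoff))
```

Because liability is continuous under the MDM, any such cut-off is a convention: moving it changes group membership without changing the underlying distribution of reading skill.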

### **CHILDREN'S (PRE)LITERACY SKILLS**

The MDM predicts that normal-reading children with a family risk perform slightly more poorly on reading and spelling than normal-reading children without such risk. In addition, they are assumed to perform more poorly on reading-related skills, as there is evidence that these underlying cognitive processes of reading are also complex traits, influenced by multiple genetic and environmental factors (Petrill et al., 2006; Naples et al., 2009). Whether such a step-wise pattern was observed is indicated in the last two columns of **Table 1**.

At the end of Grade 2, the at-risk children with dyslexia were severely impaired compared to control children on all measures of accuracy and fluency of (pseudo)word reading (van Bergen et al., 2012; **Table 1**). In addition, they made many errors in spelling words. Although the at-risk group without dyslexia had literacy skills within the normal range for their age, they read significantly less accurately and fluently than controls on all of these reading measures. The same step pattern was found for spelling. Thus, the MDM-based hypothesis that the at-risk no-dyslexia group would take up an intermediate position between the other two was confirmed.

Importantly, we also found a stepwise pattern in the frequency of the comorbid disorder of dyscalculia (van Bergen et al., 2014a). Of the dyslexic children, 42% performed below the 10th percentile on a calculation fluency test. In the FR-nondyslexic group this was 20%, which was significantly above the 8% in the group of control children. Such a stepwise pattern is to be expected as, according to the MDM, comorbidity is due to shared risk factors of both disorders and, consequently, a familial risk for one disorder also leads to an elevated risk for the other disorder.

With regard to the reading-related skills, we included the most important precursors and correlates of dyslexia: phonological awareness (i.e., the blending and segmentation of speech sounds), rapid naming of familiar items (i.e., colors and digits) and letter knowledge. Letter knowledge was assessed at the end of kindergarten (age 5 or 6), before the start of reading instruction. The at-risk dyslexic group lagged behind on letter knowledge, whereas the at-risk children without later dyslexia showed a normal level of knowledge. The absence of a stepwise pattern is not in accordance with the MDM. However, it could be argued that letter knowledge should be regarded as belonging to the symptom level, being a forerunner or autoregressor of reading.

Phonological awareness and rapid naming were assessed at the end of kindergarten (age 5 or 6) and at the end of Grade 2. On both occasions the findings were similar. The at-risk children without later dyslexia showed normal rapid naming, but performed below controls on phonological awareness. The at-risk dyslexic group was impaired on both skills as compared to the other two groups. Note that because the cognitive deficiencies in the dyslexic group were already in place in kindergarten, before the start of reading instruction, they are due to etiological factors rather than being the consequence of poor reading and less print exposure.

Apparently, phonological awareness is associated with both reading and risk status, while rapid naming is only related to reading status. The fact that rapid naming does not fit the MDM prediction in the DDP and in the family risk study of Moll et al. (2013) calls for an explanation. One possibility is that the at-risk children who go on to develop normal reading skills might do well despite their family risk because the efficiency of the processes that rapid naming taps might protect them against dysfluent reading. Their mild literacy problems could be due to their mild phonological awareness deficit. Another possibility is that, in contrast to the protective explanation, rapid naming is not a protective or risk factor, nor causally implicated, but an integral part of the reading system (see Section "The Intergenerational Multiple Deficit Model" for more on the reading system). On this view, Norton and Wolf (2012) conceptualize rapid naming as "a microcosm or mini-circuit of the later-developing reading circuitry" (p. 430).

We also examined the relation between more general abilities, verbal and nonverbal IQ, around the age of four, and reading outcome at the end of Grade 2 (see van Bergen et al., 2014b). It was found that at-risk children who go on to become dyslexic were impaired relative to controls on both verbal and nonverbal IQ, with the gap being larger for verbal IQ. The at-risk children who do not become dyslexic showed good nonverbal abilities, but their verbal IQ was slightly but significantly lower than that of controls. For a discussion about the nature of the link between early IQ and subsequent reading the interested reader is referred to van Bergen et al. (2014b).

In the MDM comorbidity is explained by shared risk factors. To pursue this issue, it was examined whether children's skills before the onset of reading instruction were specifically related to reading. It appeared that nonverbal IQ was equally strongly related to later reading achievement (e.g., word-reading fluency) as to later arithmetic achievement (e.g., arithmetic fluency), while verbal IQ was specifically predictive of reading. With respect to the preliteracy skills, all were shown to be predictive of later arithmetic achievement as well. Rapid naming was equally strongly related to reading and arithmetic, but phonological awareness and letter knowledge were more specific precursors of reading (van Bergen et al., 2014a). Thus, some of the cognitive processes of importance to reading are also important for arithmetic, whereas others are distinct to reading. This is in line with the MDM (Pennington, 2006), the generalist genes hypothesis (Plomin and Kovas, 2005; Kovas and Plomin, 2007) and hence also with the hybrid model (**Figure 2**). Nonverbal IQ and rapid naming are shared and therefore contribute to the correlation between arithmetic and reading. Likewise, at the lower end of the distribution, they contribute to the comorbidity between dyscalculia and dyslexia. Verbal IQ, phonological awareness, and letter knowledge were found to be skill-specific cognitive processes, contributing to the dissociation between arithmetic and reading. Rapid naming is an interesting case, as part of what it taps is shared between reading and arithmetic, but it also measures processes specific for each of the two academic domains (see **Table 1**).

*Table 1 (fragment):* Non-dyslexic parent | Self-reported literacy difficulties<sup>d</sup> | –– | –– | FRD < FRND (= C) | P

<sup>a</sup>*Reading = word-reading fluency, arithmetic = arithmetic fluency.*

<sup>b</sup>*Dyslexia status was assessed using word-reading fluency at the end of Grade 2 or halfway through Grade 3. Regarding the parents of the control children, the best-reading parent features in the non-dyslexic parent comparison, and the other in the dyslexic-parent comparisons.*

<sup>c</sup>*Reported in van Bergen et al. (2014b).*

<sup>d</sup>*Reported in van Bergen et al. (2014a).*

<sup>e</sup>*Reported in van Bergen et al. (2012).*

*FRD = familial-risk dyslexia, FRND = familial-risk no-dyslexia, C = control, MDM = multiple deficit model.*

### **FAMILY CHARACTERISTICS**

In addition to predictors of dyslexia residing in children we examined possible predictors in their families. More specifically, we studied effects of home literacy environment and parental literacy skills on children's reading outcome.

### *Home literacy environment*

In short, the three groups did not differ on cognitive stimulation by parents, but there was a tendency for parents of control children to own more magazines, newspapers and books. The two at-risk groups did not differ in any of the measures of home literacy environment (van Bergen et al., 2014a). Our findings are in agreement with findings from other family risk studies, which also failed to show effects of home literacy environment on children's reading outcome (Elbro et al., 1998; Snowling et al., 2007; Torppa et al., 2007; van Bergen et al., 2011). Thus, no environmental risk factors of substantial effect have been identified that would have been easy targets for intervention. Although behavioural genetic studies point to substantial heritability of reading, they also estimate that roughly 30% of individual differences are due to environmental factors (Petrill et al., 2006; Taylor et al., 2010; Olson et al., 2014c). The moderate total environmental influence and small to negligible shared-environmental influence do not leave much room to find effects of home literacy environment. Other environmental factors also warrant further investigation, such as pre- and perinatal factors and school and classroom characteristics.

### *Parents' literacy skills*

The key innovation of the DDP family risk study is probably the inclusion of cognitive abilities of the parents. We went beyond using parental literacy for the sole purpose of dichotomizing children into high and low family risk samples by examining the relation between reading and reading-related skills of the parents and reading skills of their children. We had objective measures of the parents with dyslexia. Although all children in the at-risk sample have a parent with dyslexia, they might still vary in their degree of family risk for dyslexia. We tested this by comparing the groups of at-risk children with and without dyslexia on the reading skills of their parent with dyslexia. Since parents pass on their genes to their offspring and shape their environment, parental reading skills might be taken as an indicator of the offspring's liability to dyslexia.

In a previous family risk study (van Bergen et al., 2011) the dyslexia of the parents of the affected children was more severe than the dyslexia of the parents of the unaffected children, yielding the stepwise pattern predicted by the MDM. This is a striking finding, because the affected parents read on average at the fifth percentile compared to national norms. Yet even in this restricted range group differences were observable.

In the DDP sample the difference between the at-risk children with and without dyslexia was replicated for the affected parent's word-reading fluency (see van Bergen et al., 2012). The two at-risk groups did not differ in parental pseudoword reading. Nor did they differ in spelling or non-word repetition, though both groups were impaired compared to controls. Interestingly, however, the parents of the at-risk dyslexia children were slower on rapid naming than those of the at-risk no-dyslexia children. This underscores the special role of rapid naming, at least in transparent orthographies.

The two above-mentioned studies reported data on the parent with dyslexia. The study of van Bergen et al. (2014a) completes this by examining the influence of the parent *without* dyslexia for the first time. As hypothesized, for the non-dyslexic parents too there was a difference between the two at-risk groups: the parents of the affected children reported more literacy difficulties compared to those of the unaffected children.

The results concerning the unaffected parent further support the conclusion that children at family risk for dyslexia differ in their liability, as indicated by differences in parental reading skills between at-risk children with and without dyslexia. Moreover, differences between the two family risk groups in the severity of the dyslexia of the affected parent have now been replicated in Finnish. Torppa et al. (2011) showed differences in parental reading fluency, accuracy, and spelling.

Do the findings regarding precursors in families lend support for the MDM? According to this model, the etiology of dyslexia (and other developmental disorders) is multifactorial and probabilistic. Multiple genetic risk variants interact with each other and with multiple environmental risks to ultimately produce the disorder at the behavioural level. Some environmental factors were measured directly but did not have an effect. Genetic risk factors were not measured directly. Although there is now a huge body of evidence indicating that genes contribute importantly to individual differences in reading ability (Hayiou-Thomas et al., 2010; Olson et al., 2014c), the specific gene variants found thus far only explain a tiny part of these differences (see Bishop, 2009, for the example of the *KIAA0319* gene), despite substantive work in the field of molecular genetics (for a recent overview, see Carrion-Castillo et al., 2013). This phenomenon also applies to other common traits and is called the mystery of the missing heritability (see e.g., Manolio et al., 2009). Genetic screening is therefore not (yet) informative about a child's genetic vulnerability to dyslexia (Bishop, 2009). Instead, we propose that since parents pass on their genetic material to their offspring and shape their environment, cognitive abilities of parents could be used as an overall indicator of the genetic and environmental risk and protective factors in the MDM.

The DDP study provides two kinds of support for parental skills being an indicator for children's liability. First, as in other family risk studies, two samples of children were recruited based on having or not having a parent with dyslexia. The current and previous family risk studies (Scarborough, 1990; Elbro et al., 1998; Pennington and Lefly, 2001; Snowling et al., 2003; Torppa et al., 2010) found a large effect of having a family history on children's risk of becoming dyslexic. For example, in the DDP study it was found that the rate of dyslexia was 30% in the high-risk group and only 3% in the low-risk groups (van Bergen et al., 2012). Thus, having a parent with dyslexia increases the risk considerably. Secondly, within the at-risk sample it was found that affected *and* unaffected parents of the affected children had more literacy problems than those of the unaffected children. Moreover, when considering at-risk children's reading fluency on a continuous scale (rather than having or not having dyslexia), parental reading skills were significant predictors of children's reading skills.
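The size of the family-risk effect can be made concrete with a little arithmetic. The sketch below takes only the two rates quoted above (30% and 3%) and derives a risk ratio and an odds ratio from them. The function names are illustrative and do not come from the DDP papers.

```python
def risk_ratio(p_at_risk: float, p_control: float) -> float:
    """Ratio of the dyslexia rate in the at-risk group to that in the control group."""
    return p_at_risk / p_control


def odds_ratio(p_at_risk: float, p_control: float) -> float:
    """Ratio of the odds p / (1 - p) in the two groups."""
    odds_at_risk = p_at_risk / (1.0 - p_at_risk)
    odds_control = p_control / (1.0 - p_control)
    return odds_at_risk / odds_control


# Rates quoted in the text for the DDP sample (van Bergen et al., 2012).
HIGH_RISK_RATE = 0.30
LOW_RISK_RATE = 0.03

print(f"risk ratio: {risk_ratio(HIGH_RISK_RATE, LOW_RISK_RATE):.1f}")
print(f"odds ratio: {odds_ratio(HIGH_RISK_RATE, LOW_RISK_RATE):.1f}")
```

On these figures, a child with a dyslexic parent is roughly ten times as likely to become dyslexic as a child without one. The odds ratio is somewhat larger than the risk ratio because the outcome is not rare in the at-risk group.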

Our findings thus support the view that skills of parents indicate their offspring's liability, which in itself is the combination of all genetic and environmental factors that affect reading development. Therefore, parental skills might shed light on the etiological level in the MDM. But based on parental skills it is not possible to disentangle the genetic and environmental contribution to the intergenerational transmission of skills. However, according to our data the transmission of risk seems to be mainly via genes, including gene-environment correlation (see also Figure 1 in Lyytinen et al., 1998). It is important to note that genes are inherited, not phenotypic traits. Thus, although the DDP data reinforce the view that parental skills are indicative of their offspring's liability, parental skills will never completely specify it.

## **THE INTERGENERATIONAL MULTIPLE DEFICIT MODEL**

In our opinion there are two omissions in the MDM (Pennington, 2006) when applied to dyslexia. The first one – only touched upon here – relates to the reading system and the second to intergenerational transfer, which will be discussed in the remainder of this article.

First, when modeling reading ability and disability, the boxes at the level of cognitive processes in the MDM are typically thought of as precursors or correlates of reading, such as phonological awareness and rapid naming (e.g., Willcutt et al., 2010), or the cognitive components that each of these tasks taps. How these tasks and their components relate to reading outcome is extensively studied. In parallel, there is an extensive body of research into computational models of the reading system, in which visual word recognition is simulated (e.g., Coltheart et al., 2001; Peterson et al., 2013; Ziegler et al., 2014). Computational models are evaluated by how well they predict experimentally observed characteristics of the reading system, like lexicality and word length effects. The reading process is an important link between underlying cognitive processes (such as phonological awareness and rapid naming) and the outcome of the reading process, reading accuracy and speed. Hence, in an MDM of reading (dis)ability, the cognitive level could be split up into underlying cognitive processes and the reading system (see Jackson and Coltheart, 2001). Research into underlying cognitive processes and research into the reading system have developed separately, but van den Boer et al. (2013) recently made an important first step in linking these fields.

Secondly and applicable to all complex developmental disorders, intergenerational transmission of risk and protective factors is not explicitly present in the MDM, as it focusses on an individual child. Therefore, we propose an extension of the MDM: the intergenerational MDM (iMDM). Below we will elaborate on this model that is depicted in **Figure 3**. In the figure it can be seen that a top layer is added to Pennington's MDM, which represents characteristics of parents. The environment as created by parents is included in the top layer; other environmental factors are placed on the side. Note that again influences between child layers are omitted from the figure.

Cognitive abilities of parents, for instance reading ability, form part of their phenotype (PT in **Figure 3**). Their phenotype is the result of their genotype (GT) in interaction with their environment. Genes do not code for cognitive and behavioural traits but for the structure of proteins and the regulation of gene expression, which in highly complex ways and in interaction with the environment guides the building and maintenance of the brain (Fisher, 2006; Fisher and Francks, 2006). Despite this gap between genes and cognition, for traits that show genetic influences in behavioural genetic studies there must be a relationship between genotypic and phenotypic variation. In other words, for heritable traits parental phenotype is a proxy for their genotype. As both parents pass on half of their genes to their offspring, the genotype of both of them determines the genotype of their offspring. It follows that the phenotype of parents must be related to some extent to the genotype of children, which includes children's genetic risk and protective factors for a particular developmental disorder.
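The reasoning above, that a parent's phenotype carries information about the child's genotype for heritable traits, can be illustrated with a small simulation. This is a minimal sketch under strong simplifying assumptions that are not part of the iMDM itself: a purely additive genetic model, random mating, no cultural transmission, and a hypothetical heritability of 0.6. All names and values are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_families(n_families, h2, seed=0):
    """Simulate maternal phenotypes (PTm) and child genotypes.

    Assumptions (illustrative only): phenotypic variance is 1, additive
    genetic variance is h2, mating is random, no cultural transmission.
    """
    rng = random.Random(seed)
    sd_g = math.sqrt(h2)          # SD of additive genetic values
    sd_e = math.sqrt(1.0 - h2)    # SD of non-genetic influences
    sd_seg = math.sqrt(h2 / 2.0)  # SD of segregation around the mid-parent value
    mother_pt, child_gt = [], []
    for _ in range(n_families):
        g_m = rng.gauss(0.0, sd_g)  # maternal genotype (GTm)
        g_p = rng.gauss(0.0, sd_g)  # paternal genotype (GTp)
        mother_pt.append(g_m + rng.gauss(0.0, sd_e))
        # Each parent contributes half; segregation adds within-family variation.
        child_gt.append(0.5 * (g_m + g_p) + rng.gauss(0.0, sd_seg))
    return mother_pt, child_gt


H2 = 0.6  # hypothetical heritability, chosen for illustration
mother_pt, child_gt = simulate_families(20_000, H2)
r = pearson(mother_pt, child_gt)
print(f"simulated r(PTm, child genotype) = {r:.2f}")
print(f"expected under this model: 0.5 * sqrt(h2) = {0.5 * math.sqrt(H2):.2f}")
```

Under these assumptions the correlation between one parent's phenotype and the child's genotype is half the square root of the heritability. It is therefore clearly above zero but far from one, in line with the point that parental skills are informative about, but never fully specify, offspring liability.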

In addition to transmission of parental skills via genetic pathways, parental skills could be passed on via environmental pathways. Parents largely shape their children's childhood environment, which creates a relation between parents' characteristics and children's environment. This environment could exert a direct environmental effect (i.e., genetically unconfounded), referred to as *cultural transmission*. Hence, the cognitive phenotype of parents could be one of the factors that determines children's environmental risk and protective factors. The environment created by parents could also be correlated with the genotype of both parents and offspring, creating what is called (from the offspring's perspective) a *passive gene-environment correlation*. For example, good reading parents are more likely to spend a lot of time reading, thereby providing a role model to their children. Moreover, they appear to be better educated, and as a result, might live in better neighborhoods and might send their children to higher achieving schools. The family environment a child is exposed to can also be correlated with both generations' genotypes by parental behavior elicited by the child. Sticking with our reading-ability example, children genetically inclined to become fluent readers may be more likely to ask to be read to early on and ask for books and library visits later on. This phenomenon is termed *evocative gene-environment correlation*. Other aspects of the phenotype of parents might also be associated with or directly influence children's reading development. For instance, the behavior of parents and the interaction between them determines how structured or chaotic the household is, a factor that has been shown to be related to children's school performance (Hanscombe et al., 2011). Apart from genetic and cultural transmission, a third contributor to parent-child resemblance is shared *environmental confound* (D'Onofrio et al., 2003). In the case of reading, poverty could limit access to printed and digital reading material, which could affect reading ability in both generations. In conclusion, the phenotype of parents must also be related to a certain degree to children's environmental risk and protective factors. In addition, the mechanisms discussed highlight that measures of the environment that relate to child outcome may be attributable to familial confounding, rather than causation.

**FIGURE 3 | The intergenerational multiple deficit model.** Double-headed arrows indicate interactions. Causal connections between levels of analyses are omitted. GTm = maternal genotype, PTm = maternal phenotype, GTp = paternal genotype, PTp = paternal phenotype, G = genetic risk or protective factor, N = neural system, C = cognitive process, D = complex behavioural disorder, env. = environmental, rGE = gene-environment correlation. Terminology: a *phenotype* is any measurable characteristic of an individual (e.g., reading ability or parenting style); a *genotype* is an individual's genetic makeup. There is *shared environmental confound* if an environmental factor influences both the parental and child phenotype. *Genetic transmission* refers to the genotypic factors passed down from parent to offspring that influence the phenotypes in both generations. *Cultural transmission* is the genuine environmental influence of parental characteristics on child outcome, that is, controlled for environmental and genetic confounds. *Assortative mating* is non-random mating. *Gene-environment correlation* (rGE) refers to the situation in which exposure to environments is not independent of but correlated with the child's genotype (see the text for explanation about the three forms of rGE). The figure depicts the situation for one individual child and his/her (biological) parents. At the group level (i.e., multiple children), a second form of gene-environment interplay emerges: *gene-environment interaction*. That is, heredity depends on the environment, or sensitivity to the environment depends on genotype.

Given the above two lines of reasoning, the phenotype of parents is informative about children's genetic and environmental factors. Focussing on developmental disorders, this suggests that certain aspects of the phenotype of parents can inform us about a child's liability to a particular developmental disorder. Regarding dyslexia, the phenotypic aspects of parents that are expected to shed light on children's liability to dyslexia are skills in accurate and fluent reading, spelling, and their cognitive underpinnings like phonological awareness and rapid naming. Related skills (such as language and arithmetic) and their underlying cognitive abilities might also play a role. The ability of parents on each of the relevant continua can be conceptualized as a position in multivariate space. The position of father and mother in multivariate space is proposed to be indicative of a child's predisposition towards dyslexia.

Apart from environmental exposure closely linked to parental characteristics, children experience other environmental factors that influence their development. In **Figure 3** these extra-parental influences are put in the box on the side. They can influence all four child levels. In the case of reading, one can think of quality of the school and teacher, reading-instruction method, access to print and digital media, and factors related to child development in general, like (other) caretakers, peer influences, accidents, nutrition, and toxic threats.

Infants' environment is almost exclusively shaped by their parents, but as children grow older their environment becomes increasingly shaped independently of their parents. This happens first through growing independence, ranging from acquiring locomotion to living independently, and second, relatedly, through spending more and more time away from parents. This illustrates that children more and more actively select and create their environment. If this environmental exposure is correlated with the child's genotype, this is called *active gene-environment correlation*. For instance, children with a high genetic potential for good reading may actively seek out opportunities to read. Children's genetically influenced abilities may also elicit environmental responses from people other than their parents. For example, good readers may be given more difficult reading material by teachers, a form of *evocative gene-environment correlation*.

A form of gene-environment interplay not discussed so far is *gene-environment interaction*. This refers to a moderator phenomenon in which sensitivity to an environment depends on one's genotype (e.g., resilience to poor education), or the corollary, heritability depends on environmental exposure. An example of the latter in dyslexia research is a study by Friend et al. (2008), who found higher heritability of dyslexia among children from high compared to low socio-economic status. Gene-environment interaction is a group-level phenomenon and is therefore not depicted in the iMDM (**Figure 3**), which displays processes within the triad of an individual child and his/her (biological) parents, as well as the child-specific environment. As an illustration, the findings of Friend et al. would translate into an iMDM with strong genetic transmission for dyslexia predisposition in a child from a high socio-economic status family.

The iMDM is inspired by the described DDP study on dyslexia, but is generally applicable to other multifactorial developmental disorders with a genetic component. Examples of such disorders include ADHD, developmental coordination disorder, dyscalculia, SLI, and autism spectrum disorder. With respect to autism spectrum disorder, a number of studies (e.g., Happé et al., 2001; Bölte and Poustka, 2006; Losh et al., 2009) have studied the cognitive phenotype of parents of probands (as opposed to children of probands, as in family risk studies of dyslexia) and found in parents impairments similar to, but milder than, those in their children, indicating parent-child resemblance. A second example concerns SLI. Bishop's group found that language skills of probands and their parents were correlated (Bishop et al., 2012) and that a parent's non-word repetition ability was a predictor of whether the child would develop SLI (Bishop et al., 2012). These examples provide evidence of intergenerational transfer of cognitive skills other than reading.

The advantage of generally applicable MDMs comes however at a cost. First, the model is still empty and has to be specified for each particular (set of) developmental disorder(s). Candidate ingredients for the case of dyslexia are given throughout the current paper for the cognitive level. For the genetic level, the reader is referred to Carrion-Castillo et al. (2013), for the neural-system level to Richlan (2012), and for bridging these levels to Giraud and Ramus (2013). Second, the model (as depicted in **Figure 3**) is difficult to prove wrong. Still, the iMDM can inform the building of structural equation models for family data. Competing models can be tested to see which model best fits the observed data. Importantly, for (a) specific developmental disorder(s) the iMDM can therefore be falsified.

### **FUTURE AVENUES OF RESEARCH**

Although Pennington's MDM as such is difficult to falsify, it has initiated a large body of research and our hope is that the intergenerational extension will further fuel this movement. Pennington's model has stimulated research in which more than one level of analysis is incorporated (vertical expansion) and has especially boosted horizontal expansion of studies, investigating more than one disorder simultaneously to understand comorbidity. By doing so, one can uncover shared and distinct risk factors at each of the levels of explanation. This not only helps to understand the origin of the comorbidity, but also the developmental paths leading to each of the disorders. For instance, a developmental disorder could develop secondarily as a result of a primary disorder, or the two could co-occur because of shared etiological factors (as evidenced by genetic correlations and environmental correlations). In examining the specificity of precursors for dyslexia we included arithmetic and dyscalculia in the DDP, but comorbidities with other developmental disorders were not investigated. Including more than just a single (dis)ability in future work will enhance our understanding.

From a practical point of view, the iMDM sheds light on an additional way to estimate disorder risk: not only do an individual child's precursors to a certain disorder carry predictive power to identify young children at risk, but the cognitive and behavioural profile of their parents also indicates risk. Studying intergenerational transfer and comorbidity can also be combined: parents might confer risks in different cognitive domains, like reading, language, and attention and additionally, parenting practices might not be optimal. Therefore, future studies are needed to test whether a more complete picture of parents' cognitive and behavioural profile yields a more reliable indication of their offspring's liability to a particular disorder. Reliable assessment of liability is of clinical importance: young children identified as at high risk can be enrolled in an intervention programme to ameliorate the risk. In the case of dyslexia, Zijlstra et al. (under review) showed in high risk children that trying to prevent reading difficulties works better than remediating once children lag behind substantially.

Apart from inspiring clinical work, the iMDM draws attention to an interesting area for future fundamental research: investigating transmission from one generation to the next. Research incorporating phenotypic characteristics of parents alongside one or more analysis levels in offspring is still sparse. We have discussed some family studies that revealed traits that show intergenerational correlations. Future family studies can further explore familial transmission patterns to identify relevant parental phenotypes and quantify phenotypic intergenerational associations. If both parents are assessed, one can test firstly whether this intergenerational association is moderated by the gender of parent and/or child. For example, van Bergen et al. (2014c) recently reported that paternal reading ability (as indicator for offspring's genetic liability) was a better predictor of offspring's reading ability for higher-educated fathers. Interestingly, this interaction was absent for mothers. The observed interaction for fathers is in line with the gene-environment interaction between socio-economic status and heritability of dyslexia found by Friend et al. (2008). The differential pattern for fathers and mothers demonstrates that parental influences can be parent specific.
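The idea of a moderated intergenerational association can be sketched in code: if parental education moderates transmission, the regression slope of child reading on father reading differs between education groups. The example below simulates standardized scores with hypothetical slopes (these are not estimates from van Bergen et al., 2014c) and recovers the slope per group with ordinary least squares.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on a single predictor x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_group(n, true_slope, seed):
    """Simulate standardized father and child reading scores for one group."""
    rng = random.Random(seed)
    fathers = [rng.gauss(0.0, 1.0) for _ in range(n)]
    children = [true_slope * f + rng.gauss(0.0, 1.0) for f in fathers]
    return fathers, children


# Hypothetical slopes for illustration only; not taken from any study.
SLOPE_LOWER_EDU = 0.2
SLOPE_HIGHER_EDU = 0.5

low = simulate_group(5_000, SLOPE_LOWER_EDU, seed=1)
high = simulate_group(5_000, SLOPE_HIGHER_EDU, seed=2)
slope_low = ols_slope(*low)
slope_high = ols_slope(*high)

print(f"lower-educated fathers:  slope = {slope_low:.2f}")
print(f"higher-educated fathers: slope = {slope_high:.2f}")
print(f"difference (moderation): {slope_high - slope_low:.2f}")
```

In an actual analysis one would more likely fit a single model with a parent-reading by education interaction term and test that term. The per-group slopes are used here only because they make the moderation easy to see.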

Secondly, if data on both parents are collected one can test whether there is *assortative mating* for the trait under study (see the correlation in **Figure 3** between maternal and paternal phenotypes). For level of education for example, the intuitive idea that people tend to choose a partner with similar academic attainment has been confirmed (e.g., Mare, 1991). Regardless of the iMDM it is important to establish the degree of assortative mating because it biases heritability estimates if not accounted for (Plomin et al., 2008, p. 160).

Genetically informed family studies can ultimately disentangle the contributions of causal genetic and environmental effects and gene-environment correlations to such an intergenerational correlation. Two examples of genetically sensitive family studies that rigorously investigate the mechanisms responsible for parent-offspring resemblance are studies which include MZ and DZ twin children plus their parents (nuclear twin-family design), or studies with adult MZ and DZ twins plus their offspring (children-of-twins design)<sup>1</sup>. Combining two such samples even allows for estimating cultural transmission and passive and evocative gene-environment correlation (Narusyte et al., 2008). That is, the direct environmental effect of parenting (or another parental trait) on children's outcome can be estimated while controlling for familial confounds. Regarding gene-environment correlation, the direction of effect (see **Figure 3**) can be revealed. We are unaware of such studies in the field of learning (dis)abilities, but see for an example on parental depression and offspring psychopathology Silberg et al. (2010).

The next step in a genetically sensitive family design would be to test whether parents differ in the relative quantity of cultural and genetic transmission. To start with cultural transmission, it may be expected that the parent who has the largest share in the child's upbringing exerts larger environmental influence. If parental involvement information is available, the structural equation models estimating cultural and genetic transfer could be rerun with parent couples subdivided based on involvement (rather than gender), or parental involvement could be included as a moderator. The amount of cultural transmission could also depend on the quality of the parent-child relationship and the gender of parent and child.

For genetic transmission, quantitative differences in transmission of paternal and maternal risk could arise from two mechanisms. First, susceptibility genes could show parent-of-origin effects [e.g., genomic imprinting, in which genes from mother and father have differential expression levels (Lawson et al., 2013)]. Second, susceptibility genes could be carried on sex chromosomes (X and Y). Genetically informed family studies can estimate the total genetic risk that is passed down per parent. Hence, differences in transmitted genetic risk can be tested. Molecular genetic studies are needed to investigate the biological basis of possible maternal and paternal differences. Concerning dyslexia, the well-replicated candidate genes all lie on autosomal chromosomes, although there is some evidence for a locus on the X chromosome being implicated (Carrion-Castillo et al., 2013). Parent-of-origin effects have not yet been studied in relation to dyslexia (but see for a recent example on SLI Nudel et al., 2014).

To conclude, the iMDM encourages the inclusion of parent characteristics in future studies, which will enhance our prediction of risk and understanding of common neurodevelopmental disorders. The next exciting step is to conduct genetically informative family studies, in which genetic and environmental causal effects can be separated from familial confounding. This will bring us closer to elucidating causal chains underpinning disorder etiology.

## **ACKNOWLEDGMENTS**

Elsje van Bergen is supported by a Rubicon Fellowship (446-12-005) of the Netherlands Organisation for Scientific Research (NWO) and a Junior Research Fellowship at Oriel College, University of Oxford. The Dutch Dyslexia Programme was supported by NWO grant 200-62-304. Elsje van Bergen thanks Dorothy Bishop for feedback and encouragement.

### **REFERENCES**

Bishop, D. V. M. (2009). Genes, cognition, and communication. *Ann. N. Y. Acad. Sci.* 1156, 1–18. doi: 10.1111/j.1749-6632.2009.04419.x

Bishop, D. V. M., Hardiman, M. J., and Barry, J. G. (2012). Auditory deficit as a consequence rather than endophenotype of specific language impairment: electrophysiological evidence. *PLoS ONE* 7:e35851. doi: 10.1371/journal.pone.0035851

<sup>1</sup>The logic and modeling behind these approaches are beyond the scope of this article, but the interested reader is referred to D'Onofrio et al. (2003) for the children-of-twins design and D'Onofrio et al. (2013) for an overview of genetically informed family studies.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 January 2014; accepted: 07 May 2014; published online: 02 June 2014. Citation: van Bergen E, van der Leij A and de Jong PF (2014) The intergenerational multiple deficit model and the case of dyslexia. Front. Hum. Neurosci. 8:346. doi: 10.3389/fnhum.2014.00346*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 van Bergen, van der Leij and de Jong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*