# **PSYCHOLOGICAL PERSPECTIVES ON EXPERTISE**

**Topic Editors: Guillermo Campitelli, Michael H. Connors, Merim Bilalić and David Zachary Hambrick**

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2015 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

Cover image provided by Ibbl sarl, Lausanne CH

**ISSN** 1664-8714 **ISBN** 978-2-88919-520-6 **DOI** 10.3389/978-2-88919-520-6

## *ABOUT FRONTIERS*

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

## *FRONTIERS JOURNAL SERIES*

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing.

All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## *DEDICATION TO QUALITY*

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view.

By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

## *WHAT ARE FRONTIERS RESEARCH TOPICS?*

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area!

Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## **PSYCHOLOGICAL PERSPECTIVES ON EXPERTISE**

Topic Editors:

**Guillermo Campitelli,** Edith Cowan University, Australia

**Michael H. Connors,** Macquarie University, Australia

**Merim Bilalić,** Alpen Adria University Klagenfurt, Austria

**David Zachary Hambrick,** Michigan State University, USA

Experts are persons who are very knowledgeable about or skilful in a particular area. The aim of this Research Topic is to advance knowledge in the understanding of the phenomenon of expertise by putting together different lines of research that directly or indirectly study expertise.

Herbert Simon's expertise studies initiated two lines of research. One is interested in elucidating the cognitive processes underlying expertise, and the other investigates how expertise develops. These lines of research started with studies comparing experts and novices in chess, and then they extended to numerous areas of expertise such as music, medical diagnosis, sports, arts and sciences.

In the field of judgment and decision making researchers investigate the quality of judgments and decisions of experts in different professions (e.g., clinical psychologists, medical practitioners, judges, meteorologists, stock brokers).

Those lines of research explicitly investigate the topic of expertise, but there are other research areas that make a substantial contribution to understanding expertise. Scholars in language acquisition and in face perception, for example, investigate cognitive processes and development of expertise in areas in which almost everyone becomes an expert. Furthermore, skill acquisition research informs in detail about short term cognitive changes that may be important to understand how expertise develops.

We are interested in original research that advances knowledge in the understanding of decision making, cognitive processes and development of expertise in sports, intellectual games, arts, scientific disciplines and professions, as well as expertise in cognitive abilities such as perception, memory, attention, language and imagery.

We are also interested in theoretical articles in any of these areas, articles that describe computational or mathematical models of expertise, and articles offering a framework that would guide expertise research. Articles that offer integrative approaches of some of the areas described above are strongly encouraged.

The goal of this Research Topic is to produce a hallmark piece of work in the field of expertise, which complements and does not overlap with the "Neural implementations of expertise" Research Topic in Frontiers in Human Neuroscience.

# Table of Contents

*Psychological Perspectives on Expertise*

Guillermo Campitelli, Michael H. Connors, Merim Bilalić and David Z. Hambrick

Clare McCormack, Mark W. Wiggins, Thomas Loveday and Marino Festa

Jessica J. Ellis and Eyal M. Reingold

*Interference between Face and Non-Face Domains of Perceptual Expertise: A Replication and Extension*

Kim M. Curby and Isabel Gauthier

Bettina E. Bläsing, Iris Güldenpenning, Dirk Koester and Thomas Schack

*Timing Skills and Expertise: Discrete and Continuous Timed Movements Among Musicians and Athletes*

Thenille Braun Janzen, William Forde Thompson, Paolo Ammirante and Ronald Ranvaud

*Trait-Based Cue Utilization and Initial Skill Acquisition: Implications for Models of the Progression to Expertise*

Mark W. Wiggins, Sue Brouwers, Joel Davies and Thomas Loveday

*Understanding Expertise and Non-Analytic Cognition in Fingerprint Discriminations Made by Humans*

Matthew B. Thompson, Jason M. Tangen and Rachel A. Searston




## Psychological perspectives on expertise

Guillermo Campitelli <sup>1</sup> \*, Michael H. Connors <sup>2, 3</sup>, Merim Bilalić <sup>4</sup> and David Z. Hambrick <sup>5</sup>

*<sup>1</sup> School of Psychology and Social Science, Edith Cowan University, Joondalup, WA, Australia, <sup>2</sup> Department of Cognitive Science, ARC Centre of Excellence in Cognition and its Disorders, Macquarie University, Sydney, NSW, Australia, <sup>3</sup> Dementia Collaborative Research Centre, School of Psychiatry, University of New South Wales, Sydney, NSW, Australia, <sup>4</sup> Department of General Psychology and Cognitive Science, Institute of Psychology, Alpen Adria University Klagenfurt, Klagenfurt, Austria, <sup>5</sup> Department of Psychology, Michigan State University, East Lansing, MI, USA*

Keywords: expertise, expert performance, expert cognitive processes, skill acquisition, skill transfer, training, deliberate practice

## Introduction

This Research Topic sought to advance psychological understanding of expertise by drawing together lines of research from many different domains of expertise. The outcome is a collection of 35 articles in such diverse areas as chess, music, perception, teaching, intensive-care diagnosis, video-games, sports, dance, mathematics, climbing, and fingerprint analysis.

The articles can be classed into five broad categories based on their focus: (a) the cognitive processes in expertise, (b) the development of expertise, (c) the relationship between expertise and general cognition, (d) the transfer of skills between domains, and (e) methodological issues and frameworks in expertise research. We give a brief overview of the research across these five themes.

Edited and reviewed by: *Bernhard Hommel, Leiden University, Netherlands*

\*Correspondence: *Guillermo Campitelli, gjcampitelli@gmail.com*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *20 February 2015*; Accepted: *21 February 2015*; Published: *10 March 2015*

#### Citation:

*Campitelli G, Connors MH, Bilalić M and Hambrick DZ (2015) Psychological perspectives on expertise. Front. Psychol. 6:258. doi: 10.3389/fpsyg.2015.00258*

## Cognitive Processes in Expertise

Articles in this research topic used a number of different methodologies to investigate cognitive processes in expertise. Four articles examined experts' eye movements as a way of studying what experts focus on when performing domain-relevant tasks. Sheridan and Reingold (2014), for example, found that expert chess players rapidly differentiate regions of the board that are relevant to the best move from irrelevant ones. Similarly, McCormack et al. (2014) found that expert intensive-care physicians directed their attention to more relevant areas of the situation, compared to competent non-experts. In addition, Godau et al. (2014) found that experts in arithmetic problem solving spontaneously used arithmetic shortcuts. Finally, Ellis and Reingold (2014) examined the Einstellung effect (i.e., where the first idea that comes to mind blocks finding the best solution to a problem) using this methodology and noted its relevance to understanding expert flexibility (see Bilalić and McLeod, 2014).

Two articles focused on perceptual expertise. Curby and Gauthier (2014) found that acquiring expertise with a category of stimuli (i.e., car expertise) increases the interference between the visual processing of other familiar stimuli (e.g., faces) and that of the learned category (cars). In a study with novel objects, Cheung and Gauthier (2014) found that acquiring perceptual expertise involves integrating perceptual and conceptual representations of stimuli.

Four studies investigated expertise involving physical movements. In a study on dancing, Bläsing (2015) found that making a sequence of movements influenced the subsequent perception of that sequence, but not to the same degree if one was a dancing expert. In a study of climbing experts, Bläsing et al. (2014) found that expertise was associated with better perception of climbing holds and action-relevant objects. In a study of athletes and musicians, Braun Janzen et al. (2014) showed that training affects performance involving timing and rhythmic movements: athletes were more precise making continuous movements, whereas musicians were more precise for discrete movements. In a study of skill acquisition in a flight simulator, Wiggins et al. (2014) found that a general capacity for acquiring and using cues was related to performance in landing an aircraft in the simulator.

Finally, four articles focused on experts' pattern recognition (the ability to identify meaningful relationships in complex stimuli). In a review of research on fingerprint experts, Thompson et al. (2014) concluded that such expertise relies on rapid pattern recognition and discrimination rather than on analytic thinking. In an observational study, Kretz and Krawczyk (2014) found that academic economists use many analogies in research meetings. Trench (2014), however, suggests that these results may be due to the naturalistic setting of the study, rather than expertise per se. Bialek and Sawicki (2014) showed that participants asked to take an expert perspective become more risk averse and patient in decision-making tasks. Finally, Leone et al. (2014) examined the relationship between expertise and representations of space using a large dataset of chess games from an internet server. They found that novices, relative to experts, use strategies to reduce their cognitive load (see Connors and Campitelli, 2014, for a commentary).

## Development of Expertise

Six studies examined how expertise is developed. Gaschler et al. (2014) examined the learning curves in skill acquisition by analyzing the tournament performance of 1383 chess players over 10 years. They found that exponential learning curves fitted players' improvements over time better than power-function learning curves. Gobet and Ereku (2014) discussed the case of Magnus Carlsen, current world chess champion, and argued that his level of performance cannot be explained by the deliberate practice account, which holds that the amount of deliberate practice is the critical determinant of expertise.
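The contrast between exponential and power-function learning curves can be made concrete with a small sketch. This is only an illustration, not the analysis used by Gaschler et al. (2014): the data are synthetic, the curves model a hypothetical performance gap that shrinks toward zero, and both fits use simple log-linear least squares. All names and values are assumptions introduced for the example.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_exponential(t, gap):
    """Fit gap = a * exp(-b * t) by regressing log(gap) on t."""
    slope, intercept = linear_fit(t, [math.log(g) for g in gap])
    return math.exp(intercept), -slope

def fit_power(t, gap):
    """Fit gap = a * t**(-b) by regressing log(gap) on log(t)."""
    slope, intercept = linear_fit([math.log(ti) for ti in t],
                                  [math.log(g) for g in gap])
    return math.exp(intercept), -slope

def sse(observed, predicted):
    """Sum of squared errors between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Hypothetical data: gap to a player's eventual rating over 10 years,
# generated from an exponential curve (invented values, not real ratings).
years = [float(y) for y in range(1, 11)]
gap = [400.0 * math.exp(-0.3 * y) for y in years]

a_exp, b_exp = fit_exponential(years, gap)
a_pow, b_pow = fit_power(years, gap)

sse_exp = sse(gap, [a_exp * math.exp(-b_exp * y) for y in years])
sse_pow = sse(gap, [a_pow * y ** (-b_pow) for y in years])
```

Because the synthetic data were generated from an exponential curve, `sse_exp` comes out smaller than `sse_pow`; with real data the comparison would have to be made on the observed ratings.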

Citing limitations in an earlier meta-analysis by Hambrick et al. (2014a), Platz et al. (2014) conducted a meta-analysis on the influence of deliberate practice in musical achievement. They found a moderate average effect size (r<sub>c</sub> = 0.61), which they interpreted as showing the importance of deliberate practice. In response to Platz et al.'s criticisms, Hambrick et al. (2014b) noted a number of conceptual problems in Platz et al.'s arguments and observed that Platz et al.'s findings can also be interpreted to show that practice, while undoubtedly an important factor in expertise, is not the sole determinant.

Healy et al. (2014) proposed a number of training principles for developing expertise. These include the acquisition of expertise (e.g., scheduling of feedback), retention of expertise (e.g., item chunking, depth of processing) and transfer (e.g., variability of practice, seeding the knowledge base). Finally, Speelman (2014) argued that treating numeracy as a form of expertise and using computer programs in teaching would address some shortcomings in current teaching and, in particular, foster a greater focus on practice and feedback in learning.

## Expertise and General Cognition

Three articles examined the relationship between expertise and general cognition. First, Gobet et al. (2014b) discussed how artificial intelligence and engineering could be used to design a brain. Based on expertise research, they proposed that a better brain would have fewer concepts and more low-level perceptual processing. Second, Guida and Lavielle-Guida (2014) combined findings from memory research with the normal population with theories of expert memory. They argued that a less sophisticated version of the spatial method of loci used by memory experts is also used by ordinary people to encode items in working memory. Third, Christophel et al. (2014) observed that amount of teaching experience is a very poor predictor of a teacher's actual effectiveness, including, for example, the teacher's ability to offer constructive feedback to students.

## Transfer of Skills

Three articles examined the transfer of skills across domains. First, Gobet et al. (2014a) investigated the possibility of training transfer from videogame playing to selective attention and working memory capacity. Consistent with over a century of research, there was no evidence for transfer, even in videogame experts. In contrast, Lampit et al. (2014) reported evidence for transfer of computerized cognitive training to a bookkeeping task. Finally, Bart (2014) reviewed research published after Gobet and Campitelli's (2006) critical review on the effects of chess education, showing statistically significant effects of chess education on academic achievement.

## Frameworks, Recommendations, and Methodology

Seven articles discussed expertise in general. First, Vaci et al. (2014) consider alternative approaches to studying expertise, and in particular, how studying only individuals from highly restricted ranges of skill may yield different findings than studying individuals who represent wider ranges of skill. Second, Kaufman (2014) identifies points of disagreement and agreement in different views of expertise and suggests some directions for future research. Third, Bourne et al. (2014) categorize expertise as elite, peak, or exceptionally high levels of performance on a particular task or within a given domain.

Fourth, Shen et al. (2014) use birdwatching as an illustrative example to discuss such issues as selecting an appropriate domain of perceptual expertise for study, recruiting experts, assessing their level of expertise, and experimentally testing the experts' performance. Fifth, MacIntyre et al. (2014) propose that athletes are not just experts in movement execution but also in planning, metacognition, and reflection. Similarly, Toner and Moran (2014), extending Sutton et al.'s (2011) framework, argue that expert athletes do not completely automatize their skills and that an important component of their expertise is to be able to rapidly reflect on their movements. Finally, de Oliveira et al. (2014) build upon Gigerenzer's (e.g., Gigerenzer and Goldstein, 1999) heuristic-based approach to decision making. They propose that expert athletes develop a toolbox of heuristics to guide their decision making.

## Conclusion

The diversity of articles in this research topic illustrates the many different approaches to studying expertise. It also indicates the keen interest in the topic. We believe that many articles in this research topic are of lasting importance and can help to guide future research in the field of expertise.

## References

and expertise on cognitive performance. Front. Psychol. 5:1337. doi: 10.3389/fpsyg.2014.01337

of the progression to expertise. Front. Psychol. 5:541. doi: 10.3389/fpsyg.2014.00541

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Campitelli, Connors, Bilalić and Hambrick. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Expert vs. novice differences in the detection of relevant information during a chess game: evidence from eye movements

#### *Heather Sheridan<sup>1</sup> \* and Eyal M. Reingold<sup>2</sup>*

*<sup>1</sup> School of Psychology, University of Southampton, UK*

*<sup>2</sup> Department of Psychology, University of Toronto at Mississauga, Mississauga, ON, Canada*

#### *Edited by:*

*Merim Bilalić, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Ramesh Kumar Mishra, University of Hyderabad, India; Jochen Musch, University of Duesseldorf, Germany; Peter McLeod, Oxford University, UK*

#### *\*Correspondence:*

*Heather Sheridan, School of Psychology, University of Southampton, Building 44, Highfield Campus, Southampton, SO17 1BJ, UK*

*e-mail: h.sheridan@soton.ac.uk*

The present study explored the ability of expert and novice chess players to rapidly distinguish between regions of a chessboard that were relevant to the best move on the board, and regions of the board that were irrelevant. Accordingly, we monitored the eye movements of expert and novice chess players, while they selected white's best move for a variety of chess problems. To manipulate relevancy, we constructed two different versions of each chess problem in the experiment, and we counterbalanced these versions across participants. These two versions of each problem were identical except that a single piece was changed from a bishop to a knight. This subtle change reversed the relevancy map of the board, such that regions that were relevant in one version of the board were now irrelevant (and vice versa). Using this paradigm, we demonstrated that both the experts and novices spent more time fixating the relevant relative to the irrelevant regions of the board. However, the experts were faster at detecting relevant information than the novices, as shown by the finding that experts (but not novices) were able to distinguish between relevant and irrelevant information during the early part of the trial. These findings further demonstrate the domain-related perceptual processing advantage of chess experts, using an experimental paradigm that allowed us to manipulate relevancy under tightly controlled conditions.

#### **Keywords: visual expertise, expert performance, chess, eye movements, attention, relevancy**

The remarkable perceptual skill of experts is exemplified by expert radiologists who can detect abnormalities in chest X-rays that were briefly presented for only 200 ms (Kundel and Nodine, 1975), and by chess experts who can memorize chessboards that were presented for only a few seconds (De Groot, 1946, 1965; Chase and Simon, 1973a,b). Furthermore, while examining visual displays that require multiple eye fixations for encoding, experts are adept at rapidly focusing their attention on relevant areas, such that radiologists can rapidly fixate on abnormalities (Kundel et al., 2008), and chess experts can rapidly fixate on the best move on a chessboard (Charness et al., 2001). Given that this fast extraction of relevant information is a key component of skilled performance in many different domains of expertise (for a review see Reingold and Sheridan, 2011), the goal of the present experiment is to further explore expert/novice differences in the detection of relevant information during a chess game. Accordingly, we will begin by briefly reviewing prior work on perceptual skill in the domain of chess, and we will then describe the present study's paradigm and rationale.

Of relevance to the present study, there is a long history of studying the perceptual component of expertise in the domain of chess (for reviews, see Charness, 1992; De Groot and Gobet, 1996; Reingold and Charness, 2005; Gobet and Charness, 2006; Reingold and Sheridan, 2011). The chess domain provides numerous methodological advantages, such as well-segmented visual stimuli for eye movement interest areas, and an official rating system for objectively quantifying levels of expertise (Elo, 1965, 1986). Capitalizing on these methodological advantages, chess expertise has been linked to numerous perceptual processing advantages, including superior memory performance for briefly presented chessboards (De Groot, 1946, 1965; Chase and Simon, 1973a,b), the ability to process chess configurations automatically and in parallel (Reingold et al., 2001b), and a larger *visual span* such that experts process larger configurations of pieces than novices (Reingold et al., 2001a). Consistent with these behavioral and eye movement findings, neuroimaging work has shown expert/novice differences in brain activation in regions associated with object and pattern recognition (Bilalić et al., 2010a, 2011a,b, 2012). Taken together, the above findings collectively support the view that perceptual skill is a key aspect of expertise in chess (De Groot, 1946, 1965; Chase and Simon, 1973a,b) as well as in other domains of visual expertise (for a review see Reingold and Sheridan, 2011).

To provide a theoretical account of the perceptual skill shown by chess experts, Chase and Simon (1973a,b) proposed that through thousands of hours of practice chess experts acquire memories for a large number of "chunks," which consist of groups of chess pieces, and these chunks are supplemented by larger memory structures called templates (Gobet and Simon, 1996, 2000). Such memory structures facilitate performance by enabling chess players to rapidly retrieve useful information, such as advantageous strategies and moves. Thus, Chase and Simon (1973a,b) argued that chess experts use their memory for chess-configurations to constrain their search for a move to the most promising candidates, rather than performing a slow and exhaustive search of all of the possible moves on the board. This theoretical perspective echoes earlier arguments by De Groot (1946, 1965) that chess expertise stems from advantages in memory and perception, rather than from a greater breadth and depth of search during problem solving.

Consistent with this hypothesis that chess experts rely on their memory for chess configurations to efficiently guide their search for the best move, the eye movements of chess experts reveal that they can rapidly fixate on information that is relevant to the best move on the board (Tikhomirov and Poznyanskaya, 1966; Simon and Barenfeld, 1969; Charness et al., 2001; Reingold and Charness, 2005). For example, to examine the impact of relevancy on eye movements, Charness et al. (2001) monitored the eye movements of expert players (mean Elo rating = 2238) and intermediate players (mean Elo = 1786) while they selected white's best move in a series of chess problems (henceforth, the *choose-a-move task*). Compared to intermediate players, the experts produced a greater proportion of fixations on pieces that were relevant to the best move, and this advantage of the experts emerged as early as the first five fixations in the trial (for a discussion of similar findings, see Tikhomirov and Poznyanskaya, 1966; Simon and Barenfeld, 1969). As a follow-up to Charness et al. (2001), Reingold and Charness (2005) analyzed the first 10 s of choose-a-move trials to demonstrate that experts rapidly completed a perceptual encoding phase (characterized by shorter fixations) and then shifted to a subsequent problem-solving stage (characterized by longer fixations). In marked contrast, the intermediates continued to display shorter fixations throughout the 10-s period, which indicates that they needed more time to complete the perceptual encoding phase. Taken together, these studies indicate that chess experts are more efficient at encoding chess configurations during a choose-a-move task.
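To give a sense of what the rating gap between the two groups in Charness et al. (2001) means, the sketch below applies the widely used logistic form of the Elo expected-score formula to the two group means. Applying the formula to group means is purely illustrative, and the function and variable names are assumptions introduced here rather than anything taken from the cited studies.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (logistic Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

expert_mean = 2238        # mean Elo of the expert group (Charness et al., 2001)
intermediate_mean = 1786  # mean Elo of the intermediate group

# Expected score per game (win = 1, draw = 0.5, loss = 0) for each side.
p_expert = expected_score(expert_mean, intermediate_mean)
p_intermediate = expected_score(intermediate_mean, expert_mean)
```

Under this formula a player at the expert mean would be expected to score roughly 0.93 points per game against a player at the intermediate mean, so the two groups are separated by a large skill difference.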

Beyond the choose-a-move task, there is further evidence that experts are better than novices at rapidly encoding relevant chess configurations. For example, in a memorization task, De Groot and Gobet (1996) demonstrated that the number and total duration of fixations on chess pieces was at least partially correlated with the relevance of these pieces to the position, and the magnitude of this correlation increased as a function of skill. Moreover, using a chess-related visual search task that required participants to search for relevant pieces on a chessboard, Bilalić et al. (2010a) revealed that chess experts (but not novices) were able to rapidly and exclusively fixate on task-relevant rather than irrelevant features. Finally, Bilalić et al. (2012) examined relevancy effects in a threat detection task, in which experts and novices had to examine chessboards to determine the number of black pieces that were attacking white pieces. The experts displayed a higher percentage of fixations on relevant objects (i.e., the pieces that formed a threat relationship) relative to novices, and this difference between experts and novices emerged as early as the first three seconds in the trial. Based on these results, Bilalić et al. (2012) concluded that the "experts' advantage lies in the ability to immediately focus on relevant objects and relations between them in the environment and ignore the irrelevant ones."

Building on this prior work, the present study introduces a new paradigm for studying relevancy effects in chess. Similar to prior work (Charness et al., 2001; Reingold and Charness, 2005; Bilalić et al., 2008a,b, 2010b; Sheridan and Reingold, 2013), the present paradigm monitored the eye movements of chess players during a choose-a-move task, which is an ecologically valid task given that it resembles the challenges confronting chess players during a chess match. To provide a well-controlled manipulation of relevancy, the present paradigm employs two counterbalanced versions of each chess problem, which differed by a single piece such that a bishop was changed to a knight (see Appendix A in Supplementary material for examples). This subtle change reversed the relevancy map of the board, such that the regions that were relevant to the best move in one version are irrelevant in the other version, and vice versa. Thus, the present paradigm extends prior work by employing large relevant and irrelevant regions of interest that were well-matched on a variety of characteristics (e.g., number of squares, location, etc.).

Our rationale for using this paradigm was to contrast the time-course of relevancy effects in both novice and expert players. Thus, we asked strong expert players (average Elo = 2223) and novices (unrated club players) to solve a series of problems that were designed to be simple enough that both the novice and expert players could frequently detect the best move on the board. In light of past findings concerning the perceptual encoding advantage of experts, we expected that the experts' eye movements would reveal an earlier differentiation between relevant and irrelevant regions. Such a finding would provide additional support for the importance of perceptual processing in chess skill, using a new paradigm that afforded a number of methodological advantages.
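One way to picture a time-course comparison of this kind is to compute, for successive time windows in a trial, the share of fixation time that falls on the relevant region rather than the irrelevant one. The sketch below is a hypothetical illustration only: the data layout, the region labels, and the 1-s window size are assumptions, and it is not the analysis procedure reported in this study.

```python
def relevant_proportion_by_window(fixations, window_ms=1000, n_windows=5):
    """Per time window, return relevant / (relevant + irrelevant) fixation time.

    `fixations` is a list of (start_ms, duration_ms, region) tuples, where
    region is "relevant", "irrelevant", or anything else (ignored).
    Windows with no time on either region yield None.
    """
    relevant = [0.0] * n_windows
    irrelevant = [0.0] * n_windows
    for start, duration, region in fixations:
        if region not in ("relevant", "irrelevant"):
            continue
        end = start + duration
        for w in range(n_windows):
            w_start, w_end = w * window_ms, (w + 1) * window_ms
            # Credit only the part of the fixation that lies inside this window.
            overlap = max(0.0, min(end, w_end) - max(start, w_start))
            if region == "relevant":
                relevant[w] += overlap
            else:
                irrelevant[w] += overlap
    proportions = []
    for rel, irr in zip(relevant, irrelevant):
        total = rel + irr
        proportions.append(rel / total if total > 0 else None)
    return proportions

# Hypothetical trial: one fixation elsewhere, one on the irrelevant region,
# then two on the relevant region.
example_trial = [
    (0, 400, "other"),
    (400, 300, "irrelevant"),
    (700, 600, "relevant"),
    (1300, 700, "relevant"),
]
time_course = relevant_proportion_by_window(example_trial, n_windows=3)
```

A value above 0.5 in an early window would indicate that the relevant region already attracts more fixation time than the irrelevant one at that point in the trial.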

## **METHODS**

## **PARTICIPANTS**

Forty-one chess players (17 experts and 24 novices) were recruited from online chess forums and from local chess clubs in Toronto and Mississauga (Canada). The mean age was 30 (*SD* = 14.2) in the expert group, and 27 (*SD* = 10.0) in the novice group. There was one female player in the expert group, and there were five female players in the novice group. For the expert players, the average CFC (Chess Federation of Canada) rating was 2223 (range = 1876–2580). All of the novice players were unrated club players who were familiar with the rules of chess, but had never participated in a rated chess tournament. All of the participants had normal or corrected-to-normal vision.

## **MATERIALS AND DESIGN**

There were eight experimental problems (see Appendix A in Supplementary material for the complete list of problems). To manipulate relevancy, we constructed two versions of each of the problems, and these two versions were identical except that a single piece was changed from a bishop to a knight. As shown in Appendix A in Supplementary material, this subtle change reversed the relevancy map of the board, such that regions that were relevant to the best move in one version were no longer relevant (and vice versa). Similar to Charness et al. (2001), relevancy was determined by asking an international master who did not participate in the study to classify the squares on the board as either relevant or irrelevant (see also De Groot and Gobet, 1996). For example, in the first version of Problem #3 (see panel 3a in Appendix A in Supplementary material), the best move on the board is "Rook to a8 (checkmate)," and the squares associated with this move are located on the left side of the board (see relevant region in orange), whereas the other side of the board contains squares that are irrelevant to the best move (see irrelevant region in blue). In contrast, in the second version of the problem (see panel 3b in Appendix A in Supplementary material) we changed a single piece from a bishop to a knight (see piece inside the dotted lines), such that the best move on the board became "Knight to f4," and the relevant and irrelevant regions were reversed. The relevant and irrelevant regions were always located near the edge of the board, and never overlapped with the center of the board. The two versions of the problems were counterbalanced such that each player only saw one version of a given problem. The same order of problems was used for all players, and each chess player completed a total of eight experimental problems. It was always white's turn to move, and the problems incorporated a variety of solutions that ranged from checkmate to material gains to defensive tactics.
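The counterbalancing described above can be sketched as a simple assignment rule. The text states only that the two versions were counterbalanced and that each player saw one version of each problem; the alternating scheme below, and the labels "a" and "b" (borrowed from the panel names in Appendix A), are assumptions chosen for illustration.

```python
def assign_versions(participant_index, n_problems=8):
    """Return the version ('a' or 'b') of each problem shown to one participant."""
    versions = []
    for problem in range(n_problems):
        # Alternate versions across problems, and flip the whole pattern
        # for every other participant.
        use_a = (participant_index + problem) % 2 == 0
        versions.append("a" if use_a else "b")
    return versions

first_player = assign_versions(0)
second_player = assign_versions(1)
```

With this rule every participant sees four "a" and four "b" versions, and any two consecutive participants see opposite versions of each problem, so both versions of every problem receive equal exposure when the number of participants is even.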

## **APPARATUS AND PROCEDURE**

Eye movements were measured with an SR Research EyeLink 1000 system with high spatial resolution and a sampling rate of 1000 Hz. The experiment was programmed and analyzed using SR Research Experiment Builder and Data Viewer software. Viewing was binocular, but only the right eye was monitored. A chin rest and forehead rest were used to minimize head movements. Following calibration, gaze-position error was less than 0.5°. The chess problems were presented using images (755 × 755 pixels) that were created using standard chess software (Chessbase 11). These images were displayed on a 21 in. ViewSonic monitor with a refresh rate of 150 Hz and a screen resolution of 1024 × 768 pixels. Participants were seated 60 cm from the monitor, and the width of one square on the chessboard equaled approximately 3.4 degrees of visual angle.
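As a rough illustration of how the reported visual angle relates to viewing distance, the following sketch applies the standard formula, angle = 2 · atan(size / (2 · distance)). The function names are hypothetical, and the physical width of a square is not reported in the text; it is back-calculated here from the stated 3.4 degrees and 60 cm purely for illustration.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by an object viewed head-on."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse of visual_angle_deg: physical size that subtends a given angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Assumption: the on-screen square width is not given in the text, so it is
# derived from the reported angle (3.4 degrees) and distance (60 cm).
square_cm = size_for_angle_cm(3.4, 60.0)  # about 3.6 cm
```

Under these assumptions, a chessboard square would need to be a little over 3.5 cm wide on the screen to subtend 3.4 degrees at 60 cm.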

Prior to the experiment, the participants were instructed to choose white's best move as quickly and as accurately as possible, and they were told that they would be given a maximum of 3 min to respond to each problem. At the start of each trial, the participants were required to look at a fixation point in the center of the screen, prior to the presentation of the chessboard. The participants were asked to press a button as soon as they had made their decision, and they then reported their move verbally to the experimenter. If 3 min elapsed prior to the button press (this occurred on less than 1% of the experimental trials for the novices, and 0% of trials for the experts), then the chessboard was removed from the screen and the chess player was prompted to immediately provide their best answer.

## **RESULTS**

Our main goal was to use eye movements to examine expert vs. novice differences in how attention was allocated to the relevant and irrelevant regions of the chessboard. However, prior to reporting the eye movement results, we first report several global measures of performance (i.e., accuracy, reaction times) as a function of the chess player's level of expertise (expert, novice).

To assess move quality, we asked an international chess master who did not participate in the study to rate the quality of each move on a scale from 1 to 10 (1 = a blunder, 10 = one of the best moves on the board), and we consulted the evaluation function from a chess engine (Houdini 2 Pro), which provides a score (expressed in pawn units) to quantify the change in White's positional advantage as a result of the move chosen. On both of these measures of accuracy, the experts outperformed the novices (*df* = 39; all *t*s > 2.0, all *p*s < 0.05). Specifically, the average move quality rating was 9.5 (*SE* = 0.12) for the experts and 6.9 (*SE* = 0.26) for the novices, and the average chess engine score was 2.1 (*SE* = 0.17) for the experts and 1.5 (*SE* = 0.16) for the novices. Moreover, the experts selected the best move on the board (i.e., a move that received a rating of 10) on an average of 93% of trials (*SE* = 2%), whereas the novices selected the best move on an average of 52% of trials (*SE* = 5%), and this expert/novice difference in accuracy was significant: *t*(39) = 7.12, *p* < 0.001. In addition, there was a marginally significant trend toward faster reaction times for the experts (*M* = 28,946 ms; *SE* = 5441 ms) than for the novices (*M* = 41,854 ms; *SE* = 4748 ms), *t*(39) = 1.78, *p* = 0.084. More interestingly, within the expert group there was a negative correlation between the mean reaction time of each player and their Elo rating (*r* = −0.617, *p* < 0.01), which indicates that increases in chess rating were associated with faster performance. Finally, the experts' (but not the novices') reaction times were significantly faster when the relevant region was on the right rather than the left side of the board, *t*(16) = 2.16, *p* < 0.05 (for similar findings, see De Groot and Gobet, 1996).

Next, we analyzed eye movements to examine the extent to which the expert and novice chess players directed their attention toward the relevant and irrelevant regions of the board. For all of the eye movement analyses reported below, we analyzed correct trials only (i.e., trials that elicited a 10-rated move), to ensure that the experts and novices were matched for accuracy. Given our interest in the time-course of relevancy effects, we began our analysis by examining an early measure of processing (i.e., *first-dwell duration*, which is the duration of the first dwell on a given region, where a dwell is defined as one or more consecutive fixations on the target region, prior to the eyes moving to a different region of the board) as well as a later measure of processing (i.e., *total time*, which is the sum of the duration of all of the dwells on a region for the entire trial). To explore the pattern of results for each measure, we examined 2 × 2 ANOVAs that included relevancy (relevant, irrelevant) as a within-subjects factor, and expertise (expert, novice) as a between-subjects factor. For both the first-dwell and total time measures, there was a main effect of relevancy (i.e., longer dwells on relevant than irrelevant regions), all *F*s > 8, all *p*s < 0.01, and a main effect of expertise (i.e., longer dwells for novices than experts), all *F*s > 60, all *p*s < 0.001. More importantly, there was a significant interaction for the first-dwell measure [*F*(1, 39) = 4.38, *p* < 0.05], but not for the total time measure (*F* < 1). As can be seen from **Figures 1A,B**, this interesting dissociation between these two measures occurred because the first-dwell measure produced significant relevancy effects for the experts [*t*(16) = 3.51, *p* < 0.01] but not for the novices (*t* < 1), whereas the total time measure produced relevancy effects for both groups (all *t*s > 2, all *p*s < 0.05). Interestingly, the interaction between expertise and relevancy for the first-dwell measure was solely due to an increase in the number of fixations for relevant compared to irrelevant regions for the experts [relevant: *M* = 2.19, *SE* = 0.21, irrelevant: *M* = 1.39, *SE* = 0.07, *t*(16) = 3.95, *p* < 0.01] but not for the novices (relevant: *M* = 2.05, *SE* = 0.16, irrelevant: *M* = 1.98, *SE* = 0.23, *t* < 1), as shown by a significant interaction, *F*(1, 39) = 8.28, *p* < 0.01. In contrast, the mean fixation duration for the first dwell did not vary as a function of relevancy or expertise (all *F*s < 1).
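The first-dwell and total time definitions given above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Data Viewer implementation used in the study: the function name, the `(region, duration_ms)` input format, the `"other"` label for fixations outside both regions, and the example values are all hypothetical.

```python
from itertools import groupby


def dwell_measures(fixations):
    """Compute per-region dwell measures from a temporally ordered list of
    (region, duration_ms) fixations.

    A dwell is a run of consecutive fixations on the same region.
    Returns {region: {"first_dwell_ms", "first_dwell_fixations", "total_time_ms"}}.
    """
    measures = {}
    for region, run in groupby(fixations, key=lambda f: f[0]):
        durations = [duration for _, duration in run]
        dwell_ms = sum(durations)
        if region not in measures:
            # First time the eyes enter this region: record the first dwell.
            measures[region] = {
                "first_dwell_ms": dwell_ms,
                "first_dwell_fixations": len(durations),
                "total_time_ms": 0,
            }
        # Total time sums every dwell on the region across the trial.
        measures[region]["total_time_ms"] += dwell_ms
    return measures


# Hypothetical trial: the labels and durations are invented for illustration.
trial = [
    ("relevant", 200), ("relevant", 250), ("irrelevant", 180),
    ("other", 300), ("relevant", 220), ("irrelevant", 210),
]
result = dwell_measures(trial)
```

In this invented trial, the first dwell on the relevant region spans two fixations (450 ms), while its total time also includes the later return to that region (670 ms).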

**FIGURE 1 | The duration of the First Dwell (A), Total Time (B), and the Cumulative Time of the first five ordinal fixations in the trial (C) as a function of relevancy (relevant vs. irrelevant) and level of expertise (expert, novice).**

The above first-dwell findings imply that experts are faster than the novices at detecting relevant information. To further explore this effect, we analyzed the first five fixations of the trial to quantify the amount of time that experts and novices spent fixating the relevant and irrelevant regions at the start of the trial (a similar analysis of the first five fixations was conducted by Charness et al., 2001). Specifically, for each fixation position ranging from one to five (fixation position one corresponded to the fixation which began following the initial saccade in the trial), we calculated the cumulative sum of all of the fixations on the relevant and irrelevant regions up to and including the current fixation position. This analysis was conducted separately for each participant and each condition (i.e., relevant vs. irrelevant), and then averaged across participants to produce the plot shown in **Figure 1C**. As can be seen from this figure, the experts showed stronger and earlier relevancy effects than the novices. This pattern of results was reflected in a three-way interaction, *F*(4, 36) = 2.74, *p* < 0.05, when we examined a 5 × 2 × 2 ANOVA that included fixation position (1, 2, 3, 4, 5) and relevancy (relevant, irrelevant) as within-subjects variables, and expertise (expert, novice) as a between-subjects variable. Consistent with this three-way interaction, the experts showed a significant relevancy effect [*F*(1, 16) = 6.64, *p* < 0.05] that became stronger over time as shown by a relevancy by time interaction [*F*(4, 13) = 6.67, *p* < 0.01], whereas the novices did not show a relevancy effect or an interaction (all *F*s < 1).
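The cumulative-time measure described above can be sketched for a single trial as follows. This is an illustrative reading of the description rather than the authors' analysis code: the function name, the input format, and the example values are hypothetical, and the list is assumed to begin with the fixation that follows the initial saccade.

```python
def cumulative_region_time(fixations, regions=("relevant", "irrelevant"), n_positions=5):
    """For each of the first n_positions fixations, return the cumulative time
    (ms) spent on each region up to and including that fixation position.

    fixations: temporally ordered list of (region, duration_ms).
    Returns {region: [cumulative_ms at position 1, 2, ...]}.
    """
    totals = {region: 0 for region in regions}
    curves = {region: [] for region in regions}
    for region, duration_ms in fixations[:n_positions]:
        if region in totals:
            totals[region] += duration_ms
        # Record the running total for every region at this fixation position.
        for name in regions:
            curves[name].append(totals[name])
    return curves


# Hypothetical trial; the sixth fixation is ignored by the five-fixation window.
trial = [
    ("other", 150), ("relevant", 200), ("relevant", 250),
    ("irrelevant", 180), ("relevant", 220), ("irrelevant", 300),
]
curves = cumulative_region_time(trial)
```

Averaging such per-trial curves within each participant and condition, and then across participants, would yield curves of the kind plotted in Figure 1C.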

Taken together, the first-dwell findings and the cumulative time analyses indicate that experts are faster at detecting relevant information than novices, which supports the notion that chess expertise reflects an advantage in encoding chess-related visual configurations (De Groot, 1946, 1965; Chase and Simon, 1973a,b; De Groot and Gobet, 1996; Charness et al., 2001; Reingold et al., 2001a,b; Bilalić et al., 2010a, 2011a,b, 2012; for reviews see Reingold and Charness, 2005; Reingold and Sheridan, 2011). In the present study, this perceptual processing advantage enabled skilled players to rapidly focus on chess configurations that were relevant to the best move on the board.

## **DISCUSSION**

The present experiment examined the time-course and magnitude of relevancy effects on expert and novice chess players' eye movements during a choose-a-move task. Our most important finding was that the eye movements of the experts, but not the novices, revealed a rapid differentiation between regions of the chess board that were relevant vs. irrelevant to the best move on the board. Specifically, the experts, but not the novices, spent more time looking at relevant than irrelevant regions during the early part of the trial (i.e., during the first-dwell on a region, and during the first five fixations of the trial), whereas both the experts and novices showed strong relevancy effects later on in the trial. Importantly, these findings were obtained using an experimental paradigm that provided a well-controlled manipulation of relevancy, such that the relevant and irrelevant regions of the board were counterbalanced across participants (see Appendix A in Supplementary material).

Similar to the present findings, prior studies have also shown enhanced relevancy detection by chess experts (Tikhomirov and Poznyanskaya, 1966; Simon and Barenfeld, 1969; De Groot and Gobet, 1996; Charness et al., 2001; Bilalić et al., 2010a, 2012). Moreover, this relevancy detection advantage is a particular instance of the perceptual encoding advantage that has been shown by chess experts in a variety of tasks employing domain-related stimuli (De Groot, 1946, 1965; Chase and Simon, 1973a,b; De Groot and Gobet, 1996; Charness et al., 2001; Reingold et al., 2001a,b; Bilalić et al., 2010a, 2011a,b, 2012; for reviews see Reingold and Charness, 2005; Reingold and Sheridan, 2011). To explain this perceptual encoding advantage, Chase and Simon (1973a,b) proposed that chess expertise develops due to extensive practice with domain-related visual configurations. Over the course of thousands of hours of practice, chess experts store configurations of chess pieces in memory in the form of chunks (Chase and Simon, 1973a,b), which are supplemented by larger memory structures called templates (Gobet and Simon, 1996, 2000). These memory structures lead to a perceptual encoding advantage such that chess experts are able to process chess stimuli in terms of larger configurations of pieces, rather than individual features. Consequently, chess experts are able to use their memory for chess configurations to guide their search for the best move on the board, rather than exhaustively searching all possible moves. This theoretical account is consistent with the present study's finding that the chess experts were able to rapidly focus on information that was relevant to the best move on the board.

Beyond the chess domain, the present findings are also consistent with findings from other domains concerning the importance of perceptual processing during skilled performance. As reviewed by Reingold and Sheridan (2011), experts in many domains of expertise have been shown to efficiently process domain-related material in terms of larger configurations, as shown by findings that radiologists can rapidly fixate abnormalities in less than a second (Kundel et al., 2008). Moreover, this key role of perceptual processing in expertise coincides with other evidence for perceptual specificity effects in memory and learning (for reviews, see Levy, 1993; Roediger and McDermott, 1993; Reingold, 2002), such as recent findings that eye fixation times are shorter for words that were read twice in the same typography (i.e., font) rather than in two different typographies (Sheridan and Reingold, 2012a,b), and findings that chess experts perform better when viewing familiar chess symbols compared to a condition in which letters (i.e., B = Bishop, P = Pawn, etc.) were shown instead of the symbols (Reingold et al., 2001a).

More generally, the relevancy effects from the present study add to the growing evidence that higher level cognitive processes can rapidly influence eye movement control (e.g., the duration and location of fixations) on a variety of tasks. In fact, ever since the seminal eye tracking work by Yarbus (1967), it has been well known that our eye movements are biased toward aspects of a visual image that are relevant to our current goals, and away from areas that are irrelevant. As further evidence for the role of higher cognitive processing in guiding eye movements, participants in visual search studies spend more time fixating on distractors that are related (e.g., visually similar) rather than unrelated to the target (e.g., Findlay and Gilchrist, 1998; Reingold and Glaholt, 2014), participants in face perception tasks preferentially look at relevant features, such as the eyes (e.g., Henderson et al., 2005), participants in scene perception tasks spend more time fixating information that is task-relevant rather than irrelevant (Glaholt and Reingold, 2012), and the eye movements of skilled readers reveal rapid effects of a variety of higher-level lexical, linguistic and cognitive variables (e.g., Rayner, 1998, 2009; Staub et al., 2010; Staub, 2011; Reingold et al., 2012; Sheridan and Reingold, 2012c,d). Taken together, these findings lend support to models of eye movement control that predict a strong *eye-mind link*, such that ongoing cognitive processing can have a rapid effect on lower-level perceptual and oculomotor processing (for recent reviews, see Reingold et al., 2012, in press). Moreover, these findings underscore that skilled performance reflects a complex interplay of perceptual and cognitive processing, and future work can examine the extent to which similar findings from multiple domains are reflective of common underlying mechanisms.

Finally, a key contribution of the present study is that we introduced a new experimental paradigm to provide a carefully controlled manipulation of relevancy. As shown in Appendix A in Supplementary material, we created two counterbalanced versions of each chess problem that differed by a single piece, and the regions that were relevant in one version were irrelevant in the other version (and vice versa). Given that the relevant and irrelevant regions of the board were well-matched, we can conclude that the relevancy effects in the experiment were solely due to the relevancy of a given region to the best move, and not to some other confound (e.g., differences in visual saliency, location on the board, number of pieces in the region, etc.). The present paradigm could be used in the future to investigate additional topics concerning the impact of relevancy on a variety of aspects of chess expertise.

## **ACKNOWLEDGMENTS**

We are especially grateful to all of the chess players who participated in the experiment, and to Rick Lahaye for his valuable assistance with stimuli creation. This research was supported by an NSERC grant to Eyal Reingold, a Postdoctoral Fellowship (PDF) awarded to Heather Sheridan from NSERC, and support from the Centre for Vision and Cognition (CVC), University of Southampton, UK.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.00941/abstract

## **REFERENCES**


of chess relations. *Psychon. Bull. Rev.* 8, 504–510. doi: 10.3758/BF03196185


Yarbus, A. L. (1967). *Eye Movements and Vision*. New York, NY: Plenum Press.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 April 2014; paper pending published: 08 June 2014; accepted: 06 August 2014; published online: 25 August 2014.*

*Citation: Sheridan H and Reingold EM (2014) Expert vs. novice differences in the detection of relevant information during a chess game: evidence from eye movements. Front. Psychol. 5:941. doi: 10.3389/fpsyg.2014.00941*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Sheridan and Reingold. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Expert and competent non-expert visual cues during simulated diagnosis in intensive care

## *Clare McCormack<sup>1</sup>, Mark W. Wiggins<sup>1</sup>\*, Thomas Loveday<sup>1</sup> and Marino Festa<sup>2</sup>*

*<sup>1</sup> Centre for Elite Performance, Expertise, and Training, Macquarie University, North Ryde, NSW, Australia*

*<sup>2</sup> Paediatric Intensive Care Unit, Kim Oates Australian Paediatric Simulation Centre, Children's Hospital at Westmead, Westmead, NSW, Australia*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia Merim Bilalic, Alpen-Adria Universität Klagenfurt, Austria*

#### *\*Correspondence:*

*Mark W. Wiggins, Centre for Elite Performance, Expertise, and Training, Macquarie University, Balaclava Road, North Ryde, NSW 2109, Australia e-mail: mark.wiggins@mq.edu.au*

The aim of this study was to examine the information acquisition strategies of expert and competent non-expert intensive care physicians during two simulated diagnostic scenarios involving respiratory distress in an infant. Specifically, the information acquisition performance of six experts and 12 competent non-experts was examined using an eye tracker during the initial 90 s of the assessment of the patient. The results indicated that, in comparison to competent non-experts, experts recorded longer mean fixations, irrespective of the scenario. When the dwell times were examined against specific areas of interest, the results revealed that competent non-experts recorded greater overall dwell times on the nurse, whereas experts recorded relatively greater dwell times on the head and face of the manikin. In the context of the scenarios, experts recorded differential dwell times, spending relatively more time on the head and face during the seizure scenario than during the coughing scenario. The differences evident between experts and competent non-experts were interpreted as evidence of the relative availability of task-specific cues or heuristics in memory that might direct the process of information acquisition amongst expert physicians. The implications are discussed for the training and assessment of diagnostic skills.

**Keywords: expertise, cue utilization, diagnosis, medicine, simulation**

## **INTRODUCTION**

The accurate initial assessment of clinical patients in time-critical emergencies is an essential component of timely and appropriate intervention by critical care teams (Pham et al., 2012). It mitigates the further deterioration of the patient's condition and potentially reduces mortality and the additional burden on an already strained healthcare system. Nevertheless, it is a process that occurs within a short time-period and with potentially minimal information, thereby increasing the likelihood of error (Ely et al., 2011).

On the basis that assessments are required within a relatively short period and with minimal information, it is likely that a physician will engage lean and rapid cognitive strategies such as satisficing, relying on productions or relationships between patterns of information to guide the initial process of diagnosis (Simon, 1972; Marewski and Gigerenzer, 2012). Productions comprise rules-of-thumb or condition-action (IF-THEN) statements that are resident in memory and that can be used to assist the interpretation of a situation or event (Anderson, 1982; Hamm, 2014). For example, in the medical context, IF a patient presents with an elevated temperature, THEN it is normally associated with the presence of an infection.

The development and application of productions is generally associated with a reduction in cognitive load, since their application obviates the requirement for compensatory strategies that require the retention of task-related information in working memory (Sweller, 1988). However, such rules-of-thumb are, by definition, not necessarily applicable in all situations, and there are many cases where the application of productions has been associated with the commission of errors (Croskerry, 2003; Norman et al., 2014).

The acquisition of information as a prelude to the diagnosis of a particular condition is based, in part, upon the features that are immediately apparent on presentation to the physician (Croskerry, 2009a; Stolper et al., 2011). Where an association exists in memory, a feature or combination of features is presumed to trigger a production that will be interpreted as the basis of a diagnosis or will provide the impetus for the acquisition of additional information necessary to form a diagnosis (Khader et al., 2011). This process is consistent theoretically with the initial stages of recognition-driven decision-making where the condition-action statements that comprise productions are referred to as cues (Klein, 2008).

The acquisition and application of cues is thought to explain the rapid and consistently accurate behavior of genuine experts (Mann et al., 2007; Kahneman and Klein, 2009). In the context of the Recognition-Primed Decision model, cues trigger associations in memory that subsequently provide the basis for mental simulations that, in turn, guide a response (Klein, 2008). Brunswik (1955), in his Lens Model, also proposes that the likelihood of an association being triggered is dependent upon the frequency with which features in the environment match features in memory. Finally, Stokes et al. (1997) incorporate cues as the precursor to diagnosis in their theoretical model of expert decision-making in the aviation context.

Like productions, cues are essentially feature-event/object relationships in memory that enable the rapid assessment of a situation and, subsequently, the formulation of a response (Wiggins, 2006, 2012). Establishing the existence of cues has generally been inferred on the basis of responses to domain-specific stimuli. For example, Morrison et al. (2013) demonstrated that, in comparison to non-experts, expert forensic investigators were relatively consistent and responded more rapidly in assessing the relatedness of feature/event pairs relating to a murder investigation. Similarly, Wiggins and O'Hare (1995) established that the acquisition of weather-related information differed between experts and non-expert pilots, with the former being less likely to access information in the sequence in which it was presented. This behavior has been interpreted as evidence to suggest a greater level of cue utilization amongst experts.

The association between levels of cue utilization and expertise has been established in squash (Abernethy, 1990), power control (Loveday et al., 2013a), pediatric assessment (Loveday et al., 2013b), and aviation (Wiggins et al., 2014). Measures of cue utilization have also differentiated performance in the context of software engineering (Loveday and Wiggins, 2014). However, these approaches have been based on generalized behavior and there is no indication as to the specific cues involved and how they might be activated in response to the presence of features.

As experts gain experience within a particular context, Anderson (1982) suggests that productions are revised so that they become more precise and discriminate between different circumstances. This process, referred to as *discrimination*, coincides with generalization, in which it becomes evident that a particular production is equally applicable across a range of conditions. This combination of discrimination and generalization may explain both the domain specificity of experts and their capacity to perceive underlying similarities between situations (Shanteau, 1988).

If experts possess a highly refined repertoire of task-related cues in memory, then two diagnostic scenarios that differ in their immediate features but incorporate a similar intrinsic etiology should trigger the *bottom-up* application of distinct cues, and these differences should be evident in the process of information acquisition (Patel and Groen, 1986; Croskerry, 2009b). Empirical support for this capacity for bottom-up discrimination can be drawn from research into the Einstellung Effect in which visual attention during expert problem-solving is implicitly drawn toward familiar solutions, even at the expense of novelty (Bilalić et al., 2010). Since competent non-experts have yet to develop highly specialized cues, they are not expected to alter their information acquisition in response to the differences in the immediate features of the task.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

The participants in the present study were drawn from a convenience sample of medical practitioners of different levels of grade and seniority, working in the pediatric and neonatal intensive care units of a tertiary children's hospital. The local research and ethics committee approved the study, and individual participant consent was obtained from each clinician examined.

The participants comprised 11 male and seven female physicians employed in either pediatric or neonatal intensive care. Their mean age was 40.5 years, SD = 10.6. Establishing that expertise, rather than experience, has been acquired requires a formal criterion that is typically based on measures of performance. In the context of medical practitioners, an indirect measure of expertise is the seniority of their role (Patel and Groen, 1991). This constitutes recognition that they have successfully attained a level of performance where their medical interventions are both accurate and consistent over an extended period of time. While a positive relationship will inevitably exist between years of accumulated experience and performance, the recognition of expertise amongst peers presumes that a level of performance has been reached that is exceptional in comparison to other practitioners (Loveday and Wiggins, 2014). Therefore, consistent with this perspective, the participants were classified as expert or competent non-experts based on their occupational position (consultant/staff specialist, *n* = 6, or trainee registrar/fellow, *n* = 12) according to the criteria established by Patel and Groen (1991). Their accumulated experience working in medicine was between 6 and 42 years, *M* = 16.5, SD = 10.6, with a range of 1–35 years in the intensive care environment, *M* = 9.9, SD = 10.3. Experts recorded a mean of 23.0 years of experience working in intensive care, SD = 9.33, compared to a mean of 4.8 years for competent non-experts, SD = 4.57.

#### **SIMULATION**

A realistic scenario and naturalistic environment were created for the study by using a high-fidelity infant manikin (Laerdal SimBaby) connected to a monitor that displayed simulated physiological parameters and appropriate corresponding alarms and sounds *in situ* in an intensive care cot in a bedspace within the pediatric intensive care unit of a tertiary children's hospital. The configuration of the room was typical of a bedspace in the pediatric intensive care unit and was familiar to all study participants (see **Figure 1**). A nasogastric feeding tube was inserted and attached to a continuous feeding pump with enteral feed attached. The manikin was also connected to nasal prong oxygen with a wall-mounted oxygen flow-meter and to an intravenous drip via a peripherally inserted intravenous cannula, and an appropriately sized blood pressure cuff was attached to the right arm. A familiar and experienced pediatric intensive care nurse with a pre-scripted dialog was used as a confederate actor within the scenario.

An iView X™ HED eye tracking system manufactured by SensoMotoric Instruments was used to record the eye movements of participants, in addition to scene video and audio recording. The system consists of a fully mobile, head-mounted device with two cameras attached, one recording the scene and one trained on the participant's eye, recording gaze and pupil data. A piece of clear plastic was fixed in front of one eye. The device was connected to a notebook computer, which powered the cameras and stored gaze, video and audio data. The gaze sampling rate used was 50 Hz, and a fixation was defined as 100 ms with maximum dispersion of 20 pixels. Based on the limitations imposed by frame rate acquisition and the need to include all features in the intensive care environments, features were broadly categorized as belonging to one of six areas of interest (AOI). Each area of interest was defined by anatomical or environmental relationships.
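A fixation criterion of this kind (a minimum duration combined with a maximum dispersion) is commonly implemented with a dispersion-threshold algorithm. The sketch below is a generic version of that idea under stated assumptions, not the manufacturer's algorithm: the function name, the input format, and the example samples are hypothetical, and dispersion is assumed to be the horizontal range plus the vertical range of the gaze samples, which may differ from the definition used by the recording software.

```python
import math


def detect_fixations(samples, rate_hz=50, min_duration_ms=100, max_dispersion_px=20):
    """Dispersion-threshold fixation detection on evenly sampled gaze data.

    samples: list of (x, y) gaze positions in pixels.
    Returns a list of (start_index, end_index_exclusive, duration_ms).
    """
    def dispersion(window):
        # Assumed definition: horizontal range plus vertical range.
        xs = [x for x, _ in window]
        ys = [y for _, y in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    sample_ms = 1000 / rate_hz
    min_samples = math.ceil(min_duration_ms / sample_ms)  # 5 samples at 50 Hz
    fixations = []
    start = 0
    while start + min_samples <= len(samples):
        end = start + min_samples
        if dispersion(samples[start:end]) <= max_dispersion_px:
            # Grow the window while the samples stay within the threshold.
            while end < len(samples) and dispersion(samples[start:end + 1]) <= max_dispersion_px:
                end += 1
            fixations.append((start, end, (end - start) * sample_ms))
            start = end
        else:
            start += 1
    return fixations


# Invented gaze trace: six stable samples, two in transit, five stable samples.
gaze = [
    (100, 100), (102, 101), (101, 99), (103, 100), (100, 102), (101, 101),
    (200, 150), (320, 220),
    (400, 250), (401, 251), (399, 249), (400, 252), (402, 250),
]
found = detect_fixations(gaze)
```

At 50 Hz each sample covers 20 ms, so the 100 ms criterion corresponds to at least five consecutive samples falling within the dispersion limit.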

## **SCENARIOS**

The two scenarios used during the study were written by two subject-matter experts, both of whom were senior intensive care specialists working in the pediatric intensive care unit. The scenarios were designed around two immediate features, the first of which related to the head and face of the manikin. In particular, the level of consciousness of the child would be an important determinant in the seizure scenario, but would be less significant in the context of the coughing scenario. This information would be determined through the child's facial features, including the eyes. In the coughing scenario, the information provided spontaneously by the assisting nurse was the immediate feature, since this would be an important determinant as to whether any respiratory assistance had been provided. Participants were randomly allocated to either the coughing or the seizure scenario as their first scenario, and all participants completed both scenarios.

The initial disease state was identical for both scenarios with the immediate features becoming evident as the symptomatology emerged. A simple respiratory arrest scenario in a self-ventilating monitored patient was used as the initial disease state, since it avoided potentially confounding effects that might be introduced by complex or unfamiliar equipment.

In the first minute of the coughing scenario, the patient demonstrated a heart rate of 150 beats per minute (BPM), blood pressure of 77/40 and a respiratory rate of 66 breaths per minute. Saturation was at 94% on 1 liter/min of nasal prong oxygen with good connections. This information, and electrocardiography (ECG), was displayed on the monitor. The patient showed see-saw breathing with bilateral crackles as well as grunting that was cycling with breaths. The cot was tilted at 30°. The scenario began with the nurse introducing the patient, saying: *"The ward is about ready to take this baby with bronchiolitis, but I'm concerned about whether he's OK to be discharged from PICU as he's had a couple of short desaturations as I've been looking after him this morning."* They were also advised that the patient presented to the emergency department the previous evening with increased respiratory work, and was found to have respiratory syncytial virus (RSV)-positive bronchiolitis, and hyperinflation shown on a chest x-ray. The patient was admitted to PICU late on the previous afternoon for possible continuous positive airway pressure therapy, but improved with nasal oxygen. Feeds were started at 6:00 am that morning, but had been stopped a few hours later following a second desaturation episode. Desaturations were associated with coughing and not with vomiting or the reflux of feed. No apnoea, bradycardia, or seizure was noted at the time. The temperature was at 37.6°C, and the patient was not on antibiotics. A pertussis swab had not been taken. A full blood count on admission showed hemoglobin (Hb) of 10.7 g per deciliter, white cell count (WCC) was 9.3 cells per cubic millimeter (Neutrophils 5.3, Leukocytes 4.0), and platelets at 210 cells per cubic milliliter.

After 1 min had elapsed, the manikin was set to cough for 20 s, desaturate to 84% over 40 s, and become bradycardic to 104 bpm over 40 s. At this point, the nurse prompted participants, saying: *"This is what he did before you came in."* After 2 min and 10 s, saturation increased to 99% if the participant had used an oxygen bag, or to 94% if no adjustment to oxygen administration was made. Heart rate increased to 160 over 20 s, and the patient showed grunting and see-saw rasps as had occurred previously. The scenario concluded following a duration of 3 min and 30 s.

Prior to commencing the second scenario, participants were advised that this was a "new patient," not related to the previous scenario. In the first minute of the seizure scenario, the patient had a heart rate of 120 bpm, blood pressure of 99/70 and respiratory rate of 33 breaths per minute. Saturation was at 94% on 1 liter/min of nasal prong oxygen with good connections. This information, and ECG, was displayed on the monitor. The patient showed see-saw breathing with bilateral crackles as well as grunting that was cycling with breaths. The cot was tilted at 30° and the scenario began with the nurse introducing the patient, saying: *"This baby has just been brought up from the ward by the nurse practitioner as he has had a couple of episodes of desaturation with stiffening of his arms and legs on the ward. I'm a little bit worried about him as he's just had another similar episode and dropped his 'sats' to the mid 80s. I've just done a capillary gas, which is in the gas machine now."*

The participants were also advised that the patient was a 6-week-old baby delivered at full term with no neonatal problems. The patient had presented to the ward 2 days previously with RSV-positive bronchiolitis and hyperinflation shown on a chest x-ray. Since then, the patient had been on full maintenance intravenous fluids (N/4 and 5% dextrose) and nil by mouth. The patient was admitted to PICU an hour earlier. Since then, he had shown desaturation to the mid 80s associated with unusual movements of the torso and stiffening of limbs, and an increase in heart rate. The desaturations would self-correct after a minute of nasal prong oxygen, increased to 2 liters per minute. The temperature was at 37.6°C, and the patient was not on antibiotics. A pertussis swab had not been taken. A full blood count on admission showed Hb of 10.7 g per deciliter, WCC was 9.3 cells per cubic millimeter (Neutrophils 5.3, Leukocytes 4.0), and platelets at 210 cells per cubic milliliter.

After 1 min had elapsed, the manikin was programmed to show rapid and slow torso movements over 20 s, desaturation to 84% over 40 s and tachycardia to 180 over 40 s. At this point, the nurse prompted participants, saying: *"This is what he did before you came in. Here's the cap gas* (hands over blood gas analysis)." After 2 min and 10 s, saturation increased to 99% over 20 s if the participant had used an oxygen bag, or to 94% if no adjustment to oxygen administration was made. Heart rate dropped to 160 over 20 s, and the patient showed see-saw rasps with the respiratory rate still at 33 breaths per minute. The scenario concluded following a duration of 3 min and 30 s.

#### **PROCEDURE**

The participants completed a pre-scenario questionnaire that included demographic questions and questions related to participants' subjective levels of fatigue and stress, and familiarity with the type of scenario encountered. The eye-tracker was then demonstrated to each participant, and the device fitted and calibrated using the recommended five-point calibration procedure.

Each participant took part in two consecutive scenarios separated by a 5 min interval. They waited outside the cubicle as the scenario was set up. The two scenarios were each of 3 min and 30 s duration and involved acute desaturation in a baby with bronchiolitis, due to either coughing (Scenario A) or a seizure/apnoea (Scenario B). A nurse was present in each scenario and briefed the clinician on the condition of the child over an equivalent period of time. The condition recovered spontaneously regardless of the treatment given.

Prior to each scenario, participants were reminded that they should regard the simulator as a real patient and that their individual performance was not being reported. The scenario began with the participant called to the bedspace by the confederate bedside nurse who introduced the scenario with a pre-scripted statement and a series of responses, and remained present throughout each scenario. Three researchers were also present in the cubicle during the study to monitor the eye-tracker, video-recording and simulator. All remained silent and out of view during the scenarios.

The eye-tracker automatically recorded eye movement data. Data for each participant were collated, including the number of fixations, the duration of fixations in milliseconds (dwell time), the number of blinks, the number of saccades, and the range of gaze. Video footage, taken from the perspective of participants, was also recorded throughout the tasks. The software package BeGaze™ was used to align longitudinal data with video footage for the purposes of analysis. Video footage was analyzed frame by frame to identify areas of interest (AOI). There were six AOI defined in the visual scene, namely the nurse, the monitor, the manikin's head and face, the manikin's torso, the manikin's limbs, and the wall on which the equipment and oxygen outlets were located.

## **RESULTS**

#### **DATA REDUCTION**

To derive information on the process of visual information acquisition during initial clinical assessment, the video analysis was limited to the first 90 s of the scenario. Eye-tracking data for one expert and three competent non-experts were excluded from further analysis due to failed eye-tracking calibration. There was no airway opening, bag and mask support, or cardiac compression initiated by participants during the period of analysis.
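The reduction of fixation data to an overall dwell time per AOI within the 90 s analysis window can be sketched as follows. This is a minimal illustration only: the record layout, the function name `dwell_time_by_aoi`, and all fixation values are hypothetical and are not taken from the study or from the eye-tracking software.

```python
from collections import defaultdict

# Hypothetical fixation records: (onset in s, duration in ms, AOI label).
# The values are invented for illustration; they are not data from the study.
fixations = [
    (2.0, 450, "nurse"),
    (3.1, 520, "head and face"),
    (40.5, 480, "monitor"),
    (89.0, 400, "head and face"),
    (95.2, 610, "torso"),  # starts after the 90 s window, so it is ignored
]


def dwell_time_by_aoi(records, window_s=90.0):
    """Sum fixation durations (ms) per AOI for fixations that start within the window."""
    totals = defaultdict(int)
    for onset_s, duration_ms, aoi in records:
        if onset_s < window_s:
            totals[aoi] += duration_ms
    return dict(totals)


totals = dwell_time_by_aoi(fixations)
```

Under these assumptions, a fixation is attributed to the window by its onset time; other conventions (for example, clipping fixations that straddle the 90 s boundary) are equally plausible.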

### **DESCRIPTIVE STATISTICS**

Descriptive statistics were generated for each of the dependent variables. Across the participants and the scenarios, the mean dwell time was 448.16 ms, SD = 9.75. The mean dwell times for each of the AOI are summarized in **Table 1**.

#### **FIXATIONS AND SACCADES**

Three independent, mixed between-within analyses of variance were undertaken to establish whether a relationship existed between participants' level of expertise, the nature of the scenario, and eye tracking behavior, including the frequency of fixations and saccades and the mean duration of fixations (dwell time).

No statistically significant differences were evident between experts and competent non-experts in the frequency of fixations, *F*(1,10) = 1.97, *p* = 0.19, η<sup>2</sup> = 0.17, or saccades, *F*(1,10) = 4.00, *p* = 0.07, η<sup>2</sup> = 0.29. Similarly, eye gaze data were not significantly different between the scenarios in the frequency of fixations, *F*(1,10) = 3.89, *p* = 0.07, η<sup>2</sup> = 0.28, the frequency of saccades, *F*(1,10) = 2.46, *p* = 0.15, η<sup>2</sup> = 0.20, or the mean dwell time, *F*(1,10) = 0.49, *p* = 0.50, η<sup>2</sup> = 0.04.

Differences were, however, evident between experts and competent non-experts in the mean dwell time, *F*(1,10) = 6.48, *p* = 0.03, η<sup>2</sup> = 0.39, with experts' mean dwell time, X̄ = 472.36 ms, SD = 14.89, greater than non-experts, X̄ = 423.36 ms, SD = 12.58. No significant interaction was evident between expertise and scenario for the frequency of fixations, *F*(1,10) = 0.01, *p* = 0.91, η<sup>2</sup> &lt; 0.01, the frequency of saccades, *F*(1,10) = 0.04, *p* = 0.85, η<sup>2</sup> &lt; 0.01, nor the mean dwell time, *F*(1,10) = 0.58, *p* = 0.46, η<sup>2</sup> = 0.05.
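As a reading aid, the relationship between an *F* ratio, its degrees of freedom, and an effect size can be sketched in a few lines. The sketch assumes that the reported η<sup>2</sup> values are partial eta squared, which the text does not state explicitly; the function name `partial_eta_squared` is ours. Under that assumption, the formula reproduces the effect sizes reported in this section to within about 0.01.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Assumption: the effect sizes in the text are partial eta squared.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Expertise effect on saccade frequency, F(1,10) = 4.00
saccades_effect = round(partial_eta_squared(4.00, 1, 10), 2)

# Expertise effect on mean dwell time, F(1,10) = 6.48
dwell_effect = round(partial_eta_squared(6.48, 1, 10), 2)
```

With 1 and 10 degrees of freedom, an *F* of 4.00 corresponds to 4.00 / (4.00 + 10) ≈ 0.29, and an *F* of 6.48 to 6.48 / (6.48 + 10) ≈ 0.39.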

#### **AOI DWELL TIME ANALYSIS**

For the "head and face" and "nurse" signature features, a 2 (expertise) × 2 (scenario) mixed between-within ANOVA was used to test whether differences existed in the overall dwell time for experts and competent non-experts during the 90 s initial assessment of the patient. It was assumed that differences in dwell time would reflect differences in the relative attention to features associated with the particular scenario. Consistent with expectations, the results revealed a significant difference between competent non-experts' and experts' mean overall dwell time for "head and face," *F*(1,12) = 6.16, *p* = 0.03, η<sup>2</sup> = 0.34, whereby experts recorded significantly greater dwell time within the AOI, X̄ = 11358.83 ms, SD = 2698.77, in comparison to competent non-experts, X̄ = 3007.98 ms, SD = 2011.55. A significant difference was also evident in the mean dwell time for the "nurse," *F*(1,12) = 5.89, *p* = 0.03, η<sup>2</sup> = 0.34. However, in this case, experts recorded a significantly lower dwell time within the AOI, X̄ = 433.63 ms, SD = 2619.36, in comparison to competent non-experts, X̄ = 8358.76 ms, SD = 1952.38 (see **Figure 2**).

#### **Table 1 | Mean dwell time (ms) by area of interest.**

In the context of scenarios, a significant main effect was evident for the "head and face" AOI, *F*(1,12) = 5.69, *p* = 0.03, η<sup>2</sup> = 0.32, whereby participants recorded a greater overall dwell time for this AOI during the seizure scenario, X̄ = 9438.06 ms, SD = 2452.35, in comparison to the coughing scenario, X̄ = 4928.75 ms, SD = 1199.43. This suggests that, as a cohort, both experts and competent non-experts responded to the differences between the scenarios by changing their pattern of information acquisition in relation to the head and face. However, there was no change evident in the overall dwell time on the nurse.

At a more detailed level, an expertise by scenario interaction was evident for the mean overall dwell time on the "head and face," *F*(1,12) = 4.82, *p* = 0.04, η<sup>2</sup> = 0.29. An inspection of the means indicated that, where there was relatively little difference between the mean dwell times for competent non-experts across the two scenarios, a difference was evident for experts with the mean dwell time greater during the seizure scenario than during the coughing scenario (see **Figure 3**). In combination, these results suggest that, although competent non-experts may recognize the relative importance of signature features during different diagnostic scenarios, their pattern of interaction with these features remains relatively consistent. This contrasts with expert clinicians who appear to alter both the overall time that they devote to the acquisition of information from signature features, together with the pattern of acquisition.

### **GENERAL DISCUSSION**

The aim of this study was to examine the information acquisition strategies employed by expert and competent non-expert intensive care physicians during two diagnostic scenarios that differed in their immediate features, but incorporated a similar intrinsic etiology. It was anticipated that where competent non-experts would adopt a relatively consistent pattern of information acquisition across the scenarios, experts would vary their approach consistent with the differences in the immediate features that were presented. The results revealed differences in overall mean fixation times between experts and competent non-experts, with the former maintaining visual gaze on AOIs for significantly longer periods. Further, experts spent significantly more dwell time within the "head and face" AOI and significantly less time within the "nurse" AOI in comparison to competent non-experts.

The differential performance amongst experts and competent non-experts during information acquisition is consistent with the proposition that experts and competent non-experts differ in their repertoire of cues in memory (Ericsson and Kintsch, 1995; Jarodzka et al., 2010). Experts attended to the immediate visual features associated with the patient ("head and face"), where competent non-experts tended to spend a greater proportion of the time fixated on the confederate nurse. Since this occurred independent of scenarios, it might be surmised that experts were integrating the auditory information being delivered by the confederate nurse with the visual information that was evident from the head and face of the manikin. It also implies that the "head and face" embodied a greater level of diagnostic information than was available from the nurse in isolation.

Despite the fact that, overall, experts tended to spend relatively more time than competent non-experts attending to the "head and face," differences were evident in the mean dwell times across the scenarios. For competent non-experts, the relative emphasis on the "head and face" and "nurse" did not change with the change in scenario, suggesting that non-experts did not necessarily discriminate between the scenarios based on the immediate features.

As hypothesized, expert physicians recorded greater mean dwell times on the "head and face" during the seizure scenario than during the coughing scenario. This reflects the potentially greater utility of the "head and face" in yielding diagnostic information during the seizure scenario. The dwell time for the "confederate nurse" did not change statistically for experts, possibly due to a restriction of range associated with the mean dwell times. In combination, the outcomes suggest that overall, experts spent more time examining cues arising from the "head and face" of the patient, but that differences in the immediate features were associated with differences in the time spent examining the cues.

Although the results confirm that experts differ from competent non-experts in their acquisition of information during diagnostic scenarios, they also suggest that their attention toward features in the environment is influenced by the interaction between their task-related experience and the immediate features that are present. For example, it is possible that, for competent non-experts, the situation was relatively unfamiliar and, therefore, they were seeking information that would correspond to a relatively limited number of patterns in memory. Since the "confederate nurse" was delivering an initial assessment of the symptoms, and may have experienced the event previously, directing attention toward the nurse represents a reasonable strategy where a scenario is unfamiliar.

By contrast, experts possess a repertoire of cues in memory and, therefore, are drawn toward features that are implicitly diagnostic of a particular condition (Croskerry, 2009b). The relative proportion of attention that is directed toward signature features is consistent with a bottom-up recognition process, whereby the environmental features trigger associations in memory, and a serial process of pattern matching is undertaken until a corresponding (or near to corresponding) pattern is identified (Patel and Groen, 1986; Klein, 2008).

At an applied level, the results suggest differences in the diagnostic strategies employed by experts and competent non-experts, and there are implications for training. For example, the fact that competent non-experts tended to attend to the nurse suggests that they lacked the repertoire of cues in memory necessary to recognize and adapt to the differences in the immediate features that were presented. This was not the case for experts, who were able to identify the immediate features associated with the different scenarios and respond appropriately. One approach to the development of cues in memory involves cue-based training in which learners participate in a series of scenarios, the aim of which is to establish the relationship between features and events/objects in memory in the form of cues (Wiggins and O'Hare, 2003). The utility of cue-based training has been established in other domains (e.g., auditing), and may be appropriate for diagnostic tasks in the medical context (Earley, 2001).

#### **LIMITATIONS AND FUTURE RESEARCH**

While a weakness of this study is the relatively limited number of participants, the fact that differences were observed between experts and competent non-experts in relation to dwell times points toward the underlying power of the effects that were observed. Moreover, the study demonstrated that, in naturalistic environments, where the number of features available is relatively constrained and where the least experienced operators are in fact competent, differences in information acquisition were evident.

Since the focus of this study was information acquisition behavior during the initial assessment of a potentially deteriorating patient, the complexity associated with therapeutic interventions was excluded. Nevertheless, it is possible that a more extended observation may have revealed new information in the attention to cues, and the interactions with auditory and tactile stimuli. While these stimuli were experimentally controlled in the present study, future research should be directed toward examining the relative impact of communication, and the social processes that are engaged by different groups of physicians. Such research would build on the baseline data established in the present study and contribute to a broader understanding of non-visual stimuli or cues, and the role of team and social interactions in the recognition of the deteriorating child by skilled clinicians.

#### **CONCLUSION**

This study demonstrated differences in the information acquisition behavior of experts and competent non-experts during assessments of a deteriorating child during two *in situ* simulations. Compared to competent non-experts, experts attended to specific visual features for longer periods, and exhibited longer dwell times on the manikin's "head and face," particularly during the seizure scenario. By contrast, competent non-experts displayed longer dwell times on the "confederate nurse." These results were interpreted as evidence of differences between experts and competent non-expert physicians' diagnostic cues in memory. The methodology offers a potential framework to develop behavioral standards of cue acquisition and utilization that could ultimately be used for the assessment of the diagnostic performance of physicians, particularly in time-constrained situations.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 May 2014; accepted: 08 August 2014; published online: 26 August 2014. Citation: McCormack C, Wiggins MW, Loveday T and Festa M (2014) Expert and competent non-expert visual cues during simulated diagnosis in intensive care. Front. Psychol. 5:949. doi: 10.3389/fpsyg.2014.00949*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 McCormack, Wiggins, Loveday and Festa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Spontaneously spotting and applying shortcuts in arithmetic—a primary school perspective on expertise

## *Claudia Godau<sup>1,2</sup>\*, Hilde Haider<sup>3</sup>, Sonja Hansen<sup>3</sup>, Torsten Schubert<sup>1,2</sup>, Peter A. Frensch<sup>1,2</sup> and Robert Gaschler<sup>2,4</sup>*

*<sup>1</sup> Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany*

*<sup>2</sup> Cluster of Excellence: Image Knowledge Gestaltung, an Interdisciplinary Laboratory, Berlin, Germany*

*<sup>3</sup> Department of Psychology, Universität Köln, Köln, Germany*

*<sup>4</sup> Department of Psychology, Universität Koblenz-Landau, Landau, Germany*

#### *Edited by:*

*Merim Bilalic, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Ann Dowker, University of Oxford, UK; Lieven Verschaffel, University of Leuven, Belgium*

## *\*Correspondence:*

*Claudia Godau, Institut für Psychologie, Fakultät für Lebenswissenschaften, Humboldt Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany e-mail: claudia.godau@hu-berlin.de*

One crucial feature of expertise is the ability to spontaneously recognize where and when knowledge can be applied to simplify task processing. Mental arithmetic is one domain in which people should start to develop such expert knowledge in primary school by integrating conceptual knowledge about mathematical principles and procedural knowledge about shortcuts. If successful, knowledge integration should lead to transfer between procedurally different shortcuts that are based on the same mathematical principle and are therefore likely both associated with the respective conceptual knowledge. Taking the commutativity principle as a model case, we tested this conjecture in two experiments with primary school children. In Experiment 1, we obtained eye tracking data suggesting that students indeed engaged in search processes when confronted with mental arithmetic problems to which a formerly feasible shortcut no longer applied. In Experiment 2, children who were first provided material allowing for one commutativity-based shortcut later profited from material allowing for a different shortcut based on the same principle. This was not the case for a control group, who had first worked on material that allowed for a shortcut not based on commutativity. The results suggest that spontaneous shortcut usage triggers knowledge about different shortcuts based on the same principle. This is in line with the notion of adaptive expertise linking conceptual and procedural knowledge.

**Keywords: expertise, numerical cognition, arithmetic, commutativity, spontaneous strategy application**

## **INTRODUCTION**

Expertise has various manifestations and could be defined as consistently superior performance within a specific domain relative to novices and relative to other domains (Ericsson and Lehmann, 1996). The development of expertise in real-world domains involves a complex interplay of changes in perception, categorization, memory, problem solving, coordination, skilled action, and other components of human cognition (Palmeri et al., 2004). Experts' flexibility has been frequently discussed, and there are two contradictory perspectives. Research on creativity and skill acquisition has been used to illustrate that more knowledge can make one less flexible (e.g., Luchins, 1942; Logan, 1988). However, research on expertise has suggested that experts are more flexible and creative in their thought patterns (see summary in Bilalić et al., 2008a). Both options might be possible depending on the expertise level and the problem difficulty. Investigating chess experts, Bilalić et al. (2008a) found that "super experts" were flexible and found the optimal solution first, or at least found it quickly after perceiving a salient but non-optimal solution.

Here, we focus, within the domain of mathematics, on spontaneously spotting and applying shortcuts in arithmetic and on whether, with further experience, students become increasingly able to generate rapid adequate actions with less and less effort (Ericsson, 2008). Mathematics students used significantly larger numbers of appropriate strategies than adults with less expertise (Dowker et al., 1996). Experts have to be able to recognize spontaneously and without instruction that a specific element of their knowledge base can be applied in a specific situation. It would not suffice if they possessed elaborate conceptual knowledge as well as procedures to apply it, but needed to wait for someone to tell them that the knowledge can be applied in the given situation. This someone would rarely drop by.

In recent years, research in primary school arithmetic has started to tackle this issue for a domain in which everyone should acquire elaborate knowledge. Learning about mathematical principles and procedures should lead to knowledge that can be applied across a wide range of situations (e.g., Hatano and Oura, 2003). Given the role of self-guided learning and performance in the development of mathematical abilities and concepts, recent studies have focused on the question of how and when children spontaneously recognize that an everyday situation can be tackled by mathematical thinking (Hannula and Lehtinen, 2005; Hannula et al., 2010; McMullen et al., 2011). Furthermore, children should develop the skills necessary to flexibly spot and apply shortcut strategies spontaneously. It is not sufficient if they can apply a shortcut when explicitly told to do so. Adaptive expertise (Verschaffel et al., 2009) includes autonomously regulating whether to (a) solve an arithmetic problem in a standard way or (b) search for and apply a shortcut.

Taking the commutativity principle as a model case, past research has explored how children spontaneously spot and apply shortcuts that allow saving effort in addition problems by flexibly changing the order of addends. A wealth of research has shown that children have at least some understanding of the concept of commutativity before entering school (Baroody and Gannon, 1984; Resnick, 1992; Cowan and Renton, 1996; Wilkins et al., 2001; Canobi et al., 2003). After interviewing elementary school children about how they solved problems with two addends, Baroody et al. (1983) reported an extensive use of commutativity. During development, children increasingly integrate conceptual knowledge about mathematical principles and procedural knowledge about shortcuts (Haider et al., 2014). Knowledge integration should lead to transfer between procedurally different shortcuts that are based on the same mathematical principle and are therefore likely both associated with the respective conceptual knowledge. In a first step, Gaschler et al. (2013) provided a correlative study to explore this idea. They assessed spontaneous usage of two procedurally different shortcuts that are both based on the commutativity principle in children of different ages. While shortcut usage was observed from second grade onwards, correlations between the usage of the two different shortcuts only emerged by grade four. In the current study we aimed at moving beyond correlational data. We tested whether being exposed to one commutativity-based shortcut helps to spot and apply a different shortcut option based on the same mathematical principle. Note that in a parallel line of research, we have observed that instructions do not seem to do the job. Instructing children to use one specific shortcut hinders rather than assists them in spontaneously spotting and applying a different shortcut based on the same mathematical principle later on (Godau et al., submitted). Instructions about specific procedures might corrupt flexibility in shortcut usage (cf. ErEl and Meiran, 2011). Even when participants knew that a formerly instructed rule would no longer apply, they found it difficult to search for different shortcut options (see also Bilalić et al., 2008a,b; Bilalić and McLeod, 2014). Therefore, in the current work we focused on spontaneous use of the strategies. We explored whether it is possible to foster the discovery and application of shortcut strategies by transfer between different non-instructed shortcut strategies that are based on the same mathematical principle. Note that according to Baroody and Gannon (1984), understanding of commutativity was not evident in all those who invented shortcuts, but in all those who comprehended addition as a binary rather than as a unary operation. The unary view would suggest that one number is added to another, rather than that they are added together.

Specifically, the commutativity principle enables students to flexibly change the order of addends within a problem. For instance, given the problem 4 + 7 + 6, it might be easier to calculate (6 + 4) + 7 (6 + 4 adds up to 10, which makes it easy to finally add 7, i.e., "ten-strategy"). One can also use commutativity across problems. If, for instance, a student receives the problem 8 + 5 + 7 = ?, and then 5 + 7 + 8 = ?, he/she can refrain from calculating the second problem, provided that he/she recognizes the applicability of the commutativity principle (i.e., "addends-compare strategy"). Three-addends problems were used because we wanted to investigate usage of the commutativity principle with unfamiliar problems. It is debatable whether three-addends problems imply only the commutativity principle or additionally also the associativity principle. Associativity is the property that problems in which terms are decomposed, and recombined in different ways, have the same answer [(a + b) + c = a + (b + c)]. In the problems we used, children have to change the order of the addends [a + b + c = (a + c) + b], because otherwise it is not possible to add a + c first. Commutativity justifies changing the order or sequence of the operands within an expression while associativity does not.
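The two shortcuts described above can be made concrete with a short sketch. The function names `ten_strategy` and `addends_compare` are ours and purely illustrative; the code describes when each shortcut is available in the material, not how children actually process the problems.

```python
from itertools import combinations


def ten_strategy(addends):
    """Return (pair summing to 10, remaining addends), or None if no such pair exists."""
    for i, j in combinations(range(len(addends)), 2):
        if addends[i] + addends[j] == 10:
            rest = [a for k, a in enumerate(addends) if k not in (i, j)]
            return (addends[i], addends[j]), rest
    return None


def addends_compare(previous, current):
    """True if the current problem has the same addends as the previous one, reordered."""
    return sorted(previous) == sorted(current) and list(previous) != list(current)


# 4 + 7 + 6: reorder to (4 + 6) + 7
pair, rest = ten_strategy([4, 7, 6])
result = sum(pair) + sum(rest)

# 8 + 5 + 7 followed by 5 + 7 + 8: the second result can be reused
can_reuse = addends_compare([8, 5, 7], [5, 7, 8])
```

In the first example the pair (4, 6) is found and 7 is left over, giving 10 + 7 = 17. In the second, the addends match in a different order, so no new calculation is needed.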

In Experiment 1, we used eye tracking to explore how children search for and apply different commutativity-based shortcuts. Verschaffel et al. (1994) presented third-graders with three-addends problems, assessed eye movements combined with verbal reports, and found that commutativity was used in 71% of all possible cases. We used a different approach, as we were interested in whether children spontaneously start search processes when, after a change in the material, one shortcut option is no longer present. The findings suggested that being offered an opportunity to apply one commutativity-based shortcut can help to search for and apply a different shortcut based on the same principle when the first one is no longer feasible. In Experiment 2, we explored whether transfer from shortcut to shortcut might be concept specific. On the one hand, it seems plausible that shortcuts based on the same mathematical principle trigger each other because they are linked to one another directly or indirectly (as they are both linked to the common conceptual knowledge). This perspective is in line with research suggesting that mathematical knowledge develops in an iterative fashion, with conceptual change influencing procedural change and vice versa (Byrnes and Wasik, 1991; Hiebert and Wearne, 1996; Rittle-Johnson et al., 2001; Waldmann, 2006). For instance, Canobi (2009) showed that children's conceptual advances were predicted by their initial procedural skills. On the other hand, transfer from shortcut to shortcut might take place for motivational reasons unrelated to the specific shortcut and underlying mathematical principle. After having experienced that task processing can be simplified by a shortcut, one might be more apt to search for and apply *any* shortcut, as one has learned that attractive shortcut options do seem to exist in the material provided.

## **EXPERIMENT 1**

In Experiment 1, we used eye tracking in order to explore the fixation patterns reflecting the usage of shortcut strategies. We were furthermore interested in how fixation patterns reflect the way people adapt to new sets of arithmetic problems in which the previously feasible shortcut no longer applies (but a different shortcut does). To this end, children at first had to solve problems that could be facilitated by the ten-strategy (of three addends, the first and the last add up to 10). After that, they were presented with problems that allowed for the use of the addends-compare strategy (some problems contained the same addends as their precursor in a different order). Both strategies are based on the commutativity principle.
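The two problem types can be stated as simple checks on the addends. The following is a minimal sketch for illustration only; the function names are hypothetical and do not come from the study materials.

```python
def ten_strategy_applies(addends):
    """True if the first and last of three addends add up to 10."""
    first, _middle, last = addends
    return first + last == 10


def is_commuted_version(current, previous):
    """True if `current` has the same addends as `previous` in a different order."""
    same_addends = sorted(current) == sorted(previous)
    different_order = tuple(current) != tuple(previous)
    return same_addends and different_order


# Examples taken from the problems mentioned in the text
print(ten_strategy_applies((4, 7, 6)))            # 4 + 6 = 10
print(is_commuted_version((5, 7, 8), (8, 5, 7)))  # same addends, new order
```

A problem for which `is_commuted_version` holds can be answered by reusing the result of the preceding problem, which is the addends-compare shortcut described above.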

## **METHOD EXPERIMENT 1**

## *Participants*

Twenty children participated in Experiment 1 (mean age 8.6 years). They were tested individually in a laboratory at Humboldt-Universität, Berlin.

## *Procedure and Materials*

Research procedures of these experiments were approved in a peer review process for applying for public funding (German Research Foundation, DFG) and were completed with approval of the Institutional Review Board of the Department of Psychology at Humboldt-Universität, Berlin. Students were informed about the content of the study and that data analysis would preserve anonymity. We ensured written informed consent of the parents. Children were then tested individually with a 250 Hz video-based eye tracker (SMI RED 250). Packages of six problems in black on a gray background were shown on a 22-inch TFT monitor, with the student sitting at a distance of approximately 50 cm. Digits were approximately 0.5 cm wide and 1 cm tall.

Children started with a five-point calibration. Afterwards the experimenter showed a single example problem and explained that the children should utter the result as quickly and as accurately as possible. Children started the main part by working on two screens with six ten-strategy problems each (first and last addend add up to 10). They then completed two screens with addends-compare problems intermixed with baseline problems. Two of the six problems per screen contained the same addends as the preceding problem in a different order (problems listed in the Supplementary materials). Each problem was presented in one line and consisted of three different addends between 2 and 9 (maximum result was 24; 0 and 1 were excluded as addends). We balanced problem size between the addends-compare problems and the baseline problems so that they were equally difficult for children unless they used the shortcut (for more details, see Gaschler et al., 2013; Haider et al., 2014).

Children were presented with the first screen (of two) with six ten-strategy problems. The experimenter moved the cursor to the right of the equal sign of the first problem and waited for an answer. The answer was entered immediately, because the time log of the first key press served to determine the calculation time, defined as the span from cursor allocation to the first key press of entering the result for the current problem (i.e., the first of two key presses for two-digit results). After entering the answer, the experimenter moved the cursor to the next problem. The entered results remained visible on the screen while the children worked on the remaining problems of the package. This was especially important for the work on the two screens with addends-compare problems later on. If children spotted that the addends of a problem were the commuted version of the preceding problem, they thus had the opportunity to access the solution they had given to the previous problem.

## **RESULTS**

The computerized assessment allowed us to track solution times on the level of single problems. As previously mentioned, students calculated 12 ten-strategy problems (Screens 1 and 2) and afterwards worked on another 12 problems, four of which allowed for the addends-compare strategy (Screens 3 and 4). **Figure 1** shows the mean solution times per problem for each screen. Students were faster on addends-compare problems than on baseline problems. A 2 (screen: first vs. second) × 2 (problem type: addends-compare problem vs. baseline problem) ANOVA with solution times as dependent variable revealed a significant main effect of problem type [$F(1, 19) = 7.46$, $p = 0.01$, $\eta_p^2 = 0.28$]. Neither the main effect of screen [$F(1, 19) = 1.67$, $p = 0.21$, $\eta_p^2 = 0.08$] nor the interaction effect was significant [$F(1, 19) = 0.72$, $p = 0.41$, $\eta_p^2 = 0.04$]. We did not find significant effects when repeating the above analyses with error rate as dependent variable (see Supplementary materials).

The analysis of the eye tracking data suggests that the ten-strategy and the addends-compare strategy can be identified by specific fixation patterns. When the ten-strategy is used, adding the first and last addend first to obtain the result ten should be fast and should require little fixation time on the outer numbers. Adding the middle number afterwards and uttering the result might therefore result in more fixation time on the middle number relative to the other numbers. **Figure 2** suggests that the percentages of fixations falling on the middle vs. outer numbers of the three-addends problems are distributed in line with this reasoning. The percentage of fixations on the middle number increased from the first to the second screen of the ten-strategy problems, as students presumably discovered the structure of the problems.

When the ten-strategy could no longer be used (first screen with addends-compare problems), the percentage of fixations on the middle digit was low again. Surprisingly, it increased on the second screen with addends-compare problems. A 2 (screen: first vs. second) × 2 (ten-strategy problems vs. addends-compare problems) ANOVA with percentage of fixations falling on the middle number as dependent variable revealed a significant main effect of strategy [$F(1, 17) = 6.02$, $p < 0.05$, $\eta_p^2 = 0.26$]. Children fixated the middle digit more in problems in which the ten-strategy could be used than in problems on the addends-compare screens. There was also a significant main effect of screen [$F(1, 17) = 7.91$, $p = 0.01$, $\eta_p^2 = 0.32$], but no interaction, $F < 1$.

In the ten-strategy problems, addends should be checked within a line in order to identify shortcut options. In contrast, for the addends-compare strategy, it is necessary to compare the addends between the lines. Children should thus not only fixate the addition problem they are currently solving but also the previous one or the subsequent one in order to check whether a set of addends repeats. **Figure 3** presents the mean differences between (a) line fixated and (b) line of current problem. If, for instance, a student fixated back on the problem in the line before while solving a problem, this would lead to a value of −1 for this particular fixation. While the majority of fixations were on the line of the current problem, some fixations were directed at previous (negative difference) or subsequent (positive difference) problems. We focused on comparing the above index of fixation position between the addends-compare problems and their preceding problems. In these pairs, the addends are identical and differ only in order. We found a significant difference in the index of fixation position for these problems. In line with our assumption, students were fixating ahead on problems preceding the addends-compare problems and fixating back once a set with identical addends was discovered [$t(18) = 5.44$, $p < 0.001$].
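The index of fixation position described above can be sketched as the mean of per-fixation line differences. This is an illustrative reconstruction under assumptions: the data layout (pairs of line numbers) and the function name are hypothetical, and the authors' actual preprocessing of the eye tracking data is not described in this section.

```python
def mean_line_offset(fixations):
    """Mean of (line fixated - line of current problem) over all fixations.

    `fixations` is a list of (line_fixated, line_current) pairs.
    Negative values indicate looking back at earlier problems,
    positive values indicate looking ahead at later problems.
    """
    if not fixations:
        raise ValueError("at least one fixation is required")
    offsets = [fixated - current for fixated, current in fixations]
    return sum(offsets) / len(offsets)


# Hypothetical fixations while solving the problem in line 3:
# two on the current line, two on the preceding line.
looking_back = [(3, 3), (2, 3), (3, 3), (2, 3)]
print(mean_line_offset(looking_back))
```

Averaging per problem keeps the sign convention of the text: a single fixation on the preceding line contributes −1, a fixation on the following line contributes +1.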

In addition to identifying eye movement patterns that are specific for the shortcuts, we found a significant correlation between the increase in fixations on the middle digit in the ten-strategy problems (Screen 2 − Screen 1) and the time benefit on addends-compare strategy problems, $r = 0.49$, $p = 0.05$. Thus, increased usage of the commutativity-based shortcut offered on Screens 1 and 2 might help in spotting and applying the other commutativity-based shortcut offered on Screens 3 and 4.

#### **DISCUSSION**

Providing children with the opportunity to spontaneously (without instruction or other hints) use one commutativity-based shortcut might help them to spot and apply another shortcut based on the same mathematical principle once the first one no longer applies. Furthermore, the eye tracking data are in line with the interpretation that search processes might start once one shortcut no longer applies. We found that children in some cases checked addends of subsequent addition problems in advance (i.e., before uttering the result to the current problem and the allocation of the cursor to the next problem). Note that this implies that the accuracy of attributing calculation time to specific arithmetic problems might be limited in setups in which multiple problems are presented simultaneously. Such arrangements resemble work on arithmetic problems on worksheets in the schooling context. Eye tracking or reliance on aggregate measures from paper-and-pencil versions might both be useful approaches to this variant of the dilemma of external vs. internal validity.

Experiment 1 provided a first hint in line with the idea that there might be transfer from one shortcut to another one. This suggests two different explanations. On the one hand, spontaneously spotting and applying shortcuts on Screen 1 and 2 might affect processing of Screens 3 and 4 on a motivational route. Participants learn that shortcut options seem to exist and can be exploited. This would suggest that such transfer could take place from any easily identifiable shortcut to a second one. On the other hand, transfer might involve specific mathematical knowledge. It might first and foremost take place between shortcuts based on the same mathematical principle. We tried to disentangle these two perspectives in Experiment 2.

## **EXPERIMENT 2**

This experiment focused on the question of whether the ten-strategy facilitated the usage of the addends-compare shortcut. For this purpose, we compared three conditions: students in the ten-strategy warm-up condition started with the ten-strategy problems followed by problems that allowed for applying the addends-compare strategy (similar to Experiment 1). In the baseline warm-up condition, children worked on material with no shortcut option at all before being transferred to the addends-compare booklet. The inversion warm-up condition started with inversion problems (e.g., 9 + 2 − 2). Thus, a shortcut *not* based on the commutativity principle was offered first. This was important in order to test whether all shortcut strategies would alter the usage of the addends-compare shortcut simply by motivating children to look for shortcuts. Alternatively, it might be that only the ten-strategy increases the probability of spotting the addends-compare strategy, as it is the only shortcut strategy that is also based on the commutativity principle. It is conceivable that offering problems with an easy-to-find shortcut option (inversion or ten-strategy) might lead students to assume that it is worthwhile to search for shortcut options in general in later material. This could accordingly lead to transfer that is simply based on the motivation to search for shortcuts. In contrast, a finding of transfer for the ten-strategy problems but not for the inversion problems would suggest that triggering the basic principle of commutativity is indeed important for transfer to occur.

## **METHOD EXPERIMENT 2**

#### *Participants*

We tested 153 children at the end of second grade (most of them were taught in combined classes of first and second grade) and 140 children in third grade. We ensured written informed consent of the parents in collaboration with the schools. Both groups were provided with advance information concerning the content of the study (calculating mental arithmetic problems) and were informed that participation was voluntary. Parents and students were also informed that data analysis would preserve anonymity. Data were acquired in a classroom setting with paper and pencil. Gender was balanced as much as possible. Eleven children from the second grade and 20 children from the third grade were excluded as outliers based on the criterion median ± 3 *MAD*s. The median absolute deviation (MAD) is a robust measure for detecting outliers that uses the absolute deviation from the median; for further information, see Leys et al. (2013). For the descriptive data of the sample see **Table 1**.
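The exclusion criterion can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name and example values are hypothetical, and the scaling constant 1.4826 (commonly used so that the MAD is comparable to a standard deviation under normality) is an assumption, since the section does not state whether it was applied. Setting `scale=1.0` gives the raw MAD.

```python
from statistics import median


def mad_outliers(values, threshold=3.0, scale=1.4826):
    """Return the values lying outside median +/- threshold * MAD.

    `scale` is an assumed consistency constant; use 1.0 for the raw MAD.
    Note: if more than half of the values are identical, the MAD is 0
    and every value that differs from the median is flagged.
    """
    values = list(values)
    med = median(values)
    mad = scale * median([abs(v - med) for v in values])
    lower, upper = med - threshold * mad, med + threshold * mad
    return [v for v in values if v < lower or v > upper]


# Hypothetical mean times per problem (in seconds) for six children
times = [10, 11, 12, 13, 14, 60]
print(mad_outliers(times))  # only the extreme value is flagged
```

Unlike a criterion based on the mean and standard deviation, the median and MAD are barely affected by the extreme value itself, which is why the single slow child in the example is still detected.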

## *Procedure and Materials*

The arithmetic problems were the same as in Experiment 1 and are listed in the Supplementary materials. Each problem was presented in one line and consisted of three different addends between 2 and 9 (maximum result was 24; 0 and 1 were excluded as addends). The different types of problems were presented as a paper-and-pencil test in separate booklets. As dependent variable we measured the number of problems solved in the booklet that allowed vs. the booklet that did not allow for the addends-compare strategy. We took care that the amount of time provided per booklet was not sufficient to solve all problems so that we could use the number of problems solved per unit of time as a dependent variable (see **Table 1** for time provided per booklet).

**Table 1 | Sample data and time provided per booklet in Experiment 2.**


*\*We started with 210 s and then reduced it after testing one group of students in order to avoid ceiling effects.*

Experimental conditions differed in the warm-up booklet. The ten-strategy warm-up condition started with problems in which children could use the ten-strategy. The baseline warm-up condition started with addition problems of comparable size that did not include any option for applying the commutativity principle to solve the problems (e.g., 4 + 3 + 5 or 7 + 6 + 2). A second control condition, the inversion warm-up condition, started with problems that allowed for a shortcut, but, importantly, not for a commutativity-based one. Inversion problems (e.g., 9 + 2 − 2), which mix addition and subtraction, allow children to refrain from calculation by comparing the numbers involved in the problem. Thus, while the ten-strategy and the addends-compare strategy are both based on the same arithmetic principle, inversion and addends-compare are not. However, on the surface the latter two shortcuts are similar, as they both enable students to avoid calculation altogether (in contrast, the ten-strategy reduces rather than avoids calculation demands).

After the warm-up phase, all children worked on five more booklets. Starting with (1) a booklet in which the addends-compare strategy could be used, they were then presented with (2) a baseline booklet with no shortcut opportunities, followed by (3) another booklet in which the addends-compare strategy could be used. This second addends-compare booklet was included because we had obtained high variability across students as well as large general practice effects in the first booklets in earlier work (Gaschler et al., 2013). Booklets 4 and 5 served to check whether the induced shortcut was known and would be used (see **Table 2**). The children in the ten-strategy warm-up condition received another booklet with addition problems allowing for the ten-strategy (4) and afterwards a matched baseline booklet (5). This was also the case for children of the control condition with the baseline warm-up. The children of the inversion warm-up condition worked for the second time on a booklet with inversion problems (4) followed by a matched baseline booklet (5).

Students were instructed to solve the problems as quickly and as correctly as possible. The time for each booklet was fixed and we counted the number of problems solved and errors. Students were additionally informed that it would be almost impossible to solve all problems during the period of time given for each booklet. As dependent measure we calculated the average time per problem on addends-compare booklets as compared to baseline booklets.

#### **RESULTS**

After the short warm-up phase, children were still rather slow in calculating the first addends-compare booklet, and between-student variability was rather high (see **Table 3**). On closer examination, we found that the practice effects were stronger than the effect of problem type. For further analysis we therefore focused on the second addends-compare booklet. We first analyzed the effects of our different warm-up phases on the addends-compare problems. To calculate the addends-compare benefit in second graders, we subtracted for each child the average solution time per problem in Booklet 3 (addends-compare strategy) from the average time per problem in Booklet 2 (baseline). The benefits are depicted in **Figure 4** separately for each of the three conditions in second and third graders. In addition, **Table 3** presents the average time per problem for every booklet for the second and third grade.
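The benefit score can be sketched as a simple difference of times per problem. This is an illustration under assumptions: it takes the average time per problem to be the fixed booklet time divided by the number of problems solved, which the text implies but does not spell out, and the function names and example numbers are hypothetical.

```python
def time_per_problem(seconds_allowed, problems_solved):
    """Average time per problem for one booklet (assumed: time / problems solved)."""
    if problems_solved <= 0:
        raise ValueError("at least one solved problem is required")
    return seconds_allowed / problems_solved


def addends_compare_benefit(baseline_tpp, addends_compare_tpp):
    """Baseline minus addends-compare time per problem; positive means faster with the shortcut."""
    return baseline_tpp - addends_compare_tpp


# Hypothetical child: 120 s per booklet, 10 baseline problems solved
# (Booklet 2) and 12 addends-compare problems solved (Booklet 3).
baseline = time_per_problem(120, 10)         # 12.0 s per problem
addends_compare = time_per_problem(120, 12)  # 10.0 s per problem
print(addends_compare_benefit(baseline, addends_compare))  # 2.0
```

A positive score means the child needed less time per problem when the addends-compare shortcut was available than on matched baseline problems.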

For the second graders with the ten-strategy warm-up phase, we observed a significant benefit on the addends-compare strategy problems compared to baseline problems, $t(47) = 2.48$, $p = 0.05$. Second graders with the warm-up problems not allowing for any shortcut did not benefit from the addends-compare booklets relative to the baseline booklets. The inversion problems group did not show such a benefit either. Third graders, however, seemed to use the addends-compare strategy in every warm-up condition. Each of the three warm-up groups significantly benefitted from the addends-compare strategy [ten-strategy: $t(40) = 2.64$, $p = 0.05$; baseline: $t(39) = 3.71$, $p = 0.001$; inversion: $t(38) = 3.79$, $p = 0.001$]. The time used to solve the addends-compare strategy problems was shorter than that needed to calculate the baseline problems.

We calculated a 2 (problem type: baseline vs. addends-compare booklet) × 3 (warm-up condition: ten-strategy vs. baseline vs. inversion warm-up) × 2 (grade: second vs. third grade) mixed ANOVA with mean benefit time as dependent variable. This ANOVA yielded significant main effects of problem type [$F(1, 256) = 14.98$, $p < 0.001$, $\eta_p^2 = 0.055$] and grade [$F(1, 256) = 38.44$, $p < 0.001$, $\eta_p^2 = 0.131$] and a significant three-way interaction of problem type × warm-up condition × grade [$F(2, 256) = 3.75$, $p = 0.05$, $\eta_p^2 = 0.028$]. We found neither a significant main effect of warm-up condition nor other interaction effects (see **Table 4**). The three-way interaction suggests that the different warm-up phases differentially affected second and third graders. Whereas the ten-strategy warm-up increased the probability of applying the addends-compare strategy in second graders, it did not in third graders. The results suggest that shortcut-to-shortcut transfer specific to the underlying mathematical principle was observed in second graders. Third graders, on the other hand, may have spontaneously used the addends-compare shortcut anyway and thus did not profit from a prior task with a conceptually related shortcut.

One could argue that second graders did not show transfer from an inversion warm-up to addends-compare problems because they did not discover the shortcut option in the inversion problems. Our manipulation checks do not support this alternative explanation. We analyzed Booklets 4 and 5 (induced shortcut and respective baseline). The results suggested that students were capable of using the inversion strategy (see **Table 5**). For the second graders, a 2 (Booklet 4 vs. 5) × 3 (warm-up condition) ANOVA revealed a significant interaction effect of both factors [$F(2, 139) = 3.20$, $p = 0.05$, $\eta_p^2 = 0.044$]. Whether the shortcut in Booklet 4 was used depended on the warm-up condition.

**Table 3 | Mean time per problem and standard deviation analyzed for booklet type and grade in Experiment 2.**

*\*See Table 2.*

For the third graders we also found an interaction effect of Booklet 4 vs. 5 and warm-up condition [$F(2, 117) = 15.41$, $p < 0.001$, $\eta_p^2 = 0.208$]. While there was a pronounced inversion effect, surprisingly, neither the baseline warm-up condition nor the ten-strategy warm-up condition showed a ten-strategy effect in the booklets administered at the end of the experiment. We did not find relevant effects when repeating the above analyses with error rate as dependent variable, although, as expected, error rates differed between grades two and three (see Supplementary materials).

#### **DISCUSSION**

In Experiment 2, we tested whether it is possible to prompt students to spot and apply a shortcut strategy by first providing an easy-to-find shortcut strategy based on the same mathematical principle vs. one based on a different principle. Our findings suggest that in second graders, transfer was related to the mathematical principle rather than to general motivational factors. There was no indication that second graders were motivated to search for and apply *any* shortcuts after being offered the first one. If the additional conceptual link between the two different strategies is the reason for the transfer, this would support an understanding of adaptive expertise as the ability to apply meaningfully learned procedures flexibly and creatively (Hatano and Oura, 2003). The inversion warm-up phase, an easy-to-find shortcut that is not based on commutativity, did not lead to increased usage of the addends-compare strategy. While inversion did not promote transfer, our manipulation check suggested that inversion was indeed used. This is in line with Robinson and Dubé (2009), who found that the inversion shortcut is easier to apply than associativity (which is similar to commutativity). In both studies (Robinson and Dubé, 2009; Dubé and Robinson, 2010), inversion shortcut use was far more frequent than the associativity-based strategy. Because we focused on commutativity as a model case, a limitation of the experiment is that we have so far used only one shortcut not based on commutativity (i.e., inversion) in order to differentiate between transfer effects based on motivation vs. on mathematical principles shared by subsequently offered shortcut options. For instance, it would be interesting to know whether the current setup can be turned around, with inversion usage as dependent variable and commutativity vs. inversion warm-up as independent variable (cf. Dowker, 2014). Generalizability beyond the specific pairing of shortcuts tested here might, for instance, depend upon the relative difficulty of the shortcuts used as warm-up and dependent variable.

While the results suggest that second graders profited from shortcut-to-shortcut transfer based on commutativity, third graders did not seem to benefit from such extra scaffolding. Spontaneous usage of the addends-compare strategy was not improved further by a warm-up condition with a shortcut-option based on the same mathematical principle. We assume that in this age group, the concept of commutativity is more developed so that extra support is less needed. With further experience, students become increasingly able to rapidly generate adequate actions with less and less effort (Ericsson, 2008). In line with these findings, differences between second and third graders in their mathematical abilities are mirrored in functional changes of the brain. Rosenberg-Lee et al. (2011) examined the behavioral and neurodevelopmental changes between grades 2 and 3 and found that arithmetic complexity was associated with regions implicated in domain-general cognitive control but also regions for numerical arithmetic processing. The results showed that brain response and connectivity relating to an arithmetic task significantly change within the narrow 1-year interval.

**Table 5 | Results of the ANOVA problem type × condition separately for grade 2 and 3.**

## **GENERAL DISCUSSION**

We presume that one crucial feature of expertise is the ability to spontaneously recognize where and when knowledge can be applied to simplify task processing. In some domains, it is necessary for everyday life to develop this ability. Research on expertise has shown that experts are more flexible and creative in their thought patterns. For instance, "super experts" were more flexible in finding an optimal solution despite distraction by a non-optimal but salient solution of a chess problem (Bilalić et al., 2008a). Players at lower levels of expertise reported that they were looking for a better solution, but their eye movements showed that they continued to look at features related to the solution they had already thought of (Bilalić et al., 2008b). For expertise in object recognition, Harel et al. (2013) developed an interactive framework, which posits that expertise emerges from multiple interactions within and between the visual system and other cognitive systems, such as top-down attention and conceptual memory. The interplay between these multiple cognitive processes and perception is often not consciously accessible for the experts themselves (Palmeri et al., 2004).

In some parts of arithmetic, procedural and conceptual knowledge start to develop even before primary school. In the first years of primary school, integration of different fragments of procedural and conceptual knowledge should lead to a knowledge base that allows children to spontaneously spot and apply shortcut options. If successful, knowledge integration should lead to transfer between procedurally different shortcuts that are based on the same mathematical principle and therefore are likely both associated with the respective conceptual knowledge. For the case of commutativity, we tested whether different strategies that are based on the same principle trigger each other via the concept and so could support flexibility in strategy use. According to the adaptive expertise metaphor (e.g., Hatano, 1988; Star and Rittle-Johnson, 2008; Verschaffel et al., 2009), children first of all need to spontaneously recognize where knowledge can be applied.

Experiment 1 provided first evidence that children who are provided with an opportunity to spontaneously spot and apply one shortcut might be more inclined to search for and use a second shortcut once the first one no longer applies. This is in line with the suggestion to differentiate (a) quick and accurate routine-based solving from (b) an adaptive use of solution strategies, which draws upon conceptual understanding (Hatano, 1988). Experiment 2 verified that transfer occurred from one shortcut to another. It furthermore specified that this transfer effect was not only based on motivation. While we obtained transfer (at least in second graders) from one commutativity-based shortcut to another commutativity-based shortcut, no transfer was observed between inversion and commutativity. Thus, our results are in line with the view that links between different elements of procedural knowledge and potentially conceptual knowledge (compare Haider et al., 2014) are used to spontaneously spot and apply shortcut options.

Several studies on commutativity have shown that children have at least some understanding of the concept of commutativity before entering school (Siegler, 1989; Resnick, 1992; Cowan and Renton, 1996; Wilkins et al., 2001; Canobi et al., 2003) and that even first graders seem to understand the commutativity principle (Canobi et al., 2002). We thus focused on triggering the usage of knowledge rather than knowledge acquisition as such. In primary school, children should link different strategies based on the same concept and develop the ability to select an efficient strategy for the current problem (Verschaffel et al., 2009). As implied by these authors in the adaptive expertise metaphor, the learner should be able to spot and apply options for a shortcut independently without having to rely on instruction or explicit cues. In a similar vein, research on skill acquisition and expertise stresses the importance of linking perceptual skills and principle-knowledge in order to be able to spontaneously spot and apply shortcuts (e.g., Gentner and Toupin, 1986; Koedinger and Anderson, 1990; Haider and Frensch, 1996; Anderson and Schunn, 2000; Bilalić et al., 2008a; Frensch and Haider, 2008). Adaptive strategy use can be regarded as the ability to select procedures that can simplify the solution of a problem (Selter, 2009). In the end the person should be faster and/or the solution should be more accurate. Strategy use can be seen as an indicator of the state of development of a mathematical concept. Adaptive strategy use necessitates shifts between (a) calculating problems in the general mode, (b) investing some time and effort to search for shortcut options, and (c) using a shortcut option. We are interested in factors that can tip the balance on the exploitation-exploration continuum. Experts know when to search for a new shortcut strategy and when not to; children have to learn how much time and effort they want to spend on searching. Teachers and other instructors cannot sustainably take over the regulation of this dilemma (calculating in the standard way vs. flexibly changing strategies); they can only help children to calibrate the balance between flexibility and stability (or exploration and exploitation).

We have to acknowledge that the effects of spontaneously using a shortcut were small in many cases of the current experiments and the variability across students was large. This is to be expected when taking into account the difference between competence and performance (i.e., principle knowledge and application). Larger estimates of both procedural and conceptual knowledge have been obtained when knowledge was probed more directly (Prather and Alibali, 2009). Direct probing, however, conveys to the students that shortcut options exist and which ones. It is therefore not suitable when trying to measure the extent to which knowledge about a mathematical principle is applied spontaneously (cf. Haider et al., 2014). In addition, Robinson and Dubé (2012) have suggested that personality characteristics bridge between knowledge and application. They argued that some children have more positive attitudes toward accepting strategies that are highly efficient but are novel to their current strategy repertoire of algorithmic approaches. In a similar vein, Guerrero and Palomaa (2012) highlighted that some children change their strategies during calculation while some do not. Furthermore, children change their strategies for different reasons. It is not always the goal to choose the most efficient strategy. Newton et al. (2010) suggested that flexibility involves the use of strategies that are considered the most appropriate for a given problem. They also discussed what "appropriate" means. It could be the most efficient or the most understandable strategy in a given situation. Which strategy is used in general depends on the problem, the numbers presented, and other contextual, developmental, or personal factors (Newton et al., 2010; Guerrero and Palomaa, 2012). A U-shaped relationship between knowledge/understanding and variety of strategy use suggests that novices as well as experts may use a large variety of strategies (Siegler and Jenkins, 1989; Dowker et al., 1996). Experts such as mathematics students used large numbers of appropriate strategies, whereas children (novices) may use a large variety of appropriate and inappropriate strategies, because they have not yet acquired a small set of well-learned strategies (Dowker et al., 1996). In contrast to this assumption, Newton et al. (2010) argued that low achieving students might be particularly appreciative of and excited about a focus on multiple strategies, which lets them compare the possible ways to solve the problem and maximize accuracy. This stands against the prominent idea that an educational approach for low achieving children should promote routine mastery of a single well-thought-out solution strategy for a given type of problem (e.g., Woodward and Baxter, 1997; Baxter et al., 2001). Future work should explore how students at different ability levels profit from sequences of problems allowing for different shortcuts based on the same mathematical principle.

In order to optimize the chances of measuring spontaneous (i.e., no cues and no instruction) recruitment of knowledge about the commutativity principle, we chose a paper-and-pencil test in the classroom in Experiment 2. Our informal observations suggest that children taking part in an eye tracking study on mental arithmetic appreciate that the measurement is not only about whether they solve the problems correctly, but also about how they solve them. The paper-and-pencil method was closer to usual test situations in the classroom. Children focused on being fast and accurate rather than on the fact that someone might be trying to assess *how* they solved the problems. Verschaffel et al. (2009) highlighted the importance of ecological validity for studies on adaptive expertise. We suggest that trial-by-trial process measures (as in our eye tracking experiment) and ecologically valid but less sensitive methods (as in Experiment 2) should be combined to convey the full picture. For instance, eye tracking can help to figure out whether increased time demands after a change in shortcut option reflect prolonged solution times or, alternatively, a mixture of prolonged solution times plus time invested in the search for alternative shortcut options. Potentially, learners at different levels of expertise might differ both in the efficiency of spotting shortcuts and in using them. For instance, third graders might have discovered the options for the addends-compare shortcut relatively quickly even without a fitting warm-up condition.

In line with the research on adaptive expertise, Verschaffel et al. (2009) and Star and Rittle-Johnson (2008) defined flexibility in problem solving as knowledge of multiple strategies and their relative efficiency. In addition to weighing different strategies according to their efficiency, students need to weigh the potential costs and benefits of flexible strategy usage. There are time costs of switching between strategies once a shortcut option has been discovered (Lemaire and Lecacheur, 2010). Luwel et al. (2009) found longer response times but no reduced accuracy; the size of these switching costs varied as a function of the associative strength between a strategy and a particular problem. More importantly, there is a dilemma between (a) investing time and attention in order to spot potential shortcut options that might or might not exist and (b) using processing strategies that are readily available (e.g., Jepma and Nieuwenhuis, 2011). Thus, process measures that provide evidence on when, how, and to what extent students invest in spotting and applying shortcuts (Haider and Rose, 2007) are necessary in order to better understand the bases of the transfer effect observed in Experiment 2. To illustrate the search process, we additionally used eye tracking in Experiment 1. On the one hand, this is a more specific method than paper and pencil; on the other hand, it allowed us to measure shifts of attention. The eye tracking results are in line with the view of Robinson and LeFevre (2012). To discover new strategies, children need to shift their attention to the relevant part of the problem. The eye movement patterns differed between the shortcut strategies and matched the points of interest of the corresponding strategies.

## **ACKNOWLEDGMENTS**

This work was supported by Grant FR 1471/12-1 from the Deutsche Forschungsgemeinschaft (DFG) as well as by the Berlin Cluster of Excellence Image Knowledge Gestaltung (www.interdisciplinary-laboratory.hu-berlin.de). Some of the results were presented at the Conference of Experimental Psychologists 2013 in Vienna, Austria. We thank Christine Arndt, Kate Könnecke and Maria Wirth for help with data collection.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.00556/abstract

## **REFERENCES**


and 3rd grades during arithmetic problem solving. *Neuroimage* 57, 796–808. doi: 10.1016/j.neuroimage.2011.05.013


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 March 2014; paper pending published: 16 April 2014; accepted: 19 May 2014; published online: 10 June 2014.*

*Citation: Godau C, Haider H, Hansen S, Schubert T, Frensch PA and Gaschler R (2014) Spontaneously spotting and applying shortcuts in arithmetic—a primary school perspective on expertise. Front. Psychol. 5:556. doi: 10.3389/fpsyg.2014.00556 This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Godau, Haider, Hansen, Schubert, Frensch and Gaschler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The Einstellung effect in anagram problem solving: evidence from eye movements

## *Jessica J. Ellis and Eyal M. Reingold\**

*Department of Psychology, University of Toronto Mississauga, Mississauga, ON, Canada*

#### *Edited by:*

*Merim Bilalić, Alpen Adria University, Austria*

#### *Reviewed by:*

*Robert Gaschler, Universität Koblenz-Landau, Germany; Peter McLeod, Oxford University, UK*

#### *\*Correspondence:*

*Eyal M. Reingold, Department of Psychology, University of Toronto Mississauga, South Building, Room 2037B, 3359 Mississauga Road North, Mississauga, ON L5L 1C6, Canada*

*e-mail: reingold@psych.utoronto.ca*

The Einstellung effect is the counterintuitive finding that prior experience or domain-specific knowledge can under some circumstances interfere with problem solving performance. This effect has been demonstrated in several domains of expertise including medicine and chess. In the present study we explored this effect in the context of a simplified anagram problem solving task. Participants solved anagram problems while their eye movements were monitored. Each problem consisted of six letters: a central three-letter string whose letters were part of the solution word, and three additional individual letters. Participants were informed that one of the individual letters was a distractor letter and were asked to find a five-letter solution word. In order to examine the impact of stimulus familiarity on problem solving performance and eye movements, the central letter string was presented either as a familiar three-letter word, or the letters were rearranged to form a three-letter nonword. Replicating the classic Einstellung effect, overall performance was better for nonword than word trials. However, participants' eye movements revealed a more complex pattern of both interference and facilitation as a function of the familiarity of the central letter string. Specifically, word trials resulted in shorter viewing times on the central letter string and longer viewing times on the individual letters than nonword trials. These findings suggest that while participants were better able to encode and maintain the meaningful word stimuli in working memory, they found it more challenging to integrate the individual letters into the central letter string when it was presented as a word.

**Keywords: eye movements, problem solving, anagrams, Einstellung effect, insight problems**

## **INTRODUCTION**

The concept that stimulus familiarity and previously acquired domain knowledge might impair problem solving performance has been referred to by a variety of interrelated terms including functional fixedness, negative transfer, mental set, and Einstellung. Functional fixedness refers to cases where familiarity with habitual uses of objects blocks other uses from being considered. For example, in the classic "candle-box" insight problem introduced by Duncker (1945) the presented use of the box as a container is hypothesized to interfere with the required consideration of the box as a shelf for supporting the candle. Similarly, negative transfer refers to the notion that the retrieval of previously acquired stimulus–response associations can impair the establishment and maintenance of new stimulus–response associations (e.g., Schultz, 1960; Sweller, 1980; Chrysikou and Weisberg, 2005; Landrum, 2005; Osman, 2008). Finally, a problem solving set or mental set refers to the negative impact of prior exposure to similar problems (either pre-experimental or during the experiment), which triggers a familiar but inappropriate solution and prevents alternative solutions from being considered. The Einstellung effect (*Einstellung* is German for attitude), which was originally demonstrated by Luchins' (1942) seminal series of water jar experiments, constitutes an excellent illustration of the negative impact of a mental set (for a review see Bilalić and McLeod, 2014). In this paradigm, habitual approaches to problem solving are induced through exposure to multiple problems that have similar solution methods. When a problem is subsequently presented for which the habitual solution method is not appropriate, many participants claim that the problem is unsolvable. However, naive participants can find the solution quickly, thus showing that the problem is not intrinsically difficult and that the difficulty experienced by solvers reflects the negative impact of prior experience.

When considered in the context of human expertise, the idea that prior experience and stimulus familiarity might interfere with problem solving performance seems at first blush to be rather counterintuitive. This is because there is a large body of research demonstrating that stimulus familiarity and domain-specific knowledge acquired through extensive and deliberate practice underlie the superior performance of experts relative to their less skilled counterparts (for a review see Ericsson and Charness, 1994). However, experts are not immune to the negative impact of prior experience and stimulus familiarity as demonstrated in studies of expertise in medicine (de Graaff, 1989; Croskerry, 2003; Gordon and Franklin, 2003) and chess (Saariluoma, 1992; Reingold et al., 2001b; Bilalić et al., 2008a,b, 2010; Sheridan and Reingold, 2013; Bilalić and McLeod, 2014). For example, Bilalić et al. (2008a) employed eye movement monitoring to study the Einstellung effect in chess experts. Players were required to find a checkmate with the fewest number of moves. There were two possible solutions: a familiar five-move sequence and a less well-known three-move sequence (the optimal solution). After identifying the familiar solution, expert chess players reported that they were searching for the optimal one. However, the eye movement record revealed that their attention continued to be directed more often towards chess board regions involved in the familiar rather than the optimal solution. Thus, it appears that the Einstellung effect demonstrated in chess experts was due to the familiar scenario activating a schema in memory that directs attention towards information relevant to itself, and away from other information (Bilalić et al., 2008a, 2010; Bilalić and McLeod, 2014).

In the present study, in order to further investigate the negative influence of stimulus familiarity on problem solving, we monitored participants' eye movements while they performed a modified anagram problem solving task that was introduced by Ellis et al. (2011). Anagram tasks provide a unique opportunity to study the Einstellung effect in a domain of expertise possessed by most adults, that is, their familiarity with words. In addition, unlike most problem solving tasks that were employed to study the Einstellung effect, the use of anagrams allows for the creation of a large number of independent trials in which an Einstellung effect might occur. Anagrams have long been used to study insight problem solving (for a review see Ellis et al., 2011) as well as to demonstrate the negative impact of a mental set on problem solving performance (e.g., Rees and Israel, 1935; Maltzman and Morrisett, 1952; Kaplan and Schoenfeld, 1966; Juola and Hergenhahn, 1967). In particular, it has been established (e.g., Beilin and Horn, 1962; Ekstrand and Dominowski, 1965, 1968; Tresselt and Mayzner, 1965; Mayzner and Tresselt, 1966) that solution rates are lower and response times are slower when the solution word (e.g., HEART) is scrambled to create a word anagram (e.g., EARTH) than a nonword anagram (e.g., THREA). It is likely that the familiar word anagram produces activation (orthographic, phonological, lexical, and/or semantic) that is irrelevant to the solution and hinders the decomposition and restructuring operations that are required to produce the solution word (e.g., Hollingworth, 1935; Devnich, 1937).

The present investigation involved eye movement monitoring during anagram problem solving. As illustrated in **Figure 1**, the anagram task we used consisted of six uppercase letters: a centrally located three-letter string, plus three individual letters positioned above and to either side of the central letter string. The central letter string could be arranged either as a familiar three-letter word, or as a meaningless string of three letters. Participants were asked to produce a five-letter solution word, and were informed that one of the individual letters was a distractor letter that was not part of the solution word. Using a similar anagram task, Ellis et al. (2011) reported that near the beginning of trials, viewing times on the distractor and solution letters were indistinguishable, meaning that participants did not immediately perceive that the distractor letter was irrelevant to the solution. Towards the end of trials, viewing times on the distractor letter decreased relative to the solution letters, indicating that partial solution knowledge had developed, and this change occurred several seconds prior to solution. Importantly, the pattern of viewing times was the same regardless of whether or not participants reported a subjective experience of insight upon solution, thereby demonstrating a dissociation between the subjective experience of insight and the objective accumulation of solution knowledge. In the present study we expected to replicate the pattern reported by Ellis et al. (2011). In addition, based on previous findings with anagrams (e.g., Beilin and Horn, 1962; Ekstrand and Dominowski, 1965, 1968; Tresselt and Mayzner, 1965; Mayzner and Tresselt, 1966) we predicted better problem solving performance when the central letter string was presented as a nonword than a word. Finally, we explored differences in the pattern of looking behavior as a function of the familiarity of the central letter string.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

Sixty undergraduates from the University of Toronto Mississauga participated in exchange for partial course credit or \$10. All participants had normal or corrected-to-normal vision and were fluent English speakers.

### **APPARATUS**

An SR Research EyeLink 1000 eye tracking system was used to record participants' eye movements with a sampling rate of 1000 Hz. The stimuli were displayed on a 19-inch Viewsonic monitor with a refresh rate of 75 Hz and a screen resolution of 1024 × 768. Participants were seated 60 cm from the display and used a chinrest with a head support to minimize head movement. Following calibration, gaze-position error was less than 0.5°.

### **MATERIALS**

Anagram problems consisted of six uppercase letters: a centrally located three-letter string, plus three individual letters positioned above and to either side of the central letter string (see **Figure 1** for examples). All three letters in the central letter string belonged to the solution word, while only two of the individual letters belonged to the solution word, with the third being a randomly placed distractor letter. The task was to combine two of the three individual letters with the central letter string to create a five-letter solution word. Each anagram problem had only one possible solution, meaning that the distractor letter did not allow for the formation of any alternative five-letter words.
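The uniqueness constraint described above can be checked mechanically. The following is a minimal Python sketch, not the authors' stimulus-construction procedure: the function name `find_solutions`, the toy lexicon, and the example item (central string CAT with individual letters H, Y, and U) are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy lexicon; a real check would use a full word list.
TOY_LEXICON = ["YACHT", "LATCH", "CATCH", "CHANT"]


def find_solutions(central, individual, lexicon):
    """Return all five-letter lexicon words built from every central letter
    plus exactly two of the three individual letters."""
    solutions = set()
    for pair in combinations(individual, 2):
        # Letter order is irrelevant for an anagram, so compare sorted letters.
        target = sorted(central + "".join(pair))
        for word in lexicon:
            if len(word) == 5 and sorted(word) == target:
                solutions.add(word)
    return sorted(solutions)


# Hypothetical item: the remaining individual letter (U) acts as the distractor.
print(find_solutions("CAT", "HYU", TOY_LEXICON))  # ['YACHT']
```

Because only the set of letters matters, a word arrangement (CAT) and a scrambled arrangement of the same letters (e.g., TCA) necessarily yield the same solution set; the manipulation changes the presentation of the problem, not its solution.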

Each anagram problem could be presented in either a "word" or a "nonword" condition. In the "word" condition, the central letter string consisted of a three-letter word, while in the "nonword" condition, the central letter string consisted of a scrambled nonword version of those same three letters (see **Figure 1**). For each anagram problem, the identity and location of the three individual letters were the same across both conditions, such that the only difference between a given anagram problem in the two conditions was the configuration of the central letter string as a word or nonword. In the "word" condition, the central letter string words had a mean frequency of 435 per million (SD = 1243) according to Brysbaert and New (2009).

Solution words were made up of five unique letters, always began with a consonant, and contained either one vowel (33% of problems) or two vowels (67% of problems). Solution words had a mean frequency of 175 per million (SD = 396) according to Brysbaert and New (2009). The central letter string always consisted of two consonants and a vowel, as did the three individual letters. In order to remind participants that the three letters in the central letter string must always be included in the solution word, these letters were displayed in green, in a slightly smaller and bolder font than the three individual letters, which were displayed in black. Each anagram problem subtended approximately five visual degrees in height and 14 visual degrees in width.

The location of the individual distractor letter was counterbalanced across anagram problems. In an attempt to avoid any a priori bias away from the distractor letter, we matched the distractor letter with the other two individual solution letters in terms of letter frequency (averaged across all five possible letter positions within the solution word) using tables by Mayzner and Tresselt (1965). Across all experimental anagrams, the mean frequency of the distractor letter was no different from the mean frequency of the individual solution letters (distractor *M* = 193, SD = 91, solution *M* = 199, SD = 115, *t* < 1).

### **PROCEDURE**

Participants completed six practice trials followed by 72 experimental trials. Half the anagram problems were presented in the "word" condition and half were presented in the "nonword" condition, and both anagram order and central letter string type were randomized for each participant. Across participants, each anagram problem was presented an equal number of times in the "word" and "nonword" conditions.
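One way to satisfy both constraints (half of the problems in each condition for every participant, and equal exposure of each problem to both conditions across participants) is a two-list counterbalancing scheme. The Python sketch below is an assumption for illustration only; the text does not specify how the assignment was implemented, and the function name `build_trial_list`, the alternation by participant index, and the seeding are hypothetical.

```python
import random


def build_trial_list(anagram_ids, participant_index, seed=0):
    """Return a shuffled list of (anagram_id, condition) pairs for one participant."""
    rng = random.Random(seed + participant_index)
    half = len(anagram_ids) // 2
    # Assumed scheme: even-numbered participants see the first half of the
    # items as words, odd-numbered participants see the second half as words.
    if participant_index % 2 == 0:
        word_items = set(anagram_ids[:half])
    else:
        word_items = set(anagram_ids[half:])
    trials = [(a, "word" if a in word_items else "nonword") for a in anagram_ids]
    rng.shuffle(trials)  # randomize presentation order per participant
    return trials


trials = build_trial_list(list(range(72)), participant_index=0)
print(sum(1 for _, condition in trials if condition == "word"))  # 36
```

With an even number of participants, each problem then appears in the "word" condition for exactly half of them.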

Every trial began on a blank screen with a central fixation cross. After 1000 ms, the anagram problem appeared and remained on the screen until a response was made, or until the trial timed out after 45 s. Participants were instructed that speed of responding was of utmost importance and were discouraged from verifying their solution prior to response, even if that might elicit the occasional incorrect solution. Participants pressed a button on the response pad in order to respond. The stimulus display then disappeared and participants verbalized their answer to the experimenter, who provided feedback as to whether or not their response was correct.

After every trial, participants were asked to classify their subjective experience of solving the anagram problem. Participants selected one of the following options (from Novick and Sherman, 2003) by pressing a corresponding button on the response pad.


We considered options 1 and 2 to describe subjective experiences of insight, and labeled all trials where participants selected option 1 or 2 as "popout" trials. Option 3 does not describe a subjective experience of insight, so trials where participants selected option 3 were labeled "non-popout" trials. Participants made another button press to advance to the next trial at their own pace.

## **RESULTS**

Our main focus in this experiment involved examining the effect of central letter string type (word vs. nonword) on mean task performance and eye movement measures. However, we also wanted to ensure that manipulating central letter string type did not alter the nature of the problem solving task as compared to prior findings (Ellis et al., 2011). Accordingly, while we primarily focus on the differences between word and nonword trials, we also examined any interaction between the familiarity of the central letter string and participants' subjective experience of insight (i.e., trials in which the solution was experienced as emerging suddenly were classified as popout trials, whereas trials in which the solution was experienced as gradual were classified as non-popout trials; see Method section for details). Across participants, 52.7% of trials were correct, 4.2% of trials were incorrect, and 43.1% of trials timed out with no solution. Mean response time for correct trials was 14.3 s (SD = 3.3 s), with 69.0% of correct trials classified by participants as popout, and 31.0% classified as non-popout. The effect of central letter string type on overall problem solving performance is summarized in **Table 1**. Importantly, response times were significantly slower for word trials than for nonword trials. In addition, there was a numerical trend toward lower accuracy for word trials than nonword trials, although this difference did not reach significance. Finally, there was no impact of stimulus familiarity on the subjective experience of insight problem solving, as shown by the virtually identical proportion of word trials and nonword trials that were classified as popout.

Eye movement analyses were performed only on correct trials, and only included fixations that could be assigned to a particular item in the stimulus display. Specifically, a fixation was assigned to an individual letter or to the central letter string if it fell within a 192 pixel diameter circle around that item (these fixation areas did not overlap; for an illustration, see **Figure 1C**). Within correct trials across participants, 69.9% of fixations were assigned to the central letter string, 28.3% of fixations were assigned to one of the three individual letters, and 1.8% of fixations could not be assigned to either the central letter string or the individual letters. Assigned fixations were then converted to dwells, where a dwell is defined as one or more consecutive fixations within the same area prior to an eye movement to another area. As shown in **Table 1**, corresponding to the slower response times for word than nonword trials, there was a greater number of dwells per trial on both the central letter string and the individual letters for word trials as compared to nonword trials, revealing the classic negative influence of familiarity on problem solving performance.
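The assignment of fixations to areas and their conversion to dwells can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions rather than the authors' analysis code: the area centre coordinates are hypothetical (the text gives only the 192 pixel diameter), and the sketch assumes that unassigned fixations are simply dropped before dwells are formed.

```python
import math

RADIUS_PX = 192 / 2  # circle of 192 pixel diameter around each item

# Hypothetical area centres on a 1024 x 768 display, spaced so circles do not overlap.
AREAS = {
    "central": (512, 384),
    "left": (312, 384),
    "right": (712, 384),
    "top": (512, 184),
}


def assign_fixation(x, y, areas=AREAS):
    """Return the name of the area containing the fixation, or None."""
    for name, (cx, cy) in areas.items():
        if math.hypot(x - cx, y - cy) <= RADIUS_PX:
            return name
    return None


def fixations_to_dwells(fixations, areas=AREAS):
    """Merge consecutive assigned fixations on the same area into dwells.

    fixations: list of (x, y, duration_ms); returns list of (area, duration_ms).
    """
    dwells = []
    for x, y, duration in fixations:
        area = assign_fixation(x, y, areas)
        if area is None:
            continue  # assumption: unassigned fixations are excluded
        if dwells and dwells[-1][0] == area:
            dwells[-1] = (area, dwells[-1][1] + duration)
        else:
            dwells.append((area, duration))
    return dwells


example = [(510, 380, 200), (520, 390, 150), (310, 380, 300),
           (100, 100, 120), (515, 385, 250)]
print(fixations_to_dwells(example))
# [('central', 350), ('left', 300), ('central', 250)]
```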

However, several fine-grained eye movement measures, shown in **Figure 2**, revealed a more complex pattern of the effects of central letter string type. More specifically, we calculated overall means for the following eye movement measures: (a) duration of the initial latency on the central letter string (i.e., the interval from stimulus onset until the first eye movement that exited the central letter string area); (b) dwell duration on the central letter string during subsequent revisits; and (c) dwell duration on the individual letters. For each eye movement measure, we carried out a 2 × 2 ANOVA with subjective report (popout vs. non-popout) and central letter string type (word vs. nonword) as independent variables. As can be seen in **Figure 2**, some eye movement measures revealed facilitation for word trials relative to nonword trials, while other eye movement measures revealed interference. In addition, there were no significant interactions between central letter string type and subjective report for any eye movement measure (all *F*s < 1.45, n.s.), indicating that the differences between word and nonword trials were the same for both popout and non-popout trials. Specifically, the initial latency on the central letter string was much shorter for word trials than for nonword trials [**Figure 2A**; *F*(1,59) = 29.78, *p* < 0.001], and this processing advantage for words over nonwords was also present for subsequent dwells on the central letter string [**Figure 2B**; *F*(1,59) = 7.78, *p* < 0.01]. This processing advantage for word trials is likely due to a working memory advantage in encoding and maintaining the central letter string when it is arranged as a word as compared to a nonword. In marked contrast, dwell duration on individual letters revealed a processing disadvantage for word trials. Specifically, dwells on individual letters were longer for word trials than for nonword trials [**Figure 2C**; *F*(1,59) = 25.31, *p* < 0.001]. This processing disadvantage might be due to difficulty in integrating the individual letters into the central letter string when it is in the form of a unitary gestalt.
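The three measures (a) to (c) can be derived from a trial's dwell sequence, as in the following Python sketch. It is a simplified illustration under assumptions, not the authors' code: the function name `trial_measures` and the area labels are hypothetical, and the initial latency is approximated by the duration of the first dwell on the central letter string, which presumes that the trial starts with gaze on the central area.

```python
from statistics import mean


def trial_measures(dwells):
    """Compute per-trial measures from a list of (area, duration_ms) dwells.

    Assumes the first dwell is on the central letter string, so that its
    duration approximates the initial latency defined in (a).
    """
    if not dwells or dwells[0][0] != "central":
        raise ValueError("trial must begin with a dwell on the central string")
    revisits = [d for area, d in dwells[1:] if area == "central"]
    letters = [d for area, d in dwells if area != "central"]
    return {
        "initial_latency": dwells[0][1],                                # measure (a)
        "central_revisit_mean": mean(revisits) if revisits else None,  # measure (b)
        "letter_dwell_mean": mean(letters) if letters else None,       # measure (c)
    }


dwells = [("central", 900), ("left", 300), ("central", 400),
          ("top", 500), ("central", 600)]
print(trial_measures(dwells))
```

Per-participant means of these values in each cell of the 2 × 2 design would then form the input to the ANOVAs reported above.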

*SEs are shown in parentheses.*

In addition, we contrasted viewing times on the distractor and solution letters during the first half and second half of trials as a function of both the familiarity of the central letter string (word vs. nonword) and the reported subjective experience (popout vs. non-popout). Based on the findings reported by Ellis et al. (2011), viewing times for the distractor letter and the solution letters were expected to be the same at the beginning of trials, whereas solution knowledge towards the end of trials should result in lower viewing times on the distractor letter as compared to the solution letters. To examine this prediction, we compared the proportion of time spent on the distractor letter and the mean of the two solution letters separately for the first and second half of trials. Accordingly, we conducted 2 × 2 × 2 ANOVAs on the proportion of viewing time with letter type (distractor vs. solution), subjective report (popout vs. non-popout) and central letter string type (word vs. nonword) as independent variables. As can be seen in **Figures 3A–D**, for all conditions, there was no difference in the first half of trials between viewing times on the distractor letter and the solution letters (all *t*s < 1.50, n.s.). Likewise, the ANOVA revealed no significant main effects or interactions (all *F*s < 2.74, n.s.). In contrast, in the second half of trials for all conditions, a significantly greater proportion of viewing time was spent on the solution letters as compared to the distractor letter (all *t*s > 3.83, all *p*s < 0.001). In this case, the ANOVA revealed a significant main effect of letter type [*F*(1,59) = 75.65, *p* < 0.001] but no other main effects or interactions approached significance (all *F*s < 0.23, n.s.). The lack of any main effect or interaction involving central letter string type suggests that the accumulation of solution knowledge prior to insight is independent of the effects of the familiarity manipulation. Finally, we also examined the pattern of looking behavior during the first and second half of trials in which participants failed to provide a solution for the anagram. As shown in **Figures 3E,F**, failure to solve anagrams was reflected by a small but significant tendency for greater viewing time on the distractor letter relative to the solution letters [*F*(1,59) = 5.51, *p* < 0.05]. No other main effect or interaction was significant (all *F*s < 2.48, n.s.).
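The first-half versus second-half comparison can be illustrated with the following Python sketch. Several details are assumptions made for the example and are not stated in the text: dwells are represented as (area, start_ms, end_ms), the denominator of each proportion is the total assigned viewing time within that half, and a dwell that spans the midpoint is split between the halves. The function name `half_proportions` and the area labels are hypothetical.

```python
def half_proportions(dwells, trial_end_ms, distractor, solution_letters):
    """Proportion of viewing time on the distractor letter and on the mean of
    the solution letters, separately for the first and second half of a trial."""
    midpoint = trial_end_ms / 2
    windows = {"first": (0, midpoint), "second": (midpoint, trial_end_ms)}
    result = {}
    for label, (lo, hi) in windows.items():
        time_in = {}
        for area, start, end in dwells:
            # Only the part of the dwell that falls inside this half counts.
            overlap = max(0, min(end, hi) - max(start, lo))
            time_in[area] = time_in.get(area, 0) + overlap
        total = sum(time_in.values())
        if total == 0:
            result[label] = None
            continue
        solution_time = sum(time_in.get(s, 0) for s in solution_letters)
        result[label] = {
            "distractor": time_in.get(distractor, 0) / total,
            "solution_mean": solution_time / len(solution_letters) / total,
        }
    return result


dwells = [("central", 0, 400), ("left", 400, 600), ("top", 600, 800),
          ("right", 800, 1000), ("central", 1000, 1400),
          ("left", 1400, 1700), ("right", 1700, 2000)]
print(half_proportions(dwells, 2000, distractor="top",
                       solution_letters=("left", "right")))
```

In this made-up trial the distractor and solution letters receive equal shares of viewing time in the first half, whereas the distractor is no longer visited in the second half, which mirrors the qualitative pattern described for solved trials.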

## **DISCUSSION**

The main goal of the present study was to explore the negative influence of familiarity on performance in a simplified anagram problem solving task. We replicated prior findings from the anagram literature that showed that task performance is poorer when anagrams are presented in word form than when they are presented as scrambled letters (e.g., Beilin and Horn, 1962; Ekstrand and Dominowski, 1965, 1968; Tresselt and Mayzner, 1965; Mayzner and Tresselt, 1966). This effect is thought to be due to the difficulty in breaking the gestalt of the existing word in order to rearrange the letter order and form a new word. However, participants' eye movements in the present study revealed a more intricate pattern of the effects of stimulus familiarity on anagram problem solving, including both interference and facilitation. Specifically, the present study documented shorter viewing times on the central letter string when it was presented in word form than in nonword form, suggesting that participants were better able to encode and maintain the central letter string in working memory when it was a meaningful word than when it was a meaningless string of letters. This finding is consistent with the well established perceptual encoding and working memory advantage for familiar stimuli relative to unfamiliar stimuli (e.g., Chase and Simon, 1973; Reingold et al., 2001a).

Of more interest is what the eye movement record revealed about how familiarity interferes with anagram problem solving. Our paradigm demonstrated that the interference of stimulus familiarity was due to longer viewing times on the individual letters when the central letter string was presented as a word than when it was presented as a nonword. As originally proposed by Gestalt psychologists, these longer viewing times on the individual letters might suggest that participants find it more challenging to integrate the individual letters into the central letter string when it is a holistic entity. Supporting evidence for this possibility comes from simple letter-insertion tasks that were introduced by Reingold (1995). For example, similar to the present task, in a letter-insertion task participants were presented with a letter string that was either a word (e.g., CASH) or a nonword (e.g., CRAH) and were required to insert one of two alternative letters into the letter string to create a word (e.g., CRASH). Performance was substantially better for nonword than word letter strings (Reingold, 1995). Thus, consistent with the conceptualization of Gestalt psychology, it seems intuitive that restructuring and integrating new elements into a pre-existing holistic representation would be more difficult than integrating items into an unrelated collection of problem elements. However, more work is required in order to specify the mechanisms underlying this theoretical possibility, and to explain how familiarity interferes with the integration process.

Finally, a previous eye movement study of the Einstellung effect in chess experts suggested that the activation of familiar schemas in memory creates perceptual biases towards information that confirms these schemas and away from information that is required to find a less familiar but more optimal solution (e.g., Bilalić et al., 2008a). Our eye movement analysis did not reveal a perceptual bias towards the central letter string when it was presented in word form. This was likely due to the encoding and processing advantages that are associated with the familiar central word. These advantages might have allowed problem solvers to easily maintain familiar stimulus information in working memory while directing their visual attention elsewhere in the stimulus display. However, participants' processing resources might have been captured by the familiar central letter string in ways that our eye movement methodology could not reveal. Specifically, it might be that the familiarity of the central letter string causes an unhelpful bias in the search for solution words based on the irrelevant orthographic, phonological, lexical, and/or semantic activation associated with the central word. Unlike previous findings which demonstrated that an exhaustive encoding mental set could be replaced by a selective encoding strategy which ignores irrelevant aspects of the stimulus regardless of stimulus familiarity (e.g., Gaschler and Frensch, 2007, 2009; see also Dreisbach and Haider, 2008, 2009), in the present study, participants were seemingly unable to ignore the irrelevant but familiar central word. This is likely due to the fact that the irrelevant activation caused by the central word was involuntary in nature and required processing resources in order to oppose it. Taken together, the present findings and prior results indicate that while stimulus familiarity and domain knowledge are clearly fundamental to establishing expertise, these aspects of skilled performance are not without their pitfalls when a problem solving scenario requires flexibility.

## **ACKNOWLEDGMENT**

This research was supported by an NSERC grant to Eyal Reingold.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 April 2014; accepted: 12 June 2014; published online: 02 July 2014. Citation: Ellis JJ and Reingold EM (2014) The Einstellung effect in anagram problem solving: evidence from eye movements. Front. Psychol. 5:679. doi: 10.3389/fpsyg.2014.00679*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Ellis and Reingold. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Interference between face and non-face domains of perceptual expertise: a replication and extension

## *Kim M. Curby<sup>1</sup>\* and Isabel Gauthier<sup>2</sup>*

*<sup>1</sup> Department of Psychology, Macquarie University, Sydney, NSW, Australia <sup>2</sup> Department of Psychology, Vanderbilt University, Nashville, TN, USA*

### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Amy L. Boggan, Young Harris College, USA Eva M. Dundas, Carnegie Mellon University, USA*

#### *\*Correspondence:*

*Kim M. Curby, Department of Psychology, Macquarie University, Building C3A, Room 409, Sydney, NSW, Australia e-mail: kim.curby@mq.edu.au*

As car expertise increases, so does interference between the visual processing of faces and that of cars; this suggests performance trade-offs across domains of real-world expertise. Such interference between expert domains has been previously revealed in a relatively complex design, interleaving 2-back part-judgment tasks with faces and cars (Gauthier et al., 2003). However, the basis of this interference is unclear. Experiment 1A replicated the finding of interference between faces and cars, as a function of car expertise. Experiments 1B and 2 investigated the mechanisms underlying this effect by (1) providing baseline measures of performance and (2) assessing the specificity of this interference effect. Our findings support the presence of expertise-dependent interference between face and non-face domains of expertise. However, surprisingly, it is in the condition in which faces are processed among cars with a disrupted configuration that expertise has a greater influence on faces. This finding highlights how expertise-related processing changes also occur for transformed objects of expertise and that such changes can also drive interference across domains of expertise.

**Keywords: perceptual expertise, interference, faces, objects, holistic processing**

## **INTRODUCTION**

Face perception is often described as a domain of perceptual expertise. Our skill with faces manifests itself across many different tasks and is often particularly impressive for familiar faces. For example, normal adults can recognize familiar faces with accuracy >90% despite not having seen some of these faces for over 35 years (Bahrick et al., 1975). Most people are as fast to categorize an image as a "face" as they are to categorize it at an individual level ("Bill Clinton's face"; Tanaka, 2001). In contrast, observers are much slower to categorize an image of a bird at a similar subordinate level—for example, categorizing an animal as a "cardinal" is slower than categorizing it at the basic level, "bird" (Tanaka and Taylor, 1991). But even the processing of unfamiliar faces outshines our performance with other objects in several respects. For instance, observers can retain more faces in visual short-term memory than they can other objects (Curby and Gauthier, 2007; Curby et al., 2009). Face processing is also more sensitive to subtle changes in the spatial-relations between features than object processing (Haig, 1984; Hosie et al., 1988; Kemp et al., 1990; Bruce et al., 1991).

Our skill with faces is believed to result, at least in part, from our extensive experience with them (but see Kanwisher, 2000), and to be mediated by the acquisition of a holistic processing strategy (Diamond and Carey, 1977; Richler et al., 2011a). What holistic means varies to some extent, as it is sometimes described as a sensitivity to configuration, a global (as opposed to local) information sampling strategy, a situation where perceiving the whole is greater than the sum of its parts, or integrality of processing for different dimensions (see Richler et al., 2012 for a review). These different meanings motivate authors to use a number of tasks to compare face and object recognition. One commonly used meaning of holistic processing is as a failure of selective attention. For instance, Young et al. (1987) asked people to name the identity of part of a face composite, and found they were unable to do so while ignoring other parts of the composite. When the composite is inverted or the composite parts are misaligned, people can more easily selectively attend to a face part. This has been replicated in variations on this original paradigm, such as in matching tasks with unfamiliar faces (Farah et al., 1998; Richler et al., 2008; Curby et al., 2013). Like several other hallmarks of face perception [e.g., the inversion effect; Rossion et al., 2002; Curby et al., 2009; the sensitivity to spatial frequency content; McGugin and Gauthier, 2010; recruitment of the fusiform face area (FFA); Gauthier et al., 2000, 2005; McGugin et al., 2012a], this kind of holistic processing has also been obtained for non-face categories when expert observers are tested, for instance for cars in car experts (Gauthier et al., 2003; Bukach et al., 2010), chess displays in chess experts (Boggan et al., 2012), and novel objects after expertise training in the lab (Gauthier and Tarr, 2002; Wong et al., 2009).

Based on the idea that face processing can be understood as a kind of expertise, it has been suggested that it may share more resources with the processing of other objects of expertise than with typical object perception. The logic is simple: if the perceptual strategies and neural substrates were found to be similar, this may lead to interference when two categories of expertise are processed simultaneously. We originally tested this prediction in an electrophysiological study using the composite paradigm to measure holistic processing and a neural marker of expertise, the N170. We recruited participants with a range of car-recognition skills, from none to extensive, and developed a paradigm in which participants processed faces and cars concurrently (Gauthier et al., 2003). Participants matched the bottom parts of face and car composites, while faces and cars alternated. In this 2-back part-matching task, we were able to measure holistic processing of faces when presented in two different contexts: (1) among normal cars, which car experts were found to process more holistically than car novices, and (2) among cars with inverted tops, which car experts did not process holistically. Therefore, we expected that holistic processing of normal cars would compete with that of faces, only in car experts. Indeed, we found that faces in the context of normally configured cars were processed less holistically [i.e., there was less influence from the to-be-ignored (top) part on bottom judgments] than those presented in the context of cars in a transformed configuration (tops inverted). These results suggested a functional overlap between face and car processing that is related to an individual's level of expertise with cars.

Since our original study, there have been other studies providing evidence of interference between faces and objects of expertise using event-related potential (ERP; Rossion et al., 2004, 2007), functional magnetic resonance imaging (fMRI; McGugin et al., 2014), or other behavioral paradigms (McKeeff et al., 2010; McGugin et al., 2011). There are also other studies suggesting that the processing of faces and words may compete during development and influence their lateralization in the brain (Dehaene and Cohen, 2011; Dundas et al., 2013). However, Gauthier et al. (2003) is the only study that looked at functional overlap specifically in holistic processing. The goal of the present study was to replicate this finding (Simons, 2014) and to explore its underlying mechanisms using baseline conditions that were not used before.

The measurement of holistic processing is relatively complex, with holistic processing in the composite task quantified using a difference score between two indices of discriminability (each a *d*′ measure that depends on a hit-rate and a false-alarm rate). The design used by Gauthier et al. (2003) not only requires holistic processing to be measured for both faces and cars concurrently, but also the calculation of an interference index which is the relative amount of holistic processing for faces in two different car contexts. A significant correlation of this interference index with a measure of car expertise can be obtained for several reasons, and our goal was to try to understand what led to this correlation.

The interference index is a difference of differences: the congruency effect for faces in the context of normal cars – the congruency effect for faces in the context of transformed cars, with each congruency effect being a difference score itself. One concern with correlations with difference scores is that the variance captured in a correlation can come from the main condition, the control condition, or both (e.g., DeGutis et al., 2013). This is not necessarily a problem, depending on the construct measured, but it can lead to misleading interpretations. The difference score that yields the congruency effects is central to the definition of holistic processing as a failure of selective attention and authors generally do not consider its components further (DeGutis et al., 2013; Richler and Gauthier, 2014). In contrast, the difference between holistic processing in the two contexts is not a unitary construct. The original prediction is that interference occurs in one context (when both faces and cars are shown in their normal configurations) and that it is not found in the other context (when cars are transformed so that they do not engage expert processes in car experts).

Here we first replicate the original finding (Experiment 1A), then unpack the effect in ways that were not explored before. In particular, we ask whether interference as a function of car expertise is attributable to the condition in which faces are shown in a normal car context. To preview our results, we find that it is not, and so we set out to compare the effect to different baseline conditions, in the hope of clarifying the locus of the effect. In Experiment 1B, we test our prediction that, when compared to a baseline with no irrelevant parts, it would be the car experts' performance that would show interference, and not that of car novices. The baseline will also help characterize the interference as facilitation in congruent trials or interference in incongruent trials. Finally, in Experiment 2, we replace cars with novel objects to assess whether the interference between two domains can be obtained when performance on the interleaved task is matched but the task does not tap into expert processes.

## **EXPERIMENT 1A**

To assess the robustness of this effect, we first conducted a replication of the study previously reported in Gauthier et al. (2003).

## **METHOD**

## *Participants*

Thirty-five individuals with normal or corrected-to-normal vision volunteered to participate for payment: 17 self-reported as car experts and 18 as novices (six women, one reporting as a car expert). The rights of the subjects were protected according to a protocol approved by Vanderbilt University's Institutional Review Board. The data from two novices were later discarded, one because of poor overall performance in the task (54%) and the other because he was an outlier (>3 SD) on our interference index (see design and procedures).

## *Stimuli*

For the car expertise test, 120 pictures of different year and/or model cars and 120 pictures of different bird species from viewpoints varying from profile to three-quarter view were used (**Figure 1**). In the interference task, 336 grayscale (256 × 256 pixels) composite images of cars (profile) and faces (front view) made out of the top and bottom of different original images (64 faces and 64 cars) were used (**Figure 2**; see Gauthier et al., 2003). All images had a horizontal red line covering the seam between the two parts. In half of the car images the top part was inverted. The stimuli were presented on a 19-inch monitor with a display resolution of 1280 × 960 pixels. Participants sat ∼70 cm from the screen. The position of participants' heads was not fixed.

## *Design and procedure*

Self-report of expertise is not always a good predictor of performance (Diamond and Carey, 1986; Rhodes and McLean, 1990; McGugin et al., 2012b) and thus participants were required to perform a car expertise test (**Figure 1**; see Gauthier et al., 2000) in addition to the main interference task (**Figure 2**). This car expertise test yielded a quantitative estimate of their perceptual skill with cars relative to their skill with a baseline category, birds. Over 224 trials, participants matched sequentially presented, 256 × 256 pixel, grayscale images of cars and birds on the basis of their model or species (see **Figure 1**). The first image was presented for 1000 ms and was followed by a mask for 500 ms. Then the second image appeared and remained until either the subject made a response or 5000 ms had passed. Performance on the bird trials provided a baseline measure for individual differences at subordinate-level matching for a category of familiar objects in the absence of expertise. As in Gauthier et al. (2003), a car expertise score was calculated by subtracting the *d*′ for birds from the *d*′ for cars for each individual.
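The expertise score described above can be illustrated with a minimal Python sketch. It assumes the standard signal-detection definition of *d*′ (z-transformed hit rate minus z-transformed false-alarm rate). The log-linear correction, the function names, and the response counts are assumptions made for illustration only; the article does not report which correction, if any, was applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    The +0.5 / +1 log-linear correction is an assumption of this sketch
    (it keeps both rates away from 0 and 1, where z is undefined).
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def car_expertise_score(car_counts, bird_counts):
    """Car expertise index: d' for cars minus d' for birds."""
    return d_prime(*car_counts) - d_prime(*bird_counts)


# Hypothetical counts: (hits, misses, false alarms, correct rejections).
# The split of trials per category is assumed, not taken from the article.
cars = (50, 6, 8, 48)
birds = (38, 18, 24, 32)
score = car_expertise_score(cars, birds)
print(f"car expertise score: {score:.2f}")
```

A positive score indicates better matching of cars than of birds, which is how the index separates car skill from general subordinate-level matching ability.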

In the interference task, participants performed 1020 trials (60 practice, 960 experimental) in which an image was presented centrally either for 1500 ms or until they made a response. Images alternated between car and face composites (see **Figure 2**). Participants pressed a key indicating whether the bottom of the current image was the same or different from the last image of the same category, triggering the presentation of the next image. Thus, participants performed a 2-back part-matching task in which they were told to always ignore the top of cars and faces.

Similar to the paradigm used in Gauthier et al. (2003), car configuration was manipulated to influence the extent to which the cars should elicit holistic processing (HP) in car experts: (i) an upright normal condition (**Figure 2A**) and (ii) an inverted-top condition (**Figure 2B**). The two interference conditions alternated in 15 blocks of 60 trials (a break was given every 30 trials). Half of the trials were congruent, where the information from the to-be-ignored top parts would lead subjects to make the same judgment as the information from the attended bottom part (when compared to the 2-back stimulus from the same category). The other trials were incongruent; information from the to-be-ignored top part would lead subjects to make the opposite judgment to that indicated by the attended bottom part.

Notably, if participants could follow instructions and completely ignore the top part of composites when making 2-back judgments on the bottom part, it would make no difference whether the top part was congruent or incongruent with the correct response for the bottom part. Thus, the degree to which the irrelevant top parts influence judgments about the task-relevant bottom part provides an index of HP (as in Wenger and Ingvalson, 2002, 2003; Gauthier et al., 2003).
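The congruency logic of the interleaved 2-back task can be sketched in a few lines of Python. This is an illustrative reading of the design rather than the authors' experiment code: the function name and the stimulus identifiers are hypothetical, and each composite is reduced to a category plus a top-part and bottom-part label.

```python
def label_trials(stimuli):
    """Label each trial with its correct response and congruency.

    stimuli: list of (category, top_id, bottom_id) in presentation order.
    Each item is compared with the most recent item of the same category,
    which is two positions back when faces and cars strictly alternate.
    Trials with no same-category predecessor are labeled None.
    """
    last_seen = {}
    labels = []
    for category, top, bottom in stimuli:
        if category in last_seen:
            prev_top, prev_bottom = last_seen[category]
            bottom_same = bottom == prev_bottom  # defines the correct answer
            top_same = top == prev_top           # to-be-ignored information
            response = "same" if bottom_same else "different"
            # Congruent: the ignored top suggests the same answer as the bottom.
            congruency = "congruent" if bottom_same == top_same else "incongruent"
            labels.append((response, congruency))
        else:
            labels.append(None)
        last_seen[category] = (top, bottom)
    return labels


# Hypothetical alternating sequence of face and car composites.
sequence = [
    ("face", "f1", "b1"),
    ("car", "c1", "k1"),
    ("face", "f1", "b1"),
    ("car", "c2", "k1"),
    ("face", "f2", "b2"),
    ("car", "c2", "k2"),
]
labels = label_trials(sequence)
```

If an observer attended only to the bottom parts, accuracy would be identical for the two congruency labels; any difference between them is what the HP index captures.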

## **ANALYSIS**

Face-matching trials performed in the context of cars with inverted tops and those performed in the context of cars with upright tops were split into congruent and incongruent trials. The car trials were also split into congruent and incongruent trials. Sensitivity (*d*′) was calculated for the congruent and incongruent trials for each of the face (upright car-top context, inverted car-top context) and car (upright tops, inverted tops) conditions. HP was operationalized as the sensitivity for congruent minus incongruent trials (HP = *d*′<sub>congruent</sub> − *d*′<sub>incongruent</sub>).

The Interference index was then calculated by subtracting the amount of HP for faces in the high interference condition, where the faces were processed in the context of upright cars, from that in the low interference condition, where faces were processed in the context of cars with inverted tops. This index provides a measure of the change in HP of the faces due to manipulating the configuration of the cars. Because modifying the configuration of objects of expertise has been shown to impact HP (Young et al., 1987), this index will allow us to detect any trade-offs in HP between the two tasks. Crucially the faces and the cars presented in both conditions were identical except for the orientation of the top (irrelevant) part of the cars, and therefore any difference in HP of the faces between the two conditions can be attributed to the context within which the faces were processed.
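The two derived measures defined above can be written out as a short Python sketch. The function names and the *d*′ values are hypothetical and serve only to make the direction of each subtraction explicit.

```python
def holistic_processing(d_congruent, d_incongruent):
    """HP = d' on congruent trials minus d' on incongruent trials."""
    return d_congruent - d_incongruent


def interference_index(faces_among_inverted_top_cars, faces_among_upright_cars):
    """HP for faces in the low-interference context (cars with inverted tops)
    minus HP for faces in the high-interference context (upright cars).

    Each argument is a (d' congruent, d' incongruent) pair for face trials.
    """
    hp_low = holistic_processing(*faces_among_inverted_top_cars)
    hp_high = holistic_processing(*faces_among_upright_cars)
    return hp_low - hp_high


# Hypothetical d' values for one participant's face trials.
index = interference_index(
    faces_among_inverted_top_cars=(2.6, 1.1),
    faces_among_upright_cars=(2.4, 1.2),
)
print(f"interference index: {index:.2f}")
```

Under this sign convention, a positive index means that faces were processed more holistically among cars with inverted tops than among upright cars.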

## **RESULTS**

Expertise in car recognition varied from none to extensive. There was little variability in bird-matching performance (none of our participants reported any special experience with birds, and bird-matching performance was low, ranging from 0.12 to 1.38 *d*′, consistent with their self-report) compared to car-matching performance, where *d*′ scores ranged from 0.37 to 3.76. Consistent with past work, there was a modest, non-significant correlation between car and bird scores (*r*<sub>32</sub> = 0.28, *p* = 0.10).

Even though participants were never asked to make a judgment about the top, they apparently could not refrain from processing both faces and cars holistically (see **Table 1**). This bias was stronger for faces than cars (*t*<sub>32</sub> = 7.941, *p* < 0.0001, *d* = 2.81); this is likely a result of more extensive expertise with faces (Gauthier and Tarr, 2002). As a manipulation check, normal cars were processed more holistically than transformed cars (HP expressed as *d*′ for normal cars: 0.86 ± 0.07, for transformed cars: 0.42 ± 0.09, *t*<sub>32</sub> = 3.601, *p* < 0.002, *d* = 1.27).

**Table 1 | Sensitivity (***d*′ **and % accuracy) and the derived measures of holistic processing and interference for subjects in Experiment 1A divided in a novice and expert group according to a median split on the measure of car expertise.**

Faces seen in the context of normal vs. transformed cars led to approximately the same degree of HP when expertise was ignored (HP for faces seen in the context of normal cars: 1.38 ± 0.08, for faces seen in the context of transformed cars: 1.35 ± 0.10, *t*<sub>32</sub> = 0.289, *p* = 0.77, *d* = 0.10). Critically, however, when car expertise was taken into account, HP for faces depended on the configuration of the interleaved cars; as predicted, individuals with higher levels of car expertise had higher interference indexes (HP for faces seen in the context of transformed cars minus that for faces seen in the context of normal cars; *r*<sub>32</sub> = 0.45, *F* = 7.67, *p* = 0.009; **Figure 3**)<sup>1</sup>.

Intriguingly, while most experts had a positive interference index, suggesting that they processed faces more holistically in the context of cars in a modified, rather than intact, configuration, most novices had a negative interference index. This suggests that car novices actually processed faces more holistically in the context of cars in an intact, rather than modified, configuration. We also looked at the correlations between car expertise and each of the two face conditions separately: car expertise did not predict HP for faces viewed among normal cars (*r*<sub>32</sub> = −0.01, *p* = 0.96), whereas it predicted HP for faces viewed among transformed cars (*r*<sub>32</sub> = 0.45, *p* = 0.007), a significant difference (Steiger's *Z* = −2.05, *p* = 0.04). These findings suggest that the interference between faces and cars occurs in the transformed car condition, which was not the original prediction.

**FIGURE 3 | Relationship between the participants' car expertise score (sensitivity for cars minus their sensitivity for birds in the expertise test) [***d*′**] and the face interference index defined as the amount of holistic processing for faces when cars were in a new configuration compared to the holistic processing of faces when cars were in their normal configuration.**

<sup>1</sup>Here we subtracted bird scores from car scores to index car expertise, consistent with prior work with this paradigm. However, in more recent work with car experts, we have regressed out performance in a non-car task from a car task to assess domain-specific effects (McGugin et al., 2014). The partial correlation between the interference index and car *d*′, partialing out bird *d*′, was *r*<sub>31</sub> = 0.38, *p* = 0.01.
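The partial correlation mentioned in footnote 1 can be illustrated with the textbook first-order formula, which removes the linear contribution of a third variable from the correlation between two others. The sketch below is not the authors' analysis script; the function names and the six-participant data set are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_correlation(x, y, z):
    """Partial correlation computed directly from raw scores."""
    return partial_r(pearson_r(x, y), pearson_r(x, z), pearson_r(y, z))


# Hypothetical scores for six participants (not data from the article).
interference = [-0.3, -0.1, 0.0, 0.2, 0.4, 0.5]
car_d = [0.5, 1.0, 1.4, 2.1, 2.8, 3.5]
bird_d = [0.3, 0.9, 0.5, 1.1, 0.7, 1.2]
r_partial = partial_correlation(interference, car_d, bird_d)
```

Here `r_partial` plays the role of the correlation between the interference index and car *d*′ with bird *d*′ partialed out, as opposed to the correlation with the simple difference score used in the main analysis.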

### **DISCUSSION**

Consistent with previous findings, Experiment 1A provided evidence of interference between face and car processing as a function of expertise with cars (Gauthier et al., 2003). These data demonstrate that interference across different domains of perceptual expertise, as measured via the impact on an established index of holistic perception, is a robust and replicable effect. However, the effect was unexpectedly driven by an interaction between faces and transformed cars. That is, car novices and car experts differed more in their face processing in a transformed car context than in the familiar car context (see **Table 1**). This is inconsistent with the original predictions that motivated this task (i.e., that in car experts, face recognition would compete most with the processing of whole cars that produce more HP).

One reason why these findings are relatively difficult to interpret is that car experts may differ from novices in the amount of HP in several ways. For instance, car experts may show more HP of cars than car novices because they experience more interference from incongruent to-be-ignored parts, more facilitation from to-be-ignored parts, or both. Likewise, the difference in HP of faces by car experts and car novices in the different car contexts can also be driven by a difference in facilitation, interference or both. In addition, while we measured the effect of car expertise as a continuous variable, it may seem reasonable to predict that relative to a baseline condition where holistic processing is not implicated, it would be performance for the car experts, and not for the novices, that would show an interaction with face processing. But because our initial prediction that faces and normal cars would be mainly responsible for this interaction was not supported, we decided to better characterize the effect. While the theoretical significance of whether facilitation and/or interference are critical in this interaction across domains is unclear at the moment, the proposed link between expertise and holistic processing makes a strong prediction that novices should not be affected by the presence of the irrelevant part, while car experts should.

Therefore, in Experiment 1B, we tested new car experts and novices to provide baseline conditions without to-be-ignored parts. These baselines will be used to estimate whether car experts and car novices in Experiment 1A differ most in facilitation from congruent to-be-ignored parts, or interference from to-be-ignored parts.

## **EXPERIMENT 1B**

To ask whether the expertise effects observed in Experiment 1A influence facilitation, interference, or both, we presented participants with only the bottom half of each image. This experiment provided baseline measures to which the sensitivity of congruent and incongruent trials could be compared.

#### **METHOD**

#### *Participants*

Twenty-two subjects (six females) with normal or corrected-to-normal vision and varying levels of car expertise who had not performed in Experiment 1A participated in this study for payment or course credit. The rights of the subjects were protected according to a protocol approved by Vanderbilt University's Institutional Review Board.

## *Materials*

Stimuli were the same as Experiment 1A except the top half of each was removed.

## *Design and procedure*

The procedure was the same as in Experiment 1A except that only the bottom halves of the car and face images were presented (i.e., the parts above the red line in **Figure 2** were omitted).

## **ANALYSIS**

The data were divided into expert and novice groups based on participants' car expertise indices (i.e., the difference between car- and bird-sensitivity in the car expertise test; see Experiment 1A for details). Consistent with previous studies, we defined experts as individuals with a *d*′ for cars >2 and a car expertise index (car *d*′ − bird *d*′) >1 (Gauthier et al., 2000). The data from Experiment 1B were also divided into groups of experts and novices. To facilitate comparison across Experiments 1A and 1B, the average expertise index for each of the groups was matched across the two studies. This was done in such a way as to exclude data from as few participants as possible, based only on their car expertise scores. The resulting mean expertise indices (in *d*′ units) and sample sizes (Experiment 1A/Experiment 1B) for the groups were as follows: Expert 1.83 (*N* = 14)/1.81 (*N* = 9) and Novice 0.37 (*N* = 14)/0.36 (*N* = 9).
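The expert criterion stated above amounts to a conjunction of two thresholds, shown here as a minimal Python sketch. The function name and example scores are hypothetical, and the sketch covers only the expert criterion; the subsequent matching of group means across experiments is not reproduced.

```python
def is_car_expert(car_d, bird_d, min_car_d=2.0, min_index=1.0):
    """Expert criterion from the text: car d' > 2 and (car d' - bird d') > 1."""
    return car_d > min_car_d and (car_d - bird_d) > min_index


# Hypothetical (car d', bird d') pairs.
participants = {"p01": (3.0, 1.0), "p02": (2.5, 1.8), "p03": (1.9, 0.2)}
experts = [pid for pid, (car, bird) in participants.items()
           if is_car_expert(car, bird)]
```

Note that a high car *d*′ alone is not sufficient: `p02` exceeds the car threshold but not the index threshold, so general matching ability is not mistaken for car expertise.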

In general it is more statistically powerful to use car expertise as a continuous variable than as a dichotomous variable. However, because Experiments 1A and 1B include different subjects who are not matched individually but in groups of novices and experts, we report next a series of ANOVAs on Experiment 1A alone and relative to the baselines obtained in Experiment 1B in which car expertise is treated as a dichotomous variable.

## **RESULTS**

## *ANOVA comparing expert and novice performance in the 2-back (tops present) task (Experiment 1A)*

A 2 (group; novice, expert) × 2 (car top context; upright, inverted) ANOVA was performed on the HP measures for faces for the new groups created from the data reported in Experiment 1A. Consistent with a role of car expertise in modulating the effect of car context on face processing, as revealed in the correlation analysis reported above, a significant interaction emerged between group and car context, *F*(1,26) = 5.97, *p* = 0.022, η<sup>2</sup> = 0.064. Car context had a different effect on face processing depending on car expertise; for novices, inverting the top of the cars led to a decrease in HP for faces. In contrast, for car experts, inverting the top of the cars led to an increase in HP for faces. There was also a marginally significant effect of group, *F*(1,26) = 4.13, *p* = 0.053, η<sup>2</sup> = 0.090, suggesting that car experts in general processed the faces more holistically. There was no main effect of car context on HP of faces (*F* < 1). Planned *t*-tests showed that while HP for faces among normal cars failed to differentiate between car experts and novices, *t*(26) = 0.30, *p* = 0.77, *d* = 0.12, car experts processed faces shown among transformed cars more holistically than car novices, *t*(26) = 2.76, *p* = 0.01, *d* = 0.48.
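It may help to see how the group × car context interaction relates to the interference index. In a 2 (between) × 2 (within) design, the interaction *F* with (1, *N* − 2) degrees of freedom equals the square of a pooled-variance independent-samples *t* computed on each participant's difference score, which here is the interference index. The Python sketch below illustrates that general identity with hypothetical HP values; it is not a reanalysis of the reported data.

```python
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error, df


def interaction_f(experts, novices):
    """Interaction F for a 2 (group) x 2 (context) mixed design.

    Each participant is a pair (HP among upright cars, HP among
    inverted-top cars); the difference is that participant's
    interference index.
    """
    index_experts = [inverted - upright for upright, inverted in experts]
    index_novices = [inverted - upright for upright, inverted in novices]
    t, df = pooled_t(index_experts, index_novices)
    return t ** 2, (1, df)


# Hypothetical HP values for faces (not data from the article).
experts = [(1.0, 1.4), (1.2, 1.4), (1.1, 1.7)]
novices = [(1.5, 1.3), (1.4, 1.0), (1.6, 1.6)]
f_value, dfs = interaction_f(experts, novices)
```

Seen this way, the significant interaction and the correlation between car expertise and the interference index are two views of the same effect, one treating expertise as dichotomous and the other as continuous.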

Planned comparisons on HP for cars revealed more HP for upright cars than cars with inverted tops in car experts, *t*(26) = 3.58, *p* = 0.001, *d* = 1.40, but not in novices, *t*(26) = 1.03, *p* = 0.31, *d* = 0.40 (see **Figures 4A,B**). In fact, HP was significantly different from 0 in all car conditions (all *p*s ≤ 0.0005) except for cars with inverted tops in car experts (*p* = 0.13). This suggests that inverting car tops made them easier to ignore for car experts.

## *ANOVA comparing expert and novice performance in 2-back half (tops absent) task (Experiment 1B)*

A 2 (group; experts, novices) × 2 (category; faces, cars) ANOVA on the results for the baseline task (no-top) revealed a significant interaction between group and category, *F*(1,16) = 7.34, *p* = 0.016, η<sup>2</sup> = 0.041. As expected, car novices and experts did not differ in their performance matching face bottoms (*p* > 0.05), but car experts were better at matching car bottoms than novices (*p* < 0.001).

## *Comparison of Experiment 1A against baselines from Experiment 1B*

**Figure 4** illustrates the results of Experiment 1A in each condition, for car novices and car experts, relative to the baselines obtained in Experiment 1B, where no to-be-ignored parts were present. As we have seen, these baselines indicate that when participants performed the task with only the to-be-attended parts, car novices and car experts differed only in their processing of car parts, with car experts performing better. We now use these baselines to interpret the results of Experiment 1A and specifically ask in what way car novices and car experts differ.

The results of planned *t*-tests comparing sensitivity (*d*′) in the top-present congruent and incongruent conditions (from Experiment 1A) with their respective no-top baselines (from Experiment 1B) reveal that HP during the top-present task was mainly due to interference from incongruent top halves rather than facilitation from congruent ones (**Figure 4**). Incongruent conditions led to lower sensitivity than their respective baselines (all *p*s < 0.05) in all cases except for cars with inverted tops in both groups (*p*s > 0.05). In contrast, the only condition with significant facilitation was for faces seen in the context of cars with inverted tops, in car experts (*p* < 0.05).
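The baseline comparison splits the congruency effect into two parts that sum to HP: facilitation (congruent minus baseline) and interference (baseline minus incongruent). The Python sketch below makes that arithmetic explicit. The function name and values are hypothetical, and because the baseline here comes from a different group of participants, such a decomposition is meaningful only for group means.

```python
def decompose_congruency_effect(d_congruent, d_incongruent, d_baseline):
    """Split HP into facilitation and interference relative to a baseline.

    facilitation = d' congruent - d' baseline
    interference = d' baseline  - d' incongruent
    The two components always sum to HP = d' congruent - d' incongruent.
    """
    facilitation = d_congruent - d_baseline
    interference = d_baseline - d_incongruent
    return {
        "facilitation": facilitation,
        "interference": interference,
        "hp": facilitation + interference,
    }


# Hypothetical group-mean d' values (not data from the article).
parts = decompose_congruency_effect(d_congruent=2.5, d_incongruent=1.2,
                                    d_baseline=2.3)
```

In this example most of the congruency effect comes from the interference component, which is the pattern the text reports for most conditions.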

### *Summary and discussion*

Car experts experienced less interference than car novices for transformed cars (**Figures 4A** vs. **B**), while they experienced more facilitation from congruent face tops processed in this context (**Figures 4C** vs. **D**).

One account of these results is that because car experts processed cars in a transformed configuration less holistically than regular cars, they recruited fewer HP resources. Once the top part of the car is inverted, it can therefore be effectively ignored by car experts.

Because the car context was manipulated, while the face task was the same in both car contexts, and because car novices and car experts did not differ in their processing of face parts in a car context when there was no to-be-ignored part, it is reasonable to infer that the change in car context is what led to facilitation by congruent face parts in the transformed car context. Exactly why this happened is unclear, but it is plausible that the transformed car context allowed car experts to reduce executive control when car tops were easier to ignore, and that as a result they also exerted less effort trying to ignore face tops.

Most experiments using the composite task have not used a baseline condition to examine facilitation/interference, but when it has been used, the congruency effect obtained for aligned faces was generally due to interference from incongruent parts, without significant facilitation (Richler et al., 2008). This is consistent with what we observed here in car novices (even if the baseline was obtained in a different group). This highlights how abnormal the processing of faces among transformed cars was in our car expert participants.

It should be noted that the choice of a baseline condition such as the isolated relevant parts used in Experiment 1B can be difficult. In the comparison between Experiments 1A and 1B, there are differences in stimuli (parts vs. composites) and in task requirements (requirement to selectively attend or not).

Given the complexity of the 2-back interleaved dual task, it is possible that other task components unrelated to HP and affected by car expertise, such as more general effects related to the executive control and/or short-term memory load demands of the task, play a role in producing these effects. Experiment 2 investigates an alternative account of the interference between different expert domains, assessing whether explanations appealing to contributions from these more general effects can be ruled out.

## **EXPERIMENT 2**

Our proposed account of why car experts showed more facilitation from to-be-ignored congruent face parts in the context of transformed cars points to how car experts were better able to ignore inverted car tops. Experts showed no congruency effects of inverted car tops while novices did. Non-face objects are not processed holistically in the composite paradigm, which generally means that they do not show more of a congruency effect in a normal than transformed (typically misaligned) configuration. However, there is sometimes a small but significant congruency effect for non-face objects that is not modulated by configuration (e.g., Wong et al., 2009; Richler et al., 2011b). There are situations where training has a main effect of reducing this congruency effect for stimuli in a transformed configuration (Chua et al., 2014), similar to the difference between our car novices and experts in Experiment 1A.

The absence of holistic processing for cars with inverted tops among car experts, but not novices, suggests an alternative account of the interference between face and car processing that is unrelated to expertise. Because it was the transformed car context that drove the interference effect, it is possible that the same interference effect would be observed when faces are processed in the context of any object category that does not show a congruency effect, regardless of expertise. This would suggest that an alternative account, grounded in the more general demands of the two concurrently performed tasks, rather than in participants' expertise, would better explain this interference.

In the original paradigm, the condition where car novices process faces in the context of normal cars should have offered a test of this hypothesis. However, there was a significant congruency effect for cars in car novices here, perhaps in part because they had some non-negligible experience with cars (this congruency effect could also be amplified by cars being processed in the context of faces, see Richler et al., 2009).

Therefore, to test this hypothesis, we designed simple and unfamiliar stimuli in an oval shape with parts defined by colored gratings (varying in both hue and luminance, see **Figure 5A**) and in a pilot experiment, we verified that their processing in the composite task did not produce any congruency effect. Here we ask whether faces processed in our dual task with these "egg" stimuli would produce the same facilitation as observed in Experiment 1A. If the increase in the facilitation component of HP for faces processed in the context of transformed cars is simply due to the fact that the to-be-ignored parts of these transformed objects are easy to ignore, then we should observe it here. In contrast, if facilitation is not observed, this would indicate that the interaction between selective attention to faces and the car context is specifically dependent on car expertise.

## **METHOD**

## *Participants*

We recruited fourteen volunteer participants (seven females), to match the size of the car expert group in Experiment 1. The rights of the subjects were protected according to a protocol approved by Vanderbilt University's Institutional Review Board.

## *Stimuli*

Face stimuli were the same as in Experiment 1. The non-face stimuli were constructed from 64 oval shapes or "eggs," approximately the same size as the face stimuli, with 15 vertical stripes of two alternating colors. Twelve different colors were used, selected from the Adobe Photoshop© palette to be similar but still distinguishable (all shades of blue, green or purple). The 64 original eggs were split in half and recombined (in the same manner as the face stimuli) to create 128 composite eggs used in the experiment (see **Figure 5A**).

*[Figure 5 caption, beginning not recoverable]* ... judgments in Experiment 2. The results of congruent and incongruent trials are plotted against the baseline obtained from the same subjects performing the same task with no top half present.

## *Design and procedure*

The procedure was identical to that of Experiment 1, except that whole faces were processed in only one context; that is, the orientation of the top part of the egg was not manipulated (**Figure 5A**). The same participants also performed a baseline task with no tops on faces or eggs (as in Experiment 1B). There were an equal number of top-present and no-top trials, presented in four blocks of 240 trials. The two conditions alternated and their order was counterbalanced across participants. As in Experiment 1, participants performed 2-back judgments on the bottom half of all images.
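The block structure described above (four blocks, alternating top-present and no-top conditions, with the starting condition counterbalanced) can be sketched as follows. The rule used here, flipping the first condition for every other participant, is an assumption for illustration; the text states only that order was counterbalanced across participants.

```python
TRIALS_PER_BLOCK = 240  # from the text: four blocks of 240 trials


def block_order(participant_index, n_blocks=4):
    """Return the condition of each block for one participant.

    Conditions alternate block by block. The starting condition flips
    with participant parity (a hypothetical counterbalancing rule).
    """
    conditions = ("top-present", "no-top")
    start = participant_index % 2
    return [conditions[(start + block) % 2] for block in range(n_blocks)]


for participant in range(2):
    print(participant, block_order(participant))
```

With an even number of blocks, this scheme gives every participant the same number of top-present and no-top trials regardless of which condition comes first.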

#### **RESULTS AND DISCUSSION**

As expected, the eggs produced a pattern of sensitivity very similar to that of cars with an inverted top in Experiment 1 (**Figure 5B**). There was no significant HP for eggs, *t*(13) = 1.21, *p* = 0.25, *d* = 0.67. Sensitivity for the egg baseline did not differ from the baseline for cars among car experts in Experiment 1, *t*(21) = 0.77, *p* = 0.45, *d* = 0.34. Sensitivity for eggs was also not significantly different from that for cars with inverted tops among experts in Experiment 1, both in the congruent, *t*(26) = 0.68, *p* = 0.50, *d* = 0.27, and incongruent conditions, *t*(26) = 1.16, *p* = 0.26, *d* = 0.45.

However, unlike in Experiment 1, this context did not lead to facilitation for congruent face trials. There was no significant facilitation (congruent trials relative to no-top baseline) for faces processed in the context of eggs, *t*(13) = 1.97, *p* = 0.07, *d* = 1.09. Comparing with Experiment 1, the estimate of facilitation from faces among eggs was both indistinguishable from that obtained from car novices matching faces among transformed cars, *t*(26) = 1.27, *p* = 0.22, *d* = 0.50, and significantly less than that obtained from car experts matching faces among cars with inverted tops, *t*(26) = 2.39, *p* = 0.02, *d* = 0.94.
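As a reading aid, the effect sizes reported in this section can be checked against the *t* statistics. The reported values appear consistent with the conversion *d* = 2*t*/√*df*; that this is the formula the authors used is an inference from the numbers, not something stated in the text.

```python
from math import sqrt


def d_from_t(t, df):
    """Convert a t statistic to an effect size via d = 2t / sqrt(df)."""
    return 2 * t / sqrt(df)


# (t, df, reported d), copied from the Results and Discussion of Experiment 2.
reported = [
    (1.21, 13, 0.67),
    (0.77, 21, 0.34),
    (0.68, 26, 0.27),
    (1.16, 26, 0.45),
    (1.97, 13, 1.09),
    (1.27, 26, 0.50),
    (2.39, 26, 0.94),
]

for t, df, d in reported:
    print(f"t({df}) = {t}: d = {d_from_t(t, df):.2f} (reported {d:.2f})")
```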

The results of Experiment 2 suggest that it is not the lack of HP from the transformed car context *per se* that led to facilitation from congruent face trials in Experiment 1, as the egg stimuli were also not processed holistically, yet did not impact holistic face perception. Further, the findings of Experiment 1 also demonstrate that the interference was not simply a general effect of participants' car expertise (because in the upright car condition there was no facilitation from the to-be-ignored part for faces) nor because faces were processed among cars (because facilitation for faces was not obtained in car novices). Rather, face HP was specifically influenced by the concurrent processing of transformed objects of expertise.

## **GENERAL DISCUSSION**

Our results replicated the interference between holistic processing of faces and cars observed in Gauthier et al. (2003). Having found that the interaction depended on the processing of faces among transformed cars, we investigated these effects further by using a dual-task with isolated parts to partition congruency effects into interference from incongruent parts and facilitation from congruent parts. We found that car expertise was associated with less interference from incongruent car parts, and that in this context, congruent face parts produced more facilitation. Finally, we showed that this facilitation effect for faces was not obtained in another dual task with objects that produced no congruency effect, suggesting that the interaction depends on expertise.

Our results highlight the fact that transformed objects of expertise can lead to effects that are distinct from control objects. While there has been much more focus on how the processing of whole objects of expertise is special (both faces and objects), there is no question that experts also process parts differently. This is most clearly shown in our results by the advantage of car experts over car novices for car part performance. There are other examples of transformed objects of expertise producing effects that are distinct from control objects. For example, the ERP response (N170) to inverted faces is larger than the response for upright faces and delayed by 10 ms, while other objects elicit a response of much smaller amplitude and invariant to orientation (Rossion et al., 2000). This is also found in subjects who have been trained with non-face objects (Rossion et al., 2002). It is possible that these responses triggered by transformed objects of expertise index a mechanism that can interfere with expert processing of objects from another category.

Recent findings from studies of other domains of expertise, such as chess, that have also been shown to result in increased HP as indexed via the composite task, support the suggestion that transformed stimuli of expertise can trigger stronger responses in brain regions linked with perceptual expertise than their intact versions. For example, greater FFA activation was found in response to chess stimuli among chess experts when the structure of the chess stimuli was distorted compared to intact (Bilalić et al., 2011). Further, recent findings suggest that disrupted objects of expertise, such as the transformed cars used here, may trigger a search for structure or meaningful chunks by experts that appears to also involve a frontal-parietal network (Bartlett et al., 2013; Rennig et al., 2013; see also Bor and Owen, 2007). These existing findings, and the frequency with which transformed objects of expertise (inverted, scrambled, misaligned etc.) are used as a control or comparison stimulus category, highlight the importance of further studies exploring the processing of such objects among experts.

Another possibility is that the expertise-related interference between face and car processing primarily reflects an attention-based effect. For instance, because car experts can more easily selectively attend to the bottom part and thus ignore the (task-irrelevant) top part in the transformed car condition, they may "relax" control of their attention in this task-context, resulting in more intrusions of congruent face parts. The effect could be carried by facilitation because interference from incongruent face parts may already be strong to start with. Our egg control task suggests that such facilitation is not observed in any situation where the to-be-ignored part is easily ignored; perhaps a certain degree of fluency with the relevant parts is also required. This account is obviously quite speculative and will require further testing.

Although this study cannot reveal where in the brain interactions between faces and cars may occur, our task is likely to engage parietal and frontal areas implicated in short-term memory (Goldman-Rakic, 1987; Belger et al., 1998) as well as the FFA (Courtney et al., 1997; Grady et al., 1998; Haxby et al., 2000; Druzgal and D'Esposito, 2001), which is the part of the brain most associated with the idea of a "face module" (Kanwisher et al., 1997; McCarthy et al., 1997). Activity in the FFA increases directly with the short-term memory load for faces (Druzgal and D'Esposito, 2001). This region is a plausible candidate for a locus of interference obtained in dual experts for a number of reasons. In particular, it is recruited for both cars and faces in car experts (Gauthier et al., 2000, 2005; McGugin et al., 2012a); the activity in this area in response to cars correlates with behavioral measures of car expertise (Gauthier et al., 2000) and although it is not the only area to show an effect of expertise for cars, when the task demands are made more difficult, as was the case here, effects in these other regions drop whereas the expertise effect in the FFA remains (McGugin et al., in press). Finally, a recent fMRI study found that car expertise effects in the FFA survived manipulations of clutter and of divided attention, but that they were abolished when cars were presented in the context of faces, especially when the faces were also task-relevant (McGugin et al., 2014). Much work remains to be done to relate this example of competition between faces and cars in the FFA with whole objects and the present finding of interaction between faces and transformed objects of expertise during a dual-task that requires selective attention.

Other findings of expertise-dependent interference between the concurrent processing of face and non-face objects of expertise suggest that not only can faces and objects of expertise be processed in a similar manner, as well as neurally close in space and time, but the neural networks responsible for their HP may not be functionally independent (Gauthier et al., 2003; Rossion et al., 2004, 2007; McKeeff et al., 2010; McGugin et al., 2011). Importantly, it is not necessary to postulate that processing of faces and cars depends on overlapping sets of neurons to account for competition between the two domains. It is sufficient to assume that face and object processing are closer together in experts than novices in "cerebral functional space" (Kinsbourne and Hicks, 1978); cerebral functional space refers to the physical size of and distance between brain areas responsible for different functions. This only assumes that competition is more likely between neural ensembles that are more densely interconnected, and/or that are separated by fewer synapses.

In conclusion, our results are consistent with previous findings of observable interference across different domains of real-world expertise where the particular domains are proposed to rely on a common resource. This functional overlap between face and non-face domains of expertise has implications for the potential of extensive learning, as in the case of real-world expertise, to lead to a dynamic reorganization of cognitive resources. Because most normal adults possess a certain degree of expertise with faces, it may be important to consider training and application environments for real-world experts and assess the extent to which competition between different domains can impact learning and performance.

## **ACKNOWLEDGMENTS**

Kim M. Curby was supported by funding from the Australian Research Council (DE130100969). This work was also supported by funding from NSF Award BCS-0091752, National Eye Institute Grant R01-EY13441, and a grant from the James S. McDonnell Foundation to Isabel Gauthier.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 June 2014; accepted: 11 August 2014; published online: 10 September 2014.*

*Citation: Curby KM and Gauthier I (2014) Interference between face and non-face domains of perceptual expertise: a replication and extension. Front. Psychol. 5:955. doi: 10.3389/fpsyg.2014.00955*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Curby and Gauthier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Visual appearance interacts with conceptual knowledge in object recognition

## *Olivia S. Cheung<sup>1,2</sup>\* and Isabel Gauthier<sup>3</sup>*

*<sup>1</sup> Department of Psychology, Harvard University, Cambridge, MA, USA*

*<sup>2</sup> Center for Mind/Brain Sciences, University of Trento, Trentino, Italy*

*<sup>3</sup> Department of Psychology, Vanderbilt University, Nashville, TN, USA*

#### *Edited by:*

*Merim Bilalic, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Elan Barenholtz, Florida Atlantic University, USA Thomas A. Farmer, University of Iowa, USA*

#### *\*Correspondence:*

*Olivia S. Cheung, Department of Psychology, Harvard University, 33 Kirkland Street, Cambridge, MA 02138, USA e-mail: sccheung.olivia@gmail.com*

Objects contain rich visual and conceptual information, but do these two types of information interact? Here, we examine whether visual and conceptual information interact when observers see novel objects for the first time. We then address how this interaction influences the acquisition of perceptual expertise. We used two types of novel objects (Greebles), designed to resemble either animals or tools, and two lists of words, which described non-visual attributes of people or man-made objects. Participants first judged if a word was more suitable for describing people or objects while ignoring a task-irrelevant image, and showed faster responses if the words and the unfamiliar objects were congruent in terms of animacy (e.g., animal-like objects with words that described humans). Participants then learned to associate objects and words that were either congruent or not in animacy, before receiving expertise training to rapidly individuate the objects. Congruent pairing of visual and conceptual information facilitated observers' ability to become a perceptual expert, as revealed in a matching task that required visual identification at the basic or subordinate levels. Taken together, these findings show that visual and conceptual information interact at multiple levels in object recognition.

**Keywords: object learning, semantics, visual features, perceptual expertise**

## **INTRODUCTION**

A chocolate bunny is more visually similar to a stuffed animal but more conceptually similar to a baking chocolate bar, and the combination is such that a child may not allow her parent to melt it to bake a cake, nor would the parent allow the child to bring it to bed. Our interactions with objects must take both visual and conceptual information into account, but little research addresses how object recognition mechanisms are constrained by the *interactions* between these two sources of information.

Object perception involves more than processing visual features. For familiar objects, visual knowledge, such as the color of a fruit, modulates perception of salient features of an object (Hansen et al., 2006; Witzel et al., 2011). Conceptual knowledge about familiar object categories is also represented in the visual system (e.g., animals, tools, Chao et al., 1999; Mahon and Caramazza, 2009; Huth et al., 2012). While it is often assumed that visual features of novel objects engage minimal conceptual processing (Tarr and Pinker, 1989; Bülthoff and Edelman, 1992; Gauthier and Tarr, 1997; Hayward and Williams, 2000; Schwoebel and Srinivas, 2000; Curby et al., 2004; Bar and Neta, 2006; Op de Beeck et al., 2006), shape dimensions of novel objects (e.g., sharpness, symmetry, contrast, complexity) can impact observers' subjective preferences (Reber et al., 2004; Bar and Neta, 2006, 2007). Moreover, intuitions may also be formulated about the similarity of novel objects to familiar objects (e.g., smooth novel objects resembling "women wearing hats," Op de Beeck et al., 2006, p.13031), and such meaningful interpretations of ambiguous shapes appear to be robust and stable within individual observers (Voss et al., 2012). However, how meanings evoked by visual features may influence object processing remains a question that has not been explored systematically.

Some information on how object representations are constrained by both visual and conceptual factors comes from experiments where new conceptual associations are created for visual stimuli. Conceptual associations can facilitate perceptual categorization (Wisniewski and Medin, 1994; Lin and Murphy, 1997), bias perceptual interpretation of neutral stimuli (Bentin and Golland, 2002; Hillar and Kemp, 2008), and improve visual discrimination (Dux and Coltheart, 2005; Lupyan and Spivey, 2008). The discriminability of shapes or faces increases after having been paired with words from different categories, compared with having been paired with words from similar categories (Dixon et al., 1997, 1998; Gauthier et al., 2003). Observers also activate recent conceptual associations during visual judgments, even when the information is task irrelevant (James and Gauthier, 2003, 2004). However, in these studies (e.g., Dixon et al., 1997, 1998; Gauthier et al., 2003; James and Gauthier, 2003, 2004), the conceptual and visual information are arbitrarily associated, leaving entirely open whether some of these associations are created more easily than others, such as when the visual and conceptual features convey congruent, compared to contradictory, information.

We start with the assumption that the animate/inanimate distinction exists in the visual arena (objects can look like an animal or not) as well as in the non-visual conceptual arena (we can list attributes of objects that are animate or not). In this study, we manipulated both visual and conceptual features to study their interaction, more specifically the alignment of an animate vs. inanimate dimension in the visual and conceptual domains. We used words that described non-visual attributes that would normally apply to either people or man-made objects (e.g., cheerful, affordable), and created novel objects that resembled either living or non-living things. For visual features, we attempted to convey the animate vs. inanimate character of novel objects by manipulating shape, texture and color. These dimensions were chosen because bilateral shape symmetry is a powerful indicator of animacy (Concar, 1995; Rhodes et al., 1998), whereas the shape of man-made objects is more variable depending on their function. Also, the objects were rendered in colors and textures generally associated with animals or tools (e.g., skin color/organic vs. non-skin color/metallic). Experimental manipulation of both conceptual and visual information afforded us more control to investigate their interaction.

### **INITIAL VISUAL-CONCEPTUAL BIASES**

We first examined to what extent the visual appearance of novel objects from unfamiliar categories evokes conceptual processing, when observers see the objects for the first time. We asked whether visual features of the "animal-like" and "tool-like" object sets are sufficient to evoke the conceptual biases of animacy. Instead of asking participants directly to categorize the novel objects as animate or inanimate entities, we tested if the visual appearance of the objects evoked the concepts related to animate vs. inanimate categories by testing whether their (task-irrelevant) presence interfered with judgments of non-visual attributes as being more relevant to people or to man-made objects (e.g., "excited," "grateful" vs. "durable," "useful").

#### **VISUAL-CONCEPTUAL INTERACTION ON EXPERT RECOGNITION**

Beyond any early conceptual biases evoked by visual appearance, it is also possible that visual-conceptual interactions become more important with experience with a category. If visual features of novel objects activate abstract biases, anchoring the objects into existing conceptual networks appropriately (e.g., calling animate-like objects "animals" vs. calling tool-like objects "animals") may constrain their representations during expertise training. There may be differences in the acquisition of expertise between objects that look like animals or not (i.e., the effect of visual appearance), or between objects that are introduced as having animate or inanimate conceptual properties (i.e., the effect of conceptual associations). But more importantly, we asked whether it is easier to acquire expertise with a category that is assigned conceptual features congruent with its appearance (i.e., the interaction between visual and conceptual information), as we conjectured that learning objects with congruent visual and conceptual information might enhance the ability to locate diagnostic visual features for fine-level discrimination.

#### **TRAINING PROCEDURES**

Here we combined training procedures used in previous conceptual association studies (James and Gauthier, 2003, 2004) and expertise studies (Gauthier and Tarr, 1997, 2002; Wong et al., 2009). During the two-stage training, participants first learned to associate particular concepts with individual objects, and then learned to rapidly recognize objects at the subordinate level. Critically, participants were divided into two groups during the first training stage: Both groups were shown identical words and objects, but the Congruent pairing group learned to associate animate attributes with animal-like objects and inanimate attributes with tool-like objects, while the Incongruent pairing group learned the opposite pairings. In the second training stage, both groups practiced individuating objects from both animal-like and tool-like categories, without further mention of conceptual information.

## **DEPENDENT MEASURES**

We used two dependent measures to reveal potential visual-conceptual interactions. First, in a word judgment task, participants categorized words, presented on task-irrelevant objects, as appropriate for describing people or man-made objects. This task was first completed prior to any training, and then completed after each training stage. This task uses an opposition logic similar to the Stroop task (Stroop, 1935) and several tasks since (e.g., see Bub et al., 2008), to test whether the visual appearance of the animal-like and tool-like objects would be sufficient to evoke concepts relevant to animacy/non-animacy. If our manipulation of visual appearance does not evoke animate vs. inanimate concepts, word judgment performance should not be affected by whether congruent or incongruent objects are present. While the actual locus of any interference may be at the response level, such responses would have to be evoked by visual appearance (note that at pretest, no response had ever been associated with these or similar objects).

Second, in an object matching task, participants judged if two objects were from the same category (basic-level trials), or showed the same individual (subordinate-level trials). The reduction of the "basic-level advantage" is a hallmark of real-world expertise (Tanaka and Taylor, 1991), which is also sensitive to short-term expertise training (Bukach et al., 2012). Expert observers recognize individual objects in their expert categories at the *subordinate level* (e.g., "eastern screech owl," or "Tom Hanks") as quickly as at the *basic level* (e.g., "bird," or "man"), whereas novices recognize the objects faster at the basic than the subordinate levels (i.e., the "basic-level advantage," Rosch et al., 1976). The basic-level advantage is reduced in experts for both animate and inanimate object categories (e.g., faces: Tanaka, 2001; birds: Tanaka et al., 2005; Scott et al., 2006, 2008; novel 3D objects: Gauthier et al., 1998; Wong et al., 2009; Wong et al., 2012). With novel objects, explicit conceptual information is often absent during training (e.g., Wong et al., 2009; Wong et al., 2012). Although faster subordinate-level processing in experts might depend predominantly on experience with perceptual information of similar exemplars in a category, it is possible that conceptual information also imposes processing constraints. For instance, brief learning of a diverse set of semantic associations with novel objects can facilitate subordinate-level judgment compared to that of a restricted set (Gauthier et al., 2003). The question of interest here is whether observers apply conceptual knowledge about familiar categories to novel objects, based on the visual resemblance between the familiar and novel categories. If this is the case, conceptual information expected based on past experience with visual features may facilitate fine-level discrimination of similar exemplars, as both visual and conceptual information interact to constrain object representations.

Here, we assessed whether having associated concepts that are congruent with the visual appearance of a category may facilitate the recognition of the objects at the subordinate-level compared to the basic-level, even though the object matching task can be accomplished based on visual features alone. We measured any differences in the "basic-level advantage" after semantic training and after individuation training. If visual processing is facilitated by visual-conceptual pairings, then the basic-level advantage should be more reduced in participants who associated the objects with congruent conceptual features than in those who received incongruent pairings.
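The prediction about the basic-level advantage can be made concrete with a short sketch. The response times below are hypothetical values chosen only to illustrate the predicted direction of the effect, and defining the advantage as a difference of mean response times is an assumption, since the exact measure is not specified at this point in the text.

```python
from statistics import mean


def basic_level_advantage(rt_basic_ms, rt_subordinate_ms):
    """Mean subordinate-level RT minus mean basic-level RT, in ms.

    A value near zero means subordinate-level judgments are about as
    fast as basic-level ones, the pattern associated with expertise.
    """
    return mean(rt_subordinate_ms) - mean(rt_basic_ms)


# Hypothetical response times (ms), for illustration only.
congruent_group = basic_level_advantage([520, 540, 510], [560, 575, 555])
incongruent_group = basic_level_advantage([525, 535, 515], [640, 655, 630])

print(f"Congruent pairing:   {congruent_group:.1f} ms")
print(f"Incongruent pairing: {incongruent_group:.1f} ms")
```

Under the hypothesis stated above, the advantage would be smaller in the Congruent pairing group than in the Incongruent pairing group.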

## **METHODS**

## **PARTICIPANTS**

Twenty-four adults (normal/corrected-to-normal vision) from Vanderbilt University participated for payment (\$12/h). The study was approved by the Vanderbilt University IRB. Participants were randomly assigned to the Congruent pairing group (6 females and 6 males, age *M* = 22.58, *SD* = 4.32) or the Incongruent pairing group (4 females and 8 males, age *M* = 23.67, *SD* = 4.29). Twelve additional adults (5 females and 7 males, age *M* = 22.67, *SD* = 3.08) participated only in the object-matching task once as a Control group.

## **STIMULI**

## *Objects*

Each participant was shown 48 novel objects called "Greebles" (see examples in **Figure 1A**) created using 3D Studio Max. Half of the objects (24) were *Symmetric-organic Greebles* with smooth-edged parts and organic textures. The rest (24) were *Asymmetric-metallic Greebles* with sharp-edged parts and metallic textures. Note that symmetry refers to object, and not image, symmetry. Each Greeble had a unique set of four peripheral parts. To minimize object-specific effects, we generated two versions of Symmetric-organic and Asymmetric-metallic Greebles that differed in color (i.e., yellow/pink, blue/green) and in the assignment of central and peripheral parts to the objects. Each version was shown to half of the participants in each of the two training groups. There were 18 Greebles from each of the Symmetric-organic and Asymmetric-metallic categories in the trained subsets, and 6 in the untrained subsets (which were used as foils in the basic-level recognition task). The two subsets (trained or untrained) within each category had different central and peripheral parts. From each trained subset, six Greebles were used in semantic training. An additional six Greebles from each trained subset were also used in individuation training. All objects were shown during the testing tasks. The objects used for training and testing were counterbalanced across participants within each group and matched between groups. All Greebles were rendered on a white background at four viewpoints (0/6/12/18°; the 0° view was an arbitrarily defined orientation with the symmetric axis rotated 40° to the right). The image size was approximately 6 × 3.6° of visual angle. To avoid image-based effects, objects used during training were shown at 0 and 18°. During testing, the objects were presented at 6 and 12°. Additionally, phase-scrambled images of the Greebles were created as control stimuli in one of the tasks.
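The image size above is reported in degrees of visual angle rather than in physical units. As a minimal sketch of how such a value maps onto an on-screen extent, the snippet below applies the standard relation size = 2 · d · tan(θ/2). The viewing distance is not stated in the text, so the 60 cm used here is a purely hypothetical value, and the function name is illustrative only.

```python
import math


def visual_angle_to_size(angle_deg: float, distance_cm: float) -> float:
    """Return the extent (cm) that subtends `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Hypothetical viewing distance: the article does not report one.
VIEWING_DISTANCE_CM = 60.0

# The two reported image dimensions, in degrees of visual angle.
for angle in (6.0, 3.6):
    size = visual_angle_to_size(angle, VIEWING_DISTANCE_CM)
    print(f"{angle} deg -> {size:.2f} cm at {VIEWING_DISTANCE_CM} cm")
```

With a different viewing distance the physical size scales proportionally, which is why stimulus size is usually reported as an angle.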

## *Words*


Eighty-four words were used; each described a *non-visual* attribute appropriate for describing either people ("animate attributes") or man-made objects ("inanimate attributes"; **Figure 1A** and Appendix A in Supplementary Material), generated in a pilot study (*N* = 20). Word length was controlled across the animate (*M* = 7.17 letters, *SD* = 2.05) vs. inanimate (*M* = 7.5 letters, *SD* = 1.86) attributes. According to the SUBTLEXus word frequency database (Brysbaert and New, 2009), the mean frequency was higher for the animate (*M* = 38.06, *SD* = 63.53) than for the inanimate (*M* = 4.16, *SD* = 6.29) attributes. However, because the critical manipulation here was the object-word *pairing* and identical words were used for both training groups, word frequency alone could not account for differences between groups. Twenty-four animate and 24 inanimate attributes were used in the word judgment task. Eighteen animate and 18 inanimate attributes were used during semantic training. The words used were also counterbalanced across participants within each group and matched between groups.

#### **PROCEDURE**

The study was conducted on Mac mini computers with 19-inch CRT monitors using Matlab. Below are the details of the two-stage training (which lasted approximately 9 h across 6 days) and of the word judgment and object matching tasks (which lasted 15 and 45 min, respectively). The entire study consisted of a pre-test (word judgment task), followed by two sessions of semantic training, followed by a session with two post-test tasks (word judgment and object matching), followed by four sessions of individuation training, then another session with the two post-test tasks.

#### *Training*

*Training stage 1: semantic training.* During semantic training (two 90-min sessions; **Figure 1B**, **Table 1**), each group learned three randomly selected words each for 12 Greebles (6 Symmetric-organic Greebles and 6 Asymmetric-metallic Greebles). The Congruent pairing group learned animate attributes with Symmetric-organic Greebles and inanimate attributes with Asymmetric-metallic Greebles, whereas the Incongruent pairing group learned the opposite pairing. Identical sets of word triplets were assigned to one participant in the Congruent pairing group and another in the Incongruent pairing group. The two categories of Greebles were shown in interleaved blocks.

*Training stage 2: individuation training.* During individuation training (four 90-min sessions; **Figure 1B**, **Table 1**), all participants learned to individuate 24 Greebles (12 Symmetric-organic and 12 Asymmetric-metallic Greebles, of which 6 from each category had previously been shown during semantic training). Additional objects were used in this phase to increase the

#### **Table 1 | Task details of the two-stage training paradigms.**


*We used a variety of tasks in every session to promote task-general learning. These tasks were previously used in several studies (semantic training: James and Gauthier, 2003, 2004; individuation training: e.g., Gauthier and Tarr, 2002; Wong et al., 2009). The semantic training (stage 1) consisted of four tasks promoting associations between a set of three words and a trained object. The trained objects were introduced across the first two training sessions: 8 Greebles (4 Symmetric-organic and 4 Asymmetric-metallic Greebles) were introduced in session 1 and all 12 Greebles (6 of each category) were introduced in session 2. The individuation training (stage 2) consisted of three tasks that aimed to enhance the speed and accuracy of identification for individual objects at the subordinate level. The trained objects were introduced across the first two training sessions: 12 Greebles were used in session 1 and all 24 Greebles were used in sessions 2–4.*

difficulty of rapid identification. During this training, each Greeble was named with a 2-syllable nonsense word (e.g., Pila, Aklo, see Appendix B in Supplementary Material for the full list). Name assignment was randomized within group but matched between groups. Both speed and accuracy were emphasized in all training tasks. To motivate participants, the mean speed and accuracy for each block were shown at the end of each block. Symmetric-organic and Asymmetric-metallic Greebles were shown in interleaved blocks.

## *Testing*

*Word judgment.* Participants first completed the 15-min task (**Figure 2A**) prior to training, and again after semantic training and after individuation training. In this 2-alternative forced choice task, participants judged if a word was more appropriate for describing people or objects, while told to ignore an image presented behind each word. In a total of 432 trials, each of the 24 animate and 24 inanimate attributes was presented nine times. Each word appeared twice with each of the 48 Symmetric-organic Greebles and 48 Asymmetric-metallic Greebles at each of two slightly different viewpoints (difference = 6◦), and four times with each of 12 phase-scrambled Greeble images (6 Symmetric-organic and 6 Asymmetric-metallic). The phase-scrambled images were included to evaluate whether participants paid additional attention to the task-irrelevant Greebles during the word judgment task. All stimuli were shown until a response, with a 1-s interval in between trials. All conditions were randomized.

**FIGURE 2 | Example trials of (A) the word judgment task and (B) the object matching task at the basic level (top: a basic-level trial with two individuals from different categories) and at the subordinate level (bottom: a subordinate-level trial with two individuals from the same category).**

### *Matching at basic- and subordinate-levels*

Participants completed this 45-min task (768 trials) after semantic training and after individuation training (**Figure 2B**). In different blocks, participants judged if two sequentially presented objects were identical or different, at either the basic or subordinate level. In basic-level blocks, object pairs could be Greebles from the same category (the same central body part) or different categories (different central body parts). In subordinate-level blocks, the object pairs could be identical or different individuals from the same category (the same central body parts but different peripheral parts). All object pairs were shown across a 6° rotation. The following conditions were blocked: Categorization level (basic/subordinate), Visual appearance (Symmetric/Asymmetric), and Training status (trained/untrained objects). On each trial, a 300-ms fixation was followed by a study image (800 ms), a mask (500 ms), and a test image (1 s).

## **RESULTS**

## **TRAINING RESULTS**

The training was meant to form conceptual associations and improve individuation performance, and the training results (**Figure 3**) were not a focus of the study. Both groups showed accuracy near ceiling throughout training (i.e., well above 90% in all tasks across all sessions), with the expected significant increases in all individuation training tasks. Responses became faster with time in all semantic training and individuation training tasks except the single-attribute matching task. Note that responses were also faster for Symmetric-organic Greebles than for Asymmetric-metallic Greebles, but there was no statistically significant difference in performance between the groups in any task except the passive viewing task during semantic training. We do not report statistical analyses here, but **Figure 3** shows confidence intervals relevant to the significant training effects across sessions.

## **TESTING RESULTS**

## *Word judgment*

We focused on RT in correct trials because accuracy in this task was high (>95%)<sup>1</sup>.

There was no effect of the Pairing group on this task at any stage of the study (pre-test, after semantic training, or after individuation training; ANOVAs, all *p* > 0.35). The results are therefore presented in **Figure 4** collapsed over this factor.

It was entirely expected that there would be no difference between the two pairing groups at pre-test because no pairings had yet been learned. At this stage, the question was whether the visual appearance of novel objects implies conceptual information about animacy. We also measured performance in the word judgment task in a baseline condition where task-irrelevant scrambled images were shown behind the words (**Table 2**).

As mentioned above, we observed no effect of Pairing groups after pairings were learned in semantic training. Like any null result, this is difficult to interpret, but given we found other effects of pairing group in the study (described later), this suggests that the word judgment is simply not sensitive to these effects.

<sup>1</sup>Trials that involved the word "curvy" were discarded because of chance performance across participants.

This could be because it is an explicit conceptual task in which participants can as easily retrieve all the explicitly learned associations, whether congruent or incongruent. In contrast, in a more perceptual task where no explicit conceptual search is activated, congruent visual-conceptual pairings may show more of an advantage.

*Interaction between visual appearance and conceptual information.* After collapsing over the non-significant factor of pairing group, we focus here on the effect of Visual appearance on word categorization across sessions (**Figure 4**). A Session (pre-training/post-semantic training/post-individuation training) × Word type (animate/inanimate) × Object type (Symmetric-organic/Asymmetric-metallic) ANOVA was conducted. Responses became faster with time, *F*(2, 46) = 13.53, $\eta_p^2$ = 0.37, *p* < 0.0001. Responses were also faster for judging animate than inanimate attributes, *F*(1, 23) = 10.60, $\eta_p^2$ = 0.32, *p* = 0.0035, possibly because of the higher word frequency for animate than inanimate attributes. There was no significant effect of Object type, *F*(1, 23) = 0.86, $\eta_p^2$ = 0.04, *p* = 0.36, nor significant interactions between Session and Word type, *F*(2, 46) = 0.48, $\eta_p^2$ = 0.02, *p* = 0.62, or between Session and Visual appearance, *F*(2, 46) = 0.35, $\eta_p^2$ = 0.015, *p* = 0.71. Critically, however, Word type and Object type interacted, *F*(1, 23) = 6.68, $\eta_p^2$ = 0.25, *p* = 0.017, although the 3-way interaction of Session, Word type and Object type did not reach significance, *F*(2, 46) = 2.31, $\eta_p^2$ = 0.09, *p* = 0.11.

**Table 2 | Mean response times (ms) in the word judgment task for each image type (Symmetric-organic Greebles, Asymmetric-metallic Greebles, and scrambled images) across the three testing sessions.**

*Standard errors of the mean are reported within the parentheses.*
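The effect sizes in this ANOVA are reported as partial eta squared alongside each *F* statistic. As a hedged illustration (not the authors' analysis code), partial eta squared can be recovered from an *F* value and its degrees of freedom with the standard identity $\eta_p^2 = F \cdot df_{\text{effect}} / (F \cdot df_{\text{effect}} + df_{\text{error}})$. The function name below is an assumption made for this sketch; the *F* values and degrees of freedom are copied from the paragraph above.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared) from the text.
reported_effects = [
    ("Session", 13.53, 2, 46, 0.37),
    ("Word type", 10.60, 1, 23, 0.32),
    ("Object type", 0.86, 1, 23, 0.04),
]

for label, f_value, df_effect, df_error, reported in reported_effects:
    recomputed = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: recomputed {recomputed:.3f}, reported {reported}")
```

Because the reported values are rounded, recomputed values should be expected to agree only to about two decimal places.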

One of our main goals was to investigate whether Word type and Visual appearance might interact when participants were first presented with the objects during the pre-training session, and how a putative interaction would be affected by further training. Therefore, we conducted a Word type (animate/inanimate) × Object type (Symmetric-organic/Asymmetric-metallic) ANOVA separately for each session to examine if the effect was already significant at pre-test (**Figures 4A–C**). Critically, we found a significant interaction between Word type and Object type in pre-training [*F*(1, 23) = 7.18, $\eta_p^2$ = 0.24, *p* = 0.013]. Scheffé's *post-hoc* tests revealed faster judgments for animate attributes in the presence of Symmetric-organic Greebles compared with Asymmetric-metallic Greebles (*p* = 0.0045), and the opposite pattern for judging inanimate attributes (*p* = 0.015).

The Word type and Object type interaction was also significant after individuation training [*F*(1, 23) = 6.16, $\eta_p^2$ = 0.21, *p* = 0.02], but interestingly, it was not immediately after semantic training [*F*(1, 23) = 0.033, $\eta_p^2$ = 0.0015, *p* = 0.86]. We observe a bias for relating novel animal-like (or tool-like) objects to human (or object) attributes at pre-test, and it seems that introducing explicit semantic associations can temporarily alter this bias. This is also consistent with the idea, suggested above to explain the lack of a Pairing group effect, that this explicit word judgment task may be most sensitive to implicit influences. During semantic training, participants in all groups had to learn associations with the objects, and the training ensured that all associations were learned. These explicit associations would have been more salient to the minds of participants in Session 2 than later on. We would therefore speculate that these associations blocked the effects of visual appearance that we observe in Sessions 1 and 3, and the reappearance of the interaction effect in Session 3 demonstrates that the faster RTs or practice with the word judgment task cannot account for the lack of effect in Session 2.

*Manipulation check: objects vs. scrambled objects as task-irrelevant images.* To test whether participants paid less attention to the words during the word judgment task due to the presence of task-irrelevant objects, we compared performance to that for the same task with words shown on scrambled images. The presence of an object was apparently not more distracting than the presence of a scrambled image. If anything, the objects were easier to ignore than the scrambled images (perhaps due to low-level image properties). Indeed, RTs for the word judgment were consistently faster when objects were present relative to scrambled images (**Table 2**). A Session (pre-training/post-semantic training/post-individuation training) × Image type (Symmetric-organic/Asymmetric-metallic/Scrambled) ANOVA showed an effect of Image type, *F*(2, 46) = 8.26, $\eta_p^2$ = 0.26, *p* < 0.001, with faster RT in the presence of either type of object compared to the scrambled images (*p*s < 0.01), and no difference between object types (*p* = 0.80). There was also an effect of Session, *F*(2, 46) = 15.53, $\eta_p^2$ = 0.40, *p* < 0.0001, with faster RT as the sessions progressed, and no interaction between Session and Image type, *F*(4, 92) = 1.11, $\eta_p^2$ = 0.05, *p* = 0.37.

#### *Matching at the basic- and subordinate-levels*

After finding that the visual appearance of novel objects can activate conceptual information in a word judgment task on the first encounter with these objects, we then examined the influence of acquired conceptual associations with animate vs. inanimate objects, in a matching task at the basic- and subordinate-levels. As in prior work (e.g., Gauthier and Tarr, 1997; Wong et al., 2009; Wong et al., 2011), we focus only on trials with unfamiliar objects from the trained categories that were not used during training (i.e., "transfer" objects)<sup>2</sup>, as a critical aspect of expertise is generalization of the skills to unfamiliar exemplars in the expert domain (e.g., car experts viewing cars, Bukach et al., 2010; face experts viewing faces, Tanaka, 2001). Here, we measured both response times (RT) and sensitivity (*d*′: z(hit rate) − z(false alarm rate)). We first compared the performance of the two training groups after semantic training to an untrained control group. We then compared the effects in the two training groups after both stages (semantic and individuation) of training.
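The sensitivity measure defined above, z(hit rate) − z(false alarm rate), can be sketched in a few lines. This is an illustrative implementation rather than the authors' code: the function name and the example rates are assumptions, and the text does not say how hit or false alarm rates of exactly 0 or 1 were handled, so no correction is applied here (the inverse normal CDF is undefined at those values and the call will raise an error).

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index: z(hit rate) - z(false alarm rate).

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates for one participant in one condition.
print(f"d' = {d_prime(0.90, 0.20):.2f}")
```

Equal hit and false alarm rates give a sensitivity of zero, that is, no discrimination between same and different pairs.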

*Effects of semantic training (comparison between a Control group and the training groups).* Semantic training with a few exemplars was sufficient to reduce the basic-level advantage, even for untrained exemplars, in the training groups compared to a Control group that did not receive any training (**Figures 5A,B**). A Group (Control/Congruent/Incongruent) × Object type (Symmetric-organic/Asymmetric-metallic) × Categorization level (Basic/Subordinate) ANOVA revealed a significant interaction of Group and Categorization level in RT, *F*(1, 33) = 9.00, $\eta_p^2$ = 0.40, *p* < 0.001: the basic-level advantage was smaller in the training groups compared to the control group (*p*s < 0.05), and also smaller in the Congruent than Incongruent pairing group (*p* = 0.04). The Group and Categorization level interaction did not reach significance in *d*′, *F*(1, 33) = 2.72, $\eta_p^2$ = 0.12, *p* = 0.08.

<sup>2</sup>For the trained objects, the two training groups showed comparable improvement in this task after individuation training, with comparable magnitude of reduction in basic-level advantage. These results were consistent with the results from the individuation training procedures whereby both groups successfully learned to quickly and accurately identify this subset of the exemplars at the subordinate-level.

*Effects of both semantic and individuation training (comparison between the training groups).* We then assessed how pairing during semantic training influenced the acquisition of perceptual expertise in the two training groups. RT and *d*′ were each analyzed in a Pairing (Congruent/Incongruent) × Session (post-semantic/post-individuation) × Object type (Symmetric-organic/Asymmetric-metallic: each category was paired with either animate or inanimate attributes) × Categorization level (Basic/Subordinate) ANOVA. RT and *d*′ results (**Figures 5A,B**) revealed different aspects of conceptual influences: RT showed a long-lasting pairing effect throughout the tests, whereas *d*′ showed an effect of conceptual association type only after semantic training.

In RT, object matching was faster after individuation training than after semantic training, *F*(1, 22) = 41.17, $\eta_p^2$ = 0.65, *p* < 0.0001. The basic-level advantage was present, *F*(1, 22) = 109.6, $\eta_p^2$ = 0.83, *p* < 0.0001, with faster recognition at the basic level compared to the subordinate level. The basic-level advantage was smaller for Symmetric-organic Greebles than Asymmetric-metallic Greebles, *F*(1, 22) = 12.82, $\eta_p^2$ = 0.37, *p* = 0.002. Critically, the Congruent pairing group showed a reduced basic-level advantage compared to the Incongruent pairing group, as revealed by an interaction between Pairing and Categorization level, *F*(1, 22) = 7.12, $\eta_p^2$ = 0.24, *p* = 0.014. The interaction of Pairing, Categorization level, and Session was not significant, *F*(1, 22) = 0.11, $\eta_p^2$ = 0.005, *p* = 0.74, nor was any other effect (*p*s > 0.31). Thus, visual-conceptual *pairing* impacted both matching performance and a marker of perceptual expertise: congruent conceptual associations facilitated perceptual judgments, relative to incongruent associations.

The basic-level advantage was also present in *d*′, *F*(1, 22) = 75.35, $\eta_p^2$ = 0.77, *p* < 0.0001. All other results were not significant (all *p* > 0.09) except for an unexpected result regarding the *type* of conceptual associations. This was a 4-way interaction of Group, Session, Object type and Categorization level, *F*(1, 22) = 6.69, $\eta_p^2$ = 0.23, *p* = 0.017. Although a 4-way interaction could be difficult to interpret, the result essentially revealed that immediately after semantic training, both groups showed a smaller basic-level advantage for Greeble categories associated with inanimate attributes compared with the categories associated with animate attributes (*p*s < 0.006). However, following individuation training the basic-level advantage no longer differed depending on animate or inanimate associations (*p*s > 0.32). Unlike the effect of visual-conceptual *pairing* in RT that was observed both after semantic and individuation training, the *type* of conceptual associations had an initial impact on matching, but the effect was absent once the conceptual associations were no longer emphasized.

## **DISCUSSION**

We found an implicit bias to relate animate concepts to unfamiliar symmetric, animal-like objects, and to relate inanimate concepts to unfamiliar asymmetric, tool-like objects. This is a rare and important experimental demonstration that the processing of novel objects is far from neutral conceptually. Moreover, whether visual and conceptual information are associated in a congruent or incongruent manner influences visual processing of untrained objects from the category. These effects last long after associations are no longer task-relevant.

Consistent with previous work, we found that concepts can be quickly associated with novel objects (Dixon et al., 1997; Gauthier et al., 2003; James and Gauthier, 2003, 2004), and that learning distinctive semantic associations can facilitate subordinate-level processing (Gauthier et al., 2003). Our results also led us to speculate that such conceptual associations, especially right after they were freshly learned, may in some tasks block the automatic activation of semantic information evoked by the visual features themselves. This conjecture is based on the absence of an interaction between semantics and visual appearance in the word judgment task only in Session 2.

For the first time we considered the effect of different kinds of pairings of conceptual information with novel objects, information that was either congruent or incongruent with the animacy of the visual appearance. We found that both congruent and incongruent pairings of objects and concepts can be learned. Moreover, these associations generalize to an object category, as they influenced performance for untrained objects during a visual matching task.

Specifically, congruent visual-conceptual pairings facilitated the acquisition of subordinate-level perceptual expertise, resulting in a smaller basic-level advantage in the Congruent than Incongruent pairing group. When learning to individuate objects, observers not only utilize visual information, they are affected by conceptual cues implied from visual features. The new associations introduced during semantic training interacted with the initial conceptual biases for the objects, such that congruent cues from different sources facilitate forming precise representations for visually similar exemplars in the trained categories.

On the other hand, the fact that even relatively unexpected conceptual associations (e.g., inanimate attributes to animal-like objects) generalized to objects that shared only some of the visual properties of the trained objects suggests a mechanism to explain the implicit bias observed in the word judgment task for novel objects prior to any training. We showed that unfamiliar objects from a novel category (e.g., symmetric-organic objects) appear to derive conceptual meaning on the basis of visual similarity with familiar categories (e.g., animals or people). Likewise, unfamiliar objects from recently familiarized categories (i.e., the untrained objects in the trained categories in the current study) derive conceptual meaning on the basis of visual similarity to objects from a recently learned category. If relatively novel and arbitrary associations that run contrary to much of our experience can generalize in this manner, a lifetime's history of conceptual learning likely has a very powerful influence on how we represent any object we encounter.

Additionally, while the main focus of the study is on the interaction between visual and conceptual properties, we found transient effects regarding the type of conceptual information on object processing immediately after associations were learned. For instance, objects associated with inanimate attributes showed less of a basic-level advantage compared to objects associated with animate attributes. One possibility is that inanimate concepts possess lower feature overlap than animate concepts (Mechelli et al., 2006). Two objects that are "elastic, shiny and antique" vs. "eco-friendly, plastic and durable" may seem to be quite different and likely to belong to different basic-level categories. Conversely, two objects that are "adorable, funny and sensitive" and "cheerful, talented and forgiving" are more likely two individuals within the same basic-level category. Therefore, inanimate associations may be more distinctive than animate associations, facilitating visual discrimination (Gauthier et al., 2003). Note, however, this difference cannot account for the pairing effect, because identical sets of associations were used for both training groups. Also, this effect regarding the type of associations faded once the associations were no longer emphasized, even though the visual-conceptual pairing effects remained. Further research should aim to replicate and explore the different temporal dynamics of the more short-lived effect of distinctive conceptual associations, and the congruency of the visual-conceptual associations, which were longer-lasting.

Several influential object recognition theories focus almost entirely on visual attributes of objects (e.g., Marr, 1982; Biederman, 1987; Perrett and Oram, 1993; Riesenhuber and Poggio, 1999; Jiang et al., 2007), assuming that conceptual associations should have no influence on object recognition (e.g., Pylyshyn, 1999; but see Goldstone and Barsalou, 1998). Additionally, researchers interested in the role of shape in object processing have often used novel objects to prevent influences from non-visual information, such as object names, familiarity and conceptual content (e.g., Op de Beeck et al., 2008). Our findings suggest that novel objects are not necessarily conceptually neutral, and that visual factors, conceptual factors, and their interaction are all important in the formation of object representations.

## **ACKNOWLEDGMENTS**

This research was supported by the James S. McDonnell Foundation, NIH grant 2 R01 EY013441-06A2 and NSF grant SBE-0542013. We thank Timothy McNamara, James Tanaka, Frank Tong and Jennifer Richler for valuable comments on a previous version of the manuscript, and Magen Speegle and Sarah Muller for assistance with data collection.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.00793/abstract

## **REFERENCES**

Bar, M., and Neta, M. (2006). Humans prefer curved visual objects. *Psychol. Sci.* 17, 645–648. doi: 10.1111/j.1467-9280.2006.01759.x


Concar, D. (1995). Sex and the symmetrical body. *New Sci.* 146, 40–44.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 March 2014; paper pending published: 27 April 2014; accepted: 06 July 2014; published online: 29 July 2014.*

*Citation: Cheung OS and Gauthier I (2014) Visual appearance interacts with conceptual knowledge in object recognition. Front. Psychol. 5:793. doi: 10.3389/fpsyg.2014.00793*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Cheung and Gauthier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Segmentation of dance movement: effects of expertise, visual familiarity, motor experience and music

## *Bettina E. Bläsing<sup>1,2,3</sup>\**

*<sup>1</sup> Faculty of Psychology and Sport Science, Neurocognition and Action Research Group, Bielefeld University, Bielefeld, Germany*

*<sup>2</sup> Center of Excellence - Cognitive Interaction Technology, Bielefeld University, Bielefeld, Germany*

*<sup>3</sup> Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld, Germany*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Juliane J. Honisch, University of Birmingham, UK Máximo Trench, Universidad del Comahue, Argentina*

#### *\*Correspondence:*

*Bettina E. Bläsing, Neurocognition and Action Research Group, Bielefeld University, PO Box 100 131, 33501 Bielefeld, Germany e-mail: bettina.blaesing@uni-bielefeld.de*

According to event segmentation theory, action perception depends on sensory cues and prior knowledge, and the segmentation of observed actions is crucial for understanding and memorizing these actions. While most activities in everyday life are characterized by external goals and interaction with objects or persons, this does not necessarily apply to dance-like actions. We investigated to what extent visual familiarity of the observed movement and accompanying music influence the segmentation of a dance phrase in dancers of different skill levels and non-dancers. In Experiment 1, dancers and non-dancers repeatedly watched a video clip showing a dancer performing a choreographed dance phrase and indicated segment boundaries by key press. Dancers generally defined fewer segment boundaries than non-dancers, specifically in the first trials in which visual familiarity with the phrase was low. Music increased the number of segment boundaries in the non-dancers and decreased it in the dancers. The results suggest that dance expertise reduces the number of perceived segment boundaries in an observed dance phrase, and that the ways visual familiarity and music affect movement segmentation are modulated by dance expertise. In a second experiment, motor experience was added as a factor, based on empirical evidence suggesting that action perception is modified by visual and motor expertise in different ways. In Experiment 2, the same task as in Experiment 1 was performed by dance amateurs, and was repeated by the same participants after they had learned to dance the presented dance phrase. Fewer segment boundaries were defined in the middle trials after participants had learned to dance the phrase, and music reduced the number of segment boundaries before learning. The results suggest that specific motor experience of the observed movement influences its perception and anticipation and makes segmentation broader, but not to the same degree as dance expertise on a professional level.

**Keywords: expertise, event segmentation, dance, movement learning, motor experience, music**

## **INTRODUCTION**

Despite its continuous nature, human motor action is functionally based on task- and event-related perception. Research suggests that ongoing processing resources are devoted to this perceptual process, and that the online perception of events determines how episodes are understood and encoded in memory (Zacks and Tversky, 2001; Kurby and Zacks, 2008). According to Event Segmentation Theory (Zacks et al., 2007), the perception of events depends on both sensory cues and knowledge structures that represent previously learned information about event parts and inferences about actors' goals and plans. Related studies have revealed that the segmentation of observed actions is crucial for the understanding and memorizing of these actions (e.g., Swallow et al., 2009; Zacks et al., 2009; Sargent et al., 2013). Furthermore, the theory states that any observed activity is spontaneously segmented into events during perceptual processing, which enables the system to anticipate upcoming information and react appropriately. As long as anticipation is successful, representations in working memory (named "event models" in this context, see Zacks et al., 2007; Kurby and Zacks, 2008) are maintained in a stable state, guiding further prediction and saving processing costs. When the frequency of anticipation errors increases as prediction becomes more difficult, event models are updated based on incoming information; these instances of increasing insecurity are subjectively experienced as boundaries between events. Perception of common goal-directed activities has been found to be hierarchical, with coarse-grained and fine-grained segmentation layers, corresponding to the hierarchical structure of action organization with goals and sub-goals. Furthermore, perception has been described as cyclical, with the ongoing comparison of predictions to what is perceived feeding back into processing. Event segmentation thereby results from the ongoing anticipation of what will happen next, which serves action understanding, prediction and learning.

Studies using event segmentation paradigms have shown that segmentation characteristics can be related to the understanding and memory of the observed actions. In these studies, actions from everyday life, such as assembling objects, setting a table or folding laundry, were presented to participants with no specified expertise (e.g., Zacks et al., 2009; Sargent et al., 2013). The presented actions typically involve the manipulation of objects and/or interactions between people, and are defined by clear action goals and a clear semantic context. In the context of dance or sports, the same characteristics do not necessarily apply. Even though many skilled actions in a sports context are object- and person-related and have clearly defined goals (e.g., passing the ball to a team member), there are also many other examples of movements that do not share these features. As such motor actions occur particularly in dance, the term "dance-like actions" has been used to describe motor actions that lack common features ascribed to actions from everyday contexts, such as interactions with objects and persons and obvious external action goals.

It has been stated that the goal of such dance-like actions is "the movement itself," which certainly is often the case in a dance context. Schachner and Carey (2013) showed that observers tended to interpret actions as being intentionally movement-related if they were not able to infer external goals from observing the action, or if the action seemed to be inefficient or inappropriate with regard to any recognizable external goal. The authors state that a dance-like action is also (in the eye of the observer) primarily characterized by its goal, which is movement-based, whereas other "rational" actions have external goals. This is particularly true for dance movements, the goal of which is commonly not only movement-based but also related to communicating to partners or an audience via the body. It can therefore be assumed that segmentation of dance-like actions or dance movements follows different "rules" and cognitive strategies compared to segmentation of typical everyday activities with external goals.

Specifically in modern and contemporary dance, movement performance often requires a fluent quality that does not afford obvious partitioning or segmenting. The ability to perform long movement phrases with this obvious fluency is an important skill in these dance disciplines. This fluent quality of the movement can be supported and enhanced by the accompanying music or sound. Choreographers might choose music that does not have a clear beat or rhythm but rather provides an associated sound layer, allowing the dancer and the spectator more freedom in integrating sound and movement. This means that the dance movement, when accompanied by music at all, does not necessarily follow a musical beat or rhythm, and might even contravene the music in order to create a more exciting impression for the audience. The interrelation between the dance movement and the accompanying music deliberately influences the spectator's perception and should therefore be taken into account when investigating the segmentation of dance movement; in this respect, dance differs from dance-like actions that are not commonly associated with music.

Numerous studies have provided evidence that the perception of skilled actions is modulated by expertise (see Cheung and Bar, 2012) and is specifically facilitated by motor experience of the observed action type (e.g., Abernethy and Zawi, 2007; Aglioti et al., 2008; Güldenpenning et al., 2011; Steggemann et al., 2011). Even though empirical approaches to expertise often differentiate between perception, cognition (e.g., decision making) and action (motor control), this distinction can hardly be maintained in the context of athletes' practice-dependent task-specific skills (see Yarrow et al., 2009). Evidence for the interdependency of perception, action and cognition in movement expertise has been found in many studies with athletes and other movement experts (e.g., Aglioti et al., 2008; see also Yarrow et al., 2009 for review). Dance expertise in particular has been shown to comprise a multitude of perceptual-motor and cognitive skills, including motor control, timing, learning, memorizing, imagery, entrainment, as well as multimodal communication and artistic expression (see Sevdalis and Keller, 2011; Bläsing et al., 2012; Waterhouse et al., 2014). Studies with expert dancers have shown that movement-related memory is more functionally structured in dancers than in non-dancers (Bläsing et al., 2009; Bläsing and Schack, 2012), and that dancers show shorter fixation times while watching dance movements than non-dancers, which points toward perception facilitation (Stevens et al., 2010). Furthermore, dance provides a highly adequate framework for studying expertise effects related to action-perception coupling, because dance, more than most types of sports, is performed with the primary goal of being observed by an audience. In dancers, increased activity has been found in specific brain areas commonly referred to as the action observation network (AON) while watching familiar dance movements (Calvo-Merino et al., 2005).
This network of brain regions (comprising the ventral and dorsal premotor cortices and parts of the parietal cortex, including the inferior parietal lobe, the superior parietal lobe, and the superior parietal sulcus, as well as the superior temporal sulcus) is typically involved in the execution, observation and imagery of actions. Studies have shown that the activation of these regions is modulated differently by visual and motor expertise (Calvo-Merino et al., 2006). Dancers showed increased activity in areas belonging to the AON while watching movements from their own dance discipline compared to similar movements from other dance disciplines (Calvo-Merino et al., 2005; Cross et al., 2006). Activation of AON regions was further increased when dancers watched movements they had previously performed themselves, compared to movements they had frequently watched but not physically performed (Calvo-Merino et al., 2006). Learning to dance a specific movement phrase affects AON activation while watching the same phrase even early in the learning process (Cross et al., 2006). Different types of learning have been found to activate the AON in specific ways, with the right ventral premotor cortex responding specifically to the experience of having performed an observed movement, and the bilateral superior temporal cortex responding to the presence of a human model (Cross et al., 2009). These findings indicate that dance expertise affects both the production and the perception of dance-like movements. Dance expertise should therefore not only enable dancers to perform movement phrases fluently, but should also influence their perception of observed movement material in favor of fluency and greater overall connectedness. Only a few studies have investigated the segmentation of dance-like actions (e.g., Pollick et al., 2012; Noble et al., 2014), and so far none has focused on effects of dance expertise on segmentation.
Evidence from preliminary studies suggests that observers' dance expertise affects the segmentation of dance-like actions, but not of other actions that have an obvious external goal (Bläsing et al., 2010).

The aim of the present study was to investigate how different factors, namely dance expertise, visual familiarity (via repeated presentations), motor experience (via learning to dance the presented phrase) and music, would influence the segmentation of observed dance movement. Specifically, we presented a choreographed contemporary dance phrase of fluent character that did not contain interactions with objects or persons, communicative signals or semantic content. The dance phrase was choreographed on the basis of modern/contemporary dance technique, and was initially novel to all participants. This means that participants who regularly trained modern/contemporary dance were likely to be familiar with the type of movement in general, but not with the presented movement material as such (note that modern/contemporary dance choreography commonly involves the exploration and creation of new movement material rather than re-combination and variation of defined partial movements, as is often the case in classical dance). During the experimental procedure, the participants watched the sequence repeatedly and thereby became increasingly familiar with it. Their visual familiarity, in terms of knowing the exact dance phrase (rather than similar movement material from the same disciplinary background), was addressed here as a factor potentially affecting segmentation. It was expected that segmentation would become less variable with increasing visual familiarity over consecutive trials.

The issue of dance expertise was addressed in the current study by comparing groups of participants differing in their specific skill level in dance. In Experiment 1, professional dancers were compared to sports students. The professional dancers (referred to as "dancers" in the following) had undergone professional dance training for many years and were all currently either members of a professional dance theater company or freelance professional dancers performing with different companies as well as teaching dance on different levels. The sports students (referred to as "non-dancers" in the following) had no particular experience in dance training apart from a few very basic mandatory courses in their study program. It has to be pointed out that the non-dancer participants were "novices" only with respect to dance, but not to movement skills in general; most of them performed their preferred sports on an advanced to high level. For the purpose of the study, this group was preferred to a group of participants without any movement expertise (i.e., persons who did not perform sports or physical exercise on a regular basis) because of the specific segmentation task. Expertise has been shown to be task-specific and does not generalize well over domains (e.g., Ericsson and Charness, 1994). It was assumed that athletes without dance expertise would show similar responses to observed human body movement in general compared to dancers, including corresponding levels of motor activation and simulation, and that any differences in the results could be related to expertise in dance rather than a high level of physical training and motor skill in general. It was expected that dancers' segmentation behavior would differ from that of the non-dancers, with dancers defining fewer segment boundaries based on their training-based ability to anticipate dance movement more successfully despite its novelty, and their preference for viewing the observed dance movement as more connected.

As a third group, dance amateurs who trained modern/contemporary dance on intermediate level participated in Experiment 2 of the presented study. These participants (referred to as "amateurs" in the following) were chosen for two reasons. First, they represented a viable intermediate step between the non-dancers and the dancers, offering the opportunity to monitor expertise effects on different levels. Second, the amateur participants all belonged to the same dance class that was trained by the choreographer of the stimulus dance phrase. Crucially, this class was taught the dance phrase as part of their training schedule, which provided the opportunity to add the aspect of learning to that of expertise and relate the two aspects to each other. In Experiment 2, the participants thereby gained specific motor experience of the presented movement material (referred to as "motor experience" in the following, applied as factor in Experiment 2). The term "motor experience" is in this case related to the experience of having danced the exact phrase presented as stimulus, not more generally to experience with similar movement material from the same disciplinary background. It was expected that specific motor experience would increase the participants' expertise for the dance phrase and thereby make their segmentation behavior more "expert-like," potentially even more than the dancers' in Experiment 1 who had greater dance expertise in general but no motor experience of the presented dance phrase.

As a fourth factor, the presence of music was added. The music chosen by the choreographer to accompany the dance phrase did not have a clear metric rhythm but rather consisted of an underlying sound layer of chords with slowly increasing and decreasing pitch and volume. It was hypothesized that the added music, because of its specific character, would influence the segmentation of the movement by binding movements together, thereby reducing the overall number of segment boundaries.

## **EXPERIMENT 1: SEGMENTATION OF A DANCE PHRASE BY DANCERS AND NON-DANCERS: EFFECTS OF VISUAL FAMILIARITY AND MUSIC**

In Experiment 1, professional dancers and sport students without dance expertise repeatedly watched a video clip showing a dancer performing a phrase from a contemporary dance choreography. Each participant watched the sequence 20 times on a computer screen, 15 times without music followed by 5 times with music, and indicated segment boundaries by key press. This experiment was conducted in order to gain information about the effects of dance expertise, visual familiarity and music on the segmentation of dance movement.

## **METHOD**

## *Participants*

Twenty-two participants voluntarily took part in Experiment 1 without any exchange for course credit or money. Twelve students of sport science (six females, one left-handed; age 25.91 ± 3.29 years, range 22–30 years) without any particular dance training experience (except for basic courses as part of their study program) were assigned to the non-dancers' group. All non-dancers were physically active; their most regularly performed sports included soccer, handball, rugby, and fitness training.

Ten professional dancers (six females, two left-handed; age 30.1 ± 6.59, range 23–40 years) participated as experts; all were trained in classical, modern and contemporary dance on professional level and were currently active as company dancers. Six of the dancers were current members of Tanztheater Bielefeld; four of the dancers were freelancing dancers and dance teachers.

All participants reported having normal or corrected-to-normal vision, and were naive with regard to the purpose of the experiment. All participants provided written informed consent before testing started. The experiment was performed in accordance with the ethical standards of the sixth revision (WMA, 2008) of the 1964 Declaration of Helsinki.

## *Apparatus and Stimuli*

The stimulus material consisted of a video clip (92 s, 2,290 frames, 25 Hz, recorded with a Sony camcorder) showing a dance phrase created and performed by dancer and choreographer Ilona Pászthy. The dance phrase was choreographed on the basis of modern/contemporary dance technique, and was novel to all participants. For stimulus presentation and data collection, Interact® (Mangold) software running on a Notebook (Acer) with a 15 inch VGA-Display (vertical retraces 60 Hz) was used. The software recorded key presses during the presentation of the video clip, linked them to the corresponding runtime and frame number and provided a protocol of these data.
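As a quick consistency check on the stimulus parameters, the reported frame count and frame rate can be converted into clip runtime. The sketch below is illustrative only: the function name and the example frame numbers are hypothetical and are not taken from the Interact® protocol format, and frames are assumed to be numbered from 0.

```python
FRAME_RATE_HZ = 25   # frame rate reported for the video clip
TOTAL_FRAMES = 2290  # frame count reported for the video clip


def frame_to_seconds(frame_number: int, frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Return the runtime (in seconds) at which a frame starts.

    Assumes frames are numbered from 0; this numbering is an assumption,
    not a documented property of the recording software.
    """
    if frame_number < 0:
        raise ValueError("frame_number must be non-negative")
    return frame_number / frame_rate_hz


# Total clip duration implied by the reported frame count and frame rate.
clip_duration_s = frame_to_seconds(TOTAL_FRAMES)

# Hypothetical key presses logged as frame numbers, converted to runtimes.
example_press_frames = [250, 1000, 2000]
example_press_times = [frame_to_seconds(f) for f in example_press_frames]
```

With the reported values, the implied duration is 91.6 s, which is consistent with the stated clip length of about 92 s.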

## *Design and Procedure*

The data collection took place in a quiet lab or office room or in a free rehearsal space at the theater. Each participant was tested individually. During the experiment, the participant sat in front of the notebook computer and watched the presented video clip. The following instructions were given verbally by the experimenter: "You will now see a video clip of a dancer dancing a part of a dance piece. The clip will be repeated 20 times. While watching, please keep your finger on the space bar and press the space bar each time a part of the dance phrase ends and a new one begins. Apply your own criteria; you do not need to mark the same moments in each repetition." This instruction was phrased in a similar way as instructions in previous segmentation studies (e.g., "to press a button... whenever... one natural and meaningful unit of activity ended and another began," Zacks et al., 2009). No instruction was given regarding the resolution of segmenting (fine or coarse), as had been done in other segmentation studies (e.g., Swallow et al., 2009; Zacks et al., 2009). The sequence was presented 20 times, the first 15 trials without sound, followed by five trials accompanied by the music that had been chosen by the choreographer.

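After completing all 20 trials, the participant was verbally asked two questions by the experimenter:

1. "Which criteria or strategies did you use for segmenting the dance phrase?"
2. "Did the music in the last five trials affect your decisions?"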


The answers were written down by the experimenter in the form of key notes. This explorative interview was not carried out according to any established qualitative method, but was added to the data collection only to gain an impression of the participants' use of criteria and strategies. It was not expected that participants would be able to give a complete and objective account of their segmenting behavior, but the experimenter was rather interested in the criteria and strategies the participants applied explicitly or even deliberately. The complete experimental session for each participant lasted 60–90 min.

## *Data analyses*

For every participant, the number of segment boundaries was recorded for each of the 20 trials. Mean group results of dancers and non-dancers were calculated for each trial number (1–20) separately, for all trials together, and for four groups of trials (trials 1–5: early trials; these trials were regarded as familiarization phase during which visual familiarity with the dance phrase was still low; trials 6–10: middle trials, with increasing visual familiarity; trials 11–15: late trials; for these trials, visual familiarity with the dance phrase was regarded as high; and trials 16–20: music trials, presented with sound). Non-parametric tests were applied to compare dancers and non-dancers regarding their defined numbers of segment boundaries for each trial separately, for all trials, and for each group of five trials (early trials, middle trials, late trials and music trials). Within each group of participants, mean numbers of segment boundaries of the four trial groups (early, middle, late, and music) were compared to each other using non-parametric tests (Mann-Whitney *U*-test for comparisons between groups of participants, Wilcoxon signed-rank test for comparisons within groups).
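The analysis steps described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the text does not name the statistics software used, all function names are hypothetical, and the Mann-Whitney *z* value is computed with the plain normal approximation, without tie or continuity correction.

```python
from math import sqrt
from statistics import mean

# Trial groups as defined in the text (trial numbers are 1-based).
TRIAL_GROUPS = {
    "early": range(1, 6),
    "middle": range(6, 11),
    "late": range(11, 16),
    "music": range(16, 21),
}


def trial_group_means(counts_per_trial):
    """Average one participant's boundary counts within each trial group."""
    if len(counts_per_trial) != 20:
        raise ValueError("expected one count for each of the 20 trials")
    return {
        name: mean(counts_per_trial[t - 1] for t in trials)
        for name, trials in TRIAL_GROUPS.items()
    }


def midranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, z) using the plain normal approximation.

    No tie or continuity correction is applied (a simplifying assumption).
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    expected_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return u_a, (u_a - expected_u) / sd_u
```

In this sketch, `trial_group_means` would be applied to each participant first, and `mann_whitney_u` would then compare the per-participant means of the two groups for a given trial group. A negative *z* indicates that the first group tends to have lower values than the second.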

## **RESULTS**

## *Segment boundaries*

Comparisons between dancers and non-dancers (Mann-Whitney *U*-test) revealed that dancers generally defined fewer segment boundaries than non-dancers for all trials together (*z* = −2.853, *p* = 0.005), for each individual trial (trials 1–5: *p* < 0.01; trials 6–13: *p* < 0.05; trials 14–20: *p* < 0.01), and for all groups of trials (early trials: *z* = −3.269, *p* = 0.001; middle trials: *z* = −2.474, *p* = 0.013; late trials: *z* = −2.440, *p* = 0.015; music trials: *z* = −2.969, *p* < 0.003). Comparisons between groups of trials (Wilcoxon signed-rank test) in the non-dancers revealed differences between middle trials and music trials (*z* = −2.296, *p* = 0.022) and between late trials and music trials (*z* = −2.173, *p* = 0.030), with more segment boundaries occurring in the music trials than in the middle and late trials. In the dancers, fewer segment boundaries were defined in the early trials than in the middle trials (*z* = −2.018, *p* = 0.044). In contrast to the non-dancers, the dancers defined fewer segment boundaries in the music trials than in the late trials (*z* = −2.092, *p* = 0.036). Results for the four groups of trials are displayed in **Figure 1**; results for all individual trials are shown in **Figure 3**. The distribution of segment boundaries (calculated as average over all trials for 92 bins of 1 s) as defined by the experimental groups is illustrated in **Figure 4**.
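The binning used for the distribution of segment boundaries can be sketched as below. This is an illustrative reconstruction under stated assumptions: key presses are assumed to be available as runtimes in seconds, each 1-s bin is taken as the half-open interval from its start to just before the next second, and the function name and example data are hypothetical. How trials and participants were pooled for the averaging is not detailed in the text, so the sketch simply averages over whatever list of trials it is given.

```python
def average_boundaries_per_bin(trials, clip_duration_s=92, bin_width_s=1):
    """Average the number of key presses per time bin across trials.

    trials: list of trials, each a list of key-press runtimes in seconds.
    Returns one average count per bin (92 bins for the default settings).
    """
    if not trials:
        raise ValueError("at least one trial is required")
    n_bins = clip_duration_s // bin_width_s
    totals = [0] * n_bins
    for press_times in trials:
        for t in press_times:
            bin_index = int(t // bin_width_s)
            # Ignore presses that fall outside the clip duration.
            if 0 <= bin_index < n_bins:
                totals[bin_index] += 1
    return [total / len(trials) for total in totals]


# Hypothetical example: two trials with two key presses each.
example_trials = [[0.5, 10.2], [0.9, 91.99]]
example_histogram = average_boundaries_per_bin(example_trials)
```

Peaks in the resulting list mark moments at which many key presses coincide across trials, which is the kind of information a distribution plot of segment boundaries conveys.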

**FIGURE 1 | Results of Experiment 1.** Mean numbers of segment boundaries defined for the four groups of trials [1: trials 1–5 (early); 2: trials 6–10 (middle); 3: trials 11–15 (late); 4: trials 16–20 (music)]; blue columns: dancers, red columns: non-dancers; asterisks mark significant differences: ∗*p* < 0.05, ∗∗*p* < 0.01, ∗∗∗*p* < 0.001.

## *Post-hoc interviews*

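After finishing the experimental procedure, each participant was asked two questions:

1. "Which criteria or strategies did you use for segmenting the dance phrase?"
2. "Did the music in the last five trials affect your decisions?"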


The experimenter asked the participant verbally and wrote down the answers in key points (note that this informal procedure did not follow any established qualitative approach but only aimed at gaining additional information in an explorative way).

From the informal answers to Question 1, the most common criteria were extracted, and naming frequencies of these criteria were counted. The most common criteria and their frequencies of naming are displayed in **Table 1**. Remarks made by individual participants in response to Questions 1 and 2 are listed in Supplementary Table 1.

## **DISCUSSION**

In Experiment 1, dancers and non-dancers segmented a dance phrase repeatedly presented in a video clip by key press. Segmentation grain (i.e., numbers of segment boundaries) was expected to be influenced by expertise (comparison between the two groups), by visual familiarity of the movement phrase (comparison between early, middle and late trials), and by music (comparing the last group of trials presented with music to the previous groups of trials).

The results showed that over all trials together, in each individual trial and in the four groups of trials, dancers generally defined fewer segment boundaries than non-dancers. The effect of expertise on movement segmentation was thereby very clearly reflected by the results, with dancers defining fewer segment boundaries and thereby segmenting the whole movement phrase into fewer and longer sections than non-dancers. This finding is supported by the comment of one dancer, who reported perceiving the entire phrase as a whole, "in a flow," so that segmenting did not feel natural. Perceiving a longer dance phrase as a whole despite the occurrence of various movement characteristics that could be used (and were typically named) as segmentation criteria is also in accordance with the claim often made in modern and contemporary dance to dance longer phrases fluently without obvious breaks or partitions, without "losing the energy." The finding that this principle commonly applied to the dancers' action performance is transferred to perception when observing a dance phrase accords with the principle of perceptual resonance (Schütz-Bosbach and Prinz, 2007) described in various areas of expertise (e.g., Kiesel et al., 2009; Güldenpenning et al., 2011; Steggemann et al., 2011). Dancers defined fewer segment boundaries in early trials than in the middle and late trials, whereas no difference between early, middle and late trials was found in the non-dancers. An effect of visual familiarity was thereby only found in the dancers, but not in the non-dancers.

**Table 1 | Segmentation criteria named by the three groups of participants in the** *post-hoc* **interviews (numbers indicate absolute frequencies of naming in both experiments).**

Interestingly, music affected segmentation differently in dancers and non-dancers: In the music trials, dancers defined fewer segment boundaries than in late trials, whereas non-dancers defined more segment boundaries in music trials than in middle and late trials. Apparently, music had a binding effect on the perceived movement in the dancers' group. Comments given by individual dancers in response to Question 2 supported this interpretation: music was experienced as binding the movement together, slowing down the movement, and adding a harmonic feeling. In the non-dancers, in contrast, music seemed to cause confusion and thereby more segment boundaries, possibly based on the perceived lack of segmentation cues in the music that might have interfered with previously defined movement cues.

Expertise in sport or dance typically involves visual as well as motor experience of specific actions, and differences found between experts and novices can be based on either of the two, or both. To gain further understanding of expertise effects in action perception, it is necessary to differentiate visual and motor expertise either by studying observation experts (e.g., Calvo-Merino et al., 2006; Aglioti et al., 2008) or by applying a learning intervention (e.g., Cross et al., 2006, 2009). In order to gain information about potential effects of motor experience on segmentation, a second experiment was conducted with dance amateurs who performed the same experimental task as applied in Experiment 1 before and after learning the presented dance phrase.

## **EXPERIMENT 2: SEGMENTATION OF A DANCE PHRASE BEFORE AND AFTER LEARNING: EFFECTS OF MOTOR EXPERIENCE, VISUAL FAMILIARITY AND MUSIC**

In Experiment 2, the same segmentation task as in Experiment 1 was applied to dance amateurs who regularly trained modern dance in the same group. After learning the phrase in their training as part of a performance program, all participants repeated the experimental task. The main goal of the experiment, apart from adding a third (intermediate) group of participants, was to gain information regarding the effect of specific motor expertise on segmenting a dance phrase.

## **METHOD**

## *Participants*

Eight participants (all female, one left-handed; age 18.5 ± 6.55 years, range 14–30 years) voluntarily took part in Experiment 2 without any exchange for course credit or money. All participants trained regularly in classical and contemporary dance at average to advanced amateur level (years of training in classical dance: 9.13 ± 1.45 years, range: 8–12 years; years of training in modern dance: 3.38 ± 1.51 years, range: 1–6 years; dance training: 3.38 ± 2.20 h per week, range: 2–6 h) and were currently members of the same modern dance class (Theater Bielefeld ballet school). All eight participants named classical and modern dance as their primary types of training; individual participants also trained in one of the following disciplines: capoeira, hip-hop, karate and acrobatics. All participants reported having normal or corrected-to-normal vision, and were naive with regard to the purpose of the experiment. All participants provided written informed consent before testing started. The experiment was performed in accordance with the ethical standards of the sixth revision (WMA, 2008) of the 1964 Declaration of Helsinki.

## *Apparatus and Stimuli*

The same stimulus material and experimental set-up was used as in Experiment 1.

## *Design and Procedure*

Two data collections were carried out, one before and one after the participants learned the dance phrase in their training. Data collections took place in a quiet lab or office room or in a free dressing room at the ballet school. Each participant was tested individually; the experimental procedure was exactly the same as in Experiment 1. Each experimental session lasted 60–90 min.

After all participants had completed the experiment once (data collection 1, pre-learning), they learned the presented dance phrase as part of their regular training. After approximately 6 weeks, during which the dance phrase had been learned and trained regularly as part of a choreography for later stage performance, the experiment was repeated in exactly the same way as before (data collection 2, post-learning). Crucially, at the time of data collection 1, participants were neither informed that they would learn the dance phrase nor that they would be asked to participate in a second data collection, and participants were not informed about data collection 2 when learning the dance phrase.

## *Data Analyses*

Mean numbers of segment boundaries were analyzed in the same way as for Experiment 1. Non-parametric tests were applied to compare the two experimental conditions, pre- and post-learning (i.e., without and with motor experience of dancing the phrase, respectively), regarding the defined numbers of segment boundaries for each trial separately, for all trials, and for each group of five trials (early trials, middle trials, late trials, and music trials). For each data collection (pre- and post-learning), mean numbers of segment boundaries of the four trial groups (early, middle, late, and music) were compared to each other using non-parametric tests (Wilcoxon signed-rank test).
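For the paired pre- versus post-learning comparison, the Wilcoxon signed-rank test can be sketched as below. This is a hedged illustration rather than the authors' procedure: the function name and example data are hypothetical, zero differences are dropped, tied absolute differences receive average ranks, and *z* is obtained from the plain normal approximation without tie or continuity correction.

```python
from math import sqrt


def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, W_minus, z) for paired samples pre and post.

    Differences are computed as post - pre. Pairs with zero difference are
    dropped; z uses the normal approximation without corrections
    (simplifying assumptions for this sketch).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks for ties.
    abs_diffs = [abs(d) for d in diffs]
    order = sorted(range(n), key=lambda i: abs_diffs[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs_diffs[order[end + 1]] == abs_diffs[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    expected_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - expected_w) / sd_w
    return w_plus, w_minus, z


# Hypothetical boundary counts for eight participants, before and after learning.
example_pre = [10, 12, 14, 16, 18, 20, 22, 24]
example_post = [9, 10, 11, 12, 13, 14, 15, 16]
example_result = wilcoxon_signed_rank(example_pre, example_post)
```

In the hypothetical example every participant defines fewer boundaries after learning, so all ranks fall on the negative side. With eight pairs, this is the most extreme *z* the uncorrected approximation can produce, which gives a sense of scale for *z* values obtained from a group of this size.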

## **RESULTS**

## *Segment boundaries*

Comparisons of trial groups between the pre- and post-learning conditions (Wilcoxon signed-rank test) revealed a difference in the middle trials (*z* = −2.240, *p* = 0.025), in which fewer segment boundaries were defined in the post-learning condition than in the pre-learning condition. Comparisons of individual trials revealed differences in trials 8, 9, 10, 12, and 15 (all *p* < 0.05). No difference between pre- and post-learning was found, however, when comparing segment boundaries over all trials. Comparisons between groups of trials within each condition revealed that only in the pre-learning condition were fewer segment boundaries defined in the music trials than in the late trials (*z* = −2.371, *p* = 0.018), whereas trial groups did not differ in the post-learning condition. Results for the four groups of trials are displayed in **Figure 2**; results for all individual trials (Experiments 1 and 2) are shown in **Figure 3**. The distribution of segment boundaries (calculated as average over all trials for 92 bins of 1 s) as defined by the amateurs before and after learning the dance phrase is illustrated in **Figure 4**.

Comparing the group of amateurs to the groups of dancers and non-dancers from Experiment 1 showed differences between the non-dancers and the amateurs in the pre-learning condition for the early trials (*z* = −2.013, *p* = 0.044) and the music trials (*z* = −2.394, *p* = 0.017), and differences between the non-dancers and the amateurs in the post-learning condition in all groups of trials (early: *z* = −2.355, *p* = 0.019; middle: *z* = −2.431, *p* = 0.015; late: *z* = −2.546, *p* = 0.011; music: *z* = −3.009, *p* = 0.003), as well as over all trials (*z* = −2.508, *p* = 0.012). No differences were found between the dancers' and the amateurs' results. Results for all individual trials are displayed in **Figure 3**.

## *Post-hoc interviews*

As in Experiment 1, each participant was verbally asked two explorative questions after each data collection (again, no established qualitative approach was applied but the two questions were asked informally and key points of the answers were written down by the experimenter). The most common criteria and their frequencies of naming (in response to Question 1: "Which criteria or strategies did you use for segmenting the dance phrase?") are displayed in **Table 1**. Remarks made by individual participants in response to Questions 1 and 2 ("Did the music in the last five trials affect your decisions?") are listed in Supplementary Table 1.

## **DISCUSSION**

In Experiment 2, dance amateurs segmented a dance phrase repeatedly presented in a video clip by key press. The experiment was repeated after the participants had learned to dance the presented dance phrase. Segmentation grain (i.e., numbers of segment boundaries) was expected to be influenced by motor experience (comparison between pre- and post-learning), by visual familiarity of the movement phrase (comparison between early, middle and late trials), and by music (comparing the last group of trials presented with music to the previous groups of trials).

Results showed that fewer segment boundaries were defined in the post-learning condition than in the pre-learning condition in the middle and late trials. No difference between pre- and post-learning was found in the early trials, in the trials with music, or when comparing mean numbers of segment boundaries over all trials. Consequently, the motor experience of dancing the presented movement phrase was found to affect segmentation grain slightly, with fewer segment boundaries being defined by the participants after they had learned to dance the phrase; however, this difference only reached significance in the middle and late trials. This finding was reflected by the participants' comments: segments were perceived as longer, "larger shapes" were recognized, longer segments were "more fun." Participants' comments also reflected that watching the dance phrase was experienced as more embodied and more competent after learning.

Music was found to affect segmentation in the pre-learning condition, but not in the post-learning condition. Before learning the dance phrase, fewer segment boundaries were defined in the music trials than in the late trials, showing an effect of music on segmentation comparable to the one observed in the dancers. It can be assumed that the effect was caused by a binding effect of music (as assumed in the dancers). The finding that the difference between late trials and music trials was no longer significant after learning the movement phrase might be explained by the fact that the participants danced the phrase in their training in combination with the same music as used in the experiment; music and movement might therefore have been coupled during the learning and training process. When watching the movement without the music in the post-learning condition, participants did not really experience the movement "without the music," as music and movement had become parts of the same integrated representation in their long-term memory (see Land et al., 2013), and segmentation (even when no music was played) related not only to the presented movement, but to the representation of "movement-with-music."

In contrast to Experiment 1, no differences were found between early, middle and late trials within each condition, showing that no effect of visual familiarity was found in the amateurs (or that the potential effect of visual familiarity was too weak to produce significant results). The finding that visual familiarity did not significantly affect segmentation is contrasted by the impression of several participants that, with repeated observations of the dance phrase in consecutive trials, the dance phrase was more strongly perceived as a whole (the movement was "growing together"). Similar to the dancers, several amateurs had expressed before learning that they found it difficult to segment the phrase because of the perceived fluency and connectedness of the movement.

Comparing the results of the amateurs' group in Experiment 2 to the results of the two groups of participants from Experiment 1 revealed that the amateurs did not differ from the dancers, whereas differences between the amateurs and the non-dancers were found and increased from the pre-learning to the post-learning condition. These findings indicate that the amateurs might have become more "expert" in perception and segmentation by learning to dance the presented movement, which corroborates findings on effects of motor expertise on action perception (e.g., Aglioti et al., 2008; Güldenpenning et al., 2011).

## **GENERAL DISCUSSION**

Expertise in dance and various sports disciplines has been found to modulate the perception of actions on different levels, specifically of those actions belonging to the specific area of expertise (Calvo-Merino et al., 2005, 2006; Aglioti et al., 2008; Cheung and Bar, 2012). Evidence exists that these effects specifically relate to motor experience and learning of the observed actions (Cross et al., 2006, 2009). Furthermore, Event Segmentation Theory (Zacks et al., 2007) and related empirical studies (e.g., Zacks et al., 2009; Sargent et al., 2013) provide evidence for the assumption that the segmentation of observed actions is influenced by relevant visual and motor expertise. In the present study, this assumption was tested using segmentation of a dance phrase as experimental task performed by three groups of participants differing in dance expertise. Results of two consecutive experiments revealed broader segmentation applied by professional dancers, but also by dance amateurs, compared to non-dancers. The effect was increased in the amateurs after learning the presented dance phrase, pointing toward a specific effect of motor experience. It has to be emphasized that in this study participants were not instructed to segment with fine or coarse segmentation grain, whereas this was commonly done in studies on event segmentation. When participants were instructed to segment observed actions into coarse units, this typically resulted in segment lengths of 30–60 s, whereas the instruction to apply fine-grained segmentation resulted in units of 10–20 s. Zacks et al. (2009) showed that fine-grained segmentation generally correlated with movement parameters whereas coarse-grained segmentation rather correlated with external goals and context information. In the current study, average segment lengths defined by the participants ranged between 6 and 13 s. 
As the context of the presented movement in this study was dance and the dancer's intention was clearly movement-related, it can be assumed that segmentation was predominantly fine-grained, and the results support this assumption.

Visual familiarity was induced in the current study by repeated presentation of the same stimulus dance phrase. This approach clearly differs from the approaches taken in other segmentation studies in which observers typically watched the same action twice, in part with different instructions (e.g., coarse vs. fine segmentation, Zacks et al., 2009; Sargent et al., 2013; or just watching vs. segmenting, Noble et al., 2014). Under natural conditions, actions are not repeated in exactly the same way, and segmentation occurs spontaneously as part of perceptual processing. The same commonly applies to watching dance: most audience members watch the performance of a dance piece once, from a naive perspective. "Expert observers" like dancers, choreographers, teachers, and dance enthusiasts watch the same movement phrases repeatedly; however, performances naturally differ. Watching different versions of the same movement material performed with natural variation may increase the observers' sensitivity for subtle differences. Watching the identical performance 10 or 20 times (e.g., from a video clip) certainly increases the observer's sensitivity for details, but might also become boring. In the current study, the phrase was presented repeatedly to create high visual familiarity with the previously unknown phrase, thereby facilitating prediction and increasing anticipation success. Following Event Segmentation Theory (Zacks et al., 2007), this should result in a decrease of the number of segment boundaries. Results of this study confirm this assumption, and individual participants' comments reflect this (see Supplementary Table 1). Other participants' comments suggest that certain segment boundaries became fixed over trials, as participants felt more confident to be "right" in their decisions.
This aspect of experiencing competence might play a crucial role for the esthetic evaluation of dance, which has been related to prediction success and moments of surprise (e.g., Hagendoorn, 2004). Studying segmentation in relation to the novelty and esthetic evaluation of dance movement would be highly relevant in this context, addressing for instance observers' subjective experiences of boredom and competence. In the present study, several participants reported that they deliberately varied their strategies, playfully trying out different ways of segmenting (see Supplementary Table 1). This creative attitude toward the experimental task might have been an attempt to counteract boredom, but might also have been elicited by the type of stimulus (dance) and the general appreciation of watching it.

Music added during the last five trials had contradictory effects in the different experimental groups. While music decreased the number of segment boundaries in the dancers and in the amateurs (before learning), it increased the number of segment boundaries in the non-dancers. This finding has been interpreted in terms of a binding effect in the dancers and amateurs and irritation in the non-dancers.

Empirical evidence exists that event segmentation not only occurs while observing actions, but also while listening to music (Sridharan et al., 2007). Based on the proposed multimodality of event models (see Zacks et al., 2007), it can be assumed that music modifies the perception (and performance) of the dance movement it accompanies, by potentially increasing the experience of uncertainty in movement prediction at musical transition points (Sridharan et al., 2007) and decreasing it within musical phrases. The effect of this interrelation between perceived movement and sound on segmentation is likely to depend on the specific characteristics of both and their temporal relation to each other. As listening to music alone has been shown to be sensitive to event segmentation (Sridharan et al., 2007), it can be assumed that music would affect the segmentation of movement in different ways, depending on its characteristics (i.e., its metrics, pitch, rhythm, pulse, etc.). In the current study, the music that accompanied the movement did not have any metric rhythm or pulse, but consisted of slowly rising and falling chords, and might therefore have had a binding rather than a dividing influence. It can furthermore be hypothesized that segmentation of dance movement accompanied by music would be sensitive to the way both are integrated, and to what extent the dancer entrains with the music, reacts to musical cues or deliberately counterpoints them. This aspect and the more general question to what extent music influences the segmentation of observed dance movement warrant further study. It would be particularly interesting to systematically combine movement and sound in different ways to shed light on the roles of visual and auditory information and their interrelation for the perception and segmentation of dance-like actions.
Research responding to these questions would not only be of interest for scientists investigating action perception, but also for dancers and choreographers interested in audience reactions. Promising manipulations could include the presentation of a movement phrase combined with different types of music and sound, or with the same music or sound varying in temporal relation (i.e., music systematically shifted relative to the movement). In such studies, the different combinations of music and movement would have to be presented in a counterbalanced way to control for order effects. In the current study, this was not the case; music was added only to the last five trials to prevent interference with the factor visual familiarity. Confounding of the factor music with visual familiarity can therefore not be excluded, which represents a clear limitation of the current study with regard to the influence of music on movement segmentation. Furthermore, as previously mentioned, the typical situation of a dance spectator is to watch a dance performance once, without knowing it. Therefore, potential effects of music on the perception of novel, unfamiliar and potentially surprising movement material would be relevant to study in the context of dance.

The task of parsing movement phrases has also been applied as an artistic tool in choreography, as it is assumed to have the potential to change the dancers' perception of the movement and thereby their artistic expression. In a "parsing and viewing" task performed by Wayne McGregor | Random Dance as part of a choreographic process, dancers segmented movement phrases and subsequently commented on their decisions, which revealed different cognitive frameworks underlying dance parsing (deLahunta and Barnard, 2005). To attend to the latter aspect in our study, the participants of the current study were asked informally about their personal segmentation criteria and strategies. Participants of all groups named changes in movement characteristics (movement type, active body part, height level, direction, speed). Criteria related to learning the movement were only named by the amateurs (e.g., "how the teacher would teach it"), whereas only the dancers named dynamic features (e.g., "where force would be needed"). It can be assumed that the latter criterion requires efficient on-line motor simulation of the observed movement, which might be a skill that is specific to dancers on a high level of expertise (see Bläsing and Schack, 2012). To address this issue, further research would be needed relating experts' segmentation with movement analysis. Similar approaches have been used by Zacks et al. (2009) for actions from a non-dance context and by Noble et al. (2014) for Indian dance with a narrative character, but these studies did not include an expertise-related paradigm. It would be of particular interest to know to what extent dancers, compared to non-dancers, are able to specify and predict dynamical measures (i.e., forces) from motor simulation while watching dance movement, and how this influences the way they segment a dance phrase.

An aspect of this study that has rarely been addressed is the combination of comparing expertise with a learning task. Learning, however, was investigated only in the amateur group with intermediate skill level, but not on different levels of expertise, which could be a topic for further study. A related question of interest that could not be addressed here is to what extent the way movement material is learned or taught affects the perception and segmentation of the same material later on. In other words, would the way the teacher has structured the dance phrase be reflected by the segment boundaries defined by the students after learning the phrase? In dance training, teachers commonly break movement phrases down into sub-phrases and partial movements in order to facilitate learning, and students imitate and practice these parts and subsequently combine them again into a whole phrase. This procedure naturally affords breakpoints in the mental representation of the phrase in the dancer's memory that can become undesired breakpoints in the fluent quality of movement performance. To remedy this problem, different measures are taken during further training of the phrase, such as varying the lengths of partial phrases and paying special attention to the transitions. Studies using segmentation paradigms could help to shed light on the effects and the efficiency of such practice.

Taken together, it can be concluded that segmentation of dance movement is clearly influenced by expertise, with broader segmentation grain being applied by professional and amateur dancers than non-dancers. Effects of visual familiarity and music on movement segmentation seem to be modulated by expertise, and motor experience had a slight effect in the amateur dancers. These findings contribute to the literature on dance expertise and segmentation of dance-like actions, and raise future research questions particularly addressing effects of the novelty or familiarity of the observed movement material, interrelations between movement and music or sound, learning in different ways and on different levels of expertise, and the esthetic evaluation of the observed dance movement.

### **ACKNOWLEDGMENTS**

The author thanks the members of Tanztheater Bielefeld, the ballet school of Theater Bielefeld and all participants for their friendly support. I am particularly grateful to Ilona Pászthy who choreographed and danced the phrase presented in the experiments and trained the class of dance amateurs who participated in Experiment 2. The author acknowledges support for the Article Processing Charge by the Deutsche Forschungsgemeinschaft (DFG) and the Open Access Publication Funds of Bielefeld University Library.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.01500/abstract

## **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 May 2014; accepted: 05 December 2014; published online: 07 January 2015.*

*Citation: Bläsing BE (2015) Segmentation of dance movement: effects of expertise, visual familiarity, motor experience and music. Front. Psychol. 5:1500. doi: 10.3389/fpsyg.2014.01500*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Bläsing. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Expertise affects representation structure and categorical activation of grasp postures in climbing

## *Bettina E. Bläsing<sup>1,2,3†</sup>, Iris Güldenpenning<sup>1,2\*†</sup>, Dirk Koester<sup>1,2</sup> and Thomas Schack<sup>1,2,3</sup>*

*<sup>1</sup> Neurocognition and Action–Biomechanics Research Group, Faculty of Psychology and Sport Science, Bielefeld University, Bielefeld, Germany*

*<sup>2</sup> Center of Excellence-Cognitive Interaction Technology, Bielefeld University, Bielefeld, Germany*

*<sup>3</sup> Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld, Germany*

#### *Edited by:*

*Merim Bilalic, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Markus Janczyk, University of Würzburg, Germany Oliver Herbort, Julius-Maximilians-Universität Würzburg, Germany*

#### *\*Correspondence:*

*Iris Güldenpenning, Neurocognition and Action Research Group, Bielefeld University, PO Box 100 131, 33501 Bielefeld, Germany e-mail: iris.gueldenpenning@uni-bielefeld.de*

*†First authorship.*

In indoor rock climbing, the perception of object properties and the adequate execution of grasping actions highly determine climbers' performance. In two consecutive experiments, effects of climbing expertise on the cognitive activation of grasping actions following the presentation of climbing holds were investigated. Experiment 1 evaluated the representation of climbing holds in the long-term memory of climbers and non-climbers with the help of a psychometric measurement method. Within a hierarchical splitting procedure, subjects had to decide about the similarity of required grasping postures. For the group of climbers, representation structures corresponded clearly to four grip types. In the group of non-climbers, representation structures differed more strongly than in climbers and did not clearly refer to grip types. To learn about categorical knowledge activation in Experiment 2, a priming paradigm was applied. Images of hands in grasping postures were presented as targets and images of congruent, neutral, or incongruent climbing holds were used as primes. Only in climbers were reaction times shorter and error rates smaller for the congruent condition than for the incongruent condition. The neutral condition resulted in intermediate performance. The findings suggest that perception of climbing holds activates the commonly associated grasping postures in climbers but not in non-climbers. The findings of this study give evidence that the categorization of visually perceived objects is fundamentally influenced by the cognitive-motor potential for interaction, which depends on the observer's experience and expertise. Thus, motor expertise not only facilitates precise action perception, but also benefits the perception of action-relevant objects.

**Keywords: expertise, mental representation, action activation, categorical knowledge, grasping**

## **INTRODUCTION**

Rock climbing requires a multitude of physical and cognitive abilities, as well as their well-concerted interaction. One of them is the ability to perceive properties of climbing holds and to execute adequate grasping actions. In indoor climbing, the athlete's goal is to reach the top of a climbing wall by using specific climbing holds that are arranged as routes of different skill requirements. The shape, orientation and relative position as well as the combination of holds thereby determine the adequate grasp and step techniques. Apprehending climbing holds correctly is crucial for planning corresponding actions, and thereby for optimizing the climber's performance (Boschker et al., 2002). Thus, the ability to assign optimal grasps to the reachable holds is a relevant part of a climber's specific cognitive expertise (Pezzulo et al., 2010).

The cognitive aspects of expertise in the domain of rock or indoor climbing have hardly been investigated. One of the rare experimental studies on cognitive issues of climbing expertise has been conducted by Pezzulo et al. (2010), who investigated the ability of novice and expert climbers to memorize climbing routes from presented photographs. Cognitive performance was measured as number of recalled grips in the correct sequence of a route. The main finding was that expert climbers outperformed novices in recalling the sequence of climbing holds of a difficult climbing route that could only be climbed by experts, but not in recalling an easy route that could be climbed both by experts and novices. The authors argued that having the motor competence to climb a visually perceived route enables climbers to mentally simulate mastering it, and that this motor simulation improves the climbers' recall of the grip sequence. Moreover, as participants were not explicitly instructed to mentally simulate the required climbing actions, it was suggested that visually perceiving a climbing route automatically activates the corresponding action sequence in skilled climbers.

Automatic activation of motor components of grasping actions has also been found in other contexts, for example, when participants had to classify kitchen objects or tools (Labeye et al., 2008), or manufactured and natural objects (Tucker and Ellis, 2001, 2004; Grèzes et al., 2003). Labeye et al. (2008) investigated the processing of object features (kitchen tools vs. do-it-yourself (DIY) tools) and action features (implied actions performed with each tool). Participants had to categorize target pictures as kitchen tools or DIY tools. The targets were preceded by a prime picture either depicting a kitchen tool or a DIY tool. Both the object (same vs. different category) and the action features (similar vs. dissimilar implied action) were independently manipulated between the prime and the target pictures. Object and action features independently led to faster processing if they were from the same category or implied similar actions, respectively, compared to different categories or dissimilar actions. Labeye et al. (2008) argued that perceiving the prime picture automatically activates the motor components of the action that may potentially be performed with regard to the object (see also Ellis and Tucker, 2000; Tucker and Ellis, 2001; Bub and Masson, 2010; Masson et al., 2011).

Studies investigating well-known everyday objects (e.g., kitchen tools) suggest that such object representations are associated with certain grasping actions (e.g., Tucker and Ellis, 2004; Labeye et al., 2008). These associations may emerge as a consequence of associated action experience. To further investigate the role of motor experience for object-based action activations, laboratory training studies have been conducted (Creem-Regehr et al., 2007; Kiefer et al., 2007; Weisberg et al., 2007; Cross et al., 2012; Bellebaum et al., 2013). After a training period in which participants explicitly learnt to use objects in a tool-like manner, the manipulation experience became a part of participants' object representations and was automatically activated when the objects were perceived (e.g., Weisberg et al., 2007).

A sports context provides a suitable scenario for investigating expertise effects in object-related action knowledge. Climbing holds have been artificially designed to be used in indoor climbing and are encountered exclusively in this context. Accordingly, people who do not practice indoor climbing have no manipulation experience with climbing holds, whereas sport climbers who frequently train on indoor climbing walls have a large amount of specific manipulation experience. Yet, climbing holds are objects from which particular manipulation potential might be inferred even by non-climbers based on the perceivable shape-properties of the object. Comparing non-climbers' and climbers' processing of climbing holds therefore provides a suitable scenario to dissociate the role of grasping experience and physical object properties in the representation and processing of grasping actions.

Climbing is not a sport performed under time pressure (except speed climbing; Florine and Wright, 2004). However, automatic activations of single grasping actions are an important aspect of a climber's performance. The immediate activation of a grasping action to a perceived climbing hold might decrease cognitive effort in short-term memory and thus save cognitive resources necessary for further action planning (Spiegel et al., 2013). Besides, the direct activation of a grasping action also allows a quick action execution and may thus prevent high energy costs arising when a climber has to remain in a static position evaluating the next action possibility. Based on these considerations, the present study examines object-related action knowledge (Experiment 1) and automatic action activation (Experiment 2) based on perceived climbing holds.

To investigate knowledge representations of climbing-specific grasping actions, Experiment 1 evaluated the relevant cognitive structures of grasping actions related to typical climbing holds in the long-term memory of climbers and non-climbers via Structure Dimensional Analysis (SDA; Schack, 2004, 2012). It was expected that climbers, but not non-climbers, would categorize climbing holds according to appropriate grip types (functional features) used in indoor climbing rather than according to other object properties.

Experiment 2 was conducted to clarify whether or not the representational clusters are actually associated with motor components that fit the holds in climbers. A priming paradigm was used with pictures of climbing holds as primes and grasping postures as targets (e.g., Güldenpenning et al., 2011, 2013). Climbers were expected to show different effects than non-climbers. Specifically, climbers but not non-climbers should show a congruency effect, i.e., facilitation by congruent primes and inhibition by incongruent primes (Dehaene et al., 1998).

## **EXPERIMENT 1: CATEGORIZATION OF CLIMBING HOLDS**

In indoor climbing, climbers have to manage routes of different difficulty levels, which requires a multitude of physical and cognitive skills. Applying adequate grasp techniques to the available climbing holds is one of the crucial tasks in this challenge, as it enables the climber to master the route in a safe and efficient way. Experiment 1 investigated whether climbing holds are categorically organized depending on their associated manual actions (i.e., adequate grip types) in the long-term memory of experienced indoor climbers. Structure dimensional analysis (SDA; Schack, 2004, 2012) was applied to reveal the relevant representational structures in the long-term memory of climbers and non-climbers. It was expected that climbers, but not non-climbers, would categorize climbing holds according to the appropriate grip types. Previous studies using SDA showed that experts' cluster solutions referred to functionally structured mental representations of complex movements (Bläsing et al., 2009; Land et al., 2013) or to objects affording similar actions (e.g., Stöckel et al., 2012) representing functional action-based categories of partial actions or objects.

## **METHODS**

## *Participants*

Twenty-one participants voluntarily took part in Experiment 1, either without compensation or in exchange for course credit. Ten students of sport science without any experience in indoor or outdoor rock climbing, all from Bielefeld University, Germany, were assigned to the non-climbers' group (two females, all right-handed, mean age 24.0 years, range 23–25). All non-climbers were physically active (eight out of ten participants performed individual sports, four performed team sports). The sports most regularly performed by the participants of the non-climbers' group included soccer, basketball, fitness training, and running.

Eleven climbers (two females, all right-handed, mean age 27 years, range 22–34) were recruited from an indoor climbing area on the basis of their climbing experience (mean climbing experience: 5.3 years of training, 3.4 training sessions per week). According to the grading system of the Union Internationale des Associations d'Alpinisme (UIAA), which describes the difficulty of a climbing route, participants' indoor climbing skills ranged from 6 to 8 (two participants climbed routes graded up to 8, five participants climbed routes graded 7). Five of the climbers also regularly climbed outdoors, with skills corresponding to routes ranging from 5 to 7. Additionally, five of the eleven climbers regularly performed other sports (mostly mountain sports or running).

All participants reported having normal or corrected-to-normal vision, and were naive with regard to the purpose of the experiment. All participants provided written informed consent before testing started. The single experimental session lasted about 30 min. The experiment was performed in accordance with the ethical standards of the sixth revision of the 1964 Declaration of Helsinki (World Medical Association, 2008).

## *Apparatus and stimuli*

Stimuli consisted of 16 colored photographs of climbing holds of different shapes and sizes, as commonly used in indoor climbing (see **Figure 1**). The holds were presented to match the climber's perspective, in adequate size relative to each other. All holds were chosen to elicit specific grip types rather than being ambiguous or non-specific. Six out of sixteen holds typically required a crimp grip, four a sideways pull toward the body, four a pocket grip, and two an open grip. This *a priori* attribution of climbing holds to grip types was informed by climbing experts who did not participate in the study and was used as reference for the results of the experiment.

## *Design and procedure*

The participants were tested individually while sitting in front of a computer screen. An experimental paradigm named Structure Dimensional Analysis (SDA; Schack, 2004, 2012) was applied to investigate the categorization of climbing holds on the basis of mental representations of specialist grip types in the long-term memory of climbers and non-climbers. SDA was applied via custom-made software (NetSplit). The stimuli were presented in such a way that in each trial, the reference stimulus (or anchor) occurred in the top position marked by a green frame, and the stimulus directly below the reference, marked by a blue frame, had to be assigned by key press to a positive (left) or negative (right) list relative to the reference (see **Figure 2**). The anchor and the active stimulus picture were presented on the screen with a size of approximately 6 × 6 cm. The participants were instructed to indicate by pressing one of two marked keys if the adequate grasping action directed toward the currently active hold (marked blue) would be of the same type as the one directed toward the climbing hold in the reference position (marked green). Once the response was given, the next trial began, in which the same anchor was presented with another of the remaining items, until all 15 items had been judged as affording a similar or dissimilar grip compared to the anchor; this procedure composed one block. In the next block, a different hold was presented as anchor, in combination with all remaining 15 holds. The whole experiment comprised 16 blocks applied in randomized order, each block with a different item as anchor, resulting in a total of 240 trials.
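The block structure described above can be sketched in a few lines of Python. This is only an illustration, not the NetSplit software: the function name `build_trials` is hypothetical, holds are identified simply by the numbers 1 to 16, and randomizing the order of items within a block is an assumption (the text only states that the blocks were applied in randomized order).

```python
import random


def build_trials(n_items=16, seed=None):
    """Return a list of (anchor, item) pairs for the splitting procedure.

    Each item serves as anchor for exactly one block and is paired with
    every other item once, giving n_items * (n_items - 1) trials.
    """
    rng = random.Random(seed)
    items = list(range(1, n_items + 1))

    # Blocks are presented in randomized order (one block per anchor).
    anchors = items[:]
    rng.shuffle(anchors)

    trials = []
    for anchor in anchors:
        others = [item for item in items if item != anchor]
        # Assumption: the order of the 15 comparison items is also random.
        rng.shuffle(others)
        trials.extend((anchor, item) for item in others)
    return trials


trials = build_trials(seed=1)
print(len(trials))  # 16 blocks x 15 judgments
```

With 16 holds this yields the 240 same/different judgments per participant reported in the text.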

## *Data analyses*

A hierarchical cluster analysis according to SDA (Lander and Lange, 1996; Schack, 2004, 2012) was carried out on the data collected via the previously described splitting procedure in order to obtain mean cluster solutions for the two experimental groups. To achieve this, the sorting task described as part of the experimental procedure was applied to deliver a distance scaling between the items (climbing holds). By this procedure, 16 decision trees were established, as each item occupied the reference position once, resulting in a 16 × 15 matrix of partial quantities in which values took either a positive or a negative sign depending on whether the item was judged as belonging to the positive or the negative list relative to the anchor (e.g., if 4 out of the 15 items were assigned to the positive list, these items were each given the value +4, whereas the remaining 11 items assigned to the negative list were each given the value −11). The resulting matrix was then z-transformed for standardization and subsequently transformed into a Euclidean distance matrix as the basis for a hierarchical cluster analysis (using the average-linkage method). Cluster solutions were determined using a critical Euclidean distance (*d*<sub>crit</sub>), with all junctures lying below this value forming the apical pole of an underlying concept cluster (for more details on the method, see Schack, 2012). As reference structure, an *a priori* classification of stimulus climbing holds according to specific grip types had been established based on interviews with climbing experts who did not participate in the study (see **Figure 1**). To calculate the similarity between the mean group results and the reference structure, and to compare each individual participant's cluster solution with the averaged cluster solution of the group and with the reference structure (holds 1–6: crimp grip; holds 7–10: sideways pull; holds 11–14: pocket grip; holds 15 and 16: open grip; see also **Figure 1**), we used the adjusted Rand index (ARI; Rand, 1971; Santos and Embrechts, 2009). The ARI provides a measure of similarity on a range between 0 and 1; a score of 0 indicates that two cluster solutions are independent, whereas a score of 1 indicates that two cluster solutions are identical. Scores between these two values indicate the degree of similarity between cluster solutions; the higher the ARI score, the greater the similarity between the cluster solutions.
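The scoring step for a single block can be illustrated with a short sketch. The signed values follow the example given in the text (+4 and −11); the function names, the hypothetical judgments, and the choice to standardize within a block using the population standard deviation are assumptions made for illustration only, as the text does not specify how the z-transformation is applied.

```python
from statistics import mean, pstdev

def score_block(judgments):
    """judgments maps item -> True (positive list) or False (negative list).
    Positive-list items receive +n_pos, negative-list items receive -n_neg."""
    n_pos = sum(1 for same in judgments.values() if same)
    n_neg = len(judgments) - n_pos
    return {item: (n_pos if same else -n_neg) for item, same in judgments.items()}

def z_transform(values):
    """Standardize values to mean 0 and SD 1 (population SD: an assumption)."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]

# Hypothetical block: hold 1 is the anchor; holds 2-5 are judged "same grip",
# holds 6-16 are judged "different grip".
judgments = {item: (item <= 5) for item in range(2, 17)}
scores = score_block(judgments)
z_scores = z_transform(list(scores.values()))
print(scores[2], scores[16])  # 4 -11
```

Repeating this for all 16 anchors gives the 16 × 15 matrix of partial quantities from which the distance matrix is derived.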

## *Results*

The hierarchical cluster analysis revealed four clusters corresponding to four grip types for the group of climbers, and three smaller clusters for the non-climbers. The four clusters of the climber group included all 16 holds into clusters that reflected the correct assignment of holds to grip types (cluster 1: items 1–6, cluster 2: 7–10, cluster 3: 11–14, cluster 4: 15 and 16). Euclidean distances between the items of all clusters were all below 1.5 (critical value: 3.4, alpha value: 5%), which reflected a high consistency of the climbers' decisions. The non-climbers' cluster solution contained three clusters, consisting of items 2 to 6, 7 and 8, and 13 and 14. Euclidean distances were all larger than 1.5, and the remaining seven items were singled out (i.e., were not significantly assigned to any cluster). The mean group dendrograms for climbers and non-climbers are presented in **Figure 3**.
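How an average-linkage solution with a critical distance separates clusters from singled-out items can be sketched in plain Python. This is an illustration under stated assumptions: the two-dimensional points are hypothetical stand-ins for item profiles, the threshold 3.4 is simply taken from the reported critical value, and the way SDA derives that value from the alpha level is not reproduced here.

```python
from itertools import combinations
from math import dist

def average_linkage(points, d_crit):
    """Merge clusters bottom-up while the smallest average between-cluster
    distance stays below d_crit; return clusters as sorted lists of indices."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        (ia, ib), d = min(
            (((a, b), avg_dist(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d >= d_crit:
            break  # remaining junctures lie above the critical distance
        merged = sorted(clusters[ia] + clusters[ib])
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical item profiles: two tight groups and one isolated item.
toy_points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
              (10.0, 10.0), (10.5, 10.0), (30.0, 0.0)]
toy_clusters = average_linkage(toy_points, d_crit=3.4)
print(toy_clusters)  # [[0, 1, 2], [3, 4], [5]]
```

Items that never merge below the threshold remain as one-element lists, which corresponds to the singled-out items in the non-climbers' solution.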

To compare individual participants' cluster solutions to the groups' mean cluster solutions, the adjusted Rand index (ARI; Santos and Embrechts, 2009) was calculated, which expresses the extent to which the individual cluster solutions differ from the respective averaged group dendrogram. Comparison of the mean group cluster solutions with the reference structure resulted in an ARI score of 1.0 for the climbers (i.e., both cluster solutions were identical) and of 0.535 for the non-climbers. Individual ARI scores of the climbers ranged from 0.685 to 1.0, with a mean of 0.952 ± 0.195 (SD). For the non-climbers, ARI scores were smaller than the climbers' (Mann-Whitney *U*-test; *Z* = −3.887, *p* < 0.001); they ranged from 0.0 to 0.643, with a mean of 0.311 ± 0.206. When non-climbers' ARI scores were calculated relative to the reference structure, they were also smaller than the climbers' (*Z* = −4.033, *p* < 0.001), ranging from 0.0 to 0.638, with a mean of 0.268 ± 0.194. The non-climbers' ARI scores calculated relative to the group average and the reference structure did not differ (Wilcoxon signed-rank test; *Z* = −1.836, *p* = 0.066).
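The comparison of the mean group solutions with the reference structure can be retraced with a small ARI computation. The sketch below uses the standard pair-counting formula; treating each unclustered item as its own singleton cluster is an assumption, and the function and variable names are illustrative.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Reference structure: holds 1-6 crimp, 7-10 sideways pull, 11-14 pocket, 15-16 open.
reference = ["crimp"] * 6 + ["sideways"] * 4 + ["pocket"] * 4 + ["open"] * 2

# Non-climbers' mean solution: clusters {2-6}, {7, 8}, {13, 14};
# the remaining seven items are treated as singletons (assumption).
non_climbers = [f"single_{i}" for i in range(1, 17)]
for k, cluster in enumerate([range(2, 7), range(7, 9), range(13, 15)]):
    for item in cluster:
        non_climbers[item - 1] = f"cluster_{k}"

ari_climbers = adjusted_rand_index(reference, reference)
ari_non_climbers = adjusted_rand_index(reference, non_climbers)
print(ari_climbers, round(ari_non_climbers, 3))  # 1.0 0.535
```

Under the singleton assumption, the computation reproduces the reported scores of 1.0 for the climbers and 0.535 for the non-climbers.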

#### *Discussion experiment 1*

The hierarchical cluster analysis via SDA revealed four clusters for the group of climbers, and three clusters for the group of non-climbers. For the group of climbers, the mean cluster solution was identical to the functional assignment of climbing holds to grip types (see **Figure 1**), and individual participants' cluster solutions differed only slightly from one another; nine out of eleven participants produced a result that was identical to the mean cluster solution. Euclidean distances between items within each cluster were all below 1.5, and thereby small compared to the critical value (*d*<sub>crit</sub> = 3.4). These results point toward a high consistency of decisions made by participants during the experiment, within as well as between participants. These findings suggest that climbers, on the basis of their experience in indoor climbing, could easily associate the presented climbing holds with the corresponding grip types, thereby producing consistent clusters representing functional task-related categories.

In contrast, the non-climbers' mean cluster solution included only nine out of 16 items in clusters, whereas the remaining seven items remained as singletons (that is, these seven items were not categorized by the non-climbers, reflecting a partly non-categorical representation). The three clusters each contained items that belonged to the same grip category. This finding could be explained by two mechanisms. First, despite their lack of climbing experience, non-climbers might have been able to assign certain climbing holds to appropriate grip types, potentially based on their experience with manipulable objects from a non-climbing context. Non-climbers thereby might have applied the same (or similar) criteria as climbers, but succeeded in doing so only for a subset of the presented items. In this case, the results reflect that attribution to a specific grip type was more difficult for certain climbing holds than for others (e.g., the appropriate grip that required inserting one or more fingers into openings in the hold was apparently easier to recognize for items 13 and 14 than for items 11 and 12).

Alternatively, non-climbers might have grouped items on the basis of other feature-based object similarities related to shape or even color. The latter explanation is supported by the observation that items grouped into the same cluster by the non-climbers looked similar (this is particularly obvious for items 7 and 8 and for items 13 and 14, respectively). Previous studies have shown that novice participants often tend to combine items that are similar in superficial characteristics, rather than in task-related functional dependence, into the same cluster (see Schack, 2004). In studies that use partial movements (or basic action concepts, see e.g., Schack and Mechsner, 2006; Schack and Ritter, 2009) in order to investigate mental representations of complex movements, such characteristics often concern the use of body parts (e.g., movement concepts referring to the arms might be combined into one separate cluster, even though this might have no relevance for the functional structure of the movement, as arm movements might have different and even opposed functions during different movement phases). The results suggest that the non-climbers, due to their lack of task-related experience, were less able than the climbers to decide consistently which specific grips were required for the presented climbing holds.

These findings corroborate the notion that the categorization of visually perceived objects is fundamentally influenced by their potential for interaction (i.e., their affordances; Gibson, 1977), the evaluation of which strongly depends on the observer's task-specific experience and expertise. For task-specific objects such as climbing holds, certain features determine their potential use and are, therefore, relevant for functional categorization, whereas other features play a minor role. Climbers, compared to non-climbers, choose more purposefully which of the characteristics of a climbing hold are relevant for the task in question. (In the current study, shape and orientation were task-relevant features, whereas in a different context, the color of climbing holds could be a task-relevant object feature, as it would allow the climber to view the hold as part of a color-coded route.)

Taken together, the results of Experiment 1 show that the cognitive representation of objects (i.e., climbing holds) strongly depends on their functional relation to action-based experiences. Based on this finding, we investigated in the subsequent experiment with similar stimuli and two similar groups of participants if and to what extent the perception of task-related objects influences (short-term) processing and the (pre-)activation of object-related actions.

## **EXPERIMENT 2: COGNITIVE ACTIVATION OF GRASPING POSTURES**

Experiment 2 investigated whether visually perceiving a climbing hold activates the grasping posture commonly associated with this climbing hold. Moreover, it asked whether the activation of grasping postures depends on manipulation experience with the climbing holds. In a priming experiment, climbers and non-climbers were asked to decide whether a presented target picture reflected a crimp grip or a sideways pull. The preceding prime picture either depicted a climbing hold requiring the grip shown in the target picture (congruent condition; e.g., a hold requiring a sideways pull followed by a sideways pull) or the alternative grip (incongruent condition; e.g., a hold requiring a sideways pull followed by a crimp grip). Moreover, two unspecific conditions were applied. In the positive unspecific condition, the target picture was preceded by a climbing hold that could be grasped with either a crimp grip or a sideways pull. In the negative unspecific condition, the preceding climbing hold could be grasped neither with a crimp grip nor with a sideways pull.

It was expected that in experienced climbers a climbing hold would activate the grasping posture commonly associated with the climbing hold, and thus influence responses to the target picture depicting a particular grasping posture (i.e., crimp grip vs. sideways pull). No such activation was expected in participants without manipulation experience with the climbing holds. Moreover, for both groups it was expected that unspecific climbing holds would not activate any grasping posture.

The following three predictions were made: first, a congruency effect was expected for participants with climbing experience; that is, faster response times under conditions in which a climbing hold shown in the prime picture would require the same grasping posture as depicted in the following target picture, and slower response times under conditions in which a climbing hold would require the alternative grasping posture to that shown in the target picture. Second, no congruency effect was expected for participants without specific climbing experience. Third, for climbers, response latencies in the unspecific condition were predicted to lie between those in the congruent and the incongruent condition, whereas for non-climbers, response times should not vary between conditions. The described differences between groups related to the factor congruency were expected to be statistically indicated by an interaction between group and congruency.

## **METHODS**

## *Participants*

Thirty-two participants took part voluntarily, either without compensation or in exchange for course credit. Eighteen students or employees from Bielefeld University, Germany, were assigned to the *non-climber group* (three female, one left-handed, mean age 29.6; range 23–43). Participants of the non-climber group had no experience in climbing. All non-climbers were physically active, performing at least one type of sport (11 participants performed individual sports, 12 performed team sports, 2 performed competitive sports, 2 performed racket sports). Participants of the non-climber group played, for example, soccer, handball, or basketball, or regularly went swimming, running, or did fitness training.

Fourteen climbers (one female, two left-handed, mean age 30.0; range 16–43) were recruited from an indoor climbing area for their experiences in climbing (mean training experience: 8.7 years; mean training frequency per week: 4.0 sessions). All recruited climbers were members of the *Deutscher Alpenverein* (German Alpine Association). Referring to the UIAA grading system describing the difficulty of climbing routes, participants' climbing skills ranged between 6 and 10 (two participants were able to climb routes graded 6, two participants climbed routes graded 7, six participants climbed routes graded 8, three participants climbed routes graded 9, and one participant climbed routes graded 10).

All participants reported having normal or corrected-to-normal vision and were naive with regard to the purpose of the experiment. All participants provided written informed consent before testing started. The single experimental session lasted about 20 min. The experiment was performed in accordance with the ethical standards of the sixth revision of the 1964 Declaration of Helsinki (World Medical Association, 2008).

## *Apparatus and stimuli*

For stimulus presentation and data collection, a Toshiba notebook with a 17-inch VGA display (vertical refresh rate 60 Hz) and the software Presentation® (version 14.8) were used. The software controlled the presentation of the stimuli and measured reaction times. Responses had to be given by pressing one of two external response buttons connected to the notebook via a parallel port.

The target pictures were 16 photographs of hands in grasping postures. Eight pictures reflected a crimp grip and eight pictures reflected a sideways pull. Half of the grasping postures reflected a right hand and half of the postures reflected a left hand (images were mirrored) which were used equally often.

As prime pictures, 32 photographs of climbing holds were presented. All climbing holds were red. Four of the climbing holds would require a crimp grip, and four climbing holds would require a sideways pull. Moreover, four climbing holds could be grasped with a crimp grip or a sideways pull (positive unspecific climbing holds). Last, four climbing holds would require a grip that was different from a crimp grip or a sideways pull (negative unspecific climbing hold). The 16 pictures of the climbing holds were mirrored vertically to extend the spectrum of the stimulus material to 32 prime pictures in total. An exemplary illustration of the stimulus material is given in **Figure 4**.

The background of the climbing holds and of the grasping postures was an indoor climbing wall (see **Figure 4**). The stimuli had a size of 9.2 × 9.2 cm (346 × 346 pixels). All stimuli were presented centrally on a black background and subtended a visual angle of 8.9° horizontally and vertically from the viewing distance of 60 cm.

### *Design and procedure*

The present study used a 4 × 2 mixed factorial design with the within-subject factor *congruency* (congruent condition, incongruent condition, positive and negative unspecific condition) and participants' *expertise* as between-subject factor (climbers vs. non-climbers). The impact of these factors was analyzed with reaction time (RT) and error rate (ER) measures as the dependent variables.

Participants sat in front of a computer screen (viewing distance 60 cm) and were instructed in written form to classify the presented target picture as a crimp grip or as a sideways pull as quickly as possible by pressing one of the two response buttons with the index finger. Moreover, participants were instructed to respond as accurately as possible. The response button assignment was counterbalanced across participants within each group. Before starting the experimental session, each participant performed ten randomized practice trials. Data from this practice block were not analyzed. The following test block consisted of 128 pseudorandomized prime-target pairs. Each prime picture appeared four times and was combined with a left-hand or right-hand crimp grip or with a left-hand or right-hand sideways pull. The order of presentation of the prime-target pairs was completely randomized.
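The composition of the test block can be sketched as follows. The code crosses the 32 prime pictures with the four target variants and labels each pair with its congruency condition; all identifiers are illustrative, and a plain shuffle stands in for the randomization procedure, whose exact constraints are not described in the text.

```python
import random
from collections import Counter
from itertools import product

PRIME_TYPES = ["crimp", "sideways", "positive_unspecific", "negative_unspecific"]
GRIPS = ["crimp", "sideways"]
HANDS = ["left", "right"]

def congruency(prime_type, target_grip):
    """Label a prime-target pair; unspecific primes keep their own label."""
    if prime_type in GRIPS:
        return "congruent" if prime_type == target_grip else "incongruent"
    return prime_type

def build_trials(seed=None):
    # 4 hold types x 4 holds x 2 (original, mirrored) = 32 prime pictures
    primes = [(ptype, hold, mirrored)
              for ptype in PRIME_TYPES
              for hold in range(1, 5)
              for mirrored in (False, True)]
    # each prime is paired with 2 grips x 2 hands = 4 targets
    trials = [{"prime": prime, "target_grip": grip, "target_hand": hand,
               "condition": congruency(prime[0], grip)}
              for prime, grip, hand in product(primes, GRIPS, HANDS)]
    random.Random(seed).shuffle(trials)  # simple shuffle: an assumption
    return trials

exp2_trials = build_trials(seed=0)
condition_counts = Counter(t["condition"] for t in exp2_trials)
print(len(exp2_trials), dict(condition_counts))
```

With this crossing, the 128 trials split evenly into 32 congruent, 32 incongruent, 32 positive unspecific, and 32 negative unspecific pairs.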

Each trial started with the presentation of a central fixation cross (400 ms), followed by a blank screen (100 ms), the prime (100 ms), a second blank screen (100 ms), and the target (which remained visible on the screen until a response was given). Incorrect responses elicited the word "Fehler" (German for "error"). An inter-trial interval of 1500 ms elapsed before the next trial started. The within-trial procedure is illustrated in **Figure 5**.

**FIGURE 4 | Examples of the stimulus material used in Experiment 2.** On the left side examples for each type of climbing hold are given. On the right side examples for each type of grasping posture are presented.
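The within-trial timing can be checked with a few lines of arithmetic. The sketch derives the event onsets and the stimulus onset asynchrony (prime onset to target onset) from the reported durations and converts the durations into frames at the reported 60 Hz refresh rate; the variable names are illustrative.

```python
# Fixed-duration events of one trial, in presentation order (durations in ms).
TRIAL_EVENTS_MS = [("fixation", 400), ("blank_1", 100), ("prime", 100), ("blank_2", 100)]
REFRESH_HZ = 60

def onsets(events):
    """Return the onset time (ms from trial start) of each event and of the target."""
    t, out = 0, {}
    for name, duration in events:
        out[name] = t
        t += duration
    out["target"] = t  # the target follows the last fixed-duration event
    return out

event_onsets = onsets(TRIAL_EVENTS_MS)
soa_ms = event_onsets["target"] - event_onsets["prime"]
frames = {name: duration * REFRESH_HZ / 1000 for name, duration in TRIAL_EVENTS_MS}
print(soa_ms, frames)  # SOA of 200 ms; 24, 6, 6 and 6 frames
```

The prime-target stimulus onset asynchrony is therefore 200 ms, and every duration corresponds to a whole number of frames at 60 Hz.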

### **RESULTS**

### *Data analyses*

Reaction times (RTs) were screened for outliers using a fixed cut-off: RTs below 200 ms and above 1000 ms were excluded (2.0%). Trials with wrong answers (3.4%) were not used in the analysis of the RTs. The mean RTs from the factorial combinations of the within-subjects factor *congruency* and the between-subjects factor *expertise* were computed for further analysis. A preliminary comparison of the RTs for positive unspecific primes and negative unspecific primes was performed. Separate paired *t*-tests revealed neither a significant difference between positive (547 ms, *s.e.m.* = 21 ms) and negative unspecific primes (552 ms, *s.e.m.* = 22 ms) for climbers, *t*(13) = 0.56, *p* = 0.59, nor between positive (529 ms, *s.e.m.* = 14 ms) and negative unspecific primes (535 ms, *s.e.m.* = 14 ms) for non-climbers, *t*(17) = 0.98, *p* = 0.34. This result indicates that positive unspecific primes and negative unspecific primes did not evoke differential priming effects. Thus, further analyses were computed with the mean value of positive unspecific and negative unspecific primes. This level of the factor *congruency* was termed the neutral condition.
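The screening and averaging steps can be sketched for a single participant. The cut-offs and the pooling of the two unspecific conditions follow the text; the trial records, field names, and function name are hypothetical, and the data are invented purely to exercise the code.

```python
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 200, 1000  # cut-offs reported in the text

def condition_means(trials):
    """trials: iterable of dicts with keys 'condition', 'rt' (ms), 'correct' (bool).
    Returns the mean RT per condition plus the pooled 'neutral' condition."""
    kept = {}
    for t in trials:
        if not t["correct"]:
            continue  # error trials are excluded from the RT analysis
        if t["rt"] < RT_MIN_MS or t["rt"] > RT_MAX_MS:
            continue  # outlier according to the fixed cut-off
        kept.setdefault(t["condition"], []).append(t["rt"])
    means = {cond: mean(rts) for cond, rts in kept.items()}
    # neutral = mean of the positive and negative unspecific conditions
    means["neutral"] = mean([means["positive_unspecific"],
                             means["negative_unspecific"]])
    return means

# Hypothetical trials of one participant (not data from the study).
toy_trials = [
    {"condition": "congruent", "rt": 510, "correct": True},
    {"condition": "congruent", "rt": 530, "correct": True},
    {"condition": "congruent", "rt": 150, "correct": True},    # too fast
    {"condition": "incongruent", "rt": 560, "correct": True},
    {"condition": "incongruent", "rt": 540, "correct": False}, # error
    {"condition": "incongruent", "rt": 580, "correct": True},
    {"condition": "positive_unspecific", "rt": 540, "correct": True},
    {"condition": "positive_unspecific", "rt": 1200, "correct": True},  # too slow
    {"condition": "negative_unspecific", "rt": 550, "correct": True},
]
toy_means = condition_means(toy_trials)
print(toy_means["congruent"], toy_means["incongruent"], toy_means["neutral"])  # 520 570 545
```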

Mixed ANOVAs with the within-subjects factor *congruency* (congruent, incongruent, neutral) and the between-subjects factor *expertise* (climbers vs. non-climbers) were performed with RT and ER as dependent variables. A violation of the sphericity assumption resulted in a correction of the *p*-values according to Greenhouse-Geisser<sup>1</sup>.

#### *Reaction times*

The within-subjects factor *congruency* reached significance, *F*(2, 60) = 4.86, *p* = 0.01, ε = 0.84, η<sub>p</sub><sup>2</sup> = 0.14, as well as the interaction between *congruency* and *expertise*, *F*(2, 60) = 3.88, *p* = 0.03, ε = 0.84, η<sub>p</sub><sup>2</sup> = 0.12. The between-subjects factor *expertise* did not reach significance (*p* = 0.75).

**FIGURE 5 |** … hold requiring a crimp grip is followed by a target picture depicting a sideways pull.

To illuminate the source of the interaction, paired *t*-tests were performed separately for climbers (one-tailed) and non-climbers (two-tailed). Climbers responded significantly faster to congruent compared to incongruent prime-target pairs, *t*(13) = 2.52, *p* = 0.01. Responses to grasping postures preceded by a neutral prime were significantly slower than responses to congruent prime-target pairs, *t*(13) = 2.02, *p* = 0.03, and significantly faster than responses to incongruent prime-target pairs, *t*(13) = 1.89, *p* = 0.04.

Non-climbers, in contrast, did not differ in response speed (*p* = 0.72) between grasping postures preceded by a congruent climbing hold and those preceded by an incongruent climbing hold. Interestingly, responses to grasping postures preceded by a neutral climbing hold were significantly faster than responses following congruent climbing holds, *t*(17) = 2.18, *p* = 0.04, and also faster than responses following incongruent climbing holds, *t*(17) = 2.78, *p* = 0.01.

Mean values of the RTs and ERs and corresponding confidence intervals are illustrated in **Figure 6** and additionally displayed in Data Sheet 1 in the Supplementary Material.

## *Response errors*

A mixed ANOVA on the mean ERs revealed a significant effect neither for the within-subjects factor *congruency* (*p* = 0.23) nor for the between-subjects factor *expertise* (*p* = 0.60). The interaction between *congruency* and *expertise* reached statistical significance, *F*(2, 60) = 4.30, *p* = 0.01, ε = 0.82, η<sub>p</sub><sup>2</sup> = 0.17. To compare the results of the analysis of the RTs with the ERs, paired *t*-tests were computed separately for climbers (one-tailed) and non-climbers (two-tailed).

For climbers, responding was less error-prone with congruent compared to incongruent trials, *t*(13) = 2.80, *p* = 0.01. The comparison between the incongruent and the neutral condition also reached statistical significance, *t*(13) = 2.8, *p* = 0.01, indicating a higher ER for incongruent compared to neutral prime-target pairs. The comparison between the congruent and the neutral condition did not reach significance (*p* = 0.27).

For non-climbers, none of the comparisons revealed a significant effect (all *p*s > 0.30), but the error rate was slightly smaller for incongruent primes compared to congruent and neutral primes.

<sup>1</sup>A violation of the sphericity assumption requires a modification of the degrees of freedom (*df*s) to reduce the Type I error. The estimate for this modification is denoted by epsilon (*ε*), which provides a measure of deviation from sphericity. An epsilon of 1 indicates that sphericity is exactly met; smaller epsilon values indicate increasing deviation. For better readability, the uncorrected *df*s are given for the *F*-value, but the corrected *df*s can easily be calculated by multiplying the original *df*s by *ε*.
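The correction described in the footnote amounts to a simple multiplication, shown here as a worked example. The function name is illustrative; the uncorrected degrees of freedom (2, 60) and the epsilon values are those reported for the two interaction effects.

```python
def greenhouse_geisser_dfs(df_effect, df_error, epsilon):
    """Return the epsilon-corrected (effect, error) degrees of freedom."""
    return df_effect * epsilon, df_error * epsilon

# Interaction congruency x expertise, uncorrected dfs (2, 60)
rt_dfs = greenhouse_geisser_dfs(2, 60, 0.84)  # roughly (1.68, 50.4)
er_dfs = greenhouse_geisser_dfs(2, 60, 0.82)  # roughly (1.64, 49.2)
print([round(v, 2) for v in rt_dfs], [round(v, 2) for v in er_dfs])
```

The *F*-value itself is unchanged by the correction; only the degrees of freedom used to obtain the *p*-value shrink.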

### *Discussion experiment 2*

Experiment 2 aimed to investigate the activation of grasping postures by visually presenting climbing holds and how such activation is influenced by skill level. In accordance with the hypotheses, a congruency effect was found for climbers; that is, responses were faster in congruent trials than in incongruent trials. The inclusion of an unspecific condition revealed that the observed congruency effect is based both on speeded responses in the congruent condition and on slowed responses in the incongruent condition compared to the neutral baseline. It would be interesting to determine whether this priming effect arises at the perceptual level (perceptual priming, e.g., Biederman and Cooper, 1992), the cognitive level (e.g., Labeye et al., 2008), or at a motor stage of processing (response priming; e.g., Kunde et al., 2003). Regarding a perceptual locus of the priming effects, there is no visual relation or similarity between the climbing holds and the grasping postures that is larger for the congruent prime-target pairs (hold-posture) compared to the incongruent pairs. Hence, we consider it implausible that our priming effects are due to perceptual (dis)similarity of the stimuli. Regarding a potential motor locus, the priming effect is unlikely to reflect a response activation or a response competition effect because the task instruction to classify a grasping posture as a crimp grip or as a sideways pull cannot directly be applied to the climbing hold pictures (primes). That is, the holds themselves should not activate or elicit a response *per se* (i.e., pressing a left or right response button). Therefore, we would argue that the priming effects do not arise during perceptual or motor stages but during cognitive processing stages.

More precisely, the result pattern of speeded responses in congruent trials and slowed responses in incongruent trials, both relative to a neutral baseline, points toward two cognitive processing mechanisms. First, the perception of a given hold leads to an activation of the representation of that hold including the corresponding grasping action. Second, the perception of a given hold seems to lead to a reduced availability of non-corresponding grasping actions. These mechanisms of action activation and action inhibition might help to explain the efficient selection of grasping actions in climbing.

The results found in the non-climbers are also in accordance with the interpretation of a cognitive locus of the priming effects. Participants with no climbing experience are expected to possess no representations of climbing-specific grasping actions. Accordingly, non-climbers did not show any congruency effect. Unexpectedly, however, non-climbers had shorter reaction times for unspecific prime-target pairs compared to congruent and incongruent ones. Besides the possibility that this might be a chance finding, a possible *post-hoc* explanation for this result could be the following. Round objects similar to the unspecific climbing holds are also common in daily life, for example, as a round rotatable button on a washing machine or as the knob-like handle of a drawer. It is thus possible that non-climbers applied their general knowledge that round objects can be grasped either with a crimp grip or with a sideways pull (without knowing these climbing-specific concepts) to the predominantly round shapes of climbing holds in the neutral condition. In contrast, non-climbers could not infer any associated grasping action from the unfamiliar climbing holds requiring a crimp grip or a sideways pull. The faster responses to targets following a prime picture with a round hold thus may reflect unspecific activations of grasping representations. Further studies are needed to confirm this suggestion.

#### **GENERAL DISCUSSION**

This study explored the relationship between the action-based cognitive representations of climbing holds and the object-based activation of the corresponding grasping actions. Experiment 1 investigated the structure of skill representations. (Note that we use the term climbing skill in this context specifically for the skilled manual use of climbing holds, i.e., the knowledge of correct grip application for individual climbing holds.) The results of Experiment 1 suggest that climbers organize visually perceived climbing holds categorically according to functional features (i.e., how to grasp in an indoor climbing context). Non-climbers, in contrast, showed an organizational structure that was not categorical in terms of skill-based (climbing) knowledge but rather based on unspecific world knowledge or superficial object features (e.g., form or color). Experiment 2 investigated the access to grasping knowledge, in particular whether and how functional features (i.e., grasping postures) are activated by object features (i.e., shapes of climbing holds). Here, we found evidence for activation of matching grasping postures and inhibition of non-matching grasping postures by the perception of grip-specific holds.

We argue that in climbers, but not in non-climbers, the categorical memory structure reflects the functional features of climbing holds in the context of indoor climbing. This structure appears to follow functional distinctions of the associated actions and reflects climbers' manifold experiences of task-related action-effect coupling (Hoffmann, 2003). The qualitative changes in memory structures might also change perceptual information processing (cf. Ericsson and Kintsch, 1995). Having command over, for example, a grip category related to sideways pull would facilitate the recognition and processing of potentially distinct holds regarding their applicability and functional relevance for the adequate motor action. The present findings thus provide insights as to how skilled climbers may achieve a better climbing performance: automatic activation of adequate grasping actions in response to the perception of a specific climbing hold can be regarded as crucial mechanism to reduce the cognitive demands involved in decision making and the planning of selected motor actions. This mechanism thereby serves to reduce cognitive processing time, which, importantly in the climbing context, leads to reduced physical energy costs (i.e., muscle force needed to remain in a static posture while evaluating the next move). Climbing thereby represents a relevant example of how skill-based knowledge that can be accessed explicitly for cognitive control also supports evaluative, strategic action planning under resource constrained conditions.

Our results are in accordance with findings by Pezzulo et al. (2010) who reported better recall performance for difficult climbing routes in climbers compared with non-climbers. Pezzulo et al. (2010, p. 72) speculate that experts are better able to form motor chunks that are necessary for mastering perceived climbing routes by means of simulation and hence are better in memorizing the routes, but these authors also consider alternative explanations such as better visual imagery in experts. Whereas Pezzulo et al. (2010) used an offline measure of memorization, Experiment 2 in our study investigated online processing (i.e., short-term activation) of skill representations which are suggested to be categorically structured according to our Experiment 1.

The assumption of categorical skill representations raises the question of how specific actions are selected. Generally, it is conceivable that some holds have multiple grasping possibilities and should, thus, activate more than one category of climbing holds. Here, Experiment 2 yielded evidence for processes of activation and inhibition. The results of Experiment 2 are also in accordance with Labeye et al. (2008) who also found that the perception of (manipulable) objects activates associated actions, even though we used considerably longer stimulus onset asynchronies compared to Labeye et al. (2008). Moreover, Experiment 2 suggests that such action feature activations depend on previous learning or experience with the object and not on the pure physical object properties as the non-climbers' data pattern showed the fastest responses in the unspecific conditions.

Our results corroborate findings from studies of complex movement representations (Bläsing et al., 2009; Güldenpenning et al., 2011, 2013; Weigelt et al., 2011; Land et al., 2013) and skill acquisition (Frank et al., 2013). They emphasize the role of cognitive representations and processes in action control and support the view that skill representations are based on categorical knowledge. In this regard, our study confirms the role of cognitive processes in the control of complex human actions as proposed in frameworks such as the ideomotor approach (Koch et al., 2004; for a historical overview of the ideo-motor principle, see Stock and Stock, 2004), the theory of event coding (TEC; Hommel et al., 2001), or the cognitive action architecture approach (CAA-A; Schack and Ritter, 2009; Land et al., 2013).

Taken together, the present studies investigated the cognitive representations of indoor climbing holds and the perceptual processing of such holds and associated grasping postures in climbers and non-climbers. Experienced climbers represent holds according to their functions (i.e., grip types) whereas non-climbers show less structure in their representations and organize these according to unspecific action knowledge or superficial features. It was also found that the perception of climbing holds activates the matching grasping posture and inhibits non-matching postures in climbers but not in non-climbers. These findings suggest a processing advantage and mechanism of categorical action representations. Furthermore, the findings show that action experience modifies the relevant object representations by associating action features to the representations of corresponding objects.

## **ACKNOWLEDGMENTS**

The authors thank Marco Schweizer and Chistiane Preuß for help with the production of stimulus pictures and acquisition of data and Dr. Christoph Schütz for providing the NetSplit software used in Experiment 1. This study was supported by German Research Foundation Grant DFG EXC 277 "Cognitive Interaction Technology" (CITEC) (www.cit-ec.de/). We acknowledge support for the Article Processing Charge by the Deutsche Forschungsgemeinschaft (DFG) and the Open Access Publication Funds of Bielefeld University Library.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.01008/abstract


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 May 2014; paper pending published: 15 June 2014; accepted: 24 August 2014; published online: 15 September 2014.*

*Citation: Bläsing BE, Güldenpenning I, Koester D and Schack T (2014) Expertise affects representation structure and categorical activation of grasp postures in climbing. Front. Psychol. 5:1008. doi: 10.3389/fpsyg.2014.01008*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Bläsing, Güldenpenning, Koester and Schack. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Timing skills and expertise: discrete and continuous timed movements among musicians and athletes

#### *Thenille Braun Janzen<sup>1,2</sup>, William Forde Thompson<sup>1</sup>\*, Paolo Ammirante<sup>3</sup> and Ronald Ranvaud<sup>2</sup>*

*<sup>1</sup> Department of Psychology, Macquarie University, Sydney, NSW, Australia*

*<sup>2</sup> Department of Neuroscience and Behavior, Institute of Psychology, University of São Paulo, São Paulo, Brazil*

*<sup>3</sup> Department of Psychology, Ryerson University, Toronto, ON, Canada*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Bettina E. Bläsing, Bielefeld University, Germany; Lawrence Baer, Concordia University, Canada*

#### *\*Correspondence:*

*William Forde Thompson, Department of Psychology, Macquarie University, Building C3A, Level 5, Macquarie Drive, Sydney, NSW 2109, Australia e-mail: bill.thompson@mq.edu.au*

**Introduction:** Movement-based expertise relies on precise timing of movements and the capacity to predict the timing of events. Music performance involves discrete rhythmic actions that adhere to regular cycles of timed events, whereas many sports involve continuous movements that are not timed in a cyclical manner. It has been proposed that the precision of discrete movements relies on event timing (clock mechanism), whereas continuous movements are controlled by emergent timing. We examined whether movement-based expertise influences the timing mode adopted to maintain precise rhythmic actions.

**Materials and Method:** Timing precision was evaluated in musicians, athletes and control participants. Discrete and continuous movements were assessed using finger-tapping and circle-drawing tasks, respectively, based on the synchronization-continuation paradigm. In Experiment 1, no auditory feedback was provided in the continuation phase of the trials, whereas in Experiment 2 every action triggered a feedback tone.

**Results:** Analysis of precision in the continuation phase indicated that athletes performed significantly better than musicians and controls in the circle-drawing task, whereas musicians were more precise than controls in the finger-tapping task. Interestingly, musicians were also more precise than controls in the circle-drawing task. Results also showed that the timing mode adopted was dependent on expertise and the presence of auditory feedback.

**Discussion:** Results showed that movement-based expertise is associated with enhanced timing, but these effects depend on the nature of the training. Expertise was found to influence the timing strategy adopted to maintain precise rhythmic movements, suggesting that event and emergent timing mechanisms are not strictly tied to specific tasks, but can both be adopted to achieve precise timing.

**Keywords: emergent timing, event timing, expertise, training, music, sports**

## **INTRODUCTION**

Experts such as musicians and athletes rely on precise timing of bodily movements. However, whereas musicians are especially skilled at discrete rhythmic actions that adhere to regular cycles of timed events (meter and pulse) (Repp and Doggett, 2007; Baer et al., 2013; Albrecht et al., 2014), athletic sports often involve fluid and continuous movements that are not timed in a cyclical manner (Sternad et al., 2000; Jaitner et al., 2001; Jantzen et al., 2008; Balague et al., 2013). Research suggests that the timing of discrete movements (i.e., those preceded and followed by a period without motion) and continuous movements depend on different strategies or processes (Robertson et al., 1999; Zelaznik et al., 2002; Huys et al., 2008; Zelaznik and Rosenbaum, 2010; Studenka et al., 2012). Specifically, the timing of discrete movements is thought to involve a clock-like mechanism that incorporates an explicit representation of the time interval delineated by each discrete movement. In contrast, activities that involve smooth and continuous rhythmic movements are thought to be based on emergent timing, whereby timing regularity emerges in the absence of a representation of time interval from the control of parameters such as movement trajectory and velocity.

The hypothesis that event and emergent timing are distinct and dissociable systems is supported by a substantial body of evidence. Behavioral studies have shown that temporal variability in finger tapping is usually uncorrelated with variability in continuous circle drawing (Robertson et al., 1999; Zelaznik et al., 2005), and that event-timed movements, such as tapping, are significantly more precise and adjust faster to timing perturbations than continuous movements such as circle drawing (Elliot et al., 2009; Repp and Steinman, 2010; Studenka and Zelaznik, 2011). There is also neurological (Ivry et al., 2002; Spencer et al., 2003, 2005) and neuroimaging (Schaal et al., 2004; Spencer et al., 2007) evidence that event and emergent timing processes recruit different brain areas.

However, recent results have raised doubts that discrete and continuous movements always engage event and emergent timing mechanisms, respectively (Jantzen et al., 2002, 2004; Repp and Steinman, 2010; Studenka et al., 2012; Studenka, 2014). For example, evidence suggests that the presence of perceptual events marking the completion of time intervals can induce event timing even for tasks performed with continuous movements (Zelaznik and Rosenbaum, 2010; Studenka et al., 2012). Computational simulations and behavioral studies also suggest that task tempo and movement speed constraints (Huys et al., 2008; Zelaznik and Rosenbaum, 2010), as well as task order and practice (Jantzen et al., 2002, 2004), are important influences on the timing mechanism adopted for a certain task. Based on the suggestion that the timing mechanisms recruited to perform rhythmic movements are significantly influenced by several factors, the present investigation tested whether two distinct forms of expertise and training (music and sport) differentially influence the strategy that is engaged to perform movement-based timing tasks.

Practice is generally regarded in the motor learning literature as one of the most essential predictors of motor skill acquisition (Schmidt and Lee, 1988; Smith, 2003; Tenenbaum and Eklund, 2007; but see Mosing et al., 2014) and researchers have suggested that the amount of deliberate practice is directly associated with the level of expertise acquired by athletes and musicians (Ericsson et al., 1993; Ericsson, 1996; Howe et al., 1998). It is well known that highly trained musicians are exceptionally precise in discrete-timing tasks, such as finger tapping with an auditory metronome (Repp, 2005, 2010; Repp and Doggett, 2007; Baer et al., 2013). Musicians tend to show smaller asynchronies and lower tapping variability when tapping to a metronome compared with nonmusician counterparts (Aschersleben, 2002; Repp, 2010). Musical expertise also seems to enhance the internal representation of time as suggested by perceptual studies showing that training can improve interval discrimination and perceptual sensitivity to timing perturbations (Buonomano and Karmakar, 2002; Ivry and Schlerf, 2008; Repp, 2010). Research also demonstrates that musicianship specifically interacts with tasks associated with discrete movements, and not continuous movements (Baer et al., 2013), which is consistent with the view that emergent and event timing are distinct mechanisms (Zelaznik et al., 2000, 2005) and suggests that music expertise is predominantly an event-based skill (Repp, 2005; Baer et al., 2013).

On the other hand, we know very little about how expertise and training might influence the operation of emergent timing mechanisms, and whether the effect of training in one movement-based expertise might transfer to other timing skills. The timing of continuous rhythmic movements, such as leg movement during cycling, walking and running, or arm movements during swimming or rowing, is thought to rely on emergent timing mechanisms (Kelso et al., 1981; Sternad et al., 2000; Jaitner et al., 2001; Jantzen et al., 2008; Elliot et al., 2009; Balague et al., 2013). This class of rhythmic movements is typically observed in sport activities such as rowing, swimming, running, and cycling, and could therefore be used as a model to study the effect of training in the production of precise continuous rhythmic movements. The purpose of the present study was to compare the ability of movement-based experts from different domains to engage in discrete and continuous movement tasks. Based on the hypothesis that musical performance involves predominantly discrete rhythmic actions that rely on event timing, and that athletic sports generally recruit fluid and continuous rhythmic movements based on emergent timing, we examined whether movement-based expertise is associated with specific or general timing skills. If the event and emergent timing processes are dependent on the nature of expertise and training, then athletes should be more precise in a timing task that involves continuous movements whereas musicians should be more precise in a timing task that involves discrete movements. In contrast, if musicians and athletes do not differ in their performance in both tasks, then this would suggest that timed movements are accomplished similarly in these two groups of movement-based experts and, therefore, that event and emergent timing mechanisms are not strictly tied to specific tasks, but may both be adopted to achieve precise timing.

Experiment 1 compared the performance of elite athletes, highly trained musicians, and controls on finger-tapping and circle-drawing tasks. The variability of inter-response intervals was measured in a synchronization-continuation paradigm. Participants were instructed to synchronize their movements with a metronome and continue the action at the same rate established by the metronome even when the pacing signal stopped (continuation phase). In Experiment 2, auditory feedback was presented in the continuation phase in order to assess the effect of the presence of salient perceptual events on the timing mechanism adopted to complete the tasks. Based on past research (Zelaznik and Rosenbaum, 2010; Studenka et al., 2012; Baer et al., 2013), we predicted that the presence of auditory feedback would induce an event-timing strategy in the continuous movement task, regardless of the expertise of the participants.

## **EXPERIMENT 1**

## **MATERIALS AND METHODS**

#### *Participants*

Fifteen athletes were recruited through the Macquarie University Elite Athlete Scholarship Program. Athletes (8 females, 7 males) were on average 21.31 years old (*SD* = 2.33, range 18–26 years) and had been involved in athletic training for an average of 7.31 years (*SD* = 3.45). All athletes involved in the project were actively engaged in training and competing at State and/or National level in athletic sports, such as swimming, rowing, martial arts, rugby and others. None of the athletes had completed more than 2 years of musical training or were involved in any musical activities. Musicians (*n* = 13, 4 females) were recruited through the Departments of Music and Psychology at Macquarie University and local conservatories and universities. The average age of musicians was 21.38 years (*SD* = 3.20, range 18–28 years) and all participants had been involved in formal music training for at least 10 consecutive years (*M* = 10.85, *SD* = 2.38). Musicians played a range of instruments, including piano, guitar, and violin. Control participants (*n* = 17, 10 females) were on average 21.76 years old (*SD* = 3.31, range 18–31 years). None of the participants in the control group reported any formal athletic or music training. Groups did not differ significantly in mean age, *F*(2, 42) = 0.07, *p* = 0.93. All participants reported that they were right-handed and had no hearing or motor impairment. Psychology undergraduate students were compensated with course credit, and all other participants received financial compensation for their participation. All participants provided informed consent and were debriefed about the goals of the experiment.

## *Materials, stimuli, and procedure*

Stimulus presentation and data collection were done using a MacBook Pro 9.2 laptop running custom software written in Python, and tasks were completed using an Apple single-button mouse. The task widely used to induce event timing is finger tapping, whereas circle drawing is thought to typify emergent timing (Repp and Steinman, 2010). The paradigm adopted for both tasks was synchronization-continuation (Stevens, 1886). For each trial, participants first synchronized their movements (circle drawing or finger tapping) with an isochronous metronome click for 18 clicks. The signal tones were 40 ms square-wave clicks of 480 Hz presented at 74 dB. After the synchronization phase, the metronome stopped and participants were instructed to continue to produce 36 more movements at the tempo set by the metronome. Within each trial, one of two metronome tempi was used: slow (800 ms IOI) or fast (600 ms IOI).

In the finger-tapping task, participants repeatedly tapped on the mouse with their right index finger at the tempo set by the metronome pacing signals and continued to tap at the same rate when the signal was removed. Participants heard the pacing signals through Sennheiser HD 515 headphones with noise canceling and reduction, which prevented participants from hearing any sound produced by the finger tap. No auditory feedback was provided.

In the circle-drawing task, participants repeatedly moved the computer mouse with the right hand in a circle in time with the metronome and in a clockwise direction, and continued this motion in the absence of the external timing cue. Participants traced an unfilled circle template of 5 cm in diameter displayed on the screen with the mouse cursor, and were instructed to synchronize every time the path of the cursor crossed an intersection at 270° of the circle with the metronome. Participants were told that timing precision was more relevant than drawing accuracy, and they were free to draw a circle at their preferred size.

Participants had 5 practice trials at 600 ms IOI before each experimental block. Trials were blocked by task, with tapping performed before circle drawing (Zelaznik and Rosenbaum, 2010; Studenka et al., 2012). For each task, trials were blocked by tempo, with the order of the two tempo conditions and the 10 trials within each tempo condition randomized independently for each participant. Participants were permitted to take breaks in between trials at any time. The experiment took approximately 50 min.

#### *Data analysis*

Only responses in the continuation phase were analyzed as the synchronization phase was used only to establish the pacing. In order to allow for acceleration commonly observed in the transition from the synchronization to continuation phase (Flach, 2005), only the final 30 movements were analyzed. For the finger-tapping task, inter-response interval (IRI) was defined as the elapsed time between sequential taps (in milliseconds) and for the circle-drawing task, IRI was defined as the elapsed time between successive passes through the intersection. Outlier IRIs were identified as those 60% longer or shorter than the target IRI for a given trial (4% of all IRIs analyzed in Experiment 1; 2% in Experiment 2), and were deleted.
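The preprocessing described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' software: the function name `continuation_iris` and its parameters are hypothetical, it assumes that the final 30 movements of a trial yield 29 intervals, and it reads "60% longer or shorter than the target" as any IRI below 40% or above 160% of the target IRI.

```python
import numpy as np

def continuation_iris(event_times_ms, target_iri_ms, n_keep=30, tolerance=0.6):
    """Return cleaned inter-response intervals (ms) for one continuation phase.

    event_times_ms: timestamps of successive taps, or of successive passes
    through the circle intersection, recorded after the metronome stopped.
    All names and defaults here are illustrative assumptions.
    """
    times = np.asarray(event_times_ms, dtype=float)
    # Keep only the final n_keep movements, skipping the early transition
    # from synchronization to continuation.
    times = times[-n_keep:]
    # IRI = elapsed time between successive events.
    iris = np.diff(times)
    # Discard IRIs more than `tolerance` (60%) above or below the target.
    lower = (1.0 - tolerance) * target_iri_ms
    upper = (1.0 + tolerance) * target_iri_ms
    return iris[(iris >= lower) & (iris <= upper)]

# Hypothetical trial: 36 perfectly regular taps at the fast tempo (600 ms).
regular_taps = np.arange(36) * 600.0
clean_iris = continuation_iris(regular_taps, target_iri_ms=600)
```

In this sketch a skipped tap produces an interval of roughly twice the target, which falls outside the 40–160% window and is dropped.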

Several timing measures were used. First, mean IRI within a trial served as a measure of timing accuracy. Second, to measure timing precision we analyzed participants' coefficient of variation (CV), which was defined as the standard deviation of IRIs within a trial divided by its mean IRI (SD/Mean). Lower CV scores indicate greater precision. CV can be considered a measure of total IRI variability, including slow drift in IRI over the course of a trial, timing error, and motor implementation error. Third, dependencies between successive IRIs in each trial were measured using lag-one autocorrelation. Data were first linearly detrended to remove the impact of slower drift over the course of a trial on dependencies between successive IRIs. In general, discrete-timing tasks are associated with negative lag-one autocorrelation. This has been proposed to arise from random delays in motor implementation (Wing and Kristofferson, 1973) that occur independently of a central clock mechanism. One such delay should both lengthen the IRI that it completes and shorten the one that it initiates; the accumulation of these delays over the course of a trial should be reflected in negative lag-one autocorrelation. Continuous-timing tasks, on the other hand, which are thought not to involve a central clock mechanism, have been shown to result in non-negative lag-one autocorrelation (Zelaznik and Rosenbaum, 2010; Baer et al., 2013). Thus, lag-one autocorrelation can serve as an index of event and emergent timing strategies. CV and lag-one autocorrelation values were averaged by task and tempo for each participant.
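The two per-trial measures defined above might be computed along the following lines. This is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the sample standard deviation (`ddof=1`) is assumed for the CV, detrending is done with an ordinary least-squares line, and the conventional autocorrelation estimator is used.

```python
import numpy as np

def coefficient_of_variation(iris):
    """CV of one trial: SD of the IRIs divided by their mean (lower = more precise)."""
    iris = np.asarray(iris, dtype=float)
    return iris.std(ddof=1) / iris.mean()  # ddof=1 is an assumption

def lag_one_autocorrelation(iris):
    """Lag-one autocorrelation of the IRIs after removing a linear trend."""
    iris = np.asarray(iris, dtype=float)
    position = np.arange(iris.size)
    # Fit and subtract a straight line to remove slow drift across the trial.
    slope, intercept = np.polyfit(position, iris, 1)
    residuals = iris - (slope * position + intercept)
    residuals = residuals - residuals.mean()
    # Covariance of neighbouring residuals, normalised by the total variance.
    return np.sum(residuals[:-1] * residuals[1:]) / np.sum(residuals ** 2)

# Hypothetical trial: IRIs alternate short/long around 600 ms while slowing down.
alternating = np.tile([590.0, 610.0], 15)
drifting = alternating + 2.0 * np.arange(alternating.size)
```

Because the linear trend is removed first, `lag_one_autocorrelation(drifting)` matches the value for `alternating`, and both are strongly negative, as expected when a long interval tends to be followed by a short one.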

Finally, we sought to estimate clock and motor contributions to timing variance (Wing and Kristofferson, 1973) using slope analysis (Ivry and Hazeltine, 1995). Slope analysis takes advantage of the well-established finding that timing variance increases linearly as a function of squared target duration. Under the assumption that motor production is invariant across target durations, a positive slope (i.e., an increase in variance with target duration) is thought to be influenced entirely by duration-dependent variability (Studenka and Zelaznik, 2008). The intercept of this regression line, on the other hand, is thought to be duration-independent, i.e., reflecting variability in the motor aspect of the task (Studenka and Zelaznik, 2008). Different event-like tasks have been shown to exhibit equal slope values (Ivry and Hazeltine, 1995; Green et al., 1999), suggesting a common underlying central clock mechanism. On the other hand, (emergent) circle-drawing and (event) finger-tapping tasks have been shown to exhibit significantly different slopes (Robertson et al., 1999), suggesting different timing mechanisms. Individual differences in slope are also observed within tasks (Spencer et al., 2005; Baer et al., 2013), with lower slope values indicating less duration-dependent variability. In the current study, for each participant and for each task, slope and intercept values were obtained from a linear regression of detrended variance (averaged across trials) against squared target durations (600 and 800 ms<sup>2</sup>).
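With only two target durations, the regression described above reduces to the line through two points. The sketch below illustrates the arithmetic; the function name `slope_analysis` and the variance values are hypothetical and are not data from the study.

```python
import numpy as np

def slope_analysis(variance_by_duration):
    """Regress mean detrended IRI variance (ms^2) on squared target duration (ms^2).

    variance_by_duration: dict mapping target duration in ms to the variance
    averaged across trials. Returns (slope, intercept): the slope indexes
    duration-dependent variability, the intercept duration-independent variability.
    """
    durations = sorted(variance_by_duration)
    squared = np.array(durations, dtype=float) ** 2
    variances = np.array([variance_by_duration[d] for d in durations], dtype=float)
    slope, intercept = np.polyfit(squared, variances, 1)
    return slope, intercept

# Hypothetical participant: variance of 1540 ms^2 at 600 ms, 2660 ms^2 at 800 ms.
# slope = (2660 - 1540) / (800**2 - 600**2) = 1120 / 280000 = 0.004
# intercept = 1540 - 0.004 * 600**2 = 100 ms^2
slope, intercept = slope_analysis({600: 1540.0, 800: 2660.0})
```

Because two points determine the line exactly, the fit carries no residual information; a negative slope would simply mean that variance was lower at the slower tempo.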

### **RESULTS**

Preliminary analysis of mean IRI during the continuation phase revealed that participants were accurate in maintaining the target tempi [fast tempo (600 ms IOI): *M* = 606; *SD* = 35; slow tempo (800 ms IOI): *M* = 818; *SD* = 55]. There were no significant differences between groups or group interactions.

Coefficient of variation (CV) scores were entered into a mixed design ANOVA with Task (circle drawing, finger tapping) and Tempo (fast, slow) as within-subject factors and Group (athletes, musicians, controls) as a between-subject factor. There was a significant main effect of Task, *F*(1, 42) = 251.01, *p* < 0.001, demonstrating that participants were more precise in the finger-tapping task (*M* = 0.07) than the circle-drawing task (*M* = 0.23). It was also verified that there was no statistical difference in CV between the fast and slow tempi conditions, *F*(1, 42) = 1.16, *p* = 0.28, and no significant interaction between Task and Tempo, *F*(1, 42) = 2.25, *p* = 0.14.

Between-subjects analysis revealed a significant main effect of Group, *F*(2, 42) = 18.42, *p* < 0.001, and a significant interaction between Group and Task, *F*(2, 42) = 16.48, *p* < 0.001. Independent sample *t*-tests revealed that on the circle-drawing task athletes were significantly more precise than musicians, *t*(26) = 2.19, *p* = 0.03, and controls, *t*(30) = 7.00, *p* < 0.001. Musicians were significantly more precise than controls on the circle-drawing task, *t*(28) = 3.37, *p* = 0.002. On the finger-tapping task, musicians were significantly more precise than controls, *t*(28) = 2.23, *p* = 0.03, while athletes were not significantly more precise than controls, *t*(30) = 1.87, *p* = 0.07 (**Figure 1**). The performance of musicians and athletes was not significantly different, *t*(26) = 0.61, *p* = 0.54. We also analyzed the correlation in CV between tasks for each of the groups tested. Results indicated that the variability in the finger-tapping task was not significantly correlated with the variability in the circle-drawing task for any group: musicians (*p* = 0.55), athletes (*p* = 0.08), and controls (*p* = 0.11).

Slope analysis was next performed to determine whether group differences could be attributed to duration-dependent and/or duration-independent sources. Although the slope analysis was performed with just two target tempi, slope values were almost entirely positive (44 of 46 participants in circle drawing; 45 of 46 participants in finger tapping), indicating greater variance for longer durations (slower tempo), which is consistent with the model's assumptions. An ANOVA on slope values revealed main effects of Task, *F*(1, 42) = 21.01, *p* < 0.001, and Group, *F*(2, 42) = 8.70, *p* < 0.001, as well as a marginal Group × Task interaction, *F*(2, 42) = 2.96, *p* = 0.06. As shown in **Figure 2**, slope values closely mirrored those for CV. On the circle-drawing task, slope values for athletes (*M* = 0.009) and musicians (*M* = 0.008) were significantly lower than for controls (*M* = 0.02, *p* = 0.02). However, although the trend was in the same direction, unlike the CV values, athletes' and musicians' slope values did not differ from each other, *p* = 0.88. On the finger-tapping task, slope values were significantly lower for musicians (*M* = 0.002) than for athletes (*M* = 0.004, *p* = 0.03) or controls (*M* = 0.006, *p* = 0.006); as with the CV values, slope values for athletes and controls did not differ from each other (*p* = 0.33). Results of the correlation analysis on the slope values indicated no significant intra-individual correlations for any group. An ANOVA on the intercept values revealed no significant between-subjects effects or interactions. Taken together, the slope analysis indicates that group differences were duration-dependent, suggesting that they can be attributed to the functioning of a timing mechanism rather than to the motor constraints of the tasks.

One generally accepted indicator of the timing strategy adopted in a given task is found through the analysis of lag-one autocorrelation. Tasks that involve an event timing strategy exhibit lag-one autocorrelation values between −0.5 and 0, whereas tasks that involve emergent timing strategies are associated with a non-negative lag-one autocorrelation (Zelaznik and Rosenbaum, 2010; Delignieres and Torres, 2011). Our data were only partially consistent with expectations. One sample *t*-tests [with *p*-value set at 0.01 to control for Type I error (Zelaznik and Rosenbaum, 2010; Baer et al., 2013)] showed that group values were significantly negative in all conditions, which contrasts with the expectation of non-negative lag-one autocorrelations in the (emergent) circle-drawing task. A repeated measures ANOVA revealed that, as expected, lag-one autocorrelation values were significantly more negative in the finger-tapping condition (*M* = −0.14) than the circle-drawing condition (*M* = −0.11), *F*(1, 42) = 8.43, *p* < 0.001. Lag-one autocorrelation values, however, did not significantly differ between fast and slow tempo conditions, *F*(1, 42) = 0.19, *p* = 0.66, and the interaction between Task and Tempo also did not reach statistical significance (*p* = 0.24).

**FIGURE 2 | Slope for the circle-drawing and finger-tapping tasks per group in Experiment 1.** For each participant and for each task, slope values were obtained from a linear regression of detrended variance (averaged across trials) against squared target durations (600 and 800 ms<sup>2</sup>). Lower slope values indicate lower duration-dependent variability. Standard error bars are shown. Significant pairwise differences are marked with an asterisk (\**p* < 0.05).
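The range of −0.5 to 0 expected under event timing follows from the two-level model of Wing and Kristofferson (1973). The derivation below is a standard sketch of that model; the symbols $I_j$, $C_j$, $D_j$, $\sigma_C^2$ and $\sigma_D^2$ are notation introduced here for illustration and do not appear in the original text. Each inter-response interval $I_j$ is assumed to be a clock interval $C_j$ plus the difference between the motor delays $D_j$ and $D_{j-1}$ of the responses that bound it, with all clock intervals and motor delays mutually independent:

$$
\begin{aligned}
I_j &= C_j + D_j - D_{j-1}, \\
\operatorname{Var}(I_j) &= \sigma_C^2 + 2\sigma_D^2, \\
\operatorname{Cov}(I_j, I_{j+1}) &= -\sigma_D^2, \\
\rho(1) &= \frac{-\sigma_D^2}{\sigma_C^2 + 2\sigma_D^2}, \qquad -\tfrac{1}{2} \le \rho(1) \le 0 .
\end{aligned}
$$

The lower bound is approached when motor-delay variance dominates ($\sigma_C^2 \to 0$) and the upper bound when clock variance dominates ($\sigma_D^2 \to 0$).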

Comparing lag-one autocorrelation scores between the different groups, the analysis indicated that there was a significant Group × Task interaction, *F*(2, 42) = 6.81, *p* < 0.001. Examination of the percentage of individuals in each group and condition with significantly negative lag-one correlations, as assessed by one sample *t*-tests on each individual's data, revealed that 60% of athletes and 59% of controls adopted an event-timing strategy to perform the circle-drawing task, whereas 93% of athletes and 88% of the controls used event timing in the finger-tapping task. That is, the percentage of athletes and controls that tended to rely on an event-timing strategy was significantly larger (*p* < 0.001) for the finger-tapping task than for the circle-drawing task. Interestingly, lag-one autocorrelation values for musicians were not significantly different between tasks (*p* = 0.37), and musicians tended to adopt an event-timing strategy to perform both finger-tapping (63%) and circle-drawing tasks (85%).

#### **DISCUSSION**

The results of Experiment 1 demonstrated that movement-based experts were significantly more precise than controls on both timing tasks. Athletes were significantly more precise than controls in the circle-drawing task, and musicians were more precise than controls in the finger-tapping task (Repp, 2005; Repp and Doggett, 2007; Baer et al., 2013). This result suggests that expertise leads to enhanced timing precision in domain-related timing tasks and reinforces a dominant timing skill. This suggestion is supported by results showing that, whereas musicians were significantly more precise than controls in the finger-tapping task, the performance of elite athletes did not differ significantly from controls. This result indicates that the group differences observed in this study can be attributed specifically to the functioning of a timing mechanism rather than motor control in general.

A novel finding of this research is that music training was associated with enhanced precision on a continuous-movement task. Past research has suggested that formal music training only enhances precision of discrete movements but not continuous movements (Baer et al., 2013). It should be acknowledged that the use of a computer mouse to perform the tasks might have influenced the results. However, task constraints cannot readily account for the discrepancy between our results and those reported by Baer et al. (2013). The slope analysis suggests that our results are best explained by the functioning of a timing mechanism rather than by the constraints of the tasks. Hence, it can be speculated that group differences may contribute to the discrepancy of results in these studies, such as number of years of formal music training, instrument of expertise, amount of current involvement in musical activities, or age of commencement of training. Research is needed to assess the extent to which these factors contribute to the development of timing skills.

The finding that music training was associated with enhanced precision on the continuous-movement task is compatible with the hypothesis that the distinction between event and emergent timing is not as rigid as initially proposed, and that these mechanisms are not strictly tied to specific tasks such as tapping and circle drawing (Jantzen et al., 2002, 2004; Repp and Steinman, 2010; Studenka et al., 2012; Studenka, 2014). The hypothesis that the dissociation between event and emergent timing is not an all-or-nothing process (Repp and Steinman, 2010; Studenka et al., 2012) implies that the circumstances in which the different timing modes are employed are open for investigation. In Experiment 1, lag-one autocorrelation values were significantly negative in all conditions, suggesting that participants tended to adopt an event-timing strategy for both discrete and continuous tasks. More specifically, approximately 60% of participants in each group tended to adopt event-timing strategies to perform the circle-drawing task. Interestingly, whereas the percentage of athletes and controls that adopted event timing was higher for finger tapping than for circle drawing, the percentage of musicians that relied on event timing was not statistically different between tasks. One interpretation of this result is that years of formal music training prompted participants to rely on event-timing mechanisms to perform any timed movement, even when those movements are continuous (Studenka et al., 2012; Baer et al., 2013).

In Experiment 2, we further explored the hypothesis that movement-based expertise is associated with enhanced skill in discrete and continuous movement, while reinforcing one predominant timing mode. We also reexamined recent evidence that when participants are engaged in a timing task, the presence of salient feedback that defines the completion of cyclical time intervals elicits timing behavior consistent with event timing, even for continuous-movement tasks (Zelaznik and Rosenbaum, 2010; Studenka et al., 2012). Studenka et al. (2012) showed that the introduction of discrete tactile events presented at the completion of each cycle of movement induced event timing in a typically emergent timing task. This finding corroborated a previous study that suggested that event timing can be elicited by the insertion of regular cycles of auditory feedback (Zelaznik and Rosenbaum, 2010). To examine these issues, in Experiment 2 we tested whether the presence of auditory feedback elicits an event-timing strategy for a circle-drawing task among participants with intense musical or athletic training.

#### **EXPERIMENT 2**

#### **MATERIALS AND METHODS**

#### *Participants*

Thirty-one elite athletes (10 females) were recruited from Macquarie University through the Elite Athlete Scholarship Program. Athletes' average age was 21.06 years (*SD* = 3.69, range 18–32 years) and they had been involved in athletic training for an average of 8.31 years (*SD* = 5.55). Athletic training included sports that require discrete interactions with a ball or other projectile (e.g., kicking, catching, or repelling a ball in soccer, rugby, or volleyball) and sports that primarily involve continuous movements (e.g., strokes in swimming, cycling, rowing). None of the athletes were involved in the first experiment and none had had more than 2 years of musical training. Musicians (*n* = 17, 15 females) were recruited through the Departments of Music and Psychology at Macquarie University and local conservatories and universities. The average age of musicians was 20.72 years (*SD* = 3.52, range 18–29 years). Musicians were all currently involved in music activities for a minimum of 2 h/week and all had been involved in formal music training for at least 10 consecutive years (*M* = 11.94, *SD* = 2.68). None of the participants were involved in the previous study. Musicians played a range of instruments, including piano, guitar, and violin. Control participants (*n* = 10, 10 males) were postgraduate or professional computer programmers recruited through the Computer Science Department at Macquarie University. Control participants were on average 31.58 years old (*SD* = 7.21, range 22–49 years) and had an average of 10 years of training in their area of expertise (*SD* = 6.07). They reported that they had no previous formal athletic training and no significant past or current involvement in music. Because the control group consisted of professionals and postgraduate students, there was a significant group difference in mean age [*F*(2, 55) = 26.09, *p* < 0.001].
All participants provided informed consent and were debriefed about the goals of the experiment. Fifty-six participants were right-handed and two were left-handed, and all participants reported that they had no hearing or motor impairment. Participants received financial compensation for their participation.

#### *Materials, stimuli, procedure, and data analysis*

Stimulus presentation and data collection involved the same equipment as in Experiment 1, with the exception that participants completed the tasks using the laptop's touch pad in order to facilitate performance on the circle-drawing task. Procedures and data analysis followed the protocol established in Experiment 1. The main change was the introduction of auditory feedback at the continuation phase of the task. For each trial, after participants synchronized their movements (circle drawing or finger tapping) with an isochronous metronome for 18 pacing signals, the metronome stopped and participants were instructed to continue to produce 36 more movements at the tempo established by the metronome.

For the finger-tapping task, participants repeatedly tapped on the touch pad with their right index finger at the tempo set by the metronome. In the continuation phase, every tap triggered a feedback tone of 40 ms duration with a fundamental frequency of 480 Hz and at an intensity of 74 dB SPL. In the circle-drawing task, participants used their right index finger to move the mouse cursor repeatedly around an unfilled circle template, 5 cm in diameter, displayed on the screen. They traced the circle in time with the metronome and then continued the task in the absence of the external timing cue. Participants were told to pass the cursor over a crossing intersection at 270° of the circle in synchrony with the metronome. In the continuation phase, the auditory feedback was provided every time the cursor trajectory crossed the intersection.

## **RESULTS**

Participants were accurate in maintaining the target tempo during the continuation phases of trials [fast tempo (600 ms IOI): *M* = 613 (*SD* = 25); slow tempo (800 ms IOI): *M* = 791 (*SD* = 39)]. An analysis of mean IRI across the two tasks showed no significant group differences or group interactions. That is, all three groups maintained a similar overall tempo in the continuation phase of the timing tasks.

To measure timing precision, CV scores were averaged by task and tempo for each participant and entered into a mixed-design ANOVA with Task (circle drawing, finger tapping) and Tempo (slow, fast) as within-subject factors and Group (athletes, musicians, controls) as the between-subjects factor. The analysis revealed a significant main effect of Task, *F*(1, 55) = 4.60, *p* = 0.03, and a paired-samples *t*-test confirmed that across the three groups performance on the finger-tapping task (*M* = 0.05) was significantly more precise than on the circle-drawing task (*M* = 0.10), *t*(57) = 6.87, *p* < 0.001. There was also a main effect of Tempo, *F*(1, 55) = 35.61, *p* < 0.001, and a significant interaction between Task and Tempo, *F* = 17.69, *p* < 0.001. Results indicated that precision was significantly better for fast tempo (*M* = 0.05) than slow tempo (*M* = 0.11) in the finger-tapping task (*p* < 0.001). Participants were also significantly more precise in fast tempo (*M* = 0.06) than slow tempo (*M* = 0.09) in the circle-drawing task (*p* < 0.001).
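As a rough illustration of the precision measure, the sketch below computes a coefficient of variation (standard deviation divided by mean) from a series of inter-response intervals. The onset values are hypothetical, the function names are illustrative only, and the use of the sample standard deviation is an assumption rather than a detail confirmed by the text.

```python
from statistics import mean, stdev


def intervals_from_onsets(onsets_ms):
    """Convert response onset times (ms) into inter-response intervals (ms)."""
    return [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]


def coefficient_of_variation(iris_ms):
    """Return SD / mean of the intervals; lower values mean more precise timing.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    if len(iris_ms) < 2:
        raise ValueError("need at least two intervals")
    return stdev(iris_ms) / mean(iris_ms)


# Hypothetical onsets for a short continuation phase at a 600 ms target interval.
onsets = [0, 610, 1190, 1805, 2395, 3000]
iris = intervals_from_onsets(onsets)
cv = coefficient_of_variation(iris)
print(f"mean IRI = {mean(iris):.1f} ms, CV = {cv:.4f}")
```

Because the CV is scaled by the mean interval, it allows variability to be compared across the fast and slow tempo conditions.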

Between-subjects analysis indicated that there was a significant main effect of Group, *F*(2, 55) = 3.23, *p* = 0.04, and a marginally significant interaction between Task, Tempo, and Group, *F*(2, 55) = 2.81, *p* = 0.06. Analysis of the circle-drawing task showed that musicians were significantly more precise than controls (*p* = 0.01), but there was no significant difference between the performance of athletes and musicians (*p* = 0.24), or between athletes and controls (*p* = 0.07). A similar pattern was observed for the finger-tapping task, which also corroborated the results of Experiment 1: musicians were significantly more precise than controls (*p* = 0.04), but no other significant group differences were observed (athletes and controls, *p* = 0.14; musicians and athletes, *p* = 0.33; see **Figure 3**).

Different subgroups of athletes were included in the study (e.g., swimming, rowing, rugby, volleyball, squash, triathlon, ice hockey, martial arts, and others). We therefore examined whether performance differed between athletes specializing in sports that require discrete interactions with a ball or other projectile (e.g., kicking, catching, or repelling a ball in soccer, rugby, or volleyball) and athletes trained in continuous movements (e.g., strokes in swimming, cycling, rowing). Independent-samples *t*-tests indicated that there was no significant difference between athletes from sports based on different movement classes on either the circle-drawing task, *t*(29) = 1.40, *p* = 0.17, or the finger-tapping task, *t*(29) = 0.31, *p* = 0.75.

Slope analysis was next conducted to determine whether, as in Experiment 1, the group differences in CV could be isolated to duration-dependent variability. As shown in **Figure 4**, a close correspondence was again observed. As with the CV values, only a main effect of Group was significant, *F*(2, 55) = 3.79, *p* = 0.03. Slope values, like CV values, were lower for musicians (*M* = 0.002) than athletes (*M* = 0.005; *p* = 0.03) and controls (*M* = 0.005; *p* = 0.02), but did not differ between athletes and controls (*p* = 0.96). In contrast, an ANOVA on the intercept values revealed no significant between-subjects effects or interactions, and intraindividual correlations on the slope values were also not significant for any group. Thus, as in Experiment 1, group differences in total variability (as indexed by CV) could be isolated to duration-dependent variability (e.g., arising from noise in a central timekeeping mechanism) rather than duration-independent differences associated with the motor implementation of these tasks.
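The logic of a slope analysis can be sketched as follows. The exact procedure was specified in Experiment 1; this sketch assumes the conventional form used in the timing literature, in which interval variance is regressed on the squared mean interval, so that the slope indexes duration-dependent variability and the intercept indexes duration-independent variability. The data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_analysis(mean_iri_ms, var_iri_ms2):
    """Regress interval variance (ms^2) on squared mean interval (ms^2).

    Assumed model: variance = slope * duration^2 + intercept.
    """
    squared_durations = [t ** 2 for t in mean_iri_ms]
    return fit_line(squared_durations, var_iri_ms2)


# Hypothetical participant: one mean IRI and one variance per tempo condition.
mean_iris = [600.0, 800.0]      # fast and slow tempo, in ms
variances = [1000.0, 1700.0]    # in ms^2
slope, intercept = slope_analysis(mean_iris, variances)
print(f"slope = {slope:.4f}, intercept = {intercept:.1f} ms^2")
```

Under this assumed model, a smaller slope corresponds to less variability that grows with interval duration, which is the pattern reported for musicians.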

Previous research has suggested that the introduction of a perceptual event, such as tactile or auditory feedback, can strongly induce event-timing strategies (as indexed by negative lag-one autocorrelations) even for tasks performed with continuous, smoothly-produced movements (Zelaznik and Rosenbaum, 2010; Studenka et al., 2012). Our data were generally consistent with these findings. One-sample *t*-tests showed that group means were significantly negative in all conditions (see **Figure 5**). A mixed-design ANOVA with Task (circle drawing, finger tapping) and Tempo (slow, fast) as within-subject factors, and Group (athletes, musicians, controls) as the between-subjects factor, revealed that lag-one autocorrelation values were not significantly different between tasks [*F*(1, 55) = 0.21, *p* = 0.64]. Lag-one autocorrelation values were significantly different between fast and slow conditions in Experiment 2, *F*(1, 55) = 6.23, *p* = 0.01, and there was also a significant interaction between Task and Tempo (*p* = 0.002). Pairwise comparisons indicated that there was a significant difference between lag-one autocorrelation scores in the slow (*M* = −0.11) and fast conditions (*M* = −0.17) for the finger-tapping task (*p* = 0.001), but lag-one autocorrelation values did not significantly differ between the slow (*M* = −0.13) and fast conditions (*M* = −0.12) for the circle-drawing task (*p* = 0.70).
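To make the lag-one autocorrelation measure concrete, the sketch below computes it for two hypothetical interval series. Event timing predicts negative values because a delayed response lengthens one interval and shortens the next, producing a long-short alternation. The estimator shown is one common form, and the series and function name are illustrative assumptions rather than the authors' analysis code.

```python
def lag_one_autocorrelation(series):
    """Lag-one autocorrelation of a series of inter-response intervals.

    Uses the common estimator: sum of products of adjacent deviations from
    the mean, divided by the sum of squared deviations.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three intervals")
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((series[i] - m) * (series[i + 1] - m) for i in range(n - 1))
    return num / denom


# Hypothetical series (ms): alternating long/short intervals vs. gradual drift.
alternating = [590, 610, 590, 610, 590, 610]
drifting = [580, 590, 600, 610, 620, 630]

r_alternating = lag_one_autocorrelation(alternating)
r_drifting = lag_one_autocorrelation(drifting)
print(f"alternating: {r_alternating:.3f}, drifting: {r_drifting:.3f}")
```

The alternating series yields a negative value, the signature associated with event timing, whereas the drifting series yields a positive value.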

An examination of the percentage of individuals in each group and condition with significantly negative lag-one autocorrelations, as assessed by one-sample *t*-tests on each individual's data, revealed that 90% of participants in the control group adopted an event-timing strategy to perform the circle-drawing task in Experiment 2, and 60% of participants in this group used event timing to perform the finger-tapping task. Among movement-based experts, the percentage of individuals that adopted an event-timing strategy to perform the circle-drawing and finger-tapping tasks was similar for musicians (76%) and athletes (68%). ANOVA confirmed that there was no interaction between Group (musicians, athletes, controls) and Task (finger tapping, circle drawing), *F*(2, 55) = 0.98, *p* = 0.38. Taken together, these results suggest that the majority of participants adopted an event-timing strategy to perform both tasks in Experiment 2 (**Table 1**).

To test whether the presence of auditory feedback defining the completion of cyclical time intervals influenced the timing strategy adopted, we compared the percentage of individuals in each group and condition that had significantly negative lag-one autocorrelations in Experiments 1 and 2 (see **Table 1**). Examination of **Table 1** suggests that the use of event timing depended on the condition and expertise of the participant. First, for the circle-drawing task, a smaller percentage of control (non-expert) participants used an event-timing strategy when there was no auditory feedback (59%) than when there was auditory feedback (90%). Second, when there was no auditory feedback, musicians were more likely to use an event-timing strategy (85%) than control participants (59%).

**Table 1 | Percentage of individuals with significantly negative lag-one autocorrelation values for each group and condition in Experiment 1 (no auditory feedback) and Experiment 2 (with auditory feedback), and Event Timing Index (ETI: the percentage of individuals with negative lag-one autocorrelation/meanCV).**


However, interpreting the raw percentage of negative autocorrelation values is complicated by the fact that these values not only reflect a tendency to adopt an event timing strategy; they also reflect timing variability (van Beers et al., 2013). Thus, changes in the percentage of negative (lag one) autocorrelation values may reflect a change in the strategies adopted by participants; a change in the average variability of timing; or both. For this reason, it is useful to consider the percentage of negative autocorrelation values *relative* to the average CV of participants in each condition. Thus, we defined the *Event Timing Index* (ETI) as the percentage of participants with negative lag-one autocorrelations divided by the CV averaged across these same participants. This normalized measure of event timing permits a meaningful comparison between conditions.
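The ETI definition above can be sketched as a short calculation. Two points in this sketch are assumptions rather than details confirmed by the text: the mean CV is taken over the participants with negative lag-one autocorrelations (the literal reading of "these same participants"; averaging over the whole condition is the other plausible reading), and the CV is expressed as a percentage, which appears consistent with the magnitude of the reported ETI values. The data and function name are hypothetical.

```python
def event_timing_index(is_negative, cvs, cv_as_percent=True):
    """Percentage of participants with negative lag-one autocorrelation,
    divided by the mean CV of those same participants.

    is_negative: one bool per participant (True = significantly negative).
    cvs: one CV per participant, expressed as a proportion (e.g., 0.10).
    cv_as_percent: assumption; rescales the mean CV to percentage units.
    """
    if not is_negative or len(is_negative) != len(cvs):
        raise ValueError("need one flag and one CV per participant")
    selected = [cv for flag, cv in zip(is_negative, cvs) if flag]
    if not selected:
        return 0.0
    percent_negative = 100.0 * len(selected) / len(is_negative)
    mean_cv = sum(selected) / len(selected)
    if cv_as_percent:
        mean_cv *= 100.0
    return percent_negative / mean_cv


# Hypothetical condition: 10 participants, 9 with negative autocorrelations,
# each with a CV of 0.10.
flags = [True] * 9 + [False]
cvs = [0.10] * 10
eti = event_timing_index(flags, cvs)
print(f"ETI = {eti:.2f}")
```

Dividing by the mean CV means that two conditions with the same percentage of negative autocorrelations receive different index values if timing variability differs between them.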

**Table 1** displays ETI values in parentheses. These values suggest that across groups and experiments, an event-timing strategy was used significantly less for circle drawing (mean ETI = 6.6) than for finger tapping (mean ETI = 13.0). For circle drawing, there was a greater tendency for control participants to adopt an event timing strategy when there was auditory feedback (ETI = 8.6) than when there was no auditory feedback (ETI = 3.3). This finding suggests that, for circle drawing, the presence of a salient perceptual event defining the completion of cyclical time intervals influenced the timing strategy adopted by non-experts.

Musicians, however, exhibited a comparatively strong tendency to employ event timing when performing the circle-drawing task regardless of whether there was or was not auditory feedback (ETI = 8.3 and 8.1, respectively). For finger tapping, the tendency to adopt an event-timing strategy was similar in the two experiments (mean ETI in Exp 1 = 13.1; mean ETI in Exp 2 = 13.0). In short, introducing a salient perceptual event demarcating the completion of each movement cycle encouraged an event-timing strategy for circle drawing, but not finger tapping. One interpretation of this finding is that there was already a strong tendency to employ an event-timing strategy for finger tapping, so the inclusion of auditory feedback had no additional impact on this tendency.

### **DISCUSSION**

The findings of Experiment 2 confirmed that participants performed significantly more precisely in the finger-tapping task than in the circle-drawing task. The results also indicated that precision was significantly better for the fast-tempo condition than the slow-tempo condition in both finger-tapping and circle-drawing tasks. Previous studies have also reported significant interactions between task precision and tempo, suggesting that the timing mechanism adopted is affected by the rate of timed movements (Huys et al., 2008; Repp, 2008; Zelaznik and Rosenbaum, 2010). The slope analysis also suggests that the differences in total variability can be attributed not to differences associated with the motor implementation of the tasks, but to duration-dependent variability (e.g., in event timing, a central clock mechanism accumulates error as the interval duration increases).

The results of Experiment 2 also confirmed that musicians were significantly more precise than controls at both finger tapping and circle drawing. It can be suggested that music training may engage and refine both discrete and continuous movements. One explanation for this result is that both event and emergent timing are implicated in the accurate timing achieved by elite performers, and that music training leads to the enhanced operation of both timing modes (Jantzen et al., 2002, 2004; Repp and Steinman, 2010; Studenka et al., 2012; Studenka, 2014). However, it is also possible that the superior performance by musicians on both discrete and continuous tasks could be largely attributable to an enhanced central clock mechanism, as the results of the lag-one autocorrelation analysis suggested that the vast majority of musicians tended to employ an event-timing strategy to perform both discrete and continuous tasks.

In Experiment 2, the performance of athletes in the circle-drawing task did not differ significantly from that of controls, in contrast to the results of Experiment 1. This discrepancy could be related to the fact that 90% of controls and 68% of athletes adopted an event-timing strategy to perform the continuous-movement task when auditory feedback was available (Experiment 2), whereas 59% of controls and 60% of athletes used event timing to perform the circle-drawing task when auditory feedback was not available (Experiment 1). In other words, in the presence of auditory feedback there was a greater tendency to adopt an event-timing strategy to perform the circle-drawing task, especially for participants without movement-based expertise. This finding corroborates previous evidence that the introduction of a perceptual event, such as tactile or auditory feedback, induces an event-timing strategy even for tasks performed with continuous movements (Zelaznik and Rosenbaum, 2010; Studenka et al., 2012). This tendency may explain why the precision of continuous movements increases when auditory feedback is available (Zelaznik and Rosenbaum, 2010). More generally, it is known that the presence of feedback significantly enhances timing precision (Aschersleben et al., 2001; Aschersleben, 2002; Stenneken et al., 2006; Goebl and Palmer, 2009), and event timing is preferred in synchronization tasks, given that discrete actions are less variable and quicker to adjust after perturbations in the sensory input (Elliot et al., 2009).

Taken together, the results of Experiment 2 corroborate findings obtained in Experiment 1 that movement-based expertise significantly improves timing skills, and that extensive training in music leads to enhanced precision for both discrete and continuous movements. The findings also support the hypothesis that event and emergent timing are not uniquely tied to specific types of movements but can be influenced by expertise (Jantzen et al., 2002, 2004; Zelaznik and Rosenbaum, 2010), the presence of feedback (Studenka and Zelaznik, 2011), and movement speed (Huys et al., 2008).

## **GENERAL DISCUSSION**

This investigation sought to examine the effects of expertise and training on the precision of timed movements. The results are compatible with the view that movement-based training significantly enhances the precision of timing skills, and that this effect depends on the nature of the training. It was also observed that expertise is an important predictor of the timing mechanism that is engaged during timed actions. These findings help to clarify the distinction between event and emergent timing mechanisms by showing that expertise and training can influence the timing mode that is employed in a particular movement-based task.

Experiment 1 demonstrated that athletes were significantly more precise in the production of continuous rhythmic movements, whereas musicians were significantly more precise in discrete rhythmic movements in the absence of auditory feedback. These results indicate that intense training and expertise can help to improve timing precision, and corroborate the initial hypothesis that music performance relies predominantly on event timing (Repp and Doggett, 2007; Baer et al., 2013; Albrecht et al., 2014), whereas athletic activities tend to employ more predominantly smooth and continuous movements based on emergent timing (Kelso et al., 1981; Sternad et al., 2000; Jaitner et al., 2001; Jantzen et al., 2008; Balague et al., 2013). Thus, hours of daily practice involving a predominant type of movement (i.e., discrete or continuous) may reinforce one dominant timing mode. This finding is particularly relevant to the development of educational and rehabilitation programs that could greatly benefit from activities targeting specific classes of movements.

It is important to state, however, that actions can be implemented in different ways (e.g., walking vs. marching) and may often engage multiple mechanisms simultaneously. For example, playing the piano requires not only precise timing of the pianist's keystrokes but also a fluid transition of the hand across the piano keys. Rowing or swinging a badminton racquet, on the other hand, are continuous actions in the sense that the movement is smooth and uninterrupted; however, they are discrete insofar as movements are segmented by perceptual events (e.g., contact of the oar with the water, and the racquet with the shuttlecock). Therefore, whereas it is possible to isolate discrete and continuous movements in the laboratory for experimental purposes, performances often require both classes of rhythmic actions (Sternad et al., 2000; Hogan and Sternad, 2007; Sternad, 2008; Repp and Steinman, 2010; Studenka, 2014). The results of the present study do not support the idea that musical and athletic skill are exclusively associated with event timing and emergent timing, respectively. On the contrary, our findings suggest that to accurately perform timing tasks at a high skill level, experts may rely on both timing modes, although one timing mechanism is often dominant. Therefore, an essential skill in movement-based expertise is to smoothly transition between movements of different classes.

Our findings are consistent with the idea that event and emergent timing mechanisms are not strictly tied to specific tasks (Jantzen et al., 2002, 2004; Repp and Steinman, 2010; Studenka et al., 2012; Studenka, 2014). First, musicians were not only significantly more precise than controls in the finger-tapping task but also in the circle-drawing task, suggesting that music training refines both discrete and continuous rhythmic movements. Second, lag-one autocorrelation values were significantly negative in all conditions in Experiment 1, suggesting that participants tended to adopt an event-timing strategy to perform both discrete and continuous tasks, even when no salient perceptual event was present.

The analysis of the Event Timing Index (ETI) allowed us to further investigate the effect of auditory feedback on the timing strategy adopted to perform continuous movements. These results suggested that the percentage of musicians that used an event-timing strategy to complete the circle-drawing task did not change significantly when auditory feedback was provided at the end of each movement cycle. Years of formal music training might have prompted participants to rely on event-timing mechanisms to complete a continuous-movement task (Studenka et al., 2012; Baer et al., 2013; van Beers et al., 2013). These findings support the suggestion that expertise and training are important predictors of the timing mechanism engaged in maintaining precise timed actions. On the other hand, the percentage of participants who adopted event timing to complete the circle-drawing task significantly increased when auditory feedback was present, especially for control participants. For this task, 59% of control participants tended to use an event-timing strategy when no auditory feedback was provided (Experiment 1), but 90% of the control group adopted an event-timing strategy when auditory feedback was provided (Experiment 2). This finding indicates that salient events (e.g., auditory, tactile) signaling the completion of a movement cycle can be used to generate an internal representation of the time intervals to be produced based on clock-like mechanisms (Zelaznik and Rosenbaum, 2010; Studenka et al., 2012). It is known that sensory feedback enhances timing accuracy (Aschersleben et al., 2001; Rabin and Gordon, 2004; Repp, 2005; Goebl and Palmer, 2009; Gray, 2009). However, it is important to note that the manipulation of auditory feedback is possible in experimental conditions, but in real-life circumstances multiple sources of feedback may be used to monitor and refine the accuracy and precision of timed actions (Aschersleben et al., 2001).
Future studies are needed to examine the role of event and emergent timing mechanisms in the control of discrete and continuous rhythmic movements in ecologically valid conditions. Such research would shed light on the relative importance of these two timing strategies for the production of accurately timed movements in real-life circumstances.

It should be acknowledged that certain methodological features of our investigation limit the conclusions that can be drawn. First, we observed that lag-one autocorrelation values were significantly negative in all conditions in both experiments. This finding suggests that participants tended to adopt an event-timing strategy to perform both discrete and continuous tasks, even when no salient perceptual event was present. One explanation for this finding is that participants took part in the tapping task before the circle-drawing task, as in previous studies (Zelaznik and Rosenbaum, 2010; Studenka et al., 2012), and task order may have significantly influenced the timing strategy adopted by some participants. Previous research has suggested that practicing one timing task reinforces a particular timing strategy, which may then persist over time and over tasks (Jantzen et al., 2002, 2004; Studenka et al., 2012). Interestingly, this carry-over effect may have been stronger for some participants than others. Nonetheless, this possibility also corroborates a central conclusion of the study: that timing strategies are not strictly tied to specific tasks but may be influenced by factors such as task order, expertise and training, and the presence of salient perceptual events.

Second, it is important to acknowledge that different groups of participants were used in the two experiments, preventing a within-subject comparison between the results of these experiments. For this reason, it is difficult to estimate the precise effect of auditory feedback on the timing strategy adopted. To overcome this limitation, we developed the "Event Timing Index (ETI)." This index is the percentage of participants with negative lag-one autocorrelations divided by the CV averaged across these same participants. The results of this analysis strongly suggest that auditory feedback influenced the timing strategy adopted for circle drawing but not finger tapping. Further research is required to validate this conclusion.

In summary, expertise in sports and music is significantly associated with enhanced precision of timing skills, but this effect depends on the nature of the expertise and the presence of auditory feedback. It should be emphasized that one interpretation of these findings is that individuals with superior timing precision gravitated to these pursuits. However, it is likely that expertise and training further helped to engage and refine mechanisms associated with skilled timing. Expertise was also an important predictor of the type of timing mechanism that individuals employed for both discrete and continuous movements, which casts further doubt on the longstanding assumption that event and emergent timing mechanisms are strictly tied to discrete and continuous movement tasks, respectively.

## **AUTHOR CONTRIBUTIONS**

Thenille Braun Janzen was the major contributor to this coauthored paper and took primary responsibility for recruiting participants, conducting the experiment, analysing and interpreting the data and preparing the manuscript. William Forde Thompson, Paolo Ammirante, and Ronald Ranvaud provided input on one or more of the experimental design, data analysis and interpretation, and manuscript preparation.

## **ACKNOWLEDGMENTS**

This research was supported in part by grants from the National Council of Technological and Scientific Development (CNPq Brazil) and a grant awarded to William Forde Thompson by the Centre for Elite Performance, Expertise, and Training at Macquarie University. The authors are grateful to Glenn Warry of the Macquarie University Sport and Aquatic Centre for his assistance in recruiting athletes, and to Alex Chilvers for research assistance.

### **REFERENCES**


Stevens, L. T. (1886). On the time sense. *Mind* 11, 393–404.

Stenneken, P., Prinz, W., Cole, J., Paillard, J., and Aschersleben, G. (2006). The effect of sensory feedback on the timing of movements: evidence from deafferented patients. *Brain Res*. 1084, 123–131. doi: 10.1016/j.brainres.2006.02.057


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 June 2014; accepted: 02 December 2014; published online: 23 December 2014.*

*Citation: Braun Janzen T, Thompson WF, Ammirante P and Ranvaud R (2014) Timing skills and expertise: discrete and continuous timed movements among musicians and athletes. Front. Psychol. 5:1482. doi: 10.3389/fpsyg.2014.01482*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Braun Janzen, Thompson, Ammirante and Ranvaud. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Trait-based cue utilization and initial skill acquisition: implications for models of the progression to expertise

## *Mark W. Wiggins\*, Sue Brouwers, Joel Davies and Thomas Loveday*

*Centre for Elite Performance, Expertise, and Training, Macquarie University, North Ryde, NSW, Australia*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Kylie Ann Steel, University of Western Sydney, Australia; Alice F. Healy, University of Colorado, USA*

#### *\*Correspondence:*

*Mark W. Wiggins, Centre for Elite Performance, Expertise, and Training, Macquarie University, Balaclava Road, North Ryde, NSW 2109, Australia e-mail: mark.wiggins@mq.edu.au*

The primary aim of this study was to examine the role of cue utilization in the initial acquisition of psycho-motor skills. Two experiments were undertaken, the first of which examined the relationship between cue utilization typologies and levels of accuracy following four simulated, power-off landing trials in a light aircraft simulator. The results indicated that higher levels of cue utilization were associated with a greater level of landing accuracy following training exposure. In the second study, participants' levels of cue utilization were assessed prior to two 15 min periods during which they practiced take-offs and landings using a simulated unmanned aerial vehicle (UAV). Consistent with Study 1, the outcomes of Study 2 revealed a statistically significant relationship between levels of cue utilization and both the number of trials to criterion on the take-off task and the proportion of successful trials during take-off and landing. In combination, the results suggest that the capacity for the acquisition and the subsequent utilization of cues is an important predictor of skill acquisition, particularly during the initial stages of the process. The implications for theory and applied practice are discussed.

**Keywords: skill acquisition, cue utilization, expertise, training**

## **INTRODUCTION**

Expert performance across a range of environments, including sport, medical diagnosis, and financial decision making, is characterized by rapid, accurate responses, even in highly complex situations (Farrington-Darby and Wilson, 2006; Müller et al., 2006; Sherbino et al., 2012). Since this level of performance is generally acquired over extensive periods of exposure, there is an assumption that the capacity for sustained, high levels of performance derives from the gradual development of highly specialized associations or routines that are subsequently retained in memory (Ericsson and Lehmann, 1996; Ericsson and Towne, 2010). Where these routines are available, they are activated rapidly and, in some cases, in the absence of conscious processing (Salthouse, 1991; Finkbeiner and Forster, 2007).

One of the advantages associated with the availability of highly specialized routines is that their activation imposes relatively fewer demands on working memory resources (Chung and Byrne, 2008). This enables experts to undertake multiple tasks simultaneously and with relatively consistent levels of accuracy (Houmourtzoglou et al., 1998; Boot et al., 2008). However, it also ensures that cognitive resources are available to enable both the acquisition of additional skills, and the refinement of those skills that have already been acquired.

The necessity for cognitive resources to facilitate the acquisition of cognitive skills reflects the theoretically important role of working memory in enabling the association between environmental features and objects or events. For example, in the case of production systems, Anderson et al. (2004) proposed that a production can only emerge when the condition and action statements are resident simultaneously in working memory. In the early stages of skill acquisition, the process of problem resolution involves the recall of declarative knowledge from long term into working memory, thereby occupying what is a finite resource. The development of a production or condition-action statement obviates this demand for declarative knowledge and the faster that this process occurs, the greater the capacity to allocate the residual resources to other tasks, thereby potentially improving the rate of skill acquisition.

The proposition that efficiencies in information processing can be gained through a parsimonious association between environmental features and events or objects is a consistent theme in various models of skill acquisition, as well as explanations of the superior performance of experts (Ericsson and Kintsch, 1995). Notions of bounded rationality, automated processing, and instance processing all presuppose tightly arranged associations to explain the rapid recognition and response to situations that are characteristic of expertise (Logan, 2002; Campitelli et al., 2007; Pachur and Marinello, 2013).

Klein (2011) suggests that the value of the associations in memory lies in their capacity to enable an operator to quickly classify or diagnose a situation. This process triggers an associated response from memory, and thereby facilitates a relatively rapid response. Where Anderson et al. (2004) would refer to this process as the activation of a condition–action statement in the form of a production, Gigerenzer and Gaissmaier (2011), Klein et al. (2010), and Brunswik (1955) suggest the application of cue-based associations. Cues constitute relationships among features and events or objects that are resident in the environment (Wiggins, 2006). They are highly specialized and targeted, and they enable the rapid recognition and response to particular situations.

The difficulty associated with the acquisition of associations between phenomena is that their coexistence does not necessarily imply a causal relationship (Holyoak and Cheng, 2011). A "storming" process is necessary during the acquisition of skilled performance whereby associations are quickly tested, discarded, or revised to ensure that they are both as accurate and as parsimonious as possible (Wiggins, 2012). However, an inevitable part of this process of storming is the commission of errors, whereby inappropriate cues may be triggered, or the cue associations themselves may be overly general or incorrect, thereby leading to inefficiencies or delays in the acquisition of skilled performance (Bridger and Mecklinger, 2014).

At a fundamental level, the capacity to acquire and subsequently revise cue-based relationships requires a cognitive strategy involving the identification of salient features in the environment, the perception of associations between features and events/objects, the retention of these associations in memory and, finally, the recognition of those situations during which the application of cues applies (Wiggins, 2014). The efficiency with which this process occurs will determine the rate of skill acquisition. However, what remains unclear is whether the capacity for the development and application of cues is determined by a particular context, or whether it constitutes an underlying trait so that the rate of cue acquisition within one context predicts performance in a related domain.

The aim of the present study was to consider the relationship between cue utilization in the context of motor vehicle hazard detection and way finding, and skill acquisition in learning to land a simulated aircraft and learning to take off and land a line-of-sight unmanned aerial vehicle (UAV). If skilled psycho-motor performance is dependent upon the availability of feature–event/object associations in memory in the form of cues, the rate at which cues are acquired within a given period should predict the rate of skill acquisition. Moreover, if cue acquisition is a trait, then the rate of cue utilization evident in one task should reflect the rate of skill acquisition in tasks that demand similar capabilities.

Although the relationship between cue acquisition and skill acquisition has yet to be examined empirically, some evidence for a relationship can be drawn from Small et al. (2014) who were investigating the relationship between cue utilization and performance during a novel, short vigilance task. Of particular interest in the context of the present study was the observation that participants' performance differed in the rate at which they became familiar with a novel representation of a domain-related task. This difference in the rate of skill acquisition was not explained by years of industry-related experience, suggesting that the acquisition of cues may constitute an underlying trait.

### **STUDY 1**

Study 1 was designed to examine the relationship between a composite measure of cue utilization in the context of motor vehicle hazard and way-finding, and performance in learning to land a simulated aircraft as close as possible to a runway target. Since cue utilization represents the outcome of the process of cue acquisition, it was important to control for domain-related experience. To that extent, drivers' years of experience were recorded, and these data were employed as a covariate to control statistically for exposure to the domain. It was hypothesized that, controlling for driving experience, participants who recorded greater levels of cue utilization would land closer to the runway target following four exposure trials.

## **METHOD**

## *Participants*

A total of 51 university students (25 male and 26 female) were recruited for the study. These participants comprised first- and second-year psychology students who each received 0.5% course credit for their participation. Their ages ranged from 18 to 22 years (*M* = 20.27, SD = 1.601). The inclusion criteria comprised licensed drivers who had never previously flown a flight simulator.

#### *Instruments*

A demographic questionnaire required participants to indicate their age, sex, years of driving experience, weekly driving frequency, weekly frequency of video-game play, and their experience operating a flight simulator. Cue utilization was assessed using the EXPERT Intensive Skills Evaluation (EXPERTise; Wiggins et al., 2010) situation judgement test (SJT). Designed to measure performance on several cue-based processing and problem solving tasks, it provides a composite assessment of domain-related cue utilization.

EXPERTise incorporates experimental tasks that have separately and collectively been associated with differences in operational performance. They include a paired association task, designed to establish the availability of feature–event/object relationships in the form of cues, a feature identification task, designed to assess feature priming, and a feature discrimination task, designed to test the precision of cue-based associations in memory. Each task yields a distinct but complementary assessment of cue utilization that, in combination, provides an overall assessment of the utilization of cues in memory.

In the paired association task, participants were presented with two different terms (feature–event/object) that appeared adjacent to one another for 1800 ms. Using a six-point Likert scale, participants were asked to indicate the extent to which they considered the two words related. Examples included related terms such as "journey time" (event) with "car speed" (feature) and relatively less related terms such as "red traffic light" (feature) and "freeway" (object). Higher levels of cue utilization were associated with a greater variance in the perceived relatedness of terms (Ackerman and Rathburn, 1984; Morrison et al., 2013). The use of variance as a measure of cue utilization in this context is based on the assumption that, through experience, the associations between cues are better refined and thereby lead to a greater level of dichotomy in perceptions of the association between features and events/objects. This measure of performance has been used successfully to differentiate experts from non-experts in a range of contexts (Witteman et al., 2012).
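The logic of using rating variance as an index can be illustrated with a minimal sketch. The ratings below are hypothetical, and the choice of population variance is an assumption, since the text does not state which variance estimator EXPERTise uses.

```python
from statistics import pvariance

def rating_variance(ratings):
    """Population variance of one participant's relatedness ratings.

    Assumption: the source does not say whether a population or sample
    variance is used; population variance is chosen here for illustration.
    """
    return pvariance(ratings)

# Hypothetical ratings on a six-point scale (1 = unrelated, 6 = strongly related).
discriminating = [1, 6, 1, 6, 2, 5]    # sharp distinction between related and unrelated pairs
undifferentiated = [3, 4, 3, 4, 3, 4]  # most pairs rated near the midpoint

print(rating_variance(discriminating))
print(rating_variance(undifferentiated))
```

Under this reading, a participant who separates related from unrelated pairs more sharply produces the larger variance.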

For the feature identification task, participants were tasked with locating, as quickly as possible, a ball displayed at different locations within a static, complex driving scene; a process similar to measures of field dependence (Goodenough, 1976). In this case, lower response latencies reflect a greater capacity to extract key features from a complex array (Wiggins, 2014).

In the feature discrimination task, participants were provided with a hypothetical scenario ("After driving for some time, you become aware that you have lost your bearings... Surveying your surrounds, you see several cars driving with surfboards on their roofs. There are also several beachgoers and shoppers walking nearby.... In the distance, you can see high-rise apartment blocks as well as palm trees. There are also several street signs visible..."). After making a decision as to their most likely response under the circumstances, participants rated the importance of different features in the formulation of their response. In the driving version of EXPERTise, 14 features were presented during the feature discrimination task, to which participants responded using a 10 point Likert Scale, with higher ratings equating to a greater level of importance in the decision process. In this task, higher levels of cue utilization were reflected in higher variance in the ratings of the perceived relevance of features (Weiss and Shanteau, 2003; Witteman et al., 2012).

The construct validity of EXPERTise has been demonstrated in a number of different domains, whereby typologies that were formed on the basis of performance across the EXPERTise tasks differentiated both simulated and actual performance in the workplace (Loveday et al., 2013b,c; Loveday and Wiggins, 2014; Wiggins et al., 2014). The test–retest reliability of the typologies has been demonstrated in the context of power control operators at six monthly intervals, κ = *0.59* (Loveday et al., 2013a).

### *Flight simulator*

A Redbird FMX-1000 flight simulator incorporating three degrees of freedom was used to position a simulated Cessna 172 at an altitude of 1000 feet and a distance of 1.5 km from the runway. Two large white bars positioned on the runway represented the target landing point. After landing, the aircraft was repositioned and this process was repeated for four trials.

### *Procedure*

The participants completed the tasks individually, and all measures were presented sequentially in the same sitting. Having completed the demographic questionnaire, participants were directed to the on-line version of EXPERTise and they were asked to follow the instructions that were displayed on the computer screen. Once completed, participants entered the flight simulator and were briefed on the basic controls and the aim of the exercise. They then completed four trials, attempting to guide the aircraft to a landing on the runway.

## **RESULTS**

#### *Cue utilization typologies*

Prior to detailed analysis, it was necessary to identify the cue utilization typologies that corresponded to relatively higher and lower levels of cue utilization. These typologies were based on the outcomes of the EXPERTise tasks and were employed in this case due to the correspondence with previous methodological approaches to the application of EXPERTise-related outcomes (Loveday et al., 2013b,c; Loveday and Wiggins, 2014; Wiggins et al., 2014). The calculation of typologies began with the aggregation of the responses within the tasks, the calculation of *z*-scores, and a cluster analysis to identify whether two meaningful typologies could be established. In the present study, two typologies were identified with centroids that corresponded to: (a) a lower response latency in the feature identification task, and higher variance in the paired association and feature discrimination tasks (higher cue utilization), and (b) a greater response latency in the feature identification task, and lower variance in the paired association and feature discrimination tasks (lower cue utilization). The cluster analysis classified 15 participants in the higher cue utilization typology and 33 participants in the lower cue utilization typology (see **Table 1**). The data for three participants were excluded due to missing data.
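The sequence described above (standardize each task score, then cluster into two typologies) can be sketched as follows. The clustering algorithm is not named in the text, so a small two-cluster k-means is used here purely as an assumption, and all participant values are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of task scores to mean 0 and SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def two_means(points, iterations=20):
    """Tiny k-means with k = 2 and a deterministic start (illustrative only)."""
    # Start from the lexicographically smallest and largest points.
    centroids = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to the nearest centroid (squared Euclidean distance).
        labels = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(mean(col) for col in zip(*members))
    return labels, centroids

# Hypothetical scores for six participants.
latency = [900, 950, 1000, 1500, 1550, 1600]  # feature identification (ms)
pair_var = [3.0, 2.8, 3.2, 1.0, 1.2, 0.8]     # paired association variance
disc_var = [6.0, 5.5, 6.5, 2.0, 2.5, 1.5]     # feature discrimination variance

points = list(zip(z_scores(latency), z_scores(pair_var), z_scores(disc_var)))
labels, centroids = two_means(points)
print(labels)
print(centroids)
```

In this toy data set, the cluster whose centroid has a negative latency *z*-score and positive variance *z*-scores corresponds to the higher cue utilization typology as defined in the text.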

## *Landing performance*

Flight performance data comprised the primary dependent variable in the current study. Specifically, the flight task required participants to land the aircraft at a specific target located on the runway. The difference between a participant's landing position (longitude and latitude) and the ideal landing location (longitude and latitude) formed the "flight performance" variable in the current study. This was measured in kilometers and calculated using a distance calculator for compass coordinates. Lower scores in flight performance represent a shorter distance from the landing target, and thus, greater accuracy during the flight task.
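A distance of this kind can be computed from two pairs of coordinates in several ways. The sketch below uses the haversine (great-circle) formula, which is an assumption: the text says only that a distance calculator for compass coordinates was used. The function name and the Earth radius constant are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# One degree of latitude along a meridian, as a sanity check.
print(haversine_km(0.0, 0.0, 1.0, 0.0))
```

Passing a participant's landing coordinates and the target coordinates would return the error distance in kilometers, with smaller values indicating a more accurate landing.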

Five landing trials were conducted by each participant. The Shapiro–Wilks normality statistic for each landing trial was non-significant (*p* > 0.05) and inspection of the *P–P* plots indicated normal distributions. The landing trials were analyzed via repeated measures and *post hoc* pairwise comparisons. A significant main effect for landing performance, *F*(4,144) = 12.83, *p* < 0.001, suggested that landing accuracy differed over trials. *Post hoc* comparisons and inspection of means revealed a pattern of steady and statistically significant improvement in landing accuracy, until the final trial (fifth landing). Landing accuracy in the first trial (*M* = 0.66, *SD* = 0.18, *SE* = 0.030) significantly improved in the second (*M* = 0.58, *SD* = 0.19, *SE* = 0.032, *p* = 0.011), and performance in the second trial significantly improved in the third trial (*M* = 0.51, *SD* = 0.21, *SE* = 0.034, *p* < 0.001). Compared to the third trial, the fourth showed continued improvement, with a reduced mean distance from the target (*M* = 0.48, *SD* = 0.20, *SE* = 0.033). However, the fourth trial was not significantly different from the preceding trial (*p* = 1.00). The final trial indicated a *decrease* in performance and increased error (*M* = 0.49, *SD* = 0.26, *SE* = 0.043). Trial five was also not significantly different from trials two, three, or four. Taken together, this pattern suggests that learning occurred, and that performance in the final trial may have been affected by fatigue. For this reason, the landing performance for trial four was selected as the dependent variable in subsequent analyses, since it was at this point that optimal performance, following learning, had been achieved.

**Table 1 | Cluster centroids for the EXPERTise task scores for Study 1.**

#### *Cue utilization and landing performance*

A univariate analysis of covariance (ANCOVA) was used to test the relationship between cue utilization and landing performance, and comprised the cue utilization typology as the independent variable (higher and lower), landing performance (as measured by the distance from the target on the fourth trial) as the dependent variable, and years of driving experience as a covariate. The results revealed a statistically significant main effect for cue utilization typology, *F*(1,45) = 4.18, *p* = 0.047. An inspection of the mean landing performance of the clusters indicates that participants with higher levels of cue utilization (controlling for driving experience) landed the aircraft closer to target (*Mean distance* = 0.37, *SD* = 0.23) than participants with lower levels of cue utilization (*Mean distance* = 0.53, *SD* = 0.19). This suggests that a relationship exists between cue acquisition and the acquisition of skilled performance in a related psycho-motor task.

Using hierarchical regression with change statistics, the variance in landing performance attributable to cluster was 11.5% and when driving experience was included in the model, the total proportion of the variance explained increased to 27%. This change represents an increase of 24%, and was statistically significant, *F*(1,45) = 9.49, *p* = 0.005. Partial correlations for driving experience and cluster revealed that the variance uniquely attributed to cluster was 8.5% and the variance uniquely attributed to driving experience was 17.4%.

## **STUDY 2**

Study 2 was designed to extend the outcomes of Study 1 by examining the relationship between driving-related cue acquisition and the development of skilled performance in the operation of a simulated UAV. If cue acquisition represents a precursor to skilled performance, then measures of cue utilization (controlling for driving experience) were expected to predict the number of trials required to reach a predetermined level of take-off and landing performance, together with the proportion of successful take-off and landing trials. Since the acquisition of cues is also likely to be dependent upon the capacity to exclude extraneous information and thereby identify predictive feature–event/object associations, it was also anticipated that measures of sensory processing sensitivity (SPS) and attentional control would account for a proportion of the variance associated with the acquisition of skilled performance in operating the UAV. Higher levels of SPS and lower levels of attentional control are normally associated with clinical conditions and heightened arousal (Aron and Aron, 1997; Derryberry and Reed, 2002). Therefore, lower levels of SPS in combination with higher levels of attentional control and greater levels of cue acquisition were expected to account for a greater proportion of successful trials and trials to reach criterion than either variable in isolation.

## **METHOD**

## *Participants*

A total of 50 university students participated in the study of whom 21 were male and 29 were female. They were recruited from the Psychology Research Pool, and each received 1% course credit for their participation. They were aged between 18 and 26 (*M* = 18.87, *SD* = 1.58), possessed a current motor vehicle driver's license, and had no experience in remote control aircraft operation.

### *Instruments*

As in Study 1, the participants completed a demographic questionnaire, including questions related to video game and driving experience, and then progressed to complete the on-line version of EXPERTise. They were subsequently asked to complete Aron and Aron's (1997) Highly Sensitive Person Scale and Derryberry and Reed's (2002) Attentional Control Scale. The 27-item Highly Sensitive Person Scale requires participants to indicate their response on a seven-point Likert scale. An example item is "Are you easily overwhelmed by things like bright lights, strong smells, coarse fabrics, or sirens close by?" Levels of sensitivity are calculated by summing the responses to the questions with higher scores associated with higher levels of SPS. The scale has adequate discriminant, convergent, and overall construct validity, and Cronbach's alphas have been obtained in the range of 0.81–0.84, demonstrating adequate reliability (Jagiellowicz et al., 2010). An alpha of 0.77 was achieved in the present study.

The Attentional Control Scale is designed to measure an individual's general capacity to focus attention on a task, to filter out distractions, shift attention between tasks, and to flexibly control thought (Derryberry and Reed, 2002). The 20-item scale required participants to indicate their response on a four-point Likert scale. An example item is "When concentrating, I can focus my attention so that I become unaware of what's going on in the room around me?" Scores are calculated by summing the responses to the questions with higher scores associated with greater levels of Attentional Control. The scale has adequate discriminant, convergent, and overall construct validity in different populations (Fajkowska and Derryberry, 2010). A Cronbach's alpha of 0.83 was achieved in the present study.

## *UAV simulator*

Real Flight 6.0TM was used to simulate the operation of a UAV. The simulator was displayed on a 40-inch monitor with control exercised using a standard remote control aircraft transmitter, incorporating two joysticks, one to control the pitch and roll of the aircraft and the other to control power.

For the take-off task, the UAV was located at the end of the runway, and participants were asked to accelerate the aircraft using the joystick and fly the aircraft down the extended center line of the runway through two virtual parallel lines. In the case of this computer program, the failure to position the aircraft within the parallel lines would result in the destruction of the aircraft, wherein the aircraft was repositioned at the take-off in preparation for the next trial. Similarly, in the case of the landing, the aircraft was positioned at altitude, a short distance from the runway along the extended center line. Participants were asked to advance or retard the throttle while maintaining directional control of the aircraft, and descend between two parallel lines. Similar to the take-off task, the failure to maintain the position of the aircraft within the two parallel lines would result in the destruction of the aircraft and a return to the landing approach position. The number of trials was recorded and criterion performance was set at three consecutive successful take-offs or landings to reduce the influence of accidental and sporadic successes.
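The criterion rule (three consecutive successes) can be expressed as a short scoring function. This is a minimal sketch; the function name, the boolean encoding of trial outcomes, and the example sequence are all hypothetical.

```python
def trials_to_criterion(outcomes, run_length=3):
    """Return the 1-based trial number on which the criterion is first met.

    `outcomes` is a sequence of booleans (True = successful trial).
    Returns None if `run_length` consecutive successes never occur.
    """
    streak = 0
    for trial_number, success in enumerate(outcomes, start=1):
        streak = streak + 1 if success else 0  # a failure resets the run
        if streak == run_length:
            return trial_number
    return None

# Hypothetical participant: an isolated pair of successes does not count.
example = [False, True, True, False, True, True, True, False]
print(trials_to_criterion(example))
```

Requiring a run of successes rather than a single success is what filters out the accidental and sporadic successes mentioned above.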

#### *Procedure*

Once the initial questionnaires had been completed, participants were asked to stand in front of the UAV display and the task was described, together with instructions concerning the control of the simulated aircraft. Participants were advised that they were to try and ensure that the aircraft departed or landed within the parallel lines and that flight outside the parallel lines would result in the destruction of the aircraft with the requirement to restart the trial. Similarly, if the participant succeeded in completing a trial successfully, the aircraft trial would be restarted. Participants were advised that they would be given 15 min to complete as many successful trials as possible. The take-off trials always preceded the landing trials.

#### **RESULTS**

The aim of Study 2 was to examine the differential effects of cue utilization, sensory processing sensitivity, and attentional control on trials to criterion in learning to operate a UAV simulator. Similar to Study 1, data arising from the EXPERTise SJT were aggregated and converted into *z* scores. However, unlike Study 1, typologies were not established. Instead, a grand mean was employed as an overall, standardized measure of cue utilization and to allow for regression analyses.

## *UAV performance*

Four measures of UAV skill acquisition were used as dependent variables in the present study. The number of trials to achieve criterion performance (three consecutive successful trials) was used to establish the rate of skill acquisition. A descriptive analysis of the data revealed that skewness was outside normal limits. A square-root transformation was undertaken subsequently, which reduced the skewness to an acceptable level (<1).
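The effect of a square-root transformation on a right-skewed count variable can be sketched as follows. The trial counts are hypothetical, and the moment-based skewness formula is an assumption, since the text does not say which skewness statistic was computed. The example illustrates only the direction of the change, not the specific threshold reported above.

```python
from math import sqrt
from statistics import mean

def skewness(values):
    """Moment-based skewness: third central moment divided by SD cubed."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical trials-to-criterion counts with one slow learner in the tail.
trials = [4, 5, 5, 6, 7, 8, 10, 30]
transformed = [sqrt(t) for t in trials]

print(skewness(trials))
print(skewness(transformed))
```

Because the square root compresses large values more than small ones, the long right tail is pulled in and the skewness statistic falls.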

For the landing task, the number of trials to reach criterion performance could not be used as a measure of the rate of skill acquisition, as only 31 of the 50 participants were able to complete three consecutive landings successfully. Consequently, a new categorical variable was calculated with two levels: those who were able to achieve landing criterion performance and those who were unable to achieve criterion performance.

The proportion of successful trials was included as a broader measure of skill acquisition that was influenced by both the rate of skill acquisition and the consistency of performance beyond the initial achievement of the criterion. Proportions were derived for both the take-off and landing tasks by calculating the number of successful trials as a proportion of the total number of trials completed by each participant. The mean number of total trials across all participants was 70.82 for the take-off task (*SD* = 10.312) and 76.46 for the landing task (*SD* = 11.38).

### *Modeling UAV performance*

A measure of SPS was obtained by summing the responses to all 27 items of the Highly Sensitive Person Scale. The Attentional Control score was obtained by summing the responses for the reverse scored items (1, 2, 3, 6, 7, 8, 11, 12, 15, 16, and 20) from the Attentional Control Scale. A hierarchical multiple linear regression was used initially to determine the relationship among cue utilization scores, weekly videogame use, SPS, Attentional Control, driving experience, and the proportion of successful take-off trials. Entering the cue utilization score explained 19.7% of the variance in the proportion of successful take-off trials and was statistically significant, *F*(1,47) = 11.81, *p* < 0.01. This increased to 30.3% of the explained variance with the addition of Sensory Processing Sensitivity scores, a change that was statistically significant, *F*(1,46) = 7.15, *p* < 0.01. The addition of the remaining variables failed to increase the amount of variance explained, indicating that the proportion of successful take-off trials during skill acquisition of the UAV was best predicted by a combination of a higher level of cue utilization (β = 0.38) and lower level of sensory processing sensitivity (β = –0.33).
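The change statistics in a hierarchical regression follow from the *R*² values at successive steps. The sketch below applies the standard *F*-change formula to the rounded percentages quoted above. The function name is hypothetical, and because the inputs are rounded the results need not match the reported *F* values exactly.

```python
def f_change(r2_reduced, r2_full, df_num, df_den):
    """F statistic for the increase in R-squared between nested models.

    df_num: number of predictors added at this step.
    df_den: residual degrees of freedom of the fuller model.
    """
    return ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)

# Step 1: cue utilization alone (R-squared = 0.197, residual df = 47).
step1 = f_change(0.0, 0.197, 1, 47)
# Step 2: adding sensory processing sensitivity (R-squared = 0.303, residual df = 46).
step2 = f_change(0.197, 0.303, 1, 46)

print(round(step1, 2), round(step2, 2))
```

Both values fall close to the *F* statistics reported in the text.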

Consistent with the results associated with the proportion of successful take-off trials, levels of cue utilization and sensory processing sensitivity also provided the model of best fit for the proportion of successful landing trials using the UAV, *F*(2,47) = 8.33, *p* < 0.01. Specifically, 26.2% of the variance in the proportion of successful landing trials was predicted by higher levels of cue utilization (β = 0.37) and lower levels of sensory processing sensitivity (β = –0.29). Although this is slightly lower than the variance explained for take-off performance, it is of particular note that neither driving experience, videogame experience nor attentional control contributed significantly to the final model. As might be expected, a strong correlation was evident between take-off and landing performance using the UAV [*r*(50) = 0.69, *p* < 0.001].

In relation to the number of trials required to satisfy the takeoff criterion, the regression model of best fit was restricted to the level of cue utilization, β = –0.41, *F*(1,42) = 8.58, *p* < 0.01, which explained 17% of the variance in performance. No other variables contributed significantly to the model, including sensory processing sensitivity. This suggests that, while sensory processing sensitivity may explain some of the variance associated with sustained performance beyond the achievement of a criterion level of performance, it is not predictive of the initial achievement of this criterion.

Since the nature of the data arising from the landing performance task precluded the use of linear regression, a logistic regression was employed in which the dependent variable comprised whether or not the participant achieved the landing criterion. Consistent with the data for the linear regression concerning the achievement of the take-off criterion, only cue utilization was retained in the model of best fit, β = 1.68, *SE* = 0.584, Wald's *χ*<sup>2</sup> = 8.27, Exp(*B*) = 5.35, *p* = 0.004. The results suggest that the odds of achieving the landing criterion increased by a factor of 5.35 for each unit increase in the log concentration of cue utilization (Hosmer–Lemeshow *χ*<sup>2</sup> = 8.78, Cox and Snell *R*<sup>2</sup> = 0.20).
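The odds ratio and Wald statistic in a logistic regression are both simple functions of the coefficient and its standard error. The sketch below recomputes them from the rounded values quoted above. The variable names are illustrative, and small discrepancies from the reported figures are to be expected because the inputs are rounded.

```python
from math import exp

beta = 1.68         # reported logistic regression coefficient
std_error = 0.584   # reported standard error of the coefficient

# The odds ratio is the exponentiated coefficient.
odds_ratio = exp(beta)
# The Wald chi-square is the squared ratio of the coefficient to its standard error.
wald_chi_square = (beta / std_error) ** 2

print(round(odds_ratio, 2), round(wald_chi_square, 2))
```

An odds ratio above 1 indicates that higher cue utilization scores were associated with greater odds of reaching the landing criterion.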

## **GENERAL DISCUSSION**

The overall aim of this research was to examine the role of cue utilization in the early development of skilled performance in two related domains. Cue utilization has been established previously as a characteristic of expertise (Loveday et al., 2013c), and this suggests that the capacity to identify, acquire, and retain feature–event/object relationships in memory may be a necessary precursor for the acquisition of expertise. Understanding the role of cue acquisition at the initial stages of skill development potentially gives rise to a more complete model of the mechanisms that will facilitate the transition from novice through competence to expertise.

Study 1 sought to establish the relationship between cue utilization and performance in learning to land a simulated aircraft. Cue utilization was assessed using the driving-related version of EXPERTise, and this enabled the formation of typologies reflecting relatively higher or lower levels of cue utilization. The results revealed a statistically significant relationship between cue utilization typology and the proximity to the runway target following the fourth landing trial. Specifically, higher levels of cue utilization, controlling for driving exposure, were associated with a closer proximity to the landing target. This suggests an association between a measure of cue utilization and performance on a novel skill acquisition task.

To establish the generalizability of the outcomes of Study 1 and to explore additional explanations of the mechanisms of skill acquisition, Study 2 was designed to employ the same measure of cue utilization (driving-related) but considered the acquisition of skilled performance in the operation of a UAV. While conceptually similar to the operation of a flight simulator, the operator of a Line-of-Sight UAV controls the aircraft from the ground using a remote control device.

In addition to cue utilization, additional variables were incorporated into the analysis, including video-game use, SPS, and attentional control. It was surmised that, in combination with the capacity to associate feature–event/object relationships in the form of cues, the capacity to identify prospective features and events would depend upon the capacity to direct attention to particularly salient features and avoid being distracted by those features that are likely to embody little predictive capacity. The results revealed a strong model in which a combination of cue utilization scores and sensory processing sensitivity was most predictive of both the number of trials to reach criterion on the landing and take-off tasks, together with the proportion of successful trials. Specifically, greater levels of cue utilization and lower sensory processing sensitivity predicted 31.7% of the variance associated with the acquisition of take-off performance on the UAV simulator.

In combination, the results of the two studies suggest that cue utilization in one context may play a significant role in the initial acquisition of skilled performance in other, related tasks. The fact that cue utilization is also characteristic of expert performance suggests that cues may constitute a key cognitive mechanism by which skill acquisition occurs, even at the earliest stages of the process.

### **THEORETICAL IMPLICATIONS**

Although there have been a number of different theoretical propositions concerning the cognitive mechanisms that facilitate the progression to expertise, including cases, instances, and productions, the present study targeted behavior that was most closely associated with the utilization of cues. EXPERTise is designed to target a number of aspects of cue utilization, including the capability to discern key features from a complex visual background, the capacity to differentiate the strength of the relationship between different feature–event/object pairs, and the relative importance of features in the context of diagnosis.

On the basis of the relatively consistent relationship between the driving-related version of EXPERTise and performance on the skill acquisition tasks, it might be concluded that the capacity to identify features and discern the strength of relationships between features and events/objects constitutes a capability that informs the acquisition of skilled performance on both a three-dimensional tracking task in the context of flight simulation and take-offs and landings in the context of operating a UAV. Moreover, in cases involving the acquisition of novel skills, the rate of progression toward expertise may be dependent upon: (a) the extent to which key features can be identified; (b) their association with events/objects established and retained in memory; and (c) their accurate application during the process of skill acquisition.

Although the outcomes of the present research do not necessarily discount the role of productions as an explanation of skill acquisition, given the conceptual similarities between the two constructs, the role of cases and instances as an explanation of the process is less clear. In particular, the predictive capacity of a process that deconstructs tasks into distinct feature–event/object relationships, coupled with the lack of domain-related knowledge on the part of participants, gives rise to a cue-based explanation of psycho-motor skill acquisition, particularly at the initial stages of the process. This explanation is consistent with previous research establishing the role of cue-based training in developing the skills of novices in other domains (Abernethy, 1990; Wiggins and O'Hare, 2003; Markovits, 2013; Momm et al., 2013).

As a context-dependent measure of cue utilization, EXPERTise has differentiated performance among pediatricians, power controllers, and software engineers. It has also identified differences in the acquisition of skilled performance in the context of power control. In combination with the results of the present study, cue utilization, as measured by EXPERTise, appears to both differentiate the performance of different operators, and predict the rate and the achievement of skilled performance. However, despite the relative consistency of the outcomes achieved, a number of key questions remain that are relevant to those models of cognitive skill acquisition that posit that the progression to expertise is based on the acquisition and utilization of cues. First, and most important, EXPERTise is purported to measure cue utilization but, in the absence of neurological evidence, that argument will remain speculative. Second, longitudinal studies have yet to be completed that include competent practitioners. Much of the work thus far has focused on the performance of novices, the transition from competence to expertise, or on retrospective accounts of skill acquisition and experience from skilled performers (e.g., Young and Salmela, 2010). Little research has been undertaken that considers the key transition from novice to competence that, potentially, is the stage at which mental models are acquired and tested.

### **APPLIED IMPLICATIONS**

The practical implications of the present study include the identification of those practitioners who are relatively more likely to achieve criterion performance on a three-dimensional psychomotor tracking task, together with the rate at which this progression is likely to occur. The associated benefits include a potential reduction in the costs of training and attrition due to the selection of candidates who are capable of acquiring particular skills more accurately and at a rate that reduces the investment in both time and access to expensive simulation technologies.

In addition to the selection of candidates, the results of the present study also enable the development of interventions for those candidates who, having been selected, experience plateaus in their acquisition of skilled performance. The apparent key role of cue utilization during the initial stages of skill acquisition suggests that learning plateaus may be explained by the inability of the learner to identify predictive features and/or to establish a relationship between predictive features and associated events or objects. Therefore, learning plateaus may be considerably shortened if learners can be directed toward those features that are most appropriate in the context of the problem being confronted. Evidence for the potential utility of this type of approach to initial learning can be drawn from Lagnado et al. (2006), who observed that learning environments that are directed toward the identification of feature-outcome relationships can facilitate the acquisition of cue-based associations that, in turn, lead to improvements in performance. Similarly, Wulf et al. (2000) observed that improvements in tennis could be achieved by directing learners' attention toward what they referred to as the "antecedents" and "effects" of particular types of strokes from opponents. In identifying and remediating "gaps" in cue-based processing, it becomes possible to augment existing training initiatives, thereby maintaining an optimal rate of skill acquisition, irrespective of the nature of the learner.

## **CONCLUSION**

The aim of this paper was to establish whether a measure of cue utilization predicts the rate and the achievement of initial performance in the acquisition of two three-dimensional tracking tasks. In Study 1, participants learnt to maneuver a simulated light aircraft to land nearest to a target on a runway. Study 2 involved learning to take off and land a UAV using a remote control device. In both studies, a relationship was established between cue utilization and task-related performance, whereby relatively higher levels of cue utilization were associated with both a greater rate of skill acquisition and a greater proportion of successful trials in learning to operate the UAV. Given that cue utilization also differentiates greater from lesser performance among experienced operators, the results suggest that the acquisition and subsequent utilization of cues may play a significant role in facilitating the rate and the achievement of expertise.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 April 2014; accepted: 15 May 2014; published online: 03 June 2014. Citation: Wiggins MW, Brouwers S, Davies J and Loveday T (2014) Trait-based cue utilization and initial skill acquisition: implications for models of the progression to expertise. Front. Psychol. 5:541. doi: 10.3389/fpsyg.2014.00541*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Wiggins, Brouwers, Davies and Loveday. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

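## Understanding expertise and non-analytic cognition in fingerprint discriminations made by humans

## *Matthew B. Thompson\*, Jason M. Tangen and Rachel A. Searston*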

*School of Psychology, The University of Queensland, St Lucia, QLD, Australia \*Correspondence: mbthompson@gmail.com*

#### *Edited and reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

**Keywords: expertise, fingerprints, decision making, forensics, non-analytical reasoning, testimony, instance-based learning**

To a novice in a particular domain, the cognitive feats that experts are capable of performing seem impressive, even extraordinary. According to the well-established exemplar theory of categorization (e.g., Brooks, 1987; Medin and Ross, 1989), a new category member in everyday classification (e.g., a bird, a table, or a car) or expert classification (e.g., an abnormal chest x-ray, a patient with myocardial ischaemia, or a poor chess move) is categorized on the basis of its similarity to individual prior cases. Often this sensitivity to similarity develops effortlessly and without any intention to learn similarities or differences among the exemplars.

Experts can do a lot with a little. Across various domains of expertise, it seems that experts can perform quickly and accurately when given only a small amount of information, as in chess (Gobet and Charness, 2006), fireground command (Klein, 1998), radiology (Myles-Worsley et al., 1988; Evans et al., 2013), and dermatology (Norman et al., 1989). The experiential knowledge based on hundreds of thousands of prior instances serves as a rich source of analogies to permit efficient problem solving.

A fruitful approach to understanding these cognitive feats has been to understand where expertise lies in various domains. Expertise in ball sports, for example, seems to lie in anticipating where the ball will be (Abernethy, 1991); expertise in wine seems to lie in applying verbal labels (Hughson and Boakes, 2001); expertise in radiology seems to lie in rapid discrimination of normal and abnormal radiographs (Evans et al., 2013); and expertise in chess seems to lie in rapid retrieval of board configurations from memory (Chase and Simon, 1973).

Over the last several years, we have been working with a fascinating group of experts who spend several hours a day examining a highly structured set of impressions. When a fingerprint is found at a crime scene it is a human examiner, not a machine, who is faced with the task of identifying the person who left it. Professional fingerprint examiners are usually sworn police officers who use image enhancement tools, such as Photoshop or a physical magnifying glass, and database tools to provide a list of possible matching candidates. They place a crime scene print and a suspect print side-by-side—physically or on a computer screen—and visually compare the prints to judge whether the prints came from the same person or two different people.

These fingerprint examiners have testified in court for over one hundred years, but there have been few experiments directly investigating the extent to which experts can correctly match fingerprints to one another, how competent and proficient fingerprint experts are, how examiners make their decisions, or the factors that affect performance (Loftus and Cole, 2004; Saks and Koehler, 2005; Vokey et al., 2009; Spinney, 2010b; Thompson et al., 2013a). Indeed, many examiners have even claimed that fingerprint identification is infallible (Federal Bureau of Investigation, 1984). Academics, judges, scientists, and US Senators have reported on the absence of solid scientific practices in the forensic sciences. They highlight the absence of experiments on human expertise in forensic pattern matching, suggesting that faulty analyses may be contributing to wrongful convictions of innocent people (Edwards, 2009; National Research Council, 2009; Campbell, 2011; Carle, 2011; Expert Working Group on Human Factors in Latent Print Analysis, 2012; Maxmen, 2012), and they lament the lack of a research culture in the forensic sciences (Mnookin et al., 2011). The field of forensics is, however, beginning to acknowledge the central role that fallible humans play in the identification process (Tangen, 2013).

Our first point of inquiry was to see whether qualified, court-practicing fingerprint examiners are any more accurate than the person on the street, and to get a feel for the kinds of errors examiners make. In our first experiment (Tangen et al., 2011), we tested the matching accuracy of fingerprint examiners from Australian state and federal law enforcement agencies. In a signal detection paradigm, we created ground-truth matching prints for use as targets, and highly similar, nonmatching prints from a national database search for use as distractors. We found that qualified, court-practicing fingerprint experts were exceedingly accurate compared with novices. Experts tended to err on the side of caution by making more errors of the sort that could allow a guilty person to escape detection than errors of the sort that could falsely incriminate an innocent person. A similar experiment, with participants from the US Federal Bureau of Investigation, produced similar results (Ulery et al., 2011), and a follow-up experiment found variability in the consistency within and between examiners' decisions (Ulery et al., 2012). An examiner's expertise seems to lie, not in matching prints *per se*, but in discriminating highly similar but nonmatching prints (Thompson et al., 2013a).

In a follow-up experiment (Thompson et al., 2013b), we replicated Tangen et al. (2011), but with genuine crime scene matching prints from casework (where the ground truth is uncertain), and with the addition of two trainee participant groups. Intermediate trainees, despite their lack of qualification and an average of 3.5 years' experience, performed about as accurately as qualified experts who had an average of 17.5 years' experience. It appears that people can learn to distinguish matching from similar nonmatching prints to roughly the same level of accuracy as experts after a few years of experience and training. New trainees, despite their 5-week, full-time training course or their 6 months' experience, were not any better than novices at discriminating matching and similar nonmatching prints; they were just more conservative. It appears that early training and/or experience may not necessarily result in more accurate judgments, but may simply result in a more conservative response bias. Again we concluded that the superior performance of experts was a result of their ability to identify the highly similar, but nonmatching fingerprints as such.
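The distinction between discrimination accuracy and response bias can be made concrete with the standard signal detection measures d′ (sensitivity) and c (criterion). The following is a minimal sketch, assuming a simple yes/no "match" decision and a log-linear correction for extreme rates. The counts and the function name `sdt_measures` are hypothetical and are not taken from the experiments described here.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one group of yes/no match decisions."""
    # Log-linear correction: add 0.5 to each cell so that rates never reach
    # 0 or 1, where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)            # ability to tell match from non-match
    criterion = -0.5 * (z(hit_rate) + z(fa_rate)) # > 0 means reluctant to say "match"
    return d_prime, criterion


# Hypothetical counts, NOT data from the study.
novice = sdt_measures(hits=80, misses=20, false_alarms=40, correct_rejections=60)
trainee = sdt_measures(hits=60, misses=40, false_alarms=20, correct_rejections=80)

print(f"novice:  d'={novice[0]:.2f}, c={novice[1]:.2f}")
print(f"trainee: d'={trainee[0]:.2f}, c={trainee[1]:.2f}")
```

With these illustrative counts the two groups have the same d′ but the hypothetical trainee group has a higher criterion, which is the pattern the text describes as being "more conservative" without being more accurate.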

What the findings mean for reasoning about expert performance in the wild is an open question (e.g., Koehler, 2008, 2012; Mnookin, 2008; Thompson et al., 2013a), but these findings do contradict the notion that fingerprint identification is infallible, and that the fingerprint identification "methodology" can be disembodied from a judgment about whether two fingerprints match or not.

Our second point of inquiry was to understand the nature of expertise in fingerprint matching, that is, to understand where a fingerprint examiner's expertise lies. Through experience and feedback, an expert can rapidly retrieve, from memory, previous instances and decisions relevant to the current situation, whereas novices rely more on formal rules and procedures (Brooks, 2005; Norman et al., 2007). Fingerprint examiners claim that careful, deliberate analysis is the basis of the work that they do (Busey and Parada, 2010), but a hallmark of genuine expertise is the ability to accurately perform a domain-relevant task quickly (Kahneman and Klein, 2009). Do fingerprint examiners rely on the same non-analytic cognitive processes as experts in other domains of expertise?

Busey et al. (2011) found that experts move their eyes differently from novices, and Busey and Vanderkolk (2005) found that experts performed better than novices at identifying the matching fragments of fingerprints in noise after a short delay. They also found that inverted fingerprints produced a delayed N170 event-related potential response in experts but not in novices, suggesting that experts process upright fingerprints configurally (Busey and Parada, 2010).

In a series of experiments, we further examined the nature of fingerprint expertise (Thompson, 2014). We added artificial noise to all the print pairs, and inverted half and kept the other half upright, and found that experts could discriminate prints even when the prints were highly noisy. Unexpectedly, fingerprint experts did not show the classic inversion effect seen in face recognition. We tested the short-term memory of experts and novices by separating fingerprint pairs in time by a few seconds, and found that experts were better than novices at discriminating prints, and that experts were far better at discriminating similar, nonmatching prints. We tested the long-term memory of experts and novices by asking them to learn a set of fingerprints to be recognized a few minutes later, but found no difference in long-term memory accuracy between experts and novices, and both groups performed around the level of chance. We tested the ability of experts and novices to discriminate prints by presenting them briefly on screen, and found that experts could accurately discriminate prints when presented for just 2000 ms, and the largest difference between experts and novices was on similar, nonmatching prints. We then further reduced the stimulus presentation time, and found that experts were more accurate than novices overall, and that experts had a much better idea about whether a pair of prints match or not in a rapid period of time. With such short presentation times (i.e., from 250 to 2000 ms), there is little time to engage in careful, deliberate analysis of the minutiae in a fingerprint image in order to make accurate decisions.

These findings suggest that fingerprint experts are capable of making accurate decisions when the amount of visual information in the prints is dramatically decreased. It is clear that, through experience, experts can learn the regularities of matching and nonmatching prints, and rapidly compare new prints to memory in order to make accurate judgments. The findings above are in stark contrast to the common and consistent claims in formal training, textbooks, and courtroom testimony: that fingerprint identification is a "scientific process" that requires careful, thorough analysis in order for judgments to be accurate.

Fingerprint expertise is particularly interesting because of the sheer amount of experience that examiners have with the stimuli. Their full-time job, often in departments that run 24 hours a day, is to visually compare crime scene prints to suspect and database candidates. It's difficult to imagine any other domain in which there is so much attention and exposure to a highly constrained stimulus, where the task is to definitively report that two images come from the same, single source or not. And the stakes are high: innocent people could be wrongly convicted, and guilty people could be wrongly acquitted.

This vast experience allows experts to resolve information in a print: to correctly regard ambiguous information that is more consistent with within-source variability as a "match," and correctly regard ambiguous information that is more consistent with between-source variability as a "non-match." An ambiguous mark on a fingerprint, for example, can be regarded as signal (i.e., as evidence of a "match"), or it can be disregarded as noise (i.e., as evidence of a "non-match"). This kind of process is undoubtedly operating in novices too, but the ambiguity cannot be sufficiently resolved unless the examiner has accumulated enough matching and nonmatching exemplars in memory to point in one direction or the other. One clear result of this vast experience is the experts' capacity to disregard, to "see through," the ambiguity and surface structure of similar prints and discriminate them accurately. We think that further study of the nature of fingerprint expertise will inform general theories for the development of expertise, while also providing an empirical basis for claims made by expert witnesses in the courtroom.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 June 2014; accepted: 24 June 2014; published online: 16 July 2014.*

*Citation: Thompson MB, Tangen JM and Searston RA (2014) Understanding expertise and non-analytic cognition in fingerprint discriminations made by humans. Front. Psychol. 5:737. doi: 10.3389/fpsyg.2014.00737*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Thompson, Tangen and Searston. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Expert analogy use in a naturalistic setting

## *Donald R. Kretz<sup>1</sup> and Daniel C. Krawczyk<sup>1,2</sup>\**

*<sup>1</sup> School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA*

*<sup>2</sup> Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Lindsey Engle Richland, University of Chicago, USA Máximo Trench, Universidad del Comahue, Argentina*

#### *\*Correspondence:*

*Daniel C. Krawczyk, Department of Psychiatry, The University of Texas at Dallas, 800 W. Campbell Road, GR41, Richardson, TX 75083, USA e-mail: daniel.krawczyk@utdallas.edu*

The use of analogy is an important component of human cognition. The type of analogy we produce and communicate depends heavily on a number of factors, such as the setting, the level of domain expertise present, and the speaker's goal or intent. In this observational study, we recorded economics experts during scientific discussion and examined the categorical distance and structural depth of the analogies they produced. We also sought to characterize the purpose of the analogies that were generated. Our results supported previous conclusions about the infrequency of superficial similarity in subject-generated analogs, but also showed that distance and depth characteristics were more evenly balanced than in previous observational studies. This finding was likely due to the nature of the goals of the participants, as well as the broader nature of their expertise. An analysis of analogical purpose indicated that the generation of concrete source examples of more general target concepts was most prevalent. We also noted frequent instances of analogies intended to form visual images of source concepts. Other common purposes for analogies were the addition of colorful speech, inclusion (i.e., subsumption) of a target into a source concept, or differentiation between source and target concepts. We found no association between depth and either of the other two characteristics, but our findings suggest a relationship between purpose and distance; i.e., that visual imagery typically entailed an outside-domain source whereas exemplification was most frequently accomplished using within-domain analogies. Overall, we observed a rich and diverse set of spontaneously produced analogical comparisons. The high degree of expertise within the observed group along with the richly comparative nature of the economics discipline likely contributed to this analogical abundance.

**Keywords: analogy, expertise, naturalistic, reasoning, problem-solving**

## **INTRODUCTION**

The importance of analogy has been described in many ways by notable researchers. Polya (1957) wrote that analogy "pervades all our thinking," Holyoak et al. (2001) called it "a central component of human cognition," and Hofstadter (2001) referred to it as "the lifeblood . . . of human thinking." Because of its perceived importance in cognitive functioning, the use of analogy in thought and language has been studied extensively in cognitive psychology, cognitive development, and cognitive science since the early 1980s. Research examining analogical production and retrieval under experimental conditions has provided a wealth of valuable data. Far fewer studies of these important phenomena have been conducted in naturalistic settings.

The importance of studying analogical production "in the wild" was emphasized by one of its most prominent researchers. Just as it is necessary to conduct both *in vivo* and *in vitro* studies to fully understand biological phenomena, Dunbar (1995, 2001) argued that it is likewise necessary to conduct both naturalistic and experimental studies in cognitive research to fully understand the cognitive processes involved in reasoning and analysis. He modeled his approach after techniques applied in biological research, referring to this paradigm as "*in vivo/in vitro.*" Observing behavior in naturalistic settings provides several advantages: (a) behaviors are unconstrained by laboratory conditions and are not instigated by artificial or experimental stimuli, (b) the setting emphasizes processes rather than outcomes, and (c) there is a clear relationship between observed behaviors and the domain of interest (Dunbar, 1995; Crano and Brewer, 2002). These conditions are particularly important when investigating analogical thinking, which involves linking one's current context with prior knowledge in a spontaneous fashion. It is important to note, however, that the observational approach lacks the superior structure and clarity of laboratory-based experiments.

Because there are many commonly used definitions of the term *analogy*, we felt it necessary to offer a clear definition that captures the essential characteristics applied by researchers in this field. A frequently used means of conveying an understanding of an unfamiliar concept is by drawing a comparison to similar, more familiar concepts. An analogy conveys more than literal similarity between two objects or concepts (Gentner, 1983). As a process, analogy involves the search for and selection of a well-understood *source* from long-term memory, followed by the transfer of meaning from that source to a less familiar *target* (Spellman and Holyoak, 1996). The set of correspondences between a source and target are referred to as a *source-target mapping*. In contrast to other forms of likeness or similarity, analogy is based on a comparison of structural relations, or systems of relations, rather than a mere resemblance of surface properties or attributes (Gentner and Markman, 1997). A system of relations can be quite complex, and all of its mappings may not be apparent. Some evidence suggests, however, that analogical mapping is not entirely relational (Ball et al., 2004; Bearman et al., 2007).

In the current study, we applied Dunbar's *in vivo* approach to observe and analyze some of the characteristics of source analogs produced during live, open discussion involving problems in economics, and we discuss some of the factors influencing their production. We chose a behavioral economics group to observe, as economists seek to broadly explain complex real-world human behavior in the context of models, games, and examples. All of these techniques rely heavily upon comparisons between different states of the world, different types of behavior, and combinations of models and situations. Many of these comparisons are potentially analogical, drawing heavily upon the expertise of the individual and the broader group. Data were collected from the weekly sessions of an economics reading group, which consisted of participants whose expertise in the broad field of economics varied in terms of depth and academic specialization.

#### **ANALOGICAL DEPTH**

One of the factors that has been shown to influence source retrieval is the level, or *depth*, of similarity between a target and the chosen source. Past research has shown that source-to-target mapping is primarily driven by the comparison of structural relations between source and target concepts but retrieval may be facilitated by superficial characteristics (Gick and Holyoak, 1980, 1983; Gentner et al., 1993; Forbus et al., 1997). In other words, people will rely on structural details to find appropriate sources when mapping an analogy, but perhaps find it easier to rely on overlapping surface features in addition to underlying structure when retrieving a correspondence (Holyoak and Koh, 1987; Holyoak and Thagard, 1995; Dunbar, 2001). The use of superficial characteristics for comparison is particularly apparent when source and target analogs are generated a priori and provided to an individual, who must then consider the nature of the relationship or relationships.

When people generate their own analogies by drawing upon their own knowledge, research has shown that fewer superficial similarities between analogs are observed. Studies by Dunbar (1995, 1997) and Blanchette and Dunbar (1997, 2000, 2001) revealed that more than half of the analogies produced during biology laboratory meetings and political discourse, respectively, showed no apparent surface similarity. The Bearman et al. (2007) management study showed that 73% of analogies were only structural in nature. With this in mind, the first goal of this study was to examine the depth of analogies produced by economics experts during scientific discourse. Results were expected to show infrequent surface feature overlap and provide additional support for earlier findings.

#### **ANALOGICAL DISTANCE**

A second important feature in analogy use concerns the range, or *distance*, between the domains of the source and target analogs. Dunbar (1995) defined three categories of analogy in terms of their degree of domain separation: (a) *long-distance* describes a source drawn from a very different domain (also referred to as *outside-domain*, or *out-of-category*), (b) *regional* refers to a source mapped from a similar domain (e.g., economics to finance or public administration), and (c) *local* maps a target to a source in the same domain. Both local and regional classes are collectively referred to as *within-domain*, or *within-category*. Because of the subjective nature of judging domain distance between similar domains, many observational studies simply categorize analogies using a binary choice of within-category or outside-category. Analogies formed within the same category might compare biological organisms or investment strategies, while commonly referenced domains for outside-category analogies include sports (e.g., one might fail and "strike out at the plate" or succeed and "push the ball over the goal line") and the supernatural (e.g., "it works so well that it's like magic").
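The coding scheme described above, three distance categories that are often collapsed into a binary within- versus outside-category judgment, can be sketched as a small data structure. This is an illustrative sketch only: the names `Distance` and `to_binary` and the coded examples are hypothetical, and the code does not reproduce the coding procedure of any study cited here.

```python
from collections import Counter
from enum import Enum


class Distance(Enum):
    LOCAL = "local"                  # source from the same domain as the target
    REGIONAL = "regional"            # source from a similar, neighboring domain
    LONG_DISTANCE = "long-distance"  # source from a very different domain


def to_binary(distance):
    """Collapse the three-way scheme into within- vs. outside-category."""
    if distance in (Distance.LOCAL, Distance.REGIONAL):
        return "within-category"
    return "outside-category"


# Hypothetical coded analogies, NOT data from any study cited here.
coded = [
    Distance.LOCAL,
    Distance.REGIONAL,
    Distance.LONG_DISTANCE,
    Distance.LOCAL,
    Distance.LONG_DISTANCE,
]

counts = Counter(to_binary(d) for d in coded)
shares = {label: n / len(coded) for label, n in counts.items()}
print(shares)
```

Collapsing local and regional into a single label sidesteps the subjective judgment between similar domains, at the cost of losing the finer distinction.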

Research on the domain separation of analogies has provided contrasting results. Dunbar's studies of scientific reasoning set in microbiology laboratories showed heavy use of within-category analogies: 98% of the analogies generated were classified as either local or regional (Dunbar, 1995, 1997). A study by Saner and Schunn (1999) produced a similar but narrower finding: researchers in psychology laboratory meetings and colloquia made frequent use of analogies, and more than 75% of them were within the same domain. In contrast, Blanchette and Dunbar (2001) found that 77% of the analogies that appeared in political articles aimed at more general audiences were outside-category. Christensen and Schunn's (2007) engineering design group study showed a more balanced mix of analogies: 55% were within-category while 45% were outside-category.

One explanation for the difference in domain distance is that the collective expertise of the audience may influence the selection of analogy. When addressing other domain experts with specialized knowledge, within-category analogs may prove more effective, whereas an outside-domain analogy might be a more attractive choice for a general audience. The type of task or function performed by the analogizer is another possible constraint. Our second goal was to observe the use of within- and outside-domain analogies produced by the economics experts. We expected to find a fairly balanced use of both styles, which differs somewhat from earlier findings. Our expectation was motivated by the different subdomains and varying levels of subject expertise within the reading group. Furthermore, in light of the stated belief that within-domain analogies tend to involve a higher degree of superficial similarity than outside-domain analogies, we investigated whether such a relationship emerged.

#### **ANALOGICAL PURPOSE**

Thirdly, past research has shown that the goal of the individual producing the analogy is likely to influence the process of selection and transfer (Spellman and Holyoak, 1996). Prior observational studies have examined the types of goals that emerge from the production of analogies, and the goals themselves tend to be highly domain-specific and task-specific. For example, the four goals derived from the microbiology laboratories by Dunbar (1995, 1997) were: forming hypotheses, designing experiments, modifying experiments, and explaining issues and concepts to other scientists. The management decision-making study by Bearman et al. (2002) identified only two goals: problem solving and illustration. Similarly, the Christensen and Schunn (2007) study of analogizing during engineering design meetings reported three functions: identifying problems, solving problems, and explaining concepts. Our economics reading group differed in its function from the settings of the studies mentioned above. The purpose of the group was limited to explaining and understanding experiments performed by other researchers; i.e., the experts in our observational study critiqued experimental designs and analyzed results.

Blanchette and Dunbar (2001) provided evidence that goals influenced the choice of analogy. When supporting a favorable position, emotionally positive analogies were more commonly chosen over those displaying negative emotional ideas. Conversely, when criticizing an unfavorable position, emotionally negative analogies were more frequent. Because analogical retrieval appears subject to the influence of the purpose for which it was produced, our third goal was to analyze the range of goals that emerge from the use of analogies by the economics group and their potential effect on analogical distance and depth.

## **METHODS**

### **PARTICIPANTS**

The setting for this study was the School of Economic, Political, and Policy Sciences (EPPS) weekly reading group at The University of Texas at Dallas during the Spring semester of 2011. Participants included male and female reading group attendees. The average weekly attendance was approximately 20 participants, which included both faculty members (∼25%) and graduate students (∼75%). The group was largely the same from week to week, but attendance did fluctuate and not all participants attended regularly. Participants' academic expertise was mixed, and included sub-disciplines such as econometrics, experimental economics, and game theory. A few of the attendees had no exposure to experimental methods.

The sessions were conducted in typical reading group fashion. One of the senior faculty members acted as the group's moderator, and one student participant was assigned each week to lead the following week's discussion of one or more chosen research papers. Following a brief presentation of the paper by the assigned student, a free and open discussion followed in which the members of the group examined and dissected the experimental methods and results described in the paper.

## **PROCEDURE**

## *Session recordings*

The investigator attended and made audio recordings at each of five group meetings, but did not participate in the discussions and appeared to have minimal impact on group interaction. Group members understood that their discussions would be evaluated (the moderator referred to it as "the science behind the science"). The research goal of examining the spontaneous use of analogies was not revealed, however. Although each session was scheduled to last for 90 min, actual discussion times ranged from 65 to 110 min. In all, approximately 7 h of discourse were recorded.

## *Transcription procedure*

Four transcribers participated in the initial processing of the discourse—the primary author and three undergraduate Psychology students. Only the primary author and one of the undergraduates had any significant transcription experience prior to the task. None of the undergraduate transcribers had exposure to Economics beyond introductory coursework. Transcribers were given instruction by the primary author on what constitutes an analogy based on the definitions presented earlier in this paper, then solved practice problems on recognizing analogies from non-analogies by identifying sources and targets. The period of instruction lasted approximately 30 min.

For purposes of indoctrination, each undergraduate transcriber was given one of the sessions for practice and directed to process at least 30 min of the recording. They were instructed to transcribe passages in which a possible analogy was made, taking care to include all of the important source and target information. Their instruction was: "when in doubt, include it." The primary author evaluated their performance and made individual adjustments until the results were satisfactory and consistent.

To address the possibility of subtle, easily-overlooked analogies, every audio recording was processed by two transcribers. To be included for analysis, a passage needed only to appear on either transcriber's log, not both. In order to be missed, a passage would have to have been heard by both transcribers and rejected.
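The inclusion rule described above amounts to taking the union of the two transcribers' logs. The following is a minimal sketch of that rule, not the procedure actually used in the study; the passage identifiers and the helper name `merge_logs` are hypothetical and serve only to illustrate the logic.

```python
def merge_logs(log_a, log_b):
    """Return every passage flagged by at least one transcriber (set union)."""
    return sorted(set(log_a) | set(log_b))


# Hypothetical passage identifiers, for illustration only.
transcriber_a = ["p01", "p02", "p04"]
transcriber_b = ["p02", "p03"]

included = merge_logs(transcriber_a, transcriber_b)
# A passage is lost only if it is absent from BOTH logs.
print(included)
```

Under this rule a passage flagged by only one transcriber is still carried forward to coding, which is why a miss requires both transcribers to pass over the same passage.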

## *Coding procedure*

The two authors performed the coding duties. Both were experienced coders with strong knowledge of analogy literature and considerable research experience. Each of the transcribed segments was first evaluated for the presence of one or more analogies based on the definition stated earlier. Comparisons based only on literal similarities were considered non-analogies. If a segment was judged to contain no analogy, no further evaluation was performed. If an analogy was deemed present, the source, target, source domain, and target domain of the analogy were recorded. Each analogy was then coded along the three dimensions of distance, depth, and purpose.

Both coders rated the entire set and reliability was calculated for each dimension. Distance was rated using the commonly applied *within-domain* and *outside-domain* categories. Following the example of Saner and Schunn (1999) in which the authors collapsed all psychology-related categories into "within-domain," we likewise considered all analogies related to economics, finance, statistics, probabilities, and game theory to be in the same domain. Depth was rated as superficial if the source and target shared surface characteristics; if not, it was rated as structural. Two passes were made by each coder to rate the purpose. The first pass was used to generate and agree on a list of functions represented by the set of analogies. Then, the coders used the list from the first pass as a set of categories for rating analogies in the second pass.

## **RESULTS**

## **SEGMENT TRANSCRIPTION AND ANALOGY EXTRACTION**

Transcription of the five recorded sessions yielded 114 unique passages with possible analogies (*M* = 16.29 analogy segments per hour). See Appendix A for a sampling of extracted passages. When judging whether a passage contained an analogy, the coders agreed on 96% (*n* = 109) of the 114 passages. Of these, most were rated as containing an analogy (*n* = 91, 83%). Passages lacking the basic ingredients of an analogy (i.e., source-target mapping and knowledge transfer) or showing evidence of a literal comparison (*n* = 18, 17%) were eliminated. Five passages were inconclusive and were likewise eliminated. The inter-rater reliability for identifying analogies was strong (κ = 0.85). Some of the passages were found to contain multiple analogies. In all, 97 analogies were extracted for the remainder of this analysis.
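The agreement figures above can be checked with a short calculation. The sketch below is not the authors' analysis script. It recomputes Cohen's kappa from the reported counts (91 passages that both coders rated as analogies and 18 that both rated as non-analogies) under the assumption that the five inconclusive passages are the five on which the coders disagreed. The function name `cohens_kappa_2x2` is hypothetical, and because the split of the five disagreements between the coders is not reported, every possible split is tried.

```python
def cohens_kappa_2x2(both_yes, both_no, a_only, b_only):
    """Cohen's kappa for two raters making a yes/no judgment."""
    n = both_yes + both_no + a_only + b_only
    p_observed = (both_yes + both_no) / n
    a_yes = both_yes + a_only          # passages rater A called analogies
    b_yes = both_yes + b_only          # passages rater B called analogies
    # Chance agreement from the two raters' marginal proportions.
    p_expected = (a_yes * b_yes + (n - a_yes) * (n - b_yes)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Reported counts: 91 agreed analogies, 18 agreed non-analogies.
# ASSUMPTION: the 5 remaining passages are disagreements, split d / (5 - d).
kappas = [cohens_kappa_2x2(91, 18, d, 5 - d) for d in range(6)]
percent_agreement = (91 + 18) / 114
segments_per_hour = 114 / 7            # roughly 7 h of recorded discourse

print([round(k, 3) for k in kappas])
print(round(percent_agreement, 2), round(segments_per_hour, 2))
```

Under that assumption the result is insensitive to how the five disagreements are divided: each split gives a kappa that rounds to the reported 0.85.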

## **ANALOGICAL DEPTH**

In coding for analogical depth, the coders agreed on 91% (*n* = 88) of the 97 analogies. Of these, analogies showing no obvious overlap in surface characteristics (*n* = 69, 78%) far outnumbered those where superficial similarities were present (*n* = 19, 22%). The inter-rater reliability was found to be substantial (κ = 0.75).

## **ANALOGICAL DISTANCE**

In coding for analogical distance, the coders agreed on 86% (*n* = 83) of the 97 analogies. Of these, within-domain (*n* = 44, 53%) and outside-domain (*n* = 39, 47%) analogies were almost evenly distributed. The inter-rater reliability was again found to be substantial (κ = 0.71).

## **ANALOGICAL PURPOSE**

As stated above, we did not impose a priori categorical restrictions on coding for analogical purpose. Rather, the coders were free to make subjective judgments as to the intent of the speaker on the first pass through the data. From these impressions, we grouped similar items and derived a set of categories for coding the analogies on the second pass. The derived taxonomy of rating categories was as follows: Differentiation (highlighting differences between source and target), Inclusion (indicating that the target was a type or component of the source), Example (indicating that the target was an instance of the source concept), Visualization (intended to create a picture or image in the mind of the audience), Emotion (appealing to feelings of the audience, or drawing on the emotion of the expert), Critique (using the source to point out shortcomings in the target), Exaggeration (gratuitous use of colorful phrasing), and Abstraction (broadening the target concept using a more general source concept). Where an analogy plausibly served multiple purposes, the raters chose the strongest.

In coding for analogical purpose, the coders agreed on 79% (*n* = 77) of the 97 analogies. Examples were the most prevalent (*n* = 27, 35%), followed by visualizations (*n* = 19, 25%). A fair number of exaggerations (*n* = 11, 14%), inclusions (*n* = 9, 12%), and differentiations (*n* = 8, 10%) were observed, but abstractions (*n* = 2, 3%) and critiques (*n* = 1, 1%) were uncommon. The inter-rater reliability for analogical purpose was substantial (κ = 0.74).
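As a consistency check on the purpose tabulation, the reported percentages can be recomputed from the reported counts. The sketch below uses only the counts stated above; the variable names are illustrative, and the percentages are taken relative to the 77 analogies on which the coders agreed.

```python
# Counts of agreed-upon purpose ratings, as reported in the text.
purpose_counts = {
    "example": 27,
    "visualization": 19,
    "exaggeration": 11,
    "inclusion": 9,
    "differentiation": 8,
    "abstraction": 2,
    "critique": 1,
}

agreed_total = sum(purpose_counts.values())
purpose_percent = {
    purpose: round(100 * count / agreed_total)
    for purpose, count in purpose_counts.items()
}
agreement_rate = round(100 * agreed_total / 97)   # 97 analogies were coded

print(agreed_total, agreement_rate)
print(purpose_percent)
```

The seven listed categories already account for all 77 agreed ratings, which implies that none of the agreed ratings fell in the Emotion category.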

## **ASSOCIATION BETWEEN DEPTH AND DISTANCE**

As reported above, most of the analogies were rated as structural analogies. Of those, there was an even distribution of outside-domain (*n* = 31, 50%) and within-domain (*n* = 31, 50%) analogies. The superficial analogies were likewise split between outside-domain (*n* = 7, 54%) and within-domain (*n* = 6, 46%). It was determined that there was no significant distance-depth association [χ<sup>2</sup>(1, *N* = 75) = 0.06, *p* = 0.80], suggesting that the two variables are independent.
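The distance-depth test can be reproduced from the four cell counts given above. The following is a minimal sketch of a Pearson chi-square test of independence for a 2 × 2 table, written without external libraries; it is offered as an illustration rather than as the authors' actual analysis. It assumes no continuity correction, and the helper name `chi_square_2x2` is hypothetical. For one degree of freedom the upper-tail probability equals `erfc(sqrt(chi2 / 2))`.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value, df = 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Exact chi-square survival function for one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: structural, superficial. Columns: outside-domain, within-domain.
depth_by_distance = [[31, 31], [7, 6]]
chi2, p = chi_square_2x2(depth_by_distance)
print(round(chi2, 2), round(p, 2))
```

With these counts the uncorrected statistic matches the reported value, which suggests (but does not confirm) that no continuity correction was applied in the original analysis.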

## **ASSOCIATION BETWEEN DEPTH AND PURPOSE**

Almost half of the analogies were rated as either structural-example (*n* = 18, 25%) or structural-visualization analogies (*n* = 16, 23%). Though the distribution of purpose ratings was highly skewed, it was skewed similarly over both categories of analogical depth and no significant depth-purpose association was found [χ<sup>2</sup>(7, *N* = 71) = 8.74, *p* = 0.27]. This finding suggests that these two variables are also independent.

### **ASSOCIATION BETWEEN DISTANCE AND PURPOSE**

In contrast to the two findings above, an examination of analogical distance as a function of the speaker's purpose did reveal a significant association [χ<sup>2</sup>(7, *N* = 66) = 34.94, *p* &lt; 0.005]. When a speaker sought to produce an analogy that created a visual image, an out-of-domain source was typically selected (*n* = 16, 24%). On the other hand, when analogizing by example, a source-target transfer from within the same domain was most commonly observed (*n* = 18, 27%).
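The two tests involving purpose each have seven degrees of freedom, which follows from crossing a two-level variable with the eight purpose categories: (2 − 1) × (8 − 1) = 7. Because the full contingency tables are not reported, the statistics themselves cannot be recomputed here, but the p-values can be checked from the reported χ<sup>2</sup> values. The sketch below uses the standard closed-form upper-tail probability of the chi-square distribution for odd degrees of freedom; the function name `chi2_sf_odd_df` is hypothetical.

```python
import math


def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate, odd df only."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form holds for odd df only")
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))            # the df = 1 contribution
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term                # chi**(2r - 1) / (1 * 3 * ... * (2r - 1))
        term *= x / (2 * r + 1)
    return tail + 2 * density * series


df = (2 - 1) * (8 - 1)
p_depth_purpose = chi2_sf_odd_df(8.74, df)      # reported as p = 0.27
p_distance_purpose = chi2_sf_odd_df(34.94, df)  # reported as p < 0.005
print(df, round(p_depth_purpose, 2), p_distance_purpose < 0.005)
```

Both reported p-values are consistent with this calculation; the distance-purpose p-value is in fact several orders of magnitude below the 0.005 threshold.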

## **DISCUSSION**

The most prominent observation to come from this study was that the use of analogy for exploration and explanation among economics experts and students was rich and abundant. In 427 min of discourse, 97 analogies were extracted, suggesting that analogies are both commonly used and serve as an important component in human reasoning and in understanding problems. The reading group setting enabled the observation of actual experts spontaneously forming analogies using their semantic knowledge applied to economics, a domain likely to entail more freedom to move between and among domains of knowledge than the previously investigated biology and political domains. The selection and fluid assembly of analogies during discourse may help to reveal the core principles involved in analogical thinking among experts. This study's findings will be discussed in the context of prior evidence.

## **EXPERTS IN THE FIELD OF ECONOMICS**

We chose to study behavioral economics experts in this study. Economists seek to broadly explain complex real-world human behavior in the context of equations, models, games, and hypothetical examples. All of these techniques rely heavily upon comparisons between different states of the world, different types of behavior, and combinations of models and situations. While some feature or object-based similarity occurs in comparisons between economic models and real-world behavior, it can be argued that the majority of the comparisons are about relations and systems of relations. For example, in the economic Ultimatum Game (Guth et al., 1982), two individuals each make monetary decisions that will be reviewed and reacted to by the other. The Ultimatum Game can be compared to many situations in life in which banks, individuals, or nations either choose to cooperate or not with regard to money, goods, or military action. Thus, many of the comparisons used in behavioral economics are likely to be analogical and may occur with greater frequency than observed in laboratory settings. The expertise of the individuals involved in behavioral economics discussions and expertise levels of the broader group are likely to encourage potentially rich and relationally deep analogies.

Expertise was a critical element to this study, as the role of domain knowledge in analogy production is not yet well understood. The level of expertise among reading group participants varied, but all had sufficient knowledge to be considered experts in the field. In some observational studies, expertise was narrowly concentrated [Dunbar's (1995, 1997) molecular biology group], whereas in others, general audiences possessed little to no domain knowledge [e.g., Blanchette and Dunbar (2001) political news study]. In our economics reading group, the expertise was both deep and broad, covering a range of specializations such as experimental economics, econometrics, and game theory.

There is growing evidence that the depth of expertise of the audience may influence analogical selection. It is reasonable to hypothesize that the breadth of expertise may contribute, as well. Blanchette and Dunbar (2001) showed more frequent use of outside-domain analogies when the intended audience consisted of non-experts and correspondingly more prevalent within-domain analogies among fellow experts. Additionally, experts appear more likely to exploit the complex relational nature of analogies while avoiding the tendency to rely on superficial features for comparison. The findings of Bearman et al. (2007) suggested that experts tend to compare structural relationships in all activities but novices show differences; i.e., when engaged in problem solving activities, novices rely on structural analogies but incorporate superficial features in their comparisons when illustrating or explaining.

We based our depth and distance expectations for this study largely on these findings. We did, as expected, observe a more even distribution of distance in analogies and infrequent use of superficial features in making comparisons. Relative to novices, experts can draw on a great deal of accumulated knowledge when thinking and reasoning (Bearman et al., 2007). It stands to reason that they are better able to exploit the deep, structural nature of source information as a result. Non-experts, on the other hand, lack the same deep encoding of domain information and may need to rely more heavily on superficial characteristics. The evidence is inconsistent, however; the Blanchette and Dunbar (2000) study points to novice use of structure as well.

## **ANALOGICAL DEPTH—SUPERFICIAL VS. STRUCTURAL**

Structural correspondences in the underlying system of relations between source and target elements represent an important component of analogies. Past experimental research demonstrated that superficial features influence the selection of source analogs (Gick and Holyoak, 1980, 1983; Gentner et al., 1993; Forbus et al., 1997), and studies in naturalistic settings suggest that other factors may contribute, as well (Blanchette and Dunbar, 2001; Bearman et al., 2007). The ratings from this study showed that the economics experts relied primarily on structural components of analogies (78%) with infrequent use of superficial feature comparisons (22%).

The data collected from the reading group sessions contained a dense set of complex comparisons rich in structure. Because of the unconstrained, yet guided, nature of the discussion, participants experienced a great deal of freedom to explore, compare, and explain complex target concepts. The reading group setting had certain features of prior story-based analogical reminding studies (Gentner et al., 1993; Wharton et al., 1994; Catrambone, 1997), as participants compared the contents of journal papers and the various experimental methods they described. The setting also had characteristics of the Blanchette and Dunbar "production paradigm" studies (2001) in which participants generated source analogs from their own knowledge and experiences. Perhaps not all settings are completely retrieval-based or production-based; rather, the degree to which the activity combines retrieval tasks with production tasks may determine the balance of analogical depth applied. Additionally, the occasional use of superficial comparisons likely reflects a tendency to spark heightened interest by making a link to a distant analogous domain during their descriptions of economic processes. Indeed, some of the surface-level analogies tabulated could be considered to be turns of phrase or metaphorical comparisons. The use of such comparisons can be helpful in making speech more interesting to the listener and to add points of common reference periodically.

#### **ANALOGICAL DISTANCE—WITHIN DOMAIN VS. OUTSIDE DOMAIN**

One possible explanation for the observed balance between within-category and out-of-category analogies was that subject matter expertise shared among a speaker and audience influenced source selection. Observational data offer support to this hypothesis. In Dunbar's (1995, 1997) biology laboratory study, scientists overwhelmingly produced within-domain analogies (98%). Saner and Schunn (1999) also observed high rates of within-domain analogies in their study of psychology laboratory meetings (81% within-domain) and colloquia (77% within-domain). In contrast, Blanchette and Dunbar (2001) studied opinion articles in the mainstream press, where the audience was the population at large. Here, they found that most analogies (77%) were from outside the target domain. Based on these findings, it appears that experts produce more within-domain analogies when communicating knowledge to their fellow subject matter experts, but generate analogies from sources from other domains when the audience consists of non-experts.

A second, but related, explanation stated that the goals of the participants were an influencing factor on source distance (Dunbar, 1995; Holyoak and Thagard, 1997; Blanchette and Dunbar, 2001). In Dunbar's biology lab study, the scientists were heavily focused on examining unexpected experimental results and resolving methodological problems. However, in the Dunbar and Blanchette studies involving both experts and non-experts, the experts' goals were to educate and/or persuade. In our study, there was no "discovery task"—i.e., there were no new hypotheses being generated, no new designs being developed, and no problems being solved. Rather, the discussion involved a great deal of comparison, both integrating and differentiating details of experimental methods and results. It may be that tasks involving a greater amount of creativity or innovation lead to more frequent use of same-domain analogies.

The results from the current study were more balanced. Within-category analogies were observed most frequently (53%), but out-of-category analogies accounted for a sizeable portion of the total as well (47%). These results fall squarely between the previous findings, but a plausible explanation can be made. If the number of within- and outside-domain analogies is a function of expertise, then the balance between them might be expected to vary with the range and depth of expertise. In the EPPS reading group, the participants all had some degree of expertise in economics, but their domain experience varied in terms of sub-discipline (e.g., econometrics, game theory, behavioral economics) and academic career longevity (i.e., faculty or graduate student). Hence, common expertise is likely to be an influencing factor, as Dunbar suggested, but the amount of influence it has on source distance may be moderated by the variability in such factors as range and depth. Additionally, the goals of the participants differed from those in earlier studies. Here, the participants sought to comprehend papers describing experimental methods and results, often by comparing unfamiliar methods to known, more familiar ones. To accomplish these goals, perhaps a more balanced and diverse set of analogies is most effective.

Another possibility for explaining the balance between within- and outside-domain analogies that we observed concerns the economics discipline itself. In Dunbar's (1995, 1997) biology lab observations, the overwhelming number of within-domain analogies may have stemmed from the fact that a majority of situations in molecular biology are likely to have clear relational correspondences to closely related areas within biological research, rather than remote domains comprised of non-molecular elements. Meanwhile, the Blanchette and Dunbar (2001) study of political commentary suggested that politicians and journalists appeared to draw intentionally from remote domains that would be familiar to readers in ways intended to highlight certain aspects of relational comparisons. In the present study, discussions across the relatively broad domain of economic inquiry highlighted the technical overlap between its various subdomains. Unlike molecular biology, economics has a high potential for relational alignment to more remote domains within public policy, banking, and corporate practice, domains in which human behavior, monetary valuations, and world affairs converge. Thus, the complexity of economics, which plays out both in academic analytic settings and in real-world financial markets, appears to provide a rich field optimal for both within-domain and out-of-domain analogical comparisons. Given the complicated nature of economic systems, economic analogies are rarely complete, so both superficial and structural correspondences appear to be drawn upon in order to explain and describe various aspects of complex systems. The validity or appropriateness of analogies in economics may also be subject to greater interpretation than other domains given our limitations in fully explaining human behavior and market dynamics.

In terms of a possible relationship between distance and depth, it has been suggested that within-domain analogies present more superficial similarity than distant analogies; in other words, the greater the distance, the less superficial the comparison (Christensen and Schunn, 2007). However, we found no evidence of that constraint in our observation of economists.

## **ANALOGICAL PURPOSE**

It is fairly well established that goals influence the production of analogies (Dunbar, 1995, 1997; Spellman and Holyoak, 1996; Blanchette and Dunbar, 2001). Prior studies examined the goals of experts in scientific laboratories in both *discovery* (e.g., problem solving, hypothesis generation, experimental design) and *non-discovery* (e.g., explanation, illustration, or visualization) activities. The primary difference in purpose between the economics reading group and groups observed in the cited studies was the absence of discovery goals in the reading group. Since we determined that the discussions we observed were largely non-discovery in nature, we focused on categorizing the extracted analogies into groupings based on the perceived reason for selecting the particular analogy; e.g., to differentiate a concept from other concepts, to inject emotion or colorful language into a comparison, to give a concrete example of a more abstract idea, etc. The list we derived can be found in the Results section above.

Blanchette and Dunbar (2001) provided evidence that goals influence analogical production, but the goals of the individuals in their study appeared to have no effect on analogical distance. Saner and Schunn (1999), on the other hand, found that goals did impact the domain distance. In particular, they found that individuals used within-domain analogies when working to identify problems but used outside-domain analogies when explaining issues or concepts to their lab mates. In our study, too, the purpose-distance effect was significant. Exemplification was associated with within-domain analogizing, while visualization was strongly related to out-of-domain analogy use. It may be that the functions that result in within-domain analogies (i.e., problem identification, generation of concrete examples) share some of the same underlying cognitive operations, as do those that produce outside-domain analogies (i.e., explaining issues, visualizing concepts), in ways that influence the generation and retrieval of analogies.

In interpreting our results, we should mention that the categories we developed were not mutually exclusive; e.g., an analogy intended to differentiate between concepts might do so using visual elements. In cases where an analogy plausibly served multiple purposes, the raters chose the category they felt best described its purpose.

## **GENERAL DISCUSSION ON THE RICHNESS OF OBSERVED ANALOGIES**

We have already emphasized that the collected passages contained a wealth of analogies rich in depth and structure, many of which involved implied systems of complex relations that could not be fully identified and analyzed. Furthermore, some of the more complex comparisons actually involved multiple analogies at different distances and depths. The complexity of the structural and superficial analogies used made the passages more difficult to analyze. Unlike laboratory tasks, in which comparisons between explicit statements are made (Gentner and Landers, 1985; Gentner et al., 1993; Krawczyk et al., 2004, 2005), the comparisons made by our economics experts were not always made fully explicit. Indeed, we frequently encountered instances in which a source or a target was implied, or not even articulated in the flow of discussion. This interpretive challenge may have been exacerbated by the high degree of expertise in our sample group, as some analogies involved inside knowledge that the speaker did not feel was necessary to clearly articulate in order to meaningfully draw the comparison to the group.

These observations are offered for several reasons. First, there are additional questions that can be investigated from these data, aside from the boundaries of this paper (e.g., Do the within-category/outside-category comparisons correspond to particular types of source-target pairs? Does a taxonomy of source categories emerge from the outside-category analogies in this setting? How effective did the analogies seem to be in conveying the intended information?). Second, the rating process involved some inherent subjectivity; even though inter-rater reliability was rather high, there were disagreements on specific passages. This is a challenge inherent in real-world analogical analyses, as the dynamic and unscripted nature of the interactions can produce text that is very difficult to interpret outside of the context in which it was spoken and by an individual who does not have access to the speaker's intent. Third, the rating process was further complicated when analogies were embedded in familiar language constructs (well-known clichés, common metaphors, etc.) and were not immediately identified as analogies by the raters, as such phrases have become so conventionalized that their figurative qualities can be simply overlooked.

We were able to ascertain some of the major purposes for which analogies are used in economics discussions among experts. The leading purpose for analogical comparisons was to provide examples. Good examples from other domains or familiar sources can provide clarity to a target and make the concept more concrete. In many of the analogies, we observed that experts tended to either relate an abstracted theoretical model to a more common situation that occurs in human interactions, or they did the reverse and described a particular situation as being an example of what is known to occur within a particular theoretical game. Experts also made common use of analogies to create a visual image of a particular concept. Visualization is important for providing a common ground for the audience and for making a target domain richer and easier to comprehend. Experts also used analogies in order to add color or interest to their contributions and to mark a topic as being included within a more familiar source domain. Given that we conducted this observation process in an academic setting, our experts may have developed tendencies to use analogy due to their experiences with teaching, in which good examples and visual comparisons can be useful for conveying concepts (Richland et al., 2007; Glynn, 2008). These goals for analogical comparison fit with prior observations about analogy as being diverse in function (Holyoak and Thagard, 1995), but contrary to several prior scholarly works describing analogical purpose (Holyoak and Thagard, 1997; Hummel and Holyoak, 1997) we did not find substantial evidence of new inferences being generated. Rather, the most prevalent use of analogy in the economics experts was to describe concepts to the group or to point out similarities between relational systems in either the real world, or in theoretical constructs.

In addition to these questions, several limitations with regard to the method and analysis of this study should be noted. The transcribers were non-experts in economics, but they were chosen to avoid domain bias and reduce the tendency to add their own interpretation to the passages. Additionally, three of the four transcribers lacked prior transcription experience in recognizing analogies, which opened the door for subtle analogies to be overlooked. We addressed this limitation using a two-phased training process and a two-person extraction strategy and believe that we reduced, but may not have eliminated, the likelihood of missed analogies. While it is possible that some subtle analogies may have been overlooked, we believe the number to be small enough to not alter our findings.

Their lack of domain expertise, however, may have contributed to the subjectivity and inconsistency mentioned above. Furthermore, we did not code the analogies according to any sort of standard domain taxonomy, as Dunbar (1995) did in his microbiology laboratory study. Finally, we did not account for individual differences by examining patterns of analogy use by specific individuals in the group.

## **CONCLUSIONS**

In summary, this study reinforces the strong reliance humans place on analogies for developing understanding and communicating in natural settings. It contributes valuable evidence that humans are quite agile in their selection of analogies, drawing on a mix of shallow and deep comparisons and determining an effective distance strategy based on the constraints of the domain and the level of perceived group expertise. Economics faculty and graduate student experts engaged in scientific discussion were observed to apply analogies that were more balanced in terms of categorical distance and structural depth than those observed in other natural settings. The domain context, problem-solving goals, and participant expertise of this particular group setting all appear to be important factors that led to differences in the magnitudes of depth and distance observed in the earlier studies. No evidence of an association between the distance and depth characteristics was found, but distance and purpose appeared to be related. A number of other questions remain to be answered about the ways in which analogies are applied in different types of settings. Future naturalistic analogy research could help to clarify the role of analogical comparisons in real-world settings, and the social sciences appear to provide particularly rich domains for additional studies.

## **ACKNOWLEDGMENTS**

The authors wish to thank Dr. Catherine Eckel and the Economic, Political, and Policy Sciences (EPPS) Spring 2011 weekly reading group at The University of Texas at Dallas for their participation in this study. Thanks, also, to Brandon Wolfe, Srikant Chari, and Ashley Fournier for their valuable assistance with the difficult task of transcription.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.01333/abstract

## **REFERENCES**


Holyoak, K. J., and Thagard, P. (1997). The analogical mind. *Am. Psychol.* 52, 35–44.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2014; accepted: 03 November 2014; published online: 26 November 2014.*

*Citation: Kretz DR and Krawczyk DC (2014) Expert analogy use in a naturalistic setting. Front. Psychol. 5:1333. doi: 10.3389/fpsyg.2014.01333*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Kretz and Krawczyk. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Explaining the abundance of distant analogies in naturalistic observations of experts

## *Máximo Trench\**

*Psychology Department, Centro Regional Universitario Bariloche, National University of Comahue, Bariloche, Argentina \*Correspondence: maximo.trench@crub.uncoma.edu.ar*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Daniel C. Krawczyk, The University of Texas at Dallas and University of Texas Southwestern Medical Center, USA*

**Keywords: analogy, naturalistic settings, experts, surface similarity, retrieval**

#### **A commentary on**

## **Expert analogy use in a naturalistic setting**

*by Kretz, D., and Krawczyk, D. C. (2014). Front. Psychol. 5:1333. doi: 10.3389/fpsyg.2014.01333*

Analogical reasoning is a landmark of human cognition. Based on the realization that the elements of two situations are organized by similar systems of relations, analogical inferences allow the transfer of knowledge structures from a better-known situation (the base analog) to a target situation that is relatively less understood (the target analog).

Experimental research has demonstrated that the retrieval of base analogs from long-term memory in response to the processing of a target analog is infrequent in the absence of semantic similarities between both situations (Gick and Holyoak, 1980; Keane, 1987; Gentner et al., 1993; Trench and Minervino, 2014). With the turn of the century, several naturalistic observations of experts working in their domains of expertise yielded a more complex picture. While molecular biologists (Dunbar, 1997) and psychologists (Saner and Schunn, 1999) still exhibited mostly within-domain analogizing, the observation of journalists and politicians (Blanchette and Dunbar, 2001), teachers (Richland et al., 2004), managers (Bearman et al., 2007) and design engineers (Christensen and Schunn, 2007) showed a more frequent use of long-distance analogies. The naturalistic study by Kretz and Krawczyk (2014) on the use of analogies by economists also demonstrates an abundance of distant analogies in the service of an impressive variety of communicative purposes, most of which were not evident in prior research. These goals included the generation of concrete source examples of more general target concepts, the formation of visual images of source concepts, the addition of colorful speech, the inclusion of a target into a source concept, or the differentiation between source and target concepts. With these results in mind, the time is ripe to assert that the naturalistic observation of experts shows a more flexible use of analogical sources than is predicted by experimental studies on analogical transfer, and simulated by dominant computer models of analogical retrieval (e.g., MAC/FAC, Forbus et al., 1995; LISA, Hummel and Holyoak, 1997). How, then, to explain this analogical abundance?
In trying to account for the contrasting results of the experimental and the naturalistic traditions, the default explanations revolve around the expertise of the analogizers and the psychological constraints of the target tasks. I will argue that although both factors are likely to bear some responsibility for this empirical inconsistency, there are reasons to expect a heavier weight of the latter.

## **THE EXPERTISE OF THE ANALOGIZERS**

Shortly after having documented that journalists and politicians generated mostly distant analogies when arguing for (or against) the referendum on the independence of Quebec, Blanchette and Dunbar (2000) obtained an *in vitro* replication of this result with novice participants generating their own analogies for another realistic political topic: the zero-deficit strategy for controlling public debt. The authors concluded that their prior results were due to the fact that the analogizers were generating their own analogies for a realistic situation, rather than to their expertise in the target issue. Trench et al. (2009a) provided support for this interpretation by replicating Blanchette and Dunbar's results with 10 different target topics. In the same vein, Bearman et al. (2007) failed to observe differences in the analogies proposed by novices and experts solving management problems. Rather than being based on broad expertise differences across participants, it seems that the ease of generating distant analogies depends on the goals of the analogizer and on the extent to which she understands the target analog at stake. When the analogizer comprehends the target analog better than her intended audience, as in communicative situations such as explaining a procedure to students (Richland et al., 2004) or selling political ideas to the population (Blanchette and Dunbar, 2000, 2001; Trench et al., 2009a), both experts and novices easily generate distant analogies. But when the target analog is insufficiently understood, as when Dunbar's (1997) expert molecular biologists or Gick and Holyoak's (1980) novice participants are attempting to solve a problem, distant analogies are seldom generated.

## **PSYCHOLOGICAL CONSTRAINTS OF THE TARGET TASK**

Another explanation for the frequent use of distant analogies in naturalistic studies might arise from comparing the psychological constraints of naturalistic analogy generation against those of classical experimental studies. The standard experimental procedure comprises an encoding phase, during which participants learn the base analogs, and a transfer phase, where experimenters present participants with either a semantically close or a semantically distant target situation, and assess whether its processing elicits the spontaneous retrieval of the base analog. Based on this procedure, differences across conditions were typically taken to demonstrate the centrality of surface similarities during analogical transfer. In contrast with this highly controlled environment (in which transfer can only originate in the retrieval of the critical base analog from memory), in naturalistic settings participants are free to generate analogies by means of retrieving their own base analogs (Blanchette and Dunbar, 2000; Hofstadter and Sander, 2013), identifying conceptual metaphors (Minervino et al., 2009), stumbling across suitable analogs in the external environment (Christensen and Schunn, 2005) or fabricating novel base analogs either by generating extreme cases out of the target analog (Clement, 1988), or by reinstantiating the relational structure of the target with a new set of elements (Olguín et al., 2013). Upon generating candidate analogies via any combination of such mechanisms, the proportion of close vs. distant analogies that people produce in naturalistic settings can also reflect a conscious editing of one type of analogies in favor of the other type, depending on the purpose of the reasoner (Trench et al., 2009b).

The fact that the core components of our retrieval mechanisms are invariably set to favor semantically close base analogs (Gentner et al., 1993; Trench and Minervino, 2014) suggests that the abovementioned generative mechanisms could possibly account for the frequency and the diversity of the analogies generated by experts. Future studies, both naturalistic and experimental, are required to understand how these overlooked analogy generation methods interact with the variety of goals that realistic analogy generation can pursue, as eloquently revealed by Kretz and Krawczyk's (2014) detailed analysis of the analogies produced by expert economists.

## **ACKNOWLEDGMENT**

This work was supported by grants PICT 2352 and PICT 2650 from the National Agency for Scientific and Technical Research of Argentina (ANPCyT), grant C108 from National University of Comahue and grant PIB013 from Universidad Abierta Interamericana.

## **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 November 2014; paper pending published: 02 December 2014; accepted: 03 December 2014; published online: 22 December 2014.*

*Citation: Trench M (2014) Explaining the abundance of distant analogies in naturalistic observations of experts. Front. Psychol. 5:1487. doi: 10.3389/fpsyg.2014.01487*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Trench. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Can taking the perspective of an expert debias human decisions? The case of risky and delayed gains

## *Michał Białek\*† and Przemysław Sawicki †*

*Department of Economic Psychology, Centre for Economic Psychology and Decision Sciences, Kozminski University, Warsaw, Poland*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Gordon Pennycook, University of Waterloo, Canada Francesca Chiesi, Università di Firenze, Italy*

#### *\*Correspondence:*

*Michał Białek, Department of Economic Psychology, Centre for Economic Psychology and Decision Sciences, Kozminski University, Jagiellońska 59, Warszawa 03-301, Poland*

*e-mail: mbialek@kozminski.edu.pl*

†*Michał Białek and Przemysław Sawicki have contributed equally to this work.*

In several previously reported studies, participants increased their normative correctness after being instructed to think hypothetically, specifically taking the perspective of an expert or researcher (Beatty and Thompson, 2012; Morsanyi and Handley, 2012). The goal of this paper was to investigate how this manipulation affects risky or delayed payoffs. In two studies, participants (*n* = 193) were tested online (in exchange for money) using the adjusting procedure. Individuals produced certain/immediate equivalents for risky/delayed gains. Participants in the control group solved the problem from their own perspective, while participants in the experimental group were asked to imagine "what would a reliable and honest advisor advise them to do." Study 1 showed that when taking the perspective of an expert, participants in the experimental group became more risk averse compared to participants in the control group. Additionally, their certain equivalents diverged from the expected value to a greater extent. The results obtained from the experimental group in Study 2 suggest that participants became less impulsive, which means they tried to inhibit their preferences. This favors the explanation that the perspective shift forced individuals to override their intuitions with social norms. Individuals expect to be blamed for impatience or risk taking and thus expected an expert to advise them to be more patient and risk averse.

**Keywords: risk, intertemporal choices, expert, dual-process theory, decision-making**

## **INTRODUCTION**

Human life presents continuous choices. In this text, we will focus on choices made in risk conditions (sticking to a permanent post vs. starting your own business) and intertemporal ones (buying an iPad vs. saving money). Studies have shown that people make mistakes in both areas, which can lead to serious social problems, such as gambling or obesity.

Overwhelming media advertisements and marketing strengthen impulsive behaviors in modern society. This results, for example, in obesity and financial debt. Nowadays, people often face artificial risky problems (stock market, inflation) for which they are not prepared (for an extended review, see Todd and Gigerenzer, 2012). This suggests a need to provide support to people so they can make more rational decisions, especially those that are intertemporal and risky.

One of the recently introduced methods to support thinking, specifically reasoning, is hypothetical thinking. People are asked to assess a problem from the perspective of an expert or researcher (Beatty and Thompson, 2012; Morsanyi and Handley, 2012). This usually results in increased normative correctness of their mental processing. Our aim is to test this method in a new field of cognition, that is, decision-making under risk and delay. We hope to validate the method of taking the perspective of an expert as an efficient debiasing method.

## **DECISIONS UNDER RISK**

Uncertainty about the future is an inherent part of human existence. While there are events we can be sure of and others that are impossible to predict, most of us have to deal with probabilistic situations. Studies on choices in risky conditions show that people have difficulty understanding information about probabilities. In a classic study, Tversky and Kahneman (1971) showed that when people assess the probability of events, they tend to ignore base rate information and instead rely on the social stereotype.

One of the main assumptions of prospect theory is that in risky situations, people underestimate moderate and large probabilities but overrate rare events (Kahneman and Tversky, 1979). Failure to understand the rules of probability theory as well as the fact that people overrate small probabilities are possible causes for the high percentage of people participating in different types of gambling. Despite the unfavorable profit-to-risk ratio, studies have shown that 82% of adult Americans (Welte et al., 2002), 72% of Canadians (Azmier and Clements, 2001), and 68% of adult British citizens (Wardle, 2007) admit to gambling. Even some stock market investors treat investing as a substitute for gambling (Markiewicz and Weber, 2013).

Biased perception of randomness is a challenge in the healthcare domain. For example, in the medical context, there is the question of informing patients about the probability of various diseases. Much empirical evidence has shown that people have serious problems estimating small probabilities. In particular, people are insensitive to changes in the magnitude of these probabilities (Kunreuther et al., 2001; Siegrist et al., 2008; Tyszka and Sawicki, 2011).

### **INTERTEMPORAL CHOICES**

Intertemporal choices present people with different challenges. In everyday life and in politics or economic affairs, some decisions are based on choosing between payments that occur at a time different from the time when the decision is made. For example, deciding whether to eat fast food immediately or wait for a balanced meal is based on the same psychological mechanisms as deciding whether to spend your profit immediately or invest it. The issue of intertemporal choices is also examined in terms of self-control, for example, not succumbing to temptation. In the classic research commonly known as the marshmallow test, Walter Mischel offered a four-year-old child a sweet and said that if the child decided not to eat it, he or she would soon get two marshmallows. The experimenter then left the room, leaving the child alone with the temptation. If the child did not wait, he or she got only one marshmallow instead of two. The results showed that people who had greater self-control as children scored higher on the SAT several years later, exhibited fewer behavioral problems, coped better with stress, and were more focused and attentive (Mischel et al., 1989).

The inability to defer gratification may also lead to serious social problems, such as obesity. Obesity is estimated to be the seventh leading cause of mortality in the world (Ezzati et al., 2002). In 2007–2008, 68% of adults in the US were overweight, and 33.8% of them were obese (Flegal et al., 2010).

### **SOCIAL NORMS AND NORMATIVE MODELS**

For many cognitive processes there is a normative model, which states what is correct (e.g., logic for reasoning). For risk taking, the product of a gain and the probability of its occurrence, the expected value (EV), is used as a normative model. Those who expect more for a lottery than its EV are overly risk seeking, and those who expect less than the EV are risk averse. Intertemporal choices do not have a normative model, but because of changes in our societies and extended lifespan, patience (to some degree) is seen as rational.
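The EV comparison described above can be illustrated with a minimal sketch. The function names and the optional `tolerance` parameter are illustrative assumptions and do not come from the original study.

```python
def expected_value(gain, probability):
    """Expected value of a single-outcome lottery: gain times probability."""
    return gain * probability


def classify_risk_attitude(certain_equivalent, gain, probability, tolerance=0.0):
    """Compare a certain equivalent with the lottery's EV.

    The tolerance band around the EV is a hypothetical convenience,
    not part of the normative model itself.
    """
    ev = expected_value(gain, probability)
    if certain_equivalent > ev + tolerance:
        return "risk seeking"
    if certain_equivalent < ev - tolerance:
        return "risk averse"
    return "risk neutral"


# Hypothetical certain equivalents for a $5000 gain paid with a 70% chance
print(classify_risk_attitude(3000, 5000, 0.7))  # below the EV
print(classify_risk_attitude(4000, 5000, 0.7))  # above the EV
```

Under this reading, a person who demands less than the EV to give up the lottery is classified as risk averse, and one who demands more as risk seeking.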

Social norms also regulate the behavior. Typically, patience and the ability to avoid acting impulsively are virtues (Haidt and Joseph, 2004). Children are rewarded when they show the ability to postpone reward.

According to the normative model of risk taking, the rational course is to take a well-calculated risk. It is unknown whether there are any consistent social norms regarding risk taking, but experiencing a loss because of risk taking (action) is blamed more than missing a chance to profit (omission; Ritov and Baron, 1990, 1995). Because people expect to be blamed when taking a risk, risk avoidance can be seen as a socially accepted behavior.

## **METHODS IMPROVING DECISION MAKING**

Some studies have focused on debiasing individuals in their conclusions and decisions. These studies introduced different types of instruction or additional information to help people override their initial, biased intuitions. There are two, usually implicit, assumptions behind these manipulations.

The first group of researchers tries to inform people about the normative models and procedures (presenting people with the concept of validity, EV or base rates). They assume that people make mistakes because they lack the appropriate knowledge or intuitions regarding the field of probability or logic (or mindware, as Stanovich, 2009b, calls it). In this view, an efficient method of debiasing would be a request to rely on a specific, formal procedure, e.g., "being presented with the concept of logical validity, please try to assess following conclusions according to their validity" (Evans et al., 1993).

The other group of researchers believes that biased thinking is not a result of a lack of appropriate knowledge but of cognitive miserliness (Fiske and Taylor, 1991), which means that individuals make biased decisions because of a lack of available cognitive resources and/or motivation to use reflexive processing. When motivated and given enough time, individuals show less biased decision-making. In this view, an efficient method of debiasing is an instruction that relates to the procedure, supporting deeper and reflexive thinking. An example of such an instruction would be, "please try to override your initial beliefs and focus on the logical structure of presented problems," as used by Moutier et al. (2002).

Neither approach produced a satisfying and consistent increase in the normative correctness of decisions. Despite the consensus in the literature that debiasing requires decoupling of the intuitions with effortful processing (Croskerry et al., 2013), it seems that people have trouble willingly overriding their initial beliefs, even when instructed to do so and when they are motivated and have appropriate knowledge.

In contemporary literature, we can also find other methods of improving individuals' cognitive processing. Hypothetical thinking can increase the normative correctness of decisions by increasing the chances of using effortful, rule-based processing (called Type 2; Evans and Over, 1996; Evans, 2007) but also by inhibiting intuitive and heuristic responses (Type 1 processing). For example, Loewenstein et al. (2001) instructed participants to imagine the consequences of both presented alternatives, and thus debiased people from the vividness effect. Lord et al. (1984) instructed their participants to imagine the opposite when considering social dilemmas (e.g., capital punishment). Thanks to this strategy, they improved the objectivity of their judgments. Baron (2008) proposed open-minded thinking to help override simple heuristic-cued intuitions. Trippas (unpublished doctoral dissertation) showed that cognitive style, understood as willingness to engage in Type 2 processing rather than cognitive ability, influences accuracy of reasoning. This suggests that human cognition can be improved by encouraging people to think hypothetically (Type 2 processing).

Considering a problem from an expert's perspective is quite a natural instruction. People typically perceive experts as unbiased and reliable sources of information. When taking an expert's perspective, intuitions and emotions should play a minor role, and thanks to this manipulation, reflexive processing (1) should be used more often compared to standard cases and (2) should override the internal conflict with intuitions.

In some studies, instructing participants to take the expert's perspective effectively encouraged the use of Type 2 processing, leading to less biased human performance (see Greenhoot et al., 2004; Thompson et al., 2005; Ferreira et al., 2006; Beatty and Thompson, 2012). However, when testing children, the effect of the instruction was ambiguous. The instruction, "please answer the questions taking the perspective of a perfectly logical and rational person (pp. 328)," as used in the study of Chiesi et al. (2011), seemed to work only for high cognitive ability individuals. This is possibly a result of the lack of learned rules and procedures that could be applied to the task in the reported study, or of a lack of cognitive resources. We can assume that the rules of thinking, just like driving a car, require much more cognitive resources when they are just learned while requiring fewer resources with greater practice (Stanovich, 2009a).

### **QUESTIONS AND HYPOTHESIS**

As stated in the introduction section, people make many mistakes when dealing with risk and delay, which results in many social problems, such as debt, gambling, obesity, and many more. The presented literature suggests that instructing people to reflect on a problem from the perspective of an expert can significantly improve performance. We expect our participants to produce less biased decisions in the field of risk and delay management.

The performance in the risky condition is expected to be more consistent with the behavior predicted by the EV normative model. This would provide evidence of less biased risk assessment.

The performance in intertemporal choices is expected to shift toward greater patience. The ability to focus on bigger but more delayed goals seems to be more adaptive in modern society than is impulsivity; thus, our manipulation should boost patience. This assumption does not follow from any normative model, as in the case of risk; instead, the general social norm supports patient behavior rather than short-term oriented behavior (Haidt and Joseph, 2004). Hence, we expect participants to follow the social norm of impulsivity avoidance and produce lower discounting strength in the experimental condition in which participants are instructed to take the perspective of an expert.

## **STUDY 1**

We investigated the effect of forced hypothetical thinking on choices under risk. Situations such as considering how much one is willing to invest to get a higher but uncertain return are everyday problems, but the form of presenting the problem is relatively artificial, and people are evolutionarily not prepared to deal with such problems. In other words, people can intuitively deal with risk, but the way the problems are presented increases the chance of making a biased decision (Gigerenzer and Selten, 2002).

By improving these types of judgment, we could enhance people's entrepreneurship abilities as well as prevent risky, maladaptive behaviors, such as gambling or smoking. The hypothetical thinking can possibly increase the use of other competing intuitions or rule-based processing. Both should change the risk taking decision in a more normative manner and help people accurately estimate the EV of potential gains.

#### **MATERIALS AND METHODS**

The certain equivalent of a potential gain of \$5000 with a probability of 90, 70, and 30% was computed for every participant. The experimental manipulation was an instruction asking participants to consider the problem from one of two perspectives, an expert's or their own. Our manipulation should increase abstract hypothetical thinking by taking the perspective of an expert<sup>1</sup>. Through this manipulation, we expected to debias human reasoning and discover the mindware responsible for risk management. The following is an example of the tasks used:

*Imagine you have received a \$5000 reward, which will be paid to you with a 70% chance. An investment fund is willing to rebuy your reward with a certain payment. Imagine what an honest and rational expert would advise you to do – accept or refuse the given offer.*

#### **ADJUSTING PROCEDURE**

The research scheme used in all experiments was based on the adjusting method, which is most popular in the discounting research (Yi et al., 2006; Benzion et al., 2011; Odum, 2011). This procedure enables one to determine balance points, i.e., payment methods where the person being tested was indifferent to two given alternatives, for example, between receiving a sum x immediately and a sum y after a period of time t (in research on risky choices, between receiving a sum x for certain, and a sum y with a determined probability). Thanks to the adjusting method, the overestimation of expected gains can be prevented, which is common if individuals are asked directly, e.g., how much they would expect for having to wait for their gain.

The research was conducted using a specially developed computer program, which enabled us to determine the balance points. The characteristic feature of the adjusting method is that one choice alternative adjusts its value depending on previous decisions made by the person being tested. For example, during the test on risky payments, two cards depicting sums were displayed on the screen, one was bigger but uncertain (the card on the right side of the screen), and the other was smaller and certain (the card on the left side of the screen). The sum on the card on the right was fixed, while the card on the left changed its value depending on the subsequent choices of the person being tested.

The participant's task was to choose between the two alternatives. With every question, the person being tested was to choose one of the values (certain or uncertain). In the first step, the participant was given a choice of \$2500 for sure (information on the card on the left side) or \$5000 with a 70% chance of winning (information on the card on the right side). After choosing the risky option, in the following step, the certain sum increased by half of its previous value. Hence, this time, the person being tested could choose \$3750 for sure or \$5000 with a 70% chance of winning. When the participant chose the risky option once more, the certain value increased by half of the previous change and amounted to \$4375 (\$3750 + \$625) in the subsequent step. However, if in the next step (\$4375 for sure or \$5000 with a 70% chance of winning), the person being tested chose the certain sum, its value decreased by half of the previous change (by \$312). To sum up, the certain value was adjusted to reflect previous choices of the person being tested. After making the sixth choice, the program calculated the equivalent point for the risky alternative.

<sup>1</sup>We have also manipulated the ownership of the gain (deciding for myself or for somebody else's gains), but it produced neither main effect nor interaction; thus, we do not report this manipulation in more detail.
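The adjusting steps in the worked example can be sketched as a short routine. This is only an illustration under stated assumptions: the function name is hypothetical, the text does not say how the program derived the final equivalent point, so the sketch simply returns the certain amount after the last adjustment, and half-dollar amounts are not rounded (the text reports \$312 where the sketch yields 312.5).

```python
def run_adjusting_procedure(choices, start=2500.0):
    """Replay a sequence of choices ("risky" or "certain").

    Returns the certain amounts offered at each step and the certain
    amount after the last adjustment (taken here, as an assumption,
    as the estimate of the equivalent point).
    """
    certain = start
    step = start / 2  # first adjustment: half of the starting amount
    offers = []
    for choice in choices:
        offers.append(certain)
        if choice == "risky":
            certain += step  # certain offer was too low, raise it
        elif choice == "certain":
            certain -= step  # certain offer was accepted, lower it
        else:
            raise ValueError("choice must be 'risky' or 'certain'")
        step /= 2  # each adjustment is half of the previous one
    return offers, certain


# The sequence described in the text: risky, risky, certain
offers, estimate = run_adjusting_procedure(["risky", "risky", "certain"])
print(offers)    # certain amounts shown at each step
print(estimate)  # certain amount after the third adjustment
```

Because each adjustment halves, six choices narrow the estimate to within about \$40 of the indifference point for a \$5000 gain.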

## **PARTICIPANTS**

Participants (*n* = 105) were citizens of the United States recruited by a specialized company in exchange for payment. They were randomly assigned to one of four experimental conditions. From this group, 27 participants who answered illogically (e.g., they wanted to receive more for a 30% chance of winning than for a 70% chance) were removed from the database prior to any further analysis. Mean age, gender distribution, and number of individuals in each experimental condition are presented in **Table 1**.
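The exclusion rule for illogical answers can be read as a monotonicity check on each participant's certain equivalents. The sketch below is an assumption about how such a check might look; the function name is hypothetical, and treating ties as consistent is a choice made here, not one stated in the text.

```python
def is_consistent(equivalents_by_probability):
    """Return True if certain equivalents never decrease as the
    probability of winning increases.

    equivalents_by_probability maps a winning probability (e.g., 0.3)
    to the certain equivalent a participant produced for it.
    """
    ordered = [equivalents_by_probability[p]
               for p in sorted(equivalents_by_probability)]
    return all(low <= high for low, high in zip(ordered, ordered[1:]))


# Hypothetical participants
print(is_consistent({0.3: 1000, 0.7: 3000, 0.9: 4200}))  # consistent
print(is_consistent({0.3: 3500, 0.7: 3000, 0.9: 4200}))  # more for 30% than for 70%
```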

## **RESULTS**

When a person is risk averse, his or her subjective value of a lottery is lower. **Figure 1** presents a line that connects mean subjective values of each lottery computed for the own/expert perspective and compared to the EV model. We can see that individuals who took the perspective of an expert were more risk averse. Additionally, their result is further from the normative EV line compared to the control group individuals who solved the problem from their own perspective.

To measure the attitude toward risk, the area under the curve was analyzed<sup>2</sup> (Myerson et al., 2001). The area under the curve is the sum of all trapezoids defined by consecutive balance point values in relation to the ordinate and abscissa (**Figure 2**). To use this measure, one first rescales the delays and the subjective values of the discounted sums to the range (0, 1). Delays are divided by the longest delay used, and the subjective values of the discounted sums are converted in a similar way. Next, we calculate the area of each trapezoid using the formula (a2 − a1)[(b1 + b2)/2], where a1 and a2 are consecutive delays and b1 and b2 are the corresponding subjective values of gains. The area under the empirical discounting curve is therefore the sum of all trapezoid areas. The smaller the area under the curve, the greater the risk aversion of an individual.
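The trapezoid formula can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `area_under_curve` is hypothetical, and dividing the subjective values by the nominal amount is an assumption about what "converted in a similar way" means.

```python
def area_under_curve(xs, values, amount):
    """Normalized area under an empirical discounting curve.

    xs:     delays (or another scale), sorted in ascending order
    values: subjective values (balance points) matching xs
    amount: nominal amount used to rescale the subjective values (assumption)
    """
    if len(xs) != len(values) or len(xs) < 2:
        raise ValueError("need at least two matching points")
    x_max = max(xs)
    a = [x / x_max for x in xs]        # rescaled abscissa
    b = [v / amount for v in values]   # rescaled subjective values
    area = 0.0
    for i in range(1, len(a)):
        # (a2 - a1) * [(b1 + b2) / 2] for each pair of consecutive points
        area += (a[i] - a[i - 1]) * (b[i - 1] + b[i]) / 2
    return area


# Hypothetical data: value falls linearly from 10 to 0 over the scale.
print(area_under_curve([0, 2, 4], [10, 5, 0], amount=10))  # 0.5
```

A curve that stays at the nominal amount over the whole scale yields an area of 1, so smaller areas correspond to stronger devaluation of the risky or delayed gain.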

<sup>2</sup>This measure has two significant advantages over an alternative analysis based on comparing discounting parameters. First, it reduces the level of skewness compared to the analysis of the distribution of discounting parameters (k); second, it is neutral, i.e., it does not refer directly to any specific mathematical formula.

Attitude toward risk was computed for each participant. Later, the parameters were compared across experimental groups. The general linear model showed a main effect of perspective [*F*(1,78) = 7.634; *p* < 0.01; η<sup>2</sup> = 0.091]<sup>3</sup>, and the participants in the experimental group showed greater risk aversion. The mean areas under the curve are shown in **Figure 3**. The bars represent the mean area under the curve, with higher values indicating a more positive attitude toward risk (lower risk aversion).

## **DISCUSSION**

We found that taking the perspective of an expert significantly influenced risky choices. Individuals in the experimental condition were more risk averse. By comparing the choices in the study to the line representing the EV, we can conclude that the decisions made in the experimental condition were less normatively correct.

This result is contrary to our expectations because previously reported studies in other domains suggested that the perspective manipulation increases the normative correctness of decisions.

There are two possible explanations. First, the concept of EV is not known to participants; second, the social norm for risky decisions (risk avoidance) was made less salient thanks to the perspective shift. The first explanation seems less probable, as even children intuitively compare lotteries by multiplying gains and probabilities (Schlottmann, 2001). But if individuals follow the social norm, they would expect to be blamed for unsuccessful risk taking and would thus avoid it to a greater extent.

Morsanyi and Handley (2012) showed that the use of Type 2 processing does not guarantee the correctness of thinking. Participants instructed to think from the perspective of a rational person focused even more on stereotypes instead of base rates when solving a lawyer's task (Kahneman and Tversky, 1973).

*In a study 1000 people were tested. Among the participants there were 5 engineers and 995 lawyers. Jack is a randomly chosen participant of this study. Jack is 36 years old. He is not married and is somewhat introverted. He likes to spend his free time reading science fiction and writing computer programs.*

Most individuals endorsed the conclusion that Jack is an engineer. They showed more interest in the social stereotype than in the probability of occurrence of a specific event and answered against the odds. This case is especially interesting for understanding the naïve probabilities that humans calculate. Here, taking a rational perspective (Type 2 processing) exacerbated the neglect of base rates.
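To see why this answer goes against the odds, the base rate can be combined with the description through Bayes' rule. The sketch below is illustrative only: the function name and the likelihood ratio of 20 (how much more typical the description is of an engineer than of a lawyer) are hypothetical values, not figures from the cited studies.

```python
def posterior_engineer(n_engineers, n_lawyers, likelihood_ratio):
    """Posterior probability that Jack is an engineer.

    likelihood_ratio: P(description | engineer) / P(description | lawyer),
    a hypothetical quantity chosen for illustration.
    """
    prior = n_engineers / (n_engineers + n_lawyers)
    numerator = prior * likelihood_ratio
    return numerator / (numerator + (1 - prior))


# Even a description 20 times more typical of engineers leaves the
# posterior below 10% when only 5 of 1000 participants are engineers.
print(round(posterior_engineer(5, 995, 20), 3))  # 0.091
```

With these base rates, the description would have to be 199 times more typical of engineers than of lawyers before "engineer" became the more probable answer.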

In another study, Pennycook et al. (2014) showed that base rates are available at an intuitive level, so the increase in biased responses in the Morsanyi and Handley study is a case of corrupted mindware, in which a social stereotype was judged to be a more reliable source of information than the base rates. Additionally, in the field of risky decisions, instruction manipulation or enhancement of hypothetical thinking did not consistently improve performance (Weinstein and Klein, 1995). The manipulation of instructions (hypothetical thinking) could possibly increase efficiency when one has the appropriate mindware: knows how to calculate the probability and how to use it in real-life social problems. If the mindware is missing, performance can be even worse (Chiesi et al., 2011).

<sup>3</sup>The analysis of the full population of participants also showed a significant main effect of the perspective manipulation [*F*(1,104) = 5.187; *p* < 0.05; η<sup>2</sup> = 0.077]. By eliminating irrational responses, we wanted to reduce possible noise and report the most reliable effect of the study.

## **STUDY 2**

The aim of the second study was to investigate the effect of forced hypothetical thinking on intertemporal choices. Such situations are everyday problems, where one has to think about how much he/she is willing to invest now to get a higher but delayed return. By improving that type of judgment, we could persuade people to improve health or prevent such maladaptive behavior as overeating.

This assumption does not follow from any normative model, as in the case of risk, but the general social norm supports patience rather than short-term oriented actions (Haidt and Joseph, 2004). Thus, we expected participants to follow the social norm of impulsivity avoidance and produce lower discounting strength in the experimental condition of taking the perspective of an expert.

#### **MATERIALS AND METHODS**

We repeated the procedure of Study 1. The only difference was that individuals had to evaluate three delayed gains. The delay was set to 1 month, 6 months, and 24 months. Once again, participants were randomly assigned to one of two experimental conditions, own perspective or the perspective of an expert. Additionally, in Study 2, we manipulated the ownership of the money (own money or someone else's money), but this manipulation produced no main effect and did not interact with the perspective manipulation; thus, it will not be reported in the following analysis.

#### **PARTICIPANTS**

Participants (*n* = 130) were citizens of the United States recruited by external services in exchange for payment. They were randomly assigned to one of two experimental conditions. Details of the group are presented in **Table 2**. A small group of participants (*n* = 15) was excluded because of irrational responses (the same criterion as in Study 1 was used).

### **RESULTS**

The discounting curves that connect balance points for delayed gain of \$5000 are presented in **Figure 4**, from which we can see that when taking the perspective of an expert, people show less impulsivity compared to the control condition (own perspective).

General linear model analysis was used to test the influence of possession and perspective on discounting strength. Once again, a main effect of perspective was found [*F*(1,114) = 4.168; *p* < 0.05; η<sup>2</sup> = 0.036]<sup>4</sup>. The mean discounting strength is presented in **Figure 5**.

## **DISCUSSION**

Taking the perspective of an expert improves thinking by helping individuals overcome impulsivity. Participants taking an expert's perspective were less likely to give up some of their money in order to receive it sooner than participants considering the problem from their own perspective. We can conclude that individuals' mindware related to delay management is governed by a rule according to which impulsivity is assessed as wrong and the correct behavior is patience. This belief is then in conflict with the intuitive willingness to receive immediate rewards. This is consistent with common observations that people are sometimes consciously aware of the internal conflict between their intuitions (impulsivity, Type 1 processing) and beliefs about the correct response (patience, Type 2 processing).

Despite the lack of a normative model, we can conclude that patience in the specific task presented here is adaptive; thus, it should be evaluated positively. This internal conflict should emerge when the mindware response is made salient by taking the perspective of an expert. De Neys (2014) stated that when emotional arousal emerges after conflict detection and an individual notices it, the heuristic response can be questioned. This can decrease impulsivity, as it is a heuristic response.

<sup>4</sup>As requested by a reviewer, we tested gender as a factor in the GLM analysis and found no main effect or interaction (both *p* > 0.5).


**FIGURE 4 | Discounting curves for delayed gains.**

## **GENERAL DISCUSSION**

The perspective shift changed human decisions. Participants showed greater patience and risk aversion. The results in the area of risk are not consistent with our expectations: the decisions made cannot be seen as more correct or rational. The intertemporal choices showed an improvement, as the discounting rate shown by individuals decreased. This can help individuals overcome temptations and make long-term financial plans (savings, investments).

A possible explanation of the observed behavior is that individuals focused on social norms rather than on normative models. This would be consistent with the findings of Chiesi et al. (2011) and Morsanyi and Handley (2012), who reported no consistent improvement, or even a decrease, in the correctness of decisions under a forced perspective shift. We argue that the experimental manipulation made the social norm salient and that people who took an expert's perspective focused on the social norm to a greater extent than in typical, everyday decisions. Because patience and cautiousness are socially perceived as virtues, we see a change of decisions to match them. This assumption is consistent with the reported findings on the improvement of thinking and reasoning presented in the introduction. The social norm related to thinking promotes reflective and logical thinking, that is, Type 2 processing. The mindware responsible for dealing with risk is still a main topic for research on risk and delay management, but social norms related to this topic have to be investigated more and incorporated into the design of studies.

Our data did not fully support the presented conclusion, as we did not test the social norms of participants. We assume that the cultural norm should have some effect on most individuals. The proverb "A bird in the hand is worth two in the bush" is a good example of socially accepted risk avoidance. The hypothesis that social norms were made salient could be tested by comparing cultures with large differences in delay management or in the attitude toward risk.

Despite possible difficulties, the idea of improving individuals' decision-making in the area of risk and delay seems to be worth the effort. Individuals perform sub-optimally even in advantageous conditions (with all required information provided and no time pressure) and require support. Modern society has created an artificial environment (e.g., through marketing and commercials) in which people are misinformed or put under time pressure; thus, human decision-making needs to be supported to a greater extent, particularly by hypothetical thinking with a focus on a specific instruction.

## **ACKNOWLEDGMENTS**

The current project was financed by the resources of the Polish National Science Centre (NCN) assigned by decision no. 2013/11/D/HS6/04604; http://www.nauka.gov.pl. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors would like to thank Prof. Tadeusz Tyszka and Dr. Łukasz Markiewicz for their comments and advice.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2014; accepted: 20 August 2014; published online: 04 September 2014.*

*Citation: Białek M and Sawicki P (2014) Can taking the perspective of an expert debias human decisions? The case of risky and delayed gains. Front. Psychol. 5:989. doi: 10.3389/fpsyg.2014.00989*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Białek and Sawicki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The geometry of expertise

## *María J. Leone<sup>1,2</sup>\*, Diego Fernandez Slezak<sup>3</sup>, Guillermo A. Cecchi<sup>4</sup> and Mariano Sigman<sup>1,2,5</sup>*


#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia Michael H. Connors, Macquarie University, Australia*

#### *\*Correspondence:*

*María J. Leone, Laboratorio de Neurociencia Integrativa, Departamento de Física, FCEN UBA and IFIBA, CONICET; Pabellón 1, Ciudad Universitaria, 1428 Buenos Aires, Argentina e-mail: juli.leone@gmail.com*

Theories of expertise based on the acquisition of chunks and templates suggest a differential geometric organization of perception between experts and novices. It is implied that expert representation is less anchored by spatial (Euclidean) proximity and may instead be dictated by the intrinsic relation in the structure and grammar of the specific domain of expertise. Here we set out to examine this hypothesis. We used the domain of chess, which has been widely used as a tool to study human expertise. We reasoned that the movement of an opponent piece to a specific square constitutes an external cue and that the reaction of the player to this "perturbation" should reveal his internal representation of proximity. We hypothesized that novice players will tend to respond by moving a piece in closer squares than experts. Similarly, but now in terms of object representations, we hypothesized that weak players will more likely focus on a specific piece and hence produce sequences of actions repeating movements of the same piece. We capitalized on a large corpus of data obtained from internet chess servers. Results showed that, relative to experts, weaker players tend to (1) produce consecutive moves in proximal board locations, (2) move the same piece more often and (3) reduce the number of remaining pieces more rapidly, most likely to decrease cognitive load and mental effort. These three principles might reflect the effect of expertise on human actions in complex setups.

#### **Keywords: chess expertise, object representation, chunks, spatial proximity, attentional control**

## **INTRODUCTION**

The focus of attention can be directed by exogenous (bottom-up) and endogenous (top-down) cues (Pylyshyn, 2007; Richard et al., 2008). The region of visual space to which attention is directed changes according to specific goals and tasks (Gilbert and Sigman, 2007; Vinckier et al., 2007). A classic example is Yarbus' gaze experiment, in which subjects viewed a complex image several times, each time with a different instruction. It demonstrated that the sequence of eye fixations changed drastically according to the question the observer was trying to answer about the image (Yarbus, 1967; Tatler et al., 2010). Top-down control of attention can act over a wide range of categories, including location but also objects, goals, features, context, and time (Duncan, 1984; Chun and Jiang, 1998; Maunsell and Treue, 2006; New et al., 2007). The ability to direct attention to specific objects and categories changes with experience (Gilbert and Sigman, 2007) and, similarly, the ability to ignore salient cues requires inhibition mechanisms which are trainable (Cepeda et al., 1998).

Chess has been one of the most widely studied models of expertise (de Groot, 1978; Schultetus and Charness, 1999; Reingold et al., 2001a,b; Campitelli and Gobet, 2008; Connors et al., 2011; Bilalic et al., 2012). Chess experts recognize and recall chess positions accurately [chunk and templates theories (Chase and Simon, 1973; Gobet and Simon, 1996)], and develop heuristics that allow them to focus and explore only a few "good enough" moves (de Groot, 1978), substantially alleviating the search process.

As in other domains of perceptual expertise, chunk and template acquisition theories reflect geometrical differences in the organization of perception between chess experts and novices: strong players recognize groups of pieces connected by functional relations as units (Gobet and Simon, 1996) and they also explore chess positions differently from novices [eye fixations are more centered on relationships between pieces (Reingold et al., 2001a)]. An implication of this theory is that expert representation is not anchored to the proximity between two pieces (Euclidean distance) and may instead be dictated by the intrinsic relation in the structure and grammar of the board. For instance, a bishop in one corner of the board which works in concert with a knight on the other side of the board to jointly attack an opponent square may be "functionally proximal" pieces in the mind of an expert, but "functionally distal" in the mind of a novice who does not recognize this relation. The same argument holds for other domains of expertise; for instance, an expert soccer goalkeeper may bind together (spatially) distant properties of the field (where the ball is, where the defenders and the attackers are) which jointly may build an important cluster of features. Here we set out to examine this conjecture explicitly. We focus on chess, which has three principal advantages for our goal: (1) the degree of proficiency can be quantified precisely with international systems of ratings (Elo, 1978), (2) the spatial layout in chess is well delimited by a square discrete grid of 64 locations, and (3) capitalizing on chess internet servers, we can base our conclusions on massive sets of data.

We reasoned that the movement of an opponent piece to a specific square constitutes an external attentional cue. The reaction of the player to this "perturbation" should reveal his internal representation of proximity. Specifically, we hypothesized that a naive player will tend to respond in proximal squares, whereas expert player responses are less likely to be governed by the spatial position of the opponent's last move. Additionally, when this argument is expressed in terms of object-oriented attention, we hypothesized that a novice player will more likely direct attention (concentrate) to a specific piece and hence produce repeated sequences of moves with the same piece. Instead, expert representation is directed to a more sophisticated pattern of pieces (chains of pawns, coordinated sets of pieces working in concert, . . . ) and hence the sequence of moves should show fewer repetitions. If weak players focus more on individual pieces, we hypothesized that they should tend to reduce the number of objects to be attended in order to avoid cognitive load. Finally, we examined some core aspects of chess strategy, hypothesizing that experts would play more in line with them than novices.

#### **MATERIALS AND METHODS**

#### **DATA ACQUISITION**

All games were downloaded from FICS (*Free Internet Chess Server*, http://www.freechess.org/), a free ICS-compatible server for playing chess games over the Internet, with more than 300,000 registered users. This constitutes a quite unique experimental setup providing virtually infinite data (thousands of millions of moves).

Each registered user may be a human or a computer player and has an associated rating (Glicko rating, http://www.glicko.net/glicko.html) that indicates the player's chess strength, represented by a number typically between 1000 and 3000 points. We defined two expertise levels: (a) strong players with a rating higher than 1900 points and (b) weak players with a rating between 1000 and 1400 points.

A regular game of chess contains about 40 moves from each player. A ply (plural plies) refers to one turn taken by a player. Hence, a chess game of 40 moves corresponds to 80 plies. We use the term "*next move*" to refer to two consecutive actions by the same player (white move 1: e4, white move 2: Nf3) and the term "next ply" to refer to consecutive actions by each player (ply 1: e4—white movement; ply 2: e5—black movement).

For each expertise group, we selected games from the FICS database (played from 2005 to 2013) with at least 80 plies, played between human players (of the same expertise group), with a total time budget of 180 s for each player and no increment. In order to make further comparisons, we generated 35 sets for each expertise group (each with 5000 games). These sets were built by date (i.e., set 1 contained mostly games from 2005 and set 35, from 2013). For each analysis we compared 35 values (one for each set) from the high rated expertise group with the 35 values from the low rated group.
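The selection criteria can be summarized in a short Python sketch. The `Game` record and its field names are hypothetical (the FICS database format is not described here), and filling the sets in chronological order from the earliest eligible games is an assumption about how the sets were "built by date".

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Game:
    # Hypothetical record; field names are illustrative only.
    played: date
    white_rating: int
    black_rating: int
    n_plies: int
    base_seconds: int
    increment: int
    both_human: bool


def expertise_group(rating):
    """Map a rating to the two expertise levels defined in the text."""
    if rating > 1900:
        return "high"
    if 1000 <= rating <= 1400:
        return "low"
    return None  # ratings outside both groups are not analyzed


def eligible(game, group, base_seconds=180):
    return (game.both_human
            and game.n_plies >= 80
            and game.base_seconds == base_seconds
            and game.increment == 0
            and expertise_group(game.white_rating) == group
            and expertise_group(game.black_rating) == group)


def build_sets(games, group, n_sets=35, set_size=5000, base_seconds=180):
    """Split eligible games into consecutive, date-ordered sets."""
    chosen = sorted((g for g in games if eligible(g, group, base_seconds)),
                    key=lambda g: g.played)
    if len(chosen) < n_sets * set_size:
        raise ValueError("not enough eligible games")
    return [chosen[k * set_size:(k + 1) * set_size] for k in range(n_sets)]
```

For the longer time budgets described next, the same sketch would be called with `base_seconds=300` or `900` and `set_size=1500`.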

In order to extend the results to longer time budgets, we replicated the analysis with games of 300 and 900 s per player. Because of the lower number of available games for these time budgets, we generated 35 datasets of 1500 games each (for each expertise group) for each of them. All other conditions were maintained.

#### **MOVEMENT DISTANCE MEASURE**

Piece location coordinates were organized in an 8∗8 matrix (representing the 64 squares of the chess board). For each movement, we calculated two different measures (D1 and D2) to examine the proximity between successive actions in a game of chess. To do this, we define the initial square of a movement as the location of the moving piece before the movement and the final square of a movement as the location of the moving piece after this action.

D1 corresponds to the difference between the initial square of a piece movement and the final square of the previous movement by the opponent. In other words, this corresponds to the difference in location between two consecutive plies of the game. This is the observable which can be more easily mapped to classic attentional cues experiments. We make the analogy that the location where the opponent drops a piece is an attentional cue and observe the player's onset of his response relative to this cue.

D2 corresponds to the difference between the final squares of two successive moves from the same player. This is measured as the difference between the end-locations of ply(*n* + 2) and ply(*n*).

For each measure we calculated the signed difference in squares along the x and y axes. Positive values on the y-axis indicate that the difference is shifted toward the upper side of the board. To do so we assumed the normal row conventions of chess (1 indicating white's first row and 8 black's first row). A positive difference on the x-axis indicates a shift toward the left side and negative values a movement toward the right side of the board. Note that the differences along both axes take values in the range [−7, −6, . . . , 0, 1, . . . , 7]. A value of zero indicates that there was no shift along that axis. For each measure (D1 and D2), we obtained one distance matrix for white moves and another for black moves, and we added them.
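The two measures can be sketched as follows. This is a minimal illustration, not the original analysis code: each ply is represented as a hypothetical `(from_square, to_square)` pair in algebraic notation, files a to h are mapped to x = 0..7, and each difference is taken as the later square minus the earlier one. The sign convention along the x axis is therefore an assumption, and castling (where two pieces move) is ignored.

```python
def square_to_xy(square):
    """Convert e.g. "e4" to zero-based (x, y) board coordinates."""
    return "abcdefgh".index(square[0]), int(square[1]) - 1


def diff(later, earlier):
    """Signed difference (dx, dy) between two squares, each in -7..7."""
    (lx, ly), (ex, ey) = square_to_xy(later), square_to_xy(earlier)
    return lx - ex, ly - ey


def d1_d2(plies):
    """plies: list of (from_square, to_square), alternating white/black."""
    # D1: initial square of a ply vs. final square of the opponent's previous ply.
    d1 = [diff(plies[n][0], plies[n - 1][1]) for n in range(1, len(plies))]
    # D2: final squares of two successive moves by the same player.
    d2 = [diff(plies[n + 2][1], plies[n][1]) for n in range(len(plies) - 2)]
    return d1, d2


# 1. e4 e5 2. Nf3 Nc6
plies = [("e2", "e4"), ("e7", "e5"), ("g1", "f3"), ("b8", "c6")]
d1, d2 = d1_d2(plies)
print(d1)  # [(0, 3), (2, -4), (-4, 5)]
print(d2)  # [(1, -1), (-2, 1)]
```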

For each independent set (35 per expertise group) we calculated the 15∗15 matrices D1mat(*i,R*) and D2mat(*i,R*), where *i* ranges from 1 to 35 and *R* can be the high or low rated group. Probabilities for each entry of the matrices were calculated as follows: (a) for each movement (at least 40∗5000, i.e., number of moves∗number of games, white or black) we calculated the distance measure, (b) we counted the number of movements which matched each matrix entry, (c) we divided the number of movements in each entry by the total number of movements in the distance matrix. For each matrix, the entries indicate the probability of finding successive plies (for D1) or moves (for D2) at the corresponding distance (see **Figure 1**), where indices 1 . . . 15, respectively, code distances [−7 . . . 7].
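Steps (a) to (c) amount to building a normalized two-dimensional histogram. A minimal sketch follows; the function name is hypothetical, the indices are zero-based (0..14 instead of 1..15), and storing y along rows and x along columns is an assumption.

```python
def distance_matrix(differences):
    """Turn a list of (dx, dy) differences into a 15x15 probability matrix."""
    if not differences:
        raise ValueError("no movements to count")
    counts = [[0] * 15 for _ in range(15)]
    for dx, dy in differences:
        # Shift -7..7 to 0..14 so that a difference of 0 lands on index 7.
        counts[dy + 7][dx + 7] += 1
    total = len(differences)
    return [[c / total for c in row] for row in counts]


# Hypothetical differences pooled over the movements of one set.
m = distance_matrix([(0, 3), (2, -4), (0, 3), (-7, 7)])
print(m[10][7], m[3][9], m[14][0])  # 0.5 0.25 0.25
```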

Our main experimental question is whether certain transitions (spatial differences between successive movements) differ between high and low rated players. To examine this we performed two-sample *t*-tests comparing the high vs. low groups. We performed an independent *t*-test for each entry of the matrix (a total of 225 tests), each comparing the 35 values measured for the high and low expertise groups. From this analysis we generated a matrix of *t*-values which encodes the difference in probabilities for each entry between high and low rated players.

**FIGURE 1 | Expertise level defines spatial effects on successive movements. (A)** Distance measurements. Distances between two squares over an 8∗8 checkerboard can be measured by subtracting the coordinates of one location (white circle) from the coordinates of another location (represented by the end of each arrow) on both the x and y axes. The square where the white circle is located corresponds to *x* = 6 and *y* = 6, and four alternative locations are illustrated with color arrows. All possible distances between two squares of this 8 × 8 board are constrained to a 15∗15 square, where now the coordinates of the white circle are (0, 0) and distance measures range from −7 to 7. The x axis shows the direction of movement in number of columns (*x* < 0, left direction; *x* > 0, right direction) and the y axis represents movements in number of rows (*y* < 0, down direction; *y* > 0, up direction). For example, the distance between the end of the yellow arrow (which is at *x* = 5 and *y* = 7 on the 8∗8 board) and the white circle is calculated as the difference between the corresponding square coordinates (*x* = −1 and *y* = 1). **(B)** Probability distributions of movement distances. We use two observables to assess locality effects on chess playing: D1 and D2 (see Methods). Movement probabilities are higher at short distances from the previous movement, for both High and Low expertise levels. **(C)** Weak players made their movements closer to the previous one. We contrasted probability distributions for both High and Low rated players on each entry of the 15∗15 distance square independently. The *t*-value of each independent two-sample *t*-test (with *p*-value < 0.001, Bonferroni corrected for multiple comparisons) is color-coded. Positive (red) *t*-values indicate significantly higher probabilities for high rated players and negative (blue) values, for weaker players. **(D)** Radial or Euclidean distances. Distances were collapsed to one dimension and the difference between the probabilities of making movements corresponding to a distance square [P(High) – P(Low)] was plotted against each radial distance. The distributions of the high and low rated groups were independently compared at each radial distance [two-sample *t*-test on each variable (D1 and D2)]. Red asterisks (*t* > 5.3 for D1 and *t* > 5.6 for D2) indicate distances where P(High) is significantly higher than P(Low); blue asterisks (*t* < −12.3 for D1 and *t* < −11 for D2), distances where P(Low) > P(High); in both cases *p* < 0.001, Bonferroni corrected for multiple comparisons. Dotted lines indicate distance values where the significance changes from weak to strong players.

Positive *t*-values indicate that this transition is more likely for high than for low rated players. *p*-values were corrected with a strict Bonferroni criterion for multiple comparisons, considering a difference as significant only if *p* < 0.001/225. For visualization purposes, *t*-values in comparisons that did not reach significance were set to *t* = 0.
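The per-entry comparison can be sketched with the Python standard library. This is an illustration under stated assumptions: the pooled-variance (Student) form of the two-sample *t* statistic is assumed, since the text does not say which variant was used, and the function names are hypothetical. Converting each *t* into a *p*-value requires the *t* distribution, which the standard library does not provide; a routine such as `scipy.stats.ttest_ind` could be used for that step.

```python
from math import nan, sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic (positive when mean(a) > mean(b))."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        return nan  # entry with no variability: the test is undefined
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def t_matrix(high, low):
    """high, low: lists of probability matrices, one matrix per set."""
    n_rows, n_cols = len(high[0]), len(high[0][0])
    return [[t_statistic([m[r][c] for m in high], [m[r][c] for m in low])
             for c in range(n_cols)]
            for r in range(n_rows)]


# Bonferroni-corrected significance threshold for the 225 entries.
BONFERRONI_ALPHA = 0.001 / 225
```

With 35 sets per group, each test has 68 degrees of freedom under this pooled-variance assumption.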

We then collapsed the matrices D1mat(*i,R*) to their radial distance by the conventional formula *r* = (x<sup>2</sup> + y<sup>2</sup>)<sup>1/2</sup>. This resulted in independent vectors of 34 dimensions for each *i* and *R*. Note that the number of possible radial distances over the board is less than 15∗15 since there is a lot of redundancy (for instance, moving the king one square to the left, to the right, up or down all correspond to a radial distance of 1). As before, we converted these distributions into a vector of *t*-values which encode the difference in probabilities for each index between high and low rated players. Positive *t*-values indicate that this transition is more likely for high than for low rated players. *p*-values were corrected with a strict Bonferroni criterion for multiple comparisons, considering a difference as significant only if *p* < 0.001/34.
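The radial collapse can be sketched as follows. The function name and the matrix layout (y along rows, x along columns, offset by 7) are assumptions; grouping is done on the integer x² + y² to avoid comparing floating-point radii.

```python
from math import sqrt


def radial_collapse(matrix):
    """Sum a 15x15 probability matrix over entries with equal radial distance.

    Returns a list of (r, probability) pairs sorted by increasing r.
    """
    by_r2 = {}
    for row in range(15):
        for col in range(15):
            dx, dy = col - 7, row - 7
            r2 = dx * dx + dy * dy          # squared radial distance
            by_r2[r2] = by_r2.get(r2, 0.0) + matrix[row][col]
    return [(sqrt(r2), p) for r2, p in sorted(by_r2.items())]


# A uniform matrix is enough to count the distinct radial distances.
uniform = [[1 / 225] * 15 for _ in range(15)]
print(len(radial_collapse(uniform)))  # 34
```

Counting the distinct values of x² + y² for differences between −7 and 7 indeed gives 34, including the distance 0.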

## **PIECE REPETITION**

This analysis is shown only for white moves (analyzing black moves yielded identical results). We encoded for each white move whether the piece moved was the same as (1) or different from (0) the piece moved in the previous turn. The probability of moving the same piece twice depends on the number of pieces remaining on the board. Thus, we calculated the repetition probability as a function of the pieces remaining on the board independently for the 35 sets of each expertise level. This yielded vectors PR(*i,R,np*) where *i* indicates the set (1–35), *R* the expertise level (high or low) and *np* the number of pieces remaining on the board from 3 to 16 (there were not enough positions with one or two pieces remaining on the board when considering the first 40 moves). To quantify the main effects of expertise level and number of pieces on the repetition probability, we performed a Two-Way ANOVA with number of pieces and expertise level as independent factors and their interactions. Then, we compared the distributions of high and low rating independently for each number of pieces with independent two-sample *t*-tests comparing high and low rated players. *p*-values were corrected with a strict Bonferroni criterion for multiple comparisons, considering a difference as significant only if *p* < 0.001/16.
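A minimal sketch of the repetition probability follows. The input format is hypothetical: each game is a list of `(piece_id, n_pieces)` pairs for white's moves, where `piece_id` identifies an individual piece (not a piece type) and `n_pieces` is the number of white pieces on the board when the move is made, which is an assumption about how the counts were aligned with the moves.

```python
from collections import defaultdict


def repetition_probability(games):
    """Probability of moving the same piece as in the previous turn,
    as a function of the number of pieces remaining."""
    repeats = defaultdict(int)
    totals = defaultdict(int)
    for moves in games:
        for previous, current in zip(moves, moves[1:]):
            piece, n_pieces = current
            totals[n_pieces] += 1
            if piece == previous[0]:   # same individual piece moved again
                repeats[n_pieces] += 1
    return {n: repeats[n] / totals[n] for n in sorted(totals)}


# Hypothetical game fragment: the g1 knight moves twice, then a pawn twice.
game = [("Ng1", 16), ("Ng1", 16), ("e2", 16), ("e2", 15)]
print(repetition_probability([game]))  # {15: 1.0, 16: 0.5}
```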

#### **NUMBER OF PIECES REMAINING ON THE BOARD**

This analysis is shown only for white moves (analyzing black moves yielded identical results). We calculated for each white move number (1–40) the number of white pieces remaining on the board (1–16). We then averaged this value for each move number (1–40) across all the games in each set. This yielded vectors NP(*i,R,n*) where *i* indicates the set (1–35), *R* the expertise level (high or low), and *n* the move number from 1 to 40. Note that this distribution as a function of move number has to be monotonically decreasing. This would not be true in bughouse or crazyhouse variants of the game, where a player can introduce a captured piece back onto the board. We assessed the main effects of move number and expertise level with a Two-Way ANOVA with move number and expertise level as independent factors and their interactions. Then, we compared the distributions of high and low rating independently for each move number with independent two-sample *t*-tests. *p*-values were corrected with a strict Bonferroni criterion for multiple comparisons, considering a difference as significant only if *p* < 0.001/40.
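The averaging step and the monotonicity check can be sketched as follows. The input format (one list of white piece counts per game, indexed by move number) and the function names are hypothetical.

```python
def mean_pieces_by_move(games):
    """Average number of white pieces at each move number across games.

    games: list of per-game lists, where entry k is the number of white
    pieces on the board at move k + 1.
    """
    n_moves = min(len(g) for g in games)
    return [sum(g[k] for g in games) / len(games) for k in range(n_moves)]


def is_non_increasing(values):
    """True if the sequence never goes up from one move to the next."""
    return all(a >= b for a, b in zip(values, values[1:]))


# Two hypothetical three-move fragments.
curve = mean_pieces_by_move([[16, 16, 15], [16, 15, 15]])
print(curve, is_non_increasing(curve))  # [16.0, 15.5, 15.0] True
```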

## **BOARD DISTRIBUTION**

This analysis is shown only for white moves (analyzing black moves yielded identical results). We calculated for white moves 5, 10, 20, 30, and 40 and for each piece the frequency of distribution along all squares of the board. Since the number of remaining pieces varies with expertise level and the number of pieces of each type is not the same (8 pawns, 2 bishops, one king, . . . ), we normalized the occurrences by the average number of pieces of that type at each move number in each set. For example, if the probability of having a knight on b1 at move number 1 (in a set of an expertise level) is 0.98 and on g1 is 0.93, and at that moment there were 2 knights on the board (on average in that set), then the normalized values for b1 and g1 are 0.49 and 0.465, respectively.
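The normalization in the example above is a single division per square. A minimal sketch, with a hypothetical function name and a dictionary keyed by square name as an assumed data layout:

```python
def normalize_occupancy(occupancy, mean_piece_count):
    """Divide per-square occupation probabilities by the average number
    of pieces of that type on the board."""
    return {square: p / mean_piece_count for square, p in occupancy.items()}


# The knight example from the text: 2 knights on the board on average.
knights = normalize_occupancy({"b1": 0.98, "g1": 0.93}, mean_piece_count=2)
print(knights)  # {'b1': 0.49, 'g1': 0.465}
```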

This results (for each piece type, move number, set and expertise level) in an 8 × 8 matrix which encodes the normalized average occupation of the board by the corresponding piece. Then, for each piece and move number, we compared the distributions over the board for high and low rated players, with independent two-sample *t*-tests for each entry of the matrix. Positive *t*-values indicate that occupation of a given entry is more likely for high than for low rated players. *p*-values were corrected with a strict Bonferroni criterion for multiple comparisons, considering a difference as significant only if *p* < 0.001/64. We report results for knights, rooks and the queen in the full matrix.
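A minimal sketch of the per-square comparison is given below. It assumes a pooled-variance (Student) two-sample *t* statistic, which is one common reading of "independent two-sample *t*-test"; the text does not state which variant was used. Function names are hypothetical, and the conversion from *t* to a *p*-value (which needs a *t* distribution from a statistics library) is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(high, low):
    """Pooled-variance two-sample t statistic (assumed variant).

    Positive values mean the average in `high` exceeds the average in
    `low`, matching the sign convention described in the text.
    """
    n1, n2 = len(high), len(low)
    pooled = ((n1 - 1) * variance(high) + (n2 - 1) * variance(low)) / (n1 + n2 - 2)
    return (mean(high) - mean(low)) / sqrt(pooled * (1 / n1 + 1 / n2))


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical normalized occupancies of one square across sets
# (three sets per group here; the study used 35).
t_value = two_sample_t([0.5, 0.6, 0.7], [0.2, 0.3, 0.4])
threshold = bonferroni_alpha(0.001, 64)  # one test per square of the 8 x 8 board
```

A square would then be flagged only if the *p*-value associated with `t_value` falls below `threshold`.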

Data are presented as mean ± *SD* (*n* = 35) in **Figures 1C**, **2A,B**. Asterisks indicate significant differences at *p* < 0.001 (Bonferroni corrected). The same color code is maintained throughout the work: red indicates higher probabilities for strong players and blue for weaker players.

## **RESULTS**

## **HYPOTHESIS 1**

Low rated players make moves which are more proximal to their own last move and to the opponent's preceding move.

D1 considers the difference between the initial location of a piece movement and the final location of the previous movement by the opponent. D2 corresponds to the difference between the final locations of two successive moves from the same player (see Materials and Methods and **Figure 1A** for a full description of how D1 and D2 are calculated).

For both expertise groups, we found a similar distribution of distance probabilities: all players tended to make their movements in squares close to the final location of their opponent's last movement (D1) and to their own last movement (D2) (**Figure 1B**). However, in accordance with our first hypothesis, lower rated players tended to make more movements in board locations proximal to the opponent's last movement (D1) and to their own preceding movement (D2). Instead, strong players showed higher densities of successive movements which were more scattered in space (**Figure 1C**)<sup>1</sup>. To quantify this observation we aggregated the distribution of spatial differences into a single scalar estimating the radial distance between the two successive movements (**Figure 1D**).

**FIGURE 2 | Object-based mechanisms depend on expertise level. (A)** Weak players move the same piece on consecutive movements more frequently than strong players. The probability of moving the same piece on two consecutive moves depends on the number of pieces remaining on the board. We plotted this probability vs. the number of pieces for both expertise levels. Independent two-sample *t*-tests (high vs. low expertise probabilities) for each number of pieces left (ranging from 1 to 16) show a higher probability of moving the same piece on consecutive movements for low rated players, almost independently of the number of remaining pieces (∗*p* < 0.001, Bonferroni corrected for multiple comparisons). The dashed black line shows the random threshold. **(B)** Low-rated players reduce the number of pieces more rapidly than high-rated players. The number of remaining pieces on the board, which changes throughout the game (starting at 16 pieces for each player), is significantly higher for high rated players throughout the game (moves 3–40, independent two-sample *t*-tests on each move number, ∗*p* < 0.001, Bonferroni corrected for multiple comparisons), showing that low rated players exchange (or lose) their pieces more rapidly than stronger players.

Results for D1 indicated that movements starting within a radius of 3.5 squares from the position where the opponent moved his piece were more likely for low rated players (*p* < 0.001, Bonferroni corrected), while movements outside this radius were more likely for high rated players (*p* < 0.001, Bonferroni corrected). Similarly, results for D2 indicated that low rated players were more likely (*p* < 0.001, Bonferroni corrected) to make two consecutive moves with end-locations within a radius of 2.5 squares, while higher rated players were more likely to make moves beyond this radius.
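The two distance measures and their radial summary can be sketched as follows. Moves are assumed to be given as (origin, destination) pairs in algebraic notation; the sign convention of the displacement and all function names are assumptions, since the exact definition is given in Materials and Methods and **Figure 1A**.

```python
from math import hypot


def square_to_xy(square):
    """Convert an algebraic square such as 'e4' to zero-based (file, rank)."""
    return ord(square[0]) - ord("a"), int(square[1]) - 1


def displacement(from_square, to_square):
    """Signed (dx, dy) offset between two squares, each in -7..7.

    These offsets index the 15 x 15 distance square.
    """
    (x0, y0), (x1, y1) = square_to_xy(from_square), square_to_xy(to_square)
    return x1 - x0, y1 - y0


def d1(opponent_prev_move, move):
    """Offset from the opponent's previous destination to this move's origin."""
    return displacement(opponent_prev_move[1], move[0])


def d2(own_prev_move, move):
    """Offset between the destinations of two successive moves by one player."""
    return displacement(own_prev_move[1], move[1])


def radial(offset):
    """Euclidean (radial) distance of an offset, in squares."""
    return hypot(*offset)


# Hypothetical opening: 1. e2-e4 e7-e5 2. Ng1-f3
offset_d1 = d1(("e7", "e5"), ("g1", "f3"))
offset_d2 = d2(("e2", "e4"), ("g1", "f3"))
```

With these definitions, the D1 offset here has a radial distance of about 4.47 squares (outside the 3.5-square radius), while the D2 offset has a radial distance of about 1.41 squares (inside the 2.5-square radius).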

Both of these results are extremely reliable across distances (**Figure 1D**). This indicates that, while the specific geometry of the board may be subtle and specific to chess (the patterns of **Figure 1C** are complex), these patterns are organized by a simple rule: low rated players tend to overplay proximal moves and high rated players distal moves.

The previous results were obtained using games with a short time budget (180 s). To avoid any confound related to the use of very short games, we repeated the same analysis for games with longer time budgets (300 and 900 s per player) and found the same results (**Figures S1**, **S2**).

#### **HYPOTHESIS 2**

Low rated players are more likely to move the same piece in consecutive turns.

The previous results suggest that low rated players have a narrower (or more focal) spatial window of attention. Attention can also be directed to objects (Richard et al., 2008). We examined the hypothesis that, throughout a game, low rated players are more focused on a specific piece than high rated players, who may direct attention to schemas assembling sets of pieces (pawn chains, several pieces converging on a square, or a plan ...). To this end, we simply measured the probability of repeating a piece in two consecutive moves. A repetition is counted only when the exact same piece (not the same type of piece) is moved twice. If a player moves a pawn and in the next turn moves another pawn, this is considered a different piece movement.

The probability of moving the same piece twice depends on the number of pieces remaining on the board. Thus, we calculated the repetition probability as a function of the number of pieces remaining on the board (**Figure 2A**). The repetition probability decreased with the number of pieces for both groups but remained above chance levels. This is expected since: (1) some pieces actually cannot move and (2) players rarely consider all pieces on the board as candidates to move. As we had hypothesized, lower rated players produced more repetitions, reflecting that their attention (or their strategies or conception of plans) is more likely to be constrained to a single piece. To quantify this observation we first submitted the data to a two-way ANOVA with number of pieces and expertise level as independent factors, including their interaction. Results showed a main effect of both factors (Expertise: *p* < 0.0001, *F* = 91.8, *df* = 1; Number of Pieces: *p* < 0.0001, *F* = 181.6, *df* = 15; Interaction: *p* < 0.0001, *F* = 7.5, *df* = 15). We followed this test with independent two-sample *t*-tests (corrected with a strict Bonferroni criterion for multiple comparisons) for each number of pieces left, comparing the distributions for high and low rated players. Each value of the distribution is obtained from one of the 35 different sets of each expertise level. All comparisons consistently showed a greater repetition probability for lower than for higher rated players. This effect was significant when 6, 8–14, and 16 pieces remained on the board [*t*(34) < −5.2, *p* < 0.001].

<sup>1</sup>Note that D2 has a non-zero probability at the origin, which corresponds to two consecutive moves by one player which have the same end-location. This can only happen when a player captures twice in the same location. Instead, D1 has a strict zero probability at the origin, since a player cannot make a move starting from the square where the opponent has just moved a piece.

As for the first hypothesis, we repeated the piece repetition analysis using games with longer time budgets (**Figures S3A,B**) and we replicated the results found for 180 s games.

## **HYPOTHESIS 3**

Since weak players focus more on individual pieces it is expected that it is effortful for them to work on boards with many pieces. We expect that weaker players will tend to simplify the position to avoid mental effort (Koechlin and Summerfield, 2007).

We compared the number of pieces as a function of move number for both expertise levels. As expected, the number of pieces started at 16 (initial configuration) and smoothly decreased to an average of about 7 pieces by move 40. Beyond this main trend, we observed that after the first five moves the distributions of remaining pieces for high and low rated players bifurcate. This analysis shows that, as we had hypothesized, weaker players exchange pieces more rapidly than stronger players. To quantify this observation we first ran a two-way ANOVA with move number and expertise level as independent factors, including their interaction. Results showed a main effect of both factors (Expertise: *p* < 0.0001, *F* = 4.5 × 10<sup>4</sup>, *df* = 1; Move number: *p* < 0.0001, *F* = 2.2 × 10<sup>5</sup>, *df* = 39; Interaction: *p* < 0.0001, *F* = 296, *df* = 39). We followed this test with independent two-sample *t*-tests (corrected with a strict Bonferroni criterion for multiple comparisons) for each move number, comparing the distributions for high and low rated players. All comparisons consistently showed a significantly greater number of pieces throughout the game (move numbers 3–40) for high rated players [*t*(34) > 15 for move numbers 4–6, *t*(34) > 27 for move numbers 3 and 7–40, *p* < 0.001]. Once again, we repeated the previous analysis for 300 and 900 s games (**Figures S3C,D**), replicating the results found for the 180 s time budget.

The results described above show that weaker players tend to produce consecutive moves in proximal board locations, more often moving the same piece and exchanging pieces more rapidly to reduce the number of remaining pieces. These three principles reflect consistent general findings which might reflect the effect of expertise on human actions in complex setups.

Beyond these three principles, there are idiosyncratic aspects of the game of chess which relate to piece value, to the way the pieces move on the board and to where they are most effective, and which dictate a specific strategy on the board. Expert play is expected to be more in line with certain strategic themes. Here we examined three core aspects of chess strategy: (1) knights are more effective when they are centralized, (2) rooks play first along the 1st row to find an optimal centralized column where they are effective, and (3) the queen should not risk long travels early in the game.

At different stages of the game (moves 5, 10, 20, 30, and 40) we calculated the average density of white pieces (N: knights, R: rooks, Q: queen) over the board. For each square we performed a two-sample *t*-test comparing the distribution of densities for the high and low rated players and corrected for multiple comparisons with a strict Bonferroni criterion. Results showed that, as expected, knights were significantly more centralized (over the whole board) for higher rated players and were more likely to be found on their initial square (b1 or g1) or advanced in the enemy camp for weaker players (**Figure 3A**). Rook movement revealed that by move 5 higher rated players are more likely to have castled and by move 10 they have a higher probability of having centralized the rooks on the 1st row (**Figure 3B**). The rooks of weaker players were instead much more likely to be on their initial squares (a1 and h1). Also, as with knights, weaker players are more likely to advance the rook into the enemy camp. Another consistent finding is that stronger players place their rooks on the queen-side (left side of the board for white). Instead, weaker players attack more rapidly on the king-side: for instance, by move 30, white rooks are more likely to be found on squares close to the opponent king's location. This reflects more positional play, and a weaker tendency to go directly for mate against the enemy king, for high rated players. Finally, as expected, stronger players tend to postpone queen development (by moves 5 and 10 the queen is more likely to be on the initial square d1) and throughout the game develop the queen on the first rows (**Figure 3C**). We emphasize that these results do not convey information about the absolute distribution of occupation of a piece. Instead they reflect differential distributions, i.e., they indicate whether a given square is more likely to be occupied by the pieces of stronger or of weaker players. To avoid confusion, **Figure S4** shows the average degree of occupancy of these pieces throughout the game for each expertise level. As for the previous hypotheses, we found similar results for games with longer time budgets (see **Figure S5**).

## **DISCUSSION**

Here we showed differences in general and domain-specific patterns of actions depending on expertise level. We found that weaker players play more locally, tend to focus sequences of actions on the same piece, and more rapidly exchange pieces to reduce the total number of pieces on the board. Our working hypothesis is that these observations reflect a different focus of attention to space and objects with expertise.

Our first working hypothesis was that the end location of a movement (the place where a piece is located) functions like a spatial cue in the board space. Our conjecture is that the tendency to continue playing close to that cue is a reflection of the persistence of attention to this location.

Attention involves a sequence of operations: (1) attentional shifting to the target location, (2) attentional engagement, and (3) attentional disengagement (Posner and Petersen, 1990; Fox et al., 2002; Koster et al., 2006). The persistence of play in a given location has three likely and related explanations. A first possibility is that novices have difficulties disengaging attention from the place cued by the last movement (Sheridan and Reingold, 2013). Another possibility is that expert players have more effective preattentive mechanisms to encode saliency in peripheral locations of the board (Sigman and Gilbert, 2000; Intriligator and Cavanagh, 2001; Pylyshyn, 2007; Vinckier et al., 2007). Chess masters have an advantage for the recognition of chess pieces (Saariluoma, 1995; Kiesel et al., 2009; Bilalic et al., 2010) and chess themes (Reingold et al., 2001a,b), which may serve as a saliency detector providing new cues which compete with the previous spatial cue. A third possibility is that expert players can attend to themes or schemas (strings of pieces) and the focus of attention is spread over the whole attended object (Houtkamp et al., 2003; Alvarez and Scholl, 2005; Richard et al., 2008). In line with this idea, the size and form of the selection window have been proposed to be controlled by top-down mechanisms and to depend on task difficulty (Belopolsky and Theeuwes, 2010).

**FIGURE 3 |** […] (normalized by the number of remaining pieces of this type) in each square (8 × 8 checkerboard) were compared for high and low expertise groups at different game stages (moves 5, 10, 20, 30, and 40). The *t*-value resulting from the two-sample *t*-test (high vs. low expertise group) in each square is color coded for those comparisons that are significantly different with *p* < 0.001, Bonferroni corrected for multiple comparisons. Red positive values indicate significantly higher probabilities for strong players and blue negative values, significantly […] knights and rooks, delaying the queen development, compared with lower rated players delaying more the development of knights and rooks (but not the development of the queen, which is almost always not good) and/or occupying more advanced squares with all. It should be noted that this figure represents the differential occupancy of each square, showing where strong or weak players locate each type of piece comparatively more frequently than the other group.

Our results, based on the spontaneous distributions of actions during a game, are consistent with a recent report by Sheridan and Reingold analyzing the distribution of attention in the Einstellung effect (Sheridan and Reingold, 2013). The authors present a problem in which there is a move which almost invariably attracts attention (for instance a region of the board where there seems to be mate, with the king exposed and many pieces attacking it). In this construction, the best move is away from this specific location of the board. Weaker players very often do not consider this optimal move (which is distant from a very salient location) and their gaze remains in the Einstellung region of the board. Instead, stronger players can disengage from this location, which allows them to find the distant and optimal move (Sheridan and Reingold, 2013).

Weaker players are less likely to organize the representation of the board in large chunks (Chase and Simon, 1973; Gobet and Simon, 1996) and thus, comparing variations when many pieces remain on the board requires more attentional shifts and executive function. Koechlin and colleagues have coined the idea of a "lazy" executive system which is triggered only when strictly needed (Koechlin and Summerfield, 2007). Combining these premises, we reasoned that all players will seek to minimize effort. To achieve this, weaker players will prefer positions with fewer pieces on the board. In line with this prediction, we show that weak players consistently tend to "simplify" the problem by removing pieces from the board more rapidly.

We initially chose to use 180 s games to test our hypotheses because: (1) there is evidence that rapid processes (related to pattern recognition) rather than slow processes (fundamentally, search mechanisms) are responsible for chess expertise (Burns, 2004; Sigman et al., 2010), (2) the time used for each movement is closely related to that of common psychological experiments, and (3) the FICS database for 180 s games is larger than those for longer time budgets. However, we also replicated all our results for longer time budgets (300 and 900 s), indicating the robustness of the conclusions and the absence of artifacts related to the 180 s time budget.

The effect size of all our results is small but consistent with our working hypotheses and each result was replicated in three different time budgets, favoring the consistency and reliability of our results.

Our study focused on chess, but there is no reason to think that the main conclusions derived here (with the exception of the specific strategic patterns shown in **Figure 3**) are specific to chess. Hence they are likely to generalize to the effect of expertise on other domains of human action and decision making (e.g., novice car drivers focusing their attention only on the road ahead but not on the car mirrors, or novice sport players deciding their actions based only on the ball location because they are not able to simultaneously attend to the locations of their partners and opponents).

## **ACKNOWLEDGMENTS**

This research was supported by the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET). Mariano Sigman is sponsored by the James McDonnell Foundation.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.00047/abstract

**Figure S1 | Spatial effects are also observed in longer games: 300 s time budget. (A)** Probability distributions of movement distances. As for 180 s games (**Figure 1B**), probabilities of making a movement close to the previous one are higher at short distances, for both expertise levels. **(B)** Weak players made their movements closer to the previous one. The same analysis explained for 180 s games was performed for 300 s games, obtaining similar results (see **Figure 1C**). Probability distributions for high and low rated players were contrasted on each entry of the 15 × 15 distance square independently. The *t*-value of each independent two-sample *t*-test (with *p* < 0.001, Bonferroni corrected for multiple comparisons) is color-coded. Positive (red) *t*-values indicate significantly higher probabilities for high rated players and negative (blue) values, for weaker players. **(C)** Radial or Euclidean distances. Again, as in **Figure 1D** for 180 s games, distances were collapsed to one dimension and the difference between probabilities of making movements corresponding to a distance square [P(High) – P(Low)] was plotted vs. each radial distance. The distributions of the high and low rated groups were compared independently at each radial distance [two-sample *t*-test on each variable (D1 and D2)]. Red asterisks (*t* > 5.2 for D1 and *t* > 4.9 for D2) indicate distances where P(High) is significantly higher than P(Low); blue asterisks (*t* < −7.3 for D1 and *t* < −7.8 for D2) indicate distances where P(Low) > P(High); in both cases *p* < 0.001, Bonferroni corrected for multiple comparisons.

**Figure S2 | Spatial effects are also observed in longer games: 900 s time budget. (A)** Probability distributions of movement distances. As for 180 s (**Figure 1B**) and 300 s (**Figure S1A**) games, probabilities of making a movement close to the previous one are higher at short distances (for both expertise levels). **(B)** Weak players made their movements closer to the previous one. The same analysis explained for 180 s and 300 s games was performed for 900 s games, obtaining similar results (see **Figure 1C**). Probability distributions for high and low rated players were contrasted on each entry of the 15 × 15 distance square independently. The *t*-value of each independent two-sample *t*-test (with *p* < 0.001, Bonferroni corrected for multiple comparisons) is color-coded. Positive (red) *t*-values indicate significantly higher probabilities for high rated players and negative (blue) values, for weaker players. **(C)** Radial or Euclidean distances. Again, as for 180 s games (**Figure 1D**) and 300 s games (**Figure S1C**), distances were collapsed to one dimension and the difference between probabilities of making movements corresponding to a distance square [P(High) – P(Low)] was plotted vs. each radial distance. The distributions of the high and low rated groups were compared independently at each radial distance [two-sample *t*-test on each variable (D1 and D2)]. Red asterisks (*t* > 6.9 for D1 and *t* > 4.9 for D2) indicate distances where P(High) is significantly higher than P(Low); blue asterisks (*t* < −5.3 for D1 and *t* < −5 for D2) indicate distances where P(Low) > P(High); in both cases *p* < 0.001, Bonferroni corrected for multiple comparisons.

**Figure S3 | For longer games (300 and 900 s time budget), object-based mechanisms also depend on expertise level. (A,B)** Weak players move the same piece on consecutive movements more frequently than strong players. As for 180 s games (**Figure 2A**), we plotted the probability of moving the same piece on successive movements vs. the number of pieces for both expertise levels for 300 and 900 s games. To quantify these observations we first submitted the data to a two-way ANOVA with number of pieces and expertise level as independent factors, including their interaction. Results for 300 s games showed a main effect of both factors (Expertise: *p* < 0.0001, *F* = 41.9, *df* = 1; Number of Pieces: *p* < 0.0001, *F* = 61.6, *df* = 15; Interaction: *p* < 0.0001, *F* = 10.1, *df* = 15). Results for 900 s games showed a main effect of both factors (Expertise: *p* < 0.0001, *F* = 197.8, *df* = 1; Number of Pieces: *p* < 0.0001, *F* = 89.7, *df* = 15; Interaction: *p* < 0.0001, *F* = 31.4, *df* = 15). We followed these tests with independent two-sample *t*-tests (corrected with a strict Bonferroni criterion for multiple comparisons) for each number of pieces left, comparing the distributions for high and low rated players. Each value of the distribution is obtained from one of the 35 different sets of each expertise level. All comparisons consistently showed a greater repetition probability for lower than for higher rated players. For 300 s games, this effect was significant for 7–14 pieces remaining on the board [*t*(34) < −4.9, *p* < 0.001]. For 900 s games, this effect was significant for 8–16 pieces remaining on the board [*t*(34) < −6.4, *p* < 0.001]. The dashed black line shows the random threshold. **(C,D)** Low-rated players reduce the number of pieces more rapidly than high-rated players. As previously shown for 180 s games, the number of remaining pieces on the board is significantly higher for high rated players for almost the whole game for both 300 and 900 s games. First, a two-way ANOVA with move number and expertise level as independent factors, including their interaction, showed a main effect of both factors. ANOVA results for 300 s games: Expertise: *p* < 0.0001, *F* = 1.3 × 10<sup>4</sup>, *df* = 1; Move number: *p* < 0.0001, *F* = 3.2 × 10<sup>4</sup>, *df* = 39; Interaction: *p* < 0.0001, *F* = 175, *df* = 39. ANOVA results for 900 s games: Expertise: *p* < 0.0001, *F* = 0.98 × 10<sup>4</sup>, *df* = 1; Move number: *p* < 0.0001, *F* = 2.2 × 10<sup>4</sup>, *df* = 39; Interaction: *p* < 0.0001, *F* = 79, *df* = 39. We followed these tests with independent two-sample *t*-tests (corrected with a strict Bonferroni criterion for multiple comparisons) for each move number, comparing the distributions for high and low rated players. All comparisons consistently showed a significantly greater number of pieces for high rated players for almost the whole game [in 300 s games, move numbers 3 and 8–40, *t*(34) > 5, *p* < 0.001; in 900 s games, move numbers 3 and 7–40, *t*(34) > 6, *p* < 0.001], showing that low rated players exchange (or lose) their pieces more rapidly than stronger players in 300 and 900 s games.

**Figure S4 | Occupation distribution of pieces throughout the game for both expertise levels.** Probabilities of finding a **(A)** Knight, a **(B)** Rook, or the **(C)** Queen on each square of the chess board are represented for each rating group at different game stages (move numbers 5, 10, 20, 30, and 40). It should be noted that both groups occupy almost the same squares of the board (there are no "exclusive" squares), but some places are comparatively more occupied by weak or strong players (see **Figure 3**).

**Figure S5 | Piece distribution over the board also reveals domain-specific and expertise-dependent strategies in longer games (300 and 900 s time budget).** Probabilities of finding a type of piece (normalized by the number of remaining pieces of this type) in each square (8 × 8 checkerboard) were compared for high and low expertise groups at different game stages (moves 5, 10, 20, 30, and 40). The *t*-value resulting from the two-sample *t*-test (high vs. low expertise group) in each square is color coded for those comparisons that are significantly different with *p* < 0.001, Bonferroni corrected for multiple comparisons. Red positive values indicate significantly higher probabilities for strong players and blue negative values, significantly higher probabilities for weaker players. **(A,D)** Knights, **(B,E)** Rooks, and **(C,F)** Queen occupancy comparisons reveal that strong players centralize their knights and rooks more, delaying the queen development, compared with lower rated players, who delay the development of knights and rooks more (but not the development of the queen, which is almost always not good) and/or occupy more advanced squares with all. It should be noted that this figure represents the differential occupancy of each square, showing where strong or weak players locate each type of piece comparatively more frequently than the other group. **(A–C)** correspond to 300 s games. **(D–F)** correspond to 900 s games.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 December 2013; accepted: 15 January 2014; published online: 04 February 2014.*

*Citation: Leone MJ, Fernandez Slezak D, Cecchi GA and Sigman M (2014) The geometry of expertise. Front. Psychol. 5:47. doi: 10.3389/fpsyg.2014.00047*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Leone, Fernandez Slezak, Cecchi and Sigman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Expertise and the representation of space

## *Michael H. Connors<sup>1,2</sup>\* and Guillermo Campitelli<sup>3</sup>*

*<sup>1</sup> ARC Centre of Excellence in Cognition and its Disorders, Department of Cognitive Science, Macquarie University, Sydney, NSW, Australia*

*<sup>2</sup> Dementia Collaborative Research Centre, School of Psychiatry, University of New South Wales, Sydney, NSW, Australia*

*<sup>3</sup> School of Psychology and Social Science, Edith Cowan University, Perth, WA, Australia*

*\*Correspondence: michael.connors@mq.edu.au*

#### *Edited by:*

*Merim Bilalic, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Maria Juliana Leone, University of Buenos Aires, Argentina*

**Keywords: big data, chess, chess expertise, chunks, expertise, decision making, methodology**

### **A commentary on**

## **The geometry of expertise**

*by Leone, M. J., Slezak, D. F., Cecchi, G. A., and Sigman, M. (2014). Front. Psychol. 5:47. doi: 10.3389/fpsyg.2014.00047*

Across many domains, experts make decisions based on the spatial relationships of objects within an environment. Firefighters, for example, need to evaluate the fire in front of them, radiologists the medical scan, and chess players the position on the chess board before making a decision. In order to be effective, experts need to assess these spatial relationships quickly and despite possible uncertainty. This ability is facilitated by cognitive processes. According to the so-called chunking and template theories, this expertise is made possible by pattern recognition (Gobet and Simon, 1996; Gobet, 1997). Based on their experience and practice, experts store in their long-term memory meaningful configurations of elements (i.e., patterns: chunks or templates) in their domain of expertise. These patterns are associated with typical decisions and strategies. Thus, when an expert faces a situation in their domain of expertise, they are able to rapidly recognize patterns, which, in turn, prompt them to consider decisions or strategies that have been effective in previous situations. This allows experts to make superior decisions to novices who are limited to a real-time search through a large number of possible options (Feltovich et al., 2006; Gobet and Charness, 2006; Connors et al., 2011).

Leone et al.'s (2014) elegant study adds to existing evidence for this account. In particular, Leone et al. demonstrate that experts' representation of space in their domain of expertise is qualitatively different to that of novices. Leone et al. focused on chess. Chess has the advantage of being a constrained task environment with relatively high ecological validity to other domains. Chess also has the advantage of a precise measure of expertise in the form of a numerical rating that is assigned to each player on the basis of their performance against their opponents. For these reasons, a large amount of expertise research has examined chess (Gobet and Charness, 2006). For Leone et al.'s study, chess has the additional advantage of having large amounts of data available: Leone et al. examined well over 175,000 games drawn from an internet chess server.

Given experts' pattern recognition ability, Leone et al. predicted that experts should make moves that were best for the position at hand and so not be constrained by the mere physical distance between objects. In contrast, novices would be more likely to make moves based on spatial proximity to previous moves due to their relative lack of pattern recognition and their need to conserve cognitive resources. Leone et al. made four specific hypotheses. First, the authors predicted that novices should make moves that are closer to their or their opponent's previous move. To test this, they developed a novel methodology in which they calculated the distance between a player's move and their previous move, and the distance between a player's move and their opponent's previous move. Second, the authors predicted that novice players would be more likely to move the same piece multiple times. To test this, they examined the frequency with which players moved the same piece more than once on consecutive turns. Third, the authors predicted that novices would be more likely to simplify the position by exchanging pieces (capturing an opponent's piece when the opponent can recapture their own piece), thereby reducing the cognitive load of dealing with large numbers of pieces. To test this, they examined the rate at which pieces were removed from the board according to the number of moves in the game. Finally, the authors predicted that novices would be less likely to keep with general strategic principles (e.g., knights are more effective when centralized) than experts. To test this, they examined the frequency with which pieces were placed in suitable parts of the board (e.g., knights in central regions).

Leone et al. found evidence for all four hypotheses. First, novices were more likely to move pieces that were closer to their or their opponent's previous move than experts. Second, novices were more likely to move the same piece on consecutive turns than experts. Third, novices were more likely to simplify positions than experts. Finally, novices were less likely to keep to general strategic principles than experts. Although the relative size of the effects was small, Leone et al. demonstrated the robustness of the effects across very large datasets and across games of different durations. Similar differences between experts and novices were evident regardless of whether players had a total of 3, 5, or 15 min each.

Although the authors did not directly assess the quality of the players' decisions, their work provides evidence that experts base their decision on what is most meaningful in a position, rather than being limited by spatial proximity. As Leone et al. note, these findings are consistent with chunking and template models of expertise as the differences between experts and novices exist at very short time limits, when searching through various options is not possible. The findings are also likely to generalize to a wide range of other domains of expertise. In addition to revealing these effects, the work is very important because it develops novel methodologies for assessing spatial relationships in chess positions and across large datasets. Indeed, Leone et al.'s use of big data to test specific hypotheses about cognition is particularly innovative. In future, these methods might provide further insight into how experts interpret and represent space.

## **REFERENCES**


Gobet, F., and Simon, H. A. (1996). Templates in chess memory: a mechanism for recalling several boards. *Cogn. Psychol.* 31, 1–40. doi: 10.1006/cogp.1996.0011

Leone, M. J., Slezak, D. F., Cecchi, G. A., and Sigman, M. (2014). The geometry of expertise. *Front. Psychol.* 5:47. doi: 10.3389/fpsyg.2014.00047

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 February 2014; accepted: 12 March 2014; published online: 03 April 2014.*

*Citation: Connors MH and Campitelli G (2014) Expertise and the representation of space. Front. Psychol. 5:270. doi: 10.3389/fpsyg.2014.00270*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Connors and Campitelli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Playing off the curve - testing quantitative predictions of skill acquisition theories in development of chess performance

#### *Robert Gaschler<sup>1</sup>\*, Johanna Progscha<sup>2</sup>, Kieran Smallbone<sup>3</sup>, Nilam Ram<sup>4</sup> and Merim Bilalić<sup>5</sup>*

*<sup>1</sup> Universität Koblenz-Landau, Landau, Germany and Interdisciplinary Research Laboratory Image, Knowledge, Gestaltung at Humboldt-Universität, Berlin, Germany*

*<sup>2</sup> Department of Psychology, Humboldt-Universität, Berlin, Berlin, Germany*

*<sup>3</sup> School of Computer Science, University of Manchester, Manchester, UK*

*<sup>4</sup> College of Health and Human Development, Human Development and Family Studies, Pennsylvania State University, University Park, PA, USA*

*<sup>5</sup> Alpen-Adria-Universität Klagenfurt, Institut für Psychologie, Abteilung für Allgemeine Psychologie und Kognitionsforschung, Klagenfurt, Austria*

#### *Edited by:*

*Michael H. Connors, Macquarie University, Australia*

#### *Reviewed by:*

*Michael H. Connors, Macquarie University, Australia*

*Eyal M. Reingold, University of Toronto, Canada*

#### *\*Correspondence:*

*Robert Gaschler, Department of Psychology, Universität Koblenz-Landau, Fortstraße 7, D-76829 Landau, Germany e-mail: gaschler@uni-landau.de*

Learning curves have been proposed as an adequate description of learning processes, no matter whether the processes manifest within minutes or across years. Different mechanisms underlying skill acquisition can lead to differences in the shape of learning curves. In the current study, we analyze the tournament performance data of 1383 chess players who begin competing at a young age and play tournaments for at least 10 years. We analyze the performance development with the goal of testing the adequacy of learning curves, and the skill acquisition theories they are based on, for describing and predicting expertise acquisition. On the one hand, we show that the skill acquisition theories implying a negative exponential learning curve do a better job in both describing early performance gains and predicting later trajectories of chess performance than those theories implying a power function learning curve. On the other hand, the learning curves of a large proportion of players show systematic qualitative deviations from the predictions of either type of skill acquisition theory. While skill acquisition theories predict larger performance gains in early years and smaller gains in later years, a substantial number of players begin to show substantial improvements with a delay of several years (and no improvement in the first years), deviations not fully accounted for by quantity of practice. The current work adds to the debate on how learning processes on a small time scale combine to large-scale changes.

**Keywords: learning curves, skill acquisition, expertise, chess, development**

## **INTRODUCTION**

Anderson (2002) drew attention to the problem of time scales in psychology with the programmatic article *Spanning Seven Orders of Magnitude*. On the one hand, acquisition of expertise is known to take years (e.g., Ericsson et al., 1993). On the other hand, expertise research has a strong basis in cognitive psychology paradigms wherein a large repertoire of laboratory tasks is used to understand and chart changes in potential subcomponents of expertise acquisition over minutes or hours. This includes component skills such as verifying and storing chess patterns (Gobet and Simon, 1996a,b,c, 1998; Campitelli et al., 2005, 2007; Bilalić et al., 2009a), learning to discard irrelevant perceptual features from processing (e.g., Gaschler and Frensch, 2007; Reingold and Sheridan, 2011) or overcoming dysfunctional bindings of knowledge structures (e.g., Bilalić et al., 2008a,b). Anderson suggested that while meaningful educational outcomes take at least tens of hours to achieve, those outcomes can be traced back to operations of attention and learning episodes at the millisecond level. He went beyond offering the perspective that expertise acquisition should *in principle* be reducible to small scale learning episodes. Rather, Anderson suggested that the problem of linking domains of (a) laboratory cognitive psychology/neurocognitive research and (b) educational/developmental science should be tractable, because small scale learning episodes would sum up to large scale developmental/educational changes of the same functional form. Increases in overall performance as well as increases in efficiency of components (e.g., keystrokes, eye movements and fact retrieval) over time are well described by the power function (see also Lee and Anderson, 2001). Power functions of improvements in simple components add up to a power-function improvement at the large scale. Scalability across time-scales would offer straightforward linking of change taking place within minutes to change taking place over years.

The power function (as well as the negative exponential function, see **Table 1** and **Figure 1**) describes negatively accelerated change of performance with practice. Early in practice, the absolute improvement in performance per unit of time invested is large. Later on, the improvement per unit of time diminishes. Apart from improvements in hour-long laboratory learning tasks, the power function has been used to describe motor skills in individuals differing in amount of practice on the scale of years (e.g., up to 7 years of cigar-rolling in Crossman, 1959, see Newell et al., 2001 for an overview). Descriptions of practice gains with the power function are widespread in the literature (Newell and Rosenbloom, 1981; Kramer et al., 1991; Lee and Anderson, 2001; Anderson, 2002) and consistent with prominent models of skilled performance such as ACT-R (Anderson, 1982) or the instance model of automatization (Logan, 1988, 1992).

However, the reason for the dominance of the power function in describing the functional form of practice gains has been debated in the literature on skill acquisition. Heathcote et al. (2000; see also Haider and Frensch, 2002) argued that the analysis of averaged data favors the power over the exponential function as a statistical artifact. They suggested computation of power and exponential curves with non-aggregated data, separately for each participant. They found an advantage of the negative exponential function over the power function in 33 of 40 different re-analyzed data sets with an average improvement in fit of 17%. Note that success of a mathematical model in fitting data better than a competitor model might not mean that it provides a more concise description. Potentially, one mathematical model is more flexible than the other, and better able to accommodate systematic as well as chance features in the data. Thus, further credence is lent to a model by accurate prediction rather than fitting (i.e., without any further parameter adjustments; cf. Roberts and Pashler, 2000; Pitt et al., 2002; Wagenmakers, 2003; Marewski and Olsson, 2009).

It is worthwhile considering the exact shape of the learning curve to predict future performance. Furthermore, the differences between exponential and power function are linked to assumptions in theories of skill acquisition (see below). **Figure 1** represents schematic examples of learning curves and derivatives. The left panel depicts a power function and an exponential function that start at the same level in the first year of chess tournament participation and approach similar levels in year 20 of tournament participation. The power function shows especially strong performance gains in the first years. For instance, the gain in rating points (e.g., Elo, 1978) in year one is about double the size of the gain in year two. Year two still yields considerably more performance gain as compared to year three, and so on and so forth. Absolute gain per year is depicted in the right panel. It decreases for both the power and the exponential function. The qualitative difference between the two types of learning curves becomes most obvious when considering the relative learning rate (RLR). This rate is decreasing for the power function, but remains constant for the exponential function. In our example, the exponential function has a relative learning rate of about 20%. In each year, the players gain about 20% of the Elo points they have not gained yet. If someone starts with 1000 and will end up with 1500 points (see Method for an explanation of the scale used in chess), this would mean a gain of 100 points for the first year and 80 points in the second year (20% of the remaining 1500 − (1000 + 100) = 400 points, i.e., 80 points).
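The arithmetic of the 1000-to-1500 example can be retraced with a short Python sketch. The function names, the year-by-year (discrete) definition of the RLR, and the power-curve exponent `beta=0.5` are illustrative assumptions; they are not the parameterization of **Table 1**.

```python
def exponential_curve(year, start=1000.0, asymptote=1500.0, rlr=0.20):
    """Rating after `year` years if a fixed share (rlr) of the remaining gap is closed each year."""
    return asymptote - (asymptote - start) * (1.0 - rlr) ** year


def power_curve(year, start=1000.0, asymptote=1500.0, beta=0.5):
    """Rating after `year` years if the remaining gap shrinks as a power of time (beta is assumed)."""
    return asymptote - (asymptote - start) * (year + 1) ** (-beta)


def relative_learning_rate(curve, year, asymptote=1500.0):
    """Share of the not-yet-gained points that is gained between `year` and `year + 1`."""
    gap_now = asymptote - curve(year)
    gap_next = asymptote - curve(year + 1)
    return (gap_now - gap_next) / gap_now


if __name__ == "__main__":
    for year in range(5):
        gain = exponential_curve(year + 1) - exponential_curve(year)
        print(
            year + 1,
            round(gain, 1),  # absolute gain shrinks for the exponential curve
            round(relative_learning_rate(exponential_curve, year), 3),  # stays constant
            round(relative_learning_rate(power_curve, year), 3),  # shrinks year by year
        )
```

Under these assumptions the exponential curve yields gains of about 100 and 80 points in the first two years while its RLR stays at 20%, whereas the RLR of the power curve drops from year to year.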

One qualitative aspect of learning curves is that they represent the diminishing absolute payoff of practice investment. Exponential practice functions can be derived from a narrow set of assumptions. As Heathcote et al. (2000) explained, one needs only to assume that learning is proportional to the time taken to execute the component in case of a continuous mechanism. First, a component that takes longer to execute presents more opportunity for learning. Second, as learning proceeds, the time to execute the component decreases. Therefore, the absolute learning rate decreases, resulting in exponential learning. Similarly, for discrete mechanisms, such as chunking, exponential learning can be explained by a reduction in learning opportunity. As responses are produced by larger and larger chunks, fewer opportunities for further composition are available. Time-demanding control is no longer necessary for small steps but only for scheduling sets consisting of fixed series of small patterns. Naturally, the opportunities for compilation of small single knowledge units into larger ones reduce, as more and more patterns are already chunked.

**FIGURE 1 | (A)** Schematic plot of the power function (blue lines) and negative exponential function (red lines) over 20 time points. **(B)** Shows the absolute differences in performance from one time point to the next (filled symbols) and the relative learning rate (empty symbols).

Additional theoretical assumptions are needed to accommodate a decreasing RLR. For instance, Newell and Rosenbloom (1981; see also Anderson, 2002) assumed that chunks are acquired hierarchically and that every time a larger chunk is practiced, this entails practice of its smaller components. Thus, by practicing a knowledge unit consisting of sub-units, the subunits and the overall pattern are fine-tuned and strengthened. Furthermore, at least in combinatorial environments, acquisition proceeds ordered by chunk span. No larger span chunk is acquired until all chunks of smaller span have been acquired.
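The contrast between a constant and a decreasing RLR can be made explicit with a short derivation. The notation is an assumption introduced here for illustration: $T(N)$ is the amount still to be gained after $N$ units of practice, and $\alpha$, $\beta$, and $B$ are positive constants. If learning is proportional to what remains, the exponential function follows and the RLR is constant:

$$
\frac{dT}{dN} = -\alpha\,T(N) \;\;\Rightarrow\;\; T(N) = T(0)\,e^{-\alpha N}, \qquad \mathrm{RLR}(N) = -\frac{1}{T(N)}\frac{dT}{dN} = \alpha .
$$

For a power function, the same quantity shrinks with practice:

$$
T(N) = B\,N^{-\beta} \;\;\Rightarrow\;\; \frac{dT}{dN} = -\frac{\beta}{N}\,T(N), \qquad \mathrm{RLR}(N) = \frac{\beta}{N} .
$$

In this reading, the additional assumptions of hierarchical chunking are what is needed to produce an RLR that declines in proportion to $1/N$ rather than staying fixed.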

The above research suggests that one or the other simple learning function might be adequate to describe improvements over long time intervals (cf. Howard, 2014). Functions known from work on short-term skill acquisition should be relevant to describe long-term expertise acquisition. We take chess as an example to explore this perspective. First, longitudinal data spanning years of practice are available. Second, theories on expertise in chess can be taken to suggest that scalability between small scale learning and large scale expertise acquisition should be especially likely to hold in this domain. Expertise development in chess might predominantly be based on cumulatively storing more and more patterns of chess positions (Chase and Simon, 1973; Gobet and Simon, 1996b). Spatial (Waters et al., 2002; Connors and Campitelli, 2014; Leone et al., 2014) and perceptual capabilities are deemed crucial (Charness et al., 2001; Reingold et al., 2001; Bilalić et al., 2008a,b; Kiesel et al., 2009; Bilalić et al., 2010; Bilalić and McLeod, 2014). This suggests that attentional and learning episodes taking place at the time scale of milliseconds might together lead to expertise acquisition. This in turn would make it likely that expertise acquisition can be described by the learning function exhibited during learning episodes that take place within a single laboratory session.

In order to explore the potential of this conjecture in the current study, we provide a descriptive analysis of the development of chess performance in German players who start playing chess at an early age and continue with the activity for at least 10 years. Relevant for theoretical as well as practical purposes, the time courses of expertise acquisition could thus potentially be predicted. Based on the shape of the curve of improvements during the first years of expertise acquisition, one might be able to predict the time course of improvements over the years of practice to come (Ericsson et al., 1993; Charness et al., 2005).

## **METHODS**

#### **DATABASE**

We used archival data of the population of German players recorded by the German chess federation (Deutscher Schachbund) from 1989 to 2007. Data were kindly provided by the federation and analyzed in line with guidelines of the ethics review board at Humboldt-Universität, Berlin. With over 3000 rated tournaments in a year, the German chess federation is one of the largest and the best-organized national chess federations in the world. Given that almost all German tournaments are rated, including events such as club championships, the entire playing careers of all competitive and most hobby players in Germany are tracked in detail. This is particularly important because we wanted to capture the very first stage of chess skill acquisition by focusing on the very young chess players who just started to play chess. The German database provides a perfect opportunity to study the initial stages of skill acquisition because even school tournaments are recorded.

## **THE MEASURE—CHESS RATING**

Besides precise records of players, the German federation's database and chess databases in general use an interval scale, the Elo rating, for measuring skill level. Every player has an Elo rating that is obtained on the basis of their results against other players of known rating (see Elo, 1978). Average players are assumed to have a rating of 1500 Elo points, experts over 2000 points, and grandmasters, the best players, over 2500. Beginners usually start at around 800 Elo points. The German database uses the same system but labels the rating as Deutsche Wertzahl (DWZ), which is highly correlated (*r* > 0.90) with the international Elo rating (Bilalić et al., 2009b).

#### **SELECTION CRITERIA AND GROUPING OF DATA ANALYZED**

The German chess federation database contains records of over 124,000 players and the average rating of these players is 1387 points with a standard deviation of 389 points. For all practical purposes, the database contains the entirety of the population of tournament chess observations in Germany (for more information about the database, see Bilalić et al., 2009b; Vaci et al., 2014). With interest in expertise development (rather than maintenance), we used the subset of data from all players who entered the database between age 6 and 20. This population consisted of 1383 players that played competitive chess for at least 10 years. All players took part in tournaments in each of the 10 years. To be sure that the initial observation was indeed first entry into competitive chess, we excluded players who were already listed in the first year the federation started tracking players. For the players starting young, there should have been little opportunity for expertise acquisition prior to taking part in tournaments covered by the database. To track this issue, we split the sample into age-groups (see **Table 2** for gender and age as well as for means and standard deviations of games played, rating reached by year 10, change in rating between year 1 and 10, and change in rating per game played).

Note that since we are working with the entirety of tournament chess performances in Germany since 1989, we provide a description of the entire population of interest, namely chess players that played competitive chess in Germany for at least 10 years (means, standard deviations, correlations that allow for an estimation of effect sizes). Generalization of findings, beyond the internal predictions, will have to be based on replications with other or future databases (see e.g., Asendorpf et al., 2013).

**Table 2 | Sample characteristics and summary statistics on games played and ratings.**

## **PREDICTING AND FITTING WITH THE POWER AND EXPONENTIAL FUNCTION**

Fits were derived with constrained optimization, requiring the A and B parameters to take sensible values (0 < B < A < 3000) using the MATLAB Curve Fitting Toolbox. For each participant we compared estimated and observed ratings and determined whether the power function or the exponential function led to a smaller squared deviation. For predictions, we only used the data of the first 5 years to extract the parameters of the power function and the exponential function. Then we used these parameters to extrapolate the predicted ratings for the next years (at least 5, as each person in the sample had database entries for a minimum of 10 years). The predicted values were then compared to the ratings actually achieved. For instance, for a given participant who played for 10 years, we took the performance in the first five, acting as if the trajectory data of the next years were not yet available. The power function and the exponential function were fit to the data of the first 5 years in order to obtain the parameter values exemplified in **Table 1**. Next we used these values in order to extrapolate for the coming years of tournament participation. These predicted values were then compared to the actual ratings obtained. For each participant we could thus compare the root mean square error (RMSE) between power function-based prediction and prediction based on the exponential function.
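The fit-then-predict logic described above can be sketched in plain Python. Several things here are assumptions rather than the procedure actually used: the functional forms `A - B * exp(-rate * year)` and `A - B * year ** (-rate)` stand in for the parameterization of **Table 1**, a grid search over the rate combined with closed-form least squares for A and B replaces the MATLAB constrained optimizer, and the player is synthetic.

```python
import math


def exp_basis(year, rate):
    return math.exp(-rate * year)


def pow_basis(year, rate):
    return year ** (-rate)


def fit_curve(basis, years, ratings, rates):
    """Best valid fit of rating = A - B * basis(year, rate); returns (sse, A, B, rate) or None."""
    n = len(years)
    mean_y = sum(ratings) / n
    best = None
    for rate in rates:
        g = [basis(t, rate) for t in years]
        mean_g = sum(g) / n
        s_gg = sum((gi - mean_g) ** 2 for gi in g)
        if s_gg == 0.0:
            continue
        # For a fixed rate the model is linear in A and B, so ordinary least squares applies.
        slope = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, ratings)) / s_gg
        a_par = mean_y - slope * mean_g
        b_par = -slope
        if not (0.0 < b_par < a_par < 3000.0):  # constraint taken from the text
            continue
        sse = sum((a_par - b_par * gi - yi) ** 2 for gi, yi in zip(g, ratings))
        if best is None or sse < best[0]:
            best = (sse, a_par, b_par, rate)
    return best


def rmse(basis, fit, years, ratings):
    """Root mean square error of a fitted curve on the given years."""
    _, a_par, b_par, rate = fit
    errors = [(a_par - b_par * basis(t, rate) - y) ** 2 for t, y in zip(years, ratings)]
    return math.sqrt(sum(errors) / len(errors))


RATES = [i / 100.0 for i in range(1, 301)]  # assumed search grid for the rate parameter
years = list(range(1, 11))
ratings = [1500.0 - 500.0 * math.exp(-0.2 * t) for t in years]  # synthetic exponential learner

# Fit on years 1-5 only, then score the extrapolation on years 6-10.
exp_fit = fit_curve(exp_basis, years[:5], ratings[:5], RATES)
pow_fit = fit_curve(pow_basis, years[:5], ratings[:5], RATES)
exp_rmse = rmse(exp_basis, exp_fit, years[5:], ratings[5:])
pow_rmse = rmse(pow_basis, pow_fit, years[5:], ratings[5:])
```

For this synthetic exponential learner the exponential prediction error is close to zero while the power function underpredicts the later years, which shows how the per-player RMSE comparison separates the two functions.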

## **RESULTS**

## **DESCRIPTIVE STATISTICS**

**Table 2** indicates that our sample was predominantly male. Participants starting to play tournaments younger accumulated more games as compared to those starting at an older age. For instance, the 6 to 9 year olds played twice as many games as the 18 to 20 year olds. The rating reached by Year 10 was similar across age groups. Yet, this implied a much stronger improvement compared to Year 1 for the players starting young rather than old. For instance, the youngest group showed three times the increase of the oldest group. The increase in rating relative to the number of games played was similar across age groups (with the exception of the players starting oldest, who showed a reduced gain per game played).

## **EXPONENTIAL BETTER THAN POWER FUNCTION IN PREDICTIONS AND FITS**

**Figure 2** presents a random subset of individual time courses. Despite fluctuations from one year to the next, participants generally showed increases in skill, as measured by Elo, over years of chess played. Some participants showed large gains especially in their first years. In order to systematize such observations, we tested the capability of the power function and the exponential function to fit and predict the observed trajectories. Prediction is interesting for practical purposes as we can infer the skill level someone will have after ten years of activity based on the pattern of performance in their first years. On the other hand, prediction circumvents methodological problems inherent in curve fitting. For instance, one mathematical function might fit better than another, because it is flexible enough to mimic the competitor.

Across individuals from all age groups, the exponential function provided better prediction and fit to the data than the power function (**Table 3**). The average RMSE and its standard deviation were smaller for the exponential function than for the power function (with the exception of the prediction among those starting chess at ages 18–20). For 88% of the players, the exponential function was better in fitting the first 5 years of the skill acquisition process, and for 62% it was better in predicting the skill level in later years. As shown in **Figure 3**, the distribution of RMSE values was heavily left-skewed. For a substantial proportion of participants neither the exponential nor the power function provided an adequate account of the dynamics of individuals' skill development.

**Table 3 | Average and standard deviation of RMSE per age group.**


### **INCREASING GAINS IN PARTICIPANTS STARTING YOUNG**

We grouped players by the age they started to play tournaments in order to explore reasons for the substantial problems in fit and prediction encountered with both the power and exponential functions. Potentially, players starting tournament participation at older ages might have profited from substantial opportunities to practice chess before they entered our window of analysis. Thus, the expected learning gains might manifest more readily in those players that started at younger ages. **Figure 4A** indeed shows that players entering tournament chess at older ages demonstrate higher skill levels by the end of their first year, while players entering at younger ages start at lower levels. Most notably, however, the shape of the improvements deviates systematically from the patterns of change that would be expected based on either the exponential or the power function. Both learning functions predict that participants should show a higher absolute gain in rating points from the first to the second year compared to the gain from the second to the third year, which in turn should yield a higher gain as compared to the change from the third to the fourth year, and so on and so forth. Among the subset of players starting young, however, the contrary seemed to be the case (see also **Figure 4B** for difference values). Over the first years of tournament participation, the absolute amount of gain per year *increased* rather than decreased. The deviation from the expected learning curve might be related to year-to-year variations in practice. For example, the players starting young may, at first, participate in very few tournaments, and then, in the next few years, increase in the number of tournament games they take part in. This is indeed the case (**Figure 4C**). Therefore, it is conceivable that the amount of practice, which increases over the years, accounts for the dynamics of the skill increase: only once the players starting young take part in more and more games might their skill start to increase in the manner predicted by the learning curves. As we do not possess any further data on changes in the amount of practice per year (i.e., off-tournament practice), we cannot conclusively judge this account. However, at least we can state that the increase in the number of tournament games played cannot fully account for the dynamics. As shown in **Figure 4D**, the change in rating per year per number of tournament games played also shows an increase over the first years for players starting young.

## **FLUCTUATIONS IN GAMES PLAYED PER YEAR RELATE TO MISFIT WITH POWER FUNCTION**

It is conceivable that the misfit and inaccurate prediction of the power and the exponential functions are related to variability in the number of games played per year. While we only examined the subset of players who played in tournaments in each of the 10 years tracked, the number of games per year might have fluctuated. We computed the within-person (intraindividual) standard deviation of games played per year, assuming that fits should be optimal if the number of games a player takes part in does not change over the years. This index is equivalent to computing the deviation from a zero-slope line in numbers of games played. **Table 4** shows the Spearman rank order correlation of intraindividual variability in number of games played per year with the RMSE obtained from fitting and predicting ratings based on the power function and the exponential function. The correlations suggest that larger intraindividual variability in number of games played per year was weakly but consistently related to worse fit in case of the power function (while the pattern was less consistent for the exponential function). A similar pattern was observed when correlating the overall number of games played to accuracy in prediction and fitting. Participants who played more games showed worse fits compared to participants who played fewer games. This was likely the case because the number of games played over the ten years (a count variable) was closely linked to the intraindividual variability in games per year (Spearman correlations ranging between 0.84 and 0.88 across the four age groups).
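The two quantities used in this analysis, the intraindividual standard deviation of games per year and the Spearman rank order correlation, can be sketched as follows. The helper names are hypothetical, the use of the sample (n − 1) standard deviation is an assumption because the text does not state which variant was used, and the example players and error values are invented for illustration.

```python
import math


def intraindividual_sd(games_per_year):
    """Sample standard deviation of one player's yearly game counts (n - 1 is an assumption)."""
    n = len(games_per_year)
    mean = sum(games_per_year) / n
    return math.sqrt(sum((g - mean) ** 2 for g in games_per_year) / (n - 1))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented data: yearly game counts of four players and their fitting errors.
    games = [[10, 10, 10, 10], [5, 15, 8, 12], [2, 30, 4, 25], [1, 40, 3, 60]]
    fit_errors = [12.0, 20.0, 35.0, 33.0]
    variability = [intraindividual_sd(g) for g in games]
    print(spearman(variability, fit_errors))
```

A player with a constant number of games per year has a variability of zero, which corresponds to the zero-slope reference case mentioned above.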

**Figure 4C** suggests that variability in number of games played per year is not purely random. Instead it can be based on an ordered pattern (inverted U-shape). Separately for each age group, we took the average profile in number of games played per year (displayed in **Figure 4C**) as a prototypical pattern. Then, we determined for each participant the profile correlation between his/her pattern of numbers of games played with the average pattern of the respective age group. Our analyses suggested that there was substantial variability, with some participants following the pattern represented in the group mean and others deviating from it. Median within-person correlations per age group were *r* = 0.58, 0.50, 0.47, and 0.19. The percentage of individuals showing a negative correlation with the prototypical pattern was 9.8, 15.2, 19.6, and 31.8%. However, as suggested by **Table 4**, the extent to which the dynamics of an individual's number of games played per year was represented by the average pattern of the age group was not systematically related to the accuracy in power function or exponential function fits and predictions.

### **OFF-THE-CURVE PATTERNS IN 2/3rds OF THE SAMPLE**

We sought to provide descriptive data on the number of participants who deviated from the predictions of the learning curves by showing smaller rather than larger rating gains during their early as compared to their later years of tournament participation. For this we sorted individuals into tertiles based on the total gains achieved during the first 3 years (lowest, medium, and highest rating gains). As shown in **Figure 5A**, the third of players with the lowest gains even showed small decreases in rating during the first years, while only the individuals with the largest gain yielded performance changes in line with the predictions by the learning curves (i.e., larger gain per year in early rather than late years, compare **Figure 5B**). Players that did not improve in their first three tournament years caught up to some extent in later years, but did not reach the same level by year 10 as those players with a steep increase early on. Thus, irrespective of complex dynamics of the shape of the performance curve, the first years do seem to offer a proxy for predicting the level a player will eventually reach.
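The tertile split described above can be sketched in a few lines of Python. The function name, the player identifiers, and the handling of ties and of sample sizes not divisible by three are assumptions made for illustration.

```python
def tertile_groups(gain_by_player):
    """Sort players by their 3-year rating gain and cut the ranking into thirds."""
    ranked = sorted(gain_by_player, key=gain_by_player.get)
    n = len(ranked)
    cut_low, cut_high = n // 3, (2 * n) // 3
    return {
        "lowest": ranked[:cut_low],
        "medium": ranked[cut_low:cut_high],
        "highest": ranked[cut_high:],
    }


if __name__ == "__main__":
    # Invented gains in rating points over the first 3 tournament years.
    gains = {"p1": -20.0, "p2": 310.0, "p3": 45.0, "p4": 120.0, "p5": 5.0, "p6": 260.0}
    print(tertile_groups(gains))
```

Each group's mean trajectory over the 10 years could then be plotted separately, in the manner of **Figure 5A**.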

#### **COHORT DIFFERENCES**

There have been many changes in resources available for chess players since 1989. We analyzed the time course in development of chess ratings separately for different cohorts in order to explore whether deviations from the pattern predicted by the learning curves varied in relation to the historical period in which a chess career was started. Deviations from the learning curve were not accounted for by cohort. Rather, for all 5-year cohorts from 1970 to 1990 and age-groups displayed in **Figure 6**, the increase in rating during the first years of performance was linear or positively accelerated. The pattern of negative acceleration (larger gains in earlier as compared to later years, compatible with the learning curves) was not observed.

**Table 4 | Spearman rank order correlations of (a) indices of regularity in numbers of games played per year with (b) the fitting and prediction error.**

Age-groups and cohorts differed more with respect to the rating level they started out with (i.e., reached by the end of their first year) than with respect to their level of performance in Year 10. As already observable in **Figure 5A**, people starting to play tournaments at a younger age started out at a lower level. In addition, **Figure 6** shows that later cohorts started at lower levels. This might be taken to suggest that players starting young in late cohorts are the best candidates to track trajectories in chess performance based on tournament ratings, while ratings of players starting older and of earlier cohorts might be shaped more strongly by off-tournament practice.

#### **GAIN IN RATING FROM GAMES PLAYED**

The above analyses suggest that the success of the power function and the exponential function in predicting development of chess performance might be rather limited due to quantitative and qualitative misfit. Furthermore, the number of tournament games played seemed to be linked to deviations from the learning curve. Therefore, we sought to describe the extent to which early vs. late years in playing tournament chess are related to gain in rating as well as performance level reached by Year 10. For this we used games played per year and gain in rating per year. We applied Spearman rank order correlations separately for each age group and year of tournament participation. **Figure 7A** shows that number of games played per year is related to between-person differences in gain in rating. Participants playing more tournament games in a year tend to show a larger increase in rating compared to those playing fewer games. This holds consistently across age-groups and especially so for early years of tournament participation. However, diminishing returns seem to be observable with respect to the extent to which more tournament games can lead to an increase in rating. **Figure 7B** shows that the relationship between (a) games played per year and (b) gain per games played per year can become negative. Thus, overall it does not seem to be the case that playing more tournament games can lead to an increase in efficiency in taking gains in rating from a tournament game. For instance, those players starting tournament participation at age 10–13 who played more tournament games seemed to show a reduced gain in rating per tournament game played in their middle years.

The gain in rating that players show from Year 1 to Year 10 can be predicted by gain in rating per year in early years of tournament participation. As depicted in **Figure 7C**, gain in later years is less predictive of the overall increase in rating. While the power and the exponential function would have predicted that we can observe large gains in rating in early years, we thus, somewhat analogously, observe a larger predictive power of between-person differences in early as compared to late years of chess tournament participation. Apart from the gain per year, the gain per year relative to the number of games played per year could also be used to predict the overall increase in rating between Year 1 and 10 (**Figure 7D**). Participants who, during the first years of tournament participation, efficiently increased their rating per games played ended up at a higher performance level than those who did not show a large gain per games played during early years.

## **SELECTIVE ATTRITION**

Finally, we checked for selective attrition. While in our main analyses we only used 10 years of subsequent tournament participation, some participants provided records for additional years (up to 19 years overall). Rank order correlations indicated that the number of overall years of tournament participation per age group was neither systematically related to gain between Year 1 and Year 10, nor the gain in rating within the first 3 years (*r*s between −0.10 and 0.16).

## **DISCUSSION**

In the current work we have explored the potential of the power and the negative exponential learning functions to account for the development of chess performance measured in ratings based on tournament outcomes. In line with re-evaluations of the power law of practice (Heathcote et al., 2000; Haider and Frensch, 2002), we documented that the exponential function was better than the power function in fitting and predicting the time course of chess ratings over years of practice. However, a crucial aspect shared by both of these mathematical functions and the underlying theories of skill acquisition was not reflected in the data. While according to the power as well as the negative exponential function players should achieve large absolute gains early in practice and small gains later in practice, this was not the case for many of the participants. Rather, many players started to show substantial improvements only *after* their first years of tournament participation. They were playing off the learning curves suggested by skill acquisition theories. If expertise acquisition is not well described by learning functions used to describe skill acquisition, the linking of underlying cognitive processes of attention and learning that proceed on time-scales measuring milliseconds to hours with learning processes that proceed on time-scales measured in years seems much less straightforward than one could have hoped for (i.e., Anderson, 2002).

Many players showed an acceleration of gain in rating in the first years of tournament participation, followed by a deceleration. Based on the power function and the exponential function we would have expected to only find the latter. Newell et al. (2001) suggested accommodating such findings mathematically and conceptually by assuming a mixture of learning processes taking place on different time scales. Acceleration followed by deceleration could be captured by a sigmoid function that consists of two exponential components, a positive (acceleration) and a negative one (deceleration). Learning opportunities and efficiency in using them might increase during the first years of tournament performance for many players, while in later years returns of investing in chess performance are diminishing. In line with this view, year-long trajectories of skill acquisition might be better understood from a perspective that takes lifespan-developmental and educational changes into account (Li and Freund, 2005). For instance, players starting to take part in tournaments at a young age are likely to promote changes in self-regulation strategies available (Lerner et al., 2001; Freund and Baltes, 2002) and acquire the potential to shape their social and learning environment. Their ability to learn about chess from (foreign language) media and options to travel to and communicate with other players will increase. Deliberate practice (cf. Ericsson et al., 1993) might require that young players develop skills to competently use their motivational resources, by, for instance, scheduling work on skill acquisition such that as many of the activities as possible are intrinsically motivating (cf. Rheinberg and Engeser, 2010 as well as Christophel et al., 2014, for training of motivational competence). Underlining this challenge, Coughlan et al. (2014) reported that participants in the expert group of their study rated their practice as more effortful and less enjoyable compared to other participants. The experts were successful in improving performance by predominantly practicing the skill they were weaker at. However, such gains in potential to learn might for many players no longer compensate for the physical and social changes faced during puberty (Marceau et al., 2011; Hollenstein and Lougheed, 2013), at the end of adolescence, during secondary education, family formation or labor force participation. Future research should thus try to simultaneously account for development in the individual and the opportunities provided by the environment (cf. Ram et al., 2013), and to model different trajectories in one framework (e.g., Grimm et al., 2010; Ram and Diehl, in press).

For skill acquisition mechanisms such as chunking, negative exponential learning can be explained by a reduction in learning opportunities (cf. Heathcote et al., 2000). The later in practice, the fewer chunks are yet to be learned. While a deceleration of learning should be observed late in practice, such an account does not preclude that strong increases in learning opportunities early in practice can lead to an acceleration of chunks acquired per time invested. It appears that, for at least some players, opportunities and efficiency in increasing chess performance are already fully present at the time they start to play tournaments. They start at the turning point of the sigmoid function. The "upper" negative exponential portion of the sigmoid is sufficient to describe their performance gains, which are large in their early years and then diminish as performance approaches the asymptote. For other players, both positive and negative exponential portions of the sigmoid function are needed to represent the dynamics of their chess performance over time. These players appear to be less saturated with respect to learning opportunities and efficiency when starting to take part in tournaments covered by the database. They thus first show an acceleration in rating gains per year, followed by the deceleration when approaching asymptote.
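The contrast between the two kinds of trajectory can be made concrete with a small numerical sketch. The block below compares yearly rating gains under a negative exponential curve with those under a logistic curve, used here as one simple stand-in for a sigmoid. The logistic form, all parameter values, and the function names are illustrative assumptions, not the model or the estimates of the present study.

```python
import math

def exponential_rating(year, asymptote=2000.0, span=800.0, rate=0.4):
    """Negative exponential approach to an asymptote (hypothetical parameters)."""
    return asymptote - span * math.exp(-rate * year)

def logistic_rating(year, floor=1200.0, span=800.0, rate=0.9, midpoint=4.5):
    """Logistic sigmoid: accelerating before the midpoint, decelerating after it."""
    return floor + span / (1.0 + math.exp(-rate * (year - midpoint)))

def yearly_gains(curve, years=10):
    """Rating gained within each year 1..years, i.e. curve(t) - curve(t - 1)."""
    return [curve(t) - curve(t - 1) for t in range(1, years + 1)]

exp_gains = yearly_gains(exponential_rating)
sig_gains = yearly_gains(logistic_rating)

# Year (1-based) in which each curve shows its largest gain.
peak_year_exp = exp_gains.index(max(exp_gains)) + 1
peak_year_sig = sig_gains.index(max(sig_gains)) + 1
print(peak_year_exp, peak_year_sig)
```

With these assumed parameters, the exponential curve has its largest gain in the first year, whereas the logistic curve peaks in the year that straddles its turning point. A player who enters the database at or beyond the turning point would show only the decelerating portion.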

In line with these speculations, Howard (2014) reported an average trajectory of rating increases showing deceleration only for International Chess Federation (FIDE) players (rather than acceleration followed by deceleration). The shape of the curve reported by Howard matches the exponential curve from **Figure 1A**. Starting at an average of about 2200 points, the sample mean increased beyond 2500 points with practice. Different from the database used in the current study, the threshold to be listed in the FIDE database is high (cf. Vaci et al., 2014 for a discussion of problems implied by restriction of range in chess databases). Likely, players were already taking full advantage of opportunities to improve chess performance when entering the database so that an acceleration in rating gain with practice was no longer possible. Descriptive analyses suggest that the dynamic in rating improvement that players at the international level show with practice seems consistent with the negatively accelerated exponential function. As implied by the exponential function, the relative learning rate (RLR) estimated based on the average data published by Howard (2014) is constant. While the power function should lead to a decrease of RLR with practice (cf. Heathcote et al., 2000), the RLR is fluctuating around 20%. Focusing on the first half of practice in order to avoid inflation of RLR at the end of the practice curve, we obtained an *r* = 0.11 correlation of RLR with time point. Thus there was no hint toward a decrease.
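The difference between a constant and a decreasing RLR can be illustrated with a short sketch. It assumes a discrete definition of the RLR as the share of the remaining distance to the asymptote that is covered in one step; the text does not state how the RLR was computed, so this definition, the curve parameters, and the function names are assumptions for illustration only. The exponential rate is chosen so that the RLR equals 0.2, echoing the roughly 20% mentioned above.

```python
import math

ASYMPTOTE = 2550.0  # hypothetical ceiling, not taken from Howard (2014)
SPAN = 350.0        # hypothetical distance between start and ceiling

def exp_curve(t, rate=-math.log(0.8)):
    """Negatively accelerated exponential approach to the asymptote."""
    return ASYMPTOTE - SPAN * math.exp(-rate * t)

def power_curve(t, beta=0.8):
    """Power-function approach to the asymptote (t starts at 1)."""
    return ASYMPTOTE - SPAN * t ** (-beta)

def relative_learning_rate(curve, t):
    """Gain from t to t + 1 divided by the distance still left at t."""
    return (curve(t + 1) - curve(t)) / (ASYMPTOTE - curve(t))

steps = range(1, 9)
rlr_exp = [relative_learning_rate(exp_curve, t) for t in steps]
rlr_pow = [relative_learning_rate(power_curve, t) for t in steps]

print([round(x, 3) for x in rlr_exp])
print([round(x, 3) for x in rlr_pow])
```

Under these assumptions the exponential curve yields the same RLR at every step, while the power curve yields an RLR that shrinks with practice, which is the qualitative signature the correlation of RLR with time point is meant to detect.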

Our correlational analyses suggest that interindividual variability in rating gain over the course of ten years of tournament participation can be predicted by between-person differences in performance during the first years. Even data from single years (number of games played, rating points gained, or rating points gained per games played) allow overall gain to be predicted at a moderate level. While the power and the exponential learning curve would suggest that the first years of practice should be important because of the large performance gains, we thus can somewhat analogously conclude that the first years are more important than later years for predicting between-person differences in performance level reached in the long run (cf. Ackerman and Woltz, 1994).

We focused on examining changes in rating with year of practice (rather than number of games played, cf. Howard, 2014). This allowed us to explore changes in rating gain and rating gain per games played with age and cohort. Yet, a direct comparison of the capability to capture performance change is lacking so far for the two potential time scales, (1) number of games played and (2) chronological time in years, as well as for (3) a mixture of both scales. Several issues are worth considering when exploring the complexity of models needed to account for expertise acquisition over years, as compared to models of skill acquisition in hour-long laboratory sessions. In the lab, quantity and quality of practice per unit of time are usually well controlled. In skill acquisition processes outside the lab they might vary considerably over the years of practice an individual engages in. In addition, potential cohort differences should not be neglected (cf. Gobet et al., 2002; van Harreveld et al., 2007; Connors et al., 2011). Future work should consider how data on both quantity of practice and quality of practice can be used to explain the time course of chess skill development (cf. Baker et al., 2003; Charness et al., 2005; Gobet and Campitelli, 2007; Howard, 2014). Apart from obtaining data on the amount of off-tournament learning opportunities, available data sets could be used to gauge variability in specific aspects of the learning opportunities. For instance, taking part in tournaments with large spread in opponent strength might provide more opportunities for improvement as compared to tournaments with more homogeneous competitors.

### **REFERENCES**


Elo, A. E. (1978). *The Rating of Chess Players, Past and Present*. New York, NY: Arco.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2014; accepted: 02 August 2014; published online: 22 August 2014.*

*Citation: Gaschler R, Progscha J, Smallbone K, Ram N and Bilalić M (2014) Playing off the curve - testing quantitative predictions of skill acquisition theories in development of chess performance. Front. Psychol. 5:923. doi: 10.3389/fpsyg.2014.00923*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Gaschler, Progscha, Smallbone, Ram and Bilalić. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Checkmate to deliberate practice: the case of Magnus Carlsen

#### *Fernand Gobet<sup>1</sup>\* and Morgan H. Ereku<sup>2</sup>*

*<sup>1</sup> Department of Psychological Sciences, University of Liverpool, Liverpool, UK*

*<sup>2</sup> Department of Psychology, Brunel University, Uxbridge, UK*

*\*Correspondence: fernand.gobet@liv.ac.uk*

#### *Edited and reviewed by:*

*Michael H. Connors, Macquarie University, Australia*

**Keywords: chess, deliberate practice, expertise, superior performance, talent**

The role of practice in the acquisition of expertise has been a key research question at least since Bryan and Harter's (1899) study on expertise in Morse telegraphy, which proposed that it takes 10 years to become an expert. The framework of deliberate practice (Ericsson et al., 1993) has taken an extreme position by denying the role of talent in most domains and stating that superior performance is an increasing monotonic function of deliberate practice: the more goal-oriented practice, the higher the level of skill. For example, Ericsson et al. (1993) argue that "individual differences in ultimate performance can largely be accounted for by differential amounts of past and current levels of practice" (p. 392). The deliberate practice framework has captured the imagination of the popular press, as can be seen by the publication of several pop-science books such as *Talent is Overrated* (Colvin, 2008), *Outliers* (Gladwell, 2008) and *Bounce* (Syed, 2011).

In recent years, this framework has been criticized in academic circles; for example, in retrospective studies, the amount of deliberate practice accounts for only about one third of the variance in expertise in music and in chess (Hambrick et al., 2014). More naturalistic data also question the validity of the framework. As top performers have spent a similar number of hours to improve and maintain their skills, the fact that individuals such as Roger Federer in tennis, Michael Jordan in basketball, Usain Bolt in sprinting or Michael Schumacher in auto racing have so outrageously dominated their sport throws considerable doubt on the deliberate practice framework.

A particularly spectacular example is provided by chess grandmaster Magnus Carlsen (Norway), who became world champion in classic chess in November 2013 by beating Viswanathan Anand (India) and who also became world champion in rapid chess (15 min + 10 s additional time per move) and speed chess (3 min + 2 s additional time per move) in June 2014. In the June 2014 rating list published by the World Chess Federation (http://ratings.fide.com/toparc.phtml?cod=309), 23-year old Carlsen is ranked first with 2881 points<sup>1</sup>. This is just one point below 2882, the highest rating in chess history that Carlsen held in May 2014. There is a 66 point difference between him and the second player, grandmaster Levon Aronian (Armenia, 2815 points; see **Table 1**). This difference is nearly the same as that between the 2nd and the 14th player in the list (63 points), Dutch grandmaster Anish Giri (2752 points). **Table 1** shows the rating of Carlsen and of the ten players following him in the list. A one-sample *t*-test confirms that Carlsen's rating is statistically different from the next ten grandmasters (*M* = 2780.6), *t*(9) = −19.38, *p* < 0.001, *mean difference* = −100.4; 95% CI [−112.1, −88.7]. One hundred points is a considerable difference: it is half a standard deviation in skill and means that, against the very best players in the world, Carlsen's probability of winning is 63.7%.
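The size of a 100-point gap can be checked with a brief sketch of how an Elo rating difference maps onto an expected score (wins plus half of draws). The article does not say which formula produced the 63.7% figure, so two common variants are shown as assumptions: the logistic approximation on a 400-point scale, and a normal model in which each player's performance has a standard deviation of 200 points. The function names are illustrative.

```python
import math

def elo_expected_logistic(rating_diff):
    """Expected score under the logistic approximation (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def elo_expected_normal(rating_diff, sd=200.0):
    """Expected score under a normal model of performance differences."""
    # The difference of two independent performances has SD sd * sqrt(2).
    z = rating_diff / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

diff = 100.4  # mean rating difference reported in the text
print(round(elo_expected_logistic(diff), 3))
print(round(elo_expected_normal(diff), 3))
```

Both variants give an expected score of roughly 0.64 for this difference, in line with the figure quoted in the text.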

To test the monotonic assumption, we collected information from the internet and biographies about the age at which these grandmasters started playing chess and about their current age (see **Table 1**). Starting age is a good approximation of when players started practicing seriously (i.e., using some form of deliberate practice), as most of these players obtained outstanding results in youth competitions a few years after starting playing chess, and indeed obtained the grandmaster title rapidly. In the case of Carlsen, he has stated that he had learned the rules at 5 years but started practicing seriously only at 8 years (see Gobet and Campitelli, 2007)<sup>2</sup>. To be consistent, we used starting age anyway. (Note that this bias adds years of deliberate practice, and thus is in favor of the monotonic assumption.) If the monotonic assumption is correct, Carlsen should have accumulated more hours of deliberate practice than the other players, given the way he dominates the chess world. We did find that Carlsen's number of years of deliberate practice (18 years) is different from the average of the following ten best players in the world (*M* = 24.6 years), *t*(9) = 2.83, *p* < 0.05, *mean difference* = 6.6 years; 95% CI [1.33, 11.87]. However, this result *is exactly the opposite of what is predicted by deliberate practice*: on average, Carlsen practiced statistically significantly *fewer years* than the other players. (Note also that, for the players in **Table 1**, the correlation between rating and the number of years of practice is negative (*r* = −0.21) but not statistically significant (*p* = 0.55).)

<sup>1</sup>To measure chess players' skill level, the World Chess Federation (FIDE) uses the rating scale developed by Elo (1978), which is an interval scale that computes players' rating as a function of their results against other players of known rating. The scale has a normal distribution with a theoretical mean of 1500 and a standard deviation of 200 points. Grandmasters are typically rated above 2500 points. The best players of the world have around 2800 points and the weakest players less than 1200 points.

<sup>2</sup>Ericsson et al.'s (2007) explanation that prodigies' high levels of performance can be accounted for by the amount of deliberate practice made possible by a very early start does not apply in Carlsen's case.

**Table 1 | Rank, country, rating, starting age, current age, and number of years of practice of the 11 top players in the world (June 2014).**

*Source: http://ratings.fide.com/toparc.phtml?cod=309.*

In this analysis, we have assumed that, at the top level, all players practice with extreme dedication and with the best training methods available. If expertise were solely a monotonic function of practice, then it follows that Carlsen, who learned the rules at the age of five but started playing chess seriously at the relatively old age of eight, should be much weaker than most of the ten players that follow him in the international rating list, as these opponents had time to clock in substantially more deliberate practice (on average, at least 6.6 years more). The fact that Carlsen dominates the chess world so outrageously, being world champion not only in classic chess but also in rapid chess and in blitz, refutes this hypothesis, central to the theory of deliberate practice.

Several objections can be leveled at this analysis. We discuss three of them, and show that they do not invalidate our argument. First, Carlsen's prodigious skill throughout adolescence and early adulthood may not be as remarkable as it first appears, as numerous young players perform better than their older competitors. For example, Howard (1999) has shown that the top chess players are increasingly younger. Key changes have taken place in the last decades that enable more efficient practice (Gobet et al., 2002). In particular, the quality and quantity of chess books have dramatically increased over the last decades, and chess programs and computer databases have revolutionized training methods. That more efficient deliberate practice should lead to quicker progress is consistent with Ericsson et al.'s (1993) framework. However, as all players in **Table 1** have benefitted from these improvements in training, this factor does not explain away Carlsen's superiority.

Second, it could be argued that, just like in sport, age plays an important role in chess and youth will give an edge to younger top competitors. It is known that the effects of ageing occur depressingly early with cognitive variables such as reasoning, visualization and processing speed, peak performance being observed in the early to mid-twenties (Salthouse, 2009). However, whether this is a key factor in chess is unclear, as six of the absolute top players shown in **Table 1** are 30 years old or older. In addition, Garry Kasparov and Viswanathan Anand were still world champions when they were 37 and 44 years old, respectively. In any case, in **Table 1** the correlation between age and rating (*r* = −0.21) is not statistically significant (*p* = 0.54), but Carlsen is reliably younger than the other ten top players, *t*(9) = 3.16, *p* < 0.05, *mean difference* = 7.6 years; 95% CI [2.16, 13.04]. Nevertheless, the age variable does not explain why Carlsen is so clearly better than the four players who are roughly his age.
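The three one-sample *t*-tests reported in this article (for rating, years of practice, and age) can be checked for internal consistency from the summary statistics alone. The sketch below rebuilds each 95% confidence interval from the reported mean difference and *t* value, using the two-sided critical value of Student's *t* with 9 degrees of freedom (about 2.262). The function name and the dictionary layout are illustrative assumptions; the underlying data of **Table 1** are not used.

```python
T_CRIT_DF9 = 2.262  # two-sided 95% critical value of Student's t, 9 df

def ci_from_t(mean_diff, t_value, t_crit=T_CRIT_DF9):
    """Rebuild a 95% CI from a reported mean difference and t statistic."""
    std_error = abs(mean_diff / t_value)  # because t = mean_diff / SE
    half_width = t_crit * std_error
    return (mean_diff - half_width, mean_diff + half_width)

# (mean difference, t value) as reported in the text
reported = {
    "rating": (-100.4, -19.38),
    "years of practice": (6.6, 2.83),
    "age": (7.6, 3.16),
}

for label, (diff, t_value) in reported.items():
    low, high = ci_from_t(diff, t_value)
    print(f"{label}: [{low:.2f}, {high:.2f}]")
```

The reconstructed intervals agree with the reported ones to within rounding, which suggests that the mean differences, *t* values, and confidence intervals in the text are mutually consistent.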

Third, Carlsen might have engaged in more intense deliberate practice. Although we do not know the details of Carlsen's training, this is unlikely, in particular if we use Ericsson et al.'s (1993) criterion that deliberate practice is not enjoyable. In a recent interview, Carlsen said that "in chess training, I do the things I enjoy. I don't particularly enjoy playing against computers, so I don't do that" (Anders, 2014). In addition, he is a keen sportsman, with a penchant for playing or watching football rather than practicing chess intensively (Sujatha, 2013).

Thus, the question arises, at the risk of offending the proponents of deliberate practice: Does Carlsen have a particular talent for chess? The answer to this question is so obvious in the chess world that it is not even posed: Carlsen is known as the "Mozart of chess." Several factors support the hypothesis of talent. Carlsen showed clear signs of intellectual precocity early in his life. At the age of five, he knew "the area, population, flag, and capital of all the countries of the world," and memorized similar information for all of Norway's 430 municipalities (Agdestein, 2004, p. 10). He became a grandmaster just five years after starting playing chess seriously, at the age of 13 years and 148 days<sup>3</sup>. He has also adopted a highly unconventional approach to chess. While most grandmasters specialize in specific openings that they study at great length (Chassy and Gobet, 2011), often using computers, Carlsen plays a wide range of openings and avoids known variations, even accepting inferior positions as a consequence of this choice. Rather than preparing lengthy opening variations, he relies on his uncanny ability to find near-optimal moves in middle games and endgames. Together with scientific research, the case of Magnus Carlsen demonstrates that deliberate practice is necessary, but not sufficient, for achieving high levels of expert performance (Campitelli and Gobet, 2011).

<sup>3</sup>This contradicts another key prediction of the deliberate practice framework: "More specifically, expert performance is not reached with less than 10 years of deliberate practice" (Ericsson et al., 1993, p. 372).

## **ACKNOWLEDGMENT**

We thank the reviewer for useful comments.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 June 2014; accepted: 23 July 2014; published online: 14 August 2014.*

*Citation: Gobet F and Ereku MH (2014) Checkmate to deliberate practice: the case of Magnus Carlsen. Front. Psychol. 5:878. doi: 10.3389/fpsyg.2014.00878*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Gobet and Ereku. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The influence of deliberate practice on musical achievement: a meta-analysis

#### *Friedrich Platz<sup>1</sup>, Reinhard Kopiez<sup>2</sup>\*, Andreas C. Lehmann<sup>3</sup> and Anna Wolf<sup>2</sup>*

*<sup>1</sup> University of Music and Performing Arts, Stuttgart, Germany*

*<sup>2</sup> Hanover Music Lab, Hanover University of Music, Drama and Media, Hanover, Germany*

*<sup>3</sup> University of Music, Würzburg, Germany*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Brooke Noel Macnamara, Princeton University, USA*

*Gary Edward McPherson, University of Melbourne, Australia*

#### *\*Correspondence:*

*Reinhard Kopiez, Hanover University of Music, Drama and Media, Emmichplatz 1, 30175 Hanover, Germany e-mail: reinhard.kopiez@hmtm-hannover.de*

Deliberate practice (DP) is a task-specific structured training activity that plays a key role in understanding skill acquisition and explaining individual differences in expert performance. Relevant activities that qualify as DP have to be identified in every domain. For example, for training in classical music, solitary practice is a typical training activity during skill acquisition. To date, no meta-analysis on the quantifiable effect size of deliberate practice on attained performance in music has been conducted. Yet the identification of a quantifiable effect size could be relevant for the current discussion on the role of various factors in individual differences in musical achievement. Furthermore, a research synthesis might enable new computational approaches to musical development. Here we present the first meta-analysis on the role of deliberate practice in the domain of musical performance. A final sample of 13 studies (total *N* = 788) was carefully extracted to satisfy the following criteria: reported durations of task-specific accumulated practice as predictor variables and objectively assessed musical achievement as the target variable. We identified an aggregated effect size of *r*<sub>c</sub> = 0.61; 95% CI [0.54, 0.67] for the relationship between task-relevant practice (which by definition includes DP) and musical achievement. Our results corroborate the central role of long-term (deliberate) practice for explaining expert performance in music.

**Keywords: deliberate practice, music, sight-reading, meta-analysis, expert performance**

## **INTRODUCTION**

Current research on individual differences in the domain of music is surrounded by controversial discussions: On the one hand, exceptional achievement is explained within the expert-performance framework with an emphasis on the role of structured training as the key variable; on the other hand, researchers working in the individual differences framework argue that (possibly innate) abilities and other influential variables (e.g., working memory) may explain observable inter-individual differences (see Ericsson, 2014 for a detailed discussion). The expert-performance approach is represented by studies by Ericsson and coworkers (e.g., Ericsson et al., 1993) who assume that engaging in relevant domain-related activities, especially deliberate practice (DP), is necessary and moderates attained level of performance. Deliberate practice is qualitatively different from work and play and "includes activities that have been specially designed to improve the current level of performance" (p. 368). In a more comprehensive and detailed definition, Ericsson and Lehmann (1999) refer to DP as a

"Structured activity, often designed by teachers or coaches with the explicit goal of increasing an individual's current level of performance. (···) it requires the generation of specific goals for improvement and the monitoring of various aspects of performance. Furthermore, deliberate practice involves trying to exceed one's previous limit, which requires full concentration and effort." (p. 695)

In other words, we have to distinguish between mere experience (as a non-directed activity) and deliberate practice. An individual's involvement with a new domain entails the accumulation of experience, which may include practice components and lead to initially acceptable levels of performance. However, only the conscious use of strategies along with the desire to improve will result in superior expert performance (Ericsson, 2006). Note that in most studies DP is only indirectly estimated using durations of task-relevant training activities that also include an unspecified proportion of non-deliberate practice components. The unreflected use of the "accumulated deliberate practice" concept to denote durations of accumulated time spent in training activities is therefore misleading, because the measured durations might theoretically underestimate the true effect of deliberate practice on attained performance. In the context of classical music performance, the task-relevant activity can often consist of some type of solitary practice (e.g., studying repertoire or practicing scales) or the execution of a particular activity in a rehearsal or training context (e.g., sight-reading at the piano while coaching a soloist; receiving lessons). The theoretical framework for the explanation of expert and exceptional achievement has been validated in various domains and is widely accepted nowadays (Ericsson, 1996), as evidenced by the extremely high citation frequencies of key publications in this area. For example, according to Google Scholar, the study by Ericsson et al. (1993) has been cited more than 4000 times in the 20 years since its publication. As an internationally known proponent of research on giftedness, Ziegler (2009) concludes that even modern conceptions of giftedness research have integrated the perspective of expertise theory. However, controversial discussions persist (see Detterman, 2014).

In contrast, researchers relying more on talent-based approaches maintain that DP might not explain individual differences in performance sufficiently and emphasize innate variables as the explanation for outstanding musical achievement, such as working memory capacity (Vandervert, 2009; Meinz and Hambrick, 2010), handedness (Kopiez et al., 2006, 2010, 2012), sensorimotor speed (Kopiez and Lee, 2006, 2008), psychometric intelligence (Ullén et al., 2008), intrinsic motivation (Winner, 1996), unique type of representations (Shavinina, 2009), or verbal memory (Brandler and Rammsayer, 2003). According to Ericsson (2014), the predictive power of additional factors, such as general cognitive abilities, is usually of small to medium size and diminishes as the level of expertise increases.

Although expertise theory provides convincing arguments for the importance of structured training on expert skill acquisition and achievement, no comprehensive quantification for the influence of DP on musical achievement has been presented so far. A first and highly commendable attempt to estimate the "true" (population) effect of DP via estimates of durations of accumulated practice on musical achievement was published by Hambrick et al. (2014) who identified a sample of eight studies for their review. However, their methodology, assumptions, and use of the term DP raise some issues that have to be resolved. These open questions and concerns spawned our initial motivation for the present meta-analysis.

### **REANALYSIS OF DATA PRESENTED IN Hambrick et al. (2014)**

First, we carefully studied the publication by Hambrick et al. (2014) **(Table 1)**. Using Table 3 of their paper, we extracted the correlations between training data and measures of music performance and entered these data into a meta-analysis software package (Comprehensive Meta-Analysis, see Borenstein, 2010). This analysis yielded an aggregated effect size of *r* = 0.44 for the influence of training data on musical performance (see **Table 1** for details). According to Cohen's (1988) benchmarks, this corresponds to a large overall effect (see also Ellis, 2010, p. 41). Unlike Hambrick et al. (2014), we did not use the correlation values corrected for measurement error variance (attenuation correction) in the present paper because their correction of confidence intervals relied on the biased Fisher's *z* transformation (see Hunter and Schmidt, 2004, Ch. 5) and not on the corrected sampling error variance for each individual correlation as suggested by Hunter and Schmidt (2004, Ch. 3). Therefore, to allow for later comparisons, we decided to use the uncorrected (attenuated) correlation as the basis for our analysis of heterogeneity.

**Table 1 | Aggregation of data from Table 3 in Hambrick et al. (2014) for the reanalysis of effect sizes regarding the influence of deliberate practice on music performance.**

*Aggregation of studies shows a large (I<sup>2</sup> = 60.3%) and significant heterogeneity (Q(7) = 17.7, p < 0.02).*

The effect size, however, is not the only relevant parameter in a meta-analysis, and it should be examined in the light of a possible publication bias. To test the robustness of the resulting effect size estimate, we conducted a test for heterogeneity for the underlying sample of studies. Following Deeks et al. (2008), the *I*<sup>2</sup> value describes the percentage of variance in effect size estimates that can be attributed to heterogeneity rather than to sampling error. The *I*<sup>2</sup> value of 60.3% obtained for the Hambrick et al. (2014) sample of studies implied that it "may represent substantial heterogeneity" (Deeks et al., 2008, p. 278). The main reason for possible heterogeneity, in our opinion, could be a less selective inclusion with resulting inconsistent predictor and target variables. For example, in their study on the acquisition of expertise in musicians, Ruthsatz et al. (2008) used inconsistent (non-standardized) indicators for the estimation of musical achievement that made it difficult to compare the observed differences in performance: In Study 1, the band director's audition scores for each of the high school band members were ranked and used as individual indicators of musical achievement; in Study 2A, audition scores from the admission exam were used as the outcome variable; and in Study 2B, a music faculty member rated the students' general musical achievement. In no instance was a standardized performance task used as the target variable. Unfortunately, no information was reported on the rating reliabilities.
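The relation between the *Q* statistic and *I*<sup>2</sup> can be sketched in a few lines of Python. This is a minimal illustration, assuming the common definition *I*<sup>2</sup> = 100% × (*Q* − *df*)/*Q*, truncated at zero; the function name `i_squared` is ours and is not part of any meta-analysis package mentioned in the text.

```python
def i_squared(q: float, df: int) -> float:
    """Return I^2 in percent, assuming I^2 = 100 * (Q - df) / Q, floored at 0."""
    if q <= 0:
        # No observed dispersion at all, hence no heterogeneity to report.
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values reported for the reanalysis of Hambrick et al. (2014): Q(7) = 17.7
print(f"I^2 for Q(7) = 17.7:  {i_squared(17.7, 7):.1f}%")
# Values reported for the present meta-analysis: Q(12) = 8.19
print(f"I^2 for Q(12) = 8.19: {i_squared(8.19, 12):.1f}%")
```

With the rounded *Q* value reported above, this gives about 60.5%, close to the reported 60.3%; the small gap presumably reflects rounding of *Q*. Whenever *Q* is smaller than its degrees of freedom, *I*<sup>2</sup> is set to zero.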

Although our reanalysis of Hambrick et al.'s (2014) review confirmed a large effect size for the relation between training data and musical achievement, this finding still underestimates the "true" value. In order to arrive at a convincing effect size for deliberate practice in the domain of music, we also aggregated studies, but invested great effort in the selection of studies for our meta-analysis. As will be shown below, our meta-analysis was not affected by potential publication bias and heterogeneity. We also applied transparent and consistent criteria for study selection, as this is one of the most important prerequisites for the aggregation of studies.

#### **CHOICE OF METHOD**

Two methods are available to evaluate past research: (a) a narrative and systematic review and (b) a meta-analysis. The narrative reviewer uses published studies, reports other authors' results in his or her own words and draws conclusions (Ellis, 2010, p. 89). A systematic review is also sometimes referred to as a "qualitative review" or "thematic synthesis" (Booth et al., 2012) and necessitates a comprehensive search of the literature. The disadvantage of this approach is that it depends on the availability of results published in established journals and tends to show a publication bias toward the Type I error (false positive). The reason for this is that journals prefer to publish studies with significant results, and negative findings or null results have a lower probability of publication (Masicampo and Lalande, 2012). In the field of music, narrative reviews on the influence of DP on musical achievement play an important role and have been conducted in the last two decades (Lehmann, 1997, 2005; Howe et al., 1998; Sloboda, 2000; Krampe and Charness, 2006; Lehmann and Gruber, 2006; Gruber and Lehmann, 2008; Campitelli and Gobet, 2011; Hambrick and Meinz, 2011; Nandagopal and Ericsson, 2012; Ericsson, 2014).

The other approach is that of a meta-analysis. Here, studies are included following "pre-specified eligibility criteria in order to answer a specific research question" (Higgins and Green, 2008, p. 6). Within the meta-analytic approach, studies' effect sizes have to be weighted before they are aggregated. Every study's effect size weight then reflects its degree of precision as a function of sample size (Ellis, 2010). Consequently, studies with smaller sample sizes, particularly in combination with larger variation, will result in smaller weights compared to studies with larger sample sizes and more narrow variation. These weights of the individual studies then function as estimators of precision. If these weights differ markedly from each other, statistical heterogeneity is present. The final result of a meta-analysis is the weighted mean effect size across all studies included. Compared to an individual study's effect size, this weighted mean effect size represents a more precise point estimate as well as an interval estimate surrounding the effect size in the population (Ellis, 2010, p. 95). Moreover, a meta-analysis generally increases statistical power by reducing the standard error of the weighted average effect size (Cohn and Becker, 2003). Researchers who use meta-analysis techniques have two goals: First, they want to arrive at an interval of effect size estimation in a population based on aggregated effect sizes of individual studies; second, they want to give an evidence-based answer to those questions that reviews or replication studies cannot give in part due to their arbitrary collection of significant and insignificant results.
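The weighting idea can be illustrated with a short sketch. It assumes the simplest scheme, in which each study's correlation is weighted by its sample size (as in a "bare-bones" Hunter-Schmidt analysis); the correlations and sample sizes below are hypothetical and are not taken from any study in this paper.

```python
def weighted_mean_r(correlations, sample_sizes):
    """Sample-size-weighted mean correlation: sum(N_i * r_i) / sum(N_i)."""
    if not correlations or len(correlations) != len(sample_sizes):
        raise ValueError("need one sample size per correlation")
    total_n = sum(sample_sizes)
    return sum(n * r for r, n in zip(correlations, sample_sizes)) / total_n


# Hypothetical data for illustration only (not from the reviewed studies).
hypothetical_r = [0.30, 0.50, 0.60]
hypothetical_n = [20, 50, 100]

unweighted = sum(hypothetical_r) / len(hypothetical_r)
weighted = weighted_mean_r(hypothetical_r, hypothetical_n)
print(f"unweighted mean r = {unweighted:.3f}")
print(f"weighted mean r   = {weighted:.3f}")
```

In this toy example the largest study pulls the weighted mean toward its own correlation, which is the intended behavior: more precise studies contribute more to the aggregated estimate.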

Despite the fact that meta-analyses have been shown to be an important constituent for the production of "verified knowledge" (Kopiez, 2012), they have only recently been applied to various topics in music psychology (e.g., Chabris, 1999; Hetland, 2000; Pietschnig et al., 2010; Kämpfe et al., 2011; Platz and Kopiez, 2012; Mishra, 2014). To date, there has been no formal meta-analysis concerning the influence of DP on attained music performance.

## **GOAL OF THE PRESENT STUDY**

The aim of our study was two-fold: First, by means of a systematic literature review we wanted to identify all relevant publications that might help us answer the question of how strongly task-specific practice influences attained music performance. Second, we wanted to quantify the effect of DP on music performance in terms of an objectively computed effect size. This effect size is an important component for the development of a comprehensive model for the explanation of individual differences in the domain of music. Although this meta-analysis is supposed to reveal the "true" effect size of deliberate practice on musical achievement, for theoretical reasons it is possible that it still underestimates the upper bound of the effect of deliberate practice (see Future Perspectives).

## **MATERIALS AND METHODS**

The study was conducted in three steps: First, to arrive at a relevant sample of selected studies, we conducted a systematic review (Cooper et al., 2009) that helped to control for publication bias (Rothstein et al., 2005). In the second step, we identified each study's predictor and outcome variable in line with Ericsson (2014), and we identified all artifactual confounds that might attenuate the studies' outcome measures (Hunter and Schmidt, 2004, p. 35). Third, we carried out a meta-analysis of individually corrected (disattenuated) correlations as well as a quantification of its variance (Hunter and Schmidt, 2004; Schmidt and Le, 2005) to obtain the true mean score correlation (ρ) between music-related practice and musical achievement.

### **SAMPLE OF SELECTED STUDIES**

Our sample of selected studies for the subsequent meta-analysis was the outcome of a systematic literature search which had led to a preliminary corpus of selected studies (see **Figure 1A**). Due to a wide variety of methodological approaches, and for the purpose of later generalizability of our meta-analytical results, we decided to select only studies with comparable experimental designs. Therefore, in the next step of generating a sample, we excluded all studies from the preliminary corpus that did not meet all of our selection criteria (see **Figure 1B**). Consequently, our preliminary corpus of *n* = 102 studies dwindled to the final sample of *n* = 13 studies which served as input for the meta-analysis.

## **LITERATURE SEARCH**

The acquisition of studies for our systematic review derived from (a) searches of relevant databases of scientific literature, (b) queries of conference proceedings, and (c) personal communications with experts in the field of music education or musical development. First, a database backward and forward search for literature was conducted in January 2014 (**Figure 1A**). To control for publication bias (see Rothstein et al., 2005), we considered a large variety of databases for our literature search: peer-reviewed studies in the field of medical and neuroscientific (PubMed), psychological (PsycINFO), educational (ERIC), social (ISI), and musicological research (RILM). To avoid overestimating the effect size because of unpublished results (Rosenthal, 1979), we also searched so-called "gray literature" (Rothstein and Hopewell, 2009), which often contains non-significant study results: doctoral dissertations (DAI), proceedings or newspaper articles (PsycEXTRA), and book chapters containing psychological study results (PsycBOOKS).

Studies were excluded from the preliminary corpus if they did not conform with at least one of the following three descriptors (**Figure 1A**): (1) "music" AND "deliberate practice," (2) "music" AND "formal practice," (3) "music" AND "expertise."

In addition, we included in the preliminary corpus those music-related studies which cited Ericsson et al.'s (1993) first extensive review of skill acquisition research. Finally, authors who had conducted experimental studies on predictors of music achievement were contacted and queried for currently unpublished correlational data involving music-related deliberate practice and musical achievement. In total, our initial literature search resulted in a preliminary corpus of 102 studies (**Figure 1A**).

## **CRITERIA-RELATED LITERATURE SELECTION**

While Hambrick et al. (2014) performed a more intuitive search, resulting in significant heterogeneity of the study sample, the aim of our method was to arrive at a homogeneous sample of pertinent studies. To this end, we selected studies based on objective criteria which we derived from the theoretical framework of expert performance according to Ericsson et al. (1993). Thus, studies were successively removed from the preliminary corpus if they did not meet all the criteria shown in **Figure 1B**. As a result of our study selection (see **Table 2**), we identified studies which met the following six criteria: (1) they followed a hypothesis-testing design; (2) they contained a correlation between accumulated deliberate practice and a *corresponding* task-related level of musical achievement; (3) the amount of relevant practice was accrued across at least 1 year; (4) musical performance was measured by means of objective criteria such as a computer-based assessment (e.g., scale analysis by Jabusch et al., 2004) or expert evaluation based on psychometric scales (e.g., Hallam, 1998); (5) they contained sufficient statistical information for effect size calculation or estimation; and (6) in the case of duplicate publication of data (as happens when original articles are also published in chapter form), study results were considered only once for effect size aggregation in the meta-analysis.

Following our selection criteria, *n* = 89 studies had to be excluded from our preliminary corpus. Our final sample size was thus *n* = 13 studies, comprising results from peer-reviewed studies as well as "gray" literature from 1992 to 2012 (see **Table 2**). For comparison, the number of studies included in Hambrick et al.'s (2014) review was *n* = 8.

#### **PROCEDURE**

According to Hunter and Schmidt (2004, p. 33), the aim of a psychometric meta-analysis is two-fold: to uncover the variance of observed effect sizes (*s*<sup>2</sup><sub>*r*</sub>), which in our study was the variance of observed correlations between task-related practice (predictor) and musical achievement (outcome variable), and to estimate the variance of the supposedly "true" effect size distribution in the population (σ<sup>2</sup><sub>ρ</sub>). The use of the term "psychometric" refers to the idea in classical test theory (Gulliksen, 1950) that every observed correlation is subject to attenuation due to the imperfect measurement of variables, sampling error, and further artifacts (for an overview see Hunter and Schmidt, 2004, p. 35). If the influence of all such artifacts on an observed correlation (*ro*) is known, each study's correlation can first be corrected for its individual attenuation bias (*rc*). In a subsequent step, the population variance of the "true" correlation (σ<sup>2</sup><sub>ρ</sub>) is estimated by subtracting the variance attributable to all attenuating factors (*s*<sup>2</sup><sub>*ec*</sub>) from the observed variance of corrected correlations (*s*<sup>2</sup><sub>*rc*</sub>). In the case of perfect concordance between the observed variance of corrected correlations (*s*<sup>2</sup><sub>*rc*</sub>) and the variance attributable to all artifacts (*s*<sup>2</sup><sub>*ec*</sub>), there is no population variance left to be explained (σ<sup>2</sup><sub>ρ</sub> = 0). Then all studies' effect sizes in the meta-analysis are homogeneous and assumed to derive from one single population effect (Hunter and Schmidt, 2004, p. 202). Therefore, we will first identify each study's theoretically appropriate predictor and outcome variable as well as reliability information for both variables in order to calculate effect size and estimate artifactual influence.
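The variance decomposition described in this paragraph can be written compactly. The notation follows the symbols used in the text; the percentage expression on the second line is the usual way of reporting how much of the observed variance is accounted for by artifacts, and is given here as an illustrative assumption about how such a figure is typically computed, not as a restatement of the software's internal formula.

$$
\hat{\sigma}^{2}_{\rho} = s^{2}_{r_c} - s^{2}_{e_c},
\qquad
\text{percentage of variance due to artifacts} = 100 \times \frac{s^{2}_{e_c}}{s^{2}_{r_c}}
$$

When $s^{2}_{e_c}$ equals $s^{2}_{r_c}$, the estimate $\hat{\sigma}^{2}_{\rho}$ is zero and the studies are treated as estimating a single population correlation.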

## **IDENTIFICATION OF PREDICTORS AND OUTCOME VARIABLES**

Although accumulated deliberate practice on an instrument has been identified as a generally important biographical predictor in the acquisition of expert performance (Ericsson et al., 1993), it is sometimes erroneously considered a catch-all predictor for achievement in music-specific tasks. However, as Ericsson clearly states, "it is not the total number of hours of practice that matter, but a *particular type of practice* [emphasis by the third author, AL] that predicts the difference between elite and sub-elite athletes" (Ericsson, 2014, p. 94). For example, according to Lehmann and Ericsson (1996) as well as Kopiez and Lee (2006, 2008), sight-reading performance as a domain-specific task of musical achievement should be less well predicted by accumulated generic deliberate practice in piano playing (i.e., solitary practice) than by the accumulated amount of task-specific deliberate practice in the field of accompanying and sight-reading. Therefore, and in contrast to Hambrick et al.'s (2014) procedure, we identified for each study the predictor variable that corresponded most closely to the outcome task. For example, the researcher might have summed up the number of pieces sight-read (Kornicke, 1992, p. 133), determined the size of the accompanying repertoire (Lehmann and Ericsson, 1996, p. 29), counted the number of accompanying performances (Meinz, 2000, p. 301), reported cumulated piano accompanying performances (Tuffiash, 2002, p. 81), calculated the accumulated sight-reading expertise until the age of 18 (Kopiez and Lee, 2008, p. 49) or aggregated the durations of accompaniment and hours of specific sight-reading practice (Meinz and Hambrick, 2010, p. 3). Information on the task-specific accumulated practice duration until the age of 18 or 20 years was used in the case of Ericsson et al. (1993, p. 386), Krampe and Ericsson (1996, p. 347), and Kopiez and Lee (2008, p. 49).
In the absence of such data, we used the total accumulated practice time (at the time of the data collection) instead (e.g., in the case of Hallam, 1998, p. 124; McPherson, 2005, author contacted for data; Jabusch et al., 2007, p. 366; and Kopiez et al., 2012, p. 372).

In addition to the predictor variable, the measurement of the outcome variable should be representative of the investigated skill (Ericsson, 2014). Consequently, inter-onset evenness in scale-playing as well as performed (rehearsed) music were identified as truly domain-specific tasks of musical achievement in our sample of studies on music performance. Here, participants' performances were measured either by a reliable psychological evaluation based on psychometric scale construction (e.g., Kornicke, 1992) or by an objective, computer-based, physical measurement such as obtaining the number of correctly performed notes (e.g., Lehmann and Ericsson, 1996) or identifying the inter-onset evenness of scale-playing (e.g., Ericsson et al., 1993; Krampe and Ericsson, 1996; Jabusch et al., 2007). In the case of multiple tasks, as in Ericsson et al. (1993, p. 386) and Krampe and Ericsson (1996, p. 347), we chose the task with the strongest measurement reliability, the highest difficulty and the highest discrimination ability for musical achievement (different movements with each hand simultaneously; see Ericsson et al., 1993, p. 386, and Exp. 1 in Krampe and Ericsson, 1996).

#### **RELIABILITY OF IDENTIFIED PREDICTORS AND OUTCOME VARIABLES**

For the purpose of adjusting the correlation coefficient of the observed studies for attenuation, the measurement error in the predictor as well as in the outcome variable had to be identified (Hunter and Schmidt, 2004, p. 41). As shown in **Table 3**, only a small number of studies reported information on the reliability for either the predictor or the outcome variable. Specifically, only Tuffiash (2002, p. 36) reported test-retest reliability in cumulative piano accompaniment performance (*rxx* = 0.91) for the quantification of measurement error in the predictor variable. His test-retest reliability estimations were similar to those reported in Bengtsson et al. (2005, p. 1148), who stated a mean test-retest reliability of *rxx* = 0.89 for the estimation of accumulated deliberate practice obtained from retrospective interviews. Thus, when no reliability was reported for the predictor variable, we used the mean correlation of test-retest reliability according to Bengtsson et al. (2005) to estimate the imperfection of the predictor variable.

To quantify measurement error in the outcome variable, we used the Cronbach's alpha reported in Kornicke (1992, p. 109) for the inter-rater reliability of the sight-reading test and in McPherson (2005, p. 13) for performing rehearsed music. In Krampe and Ericsson (1996, p. 339) and Meinz and Hambrick (2010, p. 4), Cronbach's alpha of the construct reliability for the psychometric measurements could be copied from the respective papers. Finally, in the case of Tuffiash (2002, p. 28) we computed a mean correlation on the basis of all the test-retest reliabilities of sight-reading tests the author reported. For studies in which no measurement error was stated for the outcome variable, we estimated the reliability of the outcome variable's measurement: To estimate the reliability of experts' performance ratings for the outcome variable in Lehmann and Ericsson (1996) and Kopiez and Lee (2008), we used the intercorrelations between the expert judgment of overall impression and the amount of correctly played notes (*ryy* = 0.88) as reported in Lehmann and Ericsson (1993, p. 190). In the cases of Ericsson et al. (1993), Jabusch et al. (2007, 2009) and Kopiez et al. (2012), we estimated *ryy* = 0.91 as the construct reliability according to Spector et al. (in revision); they computed a mean correlation of test-retest reliability for Jabusch et al.'s (2004) measurement of note-evenness in scale playing. The same test-retest reliability of the scale-analysis by Spector et al. (in revision) was used for the estimation of the test-retest reliability for the ABRSM in Hallam (1998). Along the lines of Bergee (2003), we underestimated the disattenuated correlation by using *ryy* = 0.91 and obtained a more conservative correction. Finally, a reliability estimate of *ryy* = 0.96 for Meinz (2000) was communicated by the author and also reported in Hambrick et al. (2014, p. 6).
In summary, all studies showed a weak attenuation with a 1–17% downwards bias (see **Table 4**, column A).
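The attenuation correction that these reliabilities feed into can be sketched as follows. This is a minimal illustration of the classical correction for unreliability in both variables, *rc* = *ro* / √(*rxx* · *ryy*); the function names are ours, and the sketch covers only measurement error, not any further artifacts the Hunter-Schmidt programs may handle. The example uses the values stated in the text for Kopiez and Lee (2008): *ro* = 0.36, the assumed predictor reliability *rxx* = 0.89, and the assumed outcome reliability *ryy* = 0.88.

```python
import math


def attenuation_factor(r_xx: float, r_yy: float) -> float:
    """Attenuation factor A = sqrt(r_xx * r_yy), i.e. the ratio r_o / r_c."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return math.sqrt(r_xx * r_yy)


def disattenuate(r_o: float, r_xx: float, r_yy: float) -> float:
    """Correct an observed correlation for unreliability in both variables."""
    return r_o / attenuation_factor(r_xx, r_yy)


# Reliabilities and observed correlation as stated in the text.
a = attenuation_factor(0.89, 0.88)
r_c = disattenuate(0.36, 0.89, 0.88)
print(f"attenuation factor A = {a:.3f}")
print(f"downward bias        = {(1 - a) * 100:.1f}%")
print(f"corrected r_c        = {r_c:.3f}")
```

An attenuation factor just below 0.9 corresponds to a downward bias of roughly 10 to 12%, which falls inside the 1–17% range summarized above.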


**Table 3 | Reported effect size data on the relationship between indicators of deliberate practice and objective measurement of musical achievement.**

+*Absolute values were used in meta-analysis.*

◦*Aggregated correlation based on all four correlations between accumulated deliberate practice and outcome variable.*

◦◦*Aggregated correlation based on two reported correlations between accumulated life-time deliberate practice and outcome variable.*

◦◦◦*According to Lehmann and Ericsson (1996) the mean correlation of accompaniments (r* = *0.63) and hours of deliberate sight-reading practice (r* = *0.48) was used as task-specific predictor for sight-reading performance.*

*\*Reliability coefficients reported in studies; assumed reliability (if not reported) of predictor variable used for attenuation correction in meta-analysis: rxx = 0.89; assumed reliability (if not reported) of outcome variable (ryy) for attenuation correction in meta-analysis: Ericsson et al., 1993 (ryy = 0.91), Lehmann and Ericsson, 1996 (ryy = 0.88), Hallam, 1998 (ryy = 0.91), Meinz, 2000 (ryy = 0.96), Jabusch et al., 2007 (ryy = 0.91), Kopiez and Lee, 2008 (ryy = 0.88), Jabusch et al., 2009 (ryy = 0.91), Kopiez et al., 2012 (ryy = 0.91).*

## **STATISTICAL REANALYSIS AND META-ANALYSIS WITH CORRELATIONS CORRECTED FOR ARTIFACTS**

All studies reported correlations that could be used for quantifying the effect of deliberate practice on musical achievement (see **Table 3**). Meinz and Hambrick (2010) reported multiple predictors of sight-reading skill along the theoretical outline for the acquisition of sight-reading skill (Lehmann and Ericsson, 1996; Kopiez and Lee, 2006). We aggregated the two predictors, *number of accompanying events/activities* (*r* = 0.63) and *hours of sight-reading practice* (*r* = 0.48), into a mean correlation (*r* = 0.56) to be used as a global predictor for sight-reading performance (see **Table 3**). As a result of a 2 × 2 experimental design, four correlations of pianists' accumulated task-specific practice times and scale performances were reported in Kopiez et al. (2012). Again, the four individual correlations (*rLi* = −0.47; *rLo* = −0.23; *rRi* = −0.46; *rRo* = −0.50) were aggregated to the study's effect size (*r* = −0.42) (Kopiez et al., 2012, Table 6 on p. 372; see comment on negative values below). Finally, in the case of Jabusch et al. (2009, p. 77), two correlations between total life-time practice and music performance (as measured by evenness in scale playing on two dates 1 year apart; *r*<sub>1</sub> = −0.47; *r*<sub>2</sub> = −0.40) were reported. We calculated and used the mean correlation (|*r*| = 0.44) in our meta-analysis.

Jabusch et al.'s (2004) scale-playing paradigm generally resulted in negative correlations (see **Table 3**). Since the authors report the median of the scale-related inter-onset interval standard deviation (medSDIOI) as an indicator for evenness, a low medSDIOI signals high evenness. A positive association between accumulated practice time and evenness can therefore still be postulated: the longer the pianist's deliberate practice durations, the smaller the degree of unevenness. For the sake of simplicity, we used the absolute values of the reported correlations in our meta-analysis (this also applies to Ericsson et al., 1993; Krampe and Ericsson, 1996; Jabusch et al., 2007, 2009; Kopiez et al., 2012).
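The within-study aggregation and the use of absolute values can be checked with a small sketch. It assumes that the reported study-level values are simple arithmetic means of the individual correlations (the text does not state whether another averaging method, such as one based on Fisher's *z*, was used); the helper name `mean_abs_correlation` is ours.

```python
def mean_abs_correlation(correlations):
    """Arithmetic mean of the correlations, returned as an absolute value."""
    if not correlations:
        raise ValueError("need at least one correlation")
    mean_r = sum(correlations) / len(correlations)
    return abs(mean_r)


# Correlations as reported in the text.
meinz_hambrick_2010 = [0.63, 0.48]
kopiez_2012 = [-0.47, -0.23, -0.46, -0.50]
jabusch_2009 = [-0.47, -0.40]

for label, rs in [
    ("Meinz and Hambrick (2010)", meinz_hambrick_2010),
    ("Kopiez et al. (2012)", kopiez_2012),
    ("Jabusch et al. (2009)", jabusch_2009),
]:
    print(f"{label}: |mean r| = {mean_abs_correlation(rs):.3f}")
```

The three means come out at 0.555, 0.415, and 0.435, which agree with the reported study-level values of 0.56, 0.42, and 0.44 once rounded to two decimals.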

Finally, the observed correlations as well as the reliabilities of predictor and outcome variables were entered into the Hunter-Schmidt Meta-Analysis software (Schmidt and Le, 2005) so that we could correct all observable correlations for artifacts (Hunter and Schmidt, 2004, p. 75) within the meta-analysis and estimate the population correlation for the "true" effect size (see **Table 4**).

## **RESULTS**

## **STATISTICAL PROCEDURE**

The observed correlation (*ro*) for each study was transformed into its disattenuated *rc* value. This disattenuation procedure is based on the assumption that the observed correlation (*ro*) comprises the "true" value plus the influence of a measurement error that depends on the reliability of both the predictor (*rxx*) and outcome (*ryy*) variable. According to Hunter and Schmidt (2004), the *ro* value has to be corrected for limited reliability of both variables, and this correction is implemented in the *Hunter-Schmidt Meta-Analysis Programs* (see Schmidt and Le, 2005). Detailed results with all steps and for each study are shown in **Table 4**. It is remarkable that 81.2% of the complete variance in all corrected correlations was attributable to the artifacts, a finding which leaves no residual variance to be explained (for an explanation, see Hunter and Schmidt, 2004, p. 401). In other words, our meta-analysis is based on a homogeneous corpus of data (*Q*(12) = 8.19, *p* = 0.77; *I*<sup>2</sup> = 0.00%) which is the outcome of a careful sampling and study selection, guided by the criteria of task-specific practice and objective measurements of music performance.


**Table 4 | Statistical values of the meta-analysis.**

*N, sample size; ro, observed correlation (Hunter and Schmidt, 2004, p. 96); rxx, reliability of predictor variable (error of measurement in the predictor variable, Hunter and Schmidt, 2004, p. 96); ryy, reliability of outcome variable (error of measurement in the outcome variable, Hunter and Schmidt, 2004, p. 96); A, attenuation factor (ro/rc, Hunter and Schmidt, 2004, p. 118); Var(eo), sampling error variance of each study's uncorrected correlation (Hunter and Schmidt, 2004, p. 87); Var(ec), sampling error variance of each study's corrected correlation (Hunter and Schmidt, 2004, p. 119); w, study weight (Hunter and Schmidt, 2004, p. 125); rc, corrected study correlation (Hunter and Schmidt, 2004, p. 118); weighted mean observed correlation ro = 0.54 (Hunter and Schmidt, 2004, p. 81); frequency-weighted average squared error S<sup>2</sup><sub>r</sub> = 0.01 (Hunter and Schmidt, 2004, p. 81); mean true score correlation ρ = 0.61 (Hunter and Schmidt, 2004, p. 125); variance of true score correlations S<sup>2</sup><sub>ρ</sub> = 0 (Hunter and Schmidt, 2004, p. 126); observed variance of the corrected correlations S<sup>2</sup><sub>rc</sub> = 0.01 (Hunter and Schmidt, 2004, p. 126); variance in corrected correlations attributable to all artifacts S<sup>2</sup><sub>ec</sub> = 0.01 (Hunter and Schmidt, 2004, p. 126); complete variance in corrected correlations (81.2%) is attributable to all artifacts (Hunter and Schmidt, 2004, p. 401); Q-test on study homogeneity as well as I<sup>2</sup> suggest no significant variation across studies (I<sup>2</sup> = 0.00; Q(12) = 8.19, p = 0.77).*

## **MAIN OUTCOME**

The results from 13 studies regarding the effect of the indicators of DP on musical achievement are summarized in **Figure 2** using a forest plot. Our meta-analysis yielded an average aggregated corrected effect size of *rc* = 0.61, 95% CI [0.54, 0.67]. According to Cohen's benchmarks (1988, p. 80), this corresponds to a large effect. The size of the squares in the forest plot indicates each study's weight, and error bars delimit the 95% CI. The remarkably strong relationship between task-specific practice and musical achievement as measured by objective means is only one facet of the aggregated and corrected correlations. Another facet of the results is the 95% CI as a measure of dispersion for the population effect, which is rather narrow [0.54, 0.67] and entirely positive. This feature indicates the stability of our finding. The forest plot also shows that the aggregated correlation is not biased by one or two studies with extreme relative weights. Rather, a total of 4 studies (Hallam, 1998; Meinz, 2000; Tuffiash, 2002; McPherson, 2005) with high relative weights contribute 50% to the aggregated result.

## **TEST FOR PUBLICATION BIAS**

Evidence suggests that due to their selective decision processes and preference for significant results, peer-reviewed journals only partially reflect research activities (Rothstein et al., 2005). This so-called publication or availability bias is an indicator for the existence of unpublished results, and it is a sign of how strongly those unpublished studies could influence the results of a meta-analysis. To detect the presence of a systematic selection bias of publications, we used the so-called funnel plot (Egger et al., 1997) (see **Figure 3**). If publication bias is present, the distribution of results will form an asymmetrically shaped funnel. Fortunately, **Figure 3** shows a nearly symmetrical distribution of effect sizes in relation to the standard error (the indicator of precision). With one exception, the effect sizes lie within the funnel's shape and are centered symmetrically around the aggregated mean of *rc* = 0.61. Such low bias is one of the strengths of our meta-analysis and the result of carefully defined criteria for inclusion (see **Figure 1**).

## **DISCUSSION**

One of the main results of our meta-analysis is the identification of a reliable, aggregated correlation between task-relevant practice and objectively measured musical achievement. Although the central parameter of our analysis of 13 studies is similar to the one calculated by Hambrick et al. (2014) on the basis of 8 studies, there are some marked differences between both approaches. Our results may currently represent the best estimate of this correlation given the published data and methodological tools.

## **COMPARISON OF OUR FINDINGS TO THOSE BY Hambrick et al. (2014)**

**FIGURE 3 | Funnel plot of studies' effect sizes (***rc* **) against standard error of effect sizes as a test for publication bias.**

An important step in the use of correlation coefficients in meta-analyses is the correction for attenuation (Hunter and Schmidt, 2004). It considers the reliability of the outcome and predictor variables in a study. Although we chose conservative estimates of reliability for the disattenuation procedure in the present paper, our resulting correlation value is higher (*rc* = 0.61) than Hambrick et al.'s (2014) (*rc* = 0.52), and it covers a smaller confidence interval (95% CI [0.54, 0.67]) compared to theirs (95% CI [0.43, 0.64]). Therefore, we conclude that our meta-analysis is a more reliable approximation of the "true" correlation between task-relevant practice (including DP) and musical achievement.
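For readers unfamiliar with the disattenuation procedure, the classical correction divides the observed correlation by the square root of the product of the two reliabilities. The sketch below assumes that standard formula; the function name and the reliability values (0.80 and 0.90) are hypothetical and are not the estimates used in the meta-analysis.

```python
import math


def correct_for_attenuation(r_observed, rel_predictor, rel_outcome):
    """Classical disattenuation: r_c = r_o / sqrt(r_xx * r_yy).

    rel_predictor and rel_outcome are reliability coefficients in (0, 1].
    """
    if not (0 < rel_predictor <= 1 and 0 < rel_outcome <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_observed / math.sqrt(rel_predictor * rel_outcome)


# Hypothetical reliabilities, chosen only for illustration.
r_c = correct_for_attenuation(0.36, rel_predictor=0.80, rel_outcome=0.90)
```

Because the divisor is at most 1, the corrected value can never be smaller in magnitude than the observed one, and lower assumed reliabilities produce larger corrections. This is why the choice of reliability estimates matters when two meta-analyses are compared.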

In some instances, the predictors we used were different from those Hambrick et al. (2014) had used for their study. For example, they selected the value of *ro* = 0.25 from the sight-reading study by Kopiez and Lee (2008). However, this correlation between task-relevant study (i.e., sight-reading expertise) and actual sight-reading achievement was based on the lifetime accumulated practice time in sight-reading (up to the time of data collection). In line with the criteria for the calculation of accumulated practice time employed in Ericsson et al. (1993, Study II; see **Table 3**), and for reasons of comparability, we used the correlation between accumulated sight-reading expertise up to the age of 18 years and sight-reading performance (*ro* = 0.36; Kopiez and Lee, 2008) for our meta-analysis. Lifetime accumulated practice durations were only used when no information on the task-specific accumulated practice time until the age of 18 or 20 years could be obtained from the studies. We believe that the careful selection of studies and variables based on selection criteria of objective measurement for the outcome (performance) variable and clear calculations of accumulated practice durations are the main reasons for the differences between Hambrick et al.'s results and ours.

## **THE ROLE OF POSSIBLE FURTHER MODERATING VARIABLES ON PERFORMANCE**

The discussion on the influence of variables other than study durations that might influence musical achievement is ongoing and interesting. Here, we wish to comment on the tendency of authors to use headings for publications that can be misleading for the uninformed reader. For example, Meinz and Hambrick (2010) insinuate that there might be (heritable) variables which have a significant influence on musical achievement, and they suggest working memory capacity as such an influential factor. Yet, their main finding regarding the central role of various forms of relevant practice on sight-reading achievement (within a range from *ro* = 0.37 to 0.67) implies that working memory capacity can only contribute a smaller proportion of the variance (*ro* = 0.28). Although the authors conclude "that deliberate practice accounted for nearly half of the total variance in piano sight-reading performance" (Meinz and Hambrick, 2010, p. 914), the article title, "Limits on the Predictive Power of Domain-Specific Experience and Knowledge in Skilled Performance," belittles the role of deliberate practice. A second case is the publication by Ruthsatz et al. (2008) in which the authors found a low correlation between general intelligence (IQ) and musical achievement of *ro* = 0.25 (Study 1), 0.11 (Study 2A), and −0.01 (Study 2B) but a large one between accumulated practice time and musical achievement (*ro* = 0.34 [Study 1], 0.31 [Study 2A], and 0.54 [Study 2B]). Their combination of "other" variables exceeds the influence of deliberate practice times only when the aggregated correlations of IQ and music audiation are compared with the influence of the individual predictor of practice. However, it is well known that Gordon's test of audiation (AMMA), which Ruthsatz et al. used, is influenced by musical experience and thus already captures effects of DP. In light of such findings, the authors' claim that "higher-level musicians report significantly higher mean levels of characteristics such as general intelligence and music audiation, in addition to higher levels of accumulated practice time" (Ruthsatz et al., 2008, p. 330) is grossly misleading.

Another argument for a differentiated view of our findings arises from the erroneous interpretation of *r* (or *rc*) values as *r*<sup>2</sup> values known from common variance. For example, Hambrick et al. (2014, p. 7) state: "On average across studies, deliberate practice explained about 30% of the reliable variance in music performance." However, according to Hunter and Schmidt (2004, p. 190), this is a problematic interpretation with regard to findings from a meta-analysis, because the *r*<sup>2</sup> value is "related only in a very nonlinear way to the magnitudes of effect sizes that determine their impact in the real world." Instead, relationships between variables should be interpreted in terms of linear relationships. Therefore, we could illustrate the relevance of our meta-analytical finding by means of a correlation simulation based on a sample size of *N* = 788 and a given correlation of *rc* = 0.61. **Figure 4** displays this simulation with the linear increase of one unit on the *x*-axis corresponding to an increase of musical skill level or achievement by 0.61 units. If we expressed this in terms of an experimental between-groups design, this *rc* value of 0.61 would translate to a Cohen's *d* of 1.52, which indicates a very large effect (Ellis, 2010, p. 16). In our view, this is a strong argument for the eminent importance of long-term DP for skill acquisition and achievement.

**FIGURE 4 | Correlation between indicators of DP and musical achievement based on a simulation with** *N* **= 788 normally distributed cases with a mean of 0.** An increase of 1 unit on the *x*-axis corresponds to an increase of 0.61 units on the *y*-axis.
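A simulation of this kind can be approximated as follows. The sketch is not the procedure used to produce the figure; it assumes both variables are standardized (mean 0, standard deviation 1), in which case the regression slope equals the correlation. The function names and the random seed are arbitrary choices for illustration.

```python
import math
import random


def simulate_correlated(n, rho, seed=0):
    """Draw n pairs (x, y) of standard normal variables with population correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        xs.append(x)
        # y shares the proportion rho of x; the remainder is independent noise,
        # scaled so that y also has unit variance.
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * noise)
    return xs, ys


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


xs, ys = simulate_correlated(n=788, rho=0.61)
slope = ols_slope(xs, ys)  # close to 0.61, up to sampling error
```

With unstandardized variables the slope would instead be the correlation multiplied by the ratio of the two standard deviations, so the "0.61 units per unit" reading holds only on the standardized scale.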

In summary, it is incorrect to interpret our findings (*rc* = 0.61) as evidence that DP explains 36% of the variance in attained music performance. Instead, it is correct to state that deliberate practice, as approximated by indicators such as solitary study or task-relevant training experiences, currently correlates with measurements of music performance at *rc* = 0.61.

## **FUTURE PERSPECTIVES**

Currently, there is a lack of controlled empirical studies based on the expertise theory in the domain of music. This problem is reflected in the small number of studies (*N* = 13) conducted over the last 20 years which matched the rigorous selection criteria of our meta-analysis. One of the main challenges in the future will therefore be to extend the base of reliable experimental data. This means that studies should use state-of-the-art measurements of relevant deliberate practice durations (e.g., year-by-year retrospective reports, diaries, etc.) and objective and reliable assessments of performance variables (e.g., preferably hard performance measurements or consensual expert ratings of performance achievements). All of this was demanded many years ago (e.g., Ericsson and Smith, 1991). The use of standardized performance tasks (e.g., intact performance such as sight-reading with a pacing voice or isolated subskills such as scale playing at a given speed) with the objective measurement of performance and additional information on their reliabilities will be mandatory for investigating the "true" relationship between task-specific practice and musical achievement. This demand underscores Ericsson's (2014, p. 16) claim that "the expert-performance framework restricts its research to objectively measurable performance. It rejects research based on supervisor ratings and other social indicators...." Consequently, self-reports on abilities, the rating of a musician's skill level by an orchestra's conductor, and reports of parents about their child's level of achievement are not acceptable as objective indicators of performance. The question of whether the expert performance framework generalizes to the general population also awaits investigation (Ericsson, 2014). As our findings are currently limited to music, it will be necessary to cross-validate them with meta-analytic findings in other domains of expertise, such as sports or chess. The likelihood of their being generalizable is high, though, due to the methodological rigor of our study.

One general problem for the domain of music is that time estimations of practice durations are only approximate indicators of deliberate practice, which by definition only constitutes optimized practice and training activities. If we were able to identify the actual amount of deliberate practice inherent in the durational estimates that currently also include suboptimal practice activities, especially in sub-expert populations, then the aggregated correlations could certainly be higher than *rc* = 0.61. Solitary practice might also not cover all aspects of deliberate practice (e.g., competition experience). Thus, our figure of *rc* = 0.61 might currently be considered as the theoretically lower bound of the true effect of DP. The most suitable future studies that could untangle this empirical conundrum would include micro-analyses of practice activities and in particular longitudinal studies like those by McPherson et al. (2012) for music or Gruber et al. (1994) for chess. Such studies should be the natural next step in the quest for the factors that mediate expert and exceptional performance.

## **AUTHOR CONTRIBUTIONS**

Conceived and designed the meta-analysis: Andreas C. Lehmann, Reinhard Kopiez, Friedrich Platz, Anna Wolf. Conducted the search for references: Reinhard Kopiez, Anna Wolf, Friedrich Platz, Andreas C. Lehmann. Analyzed the data: Friedrich Platz, Anna Wolf, Reinhard Kopiez, Andreas C. Lehmann. Wrote the paper: Friedrich Platz, Reinhard Kopiez, Andreas C. Lehmann, Anna Wolf.

## **ACKNOWLEDGMENTS**

This study was supported by a grant from the German Research Foundation (DFG Grant No. KO 1912/9-1) awarded to the second and third author. We thank David Z. Hambrick for his very helpful cooperation, Hans-Christian Jabusch, and Gary McPherson for making their data available to us.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 10 April 2014; accepted: 06 June 2014; published online: 26 June 2014.*

*Citation: Platz F, Kopiez R, Lehmann AC and Wolf A (2014) The influence of deliberate practice on musical achievement: a meta-analysis. Front. Psychol. 5:646. doi: 10.3389/fpsyg.2014.00646*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Platz, Kopiez, Lehmann and Wolf. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

<sup>∗</sup>References marked with an asterisk indicate studies included in the meta-analysis. The in-text citations to studies selected for meta-analysis are not preceded by asterisks.

## Facing facts about deliberate practice

*David Z. Hambrick<sup>1</sup>\*, Erik M. Altmann<sup>1</sup>, Frederick L. Oswald<sup>2</sup>, Elizabeth J. Meinz<sup>3</sup> and Fernand Gobet<sup>4</sup>*

*<sup>1</sup> Department of Psychology, Michigan State University, East Lansing, MI, USA*

*<sup>2</sup> Department of Psychology, Rice University, Houston, TX, USA*

*<sup>3</sup> Department of Psychology, Southern Illinois University Edwardsville, Edwardsville, IL, USA*

*<sup>4</sup> Institute of Psychology, Health, and Society, University of Liverpool, Liverpool, UK*

*\*Correspondence: hambric3@msu.edu*

#### *Edited by:*

*Michael H. Connors, Macquarie University, Australia*

#### *Reviewed by:*

*Lena Rachel Quinto, Macquarie University, Australia*

*Michael H. Connors, Macquarie University, Australia*

**Keywords: deliberate practice, music, expertise, expert performance, individual differences, talent**

#### **A commentary on**

**The influence of deliberate practice on musical achievement: a meta-analysis**

*by Platz, F., Kopiez, R., Lehmann, A. C., and Wolf, A. (2014). Front. Psychol. 5:646. doi: 10.3389/fpsyg.2014.00646*

More than 20 years ago, Ericsson and colleagues proposed that "individual differences in ultimate performance can largely be accounted for by differential amounts of past and current levels of practice" (Ericsson et al., 1993, p. 392). We empirically tested this claim through a meta-analysis of studies of music and chess (Hambrick et al., 2014). The claim was not supported. Deliberate practice accounted for about one-third of the reliable variance in performance in each domain, leaving most of the variance explainable by other factors.

Focusing on music, Platz et al. (2014) identified 13 studies of the relationship between deliberate practice and performance and found a correlation of 0.61 after correcting for unreliability. We credit Platz et al. for their effort and thank them for their criticisms of our meta-analysis. However, none of these criticisms challenge our conclusion that deliberate practice is not as important as Ericsson and colleagues have argued.

Platz et al.'s (2014) major criticism targets our conclusion that deliberate practice accounted for 30% of the variance in music performance. They write that "relationships between variables should be interpreted in terms of linear relationships" (p. 10), and that "it is incorrect to interpret our findings (*rc* = 0.61) as evidence that DP explains 36% of the variance in attained music performance" (p. 11). They base this criticism on Hunter and Schmidt's (2004) argument that effect sizes from meta-analyses (and primary research) be reported as correlations rather than estimates of variance accounted for (i.e., *r*s rather than *r*<sup>2</sup>s).

Platz et al.'s (2014) criticism is puzzling for two reasons. First, other researchers have characterized the importance of deliberate practice in terms of variance (individual differences) accounted for—including not only Ericsson et al. (1993), but also two authors of the Platz et al. article (Reinhard Kopiez and Andreas Lehmann). For example, Kopiez and colleagues concluded that "the total life practice time at the beginning of the study correlated moderately with the baseline performance values and predicted *only 17% of their variance*" (Jabusch et al., 2009, p. 80, italics added; see also Lehmann and Ericsson, 1996; Kopiez and Lee, 2006, 2008). Second, Hunter and Schmidt's (2004) point is not that *r*<sup>2</sup> is statistically incorrect. Indeed, *r* and *r*<sup>2</sup> are both standard indexes of effect size (Cohen, 1988), providing different ways to conceptualize the strength of statistical relationships. Rather, their point is that *r*<sup>2</sup> can make theoretically and practically important relationships seem trivially small—as when a correlation of, say, 0.30 between a predictor and an outcome is dismissed because "only" 9% of the variance is explained. For this reason, we reported both *r* and *r*<sup>2</sup> values in our meta-analysis. Moreover, to avoid trivializing the role of deliberate practice, we have repeatedly emphasized its importance—the necessity of it for becoming an expert. In no less a public forum than the opinion pages of *The New York Times*, two of us commented that there is no denying the "power of practice" (Hambrick and Meinz, 2011). Again, our conclusion is not that deliberate practice is unimportant, either statistically or theoretically; it is that deliberate practice is *not as important as Ericsson and colleagues have argued*, in the precise sense that factors other than deliberate practice account for most of the variance in performance. Platz et al. apparently miss this point.

Platz et al. (2014) also take aim at the criteria we used for including a study in our meta-analysis, calling them "intuitive" (p. 4). In fact, our criteria were dictated by the theoretical claim we sought to test and were clearly stated in our article—measures of accumulated amount of deliberate practice and performance were collected and a correlation between these measures was reported. Platz et al. did find a few studies in their literature search that we did not, but this does not bear on our conclusion that deliberate practice is not as important as Ericsson and colleagues have argued. In fact, the results of Platz et al.'s meta-analysis support this conclusion: A correlation of 0.61 between deliberate practice and music performance leaves room for two additional orthogonal predictors of nearly the same magnitude (*r*s = 0.56).
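The arithmetic behind the figure of 0.56 can be checked directly. For mutually orthogonal predictors, the squared correlations with the outcome add up to the total variance explained, which cannot exceed 1. The sketch below assumes exactly that setting; the function name is ours and is used only for illustration.

```python
import math


def max_equal_orthogonal_r(r_known, k):
    """Largest correlation that each of k additional, mutually orthogonal
    predictors could have with the outcome, given one predictor with
    correlation r_known and assuming the k predictors share the remaining
    variance equally."""
    remaining_variance = 1.0 - r_known ** 2
    return math.sqrt(remaining_variance / k)


r_other = max_equal_orthogonal_r(0.61, k=2)  # approximately 0.56
```

Here 1 − 0.61² = 0.6279 of the variance remains; split equally between two orthogonal predictors, each could account for about 0.314, which corresponds to a correlation of about 0.56.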

Perhaps with an inkling of this, Platz et al. (2014) argue that their correlation of 0.61 might be regarded as the "theoretically lower bound of the true effect of DP" (p. 11) because "time estimations of practice durations are only approximate indicators of deliberate practice" (p. 11). But their correlation could equally well be regarded as an *upper* bound on the true effect of deliberate practice. For example, using retrospective questionnaires to measure deliberate practice could lead to inflated correlations between deliberate practice and performance if people base practice estimates on their skill rather than recollections of engaging in practice. The more general problem with Platz et al.'s argument is that it can always be made: if the correlation between deliberate practice and performance is not as high as one likes, one can always argue that this is because the measure of deliberate practice is imperfect—making it impossible to falsify hypotheses about the predictive value of deliberate practice.

Finally, some measures used by Platz et al. (2014) may not be estimates of deliberate practice. For example, for some studies, they used the correlation between number of accompanying performances and sight-reading performance, but number of accompanying performances could be considered a measure of what Ericsson et al. (1993) termed "work," as distinct from deliberate practice. Platz et al. are also inconsistent in what they consider the accumulation period for deliberate practice (e.g., lifetime for some studies, to age 18 for others).

The bottom line is that, in all major domains in which deliberate practice has been studied, most of the variance in performance is explained by factors other than deliberate practice (Macnamara et al., 2014). These factors may include starting age (Gobet and Campitelli, 2007), working memory capacity (Meinz and Hambrick, 2010), and genes (Hambrick and Tucker-Drob, 2014). For scientists, the task now is to develop and test falsifiable theories of expertise that include as many relevant constructs as possible.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 June 2014; accepted: 27 June 2014; published online: 17 July 2014.*

*Citation: Hambrick DZ, Altmann EM, Oswald FL, Meinz EJ and Gobet F (2014) Facing facts about deliberate practice. Front. Psychol. 5:751. doi: 10.3389/fpsyg.2014.00751*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Hambrick, Altmann, Oswald, Meinz and Gobet. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Training principles to advance expertise

*Alice F. Healy<sup>1</sup>\*, James A. Kole<sup>2</sup> and Lyle E. Bourne Jr.<sup>1</sup>*

*<sup>1</sup> Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, USA*

*<sup>2</sup> School of Psychological Sciences, University of Northern Colorado, Greeley, CO, USA*

*\*Correspondence: alice.healy@colorado.edu*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Craig Speelman, Edith Cowan University, Australia*

**Keywords: training, acquisition, retention, transfer, efficiency, durability, generalizability, expertise**

## **INTRODUCTION**

There are three forms of task engagement that are the basis of successful training for expertise—acquisition, retention, and transfer—and three corresponding goals of training—efficiency, durability, and generalizability. This paper reviews training conditions that facilitate acquisition, enhance retention, and promote transfer to contexts not encountered during training. Diligent practice under these training conditions can lead eventually to an elite level of performance or to expertise.

In this review of training principles, we emphasize those that are derived from work in our laboratory. We have found, however, that developing training that optimizes efficiency, durability, and generalizability is something of a balancing act because what promotes efficient training often comes at a price in durability, and durable training is not always generalizable (see Healy et al., 2012; Bourne and Healy, 2014). These tradeoffs are due in part to the fact that training might involve two different types of knowledge: declarative and procedural. Declarative knowledge consists of facts, whereas procedural knowledge consists of skills, or ways to use the facts, and these two types of knowledge differ in terms of their durability and generalizability.

## **ACQUISITION**

There are training principles that can be used to improve the efficiency of acquiring knowledge and skills. One way to increase training speed involves the *scheduling of feedback* given to the trainee. Trial-by-trial feedback can facilitate learning in many situations, especially by improving motivation and by identifying and correcting errors. Frequent feedback can be distracting, however, when the trainees already have a good sense of how well they are responding. In those cases periodic summary feedback, which is provided on only some percentage of training trials, can be a more effective procedure for promoting skill acquisition (Schmidt et al., 1989, demonstrated this principle with a ballistic timing task).

In terms of the difficulty of new material that is presented during training, there is an optimal *zone of learnability*, implying that training changes should be neither too difficult nor too easy. Specifically, training trials should contain information that is a little beyond what the trainee already knows or should require a bit faster or more accurate performance (Wolfe et al., 1998).

Training can be based on mental, as opposed to physical, practice. Although physical practice is better than mental practice in many circumstances, *mental practice* can be superior to physical practice when the practice involves different effectors from those used at testing (Wohldmann et al., 2008a). For example, training with the right hand and testing with the left hand can lead to interference during physical practice, but not during mental practice, which involves a more abstract representation of the skill that appears to be effector independent. Mental practice is especially useful, of course, when circumstances do not allow physical practice.

The *focus of attention* can be varied in training, and training usually benefits from an external focus on the results of body movements as opposed to an internal focus on the body movements themselves (Wulf, 2007). For example, to perfect the skill of dart throwing, attention should be focused on the trajectory of the dart rather than on the motion of the arm (Lohse et al., 2010).

Research has concluded that for learning new skills (but not necessarily for learning new facts) rest intervals should be interpolated between practice trials (distributed practice) rather than practicing without rest intervals (massed practice), especially if testing occurs after a delay following practice (see Cepeda et al., 2006, for a review of effects involving *spacing of practice*).

When engaged in prolonged work on a routine task, it is often advisable to introduce into the training regimen a *cognitive antidote*, which is a simple cognitive requirement that can be added to minimize task disengagement and boredom and to mitigate a decline in accuracy across trials (Kole et al., 2008). For example, when entering into a computer a long sequence of numbers, alternating between using the + and − keys as the concluding keystroke, rather than using only a single key to conclude each number, increases the accuracy of digit entry and eliminates the usual increase in errors that occurs with increasing practice.

Task requirements can sometimes be broken up into parts, with practice at the beginning of training involving only one or more parts of a task rather than the whole task (Wickens et al., 2012). The recommendation to use *part-task training* applies to a segmented task, when the parts are performed sequentially (Wightman and Lintern, 1985), but not to a fractionated task, when the parts are performed simultaneously (Adams and Hufford, 1962).

## **RETENTION**

The durability of training can be enhanced by selected training principles. Some of these principles benefit training retention with a cost in training efficiency, but others benefit retention with little or no cost in training efficiency.

## **RETENTION VARIABLES WITH A COST IN EFFICIENCY**

Complications can sometimes be added to training to promote retention (Schneider et al., 2002). However, not all complications that increase *task difficulty* are desirable (Bjork, 1994); difficulties at training facilitate retention only when they force learners to apply task-relevant cognitive processes (McDaniel and Butler, 2011). For example, in training RADAR detection, adding to training a secondary task that was irrelevant lowered RADAR detection performance at test, but adding a relevant secondary task at training aided RADAR detection accuracy at test (Young et al., 2011).

As research on experts has shown (Ericsson et al., 1980), training should make *strategic use of knowledge* that trainees already have (Kole and Healy, 2007). For example, when learning facts about unknown people, associating each of those people with someone familiar (e.g., a friend or relative) requires additional learning but can considerably enhance acquisition and retention of those facts.

Also following the methods of experts, training can be improved by using *memory strategies*, such as the keyword method, by which two items can be associated in memory through a keyword serving as a mediating link (Kole and Healy, 2013). When learning the association between the French word *jambe* and the English word *leg*, for example, learners could use the keyword *jam* and form an interactive image of some sticky jam on someone's leg. This method will allow the learners to derive the translation of *jambe*, but it does require forming extra links from *jambe* to *jam* and from *jam* to *leg*, which slows initial acquisition.

## **RETENTION VARIABLES WITH NO COST IN EFFICIENCY**

Items can be grouped together in chunks during training to promote retention. *Item chunking* yields no loss, and probably a gain, in training efficiency (Miller, 1956). For example, when required to type a four-digit number (e.g., 9632), individuals often find it helpful to represent the number as two two-digit chunks (96 and 32) (Fendrich et al., 1991). Also, memory experts can learn a long list of digits by dividing them into chunks representing familiar sequences like running times or dates (Ericsson et al., 1980).

Similar to chunking, two separate tasks can often be combined into a single *functional task* to promote durability with no cost to efficiency. If a secondary task is combined with a primary task at test, the two tasks should be combined during training as well for best test performance (Wohldmann et al., 2012). In fact, in some cases removing at test a difficult secondary task that was available during training can retard test performance; for example, removing an alphabet-counting task used during the training of time estimation can increase errors in time estimation during test (Healy et al., 2005).

For optimal retention, the procedures used in training should be duplicated at test. *Procedural reinstatement* is effective because declarative information (facts) shows strong generalizability but weak durability, whereas procedural information (skills) shows strong durability but weak generalizability (Lohse and Healy, 2012). Despite the high degree of transfer for declarative information, when learning facts it is best to make sure that there is an overlap in the processing required at training and test (i.e., ensure that there is transfer-appropriate processing; Morris et al., 1977; Roediger et al., 1989). Another way to improve the retention of factual material is by studying it at its most meaningful level, as opposed to studying it at a superficial level (e.g., paying attention to the sound or spelling of the material), an effect called *depth of processing* (Craik and Lockhart, 1972).

Practice retrieving factual information, instead of simply restudying the material, can improve retention, demonstrating a *testing effect* (Roediger and Karpicke, 2006). A related method to enhance fact retention is to generate the information rather than just to read it, demonstrating a *generation effect* (Slamecka and Graf, 1978). For example, if the task is to remember the answers to a set of multiplication problems, it is better to generate or verify the answers to the problems than to simply read them or perform the multiplication using a calculator (Crutcher and Healy, 1989). See Karpicke and Zaromb (2010) for a comparison of the similar but not identical benefits of testing and generation.

For skills, periodic restudy of material during periods of disuse, called *refresher training*, enhances skill retention (McDaniel, 2012). On the other hand, training should not be done more than necessary, resulting in *overlearning or overtraining*, because extra practice produces diminishing performance enhancement returns (Driskell et al., 1992).

## **TRANSFER**

Improving the generalizability of knowledge and skills is considerably more difficult than improving their efficiency and durability. There are only a few training principles shown to be effective for enhancing task transfer.

One way to enhance transfer is to change the conditions of practice periodically, thereby increasing the *variability of practice* (Schmidt and Bjork, 1992). Not all forms of variable practice are effective, however. Varying parameters within a single generalized motor program can enhance transfer, but varying the generalized motor programs themselves might not benefit transfer. For example, learning how to cope with a variety of defective computer mice that vary in their properties (e.g., a mouse that reverses only vertical movements and another mouse that reverses both horizontal and vertical movements) does not enhance transfer to a new type of defective mouse (e.g., a mouse that reverses only horizontal movements) (Healy et al., 2006). Practicing to move a single defective mouse to a variety of targets, however, does enhance transfer to moving that same mouse to new targets (Wohldmann et al., 2008b).

For tasks involving quantitative estimation (e.g., of country populations or intercity distances), a technique of *seeding the knowledge base* can be used, in which a few selected examples are given that define a domain and thereby enhance transfer to unpracticed examples (Brown and Siegler, 1996).

When possible, the discovery of rules can lead to generalizable knowledge, so that searching for systematic relationships among examples can bolster transfer of training, showing the advantage for *rules vs. instance memory* (Bourne et al., 2010; McDaniel et al., in press). For example, in learning how to choose between the alternate pronunciations for the word *the* (as thuh or thee), individuals can either learn the pronunciation preceding every single word or can learn the rule that thuh is used preceding words starting with a consonant sound and thee is used preceding words starting with a vowel sound.

#### **CONCLUDING REMARKS**

This summary, which is outlined in **Table 1**, does not include all training principles that have been shown to be effective in promoting training efficiency, durability, and generalizability. For additional principles, readers are referred to Bourne and Healy (2014) and Healy et al. (2012), and for an initial attempt to account for the principles in a single theoretical framework, readers are referred to Jones et al. (2012), which is available by request from the corresponding author.

#### **Table 1 | Training principles as a function of form of task engagement and training goal.**

| Form of task engagement, training goal | Training principles |
| --- | --- |
| **Acquisition, efficiency** | Scheduling of feedback; Zone of learnability; Mental practice; Focus of attention; Spacing of practice; Cognitive antidote; Part-task training |
| **Retention, durability** (*retention variables with a cost in efficiency*) | Task difficulty; Strategic use of knowledge; Memory strategies |
| **Retention, durability** (*retention variables with no cost in efficiency*) | Item chunking; Functional task; Procedural reinstatement; Depth of processing; Testing effect; Generation effect; Refresher training; Overlearning or overtraining |
| **Transfer, generalizability** | Variability of practice; Seeding the knowledge base; Rules vs. instance memory |

Almost all of the studies reviewed here have involved novice learners. Nevertheless, these same principles should apply for training at any level, so they should be informative with respect to the issue of training to an elite level of performance. In fact, some of these principles (e.g., focus of attention) seem to apply more strongly for experts than for novices (e.g., Perkins-Ceccato et al., 2003). To reach elite levels of performance, however, the learners need to couple these training principles with the use of deliberate, highly focused, and effortful practice (Ericsson et al., 1993).

#### **AUTHOR CONTRIBUTIONS**

Alice F. Healy wrote the initial and final drafts of the manuscript. James A. Kole and Lyle E. Bourne Jr. edited the initial draft of the manuscript and suggested revisions of it.

### **ACKNOWLEDGMENTS**

We are indebted to the members of the Center for Research on Training at the University of Colorado for their helpful suggestions about this research. The research reported here was supported by Army Research Office Grant W911NF-05-1-0153 and National Aeronautics and Space Administration Grant NNX10AC87A.


*Received: 09 January 2014; accepted: 30 January 2014; published online: 19 February 2014.*

*Citation: Healy AF, Kole JA and Bourne LE Jr. (2014) Training principles to advance expertise. Front. Psychol. 5:131. doi: 10.3389/fpsyg.2014.00131*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Healy, Kole and Bourne. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The acquisition of expertise in the classroom: are current models of education appropriate?

## *Craig P. Speelman\**

*School of Psychology and Social Science, Cognition Research Group, Edith Cowan University, Joondalup, Australia*

*\*Correspondence: c.speelman@ecu.edu.au*

#### *Edited and reviewed by:*

*Michael H. Connors, Macquarie University, Australia*

**Keywords: expertise, skill, numeracy, mathematics, computer games**

## **INTRODUCTION**

The study of expertise tends to focus on humans who can perform extraordinary feats. Although the way in which expertise is acquired is often characterized as similar to everyday skill acquisition, the attainment of basic numeracy skills is rarely considered in the same context as the attainment of expertise. It is clear, though, that average numeracy skills possess all the hallmarks of expert performance. In this paper I argue that the traditional classroom of Western education systems pays insufficient attention to the idea that effective numeracy skills represent a level of expertise that requires a particular form of training. Using the five principles of skill acquisition identified by Speelman and Kirsner (2005), I argue that the modern classroom is not the most appropriate environment for acquiring important cognitive skills, and that computer programs, such as games and tailored training tasks, should be considered a valuable adjunct to traditional didactic instruction.

## **THE NATURE OF EXPERTISE**

Experts in most fields are characterized as people who have more knowledge and abilities than non-experts. In cognitive science, this superiority in knowledge and abilities has been reported as better memory, more extensive knowledge, and highly developed procedural and perceptual skills. More often than not, experts are described as people who can perform extraordinary feats of memory and cognition, whilst possessing the same cognitive apparatus as nonexperts. Their superior performance is typically explained as the end product of skill acquisition through many years of dedicated practice in their field.

## **ACQUISITION OF EXPERTISE**

Based on over 100 years of research on skill acquisition and expertise (e.g., Anderson, 1982; Logan, 1988; Ericsson et al., 1993), Speelman and Kirsner (2005) identified five principles of skill acquisition that explain how expertise is associated with superior feats of cognition. In brief, the five principles of skill acquisition state that (1) practice leads to faster and (2) more efficient uses of knowledge, which enables faster performance and (3) results in less demand on mental resources. As a result, the performance of low level tasks becomes second nature, and (4) this frees up mental resources that can be utilized to attempt higher level behaviors. Ultimately (5) skilled performance reflects the development of many component processes.

## **ACQUISITION OF NUMERACY SKILLS**

The five principles of skill acquisition were derived from research in a broad range of fields in which people have been reported to have acquired expertise. As a result, Speelman and Kirsner (2005) claim that these principles apply to any area of cognition in which practice can lead to improved performance. Importantly, these principles can also explain performance improvements along the trajectory of expertise attainment, even at levels below which someone would be considered an expert. So, whereas in a discussion of mathematics experts Butterworth (2006) considered only those 1 in several million people with extreme abilities, the principles suggest we all show elements of expertise as we learn to count and add. I have been surprised to learn, however, how little of this view of skill acquisition is reflected in current education practice. In particular, the attainment of basic numeracy skills is rarely considered as a form of expertise acquisition, and nor are difficulties with the learning of numeracy skills seen as problems in the acquisition of expertise (e.g., a lack of practice). Instead, learning difficulties are often seen as resulting from some neurological or developmental disorder that adversely affects a child's ability to learn mathematics (Clark et al., 2014; Haase et al., 2014), or some systemic issue related to the school system (Ramsden, 1984; Biggs, 1999; van Kraayenoord and Elkins, 2004) or the child's culture (Whitburn, 1996) or SES (OECD, 2013). And yet, numeracy skills, particularly those related to early learning of arithmetic and number facts, share features with many examples of expertise. 
For example, they require many years of practice to attain, performance relies on pattern recognition of a large number of items (i.e., numbers and symbols) and retrieval from memory of a large number of facts, and expertise attainment is characterized as a transition through a hierarchy of skills (e.g., counting to addition) that can only occur when performance at the lower level has attained a particular level of skill (Pellegrino and Goldman, 1987; Neumann et al., 2013). These are all features that have been identified in experts in a range of other domains (e.g., Ericsson, 1996). It is rare to see the development of basic numeracy skills characterized in this manner in education research (e.g., Griffin, 2009; Lei et al., 2009; Yelland and Kildery, 2010; Neumann et al., 2013), and certainly my conversations with primary school teachers reveal that they are unaware of such concepts.

## **THE TRADITIONAL WESTERN CLASSROOM**

The traditional model of the Western classroom, with a teacher at the front delivering instruction to a class of 20–30 children, does not sit easily with the view of skill acquisition reflected in the five principles. Although a teacher may be able to effectively convey declarative and procedural information relevant to an arithmetic task (e.g., demonstrate how 2 × 4 is equivalent to 4 + 4), for a child to convert this sort of knowledge into a form of expertise requires a great deal of practice. Traditionally a child would get this practice by attempting to solve many problems like 2 × 4 = ? until the teacher is satisfied that most children have grasped the concept. But "grasping" the concept may not be sufficient. According to the five principles, what counts as "sufficient" practice is determined by what comes next in the trajectory of skill acquisition. For instance, if the "2×" problems are introduced as an extension of addition facts, it is important that the child is sufficiently skilled at retrieving relevant addition facts. If they stumble over the idea of 4 + 4 (i.e., they do not retrieve the answer quickly) while the teacher is explaining the "2×" concept, they may be forced to generate the answer by a counting method (e.g., start from 4 and count a further 4 places) (Pellegrino and Goldman, 1987). Such a counting strategy will tax working memory and as a result the child may not have sufficient working memory capacity to follow the explanation of the "2×" concept. This strategy will also take extra time and so the student may fall behind the teacher's explanation. It is difficult for a teacher to ensure that the pace of a lesson matches the learning rates of all children in the class, and also to monitor that all children have mastered a concept prior to introducing the next concept. Children who are fast learners may become bored and disruptive if the lesson is paced too slowly. 
Children who are slower learners may be left behind by a lesson that is paced to match the learning rate of the faster learners. It would not take too many experiences of being left behind for someone to believe they just do not have a "maths brain" (Swan, 2004), or even develop a "maths phobia" (Furner and Berman, 2003). Even if the attainment of proficiency is assumed to occur through homework drills, it is likely that only accuracy is checked by a teacher, when speed of access to number facts is also necessary to avoid the problem just described.

## **THE DECLINE IN NUMERACY SKILLS**

The suggestion that the traditional classroom may lead to some children struggling with the acquisition of basic numeracy skills goes some way to explain the slide in numeracy skills in many western countries like Australia [as indicated in the PISA results of 2009 (OECD, 2013)]. The problem in Australia appears to be longstanding as 53% of the Australian adult population is functionally innumerate (ABS, 2006), which indicates that many would not comprehend a bank statement. Failures in elementary mathematics courses are likely to be compounded by a lack of confidence regarding higher level mathematics, and so many people neglect maths at the higher levels, leading to universities having to provide remedial classes for their commencing students (Healy et al., 2010; Slattery, 2010; Arlington, 2012; Maslen, 2012).

## **AN ALTERNATIVE EDUCATION MODEL**

According to the five principles of skill acquisition, overcoming the problems identified here with the traditional Western classroom would require children to be presented with a structured learning program that involved instruction regarding each level of a hierarchy of concepts, interspersed with practice opportunities. Further, each child would only be introduced to the next level of a concept when they have reached some degree of fluency with the previous level. Until that point they would continue practicing with problems at the previous level, possibly with some form of intervention by a tutor to ensure their understanding of the concept is appropriate. This is probably the aim of most teachers; however, the level of monitoring required to ensure each child has reached the requisite level of fluency is possibly beyond the capacity of a teacher responsible for 20–30 children in the one classroom. An alternative model would be to develop computer software in the form of games and tailored training tasks. Such software can not only provide hours of practice opportunities but also do so in an exciting and enjoyable manner that will hold the attention of children and provide them with the motivation to spend many hours mastering a concept (Rosas et al., 2003). Further, the software can be designed to deliver feedback on every response, and monitor the level of performance (i.e., both accuracy and response time) such that a child will be allowed to move to the next higher level of the concept when they have mastered the previous level, as is the case with computer games designed purely for entertainment (Towne et al., 2014). 
A recent study (Main and O'Rourke, 2011) demonstrated the benefits of such software, where the speed and accuracy of performance on a standard arithmetic test was improved for children who had played a maths game (*Dr Kawashima's Brain Training*) on a hand held games console compared to children who received standard classroom lessons.
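
The monitoring rule described in this section, in which a child moves on only after meeting both an accuracy and a response-time criterion, can be sketched as follows. This is a minimal illustration and not the logic of any of the cited games; the `Trial` record, the function name `is_fluent`, and the threshold values (90% accuracy, 3 s median response time) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Trial:
    correct: bool          # was the response accurate?
    response_time: float   # seconds taken to respond


def is_fluent(trials, min_accuracy=0.9, max_median_rt=3.0):
    """Return True only when both the accuracy and the speed criteria are met.

    The thresholds are illustrative assumptions, not values from the source.
    """
    if not trials:
        return False
    accuracy = sum(t.correct for t in trials) / len(trials)
    correct_rts = [t.response_time for t in trials if t.correct]
    if not correct_rts:
        return False
    return accuracy >= min_accuracy and median(correct_rts) <= max_median_rt


# Accurate but slow (e.g., answers generated by counting): not yet fluent.
slow = [Trial(True, 6.0) for _ in range(10)]
# Accurate and fast (answers retrieved from memory): fluent.
fast = [Trial(True, 1.5) for _ in range(10)]
print(is_fluent(slow), is_fluent(fast))
```

Checking speed as well as accuracy matters here because, as argued above, a child who is accurate only by counting would pass an accuracy-only check while still lacking the fluency needed for the next level.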

Another computer game, *Numbeat,* has been designed explicitly according to the five principles. This game was developed to facilitate the acquisition of basic arithmetic skills in primary school children. *Numbeat* requires a player to destroy some "bad" characters on the screen, before they convert "good" characters into "bad" characters, by filling up some destruction device (e.g., a cannon) with an amount of ammunition that matches the number of "bad" characters. To achieve this aim requires the player to perform several types of mental arithmetic operations. The game is structured so that performance speed is important (e.g., levels have time limits). If a player beats the time limit for a particular level, they are considered to be sufficiently fluent with the arithmetic operations tested in that level, and so are allowed to progress to the next level of the game, which typically represents a slightly more advanced level of arithmetic. A level is repeated if the player does not meet the required performance standard. As such, the game approximates the deliberate practice of challenging tasks that is required to acquire expertise in a domain (Towne et al., 2014). In preliminary trials involving 248 children, playing this game for 10–15 min per day for 8–10 days resulted in an average 5% reduction in the time to solve simple arithmetic problems presented in a traditional manner (e.g., 3 × 4 = ?) (Speelman, 2013).
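
The level structure described for *Numbeat* (progress when the time limit is beaten, otherwise repeat the level) can be sketched as a simple loop. This is a hypothetical illustration, not the game's actual code; the function name `play_session`, the time limits, and the attempt times are all assumptions made for the example.

```python
def play_session(attempt_times, time_limits):
    """Replay a sequence of attempts through time-limited levels.

    attempt_times: completion time (s) of each successive attempt.
    time_limits:   time limit (s) for each level, easiest first.
    Returns the list of level indices attempted, in order.
    """
    level = 0
    history = []
    for elapsed in attempt_times:
        if level >= len(time_limits):
            break  # all levels cleared
        history.append(level)
        if elapsed <= time_limits[level]:
            level += 1  # fast enough: treated as fluent, move on
        # otherwise the same level is repeated on the next attempt
    return history


# Hypothetical data: three levels, each with a 30 s limit.
print(play_session([42.0, 35.0, 28.0, 31.0, 25.0, 29.0], [30.0, 30.0, 30.0]))
# → [0, 0, 0, 1, 1, 2]
```

In this made-up session the player needs three attempts at the first level and two at the second, so most practice time is spent where performance is not yet fast enough.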

Although I do not claim that games such as *Dr Kawashima's Brain Training* and *Numbeat* will be the solution to all of the mathematical ills facing many countries, evidence that playing such games can facilitate the acquisition of arithmetic skills in primary school children is a positive step toward addressing learning difficulties in mathematics. This type of research may pave the way toward a situation where teachers can provide the necessary content lessons, and computers can facilitate the necessary practice that will enable students to master each level of a skill before tackling the next step in the skill hierarchy. Ultimately such evidence supports the argument that education in basic numeracy skills should reflect principles of expertise attainment.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 March 2014; accepted: 25 May 2014; published online: 12 June 2014.*

*Citation: Speelman CP (2014) The acquisition of expertise in the classroom: are current models of education appropriate? Front. Psychol. 5:580. doi: 10.3389/fpsyg.2014.00580*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Speelman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Designing a "better" brain: insights from experts and savants

#### *Fernand Gobet<sup>1</sup>\*, Allan Snyder<sup>2</sup>, Terry Bossomaier<sup>3</sup> and Mike Harré<sup>4</sup>*

*<sup>1</sup> Department of Psychological Sciences, University of Liverpool, Liverpool, UK*

*<sup>2</sup> Centre for the Mind, School of Medicine, University of Sydney, Sydney, NSW, Australia*

*<sup>3</sup> Centre for Research in Complex Systems, Faculty of Business, Charles Sturt University, Bathurst, NSW, Australia*

*<sup>4</sup> Complex Systems Research Group, Faculty of Engineering and IT, University of Sydney, Sydney, NSW, Australia*

*\*Correspondence: fernand.gobet@liv.ac.uk*

#### *Edited and reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

**Keywords: autism, brain, cognitive illusion, creativity, Einstellung effect, expertise, perception, savants**

If we naively scale up the brain (more of the same), we would not necessarily have a brain that is qualitatively better. To take an analogy from visual ecology, the eagle can simply see better at greater heights (Snyder and Miller, 1978): more of the same!

Instead, let us ask how to design a brain that is qualitatively better in some sense, say at being creative. One bottleneck to creativity is our inability to see things in a new light, free of prior interpretations. Once a pattern is identified and labeled, it is difficult to identify different patterns and find alternative solutions. A classic example is the duck/rabbit illusion, where we can see either a duck or a rabbit but cannot be simultaneously aware of both. We are blinded by our mindsets, by our expertise! This presumably is a consequence of our hypothesis-driven cognitive makeup. Hypotheses are mental templates that encapsulate expectations of the world as derived from past experience, a brilliant strategy for coping rapidly with partial information in a dynamic but familiar environment (Snyder et al., 2004; Bossomaier et al., 2009). [Although we use the human brain as an example, there is considerable evidence that the concept-dominated architecture is characteristic of the mammalian brain, and possibly also of birds, in particular corvids (e.g., Emery and Clayton, 2004)].

Our brain can recall a seemingly unlimited number of meaningful patterns and labels, but often not the attributes that compose them. In contrast, some autistic savants can recall a seemingly unlimited amount of detail, without any attempt to impose meaning (e.g., Wiltshire, 1989). But savants are less prone to cognitive illusions (Bogdashina, 2003), and that fact gives us a clue for a "better" brain: a brain that is hypothesis driven but resilient to cognitive illusions, a brain that can in addition see the world with direct perception and is thus open to alternative interpretations (Snyder, 2009).

A better brain, natural or artificial, would have tremendous implications for our world. Most importantly, it would boost creativity, both in art and in science. Better decisions would be made in politics and business, and long-standing issues such as pollution and hunger would be more likely to be solved. How do we go about developing such a brain?

A possible strategy is to start from the things humans are doing so well as a result of our adaptation to the environment due to evolutionary pressure, and then to consider why the products of this adaptation sometimes are associated with penalties. Finding a way to fix these instances of penalties in humans leads to insights that can be further applied to the design of better brains.

As a first example, consider visual perception. Overall, our eye and our visual cortex, honed by millions of years of evolution, work very well. However, human visual perception can be fooled surprisingly easily by perceptual and cognitive illusions. We can see things that do not exist (e.g., Escher's impossible figures), do not see things that exist (e.g., reading words rather than seeing black and white pixels), see two objects from the same drawing (e.g., the duck-rabbit illusion), and grossly misjudge the dimensions of objects (e.g., the Ebbinghaus and Ponzo illusions). How can we avoid these illusions? What would this tell us about brain design?

As a second example, consider expertise. Experts can show extreme adaptations to their environment and are capable of great achievements (Ericsson et al., 2006; Didierjean and Gobet, 2008). Tennis players can return services with balls hit at more than 200 km/h, and they can even counter-attack when doing so. Mnemonists can memorize more than 100 digits read at 1 s each. Some chess players can play more than 30 games simultaneously, without seeing the board and the pieces. But expertise sometimes fails. In their study on the Einstellung effect in chess, Bilalić et al. (2008a,b, 2010) showed that even experts can fail to find an optimal solution when a common solution comes first to their mind. The effect is surprisingly powerful: compared to control positions, players lose about 1 standard deviation in skill when showing the Einstellung effect. This illustrates the strength of the schemas we hold in long-term memory and the power that our preconceptions have on our mind.

What are the solutions to perceptual/cognitive illusions, such as those displayed in visual illusions and in the Einstellung effect? Intuitively, there are two main approaches. The first one is to use more knowledge (i.e., scaling up) and could be called "quantitative scaling." It could be summarized by the phrase "more of the same." This is the standard approach in computer science and artificial intelligence.

Quantitative scaling has had some tremendous successes in artificial intelligence, such as the victory of Deep Blue against chess world champion Kasparov in 1997 (Campbell et al., 2002). But increase of knowledge does not always provide a solution; for example, many visual illusions persist in spite of our knowledge that they exist. Indeed, quantitative scaling has also met with problems in artificial intelligence. Most notably, it has failed in many attempts, such as playing Go at master level (Gobet et al., 2004; Hsu, 2007), providing computers with common sense (McCarthy, 2007; Sarbo, 2007), and in general developing genuine, domain-general intelligence and creativity (Bridewell and Langley, 2010; Jennings, 2010).

A second approach could be called "qualitative scaling." In essence, this leads to the definition and use of new conceptual spaces. To get to these new spaces, we need to break apart the existing conceptual structures, whereas in quantitative scaling we build increasingly elaborate concepts on top of what we have already. A classic example is Einstein's conjecture that the speed of light was constant in all inertial reference frames, leading to conclusions very different to those of Newtonian physics. Serialism, cubism or simply Dadaism itself illustrate this notion in the arts.

We know that individuals with autism are less susceptible to some illusions (Walter et al., 2009; Mitchell et al., 2010), and that they process information in a less holistic way (Nakahachi et al., 2008), for example paying more attention to details, which can result in savant skills such as extreme memory for detail and speed in counting objects (numerosity) (Soulieres et al., 2010). We also know that some savant skills can be artificially induced in normal individuals, for example by low-frequency repetitive transcranial magnetic stimulation (Snyder et al., 2006; Boggio et al., 2009; Snyder, 2009) or transcranial direct current stimulation (Chi and Snyder, 2012).

Our key argument is that our normal cognition, while very efficient, tends to develop cognitive mind sets. Breaking these mind sets can help explore new conceptual spaces, and thus be more creative. Rather than reorganizing knowledge in some superficial way, we propose two radical approaches. The first is to inhibit some concepts or a class of concepts (e.g., a group of concepts that are strongly connected or the most likely concepts in a given situation). The second is to avoid concepts altogether (or at least as much as possible), by using raw perception instead, thus imitating the autistic mind. Our proposal is thus that creativity can be boosted by decreasing conceptual processing and increasing the role of low-level perceptual processing.

Our proposal raises intriguing issues. What is the link between concepts and raw perception? Are they two discrete states, or are they part of a more graded space? How can "raw perception" mitigate or even eliminate Einstellung-like effects? Is raw perception enough? Autism seems to offer a counter-example, where the lack of use of concepts leads to serious intellectual and social impairments. But is it necessarily so?

The history of human thought provides examples of creativity in which the concept-inhibiting strategy was used (possibly unconsciously) and worked. When 2005 Nobel Laureates Marshall and Warren (1984) correctly proposed that stomach ulcers were caused by *Helicobacter pylori* rather than by excess acid, they had to jettison a whole raft of concepts. There are also examples of this in AI research, for instance computer chess and checkers, where profound new insights were gained from brute-force search, without using sophisticated concepts. Here, the literal perception used by computers, which is often derided, turns out to be a strength. The cost of deleting concepts might also be studied. If the concepts that are inhibited are infrequent and are not the building blocks of a large number of other concepts, the benefits might outweigh the costs. But what is the threshold? Another solution with artificial systems is parallelism. One possibility would be that the original conceptual base stays online while concepts are inhibited in a copy of the original base that operates offline.

The examples in this article focus on finding novel creative solutions without being bound by prior knowledge. It is an open question as to exactly what a better brain would be: more creative, more efficient, more rational, more adaptive, more altruistic? Answering this question is a huge challenge in itself, and we should be alert to our own mindset in defining the space of possible answers.

The implications for the study of human cognition, and the psychology of expertise in particular, are profound. A better brain would shed considerable but also cruel light on the limits of human cognition and expertise. We have already had a preview of this with developments in computer chess. No player, not even world champion Magnus Carlsen, can compete with computers nowadays. Computers sometimes find moves that are considered by humans as highly creative, although some of these moves are just beyond human discovery. In addition, computers have led to re-evaluations of large aspects of the game, in particular openings and endgames, which humans had researched for centuries. Be ready to be surprised with better brains!

Irrespective of possible benefits for science and technology, including artificial intelligence, our proposal for a better brain raises important questions for the nature of the human mind. Are qualitative scaling and quantitative scaling really in opposition? Is it adaptive to inhibit some concepts "just" to be creative? Why has such a system not evolved? Is it because, now, we have the luxury to be creative—courtesy of cultural evolution?

## **ACKNOWLEDGMENTS**

Fernand Gobet was supported by a Research Fellowship from the UK Economic and Social Research Council.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 March 2014; accepted: 01 May 2014; published online: 22 May 2014.*

*Citation: Gobet F, Snyder A, Bossomaier T and Harré M (2014) Designing a "better" brain: insights from experts and savants. Front. Psychol. 5:470. doi: 10.3389/fpsyg.2014.00470*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Gobet, Snyder, Bossomaier and Harré. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## 2011 space odyssey: spatialization as a mechanism to code order allows a close encounter between memory expertise and classic immediate memory studies

#### *Alessandro Guida<sup>1</sup>\* and Magali Lavielle-Guida<sup>2</sup>*

*<sup>1</sup> Psychology Department, CRPCC, Université Rennes 2, Rennes, France*

*<sup>2</sup> Cabinet de Psychologie et d'Orthophonie, St Malo, France*

*\*Correspondence: alessandro.guida@univ-rennes2.fr; alessandro.guida.psychology@gmail.com*

*Edited and reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

**Keywords: spatialization, immediate memory, expertise, long-term working memory, retrieval structures**

In 2011, van Dijk and Fias, using an innovative working memory paradigm, showed for the first time that to-be-remembered words presented sequentially at the center of a screen acquired a new spatial dimension: the first words of the sequence acquired a left spatial value while the last words acquired a right spatial value. In this article, we argue that this spatialization, which putatively underpins how order is coded in immediate memory<sup>1</sup>, allows bridging the domain of memory expertise with classic immediate memory studies.

After briefly reviewing the mechanisms for coding order in immediate memory and the recent studies pointing toward spatialization as an explanatory mechanism, we will pinpoint similar mechanisms that are known to exist in memory expertise, particularly in the method of loci. We will conclude by analyzing what these similarities can tell us about expertise.

## **HOW IS ORDER CODED?**

Surprisingly, this very fundamental question has not yet received a definitive answer. If one naively tries to think of a way order could be coded, the first idea that generally comes to mind is chaining: items in a to-be-remembered list are simply chained together by our cognitive system. Indeed, for more than four decades, this has been the most prominent idea among researchers (e.g., Wickelgren, 1965; Jordan, 1986; Lewandowsky and Murdock, 1989). This idea, beyond being simple and intuitive, is also ancient, since it dates back at least to Ebbinghaus (1885/2010). However, in the last two decades chaining models have lost ground, mostly because of experimental results. In immediate memory, error patterns (i.e., transposition and protrusion errors, Estes, 1991; Henson, 1996, 1999) and the distance effect (e.g., Hacker, 1980; Marshuetz et al., 2000) have been difficult to explain with the chaining concept.

## **POSITIONAL TAGGING**

Nowadays, prominent models are of a positional kind (e.g., Anderson and Matessa, 1997; Burgess and Hitch, 1999; Brown et al., 2000, 2007; O'Reilly and Soto, 2001; Lewandowsky and Farrell, 2008a; Oberauer and Lewandowsky, 2011). Based on various studies (e.g., Dale, 1987; Poirier and Saint-Aubin, 1996; Mulligan, 1999; Engelkamp and Dehn, 2000; Henson et al., 2003), these models assume that item information and order information are coded and represented separately (for a review, see Marshuetz, 2005). Order is putatively coded through positional coding mechanisms, where a positional marker (or tag), that is, a context, is associated with each item. These contexts or positional markers can be temporal or not (Lewandowsky and Farrell, 2008b), but several studies seem to run against temporal markers (e.g., Lewandowsky and Brown, 2004, 2005; Lewandowsky et al., 2006), which favors non-temporal ones. Nonetheless, whereas temporal tags are by definition well known, the nature of non-temporal tags remains unknown (Lewandowsky and Farrell, 2008b)<sup>2</sup>. It could be an external context such as the environment and/or an internal context such as the inner states of the mind associated with each item.

## **WHAT DOES VAN DIJCK AND FIAS'S (2011) STUDY CHANGE CONCERNING ORDER CODING?**

In 2011 van Dijck and Fias proposed an alternative explanation of the SNARC (Spatial-Numerical Association of Response Codes) effect. This effect was first popularized by Dehaene et al. (1993). They used a classic parity judgment task where participants had to decide if a number was odd or even. However, the left-/right-hand key assignment was varied: the answer "even" (as the answer "odd") was assigned for half of the trials to one hand and for the other half to the other hand. Results showed a SNARC effect, that is, small numbers triggered faster responses when participants answered with the left hand and large numbers triggered faster responses when participants answered with the right hand. According to Dehaene et al. (1993), the effect was due to the representation numbers have in (semantic) long-term memory (LTM), that of a mental line, which in western cultures increases from left to right (e.g., Dehaene et al., 1993; Göbel et al., 2011).

<sup>1</sup> Immediate memory is an umbrella term for working memory and short-term memory.

<sup>2</sup> Lewandowsky and Farrell (2008b) wrote: "The use of context markers does, however, entail a cost: As in many other models (e.g., SEM; Henson, 1998), the structure of the markers across positions is assumed rather than explained by the model. That is, although it is entirely plausible to postulate that the contexts of adjacent items are more similar to each other than the contexts of items separated by intervening events, the precise form of their similarity relationship is not derived from the model's architecture. Are there any candidate mechanisms on the horizon that might permit a more principled derivation of context markers?"

This LTM conception of the SNARC effect was disputed by van Dijck and Fias (2011) using a new paradigm. They proposed that the SNARC effect depends on the organization numbers assume in working memory. In the study, participants were presented with five random numbers (ranging from 1 to 10) to be remembered in the correct order. Numbers were displayed at the center of a screen. After the presentation phase, numbers ranging from 1 to 10 were displayed randomly at the center of the screen. When a to-be-remembered number was displayed, participants had to execute a parity judgment task. As in Dehaene et al. (1993), the left-/right-hand key assignment was varied. But instead of the usual SNARC effect, results showed a Spatial-Positional Association of Response Codes (SPoARC) effect, that is, left-hand responses were faster with numbers presented in the first positions of the to-be-remembered sequence (instead of small numbers in the SNARC effect) and right-hand responses were faster with numbers presented in the last positions (instead of big numbers).
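To make the logic of the SPoARC effect concrete, the following Python sketch shows one way such an effect could be quantified. It is an illustration under stated assumptions, not the analysis reported by van Dijck and Fias (2011): the slope-based index, the function names, and the response times are all hypothetical and chosen only to show the expected direction of the effect.

```python
from statistics import mean

def drt_by_position(trials):
    """Map each serial position to mean right-hand RT minus mean left-hand RT.

    `trials` is an iterable of (position, hand, rt_ms) tuples, where `hand`
    is "left" or "right". The format is a hypothetical convention.
    """
    by_cell = {}
    for position, hand, rt in trials:
        by_cell.setdefault((position, hand), []).append(rt)
    positions = sorted({position for position, _ in by_cell})
    return {
        p: mean(by_cell[(p, "right")]) - mean(by_cell[(p, "left")])
        for p in positions
    }

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator

# Made-up response times (ms) for a five-item sequence: the left hand is
# faster for early positions, the right hand for late positions.
trials = [
    (1, "left", 500), (1, "right", 540),
    (2, "left", 510), (2, "right", 530),
    (3, "left", 520), (3, "right", 520),
    (4, "left", 530), (4, "right", 510),
    (5, "left", 540), (5, "right", 500),
]

drt = drt_by_position(trials)
spoarc_slope = ols_slope(list(drt.keys()), list(drt.values()))
print(drt)           # right-minus-left difference per serial position
print(spoarc_slope)  # a negative slope indicates a SPoARC-like pattern
```

In this sketch, a negative slope means that the right-hand advantage grows with serial position, which is the pattern described above. Replacing serial position with number magnitude on the x-axis would give the analogous index for a SNARC-like pattern.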

## **A NEW POSITIONAL TAGGING MECHANISM: SPATIALIZATION**

This result and others (i.e., van Dijck et al., 2013; Guida, under review) suggest that the initial words of a sequence have a left spatial value while the last words of the same sequence have a right spatial value. Apparently, individuals tend to create a spatial mental line based on the order in which items enter immediate memory (Example 1, **Figure 1**). This is highly compatible with the idea that in verbal immediate memory, item order is coded spatially, through spatialization. Given the fuzzy nature of non-temporal tags, this discovery could help specify the way item order is coded in immediate memory.

## **WHAT HAS SPATIALIZATION GOT TO DO WITH MEMORY EXPERTISE?**

Since the very first (internal) mnemonic (Yates, 1966; Worthen and Hunt, 2011), which is thought to be the loci method proposed by Simonides of Ceos (556 BC–448 BC) and reported by Marcus Tullius Cicero in *De Oratore*, visuo-spatial processes have played a central role in enhancing memory for verbal material. Concerning the loci method, Simonides of Ceos proposed to visualize a familiar route or a sequence of familiar locations (like rooms in one's own house) and use them to mentally store a list of words (Example 2, **Figure 1**), before a speech for example. Then during the speech, one would take a mental tour and retrieve each word via each familiar location (e.g., kitchen). Greek orators (Yates, 1966; Worthen and Hunt, 2011) soon became experts in the method of loci.

**examples.** The upper part of the figure offers a generic and abstract representation of retrieval structures, from Ericsson and Kintsch (1995). The first example is taken from the spatial positional mental line and adapted from Guida (under review); it represents the encoding of three letters via three spatial positional tags. The second example is from the method of loci, and represents the encoding of three words via known locations used as retrieval cues.

## **THE METHOD OF LOCI: EXPERTISE THROUGH SPATIALIZATION**

Of interest here is the fact that the loci method requires spatializing the to-be-remembered items in various locations. Moreover, the method is not just an ancient oddity: its efficiency has been confirmed since then and still holds today. Memory experts (i.e., mnemonists) use it, and several memory world records have been set with it. For example, Pridmore (2013) was the first man to break the 30 s barrier in the Speed Cards discipline, which requires memorizing the order of a shuffled deck of cards. To do so, he used a system based on the method of loci: he spatialized groups of two cards along a familiar route.

## **IS THERE A THEORY OF EXPERTISE THAT SUPPORTS THE LOCI METHOD PHENOMENOLOGY WHICH POINTS TOWARD SPATIALIZATION?**

Even if mnemonics and memory expertise are very ancient (certainly due to oral tradition, see Rubin, 1997; Ong, 2012), grounded cognitive theories describing them are recent. It could be argued that the first complete theoretical contribution on mnemonic expertise (but see the Chunking theory, Chase and Simon, 1973) was Chase and Ericsson's (1981) Skilled memory theory, which was to be completed with the Long-term working memory (LT-WM) theory (Ericsson and Kintsch, 1995).

## **LONG-TERM WORKING MEMORY AND RETRIEVAL STRUCTURES**

In order to explain memory expertise, Chase and Ericsson (1981) proposed three principles: significant encoding, structured retrieval, and acceleration. The first principle proposes that in order to swiftly and reliably store items in LTM, information needs to be transformed into meaningful units. Of interest here is the second principle, which states that to increase mnemonic performance, hierarchical spatial cognitive structures, named retrieval structures (for a discussion, see Ericsson and Kintsch, 2000; Gobet, 2000a,b), can be used to encode and retrieve items from LTM. These structures constitute an internal artificial context to which items are linked. In the loci method, this is done via the visuo-spatial knowledge of a sequence of familiar locations. Each location is a retrieval cue, and all the cues together constitute a retrieval structure (**Figure 1**). The skilled memory theory was first proposed to account for the performance of experts capable of increasing their digit span above 80. The LT-WM theory (Ericsson and Kintsch, 1995) was a generalization of this theory to all activities and to all individuals, experts and novices.
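The idea of a retrieval structure as an ordered set of cues to which items are bound can be sketched as a small data structure. The Python class below is only an illustrative analogy, not a cognitive model taken from Chase and Ericsson (1981) or Ericsson and Kintsch (1995); the class name, method names, locations, and items are hypothetical.

```python
class RetrievalStructure:
    """An ordered set of retrieval cues to which items can be bound."""

    def __init__(self, cues):
        self.cues = list(cues)  # fixed, well-known order (e.g., a familiar route)
        self.bindings = {}      # cue -> item currently associated with it

    def encode(self, items):
        """Bind each incoming item to the next cue, in order of arrival."""
        items = list(items)
        if len(items) > len(self.cues):
            raise ValueError("more items than available cues")
        self.bindings = dict(zip(self.cues, items))

    def retrieve(self, cue):
        """Return the item bound to a single cue, or None if nothing is bound."""
        return self.bindings.get(cue)

    def recall_in_order(self):
        """Walk the cues in their fixed order and return the bound items."""
        return [self.bindings[cue] for cue in self.cues if cue in self.bindings]


# Method of loci: familiar locations serve as cues (hypothetical example).
loci = RetrievalStructure(["hallway", "kitchen", "bedroom"])
loci.encode(["apple", "violin", "ladder"])
print(loci.retrieve("kitchen"))   # violin
print(loci.recall_in_order())     # ['apple', 'violin', 'ladder']

# The same structure with spatial positional tags in place of locations.
mental_line = RetrievalStructure(["left", "middle", "right"])
mental_line.encode(["K", "R", "T"])
print(mental_line.recall_in_order())  # ['K', 'R', 'T']
```

The point of the sketch is that serial order is never stored in the items themselves: it is carried entirely by the fixed order of the cues, whether these are familiar locations or positions on a left-to-right line.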

## **WHAT DOES SPATIALIZATION AS A LINK BETWEEN CLASSIC IMMEDIATE MEMORY STUDIES AND MEMORY EXPERTISE BRING AS PSYCHOLOGICAL PERSPECTIVES ON EXPERTISE?**

Notwithstanding Ericsson and Kintsch's (1995) generalization, the LT-WM theory remains underused in the classic domain of verbal immediate memory (but see Guida et al., 2009, 2013). As stated by Ericsson and Kintsch (1995, p. 217) concerning the Skilled memory theory (but the same can be said for LT-WM), even if this theoretical construct is largely accepted as accounting for experts, "several investigators (e.g., Schneider and Detweiler, 1987; Carpenter and Just, 1989; Baddeley, 1990) have voiced doubts about its generalizability." Retrieval structures are often dismissed because they are considered too artificial, or idiosyncrasies reserved for experts. Thanks to van Dijck and Fias's (2011) study, this could change.

## **RETRIEVAL STRUCTURE AS SPATIALIZATION: A GENUINE AND UNIVERSAL PROCESS**

As seen previously, van Dijck and colleagues' results (van Dijck and Fias, 2011; van Dijck et al., 2013; see also Guida, under review) clearly point toward the idea that in all-comers, spatial processes are also at stake in verbal immediate memory. When comparing retrieval structures such as those in the method of loci with spatial positional tags, the similarities are striking (**Figure 1**). In both cases, a virtual spatial construct, used as a context, is associated with the incoming information, and the context can later be used to retrieve the items. Even if the mental line (Dehaene et al., 1993; van Dijck and Fias, 2011) used by all-comers is much simpler and less sophisticated than the structures used by mnemonists with the method of loci, spatialization seems to be the same underlying process. If this standpoint is adopted, then it becomes clearer why the loci method is so ancient and efficient: experts' spatialization via retrieval structures is rooted in basic processes that all individuals can use. *Ipso facto*, retrieval structures stop being idiosyncrasies reserved for experts.

The link between both kinds of spatialization becomes even more tangible when considering that the spatial mental line could also be due to our expertise, in this case in mastering the writing system. In fact, the orientation and direction of our mental line vary according to reading/writing habits (e.g., Dehaene et al., 1993; Shaki et al., 2009; Göbel et al., 2011; for the influence of reading habits on visuo-spatial processes, e.g., see Maass and Russo, 2003; Dobel et al., 2007). Therefore, it is very plausible that our reading and writing habits foster our spatial mental line.

When considering the privileged link between space and memory, it is also interesting to conclude with a brief glance at anthropology, which shows that this link seems to be far more ancient than our reading habits and already present in nonliterate societies. In fact, myths around the world have often been linked to specific locations. This "myth spatialization" can be found in the Trobriand culture of Papua New Guinea, for example, in the Australian Aborigines' famous songlines (Chatwin, 1987), or even in the Zunis' legends from the southwestern United States. In all these cases, "spatial location functions as a mnemonic device for the recall of a corpus of myth" (Harwood, 1976, p. 783). Building on Harwood's (1976) myth spatialization, the loci method can be considered as a phylogenetic protraction of the myth spatialization, and the mental spatial line as an ontogenetic protraction of our reading habits.

## **REFERENCES**


linguistic influences on the development of number processing. *J. Cross Cult. Psychol.* 42, 543–565. doi: 10.1177/0022022111406251


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 May 2014; accepted: 23 May 2014; published online: 10 June 2014.*

*Citation: Guida A and Lavielle-Guida M (2014) 2011 space odyssey: spatialization as a mechanism to code order allows a close encounter between memory expertise and classic immediate memory studies. Front. Psychol. 5:573. doi: 10.3389/fpsyg.2014.00573*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Guida and Lavielle-Guida. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Teachers' expertise in feedback application adapted to the phases of the learning process

#### *Eva Christophel\*, Robert Gaschler and Wolfgang Schnotz*

*Department of Psychology, Universität Koblenz-Landau, Landau, Germany*

*\*Correspondence: christophel@uni-landau.de*

#### *Edited by:*

*Michael H. Connors, Macquarie University, Australia*

#### *Reviewed by:*

*Craig Speelman, Edith Cowan University, Australia*

*Michael H. Connors, Macquarie University, Australia*

**Keywords: teacher expertise, motivational phases, learning phases, feedback, curriculum scripts**

## **BUILDING BETTER BRAINS—SCHOOL DAY BY SCHOOL DAY**

In this essay we highlight feedback application as a domain to study the knowledge and abilities involved in the construct of teachers' expertise. One general approach to advancing the discussion in a field of research such as expertise acquisition is to combine (a) analysis of complex natural phenomena by division into *hypothetical* simple building blocks, with (b) the synthesis of complex phenomena based on *known* simple building blocks (e.g., Braitenberg, 1984; Gaschler et al., 2012). For instance, (a) knowledge representations such as chunks have been hypothesized to underlie chess expertise and (b) this analysis in turn was supported by a computer-run expert system based on chunks (cf. Lane et al., 2001; Guida et al., 2012). Gobet et al. (2014) summarize research on expert knowledge as well as the Einstellung effect (e.g., failing to spot an efficient procedure because a well-known one is available; cf. Bilalić et al., 2010), and neuro-interventions targeting it. They conclude with the desideratum to build better brains: brains that can take full advantage of the power of hypothesis-driven cognition while being safeguarded against cognitive illusions and the Einstellung effect.

In the current article we argue that building better brains is an (admittedly to-be-optimized) everyday practice, rather than a thought experiment. Like Gobet et al. (2014), we ask how creativity can be fostered by gaining control over prior knowledge so that it can be flexibly used or blocked on demand (cf. Bilalić et al., 2008). Teacher education in universities delivers quasi-experimental conditions for studying how expertise on learning can best be acquired and applied. In particular, concepts of motivation and action control relevant in robotics and psychology seem promising for capturing and structuring the gist of teacher expertise.

## **BECOMING EXPERTS IN FACILITATING KNOWLEDGE ACQUISITION**

While expertise is often studied in domains that yield only hundreds or thousands of experts, universities all over the world are making efforts to continuously contribute to a large population of experts on knowledge acquisition. Based on theoretical input and years of practice, teachers should be experts in shaping school lessons such that knowledge acquisition is optimized (Bromme, 1997, 2008). Focusing on the demands placed by school lessons can help to obtain an overview of the skills teaching experts should have (Bromme, 2008). Teachers have to organize and maintain a structure of student and teacher activities. This includes the anticipation of students' inferences as well as disturbances of the social context of learning. Teachers need a broad and flexible knowledge base covering the subject, as well as a repertoire of instructional methods to stimulate students' learning activities for reaching instructional objectives. Like managers, teachers have to organize the timetable of each lesson to make sure that the time is mainly used for the subject matter. Shulman (1986, 1987) attributes to teaching experts a repertoire of content knowledge (e.g., diagnosis of task-specific requirements), pedagogical content knowledge (e.g., how to present the subject matter content), and curricular knowledge. To generate and schedule feedback to students appropriately, teaching experts should have a strong background in the philosophy of the subject (e.g., subject-related epistemic beliefs), diagnostic competences (e.g., judgment of students' abilities), and skills that allow them to juggle student-related and content-related aspects of the learning situation (Bromme, 2008). Teachers' instructional routines include categorical units called "curriculum scripts," in which subject-related and didactic-methodological aspects are linked. As a consequence of this integrated knowledge, most actions of experienced teachers proceed automatically (Blömeke et al., 2003). They are difficult to access via verbal protocols, because in the classroom or a face-to-face situation, verbalization by the teacher could reduce the learners' amount of cognitive resources and shift the learner's attention to the level of meta-task processes.

## **FEEDBACK ALIGNED TO MOTIVATIONAL AND VOLITIONAL STATES OF ACTION REGULATION**

Learners face the challenge of acquiring and employing self-regulation strategies in order to obtain educational outcomes (e.g., Lerner et al., 2001; Ley and Young, 2001). In order to scaffold learning through adaptive feedback, teachers need knowledge about the dynamics of learning processes and opportunities to apply feedback that takes motivational, cognitive and emotional aspects of learning into account. Apart from finding the appropriate frequency for feedback (Healy et al., 2014), we suggest considering the match between feedback type and the action phase the learner is currently in. Heckhausen and Gollwitzer (1987) have proposed distinguishing between motivational and volitional phases of action regulation. This distinction has been used in two different ways to bridge the gap between analysis and synthesis highlighted by Braitenberg (1984). On the one hand, many architectures of artificial agents in robotics implement the distinction between a motivational phase (the agent is open to new information and ready to re-evaluate current preferences and plans) and a volitional phase (the agent is executing a plan and shielding itself from novel information that might lead to a re-evaluation of the plan currently executed; e.g., Visser and Burkhard, 2007). On the other hand, cycles of motivational and volitional phases are present in theories of the cyclical and recursive dynamic of learning in educational contexts (e.g., Zimmerman, 2000; Schmitz and Wiese, 2006).

According to Zimmerman (2000) as well as Schmitz and Wiese (2006), the learning process proceeds in three phases. (1) In the goal-setting phase, the learner chooses goals (e.g., appropriate tasks), plans actions or action steps and selects adequate strategies. (2) In the performance phase, the selected strategies have to be applied in order to complete the task. (3) In the self-reflection phase, the learner evaluates his/her learning outcome. Gollwitzer (1990) assumes that in each one of these phases the learner's attention is focused on specific information that helps him/her to accomplish the demands of the task. Taking the current focus of the learner into account, feedback can be better adapted with respect to effects on the learner's performance, mood and effort (Ley and Young, 2001; Narciss, 2004; Baadte and Schnotz, 2013).

Three types of feedback can be distinguished that align with the respective phases of the learning process. In the goal-setting phase, the learner should receive goal-setting feedback that informs him/her about how realistic completion of the chosen task is according to his/her previous performances. In the performance phase, process feedback offers information about specific task-inherent demands and error-related information. In the self-reflection phase, the learner should obtain appropriate outcome feedback that informs him/her about the possible causes for success or failure and about the quality of his/her performance. In contrast, feedback is inappropriate if it does not take the learner's phase-specific mind-set into account and if it does not support task completion but instead decreases the learner's amount of cognitive resources available for processing the relevant information (e.g., Sweller, 2005; see Christophel and Baadte, in press).
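The alignment between learning phases and feedback types described above can be written down as a simple lookup. The Python sketch below is only an illustration of that mapping; the enum and function names are hypothetical, and the sketch deliberately ignores the second criterion mentioned in the text (whether feedback drains cognitive resources).

```python
from enum import Enum

class Phase(Enum):
    GOAL_SETTING = "goal-setting"
    PERFORMANCE = "performance"
    SELF_REFLECTION = "self-reflection"

class Feedback(Enum):
    GOAL_SETTING = "goal-setting feedback"
    PROCESS = "process feedback"
    OUTCOME = "outcome feedback"

# Phase-to-feedback alignment as described in the text.
MATCHING_FEEDBACK = {
    Phase.GOAL_SETTING: Feedback.GOAL_SETTING,
    Phase.PERFORMANCE: Feedback.PROCESS,
    Phase.SELF_REFLECTION: Feedback.OUTCOME,
}

def is_phase_appropriate(phase, feedback):
    """Return True if the feedback type matches the learner's current phase."""
    return MATCHING_FEEDBACK[phase] is feedback

print(is_phase_appropriate(Phase.PERFORMANCE, Feedback.PROCESS))  # True
print(is_phase_appropriate(Phase.PERFORMANCE, Feedback.OUTCOME))  # False
```

A coding scheme for observed teacher feedback could, under these assumptions, classify each feedback event by first diagnosing the learner's phase and then checking it against such a table.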

Matching feedback to the phase-specific requirements of the learning process might require flexible strategy changes and adaptation of routines to the specific content being taught, epistemic beliefs and motivational/volitional state diagnosed. Thus, adaptation and shifting skills seem at least as important as a large repertoire of strategies. This view is reflected in the notion of adaptive expertise advocated in educational psychology (Verschaffel et al., 2009; Godau et al., 2014). In contrast to routine expertise (e.g., expertise that allows the expert to solve problems very efficiently and precisely), Hatano and Inagaki (1984) describe adaptive expertise as the potential to create new solutions and new problem solving procedures. Adaptive experts are "those who not only perform procedural skills efficiently but also understand the meaning of the skills and nature of their object" (p. 28). In contrast, "routine experts are outstanding in terms of speed, accuracy, and automaticity of performance, but lack flexibility and adaptability to new problems" (Hatano and Inagaki, 1984, p. 31). Verschaffel et al. (2009) have pointed out that *flexibility* can be defined as the "use of *multiple* strategies" while *adaptivity* includes "making *appropriate* strategy choices" (p. 338). This flexibility and adaptivity are necessary in order to support students' individualized learning processes with appropriate feedback. Thus, teachers need adaptive expertise, which should include knowledge about the task-specific requirements, the learner's abilities, the cyclical and recursive dynamic of the learning process and the implications of this dynamic on motivational, cognitive and emotional levels. On the one hand, there are first formalized efforts to teach future teachers relevant motivational competencies (e.g., Rheinberg and Engeser, 2010). On the other hand, there are more cautious outlooks as well. According to Verschaffel et al. (2009) "adaptive expertise is not something that can be *trained* or *taught* but rather something that has to be *promoted* or *cultivated*" (p. 348).

## **MORE TEACHING EXPERIENCE DOES** *NOT* **NECESSARILY LEAD TO BETTER FEEDBACK SKILLS**

Teachers' expertise emerges during the theoretical and practical phases of teacher education and professional experience after university education (Bromme, 2008). It is an open question as to what extent the professional experience of teachers promotes and cultivates adaptive expertise with regard to adequate student support in individualized lessons. For instance, Christophel (2014) demonstrated that more experienced teachers did *not* give more appropriate feedback (i.e., feedback that offers phase-specific information) but actually gave more inappropriate feedback than less experienced teachers (e.g., feedback that does not support task completion but reduces the learner's amount of cognitive resources available for processing the relevant information; Sweller, 2005). Christophel (2014) studied 30 more experienced teachers with a mean professional experience of 11.15 years (*SD* = 12.85) and 30 less experienced teachers with a mean professional experience of 42.07 days (*SD* = 89.40). Teachers watched three video-vignettes of students passing through the different phases of self-regulated learning (the goal-setting, performance and self-reflection phases). The teachers were instructed to stop the films whenever they wanted to give feedback to the students. The results revealed that the more experienced teachers stopped the video-vignettes more frequently and more often provided inappropriate feedback to students as compared to their less experienced counterparts. In addition, the study of Baer et al. (2011) showed that professional experience did not lead to better schooling. Baer and colleagues examined the development of teacher competences in transition between academic and professional careers. The results demonstrate that teachers with more professional experience did not perform better in the measured aspects of schooling than graduate teachers at the end of their university education.

These findings suggest that professional experience alone is not a good predictor for the progress of teachers' expertise (Baer et al., 2011; cf. Campitelli and Gobet, 2011; Hambrick et al., 2014). Years of teaching are possibly not sufficient to informally and implicitly sensitize most teachers to cues conveying action phases and the matching feedback. This could have diverse reasons. As explained above, school lessons place numerous demands: teachers have to organize and maintain the structure of schooling and can be absorbed by organizational and administrative activities. After university and practical education, there is a lack of peer-coaching (e.g., colleagues who discuss classroom situations) or qualified instruction (e.g., best-practice examples). A lack of (time) resources and motivation to reflect on one's feedback behavior might result.

## **ATTENDING TO THE LARGER PICTURE**

However, training studies can support an optimistic outlook on the development of teachers' expertise in feedback application. Experienced and inexperienced teachers can be trained to apply feedback that supports realistic goal-setting and adequate self-reflection of learners (Christophel et al., in press). Recommendations helped teachers to increase feedback in line with the motivational phase of the learner from pre- to post-test, while phase-inappropriate feedback could be reduced. Also, attention allocation in the classroom (often a prerequisite for feedback application) seems to be open to intervention. For instance, Miller (2011; cf. Speelman, 2014; Wiggins et al., 2014) suggested employing eye-tracking in order to provide (prospective) teachers with feedback on how evenly they distribute their attention across students in the classroom. Students disturbing the setting should not monopolize attention at the cost of other students. While experienced teachers agree that uneven distributions of attention should be avoided, they lack awareness of how well they are already managing to implement the respective strategies of attention allocation.

## **REFERENCES**


*between Instruction and Autonomy*]. Wiesbaden: Springer. doi: 10.1007/978-3-658-05099-3


Zimmerman, B. J. (2000). "Attaining self-regulation. A social cognitive perspective," in *Handbook of Self-Regulation,* eds M. Boekaerts, P. R. Pintrich, and M. Zeidner (San Diego, CA: Academic Press), 13–39.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2014; accepted: 19 July 2014; published online: 05 August 2014.*

*Citation: Christophel E, Gaschler R and Schnotz W (2014) Teachers' expertise in feedback application adapted to the phases of the learning process. Front. Psychol. 5:858. doi: 10.3389/fpsyg.2014.00858*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Christophel, Gaschler and Schnotz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## "No level up!": no effects of video game specialization and expertise on cognitive performance

#### *Fernand Gobet<sup>1</sup>\*, Stephen J. Johnston<sup>2</sup>, Gabriella Ferrufino<sup>3</sup>, Matthew Johnston<sup>3</sup>, Michael B. Jones<sup>3</sup>, Antonia Molyneux<sup>3</sup>, Argyrios Terzis<sup>3</sup> and Luke Weeden<sup>3</sup>*

*<sup>1</sup> Psychological Sciences, University of Liverpool, Liverpool, UK*

*<sup>2</sup> Department of Psychology, Swansea University, Swansea, UK*

*<sup>3</sup> Department of Psychology, Brunel University, Uxbridge, UK*

#### *Edited by:*

*David Zachary Hambrick, Michigan State University, USA*

#### *Reviewed by:*

*Brooke Noel Macnamara, Princeton University, USA*

*Mark W. Becker, Michigan State University, USA*

*Elizabeth Jane Meinz, Southern Illinois University Edwardsville, USA*

*\*Correspondence:*

*Fernand Gobet, Psychological Sciences, University of Liverpool, Bedford Street South, Liverpool L69 7ZA, UK e-mail: fgobet@liv.ac.uk*

Previous research into the effects of action video gaming on cognition has suggested that long-term exposure to this type of game might lead to an enhancement of cognitive skills that transfer to non-gaming cognitive tasks. However, these results have been controversial. The aim of the current study was to test the presence of positive cognitive transfer from action video games to two cognitive tasks. More specifically, this study investigated the effects that participants' expertise and genre specialization have on cognitive improvements in one task unrelated to video gaming (a flanker task) and one related task (change detection task with both control and genre-specific images). This study was unique in three ways. Firstly, it analyzed a continuum of expertise levels, which has yet to be investigated in research into the cognitive benefits of video gaming. Secondly, it explored genre-specific skill developments on these tasks by comparing Action and Strategy video game players (VGPs). Thirdly, it used a very tight experimental design, including the experimenter being blind to expertise level and genre specialization of the participant. Ninety-two university students aged between 18 and 30 (*M* = 21.25) were recruited through opportunistic sampling and were grouped by video game specialization and expertise level. While the results of the flanker task were consistent with previous research (i.e., effect of congruence), there was no effect of expertise, and the action gamers failed to outperform the strategy gamers. Additionally, contrary to expectation, there was no interaction between genre specialization and image type in the change detection task, again demonstrating no expertise effect. The lack of effects for game specialization and expertise goes against previous research on the positive effects of action video gaming on other cognitive tasks.

**Keywords: change detection task, expertise, flanker task, transfer, video game playing**

## **INTRODUCTION**

Transfer, the extent to which skills generalize, is an important theoretical concept that has serious practical implications. In a classic article, Thorndike and Woodworth (1901) propounded their theory of "identical elements," according to which transfer from a first domain to a second domain is possible only when the components of the skills required in each domain overlap. For example, a pianist can use their knowledge of music theory to understand a violin concerto, and a mathematician will understand the differential equations of an economics paper better than a person without a background in mathematics. But even in these cases, transfer is far from perfect; for example, the pianist will not be able to play the violin concerto itself without extensive additional training.

#### **FAR-TRANSFER**

In line with Thorndike and Woodworth's (1901) hypothesis, most theories of expertise predict that transfer from one domain to another (*far-transfer*) will be difficult. This is particularly the case for theories based on the notion that expertise in great part relies on domain-specific perceptual knowledge [e.g., chunking theory (Chase and Simon, 1973) and template theory (Gobet and Simon, 1996)]. While perceptual knowledge enables fluid behavior in the original domain, it is of little use in other domains as it does not match the new environment. Research on chess has provided considerable support for this prediction. Chess players' perceptual skills do not extend to visual memory for shapes (Waters et al., 2002), nor do their planning capabilities transfer to the Tower of London, a task measuring executive function and planning (Unterrainer et al., 2011). Moreover, contrary to widespread belief, there is no robust empirical evidence that playing chess improves scholastic abilities (Gobet and Campitelli, 2006).

One of the rare domains in which evidence of far transfer has been found is playing action video games (e.g., Green et al., 2009)<sup>1</sup>. Repeated playing of this kind of game has been reported to lead to improvements in perceptual and attentional processes and to reduce reaction time in other tasks where one must be both fast and accurate (e.g., Green et al., 2009; Bavelier et al., 2012).

<sup>1</sup>In this study, the term "video game" refers to any published computer or console video game from 1952 to the present day. The term "video game player"

One of the main advantages proposed to be the result of habitual action video game playing is that of a more efficient attentional system. For example, Chisholm et al. (2010) compared action vs. non-action video game players (VGPs) on an attentional capture task where participants searched for a target that could appear in isolation or with a salient task-irrelevant distractor. Action VGPs showed faster reaction times to detect targets and a reduced effect of distractor interference, leading the authors to conclude that the action video gamers had better top-down attentional control, with the consequence that they spend less time processing irrelevant distractors. Consistent with this result, Hubert-Wallander et al. (2011) found that, compared with non-action gamers, action gamers demonstrate superior visual selective attention as measured in a visual search task, with the greatest benefit occurring at the highest cognitive loads (largest search arrays). Additional evidence comes from neuroimaging, where differences in brain activation support the idea that long-term video game playing impacts on cortical functioning. For example, using functional magnetic resonance imaging (fMRI), Bavelier et al. (2012) compared a group of action and non-action video gamers on a task involving locating a target stimulus under conditions of increasing distractor load. In addition to overall faster reaction times, compared to the non-video game playing group, the VGPs showed little increase in the level of activation in a network of fronto-parietal sites as distractor load increased. This fronto-parietal network has commonly been associated with attentional processing (Ptak, 2012). These data were taken to suggest that the VGPs were more efficient in their allocation of attentional resources such that the cortical sites deploying attention were able to filter out the distracting information more easily and therefore showed smaller load-dependent increases in cortical activity, supporting the behavioral finding of Chisholm et al. (2010). The proposed superior attentional resource allocation of VGPs (e.g., Hubert-Wallander et al., 2011; Bavelier et al., 2012), which may be at the heart of the observed enhancement of stimulus processing and reduced distractor interference, has now been observed in a number of experiments examining video game expertise-based improvements in spatial selective attention (Green and Bavelier, 2003; Feng et al., 2007; Spence et al., 2009), distractor inhibition (Chisholm et al., 2010; Hubert-Wallander et al., 2011; Mishra et al., 2011; Bavelier et al., 2012), enhanced image search (Dye et al., 2009) and target detection (Castel et al., 2005; Dye et al., 2009).

As detailed above, a number of reports using different attentional tasks have suggested that VGPs have an improved ability to "filter out" unnecessary or irrelevant stimuli partly through the enhancement of attentional functioning (Chisholm et al., 2010; Bavelier et al., 2012). One task that has been used successfully to test VGPs' proposed advantage at distractor filtering directly is the Flanker Compatibility Task (Eriksen and Eriksen, 1974; Bavelier et al., 2012). The Flanker Task requires participants to ignore salient laterally presented distractors while making responses to a centrally presented target stimulus. Mishra et al. (2011) employed a flanker task to examine whether there was any neuroelectrophysiological evidence of VGPs showing enhanced distractor inhibition. The results showed that, behaviorally, VGPs were better able than nVGPs to ignore flanking items competing for attention with a central stimulus, and that this increase in behavioral performance was associated with a greater P300 component in the event-related potential (ERP). The P300 electrophysiological component has been associated with perceptual discrimination and decision-making (Picton, 1992; Mishra et al., 2011). These data, together, were taken to support the hypothesis that VGPs were better at filtering out the distractor stimuli, leading to improved perceptual decision making. Lavie (1995) also reported that through extensive game play VGPs gain the ability to identify task-irrelevant flankers before further processing stimuli. This indicates that VGPs possess an enhanced capability to logically filter information for relevance before attempting to ignore distractors, rather than trying to process everything at once as nVGPs do.

An experimental task that is homologous to the task requirements of many action video games is the change detection task. In the change detection task, participants are asked to monitor a visual display for a small change that they indicate finding via a keypress response. For example, Clark et al. (2011) found that VGPs display a superior ability to spot changes when presented with rapidly alternating sets of images. In this study, 35 participants were presented with an unedited/edited image cycle switching at 4 Hz. The image cycle repeated until the participants indicated that they had spotted the edited element by clicking with the mouse on the position in the image that they believed contained the edit. VGPs performed better than nVGPs, replicating previous work on attentional improvements in VGPs (e.g., Green et al., 2009), and there were also strategic changes in their search patterns. Compared with nVGPs, the VGPs showed broader search strategies, further supporting the view that VGPs develop top-down processing.

It follows from the above arguments that, if video-game expertise leads to the observed enhanced attentional and perceptual processing, then it should be possible to train nVGP using video games and observe an improvement in their cognitive functioning. Green and Bavelier (2003) recruited two groups of participants that had no history of video gaming; one group was then trained on a fast-paced action game (*Medal of Honor*) and the other on a slow-paced puzzle game (*Tetris*). After a period of 10 h training (1 h a day over 10 days), compared with the *Tetris* group, participants trained on *Medal of Honor* displayed better Useful Field-of-View (Ball et al., 1988), that is they had an enhancement in their ability to search for and identify cued targets. It was also found that the *Medal of Honor* trainees showed a reduced attentional blink (Raymond et al., 1992), i.e., a reduction in the window of attentional "blindness" that occurs after detecting or recognizing the first of two temporally close visual stimuli (Green and Bavelier, 2003; Feng et al., 2007; Bailey et al., 2010).

(shortened to VGP) refers to any individual who plays these games, and the term "non-video game player" (shortened to nVGP) refers to any individual who partakes in video game play for less than 1 h per week.

While intriguing, the research on the cognitive benefits of video game playing has been criticized on several grounds. Boot et al. (2011) note that experts and novices have different expectations about their performance, which is likely to affect their behavior due to demand characteristics. They also observe that playing video games might affect the kind of strategies that are being used rather than basic perceptual or cognitive capacities. Finally, as some studies have failed to find differences between VGPs and nVGPs, the literature might suffer from a file-drawer problem. Kristjánsson (2013) notes that, in many training studies, the control groups do not improve their performance on the tasks of interest, although one would expect them to, based on the extensive literature on learning, given the test-retest methodology used. In addition, the results might be affected by gender differences, as it is difficult to find expert female VGPs. Both Boot et al. (2011) and Kristjánsson (2013) note the necessity to carry out independent replications.

## *Near transfer*

Research has also investigated whether transfer occurs between sub-disciplines of the same field (*near transfer*). For example, do physicians specializing in neurosurgery generalize their skills when solving problems from pediatrics, or do chess players specializing in specific openings (i.e., the first moves of the game) maintain their skill level when making decisions in board positions in which they are not specialized?

Several studies have addressed this issue in medicine (Rikers et al., 2002), political science (Chiesi et al., 1979), and the design of experiments (Schunn and Anderson, 1999). The pattern of results suggests that experts fall back on general heuristics when they cannot use domain-specific knowledge. While emphasizing the role of general problem-solving methods, these studies also highlight the role of domain-specific patterns and methods, as clearly some degree of expertise is lost when domain-specific methods are replaced by domain-general ones. While these studies compared individuals of the same level of expertise, Bilalić et al. (2009) compared individuals of different skill levels. They took advantage of several features of chess: chess skill is precisely and quantitatively measured by the Elo rating; chess players enjoy trying to find the best move in a chess position; and chess players specialize in different openings, which makes it relatively easy to find players who have the same strength (as measured by their Elo points) but who have different specialized opening knowledge.

Bilalić et al. compared the performance, in both a memory and a problem-solving task, of players who specialized in two different chess openings. In addition to positions coming from these two types of defense, they also used neutral positions (positions difficult to classify with respect to the opening they came from). The players were Candidate Masters, Masters, and International Masters/Grandmasters. The results were dramatic. With only one exception, all players obtained the best results with the positions taken from the openings they specialized in. When confronted with positions outside their domain of specialization, players performed one standard deviation on average below the level shown with positions taken from their domain of specialization.

#### **AIMS OF THE STUDY**

Many studies have investigated the differences between VGPs and nVGPs but there is as yet, to our knowledge, no research establishing whether differing levels of video gaming expertise vary with performance on cognitive tasks. Thus, the first aim of this study was to test the hypothesis that, as the level of expertise increases, task accuracy increases, and reaction times become faster.

In addition, a number of studies compare VGPs who identify as "Action" players to nVGPs, but as yet there has been no research into whether the skills demonstrated by action players cross over into other genres, such as strategy games, or indeed if each genre improves different skills. Data from a consumer survey by the Entertainment Software Association found that action and strategy games proved popular with both console and computer VGPs, and so these two genres were chosen as a variable to test the hypothesis (Entertainment Software Association, 2012).

The second aim of the study was thus to test to what extent different video-game genre specializations tap into different cognitive abilities. Action games typically involve fast-paced gameplay, such as *"Call of Duty: Modern Warfare 3"* (Infinity Ward, 2011), which was the best-selling action console video game in 2011 (Entertainment Software Association, 2012). It was predicted that the speed of gameplay would heighten action VGPs' speeded response times to stimuli other than those normally responded to in a VG, as shown by Green and Bavelier (2003). Strategy players, however, are predicted to possess a stronger reliance on maintaining accuracy as a trait gained from long-term play where accuracy over response time is key to success. This is because, typically, strategy games require the ability to move and place items in carefully decided places and formations. While often these changes are in response to an in-game target event and can result in swift determination and sequencing of new actions to fulfill a shifting long-term goal, there is less emphasis on rapid direct responding to an appearing target. Games such as "*StarCraft 2: Wings of Liberty*" (Blizzard Entertainment, 2010), the best-selling PC strategy game of 2011 (Entertainment Software Association, 2012), demonstrate the need for this ability, particularly in games with a military basis. As a consequence, it was predicted that strategy players would perform with significantly higher mean accuracy and action players would perform with a significantly faster mean reaction time.

The final aim was to replicate the effect of action video-game playing on two tasks: a flanker task (Eriksen and Eriksen, 1974) and a change detection task (Clark et al., 2011). In particular, the flanker task has shown a mixed pattern of results; a basic flanker task has shown both no effect of expertise (Cain et al., 2012), and effects of expertise only once an additional perceptual load has been added (Green and Bavelier, 2003). In the case of Green and Bavelier (2003) it was argued that the addition of a perceptual load prevented flanker interference in the case of nVGPs because, unlike the VGPs, they had fewer spare resources to process the distracting flankers. However, it did appear in the original Green and Bavelier (2003) study that there was a small advantage for VGPs compared to nVGPs at low loads. We therefore predict that, using a basic "low-load" flanker task, there should be a smaller flanker effect for VGPs compared to nVGPs and that this effect will increase as VG expertise decreases.

## **OVERALL METHOD**

## **ETHICAL APPROVAL**

This study was granted ethical approval from the Brunel University School of Social Sciences ethics board in accordance with the British Psychological Society (BPS) guidelines. All participants gave informed consent and were fully debriefed after the study.

## **PILOT STUDY**

An online pilot, carried out several months before the main study, asked participants (*N* = 115) to identify the last three action and strategy video games they had played. The *Call of Duty* and *Assassin's Creed* video game series were identified as the most popular action video games, and the *StarCraft* and *FIFA* series were found to be the most popular strategy video games.

## **PARTICIPANTS**

Ninety-two participants (56 male) aged between 18 and 30 (*M* = 21.25, *SD* = 2.07) were recruited by opportunistic sampling through social networking sites and word of mouth. Most of the participants had filled out the online questionnaire (Pilot study). Each participant was offered a food reward for participating in the study, with a further cash reward incentive (£20) if they achieved the best score on one of the two tasks out of all participants.

## **APPARATUS**

The experiment was run using the E-Prime software package (Psychology Software Tools, Inc., 2008) on a Dell desktop computer running Windows 7, with stimuli presented on a 15 inch Lenovo LCD screen running at a resolution of 640 × 480 pixels at a refresh rate of 60 Hz. Keyboard and mouse responses were collected via a standard keyboard and mouse. Participants were sat approximately 60 cm from the computer screen.

## **DESIGN**

This study was pseudo-experimental in nature, as the independent variables were not directly manipulated. In both experimental measures, the independent variables were skill (experts, intermediates, novices, and controls) and specialization (action vs. strategy). In some analyses, in order to allow direct comparison with the literature, we used skill with only two levels (VGPs vs. nVGPs).

In order to operationalise the study variables, criteria for each between-subject variable needed to be established. A questionnaire was given at the end of the study. In addition to standard questions such as age and gender, information was obtained on the participants' gaming habits to allow for allocation of each participant to the levels of the two independent variables (i.e., VGP or nVGP). The following three questions were asked.

(i) "How many hours a week, on average, do you play video games for? 0–1, 2–5, 6–10, 11–15, 16–20, 21+." This question was used to allocate participants to either the VGP or nVGP group based on their hours of play. Participants who answered "0–1" were allocated to the nVGP group and any answers above were assigned to the VGP group. Sixty-two VGPs and 19 nVGPs were identified.
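The allocation rule in question (i) can be sketched as a small function. This is an illustrative sketch only: the function name `allocate_group` and the way the questionnaire brackets are written as strings (with plain hyphens) are assumptions made for the example, not part of the study's materials.

```python
# Hypothetical helper illustrating the VGP/nVGP allocation rule of question (i).
# The bracket labels mirror the questionnaire options; all names are assumptions.
HOURS_BRACKETS = ["0-1", "2-5", "6-10", "11-15", "16-20", "21+"]


def allocate_group(hours_bracket: str) -> str:
    """Return "nVGP" for the lowest bracket and "VGP" for any higher one."""
    if hours_bracket not in HOURS_BRACKETS:
        raise ValueError(f"Unknown bracket: {hours_bracket!r}")
    return "nVGP" if hours_bracket == "0-1" else "VGP"


if __name__ == "__main__":
    for bracket in HOURS_BRACKETS:
        print(bracket, "->", allocate_group(bracket))
```

Only the lowest bracket maps to the nVGP group; every other answer is treated as VGP, as stated in the text.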


**Table 1** presents the frequency of participants for each genre × expertise cell of the design.

## **GENERAL PROCEDURE**

Upon arrival, a member of the research team allocated the participant a random number that corresponded to the participant's entry in the pilot database, which contained information pertaining to their game playing history and experience. They were then handed over to a second experimenter who ran the experiment and who was blind to the participant's details and questionnaire scoring. This method ensured that both the participant and the second researcher were unaware of the participant's genre or expertise allocation. Participants carried out the two experimental measures that formed the study in a random order. For each measure (described below), the first screen to appear was a set of task instructions (see below for details of each experiment's instructions). Once the series of tasks was complete, participants completed the "General Information Sheet," wherein they answered questions such as age, genre specialization, average weekly hours played, etc.

**Table 1 | Participation allocation to the experimental groups (genre specialization and expertise level).**


## **DATA ANALYSIS**

Outliers were identified as either reaction times or correct responses that were notably outside the general distribution. Boxplots for each data set were analyzed and any outliers that SPSS identified were removed. All reaction time analyses were performed using correct trials only.
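The text does not state the exact numerical criterion behind the boxplot outliers. As a rough illustration, the sketch below assumes the conventional boxplot rule, in which values lying more than 1.5 interquartile ranges beyond the quartiles are flagged. The function name, the quartile method, and the sample reaction times are all assumptions; the quartile definition used by SPSS may differ slightly from the one used here.

```python
from statistics import quantiles


def remove_boxplot_outliers(values, k=1.5):
    """Drop values lying more than k * IQR outside the quartiles (Tukey fences)."""
    # Assumed quartile definition: the "inclusive" method of the standard library.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


if __name__ == "__main__":
    # Made-up reaction times (ms); 2000 ms lies far outside the rest.
    rts = [400, 420, 430, 450, 460, 480, 2000]
    print(remove_boxplot_outliers(rts))
```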

The experiment comprised two measures, an Eriksen Flanker task and a change detection task. Each of these measures is described below.

## **MEASURE 1: ERIKSEN FLANKER TASK**

This measure was a modified version of Eriksen and Eriksen's (1974) flanker task. Arrows were used instead of letters, similar to other video gaming studies such as Cain et al. (2012).

## **MATERIALS**

Congruent and incongruent stimuli were created prior to the start of the experiment by combining arrow stimuli such that the central arrow to which the participant responded was surrounded by equally spaced, directionally congruent stimuli (e.g., *<<<* or *>>>*) or directionally incongruent stimuli (e.g., *<><* or *><>*). The flanker stimuli subtended 8° of visual angle.
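To relate a visual angle to a physical size on screen, the standard relation size = 2 × distance × tan(angle / 2) can be used. The sketch below applies it to the 8° flanker stimuli at the approximate 60 cm viewing distance reported under Apparatus; the function name is an assumption made for illustration.

```python
import math


def size_for_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends angle_deg at a viewing distance of distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


if __name__ == "__main__":
    # 8 degrees of visual angle viewed from approximately 60 cm.
    print(round(size_for_visual_angle(8, 60), 2))
```

Under these assumptions the flanker display would span roughly 8.4 cm on the screen.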

## **DESIGN**

The experiment was a mixed factorial design with the within subjects factor being Congruence and between-subjects factors of Expertise Level, Genre Affiliation and "Video Game Players vs. Non-Video Game Players." The dependent variables in this task were reaction time and percent correct accuracy.

## **PROCEDURE**

Participants completed 24 congruent and 24 incongruent trials (trial order randomized) in two blocks of equal number (i.e., 24 trials per block). On each trial the participant viewed a centrally presented fixation cross for 500 ms that was replaced by either a congruent or incongruent trial image. Participants viewed each trial image and were asked to indicate in which direction the central arrow pointed using the arrow keys on the keyboard. Each trial remained onscreen until the participant made a key press. The next trial immediately followed.
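The trial structure described above can be sketched as follows. The study itself was run in E-Prime; this Python sketch only illustrates the structure. The function name, the equal split between left- and right-pointing targets, and shuffling across all 48 trials before splitting into blocks are assumptions, since the text does not specify them.

```python
import random


def build_flanker_trials(n_per_condition=24, n_blocks=2, seed=None):
    """Return n_blocks lists of (stimulus, condition, correct_key) tuples."""
    rng = random.Random(seed)
    stimuli = {
        "congruent": ["<<<", ">>>"],
        "incongruent": ["<><", "><>"],
    }
    trials = []
    for condition, options in stimuli.items():
        for i in range(n_per_condition):
            stim = options[i % len(options)]  # assumed: balance the two directions
            # The response is determined by the central arrow only.
            correct_key = "left" if stim[1] == "<" else "right"
            trials.append((stim, condition, correct_key))
    rng.shuffle(trials)  # randomize trial order
    block_size = len(trials) // n_blocks
    return [trials[b * block_size:(b + 1) * block_size] for b in range(n_blocks)]


if __name__ == "__main__":
    for number, block in enumerate(build_flanker_trials(seed=1), start=1):
        print("Block", number, "has", len(block), "trials")
```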

## **MEASURE 2: CHANGE DETECTION TASK**

#### **MATERIALS**

Images were sourced from Google Image Search and were edited using Adobe Photoshop CS5 (Adobe Systems, 2010). Based on the pilot study, which provided information on the most commonly played action and strategy games, images from *Call of Duty* and *StarCraft* were chosen. As both games are part of a much larger series of games, the most recent versions of each franchise were used: *StarCraft II: Wings of Liberty* (Blizzard Entertainment, 2010) and *Call of Duty: Black Ops 2* (Treyarch, 2012). All trial images were scaled such that they subtended 26° of visual angle.

## **DESIGN**

As it used both expertise and specialization as independent variables, this experiment had the same design as described in Bilalić et al. (2009). Players of different skill levels and specialized in the video games *Call of Duty* or *StarCraft*, as well as a control

The task itself was an adapted form of Clark et al. (2011). There were 13 trials in total, the first of which was a practice trial and was not included in later analyses. Three repeated measures conditions were used: Call of Duty, StarCraft and Landscape (Control) with each condition consisting of all those trials containing the images derived from those games or scenes. There were four images in each condition.

## **PROCEDURE**

Participants fixated a centrally presented cross for 4000 ms prior to the start of the change detection task image presentation. The first, unedited (UE), image was then presented for 240 ms, followed by a blank gray screen for 80 ms. A second image, identical to the first, would then appear for a period of 240 ms before being replaced by a blank gray screen for 80 ms. The process would then repeat but with the edited (E) version (identical save for a change in a single image feature) of the same image, i.e., the sequence appeared as UE, UE, E, E, UE, UE, E, E, and so on. This cycle repeated until the participant responded by pressing the spacebar on the computer keyboard (see **Figure 1**).
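The timing of one presentation cycle can be made explicit with a short sketch. The durations are taken from the procedure above; the constant and function names are assumptions, and the sketch models only the schedule, not the actual E-Prime presentation.

```python
IMAGE_MS = 240  # image display duration, from the procedure
BLANK_MS = 80   # blank gray screen duration, from the procedure

# One full cycle: UE, UE, E, E, each image followed by a blank screen.
CYCLE = [(label, duration)
         for image in ("UE", "UE", "E", "E")
         for label, duration in ((image, IMAGE_MS), ("blank", BLANK_MS))]
CYCLE_MS = sum(duration for _, duration in CYCLE)


def on_screen_at(t_ms: float) -> str:
    """Return what is displayed t_ms after the onset of the first image."""
    t = t_ms % CYCLE_MS
    for label, duration in CYCLE:
        if t < duration:
            return label
        t -= duration
    raise AssertionError("unreachable: t is always within one cycle")


if __name__ == "__main__":
    print("Cycle length (ms):", CYCLE_MS)
    for t in (0, 250, 640, 1280):
        print(t, "ms ->", on_screen_at(t))
```

With these durations a new image onsets every 320 ms and a full UE, UE, E, E cycle lasts 1280 ms.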

On detection of a change, the participant ceased the trial via keypress (spacebar) and then reported both the location and nature of the perceived change to the experimenter who remained with the participant in the room during data collection. The participant then pressed the spacebar again in order to trigger the next trial presentation.

## **RESULTS**

## **MEASURE 1: ERIKSEN FLANKER TASK**

Prior to analysis, one outlier was removed for failing to comply with task instruction. Data were analyzed using a mixed ANOVA to determine the effect of specialization and skill (between-subjects) with performance on congruent and incongruent trials (within-subjects).

**FIGURE 1 | One cycle of a sample Change Detection trial showing a Landscape (Control) image.** The sequence involved two presentations of the unedited image prior to two presentations of an edited image. An example of an unedited and associated edited image is shown in the bottom right corner: The feature missing in the edited version of the image is indicated by the red circle on the original image.

## *Accuracy*

Analyses indicated a main effect of congruence, *F*(1, 83) = 27.90, *p* < 0.001, *η*<sub>p</sub><sup>2</sup> = 0.25, *f* = 0.58. The Congruent (Same) condition (*M* = 23.69, *SD* = 0.69) had a higher average score than the Incongruent (Distractor) condition (*M* = 18.90, *SD* = 8.38). This effect was not qualified by participant expertise, *F*(3, 83) = 0.57, *p* = 0.64, *η*<sub>p</sub><sup>2</sup> = 0.02, *f* = 0.14, or genre, *F*(1, 83) = 2.55, *p* = 0.11, *η*<sub>p</sub><sup>2</sup> = 0.03, *f* = 0.18. No interaction was found between congruence, expertise and genre, *F*(3, 83) = 2.00, *p* = 0.12, *η*<sub>p</sub><sup>2</sup> = 0.07, *f* = 0.27.
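The two effect-size measures reported alongside each *F* test are linked by standard identities: partial eta squared equals (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>), and Cohen's *f* equals the square root of *η*<sub>p</sub><sup>2</sup> / (1 − *η*<sub>p</sub><sup>2</sup>). The sketch below applies them to the main effect of congruence on accuracy; the function names are assumptions made for illustration.

```python
import math


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_f(eta_p2: float) -> float:
    """Cohen's f corresponding to a given partial eta squared."""
    return math.sqrt(eta_p2 / (1 - eta_p2))


if __name__ == "__main__":
    # Main effect of congruence on accuracy: F(1, 83) = 27.90.
    eta = partial_eta_squared(27.90, 1, 83)
    print(round(eta, 2), round(cohens_f(round(eta, 2)), 2))
```

Rounded to two decimals, these identities reproduce the reported values of 0.25 and 0.58 for this effect.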

### *Reaction time*

Analyses showed a main effect of congruence on reaction time, *F*(1, 73) = 53.31, *p* < 0.001, *η*<sub>p</sub><sup>2</sup> = 0.42, *f* = 0.85; overall, a lower mean reaction time was demonstrated in the Congruent (Same) condition (*M* = 433.85, *SD* = 70.47) than the Incongruent (Distractor) condition (*M* = 529.46, *SD* = 155.82). This effect was not qualified by participant expertise, *F*(3, 73) = 0.42, *p* = 0.74, *η*<sub>p</sub><sup>2</sup> = 0.02, *f* = 0.14, or genre, *F*(1, 73) = 1.71, *p* = 0.59, *η*<sub>p</sub><sup>2</sup> = 0.006, *f* = 0.08. No interaction was found between congruence, expertise and genre, *F*(3, 73) = 1.71, *p* = 0.17, *η*<sub>p</sub><sup>2</sup> = 0.07, *f* = 0.27.

#### *VGPs vs. nVGPs*

In order to attempt to replicate previous research using this task, we also carried out analyses where the participants were allocated to only two groups (VGPs and nVGPs). nVGPs were identified as any participant who played, on average, less than 1 h of either console or computer video games per week. **Figure 2** shows a summary of the reaction time and accuracy data for the flanker task for VGPs and nVGPs.

With respect to accuracy, there was a main effect of congruence, *F*(1, 89) = 19.96, *p* < 0.001, *η*<sub>p</sub><sup>2</sup> = 0.18, *f* = 0.47, that was not qualified by the players' status [*F*(1, 89) = 0.24, *p* = 0.62, *η*<sub>p</sub><sup>2</sup> = 0.008, *f* = 0.09].

With respect to reaction time, both groups performed overall faster in the Congruent (Same) condition (*M* = 433.85, *SD* = 70.47) than the Incongruent (Distractor) condition (*M* = 529.46, *SD* = 155.82). Analysis indicated a main effect of congruence on reaction time, *F*(1, 79) = 32.23, *p* < 0.001, *η*<sub>p</sub><sup>2</sup> = 0.29, *f* = 0.64, that was not qualified by the players' status, *F*(1, 79) = 1.12, *p* = 0.29, *η*<sub>p</sub><sup>2</sup> = 0.01, *f* = 0.1.

## **MEASURE 2: CHANGE DETECTION TASK**

After outliers were removed due to task non-compliance, 85 participants remained from the original 92. A Mixed ANOVA was carried out and outliers were controlled in an identical way to the Flanker Task.

Analysis indicated a main effect of image type on reaction time, *F*(2, 154) = 36.57, *p* < 0.001, *η*<sub>p</sub><sup>2</sup> = 0.32, *f* = 0.69. Response times were quicker in the Landscape condition (*M* = 9399 ms, *SD* = 5119 ms) than in the Call of Duty condition (*M* = 16,138 ms, *SD* = 7556 ms) and StarCraft condition (*M* = 20,247 ms, *SD* = 10,761 ms). This effect was not qualified by participant expertise [*F*(6, 154) = 1.06, *p* = 0.39, *η*<sub>p</sub><sup>2</sup> = 0.04, *f* = 0.2] or genre [*F*(2, 154) = 0.57, *p* = 0.57, *η*<sub>p</sub><sup>2</sup> = 0.01, *f* = 0.1]. No interaction was found between image type, expertise and genre [*F*(6, 154) = 0.27, *p* = 0.95, *η*<sub>p</sub><sup>2</sup> = 0.01, *f* = 0.1].

We also analyzed the data by grouping the participants into players and non-players. Mauchly's Test indicated a violated assumption of sphericity, *χ*<sup>2</sup>(2) = 21.57, *p* < 0.001, therefore degrees of freedom (*df*) were corrected using Greenhouse-Geisser estimates of sphericity (*ε* = 0.81). Analysis indicated a main effect of image type on reaction time, *F*(2, 166) = 25.06, *p* < 0.001, *η*<sub>p</sub><sup>2</sup> = 0.23, *f* = 0.55, that was not qualified by the "Video Game Players vs. Non-Video Game Players" variable [*F*(2, 166) = 1.37, *p* = 0.26, *η*<sub>p</sub><sup>2</sup> = 0.02, *f* = 0.14]. **Figure 3** shows a summary of the reaction time data for the change detection task for VGPs and nVGPs.
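For readers unfamiliar with the Greenhouse-Geisser correction, it multiplies both the effect and the error degrees of freedom by the sphericity estimate *ε* before the *p* value is computed. A minimal sketch, assuming the nominal degrees of freedom of 2 and 166 given in the text and *ε* = 0.81 (the function name is illustrative):

```python
def greenhouse_geisser_df(df_effect: float, df_error: float, epsilon: float):
    """Scale both degrees of freedom by the Greenhouse-Geisser epsilon."""
    return epsilon * df_effect, epsilon * df_error


if __name__ == "__main__":
    df1, df2 = greenhouse_geisser_df(2, 166, 0.81)
    print(round(df1, 2), round(df2, 2))
```

Under these assumptions the corrected degrees of freedom would be approximately 1.62 and 134.46.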

## **POWER SUMMARY**

One advantage of this study is the relatively large number of participants who were involved. However, to ensure that the absence of expertise effects was not due to a lack of statistical power, we investigated the size of effects we could be expected to find. All calculations are based on a power criterion of 0.8 and a 0.05 alpha level. For Measure 1: Eriksen Flanker task, one could expect to find significant differences of effect sizes for the interaction of congruency and specialization of 0.31 and of 0.36 for the two-way interaction of congruency with skill and for the three-way interaction of congruency with skill and specialization. For Measure 2: Change Detection Task, effect size of 0.25 could be expected to be identified for interactions of image type with specialization and effect sizes of 0.30 for both the two-way interactions of specialization and skill with image type and the three-way interaction of image type, specialization, and skill.

In all cases, the observed effect sizes for game specialization and skill in each of the measures were considerably smaller than the minimum expected detectable effect size even given our relatively large sample size. The power analyses also highlight that given our sample size we could expect to detect small effect sizes.
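The comparison between observed and detectable effect sizes can be laid out explicitly for the flanker task. The sketch below pairs the observed Cohen's *f* values from the Results with the minimum detectable values from the Power Summary. The pairing assumes that "genre" in the Results corresponds to "specialization" and "expertise" to "skill"; the variable and function names are illustrative.

```python
# Minimum detectable Cohen's f for the flanker task, as stated in the Power Summary.
DETECTABLE_F = {
    "congruence x specialization": 0.31,
    "congruence x skill": 0.36,
    "congruence x skill x specialization": 0.36,
}

# Observed Cohen's f for the same interactions, as reported in the flanker Results
# (assumed mapping: genre = specialization, expertise = skill).
OBSERVED_F = {
    "accuracy": {
        "congruence x specialization": 0.18,
        "congruence x skill": 0.14,
        "congruence x skill x specialization": 0.27,
    },
    "reaction time": {
        "congruence x specialization": 0.08,
        "congruence x skill": 0.14,
        "congruence x skill x specialization": 0.27,
    },
}


def shortfall(observed, detectable=DETECTABLE_F):
    """Return detectable minus observed f for each interaction."""
    return {name: round(detectable[name] - f, 2) for name, f in observed.items()}


if __name__ == "__main__":
    for measure, observed in OBSERVED_F.items():
        print(measure, shortfall(observed))
```

A positive difference means the observed effect fell below the size the design was powered to detect.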

## **DISCUSSION**

One of the major conclusions of research into learning and expertise is that transfer from one domain to another is rare and difficult, and happens only when the two domains share components that ask for the same cognitive skills. In recent years, a series of experiments on action video game playing have found that playing this kind of game leads to substantial transfer, in particular with tasks engaging attentional processing. The aim of this study was to replicate this phenomenon with two tasks that had previously been used in the action video-game literature. In addition, the study aimed to use a finer measure of expertise than had been done in the past, and to look at the extent to which skills acquired in a specific VG genre (action or strategy) can be used in a task using material linked to either of the two genres.

In neither task were we able to find any effect of skill or a superiority of the VGPs when compared to the nVGPs. Thus, our study did not support the hypothesis of far transfer, in line with most theories of expertise but in contradiction with previous VG studies. Our failure to replicate previous results cannot be ascribed to a lack of power, as the number of participants (*n* = 92) was high for this kind of study and our design, incorporating different levels of skill, was in principle able to identify subtle skill effects that cannot be found when only two groups are compared (VGPs vs. nVGPs). In addition, the results we obtained in each task were consistent with the results normally obtained in these tasks. For example, we found significantly faster reaction times and a higher percentage of correct trials for the congruent trials compared to the non-congruent trials in the flanker task.

For the flanker task, despite strong congruency effects, the absence of a significantly different interference effect for VGPs vs. nVGPs is not entirely unexpected, despite our predictions to the contrary. We argued, based on previous work, that we might expect a small difference (Green and Bavelier, 2003), especially given our sample size and a finer division of game expertise than previously used. However, this was not the case and, although the null hypothesis cannot be accepted, the absence of a significant interaction of game expertise and congruency does add to a number of results showing that, at low loads, there is little difference between the performance of VGPs and nVGPs on a flanker task (Green and Bavelier, 2003; Cain et al., 2012).

We followed Boot et al.'s (2011) advice of pre-screening participants long before the experiments *per se*, and of asking them to fill in the questionnaire on video game activities at the end of the study. Thus, our procedure minimized demand characteristics. Together with the fact that other studies have failed to find a VGP effect (e.g., Castel et al., 2005; Boot et al., 2008; Murphy and Spencer, 2009), our results are consistent with the possibility that such demands might have played a role in previous research showing VGPs' superiority. However, given that this conclusion is based on null results, further research is needed to validate or invalidate this hypothesis.

Most unexpected was perhaps the failure to find an expertise effect in the change detection task. Near transfer did not occur in spite of the fact that the material used came from the players' domain of expertise (either action or strategy players) and that we based the stimulus choice on the most popular games within each genre, which should have ensured a high level of familiarity. Why is it that the patterns that the players had presumably acquired by playing their favorite game did not enable them to find changes in the stimuli more rapidly? One possible explanation is that the change detection paradigm is unsuitable for detecting domain-specific patterns used for unconscious pattern recognition. An examination of the mean reaction times shows that the average time spent on task was very long (about 20 s on average in the Starcraft condition) and thus is likely to engage more conscious mechanisms. This explanation gains further plausibility given that Gobet et al. (in preparation) did find an interaction between specialism and expertise in a recognition task using a design similar to that used here: action players specializing in *Call of Duty* and racing players specializing in *Gran Turismo* performed better when dealing with images from their own game.

Our study was not without weaknesses. The measure of videogame expertise and the allocation to a specific genre were based on self-reports, and perhaps it would have been desirable (albeit impractical) to ask players to play segments of their favorite game to estimate their level. With regard to obtaining a pure measure of the benefits of a specific video game genre, a study that concentrates on training effects in nVGPs (e.g., Green and Bavelier, 2003) may be better suited to detect subtle changes in performance. A comparison of preferred game type may be expected to find subtle differences between players of different games, but only when those players play a single type of game. Anecdotally, many game players are "poly-gamers" and will play other game types in addition to their preferred category. A training study in which nVGPs gain experience playing only a single genre of game would therefore be better placed to detect subtle differences between the expertise benefits of individual game types. A similar argument can be made for examining potential gender differences. The sample here did not allow for a meaningful investigation of potential gender differences and, for the same reasons previously argued to account for poly-gamers, a training study would be ideal. Finally, there were few trials in the change detection task.

In the last year several professional associations and journals have emphasized the need for more replications. However, in spite of previous calls (e.g., Gobet et al., 2004), and partly due to the difficulty of finding experts, research into expertise is rarely replicated. The current paper contributes to this effort of obtaining more robust empirical data.

#### **REFERENCES**


of visual search. *Acta Psychol.* 119, 217–230. doi: 10.1016/j.actpsy.2005.02.004


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2014; accepted: 03 November 2014; published online: 28 November 2014.*

*Citation: Gobet F, Johnston SJ, Ferrufino G, Johnston M, Jones MB, Molyneux A, Terzis A and Weeden L (2014) "No level up!": no effects of video game specialization and expertise on cognitive performance. Front. Psychol. 5:1337. doi: 10.3389/fpsyg.2014.01337*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Gobet, Johnston, Ferrufino, Johnston, Jones, Molyneux, Terzis and Weeden. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Multi-domain computerized cognitive training program improves performance of bookkeeping tasks: a matched-sampling active-controlled trial

## *Amit Lampit<sup>1,2</sup>\*, Claus Ebster<sup>2,3</sup> and Michael Valenzuela<sup>1</sup>*

*<sup>1</sup> Regenerative Neuroscience Group, Brain and Mind Research Institute, University of Sydney, Sydney, NSW, Australia*

*<sup>2</sup> Lauder Business School, Vienna, Austria*

*<sup>3</sup> Department of Marketing, University of Vienna, Vienna, Austria*

### *Edited by:*

*David Zachary Hambrick, Michigan State University, USA*

#### *Reviewed by:*

*Zach Shipstead, Arizona State University, USA*

*Michael Francis Bunting, University of Maryland Center for Advanced Study of Language, USA*

#### *\*Correspondence:*

*Amit Lampit, Regenerative Neuroscience Group, Brain and Mind Research Institute, University of Sydney, 94 Mallett St., Camperdown, NSW 2050, Australia*

*e-mail: amit.lampit@sydney.edu.au*

Cognitive skills are important predictors of job performance, but the extent to which computerized cognitive training (CCT) can improve job performance in healthy adults is unclear. We report, for the first time, that a CCT program aimed at attention, memory, reasoning and visuo-spatial abilities can enhance productivity in healthy younger adults on bookkeeping tasks with high relevance to real-world job performance. Forty-four business students (77.3% female, mean age 21.4 ± 2.6 years) were assigned to either (a) 20 h of CCT, or (b) 20 h of computerized arithmetic training (active control) by a matched sampling procedure. Both interventions were conducted over a period of 6 weeks, with 3–4 1-h sessions per week. Transfer of skills to performance on a 60-min paper-based bookkeeping task was measured at three time points: baseline, after 10 h and after 20 h of training. Repeated measures ANOVA found a significant Group X Time effect on productivity (*F* = 7.033, *df* = 1.745; 73.273, *p* = 0.003) with a significant interaction at both the 10-h (Relative Cohen's effect size = 0.38, *p* = 0.014) and 20-h time points (Relative Cohen's effect size = 0.40, *p* = 0.003). No significant effects were found on accuracy or on Conners' Continuous Performance Test, a measure of sustained attention. The results are discussed in reference to previous findings on the relationship between brain plasticity and job performance. Generalization of results requires further study.

**Keywords: cognitive training, bookkeeping, young adults, job performance, far transfer**

## **INTRODUCTION**

Cognitive abilities are among the most significant predictors of future job performance (Schmidt, 2002). The question therefore arises as to whether interventions that augment cognitive and psychomotor skills can improve work-related outcomes (Arthur et al., 2003; Academy of Medical Sciences, 2012). Computerized cognitive training (CCT), which is particularly effective in clinical groups and the elderly, is one method for improving cognitive performance (Buschert et al., 2010; Vinogradov et al., 2012). A recent report by leading British scientific societies cited CCT as a potentially effective intervention to improve workplace performance, stressing the importance of examining the transferability of CCT to vocational tasks (Academy of Medical Sciences, 2012). Previous studies have reported positive effects of CCT on tasks with high psychomotor demands such as flight (Hart and Battiste, 1992; Gopher et al., 1994), driving (Roenker et al., 2003; Cassavaugh and Kramer, 2009; Pradhan et al., 2011) and laparoscopic surgery (Schlickum et al., 2009; Adams et al., 2012), as well as enhanced employment outcomes in schizophrenic patients (Vauth et al., 2005b; McGurk et al., 2007; Bell et al., 2008; Lindenmayer et al., 2008; McGurk et al., 2009). Conversely, a recent RCT conducted by our group (Borness et al., 2013) did not find any effect of an online CCT program on job performance in a large cohort of white collar employees of an Australian public sector organization. Thus, the extent to which CCT can augment performance of middle-skill office tasks in young healthy adults remains unclear. Here, we use a classic example of a mid-level skilled occupational task, bookkeeping, to explore the effectiveness of a cognitive training program on work-related task performance.

The rapid computerization of accounting practice has transformed the nature of contemporary bookkeeping into repetitive translation of transactions from natural language into an accountancy software system (Bhaskar et al., 1983; Cooper and Taylor, 2000). The objective of basic bookkeeping tasks is to analyze and organize newly acquired data according to the principles of double-entry bookkeeping and subsequently transpose these into journal entries. The objectivity of these tasks, their prevalence in accounting education, and their relatively low dependency on external factors (e.g., trends in consumer behavior) make bookkeeping a convenient and occupationally relevant work-related outcome when studying skill acquisition and the effect of cognitive factors (Dillard et al., 1982).

An accurate entry of transactions into an accounting system is a bookkeeper's key performance indicator; therefore, our measure of productivity was based on the number of transactions correctly transposed. Both speed and accuracy of work are important, as speed (transactions per unit of time) determines the maximum number of transactions that a worker can execute while accuracy rate (the ratio of correct entries out of total transactions) reflects the relative frequency of errors. Theoretically, improvements in productivity could result from an increase in speed and/or accuracy. However, in practice, increases in task speed usually decrease accuracy, a phenomenon known as the speed-accuracy tradeoff (Wickelgren, 1977). When evaluating the effectiveness of any CCT program on work-related productivity, the potential influence of such a tradeoff must therefore be analyzed (Förster et al., 2003).

Since a CCT intervention typically involves many procedural features beyond the CCT program itself, which include trainer contact, socialization, motivational prompts, increased stimulation, and the traditional Hawthorne expectancy bias and retest effect, it is vital to compare the putative effects of CCT on productivity outcomes with respect to an intensity-matched active control (AC) condition to make valid inferences (Jacoby and Ahissar, 2013). Yet, of the thirteen trials of CCT in vocational settings mentioned above, seven (Gopher et al., 1994; Vauth et al., 2005a; McGurk et al., 2007, 2009; Bell et al., 2008; Cassavaugh and Kramer, 2009; Pradhan et al., 2011) did not include an AC arm. We therefore aimed to assess whether CCT can increase the speed or accuracy of work-related task productivity over and above any effects seen in an active control condition.

## **MATERIALS AND METHODS**

#### **RESEARCH DESIGN**

This study was a matched-sampling, double-blind, repeated-measures, active-control trial. Subjects were allocated to either (1) a 20-h CCT arm or (2) an active control arm matched in time and intensity. Performance on a 60-min paper-based bookkeeping task (see below) was measured three times: at baseline, after 10 h of training, and after 20 h of training. All procedures took place in a university classroom converted into a training lab. The study was approved by the Lauder Business School Ethics Committee.

#### **PARTICIPANTS AND SAMPLING PROCEDURE**

In January 2009, all students at Lauder Business School (*n* = 242), an English-language teaching business school in Vienna, Austria, were invited by email to participate in the study. None of the participants spoke English as a first language. Inclusion criteria were (1) successful completion of at least one semester in accounting, (2) lack of any major neurological or psychiatric disorders, and (3) active participation in a scholarship program offered by the school. These criteria ensured an adequate understanding of basic accounting principles, including the posting task used in this study, and enabled compensation to be offered to volunteers as outlined below. All candidates gave written informed consent prior to the baseline assessment.

Of the 44 participants who completed the study, 34 (77.3%) were female and 10 were male. Four subjects (9%) were left-handed. The average age was 21.4 ± 2.6 years. Fourteen participants (31.8%) were 2nd semester students, 15 (34.1%) were 4th semester students, 14 (31.8%) were 6th semester students, and 1 (2.3%) was an 8th semester student. Subjects were compensated with one "credit" (which provides approximately €40 worth of dining and housing from the scholarship program) for every hour of participation. The total number of credits was 25, equivalent to about €1000, provided in-kind by the school. Compensation was offered for participation (in hours) rather than on a pay-per-performance basis. To ensure ethical compensation, participants who left the study prematurely were compensated based on the number of hours performed.

Due to the considerable attentional demand of CCT and documented interactions between baseline cognitive ability and response to CCT (White and Shah, 2006), we aimed to distribute attentional skill equally between our two groups. The Conners' Continuous Performance Test (Conners, 2000) was administered to assess sustained attention. Two scores derived from the test, namely the confidence index and response time variability (Response Time Block Change, RTBC), were used. A lower confidence index indicates better overall attention skills, as does RTBC as it approaches zero (Homack and Riccio, 2006). Subjects with the same confidence index were graded according to the RTBC score and then divided into pairs.
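The matching step described above can be sketched as follows. This is an illustrative reading of the procedure and not the authors' actual allocation code: the source does not state how members of a pair were assigned to arms, so random assignment within each pair is an assumption, and the function name, tuple layout and sample scores are hypothetical.

```python
import random


def matched_pair_allocation(participants, seed=0):
    """Allocate participants to CCT and AC arms in matched pairs.

    participants: list of (participant_id, confidence_score, rtbc) tuples.
    """
    rng = random.Random(seed)
    # Rank by the confidence score (lower is better), then by |RTBC|
    # (closer to zero is better), so that neighbours are well matched.
    ranked = sorted(participants, key=lambda p: (p[1], abs(p[2])))
    groups = {"CCT": [], "AC": []}
    # Walk through the ranking two at a time; an unpaired last
    # participant (odd sample size) is left unallocated in this sketch.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # assumption: random assignment within a pair
        groups["CCT"].append(pair[0][0])
        groups["AC"].append(pair[1][0])
    return groups


# Hypothetical scores, for illustration only.
sample = [
    ("s1", 40.0, 0.02), ("s2", 40.0, -0.01),
    ("s3", 55.0, 0.00), ("s4", 55.0, 0.05),
    ("s5", 62.0, -0.03), ("s6", 62.0, 0.01),
]
allocation = matched_pair_allocation(sample)
print(allocation)
```

Because each pair contributes one member to each arm, the two groups end up the same size and with closely matched attention scores.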

## **MEASURES**

#### *Primary outcome measure: performance on a bookkeeping task*

Performance on a repetitive paper-based posting task was the primary outcome measure. The task was based on basic rules of double-entry bookkeeping (Bhaskar et al., 1983) and adapted from didactic tasks included within the students' curriculum (Kieso et al., 2004). Task materials included: (1) A journal of 300 randomly generated transactions, based on 25 different cash and accruals transactions (i.e., payables and receivables) with a sum between 1 and 500; (2) A blank general ledger containing 14 accounts (assets, liability, equity, income, and expenses), with a summary line every 37 lines of data; and (3) An instruction sheet containing the correct debit and credit postings for each of the 25 types of transactions, which was designed to help participants when they were not sure how to handle a transaction. Examples of the materials used for measuring bookkeeping performance are provided in the Supplementary Material.

Three 60-min bookkeeping test sessions were administered, at baseline (pre-training, T1), after 10 h of training (T2) and after 20 h of training (T3). Every test session started with 3 practice tasks to ensure understanding. Participants were asked to work as quickly and as accurately as possible. A performance goal of 200 transactions per hour (based on a pilot study) was defined to facilitate self-regulation (Kozlowski and Bell, 2006). A blinded research assistant recorded the number of correct debit and credit posting entries in the corresponding row according to the rules provided (productivity) and total entries (to calculate accuracy rate). Only rows in which both postings (debit and credit) were entered according to the rules were considered as correct.
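The scoring rule described above can be expressed as a short sketch. The data layout, account names and function name below are hypothetical and serve only to illustrate the rule that a row counts as correct only when both the debit and the credit posting match the instruction sheet.

```python
def score_session(rows, answer_key):
    """Score one 60-min test session.

    rows: list of (transaction_type, debit_account, credit_account) as entered.
    answer_key: dict mapping transaction_type -> (debit_account, credit_account).
    """
    total = len(rows)
    # A row is correct only if BOTH postings match the instruction sheet.
    correct = sum(
        1 for kind, debit, credit in rows
        if answer_key.get(kind) == (debit, credit)
    )
    accuracy = correct / total if total else 0.0
    return {
        "productivity": correct,    # number of correct entries in the session
        "total_entries": total,
        "accuracy_rate": accuracy,  # correct entries / total entries
    }


# Hypothetical answer key and entries, for illustration only.
key = {
    "cash_sale": ("Cash", "Income"),
    "rent_paid": ("Expenses", "Cash"),
}
entries = [
    ("cash_sale", "Cash", "Income"),    # both postings correct
    ("rent_paid", "Expenses", "Cash"),  # both postings correct
    ("rent_paid", "Cash", "Expenses"),  # debit and credit swapped
    ("cash_sale", "Cash", "Equity"),    # wrong credit account
]
result = score_session(entries, key)
print(result)
```

Keeping productivity (a count) separate from accuracy rate (a proportion) is what allows a speed-accuracy tradeoff to be checked: a gain in the count with a stable proportion indicates faster work without more errors.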

#### *Secondary measure: sustained attention*

In addition to its role in the matched-sampling process, the Conners' Continuous Performance Test (Conners, 2000) was used to detect potential intervention effects on participants' attention skills. The test was administered at baseline and post-training. The confidence index, RTBC and Overall (i.e., average) Hit Reaction Time measures were used (with a lower score indicating better performance).

## **INTERVENTIONS**

Both the experimental (CCT) and active control (AC) groups participated in 3–4 1-h training sessions weekly for 6 weeks; the total duration of training in both groups was 20 h. Supervised training took place in a computer lab with up to six subjects per session. Supervision aimed to enhance motivation and meta-cognition, according to studies of instruction in cognitive training (Gopher et al., 1994; Salas and Burke, 2002; Sandford, 2003; Leemkuil and De Hoog, 2005). The training software played sounds through standard speakers, producing a degree of auditory distraction for all participants and increasing selective attentional demands.

## *Computerized cognitive training (CCT)*

CCT was based on the commercially available cognitive training software "Captain's Log" (BrainTrain Inc, Richmond, VA). Captain's Log has a history of use in clinical populations such as adults diagnosed with traumatic brain injury (Tinius and Tinius, 2000; Stathopoulou and Lubar, 2004), schizophrenia (Bellucci et al., 2003), and chronic psychiatric disorders (Burda et al., 1994) as well as children with attention difficulties (Rabiner et al., 2010) and older adults (Eckroth-Bucher and Siberski, 2009).

An adaptive and personalized training program was set using the software's Personal Trainer Wizard, which adjusts training difficulty and content according to predefined settings as well as individual performance. Visual and auditory distractions were set at 10% of cases for each type. Seven exercises from the Conceptual Memory Skills module (The Ugly Duckling, Happy Trails, Total Recall, Domino Dynamite, Tower Power, Max's Match, and What's Next) and five exercises from the Numerical Concepts/Memory Skills modules (Bits and Pieces, Match Maker, City Lights, Counting Critters, and Happy Hunter) were used. **Table 1** describes the cognitive skills trained by each exercise. A detailed description of each of the exercises is provided in the Supplementary Material.

## *Active control (AC)*

Several studies suggest that accounting is associated with strong arithmetic skills, a view that is more likely based on accountancy stereotypes than on actual demands of the profession (Wells and Fieger, 2006). We therefore chose arithmetic training as our AC condition to control fully for non-specific training effects and to keep subjects blind to the training condition of interest. AC training was based on Maths Trainer, a commercially available computerized arithmetic training program (Oak Systems, Binstead, UK). The program entails 24 different arithmetic exercises (addition, subtraction, multiplication, division, fractions, percentage, number sequence, rounding), as well as daily tests, kakuro and sudoku. The level of complexity rises as the user progresses along the training program. Since a typical training session on Maths Trainer lasts about 45 min, subjects in the AC group also played Math Ninja, a freeware computerized calculation game published by Piotr J. Walczack, for about 10 more minutes during each session in order to match the duration of CCT sessions.

## **ANALYSIS**

Analyses were performed using SPSS 20. Time X Group effects were tested using repeated measures ANOVA. Where the assumption of sphericity was violated, the Greenhouse-Geisser correction was applied to degrees of freedom. Within-group and relative effect sizes were calculated using Cohen's *d* at T2 and T3, using difference from baseline and pooled standard deviation for each time point.
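One plausible reading of the relative effect size is the between-group difference in change from baseline, divided by the standard deviation pooled across the two groups at the follow-up time point. The sketch below implements that reading. The exact formula is not given in the text, so this is an assumption, and the function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    return sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))


def relative_cohens_d(cct_base, cct_follow, ac_base, ac_follow):
    """Difference in mean change from baseline (CCT minus AC), scaled by
    the pooled SD at follow-up (assumed definition, see lead-in)."""
    cct_change = mean(cct_follow) - mean(cct_base)
    ac_change = mean(ac_follow) - mean(ac_base)
    return (cct_change - ac_change) / pooled_sd(cct_follow, ac_follow)


# Hypothetical productivity scores (correct entries per session).
cct_t1, cct_t3 = [100, 110, 120], [130, 140, 150]
ac_t1, ac_t3 = [100, 110, 120], [110, 120, 130]
d = relative_cohens_d(cct_t1, cct_t3, ac_t1, ac_t3)
print(d)
```

Subtracting the active control group's change removes gains that both groups share, such as retest effects, from the estimate.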

## **RESULTS**

### **RECRUITMENT AND PARTICIPANT FLOW**

Participant flow is depicted in **Figure 1**. Overall, 62 students expressed interest in participating. Fifty-seven of them were screened for eligibility and were administered the baseline Continuous Performance Test; 48 attended an introductory lecture and completed the baseline bookkeeping assessment. Subsequently, they were assigned to groups. Forty-four completed the first and second follow-up sessions. Of the four participants who left the study following group assignment (two from each group), three withdrew from the study and one was expelled from the university for reasons unrelated to the study.

*AA, Alternating Attention; CPS, Central Processing Speed; CR, Conceptual Reasoning; DA, Divided Attention; FA, Focused Attention; FMC, Fine Motor Control; GA, General Attention; IM, Immediate Memory; SLA, Selective Attention; VP, Visual Perception; VPS, Visual Processing Speed; VS, Visual Scanning; VSC, Visuospatial Classification; VSQ, Visuospatial Sequencing; VT, Visual Tracking; WM, Working Memory. Detailed description of the exercises can be found in the Supplementary Material.*

## **BASELINE DATA**

Descriptive statistics and baseline performance are provided in **Table 2**. Two-sample tests revealed no significant differences between the treatment groups in age, gender, Continuous Performance Test scores, productivity, and accuracy. Pearson zero-order correlation analysis among the study variables at baseline found a positive correlation between age and attentional capacity, as indicated by a lower Continuous Performance Test confidence index, which implies lower probability of attention disorders (*r* = −0.305, *p* < 0.05). Males performed worse than did females on this measure (*t* = −2.82, *p* = 0.007). Students who had completed more years of their university degree also performed better in baseline bookkeeping (*r* = 0.446, *p* < 0.01). There was no relationship between Continuous Performance Test scores and bookkeeping performance.

## **PRIMARY OUTCOMES**

#### **Table 2 | Baseline measures.**

*<sup>a</sup>Percentage of females.*

*<sup>b</sup>Pearson χ<sup>2</sup>.*

*<sup>c</sup>No. of correct entries in a 60-min session.*

*CI, Confidence index; RTBC, Response time block change.*

**Table 3** reports changes in the outcome variables across the three time points. Repeated measures ANOVA found a significant Group X Time interaction effect on productivity, as defined by the number of correct entries in a 60-min test session (*F* = 7.033, *df* = 1.745; 73.273, *p* = 0.003), at both the 10-h (Cohen's effect size = 0.38, *p* = 0.014) and 20-h time points (Cohen's effect size = 0.40, *p* = 0.003; see **Figure 2**). No significant effects were found for accuracy, although slight improvements of accuracy rates were noted in both the CCT and the AC group (2.61 and 4.69%, respectively).

The results indicated strong correlations between performance in these two time points for both productivity (*r* = 0.65, *p* < 0.01) and accuracy (*r* = 0.707, *p* < 0.01). Whilst the speed of correct entries increased in the CCT group as a result of our training (i.e., increased productivity), we observed no evidence for a corresponding drop in accuracy, as accuracy rates at baseline (91.65%) were similar to accuracy at both T2 (95.58%) and T3 (94.32%) (repeated measures *F* = 0.225, *p* = 0.638).
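The fractional degrees of freedom reported for the Group X Time test on productivity are what a Greenhouse-Geisser correction would produce. The sketch below reconstructs them under stated assumptions: the nominal degrees of freedom are derived from 44 completers, two groups and three time points, and the value of epsilon is not reported in the text, so it is inferred here from the reported effect *df*. The function name is illustrative.

```python
def greenhouse_geisser_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom by the sphericity estimate epsilon."""
    return epsilon * df_effect, epsilon * df_error


n_participants, n_groups, n_timepoints = 44, 2, 3

# Nominal df for the Time x Group interaction in a mixed design (assumed).
df_effect = (n_timepoints - 1) * (n_groups - 1)        # 2
df_error = (n_timepoints - 1) * (n_participants - n_groups)  # 84

# Epsilon is inferred from the reported effect df, not taken from the text.
epsilon = 1.745 / df_effect

corrected_effect, corrected_error = greenhouse_geisser_df(
    df_effect, df_error, epsilon
)
print(round(epsilon, 4), round(corrected_effect, 3), round(corrected_error, 2))
```

Under these assumptions the corrected error *df* comes out within rounding distance of the reported 73.273, which suggests the reported values are internally consistent.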

### **SECONDARY OUTCOMES**

The results indicated no significant TIME X GROUP interaction on any of the Continuous Performance Test measures: Confidence Index (*F* = 3.665, *p* = 0.62), Response Time Block Change (*F* = 0.663, *p* = 0.62) or Overall Hit Reaction Time (*F* = 0.527, *p* = 0.47; see **Table 3**).

## **DISCUSSION**

Participation in 10 and 20 h of CCT produced significant relative Cohen's effect sizes (i.e., after accounting for active control training effects) of 0.38 and 0.40 on an untrained bookkeeping task. Because the active control condition included all non-specific factors inherent to training, such as socialization, supervisor interaction, motivational factors, as well as to some extent cognitive stimulation, the observed effects are unique to CCT and cannot be explained by test-retest and Hawthorne effects. Moreover, since performance on similar bookkeeping tasks is generally predictive of actual bookkeepers' proficiency (Bhaskar et al., 1983), it is reasonable to hypothesize that this outcome may generalize to real-life job performance in the bookkeeping sector. As CCT may represent a cost-effective type of on-the-job training for promoting workforce productivity (Academy of Medical Sciences, 2012), replication of this study in the workplace environment is critical.

GROUP differences were observed at T2 and T3.

**Table 3 | Estimated marginal means, change rates and summary statistics for outcome variables by training group and assessment time.**

*<sup>a</sup>From baseline.*

*<sup>b</sup>Negative scores mean improvement.*

*<sup>c</sup>Scores closer to zero represent higher attentional performance.*

*The results in bold highlight significant TIME X GROUP interactions. CCPT, Conners' Continuous Performance Test; CI, Confidence index; RTBC, Response time block change; RT, Response time.*

We sought to determine whether changes in measures of attention accompanied and potentially explained our observed bookkeeping effects. None of the Continuous Performance Test measures were correlated with either baseline bookkeeping performance or changes thereof, and CCT produced no discernible change in any of the test measures. One interpretation of this result is that sustained attention and inhibitory control have no ecological relevance to bookkeeping task performance in healthy young adults. Alternatively, a more likely explanation for the lack of effect on response inhibition is that our cognitive training program did not specifically target this domain, and so the resulting improvement in bookkeeping skills may be attributed to other cognitive domains. Similarly, we did not observe any difference in the Continuous Performance Test's response time measure, whilst participants in both groups clearly improved their information processing speed on the bookkeeping task.

Therefore, whilst the results indicate that CCT can enhance performance on the target bookkeeping task, the nature of the underlying cognitive mediators remains unclear. Lack of additional cognitive measures to probe for these mechanisms is therefore a major limitation of this study, and future studies should complement functional outcomes with a wider neuropsychological battery. What cognitive changes could theoretically mediate the observed improvement in bookkeeping performance? Previous task analyses in the field of bookkeeping (Dillard et al., 1982; Bhaskar et al., 1983; Dillard, 1984) point to several cognitive processes that may be taxed by bookkeeping tasks and thus underpin skill acquisition. Typically, bookkeeping involves integrating new information (a particular transaction) with previous knowledge of how to categorize this information within the bookkeeping set of accounts. Working memory and rule-based decision-making may therefore play an important role in bookkeepers' ability to quickly and accurately classify and record transactions (Dillard, 1984). It is also clear that participants in the CCT group spent less time on each individual transaction compared to controls, suggesting a CCT-induced improvement in information processing speed. Indeed, of the 12 exercises in our CCT regimen, 8 (67%) had a working memory component, 7 (58%) had a processing speed component, and all had a conceptual reasoning component at an increasing rate of difficulty (see **Table 1** and Supplementary Material for descriptions of the exercises and the cognitive skills trained by each). However, transfer to these domains was not formally tested and remains an open area for further research.

Our study has several other limitations. First, the sample size was quite small and warrants replication. Second, the results did not identify cognitive skills that are essential to bookkeeping performance or factors (e.g., age, experience, and compensation) that may predict response to training. Third, the subjects were a highly educated, young, and relatively restricted university-based volunteer cohort, unlike the heterogeneous workforce. Subjects were also highly motivated to complete the training (and did so) for secondary gains. Whether workers in a busy work environment facing multiple time and productivity pressures would respond similarly is debatable, and the dollar value of CCT to organizations is unknown.

To that end, we have recently reported negative results from a larger trial of CCT in organizational settings (Borness et al., 2013). This trial was based on short (15–20 min) self-administered sessions, whereas the current study provided 60-min group-based, supervised sessions. Supervision and support may therefore be crucial for training success (Salas and Burke, 2002; Sandford, 2003; Medalia and Richardson, 2005), and were associated with greater effects in healthy older adults (Kelly et al., 2014). Other differences between these two studies, most notably the use of distinct CCT programs and the choice of productivity-related endpoints, emphasize the need for more specific protocols in this area.

Improving workforce cognition along the life span may be a key factor in maintaining economic prosperity in the context of ageing populations and growing weight of cognitively demanding tasks in the contemporary workplace. Though limited in scope, our laboratory-based study suggests that supervised CCT can help improve performance on a work-related task such as bookkeeping. Field-based research in workplace settings is now required to test the idea that CCT can boost workers' cognitive abilities and translate to enhanced real-world occupational outcomes such as work productivity.

## **ACKNOWLEDGMENTS**

Amit Lampit is supported by the Dreikurs Bequest. Michael Valenzuela is a National Health and Medical Research Council of Australia research fellow (ID 1004156), and received funding from the Brain Department LLC as well as in-kind support from BrainTrain Inc for projects unrelated to this study. We thank the study participants as well as Silvia Kucera, Martin Samek, Alla Fayvishenko, and Eva Hammerschmid for their help in conducting the study.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.00794/abstract

## **REFERENCES**


**Conflict of Interest Statement:** Dr. Valenzuela received funding from the Brain Department LLC as well as in-kind support from BrainTrain Inc for projects unrelated to this study.

*Received: 17 February 2014; accepted: 06 July 2014; published online: 28 July 2014. Citation: Lampit A, Ebster C and Valenzuela M (2014) Multi-domain computerized cognitive training program improves performance of bookkeeping tasks: a matched-sampling active-controlled trial. Front. Psychol. 5:794. doi: 10.3389/fpsyg.2014.00794 This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Lampit, Ebster and Valenzuela. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## On the effect of chess training on scholastic achievement

## *William M. Bart\**

*Department of Educational Psychology, University of Minnesota, MN, USA \*Correspondence: bartx001@umn.edu*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

*Reviewed by:*

*Philippe Chassy, Liverpool Hope University, UK*

**Keywords: chess, cognition, education, scholastic achievement, expertise**

What are the effects of chess training especially on scholastic achievement among school-aged students? Can chess instruction facilitate the acquisition of scholastic competency? The current state of the research literature is that chess training tends not to provide educational benefits. This article provides a critical review of research on the effects of chess training on the scholastic achievement levels of school-aged students.

## **EDUCATIONAL BENEFITS OF CHESS**

Various studies and conference presentations (e.g., Christiaen and Verholfstadt, 1978; Liptrap, 1998; Bart and Atherton, 2004) provided results in support of the educational benefits of chess instruction in the schools. Gobet and Campitelli (2006) reviewed that research and reached the following conclusions: (a) the educational effects of optional scholastic chess training remain undetermined; (b) compulsory scholastic chess instruction may engender motivational problems among students; and (c) chess instruction may be beneficial among novices, but is less important among intermediate and advanced players for whom the amount of practice and the acquisition of knowledge are of paramount importance. Gobet et al. (2004) contended that such conclusions are in line with the view of de Groot (1977, 1978) that educational benefits of chess instruction are likely "low-level gains" such as improvements in attention and concentration and interest in learning, rather than "high-level gains" such as improvements in intelligence, scholastic achievement, and creativity.

Additional research is supportive of de Groot's view. Bilalić et al. (2007) determined that intelligence explained a smaller amount of the variance in chess skill among competent young chess players than the amount of practice time. Waters et al. (2002) also found little support for a relationship between intelligence and chess skill.

Contrary to the conclusion of Gobet and Campitelli (2006) that chess instruction provides very modest educational benefits, if any, other research attests to the benefits of chess training. For example, Smith and Cage (2000) reported the effects of 120 h of chess instruction on mathematics achievement among rural, African-American secondary school students in northern Louisiana. They determined that the treatment group, composed of 11 females and 10 males, scored significantly higher in mathematics achievement and non-verbal cognitive ability than the control group, composed of 10 females and 10 males, after controlling for differences among pretest scores.

In a more recent study, Aciego et al. (2012) used a quasi-experimental design to examine the cognitive effects of chess training. The experimental group consisted of 170 students, 6–16 years of age, who received extracurricular chess instruction. The comparison group consisted of 40 students in a similar age range who took part in extracurricular sports (soccer or basketball) activities. The dependent variables were scores on the Wechsler Intelligence Scale for Children (WISC-R) and a record completed by the tutor-teacher to measure problem solving.

After adjusting for pretest scores, the chess group registered significantly higher posttest scores than the sports group for five of nine WISC-R subtests—i.e., the Similarities, Digit Span, Block Design, Object Assembly, and Mazes subtests. The chess group also registered significantly higher posttest scores in problem solving capacity than the sports group. The authors concluded that chess is a "valuable educational tool" (p. 558).

In another recent study Kazemi et al. (2012) examined the cognitive effects of chess play. They employed an experimental group composed of 86 randomly selected school-aged students, who received chess instruction for six months, and a control group of 94 randomly selected school-aged students. All participants were male and from 5th, 8th, and 9th grades from schools in Shanandaj in western Iran. All participants were administered a measure of metacognitive ability and a grade-appropriate mathematics exam prior to and after the intervention.

The chess group participants registered significantly higher posttest metacognitive ability scores and higher posttest mathematics test scores than the non-chess group participants. A major conclusion of the study is that chess instruction improves significantly the mathematical abilities and the metacognitive capacities of school-aged students.

In a third study, Trinchero (2013) examined the effects of chess instruction on the mathematical ability of primary school students. His study involved 568 primary school children in Italy placed in four groups: (1) experimental, (2) control, (3) experimental without a pretest, and (4) control without a pretest. The experimental group received chess training in addition to ordinary class lessons. The control group only received ordinary class lessons. One prominent result was that the experimental group that received chess training registered a modest but statistically significant increase in scores on mathematics test items that required problem-solving skills on complex tasks. That effect was greater among students who had more hours of chess instruction.

These last four studies lend support to the view that chess training has positive cognitive effects on regular school-aged students. In addition, there are some studies that address the issue of cognitive effects of chess training on school-aged students with disabilities.

Scholz et al. (2008) investigated the effects of chess training on mathematics learning among students with learning disabilities based on intelligence scores in the 70–85 IQ range. School classes from four elementary schools in Germany were randomly assigned to two groups: (a) an experimental group that received chess instruction of one hour per week for one entire school year; and (b) a comparison group that received supplementary mathematics instruction of one hour per week. The two groups did significantly differ in their calculation abilities for simple addition tasks and counting. The authors concluded "chess could be a valuable learning aid for children with learning disabilities" (p. 138).

In a second study Barrett and Fish (2011) investigated the cognitive effects of a 30-week chess-training program within mathematics classes for students in special education in a middle school in southwestern United States. All participants qualified for special education services and were in either 6th, 7th, or 8th grades. A sample of 31 participants were randomly placed into two groups: (a) an experimental group composed of 15 students who received the chess instruction along with a sizable portion of the regular instruction in resource mathematics specially designed for students in special education; and (b) a comparison group composed of 16 students who received all of the regular mathematics instruction in resource mathematics rather than any chess instruction. The dependent variables for this study were scores on the mathematics Texas Assessment of Knowledge and Skills (TAKS) that is a standardized test of mathematics competencies for pre-collegiate students in Texas, and end-of-year course grades in resource mathematics. All participants completed a version of the mathematics TAKS that was modified for special education students.

This study had some interesting results. First, there was a significant relationship between chess instruction and end-of-year grades. Second, there was a statistically significant relationship between chess instruction and mathematics TAKS test scores.

This study provided support that chess instruction facilitates transfer of cognitive skills from chess to mathematics for students in special education.

These latter two studies using special education students indicate that chess instruction has the potential to promote mathematics achievement among students in special education.

In a third study Hong and Bart (2007) examined the cognitive effects of chess instruction on students at risk for academic failure in Korea. The total sample of participants was 38 students from three elementary schools randomly placed into two groups: (a) an experimental group that received 90-min chess lessons weekly for 3 months; and (b) a comparison group that received regular school activities after class.

All participants were tested prior to and after the intervention with the following instruments: (a) the Test of Nonverbal Intelligence—Third Edition (TONI-3) to measure cognitive ability in a language-free manner; and (b) a Chess Skill Rating method using chess software to assess level of chess competency.

TONI-3 posttest scores and chess skill ratings were significantly correlated after controlling for TONI-3 pretest scores. This statistical finding suggests that chess instruction that produces higher chess skill ratings may lead to gains in levels of nonverbal intelligence among students at risk for academic failure.

These studies indicating positive effects of chess training among students with disabilities support the view of Storey (2000) who extolled the use of chess training as a means to promote higher-order thinking skills among disabled students. Storey (2000, p. 47) recommended that teachers consider "chess as an instructional strategy for reinforcing skills such as concentration, problem identification, problem solving, planning strategies, creativity, and lucid thinking."

But why would chess training lead to improvements in scholastic achievement? To play chess well, one must attend to and comprehend chess positions and induce patterns among the pieces, an indication of fluid intelligence and concentration capacity. The chess positions can be very complex with up to 32 pieces from six piece types arrayed on a 64 square board.

One must then formulate and evaluate possible moves, an indication of executive functioning and critical thinking. For example, middle game positions often permit 30 different legal moves at every turn. Ideally, the chess player must evaluate the positions resulting from such moves and select the move that produces the most advantageous position. The player must do so without actually moving any pieces. There are thus substantial demands on visual working memory.

In chess, one must engage in this sequence of (1) position comprehension, (2) pattern induction, and (3) move formulation and evaluation relatively quickly. This coordinated set of cognitive skills required in competent chess play likely transfers to the learning of mathematics and related fields that also often require comprehension, induction, analysis, and evaluation of complex phenomena. This constitutes a theoretical framework why chess training likely has cognitive benefits. This cognitive explanation for the benefits of chess training is compatible with comparable explanations provided by Storey (2000) and Trinchero (2013).

## **CHESS TRAINING, EXPERTISE, AND THE ISSUE OF RESEARCH RIGOR**

The research reported thus far provides evidence that chess training has salutary cognitive and educational effects among school-aged students. However, the argument of Gobet and Campitelli (2006) needs to be considered before we can be confident that chess training is a valid means to improve scholastic achievement levels. To Gobet and Campitelli (2006), rigorous experimental research is needed to determine the extent to which chess training has strong cognitive and educational effects.

However, such rigorous experimental inquiry involving, for example, random placement of participants into experimental and control groups is costly and difficult to implement in a school setting. What is needed is an increase in the quality and quantity of empirical studies to determine the extent to which the acquisition of chess expertise facilitates the acquisition of scholastic expertise among students.

## **REFERENCES**


counterarguments," in *Chess in the Classroom. An answer to NIE,* ed H. Lyman, (Saugus, MA: The Massachusetts Chess Association and the American Chess Foundation), 1–10.


of southern, rural, black secondary students. *Res. Sch.* 7, 19–26.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 06 March 2014; accepted: 29 June 2014; published online: 08 August 2014.*

*Citation: Bart WM (2014) On the effect of chess training on scholastic achievement. Front. Psychol. 5:762. doi: 10.3389/fpsyg.2014.00762*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Bart. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Restricting range restricts conclusions

## *Nemanja Vaci, Bartosz Gula and Merim Bilalić\**

*Department of General Psychology and Cognitive Science, Institute of Psychology, Alpen-Adria University Klagenfurt, Klagenfurt, Austria \*Correspondence: merim.bilalic@aau.at*

#### *Edited by:*

*David Zachary Hambrick, Michigan State University, USA*

#### *Reviewed by:*

*Fred Oswald, Rice University, USA Fernand Gobet, University of Liverpool, UK*

**Keywords: expertise, skill acquisition, chess, Elo rating, gender differences, gerontology, talent**

## **THE EXPERTISE APPROACH AND SKILL ACQUISITION**

Research on expertise is by definition focused on a restricted sample of individuals. Experts are people who consistently produce outstanding performance in their domains (Ericsson, 2006) and as such are without exception located on the positive side of the skill distribution. The usual approach in the study of expertise is to compare the extreme group of the skill distribution, experts, with the extreme group at the other end, that of novices. This contrasting approach, which we have called the "expertise approach" (Bilalić et al., 2010, 2012), has a long tradition (Chase and Simon, 1973; Simon and Chase, 1973; De Groot, 1978; Preacher et al., 2005). Its main advantage over the common approach in cognition, where all participants are at the same skill level, is the presence of a control group of novices that enables falsification of results obtained on experts (Wason, 1960; Kuhn, 1970; Campitelli and Speelman, 2013). In that way, the expertise approach is not unlike the neuropsychological approach that contrasts results obtained on patients with the results of "normal" participants (Shallice, 1988).

The main goal of the expertise approach is to provide evidence relating to the cognitive and neural mechanisms behind processes such as object and pattern recognition, which would be difficult to obtain from subjects who possess approximately the same level of expertise. The skill acquisition process, which is one of the main topics of expertise (William and Harter, 1899), is of secondary importance in the expertise approach. This is understandable, as the contrast between experts and novices captures only the beginning and the end product of the skill acquisition process. It is unrealistic to follow people for the length of time required to achieve expertise in a given domain. However, expertise researchers have recently started employing an archival approach that provides a more complete picture of the skill acquisition process (Charness and Gerchak, 1996; Chabris and Glickman, 2006; Howard, 2008, 2009; Bilalić et al., 2009). In the game of chess, a domain commonly studied in expertise research, there are precise records of all practitioners from an early age (Howard, 2006a; Bilalić et al., 2009). These records include not only personal information such as gender and age, but also skill levels at different stages, numbers of games played, and corresponding skill change. The records provide a wealth of data for investigating the influence of factors such as age, gender, and even talent, on the skill acquisition process. Here we want to draw attention to the fact that some of the databases used in previous research only provide records of the very best practitioners. In the expertise approach, such restriction is an integral part of the methodology, but restricting the range of the population in the archival approach could have grave consequences for the conclusions about the nature of skill acquisition.

## **DIFFERENT DATABASES, DIFFERENT CONCLUSIONS**

One of the advantages of chess as a domain is that there is an objective and reliable measure of skill. Skill is measured on an interval scale that reflects the performance of a player against other players. The Elo rating, named after Arpad Elo who introduced the scale as a measure of chess skill (Elo, 1978), is measured in the same way all over the world. A beginner is supposed to have 600–800 Elo points, average players about 1500 Elo points, master players above 2200 Elo points, while the very best players, called grandmasters, have ratings above 2500 Elo points. Expert players are considered to have above 2000 rating points.
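To make the interval-scale reading of rating differences concrete, the sketch below uses the logistic expected-score formula that is commonly associated with Elo ratings. The formula, the 400-point scale constant, and the function name `expected_score` are assumptions made for illustration; the text does not state which variant any federation applies.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) of player A against player B.

    Assumes the widely used logistic approximation with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Approximate rating bands mentioned in the text (Elo points).
average, expert, master, grandmaster = 1500, 2000, 2200, 2500

# Equal ratings give an even expectation.
even_match = expected_score(average, average)

# A 300-point gap (grandmaster vs. master) strongly favors the stronger player.
gm_vs_master = expected_score(grandmaster, master)

# A 500-point deficit (average vs. expert) leaves only a small expected score.
avg_vs_expert = expected_score(average, expert)
```

Under this assumed formula only the difference between two ratings matters, which is one way to read the claim that skill is measured on an interval scale.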

The most frequently used database in skill acquisition studies is the database of the International Chess Federation, FIDE (for more information, see Howard, 2006a). This database, like other chess databases we will mention here, offers multiple advantages for skill acquisition research. Firstly, it gathers records from the 1970s to the present, and so it is possible to obtain trajectories of ratings over the course of players' lives. Secondly, it represents the whole population of the very best players in the world. Thirdly, this database contains multiple measurement points from players, so it can be used to observe individual skill trajectories. Besides rating points, numerous other variables are recorded (e.g., number of games played, gender of participants, nationality, title, rating change, rating rank) which could be used for research purposes (Howard, 2006a). In other words, the FIDE database offers a fruitful basis for exploration and description of the multiple factors and processes behind chess skill acquisition.

For all its advantages, the FIDE database provides only the records of the very best players. Due to technical and logistical reasons, the FIDE database at the beginning logged only master level players (above 2200 Elo). Only in the 1990s was the level lowered to expert level players (2000 Elo) and then in the last decade to the level of average players (1500 Elo and below). In other words, the worst players in the FIDE database are still average practitioners.

The ideal situation would be to have records of all players from the very beginning of their careers, not only when they reach a particular level of expertise. In this scenario, the database would also encompass people who for whatever reasons do not become experts. Fortunately, there are such databases. National databases, such as the databases of the German Chess Federation and the United States Chess Federation (USCF), keep records of all their members and thus represent the whole population of competitive (national) players. They provide all the information the FIDE database offers without restricting the range of skill.

If one is trying to examine factors that influence skill acquisition, a database that contains only average and above-average players should not be the starting point of investigations. As is well known in the field of statistics, the conclusions obtained on data with restricted range could be misleading especially if they are not obtained by appropriate analyses (Long, 1997; Sackett and Yang, 2000). The best possible way to examine differences in effects that researchers could obtain while analyzing the data is through the comparison of effects made on different databases. Here we illustrate the possible pitfalls of skill range restriction by comparing distributions of ratings from a database with restricted range (FIDE) and a database with unrestricted range (German).
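The statistical point about restricted range can be illustrated with a small simulation. This is a sketch on synthetic data only: the predictor called `practice`, the rating scale (mean 1500, standard deviation 300), the true correlation of about 0.6, and the 2000-point cutoff are all hypothetical choices, not values taken from the FIDE or German databases.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

practice, rating = [], []
for _ in range(20000):
    p = rng.gauss(0, 1)        # hypothetical standardized predictor
    noise = rng.gauss(0, 1)    # everything else that affects rating
    practice.append(p)
    # 0.6**2 + 0.8**2 == 1, so the population correlation is about 0.6.
    rating.append(1500 + 300 * (0.6 * p + 0.8 * noise))

# Correlation over the whole (unrestricted) skill range.
r_full = pearson(practice, rating)

# Keep only players at or above a hypothetical inclusion threshold.
kept = [(p, r) for p, r in zip(practice, rating) if r >= 2000]
r_restricted = pearson([p for p, _ in kept], [r for _, r in kept])
```

In this synthetic setting the correlation computed on the truncated sample is markedly weaker than the one computed on the full range, even though the underlying relationship is identical for every simulated player.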

The FIDE and German databases contain a similar number of practitioners, around 120,000 (see **Figure 1**). However, the ratings of the two distributions overlap only at the highest values of the German distribution and the lowest values of the FIDE distribution. Not only do the means and variances of the two distributions differ vastly, but other features of the distributions, such as the quartiles, differ greatly as well.

Restricting the database to the best players also has a consequence for the skill trajectories of players. One needs time to become an expert, and the players in the FIDE database are older (37 years) than the players in the German database (32 years). The differences between the two databases are also evident when we compare typical skill trajectories. **Figure 2A** shows the FIDE players entering the database at around age 10, having already become competent players (rating of 1900 Elo), with a subsequent shallow increase to the peak at age 39. In contrast, the German players have a steeper increase, since they enter the database as novices and learn faster at the beginning skill level, as implied by the power law of practice (Newell and Rosenbloom, 1981), until the same peak at age 39. The decline in later years is also different in the two databases, with FIDE players declining faster than their German counterparts.
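The appeal to the power law of practice can be sketched numerically. The functional form below (time per task falling as a power of the number of trials) is the usual textbook statement of the law; the parameter values `a` and `b` and the function name `task_time` are hypothetical and are not estimated from chess data.

```python
def task_time(trials: int, a: float = 10.0, b: float = 0.4) -> float:
    """Time needed per task after a given number of practice trials.

    Assumes the power-law form T(N) = a * N ** (-b) with made-up parameters.
    """
    return a * trials ** (-b)


# Improvement from ten additional trials early in practice...
early_gain = task_time(10) - task_time(20)

# ...versus the same ten additional trials after extensive practice.
late_gain = task_time(1000) - task_time(1010)
```

Under these assumptions the same amount of extra practice yields a far larger improvement early on than later, which is consistent with novices entering the German database on the steep part of the curve while FIDE entrants are already on the flatter part.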

One could say that the FIDE and German players have vastly different ratings that make the comparison between them difficult. One way around this problem is to standardize ratings in each database separately and check if the skill trajectories are similar in both datasets. **Figure 2B** shows the standardized rating as a function of age. Again, on average FIDE players start at a higher skill level but improve more slowly. They also have a higher peak but their decline is so rapid that in the latter stages of their careers their standardized performance is lower than the standardized performance of their German colleagues.
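Standardizing within each database can be sketched as a per-database z-score. The rating lists below are hypothetical toy values rather than entries from the FIDE or German databases, and the use of the population standard deviation is an assumption; the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def standardize(ratings):
    """Return z-scores computed against the ratings' own mean and SD."""
    m, s = mean(ratings), pstdev(ratings)
    return [(r - m) / s for r in ratings]


fide_ratings = [1900, 2050, 2200, 2350, 2500]    # hypothetical values
german_ratings = [900, 1200, 1500, 1800, 2100]   # hypothetical values

# Each database is standardized separately, against its own distribution.
fide_z = standardize(fide_ratings)
german_z = standardize(german_ratings)
```

After this step a score of zero marks the typical player of each database, so a 2200-rated FIDE player and a 1500-rated German player in these toy lists receive the same standardized value. The trajectories become comparable in shape, but the standardized scores no longer say anything about absolute skill.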

## **EXPLAINING CONTRADICTORY RESULTS**

We have demonstrated that there are vast differences between the databases commonly used in the archival approach to skill acquisition. The two datasets are completely different in the range of values as well as in the number of participants that are obtaining a particular rating. The restricted databases, such as FIDE, do not represent the whole skill range and may provide inadequate answers to the questions under investigation. The restriction of range and its consequences may also explain some of the inconsistencies and contradictory findings in the field.

For example, Roring and Charness (2007) used the FIDE database to investigate age effects on skill acquisition. They found the peak age in chess skill to be around 43 years, much later than the previously proposed peak of around 35 years (e.g., Howard, 2005). Another surprising result was that the decline was steeper for initially lower rated participants than for higher rated participants. In other words, initially more able participants declined significantly more slowly than their initially weaker colleagues. Our illustrations (**Figures 2A,B**) indicate that both the peak age and the rate of decline are influenced by the range restriction in the FIDE database. The conclusion would be significantly different if a whole range database, such as the German database, were used.

To further illustrate possible consequences of the range restriction, we can consider the inconsistent findings in the research on gender differences in skill acquisition. It is notable that the studies using the restricted FIDE database regularly find gender differences in skill acquisition (Howard, 2005, 2006b, 2014; but see Bilalić and McLeod, 2006, 2007). Furthermore, the studies using the national German and USCF databases (Chabris and Glickman, 2006; Bilalić et al., 2009) also noted the differences in the mean and highest ratings of women and men. However, using the unrestricted range and the full lifespan data, they observed that factors such as participation rates and dropout rates could explain the differences. This kind of analysis is impossible with the FIDE database where dropouts are not recorded because the people concerned stopped playing chess before they achieved expert level.

Researchers using the FIDE database to investigate talent (Howard, 2008, 2009) face a similar problem. The time required to reach a certain level and the amount of practice (as measured by games played) may well provide clues about the different natural endowments of certain players. This in turn may allow us to speculate about different levels of talent. It is, however, impossible to draw any firm conclusions if we lack the very first part of their skill acquisition process, as we do in the FIDE database. As with the gender factor in the skill acquisition process, the differences in the early stages may well overshadow the differences at the highest level. Similarly, the causes behind dropouts may remain unresolved because the data of the people who for whatever reasons stopped playing chess are not available.

Both the expertise and archival approaches are important vehicles for the investigation of expertise and cognition in general. The restricted range of focus in the expertise approach is a fundamental part of the methodology and an advantage over usual research on cognition. The archival approach offers the possibility of capturing the full cycle of the long-term skill acquisition process. In this paper we have demonstrated that the results obtained on the restricted range do not necessarily generalize to the whole range of values. The effects obtained with restricted range cannot and should not be used to make inferences about the mechanisms and factors that influence skill acquisition. When we restrict our data, we restrict our conclusions too.

#### **ACKNOWLEDGMENT**

We would like to thank Frank Hoppe for providing the German database and Robert Gaschler for the FIDE database.

### **REFERENCES**


of a large cohort of competitive chess players. *Psychol. Sci.* 17, 1040–1046. doi: 10.1111/j.1467-9280.2006.01828.x


practice," in *Cognitive Skills and Their Acquisition*, ed J. R. Anderson (Hillsdale, NJ: Erlbaum), 1–55.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 April 2014; accepted: 22 May 2014; published online: 12 June 2014.*

*Citation: Vaci N, Gula B and Bilalić M (2014) Restricting range restricts conclusions. Front. Psychol. 5:569. doi: 10.3389/fpsyg.2014.00569*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Vaci, Gula and Bilalić. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## *Scott Barry Kaufman1,2\**

*<sup>1</sup> The Imagination Institute, Philadelphia, PA, USA*

*<sup>2</sup> Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA*

*\*Correspondence: sbk@psych.upenn.edu*

*Edited by:*

*David Zachary Hambrick, Michigan State University, USA*

*Reviewed by:*

*Fernand Gobet, University of Liverpool, UK*

#### **Keywords: expertise, intelligence, motivation, individual differences, expert performance, inspiration, general cognitive ability**

I recently had the pleasure of editing a volume of essays on the determinants of greatness (Kaufman, 2013a). A variety of perspectives were represented in the volume, including behavioral genetics, individual differences, and expert performance. The clearest conclusion from the volume was that the development of high achievement involves a complex interaction of many personal and environmental variables that feed off each other in nonlinear, mutually reinforcing, and nuanced ways, and that the most complete understanding of the development of elite performance can only be arrived at through an integration of perspectives.

To help spur more integration, I suggest that cognitive psychologists who are studying deliberate practice and chunking, and individual differences researchers who are investigating cognitive ability and personality, focus more on common ground. I've noticed that the debate often ends up being "innate talent vs. deliberate practice" (see Ericsson et al., 2007; Ericsson, 2014), when that false dichotomy is detrimental to scientific progress (Gobet, 2013; Kaufman, 2013a). Deliberate practice, defined by Ericsson (2013) as "engagement with full concentration in a training activity designed to improve a particular aspect of performance with immediate feedback, [and] opportunities for gradual refinement by repetition and problem solving," depends on many traits that vary in the general population and that have a genetic basis. But that doesn't mean that heritable traits are necessarily "immutable constraints on the acquisition of various types of expert performance" (Ericsson, 2014).

Given our current state of scientific knowledge, I hope we can all agree that:


Assuming researchers can agree on these seven basic principles, a fruitful research direction is the investigation of the manner in which individual differences influence (but not necessarily constrain) the development of expertise. One mode of operation is by influencing the *efficiency* of expertise acquisition, therefore speeding up the rate of acquisition. Ericsson (2013) acknowledges that the 10,000 h of practice he found among elite violinists at age 20 was just an *average*, with substantial variation around the mean. In fact, Simonton has found across the arts, sciences, and leadership, that those with the greatest lifetime productivity and highest levels of eminence required the *least* amount of time to acquire the requisite expertise (Simonton, 1991a,b, 1992, 1997, 1999).

General cognitive ability is one factor that can influence the efficiency of expertise acquisition. Individual differences researchers have spent over 100 years studying patterns of variation in cognitive ability (e.g., Carroll, 1993; Jensen, 1998). Brain imaging studies support the idea that people who do well on tests of cognitive ability use fewer brain resources to solve novel and complex problems (Haier et al., 1992; Neubauer and Fink, 2009; Van den Heuvel et al., 2009; Deary et al., 2010; Prabhakaran et al., 2011). Unfortunately, this literature (which emphasizes cognitive efficiency) is not well integrated with the research of cognitive psychologists who emphasize deliberate practice, chunking, and strategy use. However, I believe these various approaches are better suited for integration than it may seem at first blush.

Consider a set of studies conducted by Bor and colleagues, in which they found that chunking consistently activates the prefrontal-parietal brain network (Bor et al., 2004; Bor and Owen, 2007; Bor, 2012; Bor and Seth, 2012). Bor and Owen (2007) had participants memorize unfamiliar verbal and numerical double-digit sequences. The sequences were either *randomly arranged* (e.g., 31, 24, 89, 65), and therefore not conducive to the use of strategies, or *structured* (e.g., 57, 68, 79, 90), which made them amenable to the use of chunking strategies. The prefrontal-parietal brain network was consistently most active during the *structured* trials, even though the unstructured trials placed a higher demand on working memory, and were more difficult for participants to memorize.
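To make the contrast between the two sequence types concrete, the sketch below shows one hypothetical way a structured sequence could be recoded into fewer items. The function name `chunk_arithmetic` and the (start, step, length) encoding are assumptions introduced here for illustration; they are not taken from Bor and Owen (2007) and are not a model of what participants actually did.

```python
from typing import List, Optional, Tuple


def chunk_arithmetic(seq: List[int]) -> Optional[Tuple[int, int, int]]:
    """Return (start, step, length) if seq has a constant step, else None.

    Hypothetical illustration of "chunking": a regular sequence can be
    held as one compact rule instead of as separate items.
    """
    if len(seq) < 3:
        return None
    step = seq[1] - seq[0]
    # Every neighbouring pair must differ by the same step.
    for prev, cur in zip(seq, seq[1:]):
        if cur - prev != step:
            return None
    return (seq[0], step, len(seq))


structured = [57, 68, 79, 90]   # example sequence from the text
random_like = [31, 24, 89, 65]  # example sequence from the text

print(chunk_arithmetic(structured))   # (57, 11, 4)
print(chunk_arithmetic(random_like))  # None
```

The structured example reduces to a single rule (start at 57, add 11 each time), whereas the randomly arranged example has no such rule and must be held item by item.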

The prefrontal-parietal network has also been heavily implicated on tests of working memory and general cognitive ability (Prabhakaran et al., 2000; Jung and Haier, 2007; Colom et al., 2009). The research of Bor and colleagues suggests that one of the primary functions of the prefrontal-parietal brain network is the conscious detection of patterns, which aids in the efficiency of learning. Indeed, Spearman (1904) argued that the best measure of general cognitive ability requires grasping relationships, inferring rules, noticing similarities and differences, and "educing" (Latin for "drawing out") the relevant relations in a complex pattern. The Raven's Progressive Matrices test, which is strongly correlated with the general cognitive ability factor, appears to measure these skills (Conway et al., 2003). The Raven's test places a heavy burden on working memory because you must engage in fluid reasoning on the spot, with no external aids and often with strict time limits. However, those who have more efficient cognitive strategies for lessening the cognitive load will be at a distinct advantage in this testing environment.

Consistent with this idea, Nandagopal et al. (2010) had twins think aloud while they solved various tasks, including an associative learning task that is significantly correlated with general cognitive ability (see Kaufman et al., 2009). They found that performance on tests of cognitive ability was heavily influenced by the use of strategies, and differences in strategy use on an associative learning task (which was amenable to use of strategies) explained a significant amount of the genetic influences on performance.

Their study raises the intriguing suggestion that the heritability of general cognitive ability may be due, in part, to *the ability to efficiently chunk information in working memory*. Therefore, while Ericsson (2014) may be right that cognitive ability does not necessarily *constrain* the acquisition of expertise, it's still entirely possible that cognitive ability *influences* the efficiency and rate of expertise acquisition (especially when expertise acquisition draws heavily on general cognitive ability; 2014 special issue). Consistent with this, Meinz and Hambrick (2010) found that although deliberate practice accounted for 45.1% of the variation in piano sight-reading performance among expert pianists, working memory accounted for an additional 7.4% of the variance.
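The two percentages reported for Meinz and Hambrick (2010) can be combined with simple arithmetic, assuming (as "additional" implies) that the working memory contribution is incremental to that of deliberate practice. The symbols below are notation introduced here for illustration, not the authors' own:

```latex
R^2_{\text{total}} = R^2_{\text{practice}} + \Delta R^2_{\text{WM}}
                   = 0.451 + 0.074 = 0.525,
\qquad
1 - R^2_{\text{total}} = 0.475
```

On this reading, the two predictors together account for just over half of the variance in sight-reading performance, leaving roughly 47.5% unexplained by either.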

Of course, cognitive efficiency isn't the only way that individual differences can influence expertise acquisition. Another mode of operation is by *sustaining the motivation to practice over an extended period of time*. Ericsson et al. (1993) acknowledged this possibility when they wrote: "It is quite plausible, however, that heritable individual differences might influence processes related to motivation and the original enjoyment of the activities in the domain and, even more important, affect the inevitable differences in the capacity to engage in hard work (deliberate practice)" (p. 399). Even Arthur Jensen (one of the biggest proponents of general cognitive ability) once concluded that "some kind of motivational factor that sustains enormous and prolonged interest and practice in a particular skill *probably plays a larger part in extremely exceptional performance* than does psychometric *g* or the speed of elementary information processes" (Jensen, 1990, p. 259, italics added).

I believe an overlooked characteristic that influences the motivation to engage in deliberate practice is *inspiration* (Kaufman, 2013b). When people become inspired, they usually are inspired to realize some future image of themselves (Torrance, 1983). It is the clarity of this vision, and the belief that the vision is attainable, that can propel a person from apathy to engagement, and sustain the energy to engage in deliberate practice over the long haul, despite obstacles and setbacks. Indeed, Todd Thrash, Andrew Elliot, and colleagues have conducted multiple studies showing that inspiration (measured both as a trait and a motivational state) is associated with an approach motivation, positive emotions, and an increase in creative productivity (Thrash and Elliot, 2003, 2004; Thrash et al., 2010).

In fact, in one of their studies (Thrash et al., 2010), inspiration not only predicted the creativity of writing samples in science and poetry, but also increased the *efficiency* of the writing samples (e.g., a larger number of typed words that were retained in the final product, and less time pausing and more time writing). This raises the intriguing idea that *motivational characteristics may cause an increase in cognitive efficiency*, which would ultimately increase the rate of expertise acquisition. I believe this is a promising area for future research.

These are just a few examples of how the cognitive psychology approach to expertise and the investigation of individual differences can be more tightly integrated. To conclude: while others have suggested the importance of computer modeling for integration (Gobet, 2013), I have argued here that other important contributors to scientific progress are accurate framing of the issues, standing on a common ground of assumptions, and investigating the influence of traits on the development of expertise.

## **ACKNOWLEDGMENT**

Thanks to Zach Hambrick for his encouragement to submit this paper.

## **REFERENCES**


attention, working memory, and chunking. *Front. Psychol.* 3:63. doi: 10.3389/fpsyg.2012.00063


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2014; accepted: 19 June 2014; published online: 09 July 2014.*

*Citation: Kaufman SB (2014) A proposed integration of the expert performance and individual differences approaches to the study of elite performance. Front. Psychol. 5:707. doi: 10.3389/fpsyg.2014.00707*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Kaufman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**OPINION ARTICLE**

## Expertise: defined, described, explained

#### *Lyle E. Bourne Jr.<sup>1</sup>\*, James A. Kole<sup>2</sup> and Alice F. Healy<sup>1</sup>*

*<sup>1</sup> Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, USA*

*<sup>2</sup> School of Psychological Sciences, University of Northern Colorado, Greeley, CO, USA*

*\*Correspondence: lyle.bourne@colorado.edu*

#### *Edited by:*

*David Z. Hambrick, Michigan State University, USA*

#### *Reviewed by:*

*David Z. Hambrick, Michigan State University, USA*

*Jonathan Wai, Duke University, USA*

**Keywords: expertise, definition, description, explanation, mind, brain**

## **INTRODUCTION**

Science aims to define, describe, and explain significant natural phenomena. Each of these goals of science suggests an increasingly deeper understanding of the target phenomenon. We discuss in this paper how these goals are or might be realized in the science of expertise.

## **DEFINITION**

Definitions are given in an attempt to identify phenomena and to delineate examples from non-examples. Expertise is consensually defined as elite, peak, or exceptionally high levels of performance on a particular task or within a given domain. One who achieves this status is called an *expert* or some related term, such as *virtuoso, master, maven, prodigy*, or *genius*. These terms are meant to label someone whose performance is at the top of the game. An expert's field of expertise can be almost anything from craftsmanship, through sports and music, to science or mathematics. People usually agree on examples of expertise, like Yo-Yo Ma (musical performance), Fred Astaire and Ginger Rogers (ballroom dancing), Antiques Roadshow Appraisers, Albert Einstein (physics), Tiger Woods (golf), Bette Davis (acting), Nelson Mandela (politics), or Hillary Rodham Clinton (international relations).

Why different terms? Each term carries with it a slightly nuanced meaning. Shaded meanings vary in their emphasis on experience or constitutional factors as the source of high levels of performance. The term chosen to characterize superior performance carries with it an implied cause. Like expert, *virtuoso* or *master* is the result of hard work and long training. If talent is involved, it is a talent for hard labor. In contrast, *prodigy*, like *genius*, results from an endowment, which shows up early in life without the benefit of training.

It might be appealing to the layperson to believe that a genius is just born that way. Elite performance just comes naturally to a genius; you don't have to invest all that time and effort in training, because if you don't have what it takes you'll never get there. Moreover, you don't have to explain why you never had a significant insight, because you just didn't inherit the right abilities or genes. But the facts seem to be that, although people do differ in something called ability or talent, in sports or medicine or any area of human endeavor, talent is only a necessary starting point, a platform from which to begin. To become an elite performer one has to capitalize on his or her abilities. Training is the *sine qua non*.

Consider a specific case. Pablo Picasso, Spanish painter and sculptor, was one of the greatest and most influential artists of the 20th century. Born into a family that cultivated the arts, he demonstrated extraordinary artistic ability at an early age, encouraged by his parents. All the elements were in place for Picasso—paints, brushes, canvases, and parents who could recognize good artistic work. Painting in the beginning in a naturalistic manner, his style changed later in life as he experimented with different theories, techniques, and ideas, for example, creating (with Braque) a unique style that has come to be known as cubism. There is no doubt that Picasso was a child *prodigy*. He had an ability to create significant objects that the art world and collectors recognized early on for their value. He seems to have been endowed with pure *genius* for painting and sculpting. But it is less often recognized that he was trained classically in the arts and that he worked incessantly at his craft, devoting long hours day and night. And, over time, the quality of his work improved, as judged by his peers, and expanded into previously unexplored areas and techniques. He could produce new paintings later in life quickly, some consisting of little more than three or four strokes of his pen, and more or less at will, each of them a *virtuoso* performance. But that performance was based on a level of *expertise* achieved, by dint of hard work, by few other mortals. Picasso is but one case of expertise and, as such, cannot validate a general rule. Nonetheless, his accomplishments are clearly based on a combination of ability and effort, a characteristic that other experts share.

## **DESCRIPTION**

We all know an expert when we see one. Normally people will quickly recognize the difference between expertise and normal or ordinary performance in any domain. Expertise, itself, is a descriptive term. To describe is to add detail in the specific case to a more general definition. A description of expertise requires an inventory of what the expert knows, knows how to do, wants or intends to do, and what he or she does or achieves. Psychologically, knowledge and skills are mental or cognitive concepts. They are not material entities, known by their physical make-up, but rather they are states of mind. This fact alone does not make them unscientific. Rather they are quite sound scientific concepts, known by their function, by the behavior potential they provide. Mind, knowledge, skill, and other cognitive concepts are analogous to gravity in physics or evolution in biology, understood in terms of their effects or functions, not by their material structure.

Obviously, there is more to expertise than just acquiring the right knowledge and skills. Expertise is based in some measure on the resources a person comes equipped with, his or her natural talent or biological endowment. We put an emphasis on practice and experience primarily because their contribution to expert performance is too often overlooked or minimized by the layperson (Ericsson et al., 1993). But clearly, inherited prodigiousness, body characteristics, dexterity, and the like, which are part and parcel of the equipment we come to any task with, all play a role, allowing some people just to be flat out better prepared than others. These natural factors provide essentially a foundation for expertise in any task. Given abilities and potential are useless unless they are capitalized on or activated by experience and practice, and, conversely, practice might be futile if one doesn't have some initial capacity. Both endowment and experience must be a part of a complete description of expertise.

Thus, to describe expertise is to identify the endowed resources, catalog the knowledge, and specify the skills of a person who is capable of performing in some domain at the very highest level, achieved by few others (perhaps by only a very small percentage of the general population).

## **EXPLANATION**

How do experts get to be experts? What's the explanation? Is there something deeper than a description that we need to know about expertise? Maybe the brain or genes are at the bottom of it.

Mind and brain are often conflated terms and used interchangeably (Bourne and Healy, 2014). It is tempting to equate mind and brain, and it's quite commonly done among psychologists and other scientists, not to mention laypersons. Brain scanning is often used to study "how the mind works." The general assumption behind brain-scanning procedures is that the brain provides a mechanism for mental functions. Thus, you commonly come across phrases such as, "How to train the brain," "The brain learns (this, that, or the other thing)," "Learning is a rewiring of the brain," or "The brain is the mind's machine." The implication is that the brain causes thinking and behavior to be what they are. The psychological aspects of behavior are caused by a material, biological entity called the brain. Thus, the ultimate explanation for why and how we behave as we do is to be found in a material thing called the brain. In theories of this type, the brain is the *deus ex machina* that resolves difficulties we might have in understanding why people behave as they do.

But the facts of the matter are different. Training, experience, and practice directly change the mind (i.e., thought and behavior), but only indirectly the brain. It is a person or a mind, defined by the collection of all current knowledge and skill, that is trained, not a brain. A person learns, not the brain. That is not to say that the brain and what goes on in the brain are irrelevant, inconsequential, or unimportant in skill or knowledge acquisition. Quite the contrary, what happens in the brain as we learn and behave is essential to understanding the mind. As thinking happens, so do brain processes. Mind and brain processes are time-locked, and one can actually measure brain changes during thought. Still, there is no good reason to believe that one of these processes, say, brain activity, is more fundamental or causes the other, thought or behavior, to be what it is. In fact, the other causal direction, that thought causes brain activity to be what it is, is just as plausible. Consider the possibility that neither causes the other in a direct way but that both are going on in parallel simultaneously and in an interrelated way at all times. We think of that position as consistent with the long accepted first principle of the unity of the sciences. What we observe to be true in one domain of science should not conflict with what we observe at the same time in another domain of science. What we observe to be true psychologically should be consistent with what we observe biologically (or chemically, physically, etc.). Thus, mind (psychological) and brain (biological) are unique but different, and both will reflect, in their different ways, the expression of expertise in behavior.

So what is the explanation for expertise? Consider this: can you explain something you cannot first describe? Logically, we need to be able to describe expertise before attempting to explain it. That is why we tried to explicate description before attempting to deal with explanation. The more specific and detailed the description of a phenomenon, the better we understand it. So, given the right description of expertise, what are we missing? Reductionism asserts that explanations go beyond mere description to find more fundamental causes of the target behavior. The causes lie in more basic sciences. For psychological phenomena the immediate causes are likely to be biological. That's why the brain is often invoked as the controller, monitor, or generator of behavior.

Does brain activity then explain behavior? Does the explanation of expertise lie in brain circuitry or in genes? The correlations are there, between brain activity and behavior. But saying "My brain made me do it" is akin to saying "The Devil made me do it." It's attributing a cause where there is no causal evidence. The available scientific evidence is strictly correlational. No one has yet demonstrated that the independent creation of a brain process will result in the specific behavior for which it is claimed to be the cause. Thus, asserting that the brain causes behavior is a matter of faith or belief. And faith has no place in science. Remember that correlation does not imply causation. Neither does correlation imply explanation. There is no good reason other than faith to believe that the explanation of behavior lies in biological events. The claim that psychological processes or behaviors cause biological events to be what they are is just as plausible or believable.

So, in our view, a scientific explanation (or "deep understanding") of expertise, based on other sciences, remains to be realized. We suggest that, if thorough and complete descriptions of specific cases of expertise can be achieved, then there might be nothing left to explain, at least not in these cases. This possibility suggests that, among other things, the implied difference among the three goals of science (definition, description, explanation) is an illusion. Proper and complete description might supersede the need to explain.

But, if an explanation is to be sought, it will be found, in our view, in the domain of psychology, rather than some physical or biological science. By this psychological explanation, expertise results from practice and experience, built on a foundation of talent, or innate ability. The psychology laboratory has revealed empirically based training principles that further elucidate the explanation of expertise. These principles enable learners to maximize the acquisition, retention, and transfer of knowledge and skills, as summarized in Healy and Bourne (2012).

## **AUTHOR CONTRIBUTIONS**

Lyle E. Bourne Jr. wrote the initial and final drafts of the manuscript. James A. Kole and Alice F. Healy edited the initial draft of the manuscript and suggested revisions of it.

## **ACKNOWLEDGMENTS**

We are indebted to the members of the Center for Research on Training at the University of Colorado for their helpful suggestions about this research. The research reported here was supported primarily by Army Research Office Grant W911NF-05-1-0153.

## **REFERENCES**


*Rev.* 100, 363–406. doi: 10.1037/0033-295X.100.3.363

Healy, A. F., and Bourne, L. E. Jr. (eds.). (2012). *Training Cognition: Optimizing Efficiency, Durability, and Generalizability.* New York, NY: Psychology Press.

*Received: 09 January 2014; accepted: 15 February 2014; published online: 04 March 2014.*

*Citation: Bourne LE Jr., Kole JA and Healy AF (2014) Expertise: defined, described, explained. Front. Psychol. 5:186. doi: 10.3389/fpsyg.2014.00186*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Bourne, Kole and Healy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Studying real-world perceptual expertise

#### *Jianhong Shen<sup>1</sup>, Michael L. Mack<sup>2</sup> and Thomas J. Palmeri<sup>1</sup>\**

*<sup>1</sup> Vanderbilt Vision Research Center, Department of Psychology, Vanderbilt University, Nashville, TN, USA*

*<sup>2</sup> Center for Learning and Department of Psychology, The University of Texas at Austin, Austin, TX, USA*

#### *Edited by:*

*Michael H. Connors, Macquarie University, Australia*

#### *Reviewed by:*

*Andy Wills, Plymouth University, UK*

*Michael H. Connors, Macquarie University, Australia*

#### *\*Correspondence:*

*Thomas J. Palmeri, Vanderbilt Vision Research Center, Department of Psychology, Vanderbilt University, 301 Wilson Hall, 111 21st Avenue South, Nashville, TN 37203, USA e-mail: thomas.j.palmeri@vanderbilt.edu*

Significant insights into visual cognition have come from studying real-world perceptual expertise. Many have previously reviewed empirical findings and theoretical developments from this work. Here we instead provide a brief perspective on approaches, considerations, and challenges to studying real-world perceptual expertise. We discuss factors like choosing to use real-world versus artificial object domains of expertise, selecting a target domain of real-world perceptual expertise, recruiting experts, evaluating their level of expertise, and experimentally testing experts in the lab and online. Throughout our perspective, we highlight expert birding (also called birdwatching) as an example, as it has been used as a target domain for over two decades in the perceptual expertise literature.

**Keywords: perceptual expertise, expertise, learning, categorization, recognition, birding**

## **INTRODUCTION**

In nearly every aspect of human endeavor, we find people who stand out for their high levels of skill and knowledge. We call them experts. Expertise has been studied in domains ranging from chess (Chase and Simon, 1973; Gobet and Charness, 2006; Connors and Campitelli, 2014; Leone et al., 2014) to physics (Chi et al., 1981) to sports (Baker et al., 2003). Perceptual experts, such as ornithologists, radiologists, and mycologists, are noted for their remarkable ability to rapidly and accurately recognize, categorize, and identify objects within some domain. Understanding the development of perceptual expertise is more than characterizing the behavior of individuals with uncanny abilities. Rather, if perceptual expertise is the endpoint of the trajectory of normal visual learning, then studying perceptual experts can provide insights into the general principles, limits, and possibilities of human learning and plasticity (e.g., Gauthier et al., 2010).

Several reviews have highlighted empirical findings and theoretical developments from research on perceptual expertise in various modalities (for visual expertise, see, e.g., McCandliss et al., 2003; Palmeri and Gauthier, 2004; Palmeri and Cottrell, 2009; Richler et al., 2011; for auditory expertise, see, e.g., Chartrand et al., 2008; Holt and Lotto, 2008; for tactile expertise, see, e.g., Behrmann and Ewell, 2003; Reuter et al., 2012). Here, we instead highlight more practical considerations that come with studying perceptual expertise; we highlight visual expertise because this modality has been most extensively studied. We specifically consider some choices that face researchers: whether to use real-world or artificial objects, what domain of perceptual expertise to study, how to recruit participants, how to evaluate their expertise, and whether to test in the lab or via the web. Throughout our perspective, we use birding as an example domain because it has been commonly used in the literature (e.g., Tanaka and Taylor, 1991; Gauthier et al., 2000; Tanaka et al., 2005; Mack et al., 2007; Mack and Palmeri, 2011).

## **REAL-WORLD vs. ARTIFICIAL DOMAINS OF EXPERTISE**

Expertise-related research has been conducted using both artificial and real-world objects. Artificial objects include simple stimuli like line orientations, textures, and colors (e.g., Goldstone, 1998; Mitchell and Hall, 2014), and relatively complex novel stimuli like random dot patterns (Palmeri, 1997), Greebles (Gauthier and Tarr, 1997; Gauthier et al., 1998, 1999), and Ziggerins (Wong et al., 2009a). Real-world objects include birds, dogs, cars, and other categories (Tanaka and Taylor, 1991; Gauthier et al., 2000). Studies using artificial objects are often training studies, where researchers recruit novices and train them to become "experts" in a domain. Changes in behavior or brain activity are measured over the course of training to understand the development of expertise, making these studies longitudinal. The weeks of training used in these studies can only be a proxy for the years of experience in real-world domains. Because real-world expertise takes so long to develop, most real-world studies are cross-sectional.

An advantage of training studies with artificial objects is the power to establish causality. Experimenters have precise control over properties of novel objects, relationships between them, and how categories are defined (e.g., Richler and Palmeri, 2014). Participants can be randomly assigned to conditions and training and testing can be carefully controlled. As one example, Wong et al. (2009a,b) used novel Ziggerins and trained people in two different ways, one of which mirrored individuation required for face recognition, another of which mirrored the letter recognition demands required for reading. Accordingly, the face-like training group showed behavior and brain activity similar to that seen in face recognition while the letter-like training group showed behavior and brain activity similar to that seen in letter recognition. Studies of artificial domains of expertise can provide insights into real-world domains.

If researchers are interested in understanding what makes experts experts, not just investigating limits of experience-related changes, then it is important to complement carefully controlled laboratory studies using artificial domains with the study of real-world experts. Because of their quasi-experimental nature – recruiting novices and those with varying levels of expertise as they occur in the real world – these studies cannot establish unambiguous causal relationships between expertise and behavioral or brain changes. Apart from considerations of external validity, studies of real-world experts permit the study of a range and extent of expertise that cannot easily be reproduced in the laboratory. And practically speaking, testing real-world perceptual experts on real-world perceptual stimuli saves researchers the effort and expense needed to train participants in an artificial domain.

Studies using real-world domains also come full circle to inform studies using artificial domains. For example, consider the classic result of Tanaka and Taylor (1991), reproduced in our own online replication in **Figure 1**. Bird experts categorized birds (their expert domain) and dogs (their novice domain). For novices (Rosch et al., 1976), objects are categorized faster at a basic level (*dog*) than a superordinate (*animal*) or subordinate level (*blue jay*), while for experts (Tanaka and Taylor, 1991; Johnson and Mervis, 1997), objects are categorized as fast at a subordinate level as a basic level. This entry-level shift (Jolicoeur et al., 1984; see also Tanaka et al., 2005; Mack et al., 2009; Mack and Palmeri, 2011) has been used as a behavioral marker of expertise in training studies employing artificial domains (Gauthier et al., 2000; Gauthier and Tarr, 2002).

Our group recently reviewed considerations that factor into studies using artificial domains (Richler and Palmeri, 2014), so here we focus on real-world domains for the remainder of our perspective.

### **DOMAINS OF REAL-WORLD PERCEPTUAL EXPERTISE**

In addition to everyday domains of perceptual expertise, like faces (Bukach et al., 2006) and letters (McCandliss et al., 2003), studies have used domains ranging from cars and birds (Gauthier et al., 2000), where expertise is not uncommon, to more specialized and sometimes esoteric domains like latent fingerprint identification (Busey and Parada, 2010; Dror and Cole, 2010), budgie identification (Campbell and Tanaka, 2014), and chick sexing (Biederman and Shiffrar, 1987). The particular choice of expert domain depends on a combination of theoretical goals and practical considerations.

For example, consider a goal of understanding how the ability to categorize at different levels of abstraction changes with perceptual expertise (Mack and Palmeri, 2011), which impacts understanding of how categories are learned, represented, and accessed. Birding is a useful domain because birders must make subordinate and sub-subordinate categorizations, sometimes at a glance, and often under less than ideal conditions with poor lighting and camouflage. Other kinds of bird experts have different skills: budgie experts (a budgerigar is a bred parakeet) can keenly identify unique individuals in cages, but need not have expertise with other birds, while professional chick sexers can quickly discriminate male from female genitalia on chicken hatchlings. In an entirely different domain, fingerprint experts typically match latent prints with a known sample, with both clearly visible, presented side by side, and with time limits imposed by the analyst, not the environment.

**FIGURE 1 | Mean correct categorization response times for a novice domain (dogs) and an expert domain (birds) measured online.** Following Tanaka and Taylor (1991), bird experts were tested in a speeded category verification task where they categorized images at the superordinate (*animal*), basic (*bird* or *dog*), or subordinate (specific species or breed) level. In their novice domain (dogs), a classic basic-level advantage was observed, whereby categorization at the basic level was significantly faster than at the superordinate (*t*<sub>22</sub> = 2.67, *p* = 0.014) and subordinate level (*t*<sub>22</sub> = 6.75, *p* < 0.001). In their expert domain (birds), subordinate categorization was as fast as basic-level categorization (*t*<sub>22</sub> = 0.81, *p* = 0.429). This replication was conducted using an online Wordpress + Flash custom website with only 23 participants from a single short 10 min experimental session. Error bars represent 95% confidence intervals on the level × domain interaction.

There are real-world consequences for studying certain domains of perceptual expertise, such as latent fingerprint examination. Despite the widespread use of forensic evidence – as well as its popular depiction on television – a recent report of the National Research Council of the National Academy of Sciences (2009) noted a "dearth of peer-reviewed, published studies establishing the scientific bases and validity of many forensic methods," especially those methods that require subjective visual pattern analysis and expert testimony. That scientific evidence is emerging, especially in the case of latent fingerprint expertise (e.g., Busey and Parada, 2010; Busey and Dror, 2011).

The choice of domain can also be influenced by various practical considerations. It is easier to study perceptual expertise in a domain with millions of possible participants than an esoteric domain with a few isolated members. It is easier to study a domain where relevant stimuli are widely available in books and online. And it is easier to study a domain without barriers to contact, which can be the case for experts in the military, homeland security, and certain professions. For example, studies of expert baggage screeners require coordination with the Transportation Security Administration (TSA) and many details regarding stimuli and procedures cannot be shared with the public (e.g., Wolfe et al., 2013). In the case of birding, there are millions of people in the US alone who consider birding a hobby, spending hours in their yards and parks, and billions on books, equipment, and travel (La Rouche, 2006). Photos of birds are widely available; books have been published on particularly difficult bird identifications (e.g., Kaufman, 1999, 2011). Birders regularly participate in citizen science efforts, such as the Christmas Bird Count, and provide data on bird sightings to databases like ebird.org. Anecdotally, this translates into a keen interest in science and a willingness to participate in research.

## **RECRUITING**

In the past, experts usually had to be recruited locally, with advertisements posted around a university campus and in local newspapers. It may be hard for some to remember that it has only been in the past several years that not having an email address has become almost equivalent to not having a phone number, and that only recently has it become the case that most people have some Internet access. Being able to recruit participants more widely via the Internet promises not only to increase heterogeneity of participants, but also, and especially relevant for expertise research, promises to locate participants with a far greater range of expertise than might be possible when recruiting in a local geographic region.

One rapidly exploding means of recruiting and testing (see "Testing") participants is Amazon Mechanical Turk (AMT). AMT allows hundreds of subjects to be easily recruited and tested in a matter of days; participants on AMT are more demographically diverse than typical American college samples (Buhrmester et al., 2011). This diversity is important for research examining individual differences in perception and cognition. While the potential population of AMT workers is large, it is unknown how many with high levels of domain expertise might be workers on the platform. For expertise research, recruitment via AMT may need to be supplemented by more direct recruitment of true domain experts (e.g., Van Gulick, 2014).

Large domains of expertise have organizations, web sites, blogs, and even tweets and Facebook updates that target particular individuals. In principle, online recruiting through these channels offers a quick, easy, and inexpensive means of finding experts. These could involve paid advertisements online and in electronic newsletters. More directly, these could involve messages sent to email lists. The biggest challenge to this, however, is that many professional organizations or workplaces would rarely allow, and many outright prohibit, direct solicitation of members or employees, even for basic research; researchers cannot directly contact TSA baggage screeners or latent fingerprint examiners. By comparison, birding organizations, including local Ornithological and Audubon Societies, whose members join as part of a hobby, not a profession, can be less restrictive in terms of allowing contact with members, so long as contact is non-intrusive. In our case, we have identified several hundred birding groups in the US and Canada, we have contacted several dozen directly, and have received permission to solicit volunteer participants from most, having so far tested several hundred birders with a wide range of experience and expertise.

## **EVALUATING LEVELS OF PERCEPTUAL EXPERTISE**

How do we know someone is a perceptual expert? A simple approach relies on subjective self-rating, often supplemented by self-report on the amount of formal training, years of experience, or community reputation. For example, bird experts in Tanaka and Taylor (1991) were recommended by members of bird-watching organizations and had a minimum of 10 years of experience, and those in Johnson and Mervis (1997) led birding field trips and some had careers related to birding.

It is now well-recognized that self-reports of expertise are insufficient and that objective measures of expert performance are needed (e.g., Ericsson, 2006, 2009); self-report measures of perceptual expertise are not always good predictors of performance (e.g., McGugin et al., 2012; Van Gulick, 2014). Therefore, recent work has used quantitative measures to assess expert abilities (e.g., see Gauthier et al., 2010). A detailed review and discussion of such measures is well beyond the scope of a brief perspective piece. A variety of quantitative measures of perceptual expertise have been used and new measures are currently being developed – these efforts to develop and validate new measures reflect a quickly growing interest in exploring individual differences in visual cognition (e.g., Wilmer et al., 2010; Gauthier et al., 2013; Van Gulick, 2014).

While expert-novice differences are sometimes loosely described as if they were dichotomous, it is self-evident that expertise is a continuum, people vary in their level of expertise, and any measure of expertise must place individuals along a (perhaps multidimensional) continuum. Some behavioral or neural markers might distinguish pure novices from those with some experience but asymptote at only an intermediate level of expertise, while other behavioral or neural markers might distinguish the true experts from more middling experts and novices. Understanding the continuum of behavioral and brain changes, whether they are asymptotic, monotonic, or even non-monotonic over the continuum of expertise, can have important implications for understanding mechanistically and computationally how perceptual expertise develops (e.g., see Palmeri et al., 2004).

Briefly, one useful measure has focused on the perceptual part of perceptual expertise: using a simple one-back matching task, images are presented one at a time and participants must say whether consecutive pictures are the same or different. Experts have higher discriminability (*d*′) on images from their domain of expertise relative to non-expert domains, and this difference predicts behavioral and brain differences (e.g., Gauthier et al., 2000; Gauthier and Tarr, 2002). Another measure has focused on memory as an index of perceptual expertise: the Vanderbilt Expertise Task (VET; McGugin et al., 2012) mirrors aspects of the Cambridge Face Memory Test (Duchaine and Nakayama, 2006). Participants memorize exemplars from several different artifact and natural categories and then recognize other instances under a variety of conditions, and these differences in memory within particular domains predict behavioral and brain differences (e.g., McGugin et al., 2014). With our interest in categorization at different levels of abstraction, in work in preparation, we have developed a measure that has focused on categorical knowledge in perceptual expertise: adapting common psychometric approaches, we are refining what could essentially be characterized as a Scholastic Assessment Test (SAT, a standardized test widely used for college admission in the United States) of birding knowledge, with multiple-choice identifications of bird images ranging from easy (common backyard birds like the Blue Jay), to intermediate (distinctive yet far less common birds, like the Pileated Woodpecker or Great Kiskadee), to quite difficult identifications that even fairly expert birders find difficult (like discriminating Bohemian from Cedar Waxwing, Hairy from Downy Woodpecker, or correctly identifying the many extremely similar warblers, sparrows, or flycatchers). Future work must consider to what extent different measures of perceptual expertise capture the same dimensions of expert knowledge and predict the same behavioral and brain measures that vary with expertise.
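To make the discriminability measure concrete, the following is a minimal sketch of how *d*′ might be computed from one-back matching data and turned into a domain-difference score. It assumes the simple yes/no formula, *d*′ = z(hit rate) − z(false-alarm rate), with "same" trials treated as signal trials; the log-linear correction, the function names, and the trial counts are illustrative assumptions, not the procedure of any of the studies cited above.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    hits / misses: "same" and "different" responses on same trials.
    false_alarms / correct_rejections: "same" and "different"
    responses on different trials.
    """
    # Log-linear correction (an assumption of this sketch): add 0.5 to
    # each count so that rates of exactly 0 or 1 do not yield infinite
    # z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def expertise_index(expert_counts, novice_counts):
    """Difference in d' between the expert and the non-expert domain."""
    return d_prime(*expert_counts) - d_prime(*novice_counts)


# Hypothetical counts: (hits, misses, false alarms, correct rejections)
birds = (40, 10, 10, 40)  # putative expert domain
dogs = (30, 20, 20, 30)   # putative novice domain
print(f"d' birds = {d_prime(*birds):.2f}")
print(f"d' dogs  = {d_prime(*dogs):.2f}")
print(f"expertise index = {expertise_index(birds, dogs):.2f}")
```

A positive index indicates better discrimination in the putative expert domain than in the comparison domain; other same-different decision models would give different numerical values.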

## **TESTING**

Laboratory testing allows careful control and monitoring of performance, permits experiments that require precisely-timed stimulus presentations, and of course allows sophisticated behavioral and brain measures like eye movements, fMRI, EEG, and the like. But laboratory testing incurs a potential cost in that the number of laboratory participants is often limited due to the expense of subject reimbursement, personnel hours, lab space, and equipment. And for any study of unique populations who might be geographically dispersed, such as perceptual experts, the cost of bringing participants to the laboratory can be prohibitively expensive.

Until fairly recently, the only real method for testing participants from a wide geographic area, apart from having experimenters or participants travel, was to have the experiments travel. For simple studies, this could mean mailed pencil-and-paper tests, while for more sophisticated studies, this could mean sending disks or CDs to participants to run on a home computer (e.g., Tanaka et al., 2010). As anyone who programs knows well, getting software to run properly on a wide range of computer hardware and operating system versions can be a daunting task. In the past few years, it has become popular, and wildly successful, to have experiments run via a web browser. While not entirely immune to the vagaries of hardware and operating system versions, browser-based applications are often more robust to significant variation, and can often automatically prompt users for upgrades to requisite software plug-ins.

There are multiple platforms and approaches to online web-based experiments. One approach, highlighted earlier, uses AMT. In AMT, researchers publish Human Intelligence Tasks (HITs) that registered workers can complete in exchange for modest monetary compensation. AMT integrates low-level programming tools for stimulus creation, test design, and programming into one web-based application; other elements in AMT include automated compensation, recruitment, and data collection. Aside from the availability of these tools, a clear advantage of AMT is the potential to recruit from a large and diverse pool of participants. An alternative approach is to develop and support a custom web-based server for experiments. There are powerful tools for creating web pages, such as Wordpress (wordpress.org), and fairly sophisticated programs can be developed in Adobe Flash or Javascript (e.g., De Leeuw, 2014; Simcox and Fiez, 2014). Perhaps an advantage of such custom portals is that people may be more attracted to them because of their interest in participating in research, not because of the potential to earn money, as might sometimes be the case for AMT. In the end, we suspect that most labs will use a combination of both platforms for recruiting, testing, or both.

At least given current computer hardware in wide use, a potentially vexing problem for web-based experiments is timing. Fortunately, platforms such as Flash and Javascript run on the local (participant) computer, so properly-designed programs can avoid problems that could be introduced by variability in Internet connection speeds. Thankfully, reasonable response time measurements can be obtained (Reimers and Stewart, 2007; Crump et al., 2013; Simcox and Fiez, 2014). Indeed, as illustrated in **Figure 1**, we have successfully observed differences in RTs for expert and novice domains in online experiments using a Wordpress + Flash environment that mirror observations of expert speeded categorization from classic laboratory studies (Tanaka and Taylor, 1991). Unfortunately, the most critical limitation for now concerns stimulus timing. It is well known that LCD monitors in wide use have response characteristics far too sluggish to permit the kind of "single-refresh" presentations that would have been possible on previous CRTs. While presentation times of 100 ms or more are probably a safe bet, anything faster would require calibration to check that a participant had a sufficiently responsive monitor; it may be that the next generation of LCD, LED, or other technologies will (hopefully) eliminate these limitations.
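One part of the stimulus-timing problem can be illustrated with simple arithmetic: a display can only change an image on a refresh boundary, so any requested duration is effectively rounded to a whole number of frames. The sketch below shows that rounding under the assumption of a fixed, known refresh rate (60 Hz by default); the function name is hypothetical, and the calculation says nothing about the pixel response time of a given LCD, which would still require calibration on the participant's hardware.

```python
def frames_for_duration(duration_ms, refresh_hz=60.0):
    """Return (n_frames, actual_ms) for a requested stimulus duration.

    The requested duration is rounded to the nearest whole number of
    refresh frames, with a minimum of one frame.
    """
    frame_ms = 1000.0 / refresh_hz  # duration of a single refresh
    n_frames = max(1, round(duration_ms / frame_ms))
    return n_frames, n_frames * frame_ms


# Compare requested and achievable durations on an assumed 60 Hz display.
for requested in (100, 50, 20):
    n, actual = frames_for_duration(requested)
    print(f"requested {requested} ms: {n} frame(s), about {actual:.1f} ms")
```

At 60 Hz, a 100 ms request maps cleanly onto six frames, whereas a 20 ms request collapses to a single frame of roughly 16.7 ms, which is the regime where monitor response characteristics matter most.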

## **SUMMARY**

Most human endeavors have a perceptual component. For example, keen visual perception is required in sports, medicine, science, games like chess, and a wide range of skilled behavior. Thus research on real-world perceptual expertise has potential theoretical and applied impacts to many domains. Here we briefly outlined at least some of the practical considerations that factor into research on real-world perceptual expertise. Several of these considerations are things that researchers often fret over behind the scenes without making it into a typical research publication, so in that sense we hope this brief perspective fills a small but important hole in the literature.

## **ACKNOWLEDGMENTS**

This research was supported in part by grants SBE-1257098 and SMA-1041755 from the National Science Foundation and a Discovery Grant from Vanderbilt University.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2014; accepted: 19 July 2014; published online: 06 August 2014. Citation: Shen J, Mack ML and Palmeri TJ (2014) Studying real-world perceptual expertise. Front. Psychol. 5:857. doi: 10.3389/fpsyg.2014.00857*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Shen, Mack and Palmeri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 16 October 2014 doi: 10.3389/fpsyg.2014.01155

## *Tadhg E. MacIntyre<sup>1</sup>\*, Eric R. Igou<sup>2</sup>, Mark J. Campbell<sup>1</sup>, Aidan P. Moran<sup>3</sup> and James Matthews<sup>4</sup>*

*<sup>1</sup> Department of Physical Education and Sport Sciences, University of Limerick, Limerick, Ireland*

*<sup>2</sup> Department of Psychology, University of Limerick, Limerick, Ireland*

*<sup>3</sup> School of Psychology, University College Dublin, Dublin, Ireland*

*<sup>4</sup> School of Public Health, Physiotherapy and Population Science, University College Dublin, Dublin, Ireland*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*Derrick L. Hassert, Trinity Christian College, USA; Tom Carr, Michigan State University, USA; Guillermo Macbeth, National Scientific and Technical Research Council, Argentina*

#### *\*Correspondence:*

*Tadhg E. MacIntyre, Department of Physical Education and Sport Sciences, University of Limerick, PESS Building, County Limerick, Limerick, Ireland e-mail: tadhg.macintyre@ul.ie*

For over a century, psychologists have investigated the mental processes of expert performers – people who display exceptional knowledge and/or skills in specific fields of human achievement. Since the 1960s, expertise researchers have made considerable progress in understanding the cognitive and neural mechanisms that underlie such exceptional performance. Whereas the first modern studies of expertise were conducted in relatively formal knowledge domains such as chess, more recent investigations have explored elite performance in dynamic perceptual-motor activities such as sport. Unfortunately, although these studies have led to the identification of certain domain-free generalizations about expert-novice differences, they shed little light on an important issue: namely, experts' metacognitive activities or their insights into, and regulation of, their *own* mental processes. In an effort to rectify this oversight, the present paper argues that metacognitive processes and inferences play an important if neglected role in expertise. In particular, we suggest that metacognition (including such processes as "meta-attention," "metaimagery" and "meta-memory," as well as social aspects of this construct) provides a window on the genesis of expert performance. Following a critique of the standard empirical approach to expertise, we explore some research on "metacognition" and "metacognitive inference" among experts in sport. After that, we provide a brief evaluation of the relationship between psychological skills training and metacognition and comment on the measurement of metacognitive processes. Finally, we summarize our conclusions and outline some potentially new directions for research on metacognition in action.

**Keywords: metacognition, expertise, cognition, motor cognition, social cognition, cognitive neuroscience, sport, sport psychology**

## **INTRODUCTION**

Expertise is characterized by superior reproducible performances and "refers to the characteristics, skills, and knowledge that distinguish experts from novices and less experienced people" (Ericsson, 2006a, p. 3). Quintessentially, sport provides many such instances. For example, when Carl Lewis (Lewis and Marx, 1996) became the first track and field athlete to win four consecutive Olympic titles, he accomplished this feat with a long jump of 8 m 50 cm, winning by a margin of 21 cm. Sport is a domain that provides benchmarks to distinguish experts from novices, through performance outcomes (e.g., podium placing), player statistics (e.g., batting average in baseball) or competition level (e.g., Olympic vs. Collegiate). Given such criteria, it is not surprising that the question of how one becomes an expert within the sport domain has been of increasing scientific (Ericsson, 2014; Hambrick et al., 2014a) and popular interest (Ross, 2006; Gladwell, 2008) in the two decades since Ericsson's seminal paper on deliberate practice. "Deliberate practice presents performers with tasks that are initially outside their current realm of reliable performance, yet can be mastered within hours of practice by concentrating on critical aspects and by gradually refining performance through repetitions after feedback" (Ericsson, 2006b, p. 692). Although the subject of much debate (e.g., Baker and Young, 2014; Detterman, 2014), the theory of deliberate practice and the development of expertise both warrant further analyses. This paper adds to the expertise debate by presenting a novel argument contending that metacognitive processes are central to expertise in the sport context. Furthermore, we suggest that some of the aforementioned controversies in the research literature, which stem from conflict between researchers focused largely on exploring the role of automaticity in skilled performance and those focused on understanding the representation of experts' knowledge, may have led to the explanatory role of metacognition being overlooked.

In this review we propose that our understanding of aspects of both social and cognitive dimensions of sporting expertise can be adequately explained from a metacognitive perspective. The potential of metacognitive inferences and domain-general skills including psychological skills training (PST) are posited as integral to the genesis of expert performance. Subsequently, the contribution of both mental imagery (e.g., mental practice) and attentional strategies (e.g., routines) to our understanding of expertise and metacognition will be discussed. Finally, new directions for future research that emanate from our metacognitive perspective on sporting expertise will be outlined. Firstly, however, in the rationale for our approach we will attempt to answer the following questions: are there limits to the current expertise approach? Why is sport an appropriate field of study? And finally, what is metacognition?

## **ARE THERE LIMITS TO THE EXPERTISE APPROACH?**

Historically, the rich tapestry of research on expert performance has been interwoven with a common thread: the study of grandmasters in chess (Williams and Ericsson, 2005). Investigations into expertise in chess, a competitive sporting activity that was rule bound, amenable to measurement through objective ratings (e.g., ELO rankings), with a range of possible contextual requirements (e.g., blindfold chess; Campitelli and Gobet, 2005) led to a proliferation of literature on the topic (de Groot, 1965; Chase and Simon, 1973; Holding, 1985, 1992). Chess is a challenge of perceptual-cognitive skill and thus provides a fitting laboratory for testing constructs such as pattern recognition, visual imagery, and memory (Bilalic et al., 2010). Sport in the more traditional sense emphasizes motor skill execution under stressful conditions typically in a dynamic environment (Baker and Young, 2014). One legacy of the chess expertise literature was that this perceptual-cognitive lens was subsequently applied in the sport domain (MacIntyre et al., 2013). Two interlinked events led researchers to become enthusiastic in their study of visual cognition and sport (Williams and Ford, 2007). Firstly, the emergence of the expert performance approach (Ericsson and Smith, 1991) and later, the theory of deliberate practice (Ericsson et al., 1993), both provided an impetus for investigations into perceptual-cognitive skills, such as anticipation and decision-making in sport. The tenet that sport could be a dynamic natural laboratory was well made (Moran, 1996; Williams and Ericsson, 2005) and the development of innovative methodologies occurred in parallel (e.g., eye-tracking as a measure of attention). A burgeoning literature developed and sport as a domain of study gained popularity as a result (Moran, 2009; MacIntyre et al., 2013).

However, there were limits to this approach, particularly in the focus on visual-cognitive expertise, which arguably was to the detriment of our understanding of the underlying psychological processes. Take, for example, the quiet eye phenomenon which has recently gained prominence in sport science research (Vine et al., 2014). Increasingly, this is becoming a topic of interest within both cognitive psychology (Klostermann et al., 2013) and neuroscience (Vine et al., 2011). The quiet eye is defined as "a final fixation or tracking gaze that is located on a specific location or object in the visuomotor workspace within 3° of visual angle (or less) for a minimum of 100 ms" (Vickers, 2007, p. 11) prior to the onset of a critical movement. According to Vickers, quiet eye offset occurs when the gaze deviates off a specific location for more than 100 ms (Vickers, 2007, p. 11). Despite the success in establishing a quiet eye phenomenon "there has been a lack of explicit tests of the processes through which quiet eye training interventions exert their positive effect" (Vine et al., 2014, p. S237). To date, little knowledge of the psychological basis of the quiet eye phenomenon has emerged (Moran, 2012a). Similarly, while a wealth of knowledge has accumulated on the characteristics of individuals' saccadic pursuit during visual attention tasks (Williams and Ford, 2007), little evidence exists to support the *trainability of* visual search (Mann et al., 2007).
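The operational definition of the quiet eye quoted above can be sketched as a simple procedure over eye-tracking samples. The example below is only an illustration under stated assumptions: gaze is reduced to an angular deviation from the target, samples arrive far more often than every 100 ms, and the class name, function name, and default thresholds are hypothetical rather than taken from Vickers (2007) or any eye-tracking package.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GazeSample:
    t_ms: float           # sample timestamp in milliseconds
    deviation_deg: float  # angular distance of gaze from the target


def quiet_eye_period(
    samples: List[GazeSample],
    movement_onset_ms: float,
    max_dev_deg: float = 3.0,
    min_dur_ms: float = 100.0,
    max_gap_ms: float = 100.0,
) -> Optional[Tuple[float, float]]:
    """Return (onset_ms, offset_ms) of the final qualifying fixation.

    On-target samples (deviation <= max_dev_deg) are grouped into
    periods; a period only ends when gaze stays off target for more
    than max_gap_ms. The last period that starts before movement onset
    and lasts at least min_dur_ms is returned, or None if none exists.
    """
    periods: List[List[float]] = []  # each entry is [start_ms, end_ms]
    for s in sorted(samples, key=lambda x: x.t_ms):
        if s.deviation_deg > max_dev_deg:
            continue  # off-target sample
        if periods and s.t_ms - periods[-1][1] <= max_gap_ms:
            periods[-1][1] = s.t_ms  # extend the current period
        else:
            periods.append([s.t_ms, s.t_ms])  # start a new period
    candidates = [
        (start, end)
        for start, end in periods
        if start < movement_onset_ms and end - start >= min_dur_ms
    ]
    return candidates[-1] if candidates else None


# Hypothetical 100 Hz recording: on target from 200 to 400 ms, with one
# brief off-target sample at 300 ms that should not end the period.
demo = [
    GazeSample(float(t), 1.0 if 200 <= t <= 400 and t != 300 else 5.0)
    for t in range(0, 510, 10)
]
print(quiet_eye_period(demo, movement_onset_ms=350.0))
```

Under these assumptions, the brief excursion at 300 ms is absorbed into a single period because it lasts less than 100 ms, consistent with the offset rule described above.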

A further limitation to the expertise approach is that the focus has been on a narrow set of conclusions from the original publication on deliberate practice (Ericsson et al., 1993). Therefore, it is not surprising that the debate over the contribution of deliberate practice to expert performance continues in chess (Campitelli and Gobet, 2011; Detterman, 2014), sporting expertise (Williams and Ford, 2007; Baker and Young, 2014) and professional expertise (Ericsson, 2009; Hoffman, 2014). Disagreements over the number of hours accumulated, starting age, and the link to general cognitive abilities continue to dominate the field (e.g., Gobet and Campitelli, 2007; Baker and Young, 2014; Hambrick et al., 2014a). For example, Hambrick et al. (2014b, p. 113) concluded that evidence had accumulated to suggest that, although relevant, deliberate practice, in itself, "does not largely account for individual differences in performance." One caveat to be considered here is that these authors were concerned with the predictive ability of the theory solely within the context of chess. The acquisition of motor skills in the traditional sporting context is arguably more complex (Voss et al., 2010). For example, even defining deliberate practice among athletes is more challenging than with chess grandmasters (MacIntyre et al., 2013; Healy et al., 2014). Nevertheless, the conclusion by Hambrick et al. (2014b, p. 114) that "the question now is what else matters" suggests that we should consider a broader range of constructs in order to more comprehensively understand expertise.

Before we consider the construct of metacognition, the rationale for studying athletes and sport performers must be made readily apparent. Numerous authors have highlighted the way in which sport can provide a natural laboratory for the study of constructs within psychology and expertise (Ericsson and Smith, 1991; Moran, 1996; MacIntyre et al., 2013). According to Ericsson (2009, p. 18) "performance can be publically observed and even objectively measured in open competition and public performances." Similarly, it has been noted that the high performance sport environment is dynamic. For example, typically performers have to execute complex skills under conditions of extreme stress where their limits are being constantly challenged (Baker and Young, 2014). Among the topics that have only recently received scrutiny are the role of attention and the allocation of effort in deliberate practice (Baker and Young, 2014). One explanation for this is that researchers concentrated on the variables that were most measurable, including the quantification of hours in practice (Helsen et al., 1998). A challenge for researchers has been reconciling the automaticity and procedural knowledge, central to expert sport performance, with the notion that declarative knowledge and metacognitive abilities may also play a role in the acquisition of expertise (Stanley and Krakauer, 2013; Toner, 2014). To explain, while procedural knowledge is inherently linked to optimum sport performance, declarative knowledge may have both a debilitative (Beilock and Carr, 2001) and facilitative role (Carson and Collins, 2011; MacIntyre et al., 2013; Brick et al., 2014). For example, it is probable that Carl Lewis knew his precise stride count to enable him to hit the board and take off into the sandpit at the 1984 Olympic Games. Thus, expertise in sports goes beyond mere procedural knowledge, and arguably metacognitive processes are present at all stages of the target skill and may work in parallel. We thus propose an integrative model of expertise in sports, one that explores action and cognition in sport, a topic that has arguably only recently returned to the forefront of psychology.

Rosenbaum (2005) suggested that researchers in psychology have historically turned relatively late to cognition and action. While this point is debatable, given the recent emergence of research on exceptional performance states (e.g., choking; Beilock and Carr, 2001), the paucity of research on action in prior decades may be worthy of review. Understanding movement had long been the preserve of the fields of motor control, biomechanics, and neurophysiology, perhaps due to the complexity of the cognition-action nexus and the lack of clear methodological approaches within psychology (Rosenbaum, 2005). "Thinking and action seem to lie at opposite ends of the behavioral spectrum" (Moran, 2012b, p. 1). The disembodied approach of information processing theorists in the 1970s led scientists to conduct research on thinking independently from the study of sensorimotor processes and mechanisms (Laakso, 2011). It was not until the advent of the motor cognition paradigm (Jeannerod, 1994) that "action" became subject to intensive scrutiny by researchers in psychology (MacIntyre et al., 2013). Jeannerod proposed that action, rather than movement *per se*, was vital to understand, as evidence for the role of cognition in movement planning was accumulating. The interest in understanding action from different perspectives was increasing rapidly (see Guillot and Collet, 2010). According to Moran (2009), the importance of inter-disciplinary collaboration between researchers in cognitive sport psychology, cognitive psychology, and cognitive neuroscience has been brought to the fore by this new paradigm. Similarly, social cognition has developed as a field of study which has added considerably to our understanding of action (Gallese et al., 2004; Frith, 2012). And recently, cognitive researchers have embraced the study of the domain of sport in their quest to understand how the mind works (MacIntyre et al., 2013). Consequently, we propose that metacognition, a construct that is central to motor cognition, social cognition and action, can augment our current explanations and understanding of the preparation and execution of motor skills within the sport context and elucidate our conceptions of expertise.

## **WHAT IS METACOGNITION?**

Metacognition, or "knowledge or cognition about cognitive phenomena" (Flavell, 1979, p. 906), is curiously under-explored in the domain of expertise among sports performers (Moran, 1996; MacIntyre and Moran, 2010). Elite athletes are not just experts in movement execution but conceivably they are also experts in planning, metacognition, and reflection. Metacognition has been defined as an individual's insight into and control over their own mental processes (Flavell, 1979). In the decades since Flavell's (1979) pioneering article, the term metacognition has been operationalized as the scientific study of the mind's ability to monitor and control itself or, in other words, the study of our ability to know about our knowing (Van Overschelde, 2008, p. 47). It is a different kind of cognition, as explained by Nelson (1999, p. 625): "If one aspect of cognition is monitoring or controlling another aspect of cognition, then the former aspect is metacognitive in relation to the latter aspect." Flavell and subsequent investigators have suggested a tripartite model of metacognition, with knowledge, control and monitoring components (Flavell, 1979, 1987; Tarricone, 2011; Halpern, 2014). Recently, Tarricone (2011) indicated that the main interaction between metacognition and self-regulation is to control, monitor, and regulate strategies to meet task demands and goals. Previously, the study of metacognition has targeted intellectual skills and a substantial corpus of research exists on metacognition in educational settings (Hacker et al., 2009). However, Augustyn and Rosenbaum (2005, p. 911) recently challenged the status quo in metacognition research and stated that "if intellectual and perceptual-motor skills rely on similar mechanisms, one would expect metacognition to apply to the guidance of perceptual-motor skills, just as it does to the guidance of intellectual skills." The approach among cognitive neuroscientists, focused on visual perceptual tasks to measure metacognition (Palmer et al., 2013; Weil et al., 2013), is similarly narrow, perhaps due to methodological issues. The study of metacognition and action, on the other hand, offers new possibilities for illuminating our understanding of expertise and action.

## **EXPERTISE AND METACOGNITION**

Expertise is tightly coupled with metacognition in both training (e.g., knowledge of when a skill has been acquired) and competitive settings (e.g., self-regulation under stress). We propose that metacognitive processes are inherently related to expertise in sports and we have summarized recent findings in the sport literature that reflect the prominence of metacognitive explanations (see **Table 1**). Early investigations were focused on judgments of learning (Simon and Bjork, 2001) and, more recently, greater specificity in the research questions has led to the development of specific models (MacIntyre and Moran, 2010). In the coming sections, we postulate that people use different sources of information, including metacognitive inferences. Firstly, we contend that expertise in any given area facilitates metacognitive inference and secondly, that expertise itself may consist of metacognitive inference, among a range of other non-metacognitive processes including working memory and motivation. Given that expertise is explained by differences in knowledge, many processes involving the use of that knowledge are more or less automatic or proceduralized, and consequently they may not place onerous demands on working memory (Beilock and Carr, 2001). This creates the opportunity for metacognitive reasoning to optimize the assessment of situations and to structure one's goal pursuit. Furthermore, experts have quite good ideas about standards and deviations from such standards, whether this refers to one's own behavior or to the behavior of others. Deviations from sophisticated mental models (e.g., the ideal long jump) are thus more likely to become salient to experts than to non-experts, also providing opportunities for reflective thoughts and interventions. The use of both action simulations (e.g., mental practice) and pre-performance routines by elite performers can be conceptualized as domain-general strategies which rely upon metacognitive processes.
Evidence to support these contentions will be presented in the forthcoming sections.



#### **THE ROLE OF METACOGNITIVE INFERENCES**

The Coliseum, Los Angeles, XXIII Olympic Games, August 8th 1984. A strong wind was swirling around the stadium in the afternoon as Carl Lewis was preparing for his long jump (50 stunning Olympic moments No. 44: Carl Lewis's four golds in 1984, 2012). ABC network commentators were referring to Carl Lewis' adjustments: "He has to block that out and has to only think about heading down the runway and getting off as long a jump as possible." The other reporter then stated: "He has got a bit of a headwind. I think that's what he was waiting for... to decide what he is going to do with his step with regard to this wind." Carl Lewis ran down the runway and leapt into the history books with a jump that at that stage was 50 cm greater than his rivals. Nevertheless, the commentators stated that it "looked like a very restrained effort to me, the last four or five strides... he really looked like he was sort of in that same stride that he was running in the last 50 m of the 200 this morning [200 m heats]" (Corry, 1984).

After fouling his next jump, Lewis decided not to take his four other allotted jumps and many of the crowd responded by booing despite the margin of victory ultimately being 30 cm (Corry, 1984). Many of the spectators plausibly wanted him to break Bob Beamon's longstanding world record. However, Lewis' rationale was that he had other goals to achieve (e.g., to equal Jesse Owens' four track and field gold medals in 1936) and he still had his additional rounds to run in the 200 and 100 m relay. Given that Carl Lewis won four gold medals, just as Jesse Owens had, we can conclude that his strategies were successful. Both the execution of the final jump and his decision to rest indicate metacognitive inferences in addition to his athletic expertise.

When individuals perform skills in the sporting context, the social situation provides many pieces of information. For example, when Carl Lewis is about to perform a long jump, he may remember his coach's words about holding back on his speed both to hit the board accurately and to preserve energy. As he looks down the runway and notices that it is somewhat slippery, and is unsure whether his left foot will begin to ache again, just as it did 30 years ago, he is metacognitively engaged. There may be much more metacognitive activity occurring than these examples suggest. The cognitive system is challenged by the amount of information that is accessible at a given time, and the implications that these pieces of information may have for the execution of skill need to be considered. For optimum adjustments, people need to be attuned to thoughts and feelings about the self and the requirements of the social situation. Yet, given that most social situations provide a tremendous amount of information, people need to be able to focus, that is, they need to be selective. So, when it comes to particular tasks such as a long jump, people need to select "what is relevant" and ignore "what is not relevant" for the task at hand. How does this work?

## *A model of metacognition*

Bless et al. (2009) proposed a model of metacognition that focuses on different information types. Specifically, cognitively accessible declarative knowledge (i.e., knowledge about something), feelings (e.g., how people feel about an action), and memories (e.g., whether they can remember a coach's instruction) serve as information about the situation and the target in question. This has particular resonance when people are uncertain about the judgmental target or when affective states are not in line with the implications of the situation; in such cases, metacognitive processes are more likely to come into play (e.g., Martin et al., 1993; Bless et al., 2009). Metacognitive inferences are governed by rules or theories that decide whether the accessible information should be used and how it should be used in the moment.

When an expert judges performance, this may access a representation about the target behavior (e.g., a motor image). For example, imagine Carl Lewis watching his 1984 long jumps at the Olympic Games. A vast amount of information about long jumps, such as Bob Beamon's Leap of the Century at the Olympic Games in Mexico City in 1968, and his own jump might be cognitively accessible at the time. Which pieces of information are relevant? People apply filtering rules, considering whether the information is relevant and representative for the target behavior (e.g., Martin, 1986; Bless et al., 2009). Information might just as easily become accessible in a conversation and thus comprise declarative information. Declarative knowledge is defined as "knowing that" (e.g., tapping factual information) and contrasts with procedural knowledge, or skill memory, which is explained as "knowing how" (Sternberg and Sternberg, 2012). For example, Carl Lewis might have a chat with Bob Beamon and Mike Powell about his 1984 long jump performance. If information becomes accessible as part of conversations, the person also needs to ask him or herself whether the information is appropriate to be used for a judgment or decision. Here conversational rules as outlined seem to guide decisions on relevance and informativeness of communicated information (e.g., Schwarz, 1994; Igou and Bless, 2003, 2005, 2007; Bless et al., 2009). Furthermore, metacognitions may relate to past or future affective states. These are thoughts that include many different sources of information (e.g., Wilson and Gilbert, 2003; Igou, 2004, 2008) and assessments of the past or forecasts of future affective states are likely to influence one's behavior (Kahneman and Snell, 1992). For example, a long jumper may know that he has felt good being in competitions and thus decides to take part in one that is coming up in 2 weeks' time.

Another important type of information and likely source of metacognitive processes is people's feelings at the time of judgments or behavior. For example, a long jumper's mood, emotion, or the ease with which information comes to mind may influence the execution of the long jump. Importantly, these feelings have informational value (e.g., Schwarz, 2002) for the judgment or behavior at hand. Usually, affective feelings are distinguished from cognitive feelings (Bless et al., 2009). Moods and emotions are considered affective feelings, and they can inform us about the situation and the target (e.g., Schwarz, 1990). For example, according to Schwarz and Clore (1983), when asked to judge a target (e.g., life in general), people ask themselves "How do I feel about it?" which leads to positive evaluations when people are in a positive mood, and to negative evaluations when they are in a negative mood (e.g., Forgas, 2001).

The feeling of knowing (e.g., Koriat, 1993), ease of retrieval (e.g., Schwarz et al., 1991), familiarity (e.g., Jacoby et al., 1989), and processing fluency (e.g., Reber et al., 1998) are examples of cognitive feelings (e.g., Bless et al., 2009; Huntsinger and Clore, 2011). In a nutshell, these are all experiences that accompany cognitive processes and interact with these processes by serving as information about judgmental targets. For example, according to the ease of retrieval heuristic (Schwarz et al., 1991), if it is easy to think of having performed long jumps, then one would be likely to evaluate one's jumping capacity more favorably than if it was difficult to retrieve such examples.

Memories have direct effects on how cognitive representations are formed. However, congruent with Strack and Bless (1994) and Bless et al. (2009), we argue that people also use theories about the functioning of memory as an indicator as to whether accessible information is valid and relevant for a judgment at hand. For example, Carl Lewis may not remember that the coach warned him to run within himself for the long jump run-up. Possibly, Carl would reason that he would remember that because it would have been very untypical for his coach to hold this opinion.

Our conceptualization of metacognition is in line with a broad definition of the construct, namely that metacognition is cognition about both thoughts and feelings. However, the narrower definition of metacognition as a control process (e.g., Shea et al., 2014) is just as valid. The latter refers to processes in which attention and cognitive control are essential in structuring people's thoughts and actions. As Shea et al. (2014) describe in detail, these types of thoughts are associated with cognitive effort and limited capacity (cognitive system 2) rather than automatic, effortless processes (cognitive system 1; Stanovich, 2011).

Generally, monitoring and excluding accessible items of information requires cognitive resources. As a result, reduction in cognitive capacity increases the likelihood that relatively irrelevant information is used for the judgment task at hand (Bless et al., 2009). To be clear, we do not think that metacognitions always need awareness and controlled thought processes; however, we believe that metacognitive inferences are especially influential when people need to engage in this type of reflective behavior in order to make a judgment or decision. This is especially the case when situations are ambiguous and complex, when more or less automatic processes fail, or when accessible information is conflicting, contradictory, or perceived as inappropriate for the task at hand.

## **PST AND METACOGNITION: THE CASE FOR DOMAIN-GENERAL SKILLS**

Interestingly, the ability of Carl Lewis to combine excellence in both track (e.g., 100, 200 m sprints and relay) and field events (e.g., long jump) supports the domain-specificity of expertise. To explain, long jump performance is determined largely by the athlete's ability to attain a fast horizontal speed at the end of the approach runway, thus the physiological task demands were compatible (Bridgett and Linthorne, 2006). However, it is also likely that what are termed "domain-general skills" including psychological skills and metacognition may have played a role in his nine Olympic gold medal winning performances.

As noted earlier (see **Table 1**), Foster and Weigand (2008) highlighted that some theoretical inadequacies in sport psychology could be reconciled by considering other conceptual frameworks, including meta-cognitive approaches (Flavell, 1979). For example, by augmenting our understanding of psychological skills in sport with the construct of metacognition, we could more clearly understand the role of self-monitoring and self-regulation in the application of the aforementioned strategies.

Moran (1996) suggested that PST in sport is essentially an exercise in meta-cognitive instruction. Thus, in order to help athletes become independent thinkers, we need to know what they know and believe about how their own minds work. In this regard, metacognitive control processes are especially valuable because they allow people to change their behavior strategically in accordance with task demands. Eccles and Feltovich (2008) proposed that accelerated learning and enhanced performance, and ultimately expertise, may be the result of a combination of psychological support skills (e.g., self-talk, goal-setting, relaxation, and mental practice) and metacognitive abilities. They are domain-general in that they "can be applied to a variety of novel tasks and domains" (Eccles and Feltovich, 2008, p. 43). Meta-cognitive skills in this case are higher order skills that regulate learning and performance, including the coordination of the use of psychological support skills (i.e., PST).

Within sport psychology, psychological skills have been shown to differentiate successful Olympians from their less successful counterparts (Orlick and Partington, 1988; Gould et al., 2002; Fletcher and Sarkar, 2012). The coordinating role of metacognition may be a key factor in the efficient use of psychological skills. Furthermore, emotional regulation is trainable and sustainable by the application of PST and this has applications beyond the realm of sport (Eccles et al., 2011). Two other aspects of psychological skills that are indeed trainable are now discussed, meta-imagery and routines.

## *Is meta-imagery linked to expertise?*

One dimension of metacognition that has been illuminated by recent research activity is "meta-imagery," a performer's "beliefs about the nature and regulation of their own imagery skills" (Moran, 2002, p. 415). In 2002, little was known about this topic relative to the knowledge base on other aspects of imagery, such as motor imagery. Over the subsequent years, research in the expertise literature emerged to suggest that meta-imagery is another factor that differentiates novices from experts (Moran et al., 2012). Researchers had explored the topic by asking athletes to indicate why, where, how, what, and when they use mental imagery processes (e.g., Munroe et al., 2000; MacIntyre and Moran, 2007a,b). Athletes' responses from both interviews and surveys demonstrated a comprehensive knowledge of the multimodal potential of imagery, showed that they employed imagery in creative ways for contingency planning (see Moran, 2009), and indicated that they were also aware of robust imagery effects (e.g., mental practice). Interestingly, a meta-analysis conducted in 1994 indicated a possible constraint on the efficacy of mental practice for novice learners (i.e., experts improved more). Driskell et al. (1994) suggested that novices may not have an appropriate approximation of the motor skill or that their imagery abilities may be insufficient to generate and manipulate the requisite visuo-spatial motor configuration. An alternative possibility is that experts may simply possess greater meta-cognitive knowledge of how to employ imagery effectively for skill improvement as compared to novices (MacIntyre et al., 2013). In fact, a model of meta-imagery was developed to account for the above findings and this also generates possibilities of developing a test of meta-imagery (MacIntyre and Moran, 2010).

Furthermore, contemporary evidence from cognitive psychology supports the role of meta-cognitive knowledge of imagery ability and relates it to our ability to judge individual episodes of imagery (Pearson et al., 2011). The voluntary nature of imagery and the role of conscious awareness during imagery tasks make it amenable to introspection, ironically the method that was central to the demise of the scientific study of imagery, a century ago (Roeckelein, 2004).

## *Is winning just a matter of routine?*

Pre-performance routines are integral to performance excellence in many self-paced sporting skills, from sprint running to penalty taking in field games (Singer, 2000, 2002; Jackson and Baker, 2001). A routine is defined by Moran (1996, p. 177) as "a sequence of task-relevant thoughts and actions which an athlete engages in systematically prior to his or her performance of a specific sports skill." The widespread use of routines in sport demonstrates that attention is central to cognitive sport psychology because the ability to exert mental effort effectively is vital for optimal athletic performance (Moran, 2009). One function of routines is to regulate arousal prior to skill execution and evidence for their role in buffering stress or choking has also emerged (Mesagno and Mullane-Grant, 2010). While routines have been explored across a range of sports, the sport of golf has perhaps received the most attention from researchers (e.g., McCann et al., 2001; Cotterill et al., 2010). Interestingly, three-time major-winning golfer Padraig Harrington is quoted as saying "my key isn't working at the moment so we have to figure out a way... I have gone a bit stale focusing on the target." What is instructive about this statement is that the golfer appears to realize that his routine is no longer functioning appropriately. Routines need to be revised regularly to avoid the routine itself becoming too automatic (Moran, 1996). From a metacognitive perspective, this may be accounted for by metacognitive monitoring. Thus metacognition may be fundamental to the refinement of pre-performance routines as well as their acquisition. A recent review noted that "at a fundamental level it is still not clear what function routines fulfill, what they should consist of or the most effective way to teach them" (Cotterill, 2010, p. 132). The potential for metacognition research to shed light on the development and refinement of routines as well as their theoretical and conceptual basis is readily apparent.

## **NEW AVENUES FOR FUTURE RESEARCH**

In the preceding paragraphs, arguments for the potential for the construct of metacognition to clarify our understanding of expertise have been made. Now we wish to specify research topics augmented by appropriate methodologies and possible tasks (see **Table 2**).

## *Measurement*

One of the key challenges in the operationalization of any construct is the development of appropriate measurement tools. At present, there is a paucity of questionnaires to assess metacognitive abilities. One such measure is the 52-item *Metacognitive Awareness Inventory* (Schraw and Dennison, 1994), which employs a two-factor model (knowledge of cognition; regulation of cognition). Currently, there is a need to develop and validate revised psychometric instruments to assess, for example, meta-imagery beliefs and knowledge. Further questionnaires for meta-attention or social metacognition (for team sports) could also be piloted, refined and analyzed using factor analysis. In parallel with this objective would be the refinement of the construct of metacognition as it relates to expertise. Tarricone (2011) developed a comprehensive "taxonomy of metacognition," which is primarily focused on the wide-scale research in the educational domain. A similar task would be beneficial for metacognition within the context of expertise and the models discussed in this review (e.g., Bless et al., 2009). The approach of Fleming and Lau (2014, p. 1), who distinguish between "metacognitive *bias* (a difference in subjective confidence despite basic task performance remaining constant), metacognitive *sensitivity* (how good one is at distinguishing between one's own correct and incorrect judgments) and metacognitive *efficiency* (a subject's level of metacognitive sensitivity given a certain level of task performance)," raises interesting questions for the study of metacognition and conscious awareness. Their approach, focusing on perceptual expertise, includes elements which have a direct relevance to expert performance. For instance, they suggest that "metacognitive confidence" can be interpreted as a probability judgment directed toward one's own decisions, that is, the probability of a previous judgment being correct. This is synonymous with expertise (i.e., ability to predict actions). It is our view that, by exploring the construct of metacognition and how it interfaces with action-related processes (e.g., motor imagery), we can contribute to the conceptual development of the construct. The divergent approaches to date necessitate a degree of conceptual analysis, a process that is all too rare in psychology (Machado and Silva, 2007).

**Table 2 | Proposed research topics, methods, and objectives of future studies to study expertise and metacognition in sport settings.**
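To make the distinction drawn by Fleming and Lau (2014) between metacognitive bias and metacognitive sensitivity more concrete, the following is a minimal sketch rather than a validated scoring procedure. It assumes trial-by-trial data in which each judgment is scored as correct or incorrect and accompanied by a confidence rating. Bias is approximated here as mean confidence, and sensitivity as the area under the type 2 ROC curve, which is one commonly used index and not necessarily the one best suited to sport settings. The function names and the toy ratings are hypothetical and purely illustrative; metacognitive efficiency, which requires a model of first-order task performance, is not computed.

```python
from statistics import mean


def metacognitive_bias(confidence):
    """Approximate bias as overall mean confidence, ignoring accuracy."""
    return mean(confidence)


def type2_auroc(correct, confidence):
    """Approximate sensitivity as the type 2 area under the ROC curve.

    This equals the probability that a randomly chosen correct trial
    carries higher confidence than a randomly chosen incorrect trial,
    with ties counted as one half. 0.5 means confidence does not
    discriminate correct from incorrect judgments; 1.0 means it
    discriminates them perfectly.
    """
    conf_correct = [c for ok, c in zip(correct, confidence) if ok]
    conf_incorrect = [c for ok, c in zip(correct, confidence) if not ok]
    if not conf_correct or not conf_incorrect:
        raise ValueError("need at least one correct and one incorrect trial")
    score = 0.0
    for hit in conf_correct:
        for miss in conf_incorrect:
            if hit > miss:
                score += 1.0
            elif hit == miss:
                score += 0.5
    return score / (len(conf_correct) * len(conf_incorrect))


# Toy, invented ratings: two hypothetical performers with identical accuracy.
accuracy = [True, True, False, False]
performer_a = [0.9, 0.8, 0.3, 0.4]  # confidence tracks accuracy
performer_b = [0.6, 0.6, 0.6, 0.6]  # same mean confidence, no discrimination

print(metacognitive_bias(performer_a), type2_auroc(accuracy, performer_a))
print(metacognitive_bias(performer_b), type2_auroc(accuracy, performer_b))
```

In this toy case both performers have the same accuracy and the same mean confidence, so they do not differ in bias, yet only the first performer's confidence separates correct from incorrect judgments. Keeping the two quantities separate is what allows such a difference to be detected.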

## *Motor cognition*

Recent conceptualizations of imagery, action observation and motor execution view these processes as overlapping, differing by degree rather than kind (Jeannerod, 1994, 2006; Vogt et al., 2013). The preceding section on meta-imagery is illustrative of the progress that can be made in our understanding of expertise, metacognition and imagery alike. Given the overlap between, for example, imagery (e.g., visualizing a long jump: the run-up, take-off, and landing phases) and action observation (e.g., viewing Bob Beamon's world record long jump), a question arises as to whether the same metacognitive processes underlie these related processes. This issue is further complicated by evidence from several sources which suggests motor imagery is grounded in physical experience, for example, the specific training either simulated or executed (Olson and Nyberg, 2011; Debarnot et al., 2014). The question remains as to whether the respective metacognitive processes are domain-general or domain-specific. As a result, deeper conceptual analysis is required to comprehensively describe and explain the range of metacognitive processes that pertain to cognitive simulation strategies. This new dimension to metacognition research offers a range of experimental possibilities that can enable a greater understanding of metacognition with regard to action preparation, simulation, and execution.

## *Anxiety*

Research on "choking" in sport has illuminated our understanding of anxiety across both cognitive skills (e.g., Lyons and Beilock, 2012) and motor skill contexts (e.g., Beilock and Carr, 2001; Beilock and Gonso, 2008; DeCaro et al., 2011; Toner and Moran, 2011; Toner et al., 2013). The "explicit monitoring hypothesis" suggests that attending to a well learned skill may lead to failure in the precise execution of the skill under pressure. Metacognitive abilities obviously have a role in regulating emotion, based on the aforementioned model by Bless et al. (2009). Furthermore, the role of "stereotype threat," which occurs when "knowledge of a negative stereotype about a social group leads to a less-than-optimal performance by members of that group," should be investigated from a metacognitive perspective (Beilock and McConnell, 2004). Previous investigations supported the contention that stereotype threat prompts attention to the executed action and thus can disrupt performance (Beilock et al., 2003). However, this effect can be alleviated by the inclusion of a secondary task. Findings across two studies conducted by Beilock et al. (2003) were inconsistent and the impact of stereotype threat may be more telling across a tournament than a putting skill as it may interfere with the metacognitive processes that help modulate attention and regulate emotion. Another avenue is to systematically examine how current affect and anticipated affective responses to performance influence action in sports via metacognitive thoughts. This is based on the recent literature on affect regulation (e.g., Baumeister et al., 2007; Loewenstein, 2011). Moreover, it would be interesting to investigate how training in metacognition can influence skilled performance and athletes' susceptibility to overcompensation of attention and affect regulation.

## *Neuroscience*

The rise of neuroscience in recent decades has been based largely upon advances in methodologies that facilitate the study of internal mental experiences, such as metacognition, in a robust and scientific way. Cognitive neuroscience, in particular, has had a dramatic effect on our understanding of individual domains of cognition from vision to memory (Beran et al., 2012), in chess (Bilalic et al., 2010, 2012; Bartlett et al., 2013) and more recently in sport (Debarnot et al., 2014). As we have seen throughout the current paper, sport and athletic skills offer a dynamic and fascinating arena in which to study and explore cognitive processes, metacognitive processes, and experiences.

Regarding expertise, we saw that in order to engage in effective training episodes for long periods of time, athletes must be highly self-disciplined and self-regulated (Crews et al., 2001). This notion of self-regulation, defined as a set of cognitive, behavioral, and motivational processes that interact to influence performance (Kitsantas and Kavussanu, 2011), has been the go-to approach for examining expertise differences in performance domains. This approach has been concerned with self-regulatory processes (imagery, attentional control, for example) and researchers typically asked participants to make confidence judgments about the efficacy of some aspect of their cognition. Neuroscience has enabled researchers to move beyond the study of processes and focus on metacognitive judgments instead (a case in point being the feeling of inaccessibility, otherwise known as the tip-of-the-tongue phenomenon). This temporary failure of retrieval for a memory highlights a problem with a particular cognitive process but not a problem with one's metacognitive judgment. What tip-of-the-tongue research has shown us is that different underlying processes are responsible for the cognition and the metacognition that monitors it (Schwartz and Diaz, 2014). Metacognitive experiences arise from cognitive processes and correspond to particular behaviors. The cognitive processes that produced the behavior are not the same as the processes that gave rise to the metacognition. For example, an object is recognized as having been seen before (cognitive process) accompanied by an experience of confidence, and the person then says that they know the answer (behavior). Essentially, there is a set of cognitive processes driving the recall of information but another set of processes driving our awareness of it. "Thus understanding any metacognitive judgment must involve understanding the cognition it measures and the multiple processes that contribute to the judgment" (Schwartz and Diaz, 2014, p. 9). Attempts at componential analysis of metacognition are in their infancy (Fleming and Lau, 2014; Garrison, 2014), but they appear to be fruitful with regard to understanding its impact upon visual perceptual tasks. Nevertheless, the investigation of the neural basis of metacognition (Baird et al., 2013) is not without its limitations. It has been noted that the application of neurophysiological measurement techniques imposes restrictions on the ecological validity of studies which are not readily overcome (Mann et al., 2013).

## *Developmental*

Currently, a gap exists in our knowledge of how performers acquire pre-performance routines (Singer, 2000, 2002; Cotterill, 2010). Unfortunately, researchers have neglected to explore how these strategies are developed over time, with one recent notable exception, a study with gymnastics athletes (Faggiani et al., 2012). The role of cognitive development in the acquisition of meta-cognitive skills may be a limiting factor for applied sport psychology interventions (Foster and Weigand, 2008). Thus the gap in the knowledge base may be due to the complex interaction of domain-general and domain-specific cognitive skills. Given that our approach has centered on the role of metacognitive abilities and processes, we propose that a developmental approach to understanding pre-performance routines could be augmented by exploring metacognitive skill development from a longitudinal perspective. The potential of specific interventions to enhance metacognitive ability could be explored for those who are impaired in their metacognitive development or for those who suffer plateaus in their skill development. For example, recent research has demonstrated the capacity of a 2-week meditation program to enhance metacognitive ability in a perceptual task (Baird et al., 2014).

#### **CONCLUSION**

Do metacognitive processes enhance performance? Are they helpful in the acquisition of expertise? The general answer to these questions is in the affirmative. Metacognitive processes are part of the inventory of human thought (Nelson and Narens, 1994; Sternberg, 2001; Perfect and Schwartz, 2002). As such, they serve as a resource to structure thought and regulate behavior. Will metacognitive processes always lead to better outcomes? No, certainly not. As for all areas of information processing, people can err.

However, understanding the role of metacognition, the breadth and flexibility of processes involved and how they are associated with expertise, allows for more precise predictions of behavior.

We argue that more research and empirical scrutiny of the construct of metacognition can help to develop principles that govern the relation between internal cognitive processes and subjective experience. These principles could be very effective for expertise research looking to differentiate a "real" expert from a "skilled" performer, currently a major challenge in expertise research (e.g., MacIntyre et al., 2013; Bourne et al., 2014).

In the sporting domain, training athletes to initiate, develop and engage in metacognition can equip them with the proper strategies, beliefs, and self-understanding to excel in sports. Currently, there is no unified view as to what athletic training entails, and coaches, despite a burgeoning literature (Healy et al., 2014), have, for example, focused on a relatively narrow set of conclusions from the deliberate practice literature (Baker and Young, 2014). Metacognition offers the potential to expand our understanding of expertise and individual domains of cognition through a rigorous examination of the mechanisms underlying self-initiated monitoring and control of one's own performance. Consequently, our understanding of expertise can be illuminated by studying metacognition in the sporting domain, specifically by using a strength-based approach with expert samples (MacIntyre et al., 2013). Sport offers researchers a fertile natural laboratory where expertise is easily quantifiable through the quest to be faster, higher and stronger. In conclusion, we have demonstrated that the construct of metacognition has the potential to be a springboard for research into sporting expertise.

## **ACKNOWLEDGMENTS**

This work was supported jointly by the Irish Research Council under their 'New Foundations' scheme and the Department of Physical Education and Sport Science, University of Limerick 'Start-Up' grant scheme awards to the first author. We wish to acknowledge the efforts of the three reviewers and editorial team in refining our ideas and assisting us in creating a clearer narrative.

#### **REFERENCES**


Schraw, G., and Dennison, R. S. (1994). Assessing metacognitive awareness. *Contemp. Educ. Psychol.* 19, 460–475. doi: 10.1006/ceps.1994.1033


Singer, R. N. (2002). Pre-performance state, routines, and automaticity: what does it take to realize expertise in self-paced events? *J. Sport Exerc. Psychol.* 24, 359–375.


Tarricone, P. (2011). *The Taxonomy of Metacognition*. New York: Psychology Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 June 2014; accepted: 24 September 2014; published online: 16 October 2014.*

*Citation: MacIntyre TE, Igou ER, Campbell MJ, Moran AP and Matthews J (2014) Metacognition and action: a new pathway to understanding social and cognitive aspects of expertise in sport. Front. Psychol. 5:1155. doi: 10.3389/fpsyg.2014.01155*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 MacIntyre, Igou, Campbell, Moran and Matthews. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## In praise of conscious awareness: a new framework for the investigation of "continuous improvement" in expert athletes

## *John Toner<sup>1</sup>\* and Aidan Moran<sup>2</sup>*

*<sup>1</sup> School of Sport, Health and Exercise Sciences, University of Hull, Hull, UK <sup>2</sup> School of Psychology, University College Dublin, Dublin, Ireland*

#### *Edited by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

#### *Reviewed by:*

*John Sutton, Macquarie University, Australia Doris Jane McIlwain, Macquarie University, Australia*

#### *\*Correspondence:*

*John Toner, School of Sport, Health and Exercise Sciences, University of Hull, Hull, HU6 7RX, UK e-mail: john.toner@hull.ac.uk*

A key postulate of traditional theories of motor skill-learning (e.g., Fitts and Posner, 1967; Shiffrin and Schneider, 1977) is that expert performance is largely automatic in nature and tends to deteriorate when the performer "reinvests" in, or attempts to exert conscious control over, proceduralized movements (Masters and Maxwell, 2008). This postulate is challenged, however, by recent empirical evidence (e.g., Nyberg, in press; Geeves et al., 2014) which shows that conscious cognitive activity plays a key role in facilitating further improvement amongst expert sports performers and musicians – people who have *already* achieved elite status (Toner and Moran, in press). This evidence suggests that expert performers in motor domains (e.g., sport, music) can strategically deploy conscious attention to alternate between different modes of bodily awareness (reflective and pre-reflective) during performance. Extrapolating from this phenomenon, the current paper considers how a novel theoretical approach (adapted from Sutton et al., 2011) could help researchers to elucidate some of the cognitive mechanisms mediating continuous improvement amongst expert performers.

**Keywords: expertise, continuous improvement, attention, embodiment, bodily awareness**

Many traditional and contemporary theories of motor learning emphasize the apparently "spontaneous" nature of skilled performance. For example, Fitts and Posner (1967) and Shiffrin and Schneider (1977) argued that skilful action runs "automatically" or "procedurally." Similarly, Dreyfus (2002) claimed that expert performance proceeds "without calculating and comparing alternatives ... what must be done, simply is done" (p. 372). Common to these accounts of expertise is the belief that any form of conscious involvement during on-line skill execution is likely to prove deleterious to movement and performance proficiency (Beilock et al., 2002). Challenging these dominant perspectives on skill-learning, however, is an emerging body of theory (e.g., Breivik, 2013; Toner and Moran, in press; Winter et al., 2014) and empirical evidence (see Nyberg, in press) which suggests that "continuous improvement" (i.e., the phenomenon whereby certain skilled performers appear to be capable of increasing their proficiency even though they are already experts) at the elite level of sport is heavily reliant upon the performer's ability to move efficiently between reflective and unreflective modes of bodily awareness. For example, experts are often required to pay conscious attention to, or "reinvest" in the training context, when their movements become "attenuated" (see Collins et al., 1999) because they believe that in order to optimize their performance they must "experiment with and research their moving body" (Ravn and Christensen, in press).

Interestingly, evidence suggests that many skilled performers remain "somaesthetically" aware (i.e., focusing on the proprioceptive "feel", Shusterman, 2008) of their movement during on-line skill execution (in the performance context; see Nyberg, in press) and can use global or holistic cue words to improve performance proficiency under pressurized conditions (see Mullen and Hardy, 2010). Therefore, instead of relying wholly on unthinking spontaneity to guide their performance, elite athletes appear to alternate between different modes of cognitive processing, and also between types of bodily self-awareness (i.e., reflective and pre-reflective) in practice and performance contexts. Here athletes might adopt what Colombetti (2011) refers to as a *reflective* awareness of their bodily selves when they consider their intentions or actions and assess whether they are appropriate to a certain situation. By contrast, *pre-reflective* awareness occurs when we are immersed in an activity but our attention is not on our bodily selves. However, in the latter case, Colombetti (2011) argues that the body is not entirely invisible or absent from experience as it remains "as a source of feeling, affect, agency and expressivity" (p. 27).

In a recent paper (see Toner and Moran, in press), we argued that self-focused attention (including reflective bodily awareness) plays an important role in allowing skilled athletes to refine inefficient movements during deliberate practice. The present paper extends this argument by postulating that skilled performers are capable of strategically allocating attention, and hence alternating between reflective and pre-reflective modes of awareness, in order to meet the requirements of dynamically unfolding and contextually contingent performance environments. Furthermore, we argue that influential theoretical accounts (such as self-focus theories; e.g., Beilock and Carr, 2001) used by researchers to identify the cognitive mechanisms underpinning performance at the expert level may be unable to adequately capture or explain the dynamical (i.e., freely allocable) nature of attentional processing amongst elite performers. Arising from these arguments, we propose that Sutton et al.'s (2011) "Applying intelligence to the reflexes" (AIR) approach, and a number of methodologies which aim to uncover participants' phenomenological descriptions of training and performance, may be better suited to achieving this latter aim.

Over the last decade or so, experimental psychologists have used standardized laboratory tasks in order to identify the attentional processes that govern skilled movement and performance in sport (e.g., Beilock and Carr, 2001; Jackson et al., 2006). For example, Beilock et al. (2002) found that when skilled golfers attended to a specific aspect of their putting technique (e.g., the exact moment that the clubhead finished its follow-through), their performance was impaired relative to counterparts in a dual-task condition who putted while performing a secondary task (an auditory tone-monitoring activity). Further evidence for the detrimental effects of skill-focused attention on skilled performance has been found in a baseball batting task (Gray, 2004); in soccer dribbling (Ford et al., 2005); and in a golf putting task (Mullen and Hardy, 2000). According to self-focus theories of attention (including Beilock and Carr, 2001, "explicit monitoring hypothesis" and Masters, 1992, "conscious processing hypothesis") attending to the step-by-step component processes of a proceduralized skill results in its control structures being broken down into a sequence of smaller, separate, independent units. Ultimately, this creates the opportunity for error that was not present in the "chunked" control structure (Beilock et al., 2002). Accordingly, performers have been advised to avoid focusing on their bodily movement and, instead, to "shift the focus to the external world, in particular on the impact or effect of one's behaviors" (Weiss and Reber, 2012, p. 174).

At first glance, it would seem that the case is quite clear – any form of conscious processing that directs the performer's attention to their bodily movement is likely to disrupt fluent skill execution. However, a closer look at the laboratory-based evidence base would suggest that its findings must be interpreted with caution. To explain, in each of the aforementioned studies athletes were asked to focus on a feature of their movement which they may never have previously focused on (and hence never practiced doing so). Indeed, few studies have sought to capture athletes' attentional processes over time (e.g., between the "off-season" and competitive season) or across different situations (e.g., when recovering from injury). Given the dearth of studies in psychology on the temporal and/or contextual dynamics of attentional processes, we question the degree to which available evidence supports the received wisdom that conscious attention will *inevitably* disrupt skilful performance. In fact, recent research has begun to cast doubt upon the validity of this latter assumption. For example, Nyberg (in press) found that elite freeskiers attended to on-line skill execution in order to identify any features of their movement which might require alteration/adjustment in order to maintain performance proficiency. In addition, a large volume of evidence indicates that elite performers are capable of flexibly allocating their attention (i.e., moving from reflective to pre-reflective modes of bodily awareness) dependent upon the context-specific demands confronting them during training and competitive performance (see Bernier et al., 2011) or the challenges (e.g., injuries, slumps) that they will inevitably face at some stage during their careers (see Collins et al., 1999). In stark contrast to self-focus theories, these emergent findings suggest that skilled performance is likely to be impeded if the "proceduralization" of skills is excessive (see Ericsson, 2006) – because experts must be able to deliberately access and strategically re-route any semi-automated routines in order to facilitate "continuous improvement" (Montero, 2010; Breivik, 2013).

Against this background, we argue that there is ample empirical evidence that "continuous improvement" at the elite level is heavily dependent upon the performer's ability to effectively utilize reflective modes of bodily awareness. Skill-focused attention (including conscious bodily awareness) appears to be a key feature of skilled performance because athletes operating at this level are driven by the desire to learn "new and better techniques" (Breivik, 2007, p. 127). For example, despite having won eight medals at the Beijing Olympics, Michael Phelps decided to change his freestyle technique in a bid to increase his sprinting speed (Andersson, 2009). Moreover, athletes will inevitably experience injury, fatigue, growth and aging which may disrupt habitual movement (see Bissell, 2013; Eden, 2013) and require them to correct, relearn and adjust their spontaneous performance (Shusterman, 2008). In fact, research on the topic of "skill recovery" or "skill refinement" shows that skilled athletes who are attempting to regain prior levels of performance often deliberately reinvest conscious control to restore or refine habitual movements in sports such as javelin throwing and swimming (Hanin et al., 2004). In these studies, researchers have helped athletes regain or refine disrupted movement patterns by encouraging them to become more consciously aware of the somaesthetic differences between current (problematic) and desired actions.

Somaesthetic awareness appears to play an important role in "continuous improvement" by allowing performers to identify movements that are causing them discomfort or outcomes which are unusual or undesirable. Indeed, evidence suggests that these forms of self-awareness are important mediators of "flow" or optimal competitive performance in sport. On the basis of their pioneering research on flow in sport, Jackson and Csikszentmihalyi (1999) argued that "without self-awareness an athlete misses important cues that can lead to a positive change in performance" (p. 105). According to these authors, self-awareness simply means paying attention to cues provided by movements, and making adjustments to your actions when something is amiss. Athletes may use reflective bodily awareness to identify "attenuated" habits in the performance context and subsequently adjust problematic movements in the training context. However, evidence suggests that performers may also choose to adjust problematic movements during on-line execution during competition. To illustrate, Collins et al. (2001) found that elite weightlifters chose to consciously modify their movement during competition in order to maintain movement proficiency. Similarly, Nyberg found that elite freeskiers learn how to discern (i.e., through "focal awareness") their rotational velocity to such an extent that they "know whether they will be able to perform the trick the way it was intended without adjustments, or whether they will need to make adjustments during the flight phase" (p. 7). In this study, elite freeskiers were video recorded during practice and subsequently interviewed using a technique known as stimulated recall (SR – a method for enhancing reflection by recalling situations through audiotapes or video recordings). Nyberg suggests that these performers rely on their focal awareness (which is conscious and might include knowledge of their velocity and how they need to modify it) and their subsidiary awareness which is "less conscious" and includes knowledge of the "particulars" such as the friction of the snow and their feelings of previous jumps. These elite performers were found to navigate their focal awareness by rapidly shifting its target even in the midst of the activity itself. For one performer, this meant that he was focally and embodiedly aware of his rotational velocity while in the air but could quickly change his awareness to take into account environmental conditions such as his position in relation to the targeted landing area. Clearly, these findings suggest that some performers seek to counteract automaticity by ensuring that certain features of performance are subject to strategic control.

Although some performers may choose to reinvest conscious attention by adjusting movement in the performance context, others may choose to use cue words as "instructional nudges" (i.e., explicit verbal phrases or maxims: see Sudnow, 2001; Sutton, 2007) in order to "re-route" embodied routines. According to Sutton et al. (2011), cues may allow the performer to build and access "flexible links between knowing and doing" (p. 95). Cue words appear to represent forms of thinking and remembering which can, in some circumstances, allow performers to animate the kinesthetic mechanisms of skilled performance. To illustrate, Jenkins (2007) interviewed 113 European tour golfers and found that 70% of these performers used at least one "swing thought" (i.e., a form of cue word) during on-line skill execution. Clearly, certain forms of mindedness or conscious processing are a common feature of elite competitive performance.

The preceding evidence would suggest that a better understanding of the cognitive processes mediating continuous improvement at the elite level of skilled performance can be achieved only by adopting a theoretical framework which can account for the dynamic nature of attentional processing. Therefore, instead of explanations (e.g., self-focus theories) that emphasize the proceduralized nature of skilled performance, we may require theoretical accounts which focus on the *interchanging* phases or stages of learning (Shusterman, 2008, 2009) that appear to characterize training and performance at the elite level of sport. Consequently, we propose that Sutton et al.'s (2011) AIR approach may help researchers to explain how performers can alternate between different modes of processing. Briefly, Sutton et al.'s (2011) model is cyclical in the sense that the maintenance and enhancement of performance efficiency requires the "rapid switching of modes and styles" (2011, p. 93) within the training and performance context. This framework proposes that expert skill relies on a mindedness that "facilitates the dynamic flexibility of attention, allowing it to be allocated freely and in a way that best meets contingent contextual demands" (Geeves et al., 2014, p. 676). Accordingly, Geeves et al. (2014) argue that expert performers may determine the amount of attention they need to pay to certain processes in the practice context (depending on their current level of performance) and during on-line performance (according to the situational demands presented to them).

Additionally, we propose that the AIR model may help researchers to interpret the accumulating body of empirical evidence which suggests that skilled performers seek to avoid automaticity by ensuring that performance remains open to strategic control in both the practice and performance context. Sutton et al. (2011) argued that there are a number of different ways in which embodied coping is minded or mindful (varying across individuals, task domains, and cultures) and recommend that we search for forms of theorizing that highlight these differences by exploring what actually happens to performers as they "direct attention to kinesthetic cues in increasingly skilful ways" (p. 96). Given the preceding evidence documenting the mindful nature of expertise, it would also seem important to identify methodological approaches that will help researchers capture the attentional switching mechanisms that appear to underpin "continuous improvement" at the elite level of sport. Specifically, in order to understand embodied perspectives of experience researchers may wish to adopt methodological approaches that are "truly grounded in the carnal realities of the lived sporting bodies" (Hockey and Allen-Collinson, 2007, p. 116). Some researchers have recently taken up this challenge by drawing on participants' phenomenological insights to better understand how embodied expertise is shaped by training and performance. For example, Ravn and Christensen (in press) utilized a phenomenology-related analysis of qualitative data (including participant observations and interviews) with an elite golfer to explore how the "described experience comes into being rather than what this experience means to the subject" (p. 5). The principal author observed Line (an elite golfer), between 5 and 8 h each day, over 5 days of training. The researchers drew on notes taken during these observations to invite Line to describe her practice sessions and experiences in detail.
The researchers subsequently analyzed the data by looking for "petite generalizations" (i.e., generalizations that regularly occurred in the case study) relating to how Line used her awareness of bodily sensations during training. Overall, the findings suggested that training at the elite level is not just about handling the physicality of the body but also about listening to it and regulating how it should "feel" in order to perform optimally (see Nyberg, in press, for a similar methodological approach).

In another study, Bernier et al. (2011) used a naturalistic investigation to explore the attentional foci adopted by elite golfers in training and performance contexts. The initial phase of the study involved filming participants (for 60 min) in a training session during the winter (non-competitive) season. The second phase took place during the competitive season and involved filming participants over the course of a complete round (i.e., 18 holes of play) in a professional competition. Self-confrontation interviews, based on video footage, were conducted within 2 h of the completion of the training session and competitive performance. The interviews sought to encourage SR by asking each participant to view the videotape with the researcher and to recall and describe what thoughts he was processing during the training session or competition. Initially, the participants and the researcher watched the first situations on the video recording (i.e., the first training exercise and the first three holes of the competition). Having discussed the attentional foci adopted during these situations, the participant was asked to indicate specific circumstances that he considered relevant to analyse. These situations included specific exercises during the training sessions or great/poorly executed strokes during competition. Each sequence involved an action (e.g., a practice drill or a shot), the preparatory phase (e.g., the pre-shot routine), and the step following this action (e.g., walking to the next shot). Participants were urged to express their thoughts during each sequence, rather than being asked to explain "their solution for the task or to provide a summary of the general strategy adopted" (Bernier et al., 2011, p. 331). Inductive content analysis revealed that these elite golfers adapted their attentional focus depending on the context. That is, golfers were found to flexibly adjust their attentional focus (moving back-and-forth between internal and external foci) across the preparatory, execution and evaluative stages of training and competitive performance.

In summary, a significant volume of evidence shows that skilled performers' foci of attention may change dramatically over the course of a competitive season (e.g., to deal with "attenuated" movement patterns) or during a competitive event (i.e., between preparation, execution, and evaluation). Unfortunately, as most studies of attentional processes in psychology are limited to static snapshots of the phenomena of interest, they shed little light on the dynamical nature of attention. Against this background, the current paper has argued that Sutton et al.'s (2011) AIR approach may provide researchers with a more detailed understanding of the embodied nature of skilful action and performance – thereby helping to explain how athletes strategically allocate attentional resources in seeking to maintain and enhance performance proficiency. In order to shed light on the complex and dynamic attentional mechanisms that mediate "continuous improvement" in expert performers, researchers may need to use a variety of methods including both standardized laboratory techniques (e.g., occlusion paradigms, eye-tracking) and phenomenological and naturalistic investigations (e.g., SR). Together, these approaches may help researchers understand how and why performers alternate between reflective and pre-reflective modes of bodily awareness across training and performance contexts.

#### **REFERENCES**


Sudnow, D. (2001). *Ways of the Hand: A Rewritten Account*. Cambridge, MA: MIT Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 April 2014; accepted: 01 July 2014; published online: 16 July 2014.*

*Citation: Toner J and Moran A (2014) In praise of conscious awareness: a new framework for the investigation of "continuous improvement" in expert athletes. Front. Psychol. 5:769. doi: 10.3389/fpsyg.2014.00769*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Toner and Moran. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## An adaptive toolbox approach to the route to expertise in sport

## *Rita F. de Oliveira<sup>1</sup>\*, Babett H. Lobinger<sup>2</sup> and Markus Raab<sup>1,2</sup>*

*<sup>1</sup> Department of Applied Sciences, London South Bank University, London, UK <sup>2</sup> Institute of Psychology, German Sport University, Cologne, Germany*

#### *Edited by:*

*Merim Bilalic, University of Klagenfurt, Austria*

#### *Reviewed by:*

*Bartosz Gula, University of Klagenfurt, Austria Merim Bilalic, University of Klagenfurt, Austria*

#### *\*Correspondence:*

*Rita F. de Oliveira, Department of Applied Sciences, London South Bank University, 103 Borough Road, London, SE1 0AA, UK e-mail: r.oliveira@lsbu.ac.uk*

Expertise is characterized by fast decision-making which is highly adaptive to new situations. Here we propose that athletes use a toolbox of heuristics which they develop on their route to expertise. The development of heuristics occurs within the context of the athletes' natural abilities, past experiences, developed skills, and situational context, but does not pertain to any of these factors separately. The novelty of this approach lies in the integration of the separate factors determining expertise into a comprehensive heuristic description. It is our contention that talent identification methods and talent development models should therefore be geared toward the assessment and development of specific heuristics. Specifically, in addition to identifying and developing separate natural abilities and skills as per usual, heuristics should be identified and developed. The application of heuristics to talent and expertise models can bring the field one step away from dichotomized models of nature and nurture toward a comprehensive approach to the route to expertise.

**Keywords: heuristics, expertise, sport, talent development, talent identification, cues**

## **INTRODUCTION**

Most current theories of expertise are based on the principle that knowledge underlies performance (Ericsson et al., 2006). Specifically, Ericsson et al. (2006) suggest that the specific knowledge of experts arises through about 10,000 h of deliberate practice. Further research has also shown that previous knowledge guides attention for accurate performance (e.g., Bilalić et al., 2010). Drawing on the importance of knowledge for performance, the heuristic approach focuses on how that knowledge can be effectively searched and how a solution can be implemented.

Heuristics are rules of thumb that allow fast and frugal decision-making. The concept of heuristics was introduced by Simon (1982) to explain how humans decide when they have limited resources. He proposed that behavior could only be understood through analyzing both the person and the environment where the behavior took place. The subsequent work of Gigerenzer et al. (1999) identified and tested specific heuristics in a number of different environments. For instance, they found the recognition heuristic whereby people choose the option they recognize over the option they do not recognize (Gigerenzer and Goldstein, 1999). Recently, the simple heuristics research program has been used in the context of sport. Raab (2012) showed that athletes use simple heuristics both to make decisions and to implement them in the sports environment. What is still lacking, however, is an understanding of how simple heuristics develop in the route to expertise.
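As an illustration only, the recognition heuristic described above can be written as a very small decision procedure. The sketch below is a minimal Python rendering under stated assumptions: the function name `recognition_heuristic`, the set of recognized options, and the team names are all hypothetical and are not taken from the cited studies.

```python
from typing import Optional, Set


def recognition_heuristic(option_a: str, option_b: str,
                          recognized: Set[str]) -> Optional[str]:
    """Pick the recognized option when exactly one of the two is recognized.

    Returns None when the heuristic cannot discriminate (both or neither
    option is recognized), in which case some other strategy is needed.
    """
    a_known = option_a in recognized
    b_known = option_b in recognized
    if a_known and not b_known:
        return option_a
    if b_known and not a_known:
        return option_b
    return None


# Hypothetical example: a spectator who has only heard of "Team A".
known_teams = {"Team A"}
choice = recognition_heuristic("Team A", "Team B", known_teams)
```

The procedure is frugal in the sense used in the text: it consults a single piece of information (recognition) and ignores everything else about the options.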

## **HEURISTICS AS CHARACTERISTICS OF EXPERTISE AND TALENT DEVELOPMENT**

The topic of expertise has been gaining increased prominence in science and the media because researchers, practitioners, and laypeople wish to replicate the route to success in the most efficient way. In sport, and especially in team sports, experts are those with repeated top-level performance who can most efficiently resolve the situation put before them. The route to that level of expertise will no doubt have involved countless attempts, some successful but many unsuccessful. Here we will argue that the developmental process is inherently non-optimal and nonlinear, but that this is indispensable for acquiring the highest levels of expertise. To say that athletes show optimal adaptation to various situations related to their sport does not equate to saying that their actions or behaviors are optimal, but rather that these are cost-effective actions or behaviors. This is especially the case where performance involves fast decision-making. The best decision is not the optimal decision per se, but the one that can solve the current situation well enough and fast enough (Simon, 1982; Raab et al., 2009; Todd et al., 2012). It is important to qualify what is meant by optimal performance so that efforts to identify and develop talented athletes are geared toward functional (not optimal) decision-making. The route to expertise starts by demonstrating a talent. Talent identification is the process of recognizing the potential of an athlete to excel in a particular sport. Talent development, on the other hand, is the process by which an athlete can realize that potential, which includes benefiting from the most appropriate learning and training environments (Vaeyens et al., 2008).

## **CLASSICAL MODELS IN TALENT DEVELOPMENT: NATURAL ABILITIES AND NURTURE**

Natural abilities, together with environmental and intrapersonal catalysts, are, according to Gulbin et al. (2010), the non-random factors contributing to the developmental process which leads to specific competencies. The distinction between natural abilities and catalysts reflects current views on talent identification and development which show a facet of the nature vs. nurture debate (e.g., Baker et al., 2003; Epstein, 2013). The focus of this debate is on general natural abilities and environmental factors that lead to general competencies (cf. categorizations by Gagné, 1999). For instance, perceptiveness and coordination are considered general natural abilities and thought to influence talent development. Athletes are often identified on general perceptual or motor skills, and their development often focuses on these general abilities. Paradoxically, researchers and practitioners agree that expert athletes specialize in their sport and that a general skill is not sufficient for expert performance. One of the marks of expertise is the use of unique solutions to solve situations on the playing field and in other situations related to the sport. In other words, expert athletes are characterized by their optimal adaptation to all things related to their sport, including effective decision-making *in situ*, rather than by general abilities.

## **THE HEURISTICS APPROACH TO TALENT DEVELOPMENT**

A useful framework to understand unique adaptations to new situational contexts is the heuristic approach. Athletes use heuristics or rules of thumb which are specific to the type of situation and can be used rapidly without much cost. The development of heuristics occurs within the context of the athletes' natural abilities, but also their past experiences, developed skills, and situational contexts. It does not pertain to any of these factors separately. Instead, heuristics pertain to the repertoire of the athlete, and it is our contention that talent identification methods and talent development models should be geared toward the assessment and development of heuristics. This can be done by improving the efficiency in the use of an existing heuristic (e.g., calibrating the heuristic to more valid cues), or learning which heuristic fits best with a specific environment (e.g., Gigerenzer and Gaissmaier, 2011). The heuristics repertoire consists of psychological, neurophysiological, and perceptual-motor adaptations (Raab et al., 2009; Raab, 2012; Todd et al., 2012). Each heuristic is used for specific situations in much the same way as a hammer is used for nailing pictures but not for cutting branches. By definition, a heuristic is composed of at least three building blocks. They are *search rules*, *search-stopping rules*, and *decision rules*. Within sports, a fourth building block has been proposed which deals with the *execution rules*. We will expand on these.

*Search rules* include two kinds of search: search for information cues and search for alternatives. In most ball games the alternatives are fixed (i.e., players can only either pass, dribble, throw, etc.) so the athlete searches for information cues to decide on which of these actions to use. While novice athletes may search for information cues randomly, expert athletes can directly use the information cues with the highest validity (de Oliveira et al., 2009; Esteves et al., 2011). When the alternative actions are not specified, then search rules for the action itself must be generated. In these cases the task characteristics specify whether it is most advantageous to broaden or limit the search. For instance, in chess it is advantageous to broaden the search for options (Bilalić et al., 2009), whereas in more time-pressured sports it may be best to narrow the search for options (Raab and Johnson, 2007).

*Search-stopping rules* are the rules by which one stops searching for information cues or alternative actions. Classical models presumed that there was a way to compute the optimal stopping point where the costs of further search would exceed its benefits. However, to say that athletes show optimal use of heuristics does not equate to saying that their actions or behaviors are optimal, but rather that these are cost-effective actions or behaviors. This means that an expert athlete knows when the search for information cues must stop and will use whatever information was gathered to make the decision in due time. This also means that novice athletes must be placed in situations that potentiate their search for the most valid information cues, and must also be placed in situations where decisions must be made based on low-quality information cues.

*Decision rules* describe how a decision is made after search has been stopped. Decision rules define how the available information is used to make a decision. Psychology has a tradition of assuming that intelligent behavior implies weighting and combining information cues (e.g., multiple linear regression models), but the research on fast and frugal heuristics has shown that frequently less is more. For instance, the recognition heuristic is a decision rule whereby the option chosen is simply based on one valid cue that points to one option and not to an alternative option (Gigerenzer and Goldstein, 1999). Again, expert athletes have the heuristic repertoire to make the best decisions the fastest, whereas novice athletes must be placed in situations that build up their repertoire.
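The recognition heuristic described above can be written down in a few lines. The following is only a minimal sketch under stated assumptions: the function name `recognition_heuristic`, the set of recognized options, and the team labels are hypothetical and do not come from the cited studies.

```python
from typing import Optional, Set


def recognition_heuristic(option_a: str, option_b: str,
                          recognized: Set[str]) -> Optional[str]:
    """Choose the recognized option when exactly one option is recognized.

    Returns None when both or neither option is recognized, because the
    single cue (recognition) does not discriminate in those cases.
    """
    a_known = option_a in recognized
    b_known = option_b in recognized
    if a_known and not b_known:
        return option_a
    if b_known and not a_known:
        return option_b
    return None  # the heuristic does not apply; another rule is needed


# Hypothetical example: the decision maker has only heard of "Team A".
known_teams = {"Team A"}
print(recognition_heuristic("Team A", "Team B", known_teams))
```

Note that the rule uses one cue only and ignores everything else that might be known about the options, which is the sense in which less can be more.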

*Execution rules* address which action to carry out and how to execute it, as already described for decision processes (Raab et al., 2005). Those rules are based on individual experience (Raab and Johnson, 2007). Athletes should be exposed to situations that force them to decide between options so that they learn execution criteria and build heuristics for various situations.

These rules are the building blocks of heuristics, and they can help explain how athletes develop their expertise in terms of decision-making and problem-solving, which are key competences in expert performance.
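To make the four building blocks concrete, they can be combined in a short one-reason decision sketch. This is an illustration under assumptions, not the authors' model: the cue names, the validity values, the `max_cues` time budget, and the functions `search_rule`, `decide`, and `execute` are all hypothetical. A cue value of 1 is assumed to mean that the cue favors that option.

```python
from typing import Dict, List, Optional, Tuple

Cue = Tuple[str, float]  # (cue name, assumed validity between 0 and 1)


def search_rule(cues: List[Cue]) -> List[Cue]:
    """Search rule: inspect cues from highest to lowest assumed validity."""
    return sorted(cues, key=lambda cue: cue[1], reverse=True)


def decide(options: Dict[str, Dict[str, int]], cues: List[Cue],
           max_cues: int) -> Optional[str]:
    """Return the option favored by the first cue that discriminates."""
    names = list(options)
    for inspected, (cue_name, _validity) in enumerate(search_rule(cues), start=1):
        if inspected > max_cues:
            break  # stopping rule: the time budget is used up
        values = [options[name].get(cue_name, 0) for name in names]
        best = max(values)
        if values.count(best) == 1:
            # Stopping rule: this cue discriminates, so search ends here.
            # Decision rule: take the option that the cue favors.
            return names[values.index(best)]
    return None  # no cue discriminated within the budget


def execute(option: Optional[str]) -> str:
    """Execution rule (placeholder): turn the decision into an action."""
    return f"execute: {option}" if option is not None else "execute: default action"


# Hypothetical ball-game situation with two fixed alternatives.
options = {
    "pass": {"teammate_open": 1, "defender_close": 1},
    "dribble": {"teammate_open": 0, "defender_close": 1},
}
cues = [("defender_close", 0.6), ("teammate_open", 0.9)]
print(execute(decide(options, cues, max_cues=1)))
```

In this sketch a smaller `max_cues` stands in for higher time pressure: the decision is made on whatever was gathered before search had to stop.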

## **HEURISTICS IN THE ROUTE TO SPORT EXPERTISE: BUILDING AN ADAPTIVE TOOLBOX**

Heuristics are domain-specific and can be used to formally describe the link between natural and nurtured characteristics. In fact, heuristics are neutral in the nature vs. nurture debate because they can be learned but they can also be available at birth (cf. Baker et al., 2003, 2012). As an illustration, very small children will naturally cluster around a ball. The building blocks of the heuristic used might be: search for the ball, stop searching when the ball is found, decide to move closer to the ball. This behavior continues until they learn that exploring the space increases the chance of receiving the ball. Here they may change the rules of the existing heuristic into: search for space in relation to the ball, stop when space is found, and decide to move to the space. Again, this behavior will continue until they learn that not only space but also the defensive players are important in receiving the ball. Here they may again change the rules of the existing heuristic into: search for the space in relation to the ball and the defensive player, stop when it is found, and decide to move to the space.
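The three stages of the children's heuristic can be recorded as successive revisions of one structure. The sketch below is purely illustrative: the `Heuristic` class and the variable names are hypothetical, and the rule texts simply restate the example above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Heuristic:
    """A heuristic described by its three basic building blocks."""
    search: str  # what to search for
    stop: str    # when to stop searching
    decide: str  # what to do once search has stopped


# Stage 1: children cluster around the ball.
cluster_around_ball = Heuristic(
    search="ball",
    stop="ball is found",
    decide="move closer to the ball",
)

# Stage 2: exploring space increases the chance of receiving the ball.
use_space = replace(
    cluster_around_ball,
    search="space in relation to the ball",
    stop="space is found",
    decide="move to the space",
)

# Stage 3: the defensive player also matters; only the search rule changes.
read_defence = replace(
    use_space,
    search="space in relation to the ball and the defensive player",
)

for stage in (cluster_around_ball, use_space, read_defence):
    print(stage)
```

The building blocks stay the same across stages; what changes with learning is their content.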

Heuristics are also specific to the sporting situations that the athlete encounters. Therefore, nature and nurture, which are normally described separately in models of talent identification and development, can instead be described as heuristics. This would allow future research on expertise to conceptually integrate phylogenetic (natural abilities), ontogenetic (development), and situational factors. As an illustration, take the phenomenon of less-is-more for situations where there is an abundance of cues. We explore how athletes with different natural abilities (John and Mary) may become experts in their sport. John may have a natural ability to focus on a small number of cues and use those cues to maximal advantage. He will be identified as talented because of his consistent results in particular situations (rather than his creativity). During development, John may specialize in the use of those cues and become an expert in using them and hence build a narrow repertoire of expertise. Provided these are the most valid cues for the sports situation, John will also be an expert in his sport. If the sport offers a lot more variety, however, John will need to benefit from a varied training program that forces him to explore and use other cues for other situations. Mary, on the other hand, may have a natural ability to focus on a large number of cues and will therefore use various combinations of cues. She might be identified as talented because of her creative solutions (rather than her consistent results). During development, Mary may learn which cues are most valid for which situation and hence build a broad repertoire of expertise. Provided a number of cue combinations is required for the sports situation, Mary will also be an expert in her sport. If the sport offers little variety, however, Mary will need to benefit from a specialized training program that forces her to use the most valid cues.

## **CONCLUSION**

The heuristics approach to expertise is useful because it takes into account the natural abilities and development of the athlete, as well as the situations posed by the sport and the training environment. It partly addresses the eternal nature vs. nurture debate and can provide suggestions for training programs which aim at developing the individual natural abilities of athletes by providing adequate sports situations. This is currently being developed in the area of decision-making (Marasso et al., accepted). The practical application of heuristics to talent identification and talent development models will bring the field one step away from dichotomized models and toward a truly comprehensive approach to the route to expertise. Future research can use the heuristics approach to investigate how the route to expertise sometimes deviates from the mainstream to create novel solutions. For instance, new techniques like the Fosbury flop in track-and-field and the Tsukahara vault in gymnastics (Bar-Eli et al., 2008) highlight alternatives found by expert athletes who did not fit a standard model of talent.

## **ACKNOWLEDGMENTS**

The three authors made substantial contributions to: the conception of the manuscript; its critical revision for important intellectual content; final approval of the version submitted for publication. The three authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 April 2014; accepted: 19 June 2014; published online: 08 July 2014. Citation: de Oliveira RF, Lobinger BH and Raab M (2014) An adaptive toolbox approach to the route to expertise in sport. Front. Psychol. 5:709. doi: 10.3389/fpsyg.2014.00709*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 de Oliveira, Lobinger and Raab. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*