# THE TEMPORAL DYNAMICS OF COGNITIVE PROCESSING

EDITED BY: Timothy Michael Ellmore, Peter Ford Dominey and John Magnotti PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-888-7 DOI 10.3389/978-2-88919-888-7

# **THE TEMPORAL DYNAMICS OF COGNITIVE PROCESSING**

#### Topic Editors:

**Timothy Michael Ellmore,** The City College of New York, USA **Peter Ford Dominey,** Centre National de la Recherche Scientifique (CNRS), France **John Magnotti,** Baylor College of Medicine, USA

The development of grouped intracranial EEG (icEEG) techniques allows us to develop maps of the architecture and interregional dynamics of distributed cortical networks. Here, frontal-ventral temporal activity and interactions are evaluated during a word-completion task, using icEEG data combined from 15 patients implanted with subdural electrodes. (Left) Electrodes localized over regions demonstrating significant task-related activity are visualized as spheres on a common cortical surface model (fusiform gyrus, green; pars triangularis, red; pars opercularis, blue; precentral gyrus, orange; postcentral gyrus, cyan; subcentral gyrus, purple). A time-series representation depicts the mean percent change in broadband gamma activity for each region 50 to 700 ms after stimulus presentation (shading denotes 1 SEM). (Right) Information flow between regions evaluated using short-time direct directed transfer functions (SdDTF) is summarized graphically using arrows to depict the direction and relative amplitude of interregional influences.

Figure by Cihan M. Kadipasaoglu and Nitin Tandon.

From our ability to attend to many stimuli occurring in rapid succession to the transformation of memories during a night of sleep, cognition occurs over widely varying time scales spanning milliseconds to days and beyond. Cognitive processing is often influenced by several behavioral variables as well as nonlinear interactions between multiple neural systems. This frequently produces unpredictable patterns of behavior and makes understanding the underlying temporal factors influencing cognition a fruitful area of hypothesis development and scientific inquiry. Across two reviews, a perspective, and twelve original research articles covering the domains of learning, memory, attention, cognitive control, and social decision making, this research topic sheds new light on the temporal dynamics of cognitive processing.

**Citation:** Ellmore, T. M., Dominey, P. F., Magnotti, J., eds. (2016). The Temporal Dynamics of Cognitive Processing. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-888-7

# Table of Contents


*Change blindness in pigeons (***Columba livia***): the effects of change salience and timing*

Walter T. Herbranson

*Temporal Dynamics of the Integration of Intention and Outcome in Harmful and Helpful Moral Judgment*

Tian Gan, Xiaping Lu, Wanqing Li, Danyang Gui, Honghong Tang, Xiaoqin Mai, Chao Liu and Yue-Jia Luo

*Development of grouped icEEG for the study of cognitive processing*

Cihan M. Kadipasaoglu, Kiefer Forseth, Meagan Whaley, Christopher R. Conner, Matthew J. Rollo, Vatche G. Baboyan and Nitin Tandon

# Editorial: The Temporal Dynamics of Cognitive Processing

#### Timothy M. Ellmore<sup>1</sup>\*, Peter F. Dominey<sup>2</sup> and John F. Magnotti<sup>3</sup>

*<sup>1</sup> Psychology, The City College of New York, New York, NY, USA, <sup>2</sup> Centre National de la Recherche Scientifique, Inserm U846, Lyon, France, <sup>3</sup> Neurosurgery, Baylor College of Medicine, Houston, TX, USA*

Keywords: memory, attention, cognitive control, aging, social decision making

**The Editorial on the Research Topic**

#### **The Temporal Dynamics of Cognitive Processing**

From our ability to attend to many stimuli occurring in rapid succession to the transformation of memories during a night of sleep, cognition occurs over widely varying time scales spanning milliseconds to days and beyond. Cognitive processing is often influenced by several behavioral variables as well as non-linear interactions between multiple neural systems. This frequently produces unpredictable patterns of behavior and makes understanding the underlying temporal factors influencing cognition a fruitful area of hypothesis development and scientific inquiry. Across two reviews, a perspective, and 12 original research articles covering the domains of learning, memory, attention, cognitive control, and social decision making, this research topic sheds new light on the temporal dynamics of cognitive processing.

Edited and reviewed by: *Bernhard Hommel, Leiden University, Netherlands*

> \*Correspondence: *Timothy M. Ellmore ellmoret@gmail.com*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *01 June 2016* Accepted: *06 June 2016* Published: *21 June 2016*

#### Citation:

*Ellmore TM, Dominey PF and Magnotti JF (2016) Editorial: The Temporal Dynamics of Cognitive Processing. Front. Psychol. 7:930. doi: 10.3389/fpsyg.2016.00930*

Highly relevant to a fundamental understanding of behavior is how prediction errors (PEs), the differences between predicted and actual effects, modulate learning. Bayesian models of associative learning include PEs as a critical variable, while other models posit PEs as key to uncertainty-related learning as well as cognitive control. Limongi et al. hypothesized that temporal PEs might disrupt behavior-change performance under uncertainty. Subjects made temporal predictions while observing a moving ball strike a stationary ball, which deflected after a variable temporal gap, and in some trials, a change signal prompted them to alter their behavior. Performance accuracy fell as a function of both temporal PE and the delay. From a free energy perspective, the authors interpret the participants' inaccurate prepotent behavior as compensation for imprecise perceptual inference.

Stress is a critical variable influencing learning both positively and negatively, especially when a stressful episode comes immediately before or right after learning. Cadle and Zoladz present the idea and review data suggesting that learning is enhanced in some instances because the learned information becomes a part of a "stress context" and is subsequently tagged by the emotional memory being formed. However, when the stressful episode is separated in time from learning, the memory is impaired because the learning is experienced outside the context of the stress episode.

Daniel et al. report on how pigeons use time to alternate between two opposing rules, matching and non-matching, using a mid-session reversal task. In the first half of a session, pigeons performed a match-to-sample task, which then switched to non-match-to-sample for the second half. Pigeons that had previously learned either a matching or non-matching abstract concept were trained in this mid-session reversal task. After training, novel stimuli were inserted into either the early (matching-to-sample) or late (non-match-to-sample) part of the session. Birds responded to these novel trials according to the previously-learned abstract concept, independent of the time of the session, while still responding to trained stimuli according to the current rule (matching vs. non-matching). Their study shows that pigeons are an excellent model for understanding the comparative processes that underlie the temporal dynamics of discrimination learning and concept formation.

Escobar et al. investigate the associative structure of long-delay conditioning using a rat model. In long-delay conditioning, a stimulus of long duration elicits a conditioned response primarily in its latter stages, rather than throughout the stimulus. Of theoretical importance is whether the initial segment of a long-delay conditioned stimulus (CS) actually acquires inhibitory strength. Using an appetitive conditioning procedure, they tested for conditioned inhibition and found that the initial segment of a long-delay CS appears to share more characteristics with a latent inhibitor than a conditioned inhibitor. The findings are consistent with componential theories of conditioning. They conclude that stimuli must not be viewed as unitary events, but instead as a sequence of temporally linked events, each link carrying different types of information about the conditions that precede and follow them.

Working memory includes the ability to hold information in mind that is no longer physically present, and to manipulate it to achieve goals. Kaiser provides a much needed review of electro- and magneto-encephalographic studies of auditory working memory. Studies of spatial and non-spatial auditory working memory have found differential roles for ventral and dorsal stream regions, while event-related potentials reveal sustained memory load-dependent deflections during retention with activation peaks during delay periods related to task performance. The findings highlight that auditory working memory relies on the dynamic interplay between frontal executive and sensory representation systems.

Which working memory strategies best offset age-related declines in fluid cognition? Basak and O'Connell tackle this question by studying older adults who were randomly assigned to one of two types of working memory training. One group was trained on a predictable memory updating task while another was trained on a novel, unpredictable memory updating task. Compared to predictable training, unpredictable memory updating requires greater demands on cognitive control. The results obtained showed significantly enhanced performance on episodic memory and faster learning of a new working memory task after the unpredictable training. Unpredictable memory updating training may offer a better way to improve cognitive abilities in aging adults.

Knowledge of one's own memory abilities involves processes that are built over time. Chua and Solinger focus on how metamemory processes depend on different prospective and retrospective factors across the learning and memory timescale. By analyzing feeling-of-knowing judgments about target retrievability and retrospective confidence judgments about retrieved targets, they conclude that metamemory judgments should not be thought of as discrete experiences in time, but instead as an evolving awareness that dynamically incorporates previous judgments with new information.

Consolidation, the stabilization and strengthening of memories, also evolves over time, often during periods of "offline" processing when the participant is not engaged in similar information processing. The behavioral factors that influence how offline processing results in the temporal stabilization of memories are not well understood. Ellmore et al. tested different groups of subjects after each was exposed to novel visual stimuli. One group remained in the lab and engaged in an attention-demanding task. Another group remained in the lab and rested quietly. Yet another group left the lab and returned later for testing. The different manipulations during the offline processing period affected long-term recognition of the visual stimuli a day later. Results indicated that remaining in the same context and resting quietly with minimal engagement of attention results in the best ability to distinguish old from novel visual stimuli.

How does attention affect processing of time? Using a temporal variant of the mismatch negativity (MMN) paradigm, Campbell and Davalos studied temporal processing by introducing rare, deviant interstimulus intervals amid a series of standard intervals and recorded EEG. They manipulated subjects' level of attention to the intervals and the difficulty of the task (size of the deviant interval). They found an interaction between task difficulty and attention—the largest neurophysiological responses came from the conditions with high attention and the largest deviants. Their results reiterate the importance of attention in temporal and memory processing generally and may provide a way to assess differences in temporal processing between individuals and groups.

Limited attentional capacities are reliably demonstrated with the attentional blink (AB), a task in which humans miss a second target when it should be detected within 600 ms of an initial target. Dynamic attention theory posits that attention cycles in an oscillatory manner with regular pulses evoking expectations regarding the point of the next occurrence of a tone in a rhythm. This allows for more attentional resources to be provided. New findings reported by Bermeitinger and Frings suggest that oscillatory cycling attention does not affect temporal selection as tapped in an AB paradigm, indicating the need for future research to test whether regular and irregular rhythms and longer/stronger entrainment phases differ in their influence on the AB effect.

Multitasking is ubiquitous in today's world, and a more thorough understanding of how multiple task demands presented closely in time affect performance is highly relevant. Completion of multiple tasks often affects performance negatively, as the prioritized task can be detrimentally affected by additional tasks. Scherbaum et al. conduct a dynamic investigation of two tasks, Task 1 and Task 2, in a dual-task setting. An additional Task 2 interfered with Task 1, with the influence dependent on the temporal proximity of task stimuli. The influence onto Task 1 performance was continuous and irrespective of any critical window of influence. Furthermore, there was modulation of crosstalk by previous interference, which indicates flexible adaptation of task-shielding. Finally, the execution of Task 1 was influenced by previously executed responses of both tasks and the influences showed different temporal patterns, indicating a sustained reactivation of the previous response of Task 1.

Multitasking abilities also depend critically on cognitive flexibility, a component of executive functions allowing us to behave in dynamic environments. Dshemuchadse et al. distinguish between shifting flexibility and spreading flexibility. Using a homonym relatedness judgment task combined with mouse tracking, they demonstrate that these two types of cognitive flexibility follow independent temporal patterns in their influence on participants' mouse movements during the relatedness judgments. The results demonstrate the need for future studies to consider shifting and spreading flexibility independently when studying moderators of cognitive flexibility.

Noticing how the environment changes over time is critical for survival. A large body of literature suggests that when these changes occur during a blank temporal interval, they can go unnoticed, a phenomenon called change blindness. Herbranson reports that pigeons also experience such change blindness, suggesting commonalities across species in how visual scenes are processed. Like humans, pigeons show sensitivity to the presence of a blank interval between the original and changed displays, and performance increases as the number of repetitions increases. Pigeons also appear to engage in a serial search for changes, as additional time is required to search additional locations.

In social decision making, the ability to integrate moral intention information with the outcome of an action plays a crucial role in mature social judgment. This integration process occurs across time, but is likely to be quick. Gan et al. show evidence for fast moral intuition reactions and subsequent integration processing in the right temporo-parietal area. Participants made moral judgments for agents who produced either negative/neutral outcomes with harmful/neutral intentions or positive/neutral outcomes with helpful/neutral intentions. Neural differences measured with EEG between attempted and successful actions over prefrontal and bilateral temporo-parietal regions were found in both harmful and helpful moral judgment with a neural integration time course from right temporo-parietal area to left temporo-parietal area, then to prefrontal and right temporo-parietal area again.

The aim of this research topic was to understand basic temporal aspects of cognitive processing more thoroughly. To understand the temporal structure of the underlying neural processes, however, it is clear that further methodological development will be needed. Kadipasaoglu et al. report on the development of robust techniques for group analysis of intracranial EEG data, a powerful new method to investigate the architecture of interregional dynamics of distributed cortical networks. An illustration of this work is featured as the cover of the research topic. The combination of structured cognitive tasks, the high spatial and temporal resolution of intracranial EEG, and powerful within- and between-subjects statistical inference will no doubt help contribute to a more detailed understanding of how cognition plays out in time.

# AUTHOR CONTRIBUTIONS

TE, PD, and JM have made substantial, direct and intellectual contributions to the work, and approved it for publication.

# FUNDING

TE is supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number SC2GM109346. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Ellmore, Dominey and Magnotti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Temporal prediction errors modulate task-switching performance**

*Roberto Limongi<sup>1,2</sup>\*, Angélica M. Silva<sup>2,3</sup> and Begoña Góngora-Costa<sup>4</sup>*

*<sup>1</sup> UDP-INECO Foundation Core on Neuroscience, Diego Portales University, Santiago, Chile, <sup>2</sup> Instituto Venezolano de Investigaciones Lingüísticas y Literarias "Andres Bello", Caracas, Venezuela, <sup>3</sup> Escuela Lingüística de Valparaíso de la Pontificia Universidad Católica de Valparaíso, Chile, <sup>4</sup> Escuela de Fonoaudiología, Facultad de Medicina, Universidad de Valparaíso, Chile*

We have previously shown that temporal prediction errors (PEs, the differences between the expected and the actual stimulus onset times) modulate the effective connectivity between the anterior cingulate cortex and the right anterior insular cortex (rAI), causing the activity of the rAI to decrease. The activity of the rAI is associated with efficient performance under uncertainty (e.g., changing a prepared behavior when a change demand is not expected), which leads us to hypothesize that temporal PEs might disrupt behavior-change performance under uncertainty. This hypothesis has not been tested at a behavioral level. In this work, we evaluated this hypothesis within the context of task switching and concurrent temporal predictions. Our participants performed temporal predictions while observing one moving ball striking a stationary ball, which bounced off after a variable temporal gap. Simultaneously, they performed a simple color comparison task. In some trials, a change signal made the participants change their behaviors. Performance accuracy decreased as a function of both the temporal PE and the delay. Explaining these results without appealing to *ad hoc* concepts such as "executive control" is a challenge for cognitive neuroscience. We provide a predictive coding explanation. We hypothesize that exteroceptive and proprioceptive minimization of PEs would converge in a fronto-basal ganglia network which would include the rAI. Temporal gaps (or uncertainty) and temporal PEs would drive and modulate this network, respectively. Whereas the temporal gaps would drive the activity of the rAI, the temporal PEs would modulate the endogenous excitatory connections of the fronto-striatal network. We conclude that in the context of perceptual uncertainty, the system is not able to minimize perceptual PE, causing the ongoing behavior to finalize and, in consequence, disrupting task switching.

**Keywords: prediction errors, predictive coding, response inhibition, insular cortex, cognitive neuroscience**

#### *Edited by:*

*Timothy M. Ellmore, The City College of New York, USA*

#### *Reviewed by:*

*Philip R. Corlett, Yale University, USA; Ramiro Salas, Baylor College of Medicine, USA; Karl J. Friston, University College London, UK*

#### *\*Correspondence:*

*Roberto Limongi, UDP-INECO Foundation Core on Neuroscience, Diego Portales University, Grajales 1898, Santiago 8320000, Chile roberto.limongi@fulbrightmail.org*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 01 May 2015 Accepted: 27 July 2015 Published: 25 August 2015*

#### *Citation:*

*Limongi R, Silva AM and Góngora-Costa B (2015) Temporal prediction errors modulate task-switching performance. Front. Psychol. 6:1185. doi: 10.3389/fpsyg.2015.01185*

# **Introduction**

Bayes-based theories of brain function state that the brain is a predictive machine and that perception is no more than a prediction of the sensorium's causes (Friston and Stephan, 2007; Friston, 2009, 2010; Friston and Kiebel, 2009; Daunizeau et al., 2012). Predictions produce prediction errors (PEs, the differences between the predicted and the actual events). In general, PEs are considered to drive both inference and learning. In predictive coding, this is equivalent to regarding the brain as a hierarchical Bayesian filter (Friston and Stephan, 2007; Friston, 2009, 2010; Friston and Kiebel, 2009; Daunizeau et al., 2012) whereas in associative learning the classic Rescorla–Wagner model (Rescorla and Wagner, 1972) calls upon reward PEs to learn the value of stimuli or actions. However, Bayesian models of associative learning also include PEs as a driving variable (Kruschke, 2008; Gershman et al., 2010). More cognitive models also state that PEs drive higher order cognition such as uncertainty-related cognitive control and learning (Carter et al., 1998; O'Reilly et al., 1999; Cohen et al., 2002; Brown and Braver, 2005; Alexander and Brown, 2010, 2011). Moreover, PEs might play a central role in explaining psychotic disorders (Braver et al., 1999; Adams et al., 2012; Bastos-Leite et al., 2015) and intersubject variability in social cognition of patients with brain damage (Limongi et al., 2014).
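To make the role of a PE in associative learning concrete, the following is a minimal sketch of a Rescorla–Wagner-style update for a single cue. It is a simplified illustration, not code from the study: the function name, the learning rate `alpha`, and the constant outcome sequence are all assumptions chosen for the example.

```python
def rescorla_wagner(outcomes, alpha=0.1, v0=0.0):
    """Track a value estimate for one cue across a sequence of outcomes.

    Simplified single-cue form; `alpha` (learning rate) and `v0` (initial
    value) are illustrative parameters, not values from the paper.
    """
    v = v0
    values, errors = [], []
    for outcome in outcomes:
        pe = outcome - v      # prediction error: actual minus predicted
        v = v + alpha * pe    # move the estimate a fraction alpha toward the outcome
        values.append(v)
        errors.append(pe)
    return values, errors


# Hypothetical example: the same outcome (1.0) delivered on 50 trials.
values, errors = rescorla_wagner([1.0] * 50, alpha=0.2)
print(f"first PE = {errors[0]:.3f}, last PE = {errors[-1]:.6f}")
```

With a constant outcome, the PE shrinks on every trial as the value estimate approaches the outcome, which is the sense in which the PE drives learning in this family of models.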

In our recent neurophysiological study (Limongi et al., 2013), we identified a conjoint effect of uncertainty and PEs as driving and modulatory inputs of brain regions. In dynamic causal models of imaging data, a driving input is modeled as an experimental effect that directly drives the activity of a region whereas a modulatory effect is modeled as a change in the connection strength between two regions (Kahan and Foltynie, 2013). In our previous study, the participants performed temporal predictions with different levels of temporal uncertainty, and we found that temporal uncertainty drove the activity of the right anterior insular cortex (rAI) when the participants predicted the onset time of an event. However, we also found that the temporal PEs negatively modulated the excitatory (as assumed in dynamic causal models) connection strength between neurons of the right anterior cingulate cortex (rACC) and neurons of the rAI. This negative modulatory effect counteracted the driving effect of temporal uncertainty (**Figure 1**).
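The distinction between driving and modulatory inputs can be sketched with the bilinear state equation commonly used in dynamic causal modeling, dx/dt = (A + u_mod · B) x + C · u_drive, where A holds baseline connections, B holds input-dependent changes in connection strength, and C holds direct input effects. The toy simulation below assumes a two-region system (rACC and rAI) with made-up parameter values, and it also assumes, purely so that the rACC has non-zero activity, that the driving input reaches both regions. It is not the model fitted by Limongi et al. (2013).

```python
def simulate_rai(uncertainty, pe, n_steps=2000, dt=0.01):
    """Euler-integrate a toy two-region bilinear system; return final rAI activity.

    All parameter values are hypothetical and chosen only for illustration.
    """
    a_fwd = 0.5    # baseline rACC -> rAI coupling (excitatory)
    b_mod = -0.4   # change in that coupling per unit temporal PE (negative modulation)
    decay = 1.0    # self-decay rate of each region
    racc, rai = 0.0, 0.0
    for _ in range(n_steps):
        # Modulatory input: the PE changes the connection strength, not the activity directly.
        coupling = a_fwd + pe * b_mod
        # Driving input: uncertainty enters the regions' activity directly
        # (its effect on the rACC is an assumption of this sketch).
        d_racc = -decay * racc + uncertainty
        d_rai = -decay * rai + coupling * racc + uncertainty
        racc += dt * d_racc
        rai += dt * d_rai
    return rai


rai_no_pe = simulate_rai(uncertainty=1.0, pe=0.0)
rai_with_pe = simulate_rai(uncertainty=1.0, pe=1.0)
print(f"rAI without PE: {rai_no_pe:.3f}, rAI with PE: {rai_with_pe:.3f}")
```

In this sketch the rAI settles at a lower level when a PE is present, because the PE weakens the excitatory rACC-to-rAI connection while the driving input is unchanged. That mirrors, qualitatively only, the counteracting effect described above.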

**FIGURE 1 | … and temporal PEs in the rACC-rAI coupling as reported by Limongi et al. (2013).** When participants accurately predict an event's onset time, the activity of the rAI increases. Additional insular activation is provided by afferent excitatory projections from the rACC. This extra excitatory effect is dampened by temporal PEs, which suggests that when participants fail to accurately predict an event's onset time, the performance of an unexpected secondary-task demand decreases. Notice that in DCM endogenous connections are assumed excitatory.

When we are certain about an event's onset time, we anticipatorily prepare the behavior that we will execute at the event's onset; that is, we engage in *temporal preparation* (Nobre et al., 2007; Fischer et al., 2012; Los, 2013; Rohenkohl et al., 2014). However, if after preparing a behavior we suddenly need to replace the planned action with another, this time "unprepared," behavior (e.g., stepping back from a road crossing when a walk sign changes unexpectedly), we face uncertainty because we are not expecting a task-switching demand. As mentioned, temporal PEs exert a negative modulatory effect on the excitatory cingulate-insular coupling, which causes the activity in the rAI to decrease. Therefore, we should expect a PE-driven dampening effect on task-switching performance accuracy (e.g., changing a prepared behavior when a change demand is not expected) when we face perceptual uncertainty.

The above inference, however, is far from conclusive because, on the one hand, the activity of the rACC is also associated with behavioral contingencies explained in terms of conflict monitoring (Botvinick et al., 1999), cognitive control (Brown, 2008, 2011; Alexander and Brown, 2010), general attention (Carter et al., 1998), attention for learning (Bryden et al., 2011), and response inhibition (Aron and Poldrack, 2006). On the other hand, the activity of the rAI has also been associated with behavioral contingencies explained in terms of attention (Eckert et al., 2009; Menon and Uddin, 2010; Nelson et al., 2010), response inhibition (Aron and Poldrack, 2006; Aron et al., 2014; Cai et al., 2014), and other forms of uncertainty (Preuschoff et al., 2008; Schultz et al., 2008; Bossaerts, 2010; Jones et al., 2010, 2011; Sarinopoulos et al., 2010; Payzan-LeNestour and Bossaerts, 2011, 2012; Payzan-LeNestour et al., 2013; Symmonds et al., 2013; Nursimulu and Bossaerts, 2014). In other words, the sole fact that the activity in the rAI is negatively modulated by temporal PEs is not sufficient to conclude that temporal PEs modulate an action update (i.e., a change in a prepared behavior). Otherwise, we would be committing a reverse inference fallacy (Poldrack, 2006, 2011). This neurophysiology-driven hypothesis needs a specific behavioral test. In this work, we show that perceptual uncertainty compromises task switching or action selection when subjects have to inhibit a prepotent response and replace it with a new action. We will refer to this as *task switching* and examine the effect of perceptual uncertainty on task switching in terms of performance accuracy. In brief, subjects were required to report a perceptual decision at a particular peristimulus time. We introduced perceptual uncertainty by increasing the delay (i.e., temporal gap) between the perceptual decision and the time of response. Crucially, this was repeated with and without a task-switching demand during response preparation.

Our hypothesis is based upon predictive coding accounts of sensorimotor integration, and in particular active inference. We hypothesize that increasing perceptual uncertainty (by increasing the temporal gap) would compromise task switching and reduce performance accuracy. Based upon our previous neuroimaging findings, we suppose that this effect would be mediated by an encoding of uncertainty or precision. In brief, we argue that greater temporal gaps (and subsequent uncertainty) would have two consequences. First, there would be an increase in behavioral PEs in terms of the timing of the response. Second, this increased uncertainty or decreased precision would result in a decreased sensitivity of the rAI to ascending PEs. The subsequent reduction of precise predictions about action selection would reduce task switching and be revealed as a drop in response accuracy when, and only when, task switching is necessary.

# **Materials and Methods**

# **Participants**

Sixteen right-handed students (five males, *M* age = 22.7 years) signed an informed consent form and participated in the study. The study was conducted in accordance with the ethical principles for medical research involving human subjects set out in the Declaration of Helsinki and was approved by the Ethics Committee of Instituto Pedagógico de Caracas.

# **General Task Description**

The participants had to report whether the colors of two balls were the same or different when cued to respond a period of time after the decision was made. This delay or temporal gap was progressively increased to induce uncertainty about when the response would be cued. A trial comprised the appearance of two balls, where one ball moved toward a center ball from the periphery of the screen. After the first ball touched the second ball, the second ball bounced off with a variable temporal gap. Crucially, the balls could switch their colors shortly before touching. This meant that some trials required both the inhibition of the prepotent response to the initial colors and the preparation of a new response.

# **Stimuli and Procedure**

A single trial comprised three events: linguistic cue (2000 ms), fixation point (540 ms), and visual animation (2700 ms). The linguistic cue indicated the magnitude of the temporal gap ("no delay," 0 ms; "short delay," 150 ms; and "long delay," 300 ms). The fixation point announced the animation's onset. At the animation's onset, two colored balls (1.30° of visual angle in diameter) simultaneously appeared on the left and center of a computer screen. Then, the left-most ball (first ball in **Figure 2**) moved to the center of the screen at a constant speed (17.32 deg/s) until it stopped 900 ms later at the edge of the second ball. After a delay (temporal gap) of 0, 150, or 300 ms, the right-most ball (second ball in **Figure 2**) began moving to the right.

The participants had to press a response key when they predicted the second ball's onset time. They pressed the "S" key if the balls' colors were the same and the "D" key if the colors were different.

Critically, there were three task conditions: change, false alarm, and no change. Our condition of interest was the change condition, but the false-alarm and the no-change conditions were included as control conditions and to prevent the participants from anticipating a task-switching demand, which would improve their performance (Jahfari et al., 2012). Each task condition comprised 33% of the trials. In the change condition, the balls' colors changed at some random time within a time window between 250 and 500 ms after the animation's onset time. For example, if the initial colors were "red" and "red" they changed to "blue" and "white." We will refer to the change time of the balls' colors as the change-signal onset time (CSO). In the false-alarm condition, the balls changed in color, but the relational value remained the same. For example, if the initial colors were "red" and "white" (for the first and second ball respectively) they could change to "yellow" and "blue." Notice that despite this change, the colors' relational value (i.e., different) was the same. In the no-change condition, the balls' colors did not change. Four colors were used (red, white, blue, and yellow). The stimulus delivery program randomly chose the combination of colors. The program also randomly varied the initial positions of the balls in the horizontal axis across trials; however, the initial distance between the balls remained constant across trials. **Figure 2** shows the sequence of events in a single trial.
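The logic of the three task conditions can be illustrated with a short Python sketch. It is not the authors' E-Prime program: the function names, the 50/50 choice of the initial relational value, the uniform draw of the CSO within the 250 to 500 ms window, and the rule that at least one ball must change color on false-alarm trials are all assumptions made for illustration.

```python
import random

COLORS = ["red", "white", "blue", "yellow"]

def relation(pair):
    """Return 'same' if both balls share a color, else 'different'."""
    return "same" if pair[0] == pair[1] else "different"

def random_pair(rng, want):
    """Draw a (first ball, second ball) color pair with the requested relation."""
    first = rng.choice(COLORS)
    if want == "same":
        return (first, first)
    second = rng.choice([c for c in COLORS if c != first])
    return (first, second)

def make_trial(condition, rng):
    """Build initial/final color pairs and a change-signal onset (ms)."""
    # Assumption: the initial relational value is chosen with equal probability.
    initial = random_pair(rng, rng.choice(["same", "different"]))
    if condition == "no_change":
        return {"initial": initial, "final": initial, "cso_ms": None}
    if condition == "change":
        # The relational value flips, so the prepared response must be replaced.
        target = "different" if relation(initial) == "same" else "same"
    else:
        # False alarm: colors change but the relational value is preserved.
        target = relation(initial)
    final = random_pair(rng, target)
    while final == initial:  # make sure the colors actually change
        final = random_pair(rng, target)
    # Assumption: the CSO is uniformly distributed within the reported window.
    cso = rng.uniform(250, 500)
    return {"initial": initial, "final": final, "cso_ms": cso}
```

Only the change condition alters the correct key ("S" versus "D"), which is why it is the one that imposes a task-switching demand.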

The experimenter explicitly instructed the participants to press the appropriate key just at the "exact" onset time of the second ball. Eight participants used the index finger to press the "S" key (middle finger to press the "D" key) whereas eight participants used the middle finger to press the "S" key (index finger to press the "D" key). The participants used the same hand in all of the trials. The dependent variables of interest were the response accuracy based upon the balls' relational value and the absolute temporal PEs ( |response time *−* second ball's onset time| ). Regardless of the duration of the temporal gap, the subjects sometimes made predictions before the second ball's onset time (early predictions) and sometimes after the second ball's onset time (late predictions). Young et al. (2005) showed that the absolute value of the temporal PE would better account for the effect of the temporal gaps than the relative (early/late) values. Moreover, we recently found that the absolute value of the temporal PEs better accounts for the neurophysiological effects of temporal gaps estimation than the relative values (Limongi et al., 2013). Notice that the absolute value of the behavioral PEs is related to their squared values. This means that the absolute values can be taken as a proxy for the precision (inverse variance) of behavioral response times.

#### **Design**

We constructed a 3 *×* 3 factorial design: temporal gap (with three levels: no delay, short delay, and long delay) by task (with three levels: change, no change, and false alarm). Each participant performed 450 trials (50 trials/condition) divided into 10 blocks (45 trials/block and five randomly intermixed trials per condition within each block). The participants also performed a familiarization block. Between blocks, a display message encouraged the participants to relax during a short break and to decide when to continue with the experiment. The experimenter provided feedback to the participants only during the familiarization block. The stimulus delivery program was E-Prime 2.0 (Psychology Software Tools, Pittsburgh, PA, USA).
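The trial counts above follow from the design: 9 cells times 5 repetitions gives 45 trials per block, and 10 blocks give 450 trials with 50 per cell. A minimal Python sketch of such a schedule is shown below; the function name `build_schedule` and the use of a simple within-block shuffle are assumptions, since the original randomization was handled by E-Prime.

```python
import itertools
import random

GAPS_MS = [0, 150, 300]                          # no, short, and long delay
TASKS = ["change", "no_change", "false_alarm"]

def build_schedule(n_blocks=10, reps_per_block=5, seed=None):
    """Return a list of blocks; each block is a shuffled list of (gap, task)."""
    rng = random.Random(seed)
    cells = list(itertools.product(GAPS_MS, TASKS))  # the 9 design cells
    schedule = []
    for _ in range(n_blocks):
        block = cells * reps_per_block               # 45 trials per block
        rng.shuffle(block)                           # randomly intermix trials
        schedule.append(block)
    return schedule
```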

#### **Data Analysis**

To verify that the temporal gaps actually produced different PEs beyond random fluctuation, we performed a simple linear mixed-effects regression analysis with all of the valid responses. To make sure that we considered *predictions* of the second ball's onset time rather than *reactions* to the second ball's motion, we excluded late predictions if these were greater than 200 ms. This exclusion criterion yielded 91% of valid responses. We regressed the absolute PEs against the temporal gaps and included the subjects as a random effect.
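The dependent measure and the exclusion rule can be sketched in a few lines of Python. This assumes that a "late prediction greater than 200 ms" means a response more than 200 ms after the second ball's onset; the function names and the example times are hypothetical.

```python
def signed_pe(response_ms, onset_ms):
    """Signed temporal PE: negative = early prediction, positive = late."""
    return response_ms - onset_ms

def is_valid(response_ms, onset_ms, max_late_ms=200):
    """Keep the trial unless the response is more than max_late_ms late."""
    return signed_pe(response_ms, onset_ms) <= max_late_ms

def absolute_pes(trials, max_late_ms=200):
    """Absolute PEs for valid trials; trials are (response_ms, onset_ms) pairs."""
    return [abs(signed_pe(r, o)) for r, o in trials
            if is_valid(r, o, max_late_ms)]

# Hypothetical trials: two early predictions and one reaction-like late response.
example = [(850, 900), (1000, 1050), (1450, 1200)]
print(absolute_pes(example))  # [50, 50]
```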

To specifically test our hypothesis, we fit a series of mixed-effects linear models to the task-switching performance accuracy. More specifically, we defined a model space with four models representing our hypothesis and one additional model representing an alternative hypothesis. All of the models included subjects as a random effect.

First, it is possible that task-switching performance accuracy is disrupted by PEs but not by temporal gaps. Model 1a represented this possibility. It comprised the main effect of task, the main effect of PE, and the Task *×* PE interaction. PEs were indexed in terms of Vincentiles (Balota and Yap, 2011). At a subject level, the distribution of PEs was ordered and divided into 10 Vincentiles. Large Vincentiles represented large PEs.
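As an illustration of the Vincentile index, the following Python sketch ranks one subject's PEs and assigns each to one of 10 bins. The equal-count, rank-based binning rule and the function name are assumptions; the exact binning procedure used in the study is not described in more detail.

```python
def vincentile_labels(pes, n_bins=10):
    """Return a bin label (1..n_bins) for every PE, in the original trial order.

    PEs are ranked within a single subject; larger labels mean larger PEs.
    """
    order = sorted(range(len(pes)), key=lambda i: pes[i])  # indices by PE size
    labels = [0] * len(pes)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(pes) + 1
    return labels
```

Applying the function separately to each subject keeps the index relative to that subject's own PE distribution, so a label of 10 always marks that subject's largest errors.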

Second, it is possible that the accuracy in sudden task switching is affected not only by the PEs, but also by temporal gap-induced uncertainty. Model 2a included all of the effects of model 1a, the main effect of temporal gap, and the Temporal Gap *×* Task interaction.

Third, although the temporal window comprising the CSO was constant across temporal gaps, the CSO varied with respect to the second ball's onset time. Therefore, it is possible that the CSO also accounts for some proportion of variance (Verbruggen et al., 2008). To model this possible confounding variable, we constructed two additional models (models 1b and 2b) by adding the Task *×* CSO interaction to the effects of models 1a and 2a.

Fourth, it is possible that neither PE nor temporal gap accounts for the decrease in task-switching performance accuracy. Alternatively, it is possible that only the CSO accounts for this effect. Therefore, model 3 included the main effect of task, the main effect of CSO, and the Task *×* CSO interaction.
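The model space described above can be summarized by listing the fixed-effects terms of each model. The Python sketch below is only a bookkeeping aid: the term labels are hypothetical shorthand, and the random effect of subjects (common to all models) is left out.

```python
# Fixed-effects terms per model; every model also has subjects as a random effect.
MODEL_TERMS = {
    "1a": {"task", "pe", "task:pe"},
    "2a": {"task", "pe", "task:pe", "gap", "gap:task"},
    "3": {"task", "cso", "task:cso"},
}
# Models 1b and 2b add the Task x CSO interaction to models 1a and 2a.
MODEL_TERMS["1b"] = MODEL_TERMS["1a"] | {"task:cso"}
MODEL_TERMS["2b"] = MODEL_TERMS["2a"] | {"task:cso"}

def is_nested(smaller, larger):
    """True if every term of `smaller` also appears in `larger` (proper subset)."""
    return MODEL_TERMS[smaller] < MODEL_TERMS[larger]
```

Written this way, it is easy to see that model 1a is nested in models 1b, 2a, and 2b, whereas model 3 is not nested in any of them because it contains the main effect of CSO.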

To select the best model, we relied upon the models' corrected Akaike information criterion number (AIC*c*) as a measure of the best compromise between generalizability, complexity, and goodness of fit (Myung, 2000; Pitt et al., 2002; Myung and Pitt, 2004; Myung et al., 2009). We also included the relative merits of the different models in terms of their Akaike weights (Wagenmakers and Farrell, 2004; Anderson, 2008). The Akaike weight (*w*) of a model *i* is defined by

$$w_i = \frac{e^{-\frac{1}{2}\Delta_i}}{\sum_{r=1}^{R} e^{-\frac{1}{2}\Delta_r}} \tag{1}$$

where $\Delta_i = \mathrm{AIC}_{c,i} - \mathrm{AIC}_{c,\min}$ and *R* is the number of models in the model space.

Notice that the Akaike weight of a specific model changes depending on the number of competing models (i.e., the model space). Moreover, we complemented the model comparison strategy with traditional *F* tests on the fixed effects.

# **Results**

A simple linear mixed-effects regression model shows that, as expected, the absolute values of the PEs increased as a function of the temporal gap (β = 0.26, SE = 0.01, **Figure 3**), which replicates previous results (Young et al., 2005; Limongi et al., 2013). The slope (β) indicates that the absolute PE increased by 0.26 ms for each millisecond of temporal gap increment.

The model comparison procedure shows that model 2a (AIC*<sup>c</sup>* = 6301) better accounts for the observed effects than the other four models (AIC*<sup>c</sup>* model 1a = 6351, AIC*<sup>c</sup>* model 1b = 6391, AIC*<sup>c</sup>* model 2b = 6340, AIC*<sup>c</sup>* model 3 = 6561). **Figure 4** shows the Akaike weights of the models. Clearly, model 2a surpasses all of the other models in the defined model space. Therefore, we selected model 2a as the simplest model that best fits the collected data and best generalizes to other data samples. The fixed effects tests confirmed the main effect of task, *F*(2, 6571) = 192.2, *p <* 0.0001; the main effect of temporal gap, *F*(1, 6572) = 39.6, *p <* 0.0001; the main effect of Vincentile, *F*(1, 6573) = 22.91, *p <* 0.0001; the Temporal Gap *×* Task interaction, *F*(2, 6571) = 36.03, *p <* 0.0001; and the Vincentile *×* Task interaction, *F*(2, 6571) = 27.28, *p <* 0.001.
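The Akaike weights of Equation 1 can be recomputed from the AIC*c* values reported above. The following Python sketch does so; the function name is an assumption, and the code is a direct transcription of the equation rather than the software used in the original analysis.

```python
import math

def akaike_weights(aicc):
    """Map each model name to its Akaike weight (Equation 1)."""
    best = min(aicc.values())
    # Relative likelihood of each model: exp(-delta_i / 2).
    rel = {m: math.exp(-0.5 * (v - best)) for m, v in aicc.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# AICc values as reported in the text.
aicc = {"1a": 6351, "1b": 6391, "2a": 6301, "2b": 6340, "3": 6561}
weights = akaike_weights(aicc)
best_model = max(weights, key=weights.get)  # "2a"
```

Because the closest competitor (model 2b) has an AIC*c* 39 units higher than model 2a, its relative likelihood is exp(−19.5), so virtually all of the weight falls on model 2a within this model space.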

**Figure 5** shows the observed effects and the fitted lines as yielded by the parameter estimates (**Table 1**) of the winning model. The disrupting effect of the PEs on task-switching performance accuracy is fairly evident. Specifically, the slope of the regression line between Vincentile and color comparison accuracy was steeper for the change condition than for both the no-change and false-alarm conditions, meaning that the task-switching performance accuracy strongly decreased as a function of the temporal PEs. Finally, the parameter estimates also show that the slope of the regression line between temporal gap and the color comparison accuracy was steeper for the change condition than for both control conditions. It is relevant that the effects of PEs and temporal gaps were not collinear, as verified by the small variance inflation factors (VIF).

## **Discussion**

In the causality literature, there is a well-documented hypothesis stating that temporal contiguity of dynamic events is strongly associated with causal attribution and temporal prediction (Young et al., 2005; Young and Sutherland, 2009). Moreover, anticipatory (i.e., predictive) smooth pursuit eye movements are strongly associated with both temporal contiguity and causal attribution (Badler et al., 2010, 2012). This might suggest an alternative hypothesis on the observed increase in PEs associated with the increase in temporal gaps: Violation of causality rather than temporal uncertainty would induce temporal PEs. Although this alternative hypothesis deserves further study, the independent contributions of both behavioral PEs and temporal gaps to task-switching performance accuracy are fairly supported by the data, providing behavioral evidence for the neurophysiologically motivated hypothesis that temporal PEs modulate unexpected task-switching performance.

*Vincentile and temporal gap values are mean centered.*

A challenge to cognitive neuroscience is proposing brain-based mechanisms of cognition without appealing to *ad hoc* constructs such as a "central executive" or a "homunculus" (Hazy et al., 2006, 2007; O'Reilly and Frank, 2006). With this challenge in mind, we think that our results are entirely consistent with the predictive coding hypothesis: the experimental manipulation of delay (temporal gaps) induces a delay-specific encoding of uncertainty and precision. The ensuing reduction in precision explains the increase in behavioral PEs and reduces task-switching performance accuracy (through a decreased sensitivity to ascending PEs). In other words, in the absence of precise information, the brain relies on its prior beliefs and is more likely to emit prepotent responses. Crucially, the brain knows when sensory information is likely to be imprecise. This computational explanation fits comfortably with the decreased sensitivity of the rAI to ascending connections when stimuli have greater temporal uncertainty or less precision (because precision is thought to be encoded by the gain or postsynaptic sensitivity of neurons encoding PEs). Below, we expand upon this explanation. First, we will introduce general concepts of the predictive coding approach. Second, we will propose a neurophysiological model that would give rise to these behavioral results.
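The claim that imprecise sensory information shifts the balance toward prior beliefs can be made concrete with the standard precision-weighted combination of a Gaussian prior and a Gaussian observation. The Python sketch below is a textbook illustration under that Gaussian assumption, not the model fitted in this study; the numbers are arbitrary.

```python
def posterior_mean(prior_mean, prior_precision, obs, obs_precision):
    """Precision-weighted average of a Gaussian prior and a Gaussian observation."""
    total = prior_precision + obs_precision
    return (prior_precision * prior_mean + obs_precision * obs) / total

# Arbitrary illustration: prior belief at 0 (prepotent response), evidence at 1.
precise = posterior_mean(0.0, 4.0, 1.0, 4.0)    # 0.5: evidence counts equally
imprecise = posterior_mean(0.0, 4.0, 1.0, 1.0)  # 0.2: estimate stays near prior
```

When the sensory precision drops from 4 to 1, the same evidence moves the estimate less than half as far from the prior, which is the sense in which a less precise input favors the prepotent response.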

## **Predictive Coding: Free Energy and the Hierarchical Minimization Process of PEs**

The predictive coding theory of brain function defines perception as *exteroceptive* predictions (Adams et al., 2012). A percept is a hypothesis of the sensory data, and the perception process ends with the best hypothesis at hand in terms of Bayes optimal estimates of the sensorium's causes. The mechanism through which the organism reaches this optimal hypothesis comprises the minimization of PEs as a hierarchical process.

A hierarchical minimization process assumes that higher cortical areas (e.g., the prefrontal cortex) send prediction signals to lower cortical areas and subcortical areas (e.g., primary visual cortex and fronto-basal ganglia circuits). At a given cortical level, the internal neural circuit (i.e., within the six-layer canonical cortical column) computes a PE. This PE is sent forward to higher levels in the hierarchy (e.g., secondary visual area) to revise higher level representations. These updated representations then reciprocate descending or backward predictions to suppress PEs at the lower level. This process continues at all hierarchical levels until the PE has been minimized throughout the hierarchy (**Figure 6**).
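The recurrent exchange of ascending PEs and descending predictions can be illustrated with a deliberately simplified two-level scheme. The Python sketch below is a toy example with scalar states, identity mappings between levels, and an arbitrary step size; none of these choices come from the study, and the sketch is not a model of any specific cortical circuit.

```python
def minimize_pes(sensory_input, steps=500, lr=0.1):
    """Iteratively reduce PEs in a toy two-level hierarchy.

    mu1 is the lower-level estimate (predicts the sensory input);
    mu2 is the higher-level estimate (predicts mu1).
    """
    mu1, mu2 = 0.0, 0.0
    for _ in range(steps):
        e1 = sensory_input - mu1   # PE at the lower level
        e2 = mu1 - mu2             # PE sent forward to the higher level
        # The lower level is pulled toward the input and toward the
        # descending prediction; the higher level is revised by the ascending PE.
        mu1, mu2 = mu1 + lr * (e1 - e2), mu2 + lr * e2
    return mu1, mu2
```

With a constant input, both estimates converge on the input value and both PEs shrink toward zero, which corresponds to the PE having been "minimized throughout the hierarchy."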

The minimization of PEs gains physiological meaning in terms of free energy minimization. The free energy principle states that an organism tends to change its internal state to minimize free energy (Friston and Stephan, 2007). The free energy principle is congruent with the physiological tendency of an organism to reach equilibrium which is referred to as homeostasis. Therefore, free energy minimization is an adaptive "goal" of an organism while interacting with the environment. Critically, the minimization of the sensory PEs (i.e., perception) is only one mechanism available to this end.

Predictive coding also proposes that "action" or, in general, "behavior" is another way to minimize PEs (Friston et al., 2006, 2010), and, in consequence, free energy. Action commands are no more than *proprioceptive* predictions. Moreover, actions can be understood as being mediated by exactly the same mechanisms as exteroceptive predictions or perceptions (Adams et al., 2013). Histological data support this hypothesis. Specifically, the infragranular layers in the motor cortex and primary sensory neurons (projecting from muscle spindles to the dorsal horn of the spinal cord) comprise prediction units whereas alpha-motor neurons represent proprioceptive PE units. Both types of proprioceptive predictions are compared in the ventral horn of the spinal cord, resulting in a proprioceptive PE that is minimized via discharges of alpha-motor neurons. Notice that whereas exteroceptive PE minimization takes place in the granular layers of the cortex, proprioceptive PE minimization takes place in the ventral horn of the spinal cord via alpha-motor neuron discharges. This histological difference between both systems accounts for the agranular property of the primary motor area (Adams et al., 2013; Shipp et al., 2013).

It follows that in pursuing adaptive homeostatic responses, *optimal* free-energy minimization must comprise explaining away exteroceptive and proprioceptive PEs in coordination. Echopraxia exemplifies this homeostatic need. An organism experiences echopraxia when it simultaneously perceives (observes) and executes an action, meaning, in the context of predictive coding, that exteroceptive and proprioceptive PEs are being simultaneously minimized. To counteract echopraxia (because it is not an adaptive homeostatic response), one type of PEs should not be minimized while the other is being explained away. In the context of active inference, this is exemplified by the dual physiological role of the so-called mirror neurons (Shipp et al., 2013) in the motor cortex. Mirror neurons fire when a primate either executes an action or observes the execution of such action. However, they do not fire when the primate simultaneously executes and observes the action.

In the current paradigm, a reactive response elicited by a task-switching demand decreases proprioceptive precision (increases uncertainty). This is because large proprioceptive PEs result from the comparison between the highly precise ongoing or prepotent response and the descending predictions of the unprepared behavior. Mechanistically, the reactive response translates into descending predictions originating in the infragranular layer of the primary motor cortex that oppose primary somatosensory signals, resulting in large and imprecise PEs (i.e., uncertainty). We speculate that if this situation occurs when exteroceptive PEs are minimized (e.g., in the 0-ms temporal gap condition), the organism successfully minimizes the proprioceptive PEs, resulting in successful task switching. In contrast, if this situation occurs in the context of large and not minimized PEs (e.g., in the 300-ms temporal gap condition), the organism increases free energy (analogous to what happens during episodes of echopraxia), which is not an adaptive homeostatic response. Therefore, a predictive-coding based mechanism explaining how temporal PEs affect task-switching performance accuracy should include the coordination between exteroceptive and proprioceptive PE minimization.

## **A Predictive Coding Mechanism to Account for the Conjoint Effect of Temporal Gaps and PEs on Behavior-Change Performance**

It is possible that in the presence of a large temporal gap (e.g., long delay) the exteroceptive process would reach a suboptimal state (large PEs without minimization). Triggering an *anticipatorily* prepared action minimizes additional free energy and compensates for this suboptimal state. A neurophysiological mechanism implementing this compensatory (i.e., homeostatic) response must satisfy two conditions. First, it must integrate exteroceptive and proprioceptive minimizations of PEs, which is no more than the integration of perception and action in a simple mechanism. Second, it must include the effects of temporal gaps (i.e., temporal uncertainty) and temporal PEs as driving or modulatory inputs.

The first condition is fulfilled by the fact that once the temporal PEs depart from lower level sensory areas and reach higher level areas such as the rAI, they would affect motor regions. Not coincidentally, the rAI is involved both in the processing of temporal PEs and in the fronto-basal ganglia circuit of motor control (Bolam, 2010), which is critical for successful task switching. The fronto-basal ganglia circuit is engaged during behavior inhibition<sup>1</sup> (Aron et al., 2007). When an organism engages a behavior inhibition, a GO process (i.e., the prepotent behavior) triggered by a GO signal competes against a STOP process triggered by a STOP signal. Each process has a finishing time. Inhibition would be successful if the STOP process reaches its finishing time before the GO process.
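The race between the GO and STOP processes can be sketched as a small simulation. In the Python example below, the decision rule is the one stated above (inhibition succeeds when the STOP process finishes first); the Gaussian finishing-time distributions and all numerical values are hypothetical and are not estimates from this study.

```python
import random

def inhibition_succeeds(go_finish_ms, stop_finish_ms):
    """Inhibition succeeds if the STOP process finishes before the GO process."""
    return stop_finish_ms < go_finish_ms

def p_inhibit(stop_signal_delay_ms, n=10000, seed=0):
    """Estimate the probability of successful inhibition by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Hypothetical GO finishing time (ms after the GO signal).
        go_finish = rng.gauss(500, 80)
        # Hypothetical STOP finishing time: signal delay plus STOP latency.
        stop_finish = stop_signal_delay_ms + rng.gauss(220, 30)
        hits += inhibition_succeeds(go_finish, stop_finish)
    return hits / n
```

In this sketch, a later STOP signal leaves less time for the STOP process to win the race, so the estimated probability of successful inhibition falls as the delay grows.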

Successful inhibition is associated with the activity of either the indirect or the hyperdirect fronto-basal ganglia circuit (Aron et al., 2007). It is relevant that behavior inhibition is an essential stage of task switching (Verbruggen et al., 2008; Verbruggen and Logan, 2009). In a stop-change task (Verbruggen et al., 2008; Verbruggen and Logan, 2009), the organism stops the prepotent behavior (i.e., GO1 behavior) before preparing a second behavior (i.e., GO2 behavior). Therefore, our task-switching paradigm might activate the frontostriatal network. A salient feature in this mechanism is that the rAI shows strong activity when the participant fails to inhibit responses (Cai et al., 2014). Furthermore, the rAI has anatomical connections with the presupplementary motor area and with the striatum which are part of the indirect pathway mediating effective behavior inhibition. Therefore, we suggest that the rAI anatomically connects "exteroceptive-related" circuits with "proprioceptive-related" circuits in a single network (**Figure 6**).

The second condition is fulfilled by the fact that both temporal gaps and temporal PEs might affect the effective connectivity (i.e., how the regions affect each other) of the fronto-basal ganglia circuit (**Figure 6**). Based on our current data and our previous work (Limongi et al., 2013), we predict that whereas temporal uncertainty (caused by temporal gaps) would drive activity in the rAI, the temporal PEs would modulate the effective connections of the fronto-basal ganglia circuit. If proven true at the neurophysiological level, this mechanism would account for the modulatory effect of temporal PEs on task-switching performance that we found in this work.

### **Conclusion**

In his Principles of Psychology, James (1950) proposed that action follows perception, which in modern neuroscience is referred to as the perception and action cycle. As a corollary, an accurate action demands an accurate perception. In consequence, in the temporal domain, the organism privileges temporal perception before engaging an action. Inaccurate temporal perceptions (i.e., temporal predictions) translate into large and not minimized PEs. These errors must be minimized before engaging an action. Therefore, the system must privilege the exteroceptive error minimization over other tasks (i.e., engaging a new and "unprepared" behavior). From a free energy perspective, we could interpret the irreversible (inaccurate) prepotent behavior as a compensation for imprecise perceptual inference. In other words, the brain calls upon precise prior beliefs (prepotent responses) when faced with imprecise sensory information so as to minimize the left-over free energy associated with the suboptimal Bayes estimate of the sensorium's causes (i.e., not minimized exteroceptive PEs).

# **Author Contributions**

RL designed the study, performed the data analysis, and wrote the manuscript. AS designed the study, performed pilot data collection, and critically reviewed the manuscript. BG performed final data collection and critically reviewed the manuscript.

# **Acknowledgment**

We thank the research assistant Francisco J. Perez for providing feedback on the final version of the manuscript.


<sup>1</sup>The action control literature uses the term "response inhibition."


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Limongi, Silva and Góngora-Costa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Stress time-dependently influences the acquisition and retrieval of unrelated information by producing a memory of its own

*Chelsea E. Cadle and Phillip R. Zoladz\**

*Department of Psychology, Sociology, and Criminal Justice, Ohio Northern University, Ada, OH, USA*

Stress induces several temporally guided "waves" of psychobiological responses that differentially influence learning and memory. One way to understand how the temporal dynamics of stress influence these cognitive processes is to consider stress, itself, as a learning experience that influences additional learning and memory. Indeed, research has shown that stress results in electrophysiological and biochemical activity that is remarkably similar to the activity observed as a result of learning. In this review, we will present the idea that when a stressful episode immediately precedes or follows learning, such learning is enhanced because the learned information becomes a part of the stress context and is tagged by the emotional memory being formed. In contrast, when a stressful episode is temporally separated from learning or is experienced prior to retrieval, such learning or memory is impaired because the learning or memory is experienced outside the context of the stress episode or subsequent to a saturation of synaptic plasticity, which renders the retrieval of information improbable. The temporal dynamics of emotional memory formation, along with the neurobiological correlates of the stress response, are discussed to support these hypotheses.

Keywords: stress, long-term potentiation (LTP), hippocampus, amygdala, metaplasticity

#### *Edited by:*

*Timothy Michael Ellmore, The City College of New York, USA*

#### *Reviewed by:*

*Almut Hupbach, Lehigh University, USA; Peter Serrano, Hunter College, USA*

#### *\*Correspondence:*

*Phillip R. Zoladz, Department of Psychology, Sociology, and Criminal Justice, Ohio Northern University, 525 S. Main Street, Ada, OH 458120, USA p-zoladz@onu.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 01 April 2015 Accepted: 18 June 2015 Published: 30 June 2015*

#### *Citation:*

*Cadle CE and Zoladz PR (2015) Stress time-dependently influences the acquisition and retrieval of unrelated information by producing a memory of its own. Front. Psychol. 6:910. doi: 10.3389/fpsyg.2015.00910*

# What is Stress?

Stress is experienced during situations that pose a threat to an organism and leads to the activation of two major physiological systems, the sympathetic nervous system (SNS) and the hypothalamus-pituitary-adrenal (HPA) axis. SNS activation allows for the immediate fight-or-flight response through rapid release of epinephrine (EPI) and norepinephrine (NE) from the adrenal medulla (Gunnar and Quevedo, 2007). Activation of the HPA axis, on the other hand, leads to a slower response, eventually resulting in the release of corticosteroids from the adrenal cortex (de Kloet et al., 1999; Joels, 2001).

Stress response neurochemicals exert a profound effect on learning and memory by influencing cognitive brain areas, such as the hippocampus, prefrontal cortex (PFC), and amygdala. Both the hippocampus, which is crucial for the formation of declarative and spatial memories (Moser and Moser, 1998; Kaut and Bunsey, 2001; Broadbent et al., 2004; Eichenbaum, 2004; Squire et al., 2004; Broadbent et al., 2006), and the PFC, which is responsible for working memory and higher-order cognitive function (Rowe et al., 2001; Bechara, 2005; Nebel et al., 2005; Muller and Knight, 2006), have a high density of corticosteroid receptors (McEwen et al., 1968, 1969; McEwen et al., 1970; Diorio et al., 1993; McGaugh, 2004), making them highly susceptible to the effects of stress. The amygdala is primarily responsible for the processing of emotional information and serves to exacerbate the stress response by enhancing HPA axis activity (McGaugh, 2004; Roozendaal et al., 2009). Stress differentially impacts learning that is dependent on these brain areas, and when considering the different forms of stress-memory interactions, the most complex appears to be that of stress effects on hippocampus-dependent memory (Zoladz et al., 2014a), which will be the focus of this review.

# Type of stress

Stress effects on learning and memory depend on the type of stressor that is employed. Intrinsic stress is a stressor that is intrinsic to, or a part of, the learning experience, and extrinsic stress is a stressor that is extrinsic to, or outside, the learning experience. In general, intrinsic stress (e.g., emotionally arousing words in a word list, colder water temperature in a Morris water maze) facilitates learning and memory (Sandi et al., 1997; Cahill and McGaugh, 1998). The effects of extrinsic stress (e.g., exposing participants to a stressor and then having them learn a word list, shocking rats and subsequently testing their ability to navigate a maze) on learning and memory, on the other hand, are much more complex and can involve enhancement, impairment or no effects on cognition (Joels et al., 2006; Zoladz et al., 2011b, 2014a). In the present review, we will focus on the influence of extrinsic stress on learning and memory. During a stressful, or even traumatic, event (e.g., wartime combat, witnessing a crime), learning that occurs often results in a powerful memory for the stressor, although this may depend on what aspects (i.e., central or peripheral details) of the stressor are tested (see section below). Here, it is our goal to discuss how the physiological changes that occur during stress impact learning and memory for events that occur subsequent or prior to stress exposure.

# Stress Effects on Learning and Memory Depend on Stage

Learning and memory can generally be divided into three major stages: encoding, consolidation and retrieval. Encoding involves the acquisition phase, during which information is initially learned. Consolidation is when the learned information is stored in order to be successfully retrieved (the third stage) at a later point in time. Most research, in both humans and rodents, has reported facilitative effects of post-learning stress or corticosteroid administration on long-term memory consolidation (Cahill et al., 2003; Beckner et al., 2006; Hui et al., 2006; Smeets et al., 2008; Preuss and Wolf, 2009) and deleterious effects of stress or corticosteroid administration on long-term memory retrieval (de Quervain et al., 1998; Buss et al., 2004; Kuhlmann et al., 2005a,b; Buchanan et al., 2006; Diamond et al., 2006; Buchanan and Tranel, 2008; Park et al., 2008; Smeets et al., 2008; Tollenaar et al., 2008). The effects of pre-learning stress or corticosteroid administration on encoding have been more inconsistent, with studies revealing long-term memory enhancements, impairments or no effects at all (Kim et al., 2001, 2005; Jelicic et al., 2004; Elzinga et al., 2005; Diamond et al., 2006; Payne et al., 2006, 2007; Nater et al., 2007; Park et al., 2008; Schwabe et al., 2008; Duncko et al., 2009; Zoladz et al., 2011a, 2013, 2014b). Importantly, because it is administered prior to encoding, pre-learning stress can affect both the acquisition and storage of information; thus, researchers often assess short-term memory in such studies to infer what stage of information processing is being affected. A representative summary of the research studies that have examined stress effects on hippocampus-dependent learning and memory is illustrated in **Table 1**.

The effects of stress on different stages of learning and memory appear to depend on an interaction between corticosteroid and noradrenergic mechanisms in the amygdala and hippocampus. Inactivation or lesions of the basolateral amygdala (BLA), as well as systemic or intra-BLA/intra-hippocampus administration of β-adrenergic receptor antagonists, have been shown to prevent stress and corticosteroid effects on learning and memory (Kim et al., 2001; Roozendaal et al., 2003, 2004; Kim et al., 2005; Zoladz et al., 2011b). Additionally, the effects of stress are frequently selective for emotionally arousing (i.e., amygdala-activating) information (Kuhlmann et al., 2005b; Buchanan et al., 2006; Smeets et al., 2008), emphasizing amygdala involvement in the effects. Another contributing factor to stress-memory interactions is the type of information affected. That is to say, stress often exerts differential effects on learning and memory for central and peripheral details. During stress or arousal, attention is narrowed (Easterbrook, 1959), which can hinder one's ability to subsequently learn or remember peripheral aspects of an event or scene. Thus, in some instances, stress can enhance one's memory for the gist, or central aspects, while impairing an individual's ability to recollect finer details (Kensinger, 2004). These findings resonate with additional work showing that stress sometimes facilitates memory for emotional, potentially more important, information, at the cost of memory for neutral, potentially less important, information (Payne et al., 2006, 2007).

# Theoretical Approaches to Stress Effects on Cognition

Over the past several decades, numerous theories have been proposed to account for stress effects on learning and memory. Initially, researchers emphasized the deleterious effects of elevated corticosteroid levels on synaptic plasticity and related them to the effects of stress on learning (Joels and Vreugdenhil, 1998; Conrad et al., 1999). Glucocorticoid receptors (GRs), which have a lower affinity for corticosteroids than mineralocorticoid receptors (MRs), generally only become occupied when corticosteroid levels rise, such as during times of stress. The idea put forth was that moderate GR activity is optimal for cognitive processes, but too much GR activity, such as that which occurs following stress, has negative repercussions for synaptic plasticity and, therefore, learning. Support for this idea came from studies reporting a curvilinear, inverted U-shaped relationship between corticosteroids and hippocampal synaptic plasticity and learning (Diamond et al., 1992; Andreano and Cahill, 2006), as well as from research showing that extensive GR activity results in excessive calcium influx and negative gene-dependent effects on cellular function (Joels et al., 2003). Combined with work on chronic stress and corticosteroid-hippocampal volume relationships observed in humans with psychological disorders (Campbell et al., 2004; Zoladz and Diamond, 2013), a majority of the research led investigators to conclude that stress generally exerts deleterious effects on hippocampal structure and function.

#### TABLE 1 | Summary of the findings from studies examining acute stress effects on hippocampus-dependent learning and memory.

*CPT, cold pressor test; HR, heart rate; IA, inhibitory avoidance; LTM, long-term memory (≥24 h); MAST, Maastricht Acute Stress Test; STM, short-term memory (<24 h); TSST, Trier Social Stress Test. In some cases, the CPT was the socially evaluated version (SECPT); rat studies are marked with an asterisk (∗).*

Over time, a greater appreciation for the complexity of stress-memory interactions arose, as evidence accumulated suggesting that stress could enhance, impair or have no effect on hippocampus-dependent learning and synaptic plasticity. Researchers began showing that corticosteroids not only have delayed, gene-dependent, negative consequences on cellular activity, but can also exert rapid, non-genomic, facilitative effects (Orchinik et al., 1991; Karst et al., 2005). This led to markedly different theoretical approaches to how stress affects cognition, which take into account factors such as the timing of the stress relative to learning or memory, the sex of the organism being investigated, and the type of learning and memory being assessed (Joels et al., 2006; Diamond et al., 2007; Wolf, 2009; Joels et al., 2011; Schwabe et al., 2012). Diamond and colleagues, echoing prior theoretical views (Diamond et al., 1990; Shors and Thompson, 1992), put forth another idea: that stress might impair memory by producing a memory of its own (Diamond et al., 2004, 2005). Here, we extend this view to consider how stress, as a memory formation process, time-dependently affects encoding, consolidation, and retrieval.

# Stress as a Learning Event

For the past several decades, long-term potentiation (LTP) has been studied as a putative physiological mechanism underlying memory formation (Shors and Matzel, 1997; Kim and Yoon, 1998; Diamond et al., 2007; Joels and Krugers, 2007). LTP is a long-lasting enhancement of synaptic efficacy that results from high-frequency stimulation (HFS) of afferent fibers (Hebb, 1949; Marr, 1971; Lomo, 2003) and can be induced *in vitro* (in brain slices), in awake and behaving animals, or in anesthetized animals (Lynch, 2004). *In vitro* setups keep brain tissue functional via artificial cerebrospinal fluid and allow investigators to stimulate and record from populations of neurons. Setups in awake or anesthetized animals involve intracerebral implantation of stimulating and recording electrodes via stereotaxic surgery; these electrodes can subsequently be used to examine LTP induction. Successful memory formation for a learning event is believed to coincide with the strengthening of neural connections and a lasting pattern of altered synaptic weights. However, if multiple LTP-inducing events occur in close proximity, the limited number of available neurons may result in a "ruthless competition" for access to synaptic plasticity production and successful memory formation (Diamond et al., 2004, 2005). In other words, with limited resources, the brain would be forced to prioritize information that is more important.

In an effort to understand the dynamic nature of hippocampus-dependent memory formation, researchers have examined the influence of LTP induction on subsequent hippocampal synaptic plasticity and learning. Application of HFS to afferent fibers has been shown to produce widespread saturation of hippocampal synapses, and the long-lasting alteration of synaptic weights produced by this HFS can lead to an inhibition of subsequent LTP and hippocampus-dependent learning (Huang et al., 1992; Barnes et al., 1994; Moser and Moser, 1998, 1999; Otnaess et al., 1999). This activity-dependent modification of synaptic efficacy has been termed metaplasticity, corresponding to the notion that a prior change in synaptic plasticity can influence the direction and degree of subsequent changes in synaptic plasticity (Abraham and Bear, 1996).

Because we know that prior LTP induction can influence subsequent LTP induction, it stands to reason that the formation of one memory could influence subsequent memory formation. However, research has shown that this tends to occur only when a learning task produces widespread synaptic saturation (extensively reviewed in Diamond et al., 2004). Learning events that produce such a strong memory or change in synaptic plasticity are those that have a strong emotional component and elicit a significant stress response. Accordingly, research has revealed very similar molecular mechanisms underlying stress- and LTP-induced changes in hippocampal function (see Huang et al., 2005 for a review). Some of these commonalities include increased early gene induction (Cole et al., 1989; Schreiber et al., 1991; Platenik et al., 2000), increased NMDA and AMPA receptor activity (Tocco et al., 1991, 1992; Kim et al., 1996; Brun et al., 2001), increased levels of neurotrophins [e.g., brain-derived neurotrophic factor (BDNF); Gooney and Lynch, 2001; Marmigere et al., 2003] and increased glutamate and intracellular calcium levels (Sapolsky, 1996; Abraham et al., 1998; Hossmann, 1999; Venero and Borrell, 1999; Joels, 2001; Takahashi et al., 2002; Joels et al., 2003; McEwen and Chattarji, 2004). Additional evidence for shared mechanisms between artificially induced LTP and stress-induced neuroplasticity comes from research indicating that NMDA receptor antagonists, which impair LTP induction, prevent the effects of stress on subsequent hippocampus-dependent learning and LTP (Kim et al., 1996; Park et al., 2004). In theory, the NMDA receptor antagonists block the formation of the stress memory, which allows subsequent hippocampus-dependent learning and LTP to occur. Research has also shown that despite stress impairing subsequent learning and LTP induction, the memory for the stress-inducing event remains intact (Diamond et al., 2004; Zoladz et al., 2010).
Together, these findings have provided support for the idea that stress induces an endogenous form of LTP that allows a memory of the stress experience to be formed. Although this is adaptive, because it allows an organism to remember the stress experience, in some cases it can also serve to impair subsequent cognitive processing.

# Stress Effects on Learning and Memory and the Important Role of Timing

Knowing that stress exposure results in the activation of molecular mechanisms that are remarkably similar to those observed as a result of artificially induced synaptic plasticity, we might consider stress, itself, as a memory-producing event. Viewed in this light, stress, and the waves of psychobiological responses that result from such a learning event, can be expected to strongly influence the successful encoding, consolidation, and retrieval of unrelated information (i.e., information not related to the stressor).

Stress effects on the retrieval of previously learned information could be understood as the formation of one memory (i.e., the stress-memory) interfering with the retrieval of another memory. Similar to this line of reasoning, studies have shown that LTP can produce retrograde amnesia for previously learned information (McNaughton et al., 1986; Brun et al., 2001). In theory, the initial learning task (spatial learning in this case) leads to the potentiation of a small subset of synapses, and the pre-retrieval LTP leads to a complete saturation of synaptic nodes. The all-encompassing wave of plasticity that results from the LTP induction alters the pattern of synaptic weights throughout the hippocampus, resulting in an impaired ability to retrieve the previously formed memory. Consistent with these findings, and as described above, studies examining stress effects on retrieval have found that acute stress exposure that occurs before a memory test leads to impaired memory performance, while preserving the memory that has formed as a result of stress exposure (Diamond et al., 2004; Zoladz et al., 2010). In the referenced studies, rats trained in an inhibitory avoidance task that involved foot shock as an unconditioned stimulus (i.e., a stressor) exhibited impaired spatial memory retrieval, despite retaining the shock-induced fear memory that was formed in the inhibitory avoidance task. This effect was observed when the spatial learning and memory occurred on the same day or when they were separated by 24 h. Importantly, the memory impairment may not result from complete elimination of the original memory. Instead, the stress-induced neuroplasticity may cause an impaired ability to activate the synapses required to retrieve the previously formed memory (Diamond et al., 2004).

As described above, post-learning stress almost always enhances long-term memory, but pre-learning stress effects on long-term memory have resulted in inconsistent findings. The timing of stress relative to learning has been shown to influence both types of effects. Studies in which learning or HFS of afferent fibers occurred immediately before or after stress exposure revealed a significant enhancement of long-term memory or an increase in the duration of hippocampal LTP (reviewed in Diamond et al., 2007). In the same way that stress exposure immediately after a learning event leads to a strong memory formation for both the stress event and unrelated learning event, acute stress exposure occurring immediately before an unrelated learning event typically facilitates memory formation for both events (note that the facilitation can be selective for emotional/neutral or central/peripheral information). Alternatively, when acute stress exposure is temporally separated from a prior or subsequent unrelated learning event, memory formation for that learning event is often impaired (pre-learning stress) or unaffected (post-learning stress).

Based on the seemingly time-dependent effects of stress on hippocampal function, Diamond et al. (2007) developed the temporal dynamics model of emotional memory processing. This model illustrates the biphasic modulation of hippocampal function by stress-induced amygdala activity and is described in **Figure 1**. The first phase encompasses a rapid enhancement of hippocampal plasticity resulting from stress-induced neurochemical interactions in the amygdala and hippocampus. Support for this phase comes from electrophysiological work showing that stimulation of the amygdala immediately prior to HFS in the hippocampus results in strengthened synaptic connections in the hippocampus (Akirav and Richter-Levin, 1999, 2002). Importantly, this amygdala-induced enhancement of hippocampal plasticity depends on both noradrenergic and corticosteroid mechanisms. During this phase, corticosteroids released as a result of stress would be expected to exert rapid, excitatory effects on hippocampal function as a result of non-genomic activity (Karst et al., 2005; Karst and Joels, 2005). The stress-induced facilitation of hippocampal function, however, is short-lived and may last only minutes after onset of the stress experience. The second phase of the model represents a refractory period during which the acquisition of new information or LTP induction would be improbable. This refractory period is caused by the desensitization of glutamatergic receptors, which have been over-stimulated by stress-induced glutamate release, and by the delayed, gene-dependent activity of corticosteroids. Accordingly, in electrophysiological work, when amygdala stimulation and hippocampal HFS are separated in time, the resulting synaptic change is a suppression of hippocampal LTP. Thus, application of tetanic stimulation during the refractory period would likely fail to overcome the newly elevated threshold for LTP induction.
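The two phases described above can be summarized in a purely illustrative sketch. The function name, the labels it returns, and both time boundaries are hypothetical placeholders introduced here for illustration; the model itself specifies only that the first phase lasts on the order of minutes and that the refractory phase follows it, and the assumption of an eventual return to baseline is ours.

```python
def hippocampal_phase(minutes_since_stress_onset,
                      excitation_end_min=5.0,     # hypothetical boundary, not from the source
                      refractory_end_min=240.0):  # hypothetical boundary, not from the source
    """Return a qualitative label for hippocampal plasticity at a given time.

    Phase 1 (brief): plasticity is enhanced shortly after stress onset.
    Phase 2 (longer): a refractory period in which LTP induction is unlikely.
    A return to baseline after phase 2 is assumed here for completeness.
    """
    if minutes_since_stress_onset < 0:
        return "baseline"    # stress has not yet begun
    if minutes_since_stress_onset < excitation_end_min:
        return "enhanced"    # phase 1: rapid facilitation
    if minutes_since_stress_onset < refractory_end_min:
        return "suppressed"  # phase 2: refractory period
    return "baseline"        # assumed recovery


for t in (1, 30, 600):
    print(t, hippocampal_phase(t))
```

Treating the boundaries as parameters rather than constants leaves room for them to differ across stressors and individuals.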

Consistent with the temporal dynamics model, Schwabe et al. (2012) proposed that the rapid stress-induced increase of catecholamine and non-genomic corticosteroid activity puts an organism in a 'memory formation mode,' which results in enhanced memory production for a stressful event and information that is temporally proximal to such an event. However, as the stress continues and/or upon the initiation of gene-dependent corticosteroid activity, a 'memory storage mode' is induced, which impairs cognitive processes that could compete or interfere with the storage of information about the stress event. Both Diamond and Schwabe would likely agree that the temporal dynamics of memory processing subsequent to stress exposure are adaptive, despite the fact that they can result in enhancing *or* deleterious effects on learning and memory. When a stressful experience occurs, it is beneficial to survival for an organism to form a strong memory of that event. Moreover, the suppression of subsequent cognitive processing or memory formation would also be advantageous because it allows the brain to focus on storing the stress-related memory, without interference from competing cognitive processes.

The temporal dynamics model addresses the functionality of the hippocampus from stress onset to hours after stress exposure. This information can be extended to understand how stress, as a learning experience, influences additional unrelated learning experiences. We have summarized the research studies that have examined the effects of acute stress, administered at different time points, on hippocampus-dependent learning and memory in **Table 1**. Stress-induced facilitation or impairment of unrelated information will be largely determined by the convergence of information from the stress and learning experiences in "time" and "space" (Joels et al., 2006). Convergence of the experiences in "time" relates to the idea of stress and learning occurring in close temporal proximity, whereas the convergence in "space" refers to the two events sharing mutual brain circuits that overlap during the memory formation process. As such, stress will enhance memory when a learning event occurs in the same context as the stress event, or when the two events occur closely in time. This enhancement results from shared neural circuits simultaneously forming memories for the learning event and stress event. As memory formation for a stress event is characterized by rapid psychobiological responses that allow for strong memory development, the resulting alteration of synaptic plasticity encodes information for both learning experiences. This enhanced consolidation for arousing experiences and learning events that occur in close proximity is adaptive in nature, allotting the highest priority of memory formation to events that code for information relevant to survival.

Alternatively, if a learning event occurs outside of a stress event context, or the events are temporally separated, encoding for this unrelated information would be significantly impaired because the neural circuits necessary for memory formation had already been saturated. As discussed, the hippocampus descends into a refractory period shortly following stress onset due to stress-induced synaptic saturation. Although the hippocampus does not display complete suppression, memory formation will be severely impaired. Just as the rapid facilitation of hippocampal LTP serves adaptive purposes, the refractory period also offers the organism benefits. First, it protects against increased glutamate exposure, which would eventually lead to neurotoxicity; second, it offers a short window of time during which the emotional memory is less susceptible to corruption by subsequent learning; third, it allows for the consolidation of emotional information acquired in phase one (Diamond et al., 2007).

# Conclusion and Caveats

Initial support for the temporal dynamics model of emotional memory processing came from preclinical work showing that a brief stressor applied immediately before learning could enhance long-term spatial memory in rats (Diamond et al., 2007). If the stressor was separated from the learning by a period of 30 min, however, no memory enhancement was observed. More recently, investigators have extended the work to humans. This research has, for the most part, provided much needed support for the temporal dynamics model in people (Zoladz et al., 2011a, 2013, 2014b,c; Quaedflieg et al., 2013). However, some issues have arisen. One is that the sex of the organism appears to be influential in the types of effects that stressor timing has on learning and memory. Indeed, the temporal dynamics model was originally based on research performed in male, but not female, rodents. Thus, it is perhaps not surprising that work in humans has shown that females can respond very differently to the same stressor. As an example, Zoladz et al. (2013) reported that males, but not females, exhibited an impairment of long-term memory when exposed to a brief stressor 30 min prior to learning. These investigators also showed that stress immediately before learning reduced false memory production in males and females but enhanced true memory in females only (Zoladz et al., 2014c).

One factor that may underlie these observed sex-dependent effects is the modulatory role female sex hormones can exert on physiological mechanisms involved in memory formation. As many studies that have included female participants did not control for phase of the menstrual cycle, levels of estrogen and progesterone, or use of oral contraceptives, the possible interaction that may be occurring between female sex hormones and the time-dependent effects of stress-induced neurochemicals is not well understood. Further research investigating the modulatory role that female sex hormones may be playing in stress effects on learning and memory may offer much needed information as to how the timing of stress differentially influences learning and memory in males versus females.

An additional nuance that is important when considering the temporal dynamics model is the severity of the stressor. For severe stressors, the excitation phase of hippocampal function could be much more short-lived, or in the case of traumatic stressors, non-existent. For milder stressors, it is possible that the excitation phase could last longer. The possibility that the temporal dynamics of stress-induced alterations of hippocampal function vary from stressor to stressor could relate to individual differences in physiological responses to stress, perceptions of control in times of stress, and what constitutes a stressful event. Numerous studies have shown that some individuals respond strongly to laboratory stressors (defined as "Responders"), while others show little change in SNS or HPA axis activity (defined as "Non-Responders"). Moreover, it may be useful to consider what type of genetic variations across individuals could make them more or less susceptible to stress-induced changes in amygdala and hippocampal function. For instance, in a recent study, we showed that female carriers of the ADRA2B deletion variant (a genetic alteration that makes the noradrenergic system more responsive to stress) were more susceptible to stress-induced enhancements of long-term memory (Zoladz et al., 2014b). If some genetic variants influence susceptibility to stress-induced enhancements of long-term memory, this could lend insight into who is more likely to form an intrusive, traumatic memory following extreme stress.

Finally, it is worth noting that the idea of stress inducing an amygdala-dependent biphasic effect on hippocampal function is largely dependent on electrophysiological work focusing more exclusively on the perforant pathway, which terminates in the dentate gyrus of the hippocampus. Other electrophysiological work has shown that corticosteroids can exert markedly different effects on different hippocampal subregions, such as CA1 and CA3 (Joels et al., 2009). Therefore, stress-induced amygdala activity, which biphasically influences dentate gyrus LTP, could affect other areas of the hippocampus in a different time-dependent manner.

Clearly, stress can exert differential effects on learning and memory depending on when it is administered and how long it lasts. Although the temporal dynamics notion may be an oversimplification of an overly complex area of research, it provides a useful guide for understanding how stress time-dependently influences learning and its neurobiological basis. Future work is necessary to clarify how timing interacts with stress effects on memory and how sex and individual differences can influence these effects.

# References

hippocampal glutamate transmission by corticosterone. *Proc. Natl. Acad. Sci. U.S.A.* 102, 19204–19207. doi: 10.1073/pnas.0507572102


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Cadle and Zoladz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Temporal dynamics of task switching and abstract-concept learning in pigeons

#### *Thomas A. Daniel<sup>1</sup>, Robert G. Cook<sup>2</sup> and Jeffrey S. Katz<sup>1</sup>\**

*<sup>1</sup> Comparative Cognition Laboratory, Department of Psychology, Auburn University, Auburn, AL, USA, <sup>2</sup> Avian Cognition Laboratory, Department of Psychology, Tufts University, Medford, MA, USA*

The current study examined whether pigeons could learn to use abstract concepts as the basis for conditionally switching behavior as a function of time. Using a mid-session reversal task, experienced pigeons were trained to switch from matching-to-sample (MTS) to non-matching-to-sample (NMTS) conditional discriminations within a session. One group had prior training with MTS, while the other had prior training with NMTS. Over training, stimulus set size was progressively doubled from 3 to 6 to 12 stimuli to promote abstract concept development. Prior experience had an effect on the initial learning at each of the set sizes but by the end of training there were no group differences, as both groups showed similar within-session linear matching functions. After acquiring the 12-item set, abstract-concept learning was tested by placing novel stimuli at the beginning and end of a test session. Prior matching and non-matching experience affected transfer behavior. The matching experienced group transferred to novel stimuli in both the matching and non-matching portion of the sessions using a matching rule. The non-matching experienced group transferred to novel stimuli in both portions of the session using a non-matching rule. The representations used as the basis for mid-session reversal of the conditional discrimination behaviors and subsequent transfer behavior appear to have different temporal sources. The implications for the flexibility and organization of complex behaviors are considered.

#### Keywords: matching, non-matching, behavioral flexibility, concept learning, relational rule, reversal, pigeon

# Introduction

For any goal-directed behavior, an animal must selectively attend to the relevant cues in an environment while simultaneously ignoring irrelevant cues. An animal's adaptability to these cues is known as behavioral flexibility (Aston-Jones et al., 1999; Shettleworth, 2009). An animal with high behavioral flexibility can readily switch between different relevant cues based on changes in the environment. Behavioral flexibility has been correlated with intelligence, and species that display high behavioral flexibility are on average considered to be more intelligent than those with low behavioral flexibility (Bitterman, 1965; Reader and Laland, 2002; Lefebvre et al., 2004; Roth and Dicke, 2005; Shettleworth, 2009; Reader et al., 2011).

The mid-session reversal procedure requires such behavioral flexibility because the relevance of available cues in the task dynamically changes with time, as the reinforcement contingencies are reversed in the middle of a session. For example, selecting the green icon in a simultaneous task is reinforced for the first half of a session, and then halfway through the session selecting the red icon is reinforced. This mid-session reversal thus requires the subject to adapt its behaviors flexibly within a session to respond optimally.

#### *Edited by:*

*Timothy Michael Ellmore, The City College of New York, USA*

#### *Reviewed by:*

*Neil McMillan, University of Alberta, Canada; Jessica Stagner Bodily, University of Florida, USA*

#### *\*Correspondence:*

*Jeffrey S. Katz, Comparative Cognition Laboratory, Department of Psychology, Auburn University, 226 Thach Hall, Auburn, AL 36830, USA, katzjef@auburn.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 25 June 2015 Accepted: 19 August 2015 Published: 02 September 2015*

#### *Citation:*

*Daniel TA, Cook RG and Katz JS (2015) Temporal dynamics of task switching and abstract-concept learning in pigeons. Front. Psychol. 6:1334. doi: 10.3389/fpsyg.2015.01334*

Cook and Rosen (2010) found that pigeons did not behave optimally in a mid-session reversal task. In their two-alternative conditional choice task, the first half of a session was a matching-to-sample (MTS) task and the second half of the session used a non-matching-to-sample (NMTS) task. Both tasks used the same red and cyan circles as stimuli. In the MTS task, for example, the pigeon was presented with a sample stimulus (e.g., red circle), completed an observing response, and then was presented with two choice stimuli (e.g., red circle and cyan circle) that were equidistant to the left and right of the sample. The correct response was to choose the comparison stimulus that matched the sample. NMTS was identical to MTS, except the correct response was to choose the comparison stimulus that did not match the sample. Thus, during the first half of a session the pigeons had to learn "if cyan circle then peck cyan circle" and in the second half of a session the pigeons had to learn "if cyan circle then peck red circle" (and with corresponding rules for red samples). Hence, to perform well, pigeons had to learn to switch from MTS to NMTS behaviors at the midpoint of a session. If pigeons performed optimally, they should have learned to respond exclusively based on matching rules during the first half of the session and then switched to non-matching rules at the midpoint of the session. Pigeons did not respond optimally. Cook and Rosen (2010) found that before the reversal, pigeons began to anticipate the NMTS contingency and switched to pecking the incorrect comparison prematurely. Likewise, after the reversal, the pigeons perseverated on the formerly correct comparison for too long. This effect has been replicated in a number of subsequent studies testing simpler discriminations and other species (Rayburn-Reeves et al., 2013a,b; McMillan et al., 2014, 2015; McMillan and Roberts, 2015).
According to Cook and Rosen (2010), these errors occurred because pigeon behavior was controlled less by the outcome of the previous trial than by internal temporal factors (i.e., time). The anticipatory and perseverative errors occurred on a learned time course, with switching behavior mediated by temporal cues rather than the quantity of trials or the outcomes of previous trials. This temporal control over responding has been confirmed by studies also manipulating time-related factors such as the inter-trial interval (McMillan and Roberts, 2012; Rayburn-Reeves et al., 2013a) or when the reversal points vary from session to session rather than remain fixed (Rayburn-Reeves et al., 2013b).
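The reinforcement contingency of the mid-session reversal described above can be written as a short sketch. The function name, its arguments, and the 80-trial session length in the usage lines are hypothetical and chosen only for illustration; the source specifies only that MTS applies in the first half of a session and NMTS in the second half.

```python
def reinforced_choice(sample, comparisons, trial_index, n_trials):
    """Return the comparison stimulus that is reinforced on a given trial.

    First half of the session: matching-to-sample (choose the match).
    Second half of the session: non-matching-to-sample (choose the non-match).
    trial_index is zero-based; n_trials is the session length (an assumption).
    """
    matching_phase = trial_index < n_trials // 2
    for comparison in comparisons:
        # In the MTS half the match is correct; in the NMTS half the non-match is.
        if (comparison == sample) == matching_phase:
            return comparison
    raise ValueError("comparisons must contain one match and one non-match")


# Hypothetical 80-trial session with a cyan sample.
print(reinforced_choice("cyan", ("red", "cyan"), trial_index=0, n_trials=80))
print(reinforced_choice("cyan", ("red", "cyan"), trial_index=40, n_trials=80))
```

Note that the rule depends only on the trial's position in the session, not on the outcome of the previous trial, which is why a subject tracking elapsed time can anticipate or lag behind the reversal.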

In the present study we were interested in testing whether temporal factors would similarly control switching between relational rules in a mid-session reversal task. Of all previous studies, only Cook and Rosen (2010) involved conditional discriminations where relational rules could have been learned. Given that the training set of stimuli involved only two items, it is likely pigeons learned via item-specific rules. It is well established that such small training sets generally result in item-specific learning (Katz et al., 2007). To generate relational learning, so that the relationship between the items is learned (i.e., abstract-concept learning: "peck the matching stimulus" or "peck the non-matching stimulus"), it is best to use the largest training sets possible. To promote the formation of relational rules in the present task, we took advantage of two factors. First, we tested the animals with larger set sizes that have previously supported relational transfer. Second, we used two groups of pigeons that had previously demonstrated full-concept learning in either MTS or NMTS using such larger training sets. We will refer to these two groups as the matching-concept group (MCG) and non-matching-concept group (NMCG) based on this prior experience. The MCG was trained in MTS and eventually demonstrated full concept learning (Bodily et al., 2008). They were initially trained with a small set size of three stimuli and showed no transfer to novel stimuli. These pigeons then had their training set systematically doubled eight times to a final set size of 768 stimuli. After reaching a performance criterion at each set size they were tested with trial-unique novel stimuli. Transfer performance increased from chance (50%) at the smallest set size to a level equivalent to baseline, and over 80%, by the end of set-size expansion. Such transfer constitutes full abstract-concept learning (Katz et al., 2007).
The NMCG was trained exactly like the MCG including the same stimuli, sessions, and apparatus. The only procedural difference between the groups was that the NMCG was rewarded for pecking the non-matching, different comparison stimulus, whereas the MCG was rewarded for pecking the matching comparison stimulus. Similar to the MCG, the NMCG demonstrated full concept learning by the end of set-size expansion (Daniel et al., 2015).
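The expansion schedule described above for both groups can be checked with a short calculation. This is only a sketch, assuming that each step exactly doubles the previous set size starting from three stimuli; the symbol $N_k$ (set size after $k$ doublings) is introduced here for illustration and does not appear in the original reports.

$$
N_k = 3 \cdot 2^{k}, \qquad k = 0, 1, \dots, 8, \qquad N_8 = 3 \cdot 256 = 768 .
$$

Under this assumption the intermediate set sizes would be 6, 12, 24, 48, 96, 192, and 384.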

These two groups of pigeons served as subjects in the present experiment. The first half of each session used an MTS task, and the second half used an NMTS task. Implementing a set-size expansion procedure, the initial training set consisted of three stimuli, and this was progressively doubled to 6, then 12 total stimuli. After the pigeons acquired the mid-session reversal contingency with the 12-item set, abstract-concept learning was tested by placing novel stimuli at the beginning and end of test sessions. Several questions were of interest. First, previous mid-session reversals used 2 or 3 stimuli, and the effect of additional stimuli (i.e., 6 or 12) is still unknown. Cook and Rosen (2010) attempted to train a three-alternative conditional discrimination, but found that learning all three possible outcomes to each sample was very difficult within the same sessions (the pigeons eventually learned two). Second, the impact of prior concept learning (i.e., MTS or NMTS) has yet to be tested in mid-session reversals. We anticipated that the MCG and NMCG would show an early advantage in the MTS and NMTS halves of the mid-session reversal, respectively, but that the advantage would disappear over training. Third, it is unclear to what level pigeons could learn the mid-session reversal task with the increasing number of competing contingencies. That is, if the pigeons learned item-specific rules, then the number of competing contingencies increases as the set size expands. Finally, it is unknown whether pigeons can apply relational learning in a mid-session reversal, particularly with the expanded set size. We anticipated only partial concept learning given previous results with set sizes of 12 stimuli (Katz et al., 2010). Nonetheless, such conditions, along with the birds' previous experience, should be sufficient to see whether pigeons could flexibly switch between matching and non-matching concepts within the same session.

# Materials and Methods

#### Subjects

Six male pigeons (*Columba livia*) from the Palmetto Pigeon Plant served as subjects. Subjects were maintained within 80–85% of their free-feeding body weight throughout the study; if a subject's weight fell outside this range on a given day, it did not participate in that day's session. Subjects resided in a colony room governed by a 12 h light/dark cycle and were housed individually with free access to water and grit. All subjects previously demonstrated full transfer in MTS (Bodily et al., 2008) or in NMTS (Daniel et al., 2015). The three subjects that learned the *matching* concept will hereafter be referred to as the MCG, and the remaining three subjects that learned the *non-matching* concept will be referred to as the NMCG.

#### Apparatus

Pigeons were tested in custom wooden test chambers (35.9 cm wide × 45.7 cm deep × 51.4 cm high). A fan (Dayton 5C115A, Niles, IL, USA) located in the back wall of each chamber provided ventilation and white noise. The computer detected pecks via an infrared touch screen (17" Unitouch, Carroll Touch, Round Rock, TX, USA). This pressure-fit touch screen sat within a 40.6-cm × 32.1-cm cutout in the front panel that was centered 7.7 cm from the top of the operant chamber. A 28-V houselight (No. 1829, Chicago Miniature, Hackensack, NJ, USA), located in the center of the ceiling, illuminated the chamber during intertrial intervals (ITIs). A custom hopper containing mixed grain could be accessed through an opening (5.1 cm × 5.7 cm) centered in the front panel 3.8 cm above the chamber floor.

Custom software written with Visual Basic 6.0 on a Dell Optiplex GX110 recorded and controlled all events in the operant chamber. A video card controlled graphics generated by the computer while a computer-controlled relay interface (Model no. PI0-12, Metrabyte, Taunton, MA, USA) maintained operation of the grain hopper and the lights to both the hopper and the chamber.

#### Stimuli

Visual stimuli were computer-created, color cartoon JPEG images that were 2.5 cm high × 3 cm wide at 28 pixels/cm (cf. Katz et al., 2008, **Figure 2**). All stimuli were of similar size and shape but were distinguishable from one another. Each sample stimulus and comparison stimulus appeared approximately 8 cm above the bottom of the monitor, directly above the grain hopper. The centers of the left and right comparison stimuli appeared 8.5 cm from the center of the sample.

#### Training

Pigeons were initially trained with a set size of three stimuli (apple, duck, and grape). Trials began with a white circle displayed on the monitor (in the same position as the sample) as a ready signal. Once a pigeon pecked the white circle, the sample stimulus appeared. Pigeons pecked the sample 20–25 times (randomly selected); this pecking requirement began with one peck but was systematically increased over approximately eight sessions to 20–25 pecks. After pigeons completed the response requirement, two comparison stimuli were presented: one comparison stimulus matched the sample and the other did not. Daily sessions were conducted 5–7 days a week, with each session comprising 96 trials. In the first half of the session (i.e., trials 1–48), a response to the matching comparison resulted in grain reinforcement. In the second half of the session (i.e., trials 49–96), a response to the non-matching comparison resulted in grain reinforcement. Access to mixed grain lasted between 2 and 3.5 s, depending on the pigeon's body weight prior to the session. An incorrect choice response resulted in an unlit 5-s timeout. All trials were followed by a 3-s ITI whether the response was correct or incorrect. With a set size of three stimuli, there were 12 possible combinations. Each combination appeared 8 times per 96-trial session. Stimuli were pseudorandomized to ensure that a combination would not directly repeat on the next trial. Correct response locations (left or right) were counterbalanced so that an equal number of correct left and right responses occurred in any given session.
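The session structure described above can be illustrated with a minimal Python sketch. It is not the authors' Visual Basic software. It assumes that the 12 combinations arise from 3 samples × 2 possible non-matching comparisons × 2 screen positions, and that each combination appears four times in each half of the session (one simple way to obtain 8 presentations per session with balanced correct sides). All function and field names are hypothetical.

```python
import itertools
import random

STIMULI = ["apple", "duck", "grape"]  # 3-item training set named in the text
SIDES = ["left", "right"]
REVERSAL_TRIAL = 49  # non-matching is reinforced from trial 49 onward


def trial_combinations(stimuli):
    """Every (sample, non-matching comparison, side of matching comparison) triple."""
    return [
        (sample, foil, match_side)
        for sample, foil in itertools.permutations(stimuli, 2)
        for match_side in SIDES
    ]


def shuffled_without_repeats(items, rng, previous=None):
    """Reshuffle until no combination directly repeats (simple rejection sampling)."""
    while True:
        order = items[:]
        rng.shuffle(order)
        sequence = ([previous] if previous is not None else []) + order
        if all(a != b for a, b in zip(sequence, sequence[1:])):
            return order


def build_session(stimuli=STIMULI, reps_per_half=4, seed=0):
    """Build one 96-trial session: MTS rule in the first half, NMTS in the second."""
    rng = random.Random(seed)
    combos = trial_combinations(stimuli)
    first = shuffled_without_repeats(combos * reps_per_half, rng)
    # Assumption: the no-repeat rule also applies across the reversal point.
    second = shuffled_without_repeats(combos * reps_per_half, rng, previous=first[-1])
    session = []
    for number, (sample, foil, match_side) in enumerate(first + second, start=1):
        matching_rewarded = number < REVERSAL_TRIAL
        other_side = "right" if match_side == "left" else "left"
        session.append({
            "trial": number,
            "sample": sample,
            "foil": foil,
            "match_side": match_side,
            "rule": "MTS" if matching_rewarded else "NMTS",
            "correct_side": match_side if matching_rewarded else other_side,
        })
    return session
```

Calling `build_session()` returns a list of 96 trial dictionaries. Because every combination occurs equally often with the matching comparison on the left and on the right within each half, the correct side is left on 48 trials and right on 48 trials under these assumptions.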

A correction procedure required subjects to repeat any incorrect trial until a correct response was made, but only the first response on each trial was counted toward accuracy. The correction procedure was used at the onset of training and for maintenance when a side bias developed toward left or right comparisons. Training continued until a pigeon exceeded a mean accuracy of 65% during the first and last eight trials of a session across 10 consecutive sessions without the correction procedure, or until it had experienced at least 100 sessions of training. Only sessions without the correction procedure were analyzed. This performance-based criterion was created to ensure that pigeons were reliably matching and non-matching above chance before advancing to larger training sets.
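The stopping rule above might be sketched as follows. This is one hedged reading of the criterion, not the authors' code: it assumes that accuracy on the first eight and the last eight trials is averaged separately over a sliding window of 10 consecutive sessions, that both means must exceed 65%, and that the 100-session cap counts the sessions passed in. The function and parameter names are hypothetical.

```python
def block_accuracy(correct, start, stop):
    """Percent correct over correct[start:stop] (first responses only)."""
    block = correct[start:stop]
    return 100.0 * sum(block) / len(block)


def sessions_to_criterion(sessions, window=10, threshold=65.0, block=8, cap=100):
    """Return the session count at which training would stop, or None if it continues.

    `sessions` is a list of per-session lists of booleans (True = correct first
    response), assumed to contain only sessions run without the correction procedure.
    """
    for n in range(1, len(sessions) + 1):
        if n >= window:
            recent = sessions[n - window:n]
            first = sum(block_accuracy(s, 0, block) for s in recent) / window
            last = sum(block_accuracy(s, len(s) - block, len(s)) for s in recent) / window
            if first > threshold and last > threshold:
                return n
        if n >= cap:
            return n  # training ends at the session cap regardless of accuracy
    return None
```

Because the window mean is taken over sessions, a run of strong sessions can satisfy this version of the criterion before all 10 sessions in the window individually exceed 65%; a stricter per-session reading would require a different check.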

#### Set-Size Expansion

Once the performance-based criterion was reached, new training stimuli equal in number to the previous training set were added to it. The number of images used in training increased from 3 to 6, and then to 12. Each training set retained the stimuli from the previous training set. For each session, sample and comparison stimuli were pseudorandomly assigned from the stimulus set. Each session consisted of 96 trials counterbalanced for left/right-correct. After pigeons reached criterial performance at the 12-item set size, transfer testing began on the next session.

#### Transfer Testing

Transfer testing comprised eight consecutive sessions without the correction procedure. Each testing session contained 96 trials (88 baseline and 8 transfer trials). Within each testing session, two transfer trials occurred within each of four eight-trial blocks (Trials 1–8, 9–16, 81–88, 89–96). The first two and last two blocks of a session were used to capture the highest level of matching and non-matching performance in the task. Left and right correct responses were counterbalanced within each eight-trial block. All transfer trials were novel, so each cartoon image was used only once during transfer testing. Hence, with each configuration composed of two different images, there were 16 novel cartoon images in each testing session for a total of 128 (16 stimuli × 8 sessions) novel cartoon images. Responses on transfer trials were reinforced identically to baseline trials. A correction procedure was not used at any point during transfer testing. Baseline trials were also counterbalanced for left and right responses using the same specifications as those during training.
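The trial and stimulus counts in the transfer design can be verified with a few lines of arithmetic. The sketch below only restates numbers given in the text; the variable names are illustrative.

```python
SESSIONS = 8
TRIALS_PER_SESSION = 96
TRANSFER_BLOCKS = [(1, 8), (9, 16), (81, 88), (89, 96)]  # inclusive trial ranges
TRANSFER_PER_BLOCK = 2
IMAGES_PER_TRANSFER_TRIAL = 2  # each configuration uses two different novel images

transfer_per_session = len(TRANSFER_BLOCKS) * TRANSFER_PER_BLOCK
baseline_per_session = TRIALS_PER_SESSION - transfer_per_session
novel_images_per_session = transfer_per_session * IMAGES_PER_TRANSFER_TRIAL
novel_images_total = novel_images_per_session * SESSIONS

# Transfer trials per pigeon in the first two blocks and in the last two blocks,
# pooled over all test sessions.
transfer_early = 2 * TRANSFER_PER_BLOCK * SESSIONS
transfer_late = 2 * TRANSFER_PER_BLOCK * SESSIONS
```

The pooled count of 32 transfer trials per session half is the sample size behind the per-pigeon comparisons of transfer against baseline reported in the Results.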

# Results

#### Acquisition

The first 10 sessions of acquisition for all set sizes are shown in the top panels of **Figure 1**. In **Figure 1**, the percentage of matching choices is plotted for the matching-concept (filled circles) and non-matching-concept (open circles) groups. High values indicate matching behavior, while low values indicate non-matching behavior. For all set sizes, both groups' initial choice behavior was mostly stable across a session. This result shows the pigeons were not strongly task-switching as a function of the mid-session reversal. Instead, early acquisition was often characterized by pigeons making many choice errors across the session with little or no savings from training with the previous set sizes. Separate two-way repeated-measures ANOVAs of Group (matching, non-matching) × 8-Trial Block (1–12) for each set size revealed main effects of Group, *F*s(1,2) *>* 27.2, *p*s *<* 0.05, η<sup>2</sup><sub>p</sub>s *>* 0.96. This suggests that prior conceptual training of each group had a significant effect on behavior at the start of each set-size acquisition, with the MCG tending to match and the NMCG tending to non-match. These group differences may have emerged because the NMCG had a shorter break in testing between the end of the prior experiment and the start of the present experiment than the MCG (*>*1 year).

Also, the group differences may have emerged because the MCG had responses to their experience-congruent concept reinforced in the first half of the session, while the NMCG had to make experience-incongruent responses in the first half of the session and then switch to experience-congruent responses in the second half. If we had tested non-matching in the first half and matching in the second half of the session, the acquisition results may have been different. Main effects of 8-Trial Block were found only for the 6-item and 12-item training sets, *F*s(11,22) *>* 12.8, *p*s *<* 0.01, η<sup>2</sup><sub>p</sub>s *>* 0.87, indicating that pigeons were starting to engage in task-switching even within the first 10 sessions of acquisition as they became more familiar with the task. An interaction between the two factors was found only at the 12-item set, *F*(11,22) = 3.1, *p <* 0.05, η<sup>2</sup><sub>p</sub> = 0.61, due to the MCG engaging in task-switching while the NMCG did not. This was confirmed by examining the slope of the percent matching function across a session, with only the MCG slope differing significantly from 0 (MCG: *M* = −3.2; NMCG: *M* = −1.76).
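A slope of the percent matching function of the kind reported above can be obtained by regressing percent matching on block number. The sketch below uses ordinary least squares in plain Python. The text does not state how the slopes were computed or their units, so treating them as percent matching per 8-trial block is an assumption, and the two example profiles are hypothetical rather than data from the study.

```python
def ols_slope(percent_matching):
    """Least-squares slope of percent matching regressed on block number (1, 2, ...)."""
    n = len(percent_matching)
    blocks = range(1, n + 1)
    mean_x = sum(blocks) / n
    mean_y = sum(percent_matching) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(blocks, percent_matching))
    sxx = sum((x - mean_x) ** 2 for x in blocks)
    return sxy / sxx


# Hypothetical session profiles over twelve 8-trial blocks (not data from the study).
switching_bird = [90 - 5 * b for b in range(1, 13)]  # matching declines across the session
stable_bird = [70] * 12                              # no within-session change
```

A bird that switches from matching to non-matching across the session yields a negative slope, whereas stable responding yields a slope near zero. Testing whether a group's slopes differ from 0 would require an additional inferential step that is not shown here.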

By the last 10 sessions of acquisition, all set sizes supported good mid-session reversal for each group. This is shown in the bottom three panels of **Figure 1**. Pigeons were able to learn to conditionally match or non-match across a session (3-item = 41 sessions, 6-item = 52 sessions, 12-item = 58 sessions). One pigeon from each group failed to reach criterial performance at the 12-item training set. Thus, unlike the top panels, the bottom panels show a functional transition between matching and non-matching behavior and indicate that pigeons made both the anticipatory and perseverative errors common in mid-session reversal (cf. Cook and Rosen, 2010). For all set sizes, pigeons began the session matching and then linearly reversed their responding to non-matching, as indicated by the negative linear functions for both groups at all set sizes. Separate two-way repeated-measures ANOVAs of Group (matching, non-matching) × 8-Trial Block (1–12) for each set size found no main effects of Group, but did reveal main effects of 8-Trial Block for all set sizes, *F*s(11,22) *>* 26.2, *p*s *<* 0.01, η<sup>2</sup><sub>p</sub>s *>* 0.96. Subsequent trend analyses showed that percent matching decreased linearly across the session, confirming that the pigeons engaged in task-switching behaviors across all set sizes, *F*s(1,5) *>* 210.2, *p*s *<* 0.01, η<sup>2</sup><sub>p</sub>s *>* 0.99. Thus, prior learning (i.e., MCG or NMCG) did not have a significant impact on terminal reversal behavior once the pigeons reached criterial performance.

#### Transfer

Over transfer testing, all pigeons maintained the linear decrease of matching performance within a session found during task switching. The pigeons readily transferred to novel stimuli. In regard to relational learning, the MCG applied a matching relational rule when presented with novel stimuli during the matching *and* non-matching halves of the task switching procedure. In contrast, the NMCG applied a non-matching relational rule when presented with novel stimuli during the non-matching *and* matching halves of the task switching procedure.

**Figure 2** shows mean percent matching across 8-Trial Blocks for baseline (filled circles) and transfer (unfilled circles) trials for the MCG on the left and the NMCG on the right. With trained stimuli, both groups (NMCG: *M* = −5.2, MCG: *M* = −4.8) continued to show a decrease in percent matching across Trial Blocks. The MCG transferred to novel trials in the first two Trial Blocks when required to match, but not in the last two Trial Blocks when required to non-match. The opposite pattern of behavior was shown for the NMCG. That is, the NMCG transferred to novel trials in the last two Trial Blocks when required to non-match, but not in the first two Trial Blocks when required to match. These results were confirmed by a three-way interaction, *F*(3,3) = 16.2, *p <* 0.05, η<sup>2</sup><sub>p</sub> = 0.94, from a three-way repeated-measures ANOVA of Group (matching, non-matching) × Trial-Type (baseline, transfer) × 8-Trial Block (first, second, eleventh, twelfth) on percent matching. To analyze each pigeon's transfer during matching, we compared the 32 transfer trials to the mean accuracy of the baseline trials from the first two and last two 8-Trial Blocks from the eight transfer sessions using one-sample *t*-tests. At the first two 8-Trial Blocks, for the two matching-concept pigeons, transfer was equal to baseline (L822, *p* = 1; S8288, *p* = 0.09), and for the two non-matching-concept pigeons, transfer was lower than baseline [D7, *t*(31) = 3.4, *p <* 0.01; L5, *t*(31) = 5.8, *p <* 0.01]. At the last two 8-Trial Blocks, for the two matching-concept pigeons, transfer was higher than baseline [L822, *t*(31) = 3.5, *p <* 0.01; S8288, *t*(31) = 5.4, *p <* 0.01], and for the two non-matching-concept pigeons, transfer was equivalent to baseline for one pigeon (D7, *p* = 1) and lower than baseline for the other [L5, *t*(31) = 5, *p <* 0.01].

# Discussion

The present experiment shows that pigeons can learn a mid-session reversal involving conditional discriminations with set sizes up to 12 stimuli. With trained stimuli, MTS and NMTS discrimination was bound to the temporal cues of each session, replicating the main finding from Cook and Rosen (2010) showing a modulation of task switching over a session. The pigeons made errors of anticipation before the reversal and errors of perseveration after the reversal at all set sizes. Unlike with normal expansion of set sizes, however, the pigeons showed little savings across successive acquisitions of these tasks with the added stimuli. Thus, based on acquisition data alone, the pigeons showed no evidence of relational rule use within each MTS and NMTS portion of a session. When tested for relational rule use with novel stimuli, pigeons reverted in all tests to the MTS or NMTS rules learned before the mid-session reversal training. Thus, the MCG applied the abstract matching rule to all novel stimuli, regardless of when the novel stimuli were presented in the session. The NMCG applied the abstract non-matching rule to all novel stimuli. These results suggest differences in the temporal dynamics of how trained and untrained stimuli are processed by the pigeons. One effect of training is that it binds familiar stimuli to the different and competing MTS and NMTS behaviors temporally required across a session. Novel stimuli, in contrast, are not bound to time in this way and, as a result, the pigeons use their previously learned rules to respond to them.

As a consequence, it appears the pigeons learned the present conditional discrimination mid-session reversal task by learning item-specific rules that were bound to the session's time course. For example, pigeons learned rules such as "if grape is the sample, then peck the grape in the first half of the session," and "if grape is the sample, then peck the apple or duck in the second half of the session." This learning can be contrasted with what would be expected had the pigeons learned separate abstract concepts bound to the first and second half of a session. Such relational rules would have been "if it is the first half of the session, peck the matching picture" and "if it is the second half of the session, peck the non-matching picture."

This focus on item-specific, rather than conceptual, learning may have stemmed from the need to deal with the competing behaviors required for the shared stimuli in each portion of the session. It would also explain why adding more exemplars to the task did not benefit learning as established previously (Bodily et al., 2008; Daniel et al., 2015). Every expansion of the set size presented a new challenge and apparently a need for new learning by the birds. This issue was partly why we tested concept learning with novel stimuli after finishing our expansion to a set size of 12. It was not clear to us that a 24-item set size would be manageable, at least not without extensive additional training. Thus, the pigeons reacted differently to set-size expansion than in the previous successful concept studies. Using the same three stimuli (duck, apple, grape), pigeons have acquired MTS and NMTS in fewer than 11 sessions (Bodily et al., 2008; Daniel et al., 2015), roughly a quarter of the sessions needed by the pigeons in the present experiment. In addition, when the set size was expanded to 6 and 12 stimuli in those studies, the number of sessions to acquisition decreased relative to the initial acquisition with three stimuli in both MTS and NMTS. In contrast, in mid-session reversal learning, the number of sessions to acquisition increased with expansion, further indicating item-specific learning.

The difficulty of this item-specific learning may also be responsible for the highly linear switching function seen in this experiment. Switching between the matching and non-matching discrimination within the same session was always quite gradual for the pigeons. Previous switching functions found in mid-session reversal with simpler discrimination contingencies are typically not linear, with extended periods of good performance at the beginning and end of sessions. Perhaps because the many individual items were spread out over the whole session, binding them to specific portions of the session was difficult, resulting in the observed linear switching function. If rule-based concepts had been learned, a more marked sigmoidal switching function would have been expected.

The absence of concept learning across the mid-session reversal task was further evidenced by the lack of novel stimulus transfer across a session. Instead of conditional transfer depending on temporal location, the groups performed quite differently during transfer. Both groups reverted to their prior matching and non-matching concept learning experience to discriminate these novel stimuli. The MCG applied the matching concept to all novel stimuli, regardless of whether they were presented before or after the mid-session reversal point. In an identical manner, the NMCG applied the non-matching concept to all novel stimuli. Thus, when untrained stimuli were introduced, they were unbound from the session's time course needed to support item-specific learning. Without such temporal cues, the pigeons then relied on their previous matching and non-matching concept learning experience to solve the trial.

This reversion to previously learned rules may best be explained within a framework similar to behavioral renewal (Bouton, 2002, 2004). Behavioral renewal occurs when an animal learns a behavior in one context, is given a second context where that original behavior is extinguished, and is then returned to the first context. If, after extinction, the animal behaves in accord with the first context, the previous behavior is "renewed" (Bouton, 2002). In our experiment, both groups were trained in an abstract-concept learning task, serving as the first context. Then, these pigeons were trained in a mid-session reversal task, serving as the second context. When novel stimuli appeared in transfer trials, they created an ambiguous context because they had no reversal history to bind them to time within a session, and thus pigeons relied on the previous learning of their first context (i.e., abstract-concept learning). Because no item-specific rules had yet been formed for these new stimuli, pigeons responded to these trials like transfer trials from their first context and applied the abstract relational matching or non-matching concept.

In the future, it will be of interest to see how naïve pigeons that have not learned a prior abstract matching or non-matching concept, and that are trained with large set sizes, discriminate novel stimuli across the different portions of a mid-session reversal. Such naïve pigeons would not have a prior behavior to "renew" when given an ambiguous context (i.e., a transfer trial). They may fail to transfer completely. This would suggest that competing relational rules (matching and non-matching) may be very hard to learn within the same session. Alternatively, they may transfer based on the configuration's placement in the session. If so, it would suggest that abstract concepts can be differentially bound to the context of the within-session time course.

In summary, these results add to the literature demonstrating the behavioral and cognitive flexibility of pigeons (Cook, 2001; Shettleworth, 2009; Cook and Rosen, 2010; Qadri and Cook, 2015). Our results indicate that the dynamics in pigeons for a mid-session reversal task are bound to the time course within a daily session for trained stimuli. When stimuli have no prior association with the temporal dynamics within the mid-session reversal task, pigeons rely on an abstract concept that is unbound from time and perhaps had temporal priority due to renewal-like processes. These results warrant further investigation: what impact does previous experience have on transfer? Would this renewal effect remain using a different procedure or multiple reversal points (McMillan et al., 2015)? More information is needed to understand these mechanisms, and pigeons serve as an excellent model to understand the comparative processes that underlie the temporal dynamics of discrimination learning and concept formation.

# Ethical Standards

This experiment complied with current United States law and followed the relevant ethical guidelines for animal research (IACUC approved and conducted in AAALAC approved facilities).

## Acknowledgments

We wish to thank Lauren Goff, Adam M. Goodman, John F. Magnotti, and Andrea M. Thompkins for their careful assistance in conducting this experiment.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Daniel, Cook and Katz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Do long delay conditioned stimuli develop inhibitory properties?

*Martha Escobar<sup>1,2</sup>\*, W. T. Suits<sup>3</sup>, Elizabeth J. Rahn<sup>4</sup> and Francisco Arcediano<sup>1,2</sup>*

*<sup>1</sup> Department of Psychology, Auburn University, Auburn, AL, USA, <sup>2</sup> Department of Psychology, Oakland University, Rochester, MI, USA, <sup>3</sup> Department of Psychology, Seminole State College, Sanford, FL, USA, <sup>4</sup> Department of Neurobiology, Evelyn F. McKnight Brain Institute, University of Alabama at Birmingham, Birmingham, AL, USA*

In long-delay conditioning, a long conditioned stimulus (CS) is paired in its final segments with an unconditioned stimulus. With sufficient training, this procedure usually results in conditioned responding being delayed until the final segment of the CS, a pattern of responding known as inhibition of delay. However, there have been no systematic investigations of the associative structure of long delay conditioning, and whether the initial segment of a long delay CS actually becomes inhibitory is debatable. In an appetitive preparation with rat subjects, the initial segment of long delay CS A passed a retardation (Experiment 1a) but not a summation (Experiment 1b) test for conditioned inhibition. Furthermore, retardation was observed only if long delay conditioning and retardation training occurred in the same context (Experiment 2). Thus, the initial segment of a long delay CS appears to share more characteristics with a latent inhibitor than a conditioned inhibitor. Componential theories of conditioning appear best suited to account for these results.

#### *Edited by:*

*John Magnotti, Baylor College of Medicine, USA*

#### *Reviewed by:*

*Bruce L. Brown, City University of New York, USA Edgar Harry Vogel, Universidad de Talca, Chile*

> *\*Correspondence: Martha Escobar marthaescobar@oakland.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 12 August 2015 Accepted: 05 October 2015 Published: 23 October 2015*

#### *Citation:*

*Escobar M, Suits WT, Rahn EJ and Arcediano F (2015) Do long delay conditioned stimuli develop inhibitory properties? Front. Psychol. 6:1606. doi: 10.3389/fpsyg.2015.01606*

Keywords: inhibition, inhibition of delay, long-delay conditioning, timing, conditioned inhibition, latent inhibition

# INTRODUCTION

In general terms, a conditioned stimulus (CS) can have one of three relationships to the unconditioned stimulus (US): The CS may provide no information about the US occurrence (neutrality), it may provide information about US delivery (excitation), or it may provide information about US omission (inhibition). Labeling a stimulus as a 'CS' suggests that the stimulus as a whole becomes conditioned (i.e., produces a CR). However, there seem to be instances in which the stimulus as a whole does not elicit a CR. For example, long stimuli trained to signal US delivery in their final segment (so-called long delay conditioning training) usually elicit little conditioned responding during their initial segment. This attenuated responding is observed even if they elicit robust conditioned responding during their final segment. That is, unlike other forms of conditioning, in long delay conditioning the latency of the CR increases rather than decreases as training progresses (but see, e.g., Schneiderman and Gormezano, 1964; Gormezano, 1972). Pavlov (1927) coined the term *inhibition of delay* to describe this pattern of behavior and to reflect his assumption that inhibitory processes operated during long delay conditioning and resulted in delaying of the conditioned response until the moment in which the US occurred. This inhibitory process was disrupted if a novel stimulus was presented simultaneously with the long delay CS, which resulted in an immediate elicitation of the conditioned response (i.e., disinhibition). Supporting Pavlov's assumptions, Rescorla (1967) provided evidence of inhibition to the initial segment of a long delay CS using a summation test in a fear conditioning preparation. In his preparation, the initial segment of the CS attenuated fear of the highly excitatory experimental context. Unfortunately, both Pavlov's (1927) and Rescorla's (1967) results can be explained without invoking the construct of inhibition. 
For example, Pavlov's observation of disinhibition could simply reflect generalization decrement stemming from presentation of a novel stimulus together with the long delay CS. Similarly, Rescorla's observation of summation may be due to non-specific disruption of responding to the context (i.e., external inhibition; cf. Pavlov, 1927), or generalization decrement of responding to the context due to the addition of the long-delay CS.

A potential way to circumvent the problems identified with Pavlov's (1927) and Rescorla's (1967) identification of the initial segment of a long delay CS as inhibitory is to implement the so-called *two-test strategy* (cf. Rescorla, 1969), which is based on using both summation and retardation tests to determine whether a stimulus is inhibitory. In a summation test, a target (putative inhibitor) is presented in compound with a known excitor, and evidence of inhibition is defined as attenuation of responding to the excitor as compared to a condition in which the target is absent. But Rescorla (1969) suggested that this test alone was not sufficient because processes other than inhibition could explain the observed response attenuation. For example, it is possible that inhibition training results in enhanced attention to the target and, at test, the presence of the inhibitor shifts attention away from the excitor, thereby attenuating responding to it. Rescorla (1969) proposed the concurrent use of a *retardation test*, in which the target (putative inhibitor) is paired with the US, and evidence of inhibition is defined as retarded acquisition of excitatory conditioned responding as compared to a condition in which no inhibitory training took place. Rescorla (1969) suggested that this test alone was insufficient to demonstrate inhibition because other mechanisms can also produce this pattern of responding. For example, if inhibition training resulted in attenuated attention to the target, its learning rate would decrease resulting in retardation. Because the alternative explanations for summation and retardation tests are mutually exclusive, Rescorla (1969) proposed that both tests are needed to determine that a stimulus has inhibitory behavioral control. 
Thus, if the initial segment of a long delay CS is inhibitory, it should attenuate responding to an excitor presented in compound with it *and* exhibit retardation of acquisition of excitatory behavioral control if paired with US delivery. That is, to be considered inhibitory, the initial segment of a long delay CS should pass both summation and retardation tests for conditioned inhibition.

An alternative to the inhibition view is that the initial segment of the long delay stimulus comes to signal something other than omission of an expected US. For example, the initial segment of the stimulus may simply remain neutral and acquire neither excitatory nor inhibitory behavioral control. If this were the case, the initial segment of the stimulus should readily develop excitatory behavioral control if reinforced, and should not attenuate responding to an excitor presented in compound with it. A third alternative is that the initial segment of the CS may exhibit attenuated response potential as a consequence of either a decrease in associability of the initial segment of the CS or the development of strong associations to the context, both of which have been proposed as mechanisms underlying latent inhibition (see Escobar et al., 2002; Escobar and Miller, 2010, for a review). In this case, the initial segment of the CS should exhibit retardation of acquisition of excitatory behavioral control if paired with US delivery, and should not attenuate responding to a conditioned excitor presented in compound with it (Rescorla, 1971). That is, if the initial segment of a long delay CS were latently inhibited, it should pass a retardation but not a summation test for conditioned inhibition. The present studies were designed to contrast these three alternatives.
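The three alternatives can be summarized as a mapping from the outcomes of the two tests onto an interpretation. The following is a minimal sketch of that logic only; the function name and the returned labels are hypothetical, and the "summation only" case reflects the attentional account that Rescorla (1969) raised as a reason the summation test alone is insufficient.

```python
def interpret_two_tests(passes_summation: bool, passes_retardation: bool) -> str:
    """Map two-test outcomes onto the interpretations discussed in the text.

    The labels are illustrative shorthand, not terminology from the source.
    """
    if passes_summation and passes_retardation:
        # Both tests passed: the stimulus qualifies as a conditioned inhibitor.
        return "conditioned inhibition"
    if passes_retardation:
        # Retardation without summation: the pattern expected of latent inhibition.
        return "latent inhibition"
    if passes_summation:
        # Summation without retardation: enhanced attention to the target
        # could explain the attenuation, so inhibition is not established.
        return "inconclusive (attentional account not excluded)"
    # Neither test passed: the segment behaves like a neutral stimulus.
    return "neutral"


if __name__ == "__main__":
    for summation in (True, False):
        for retardation in (True, False):
            print(summation, retardation, interpret_two_tests(summation, retardation))
```

Under this scheme, a retardation effect alone rules out the neutral account but cannot separate conditioned from latent inhibition; the summation outcome is what separates them.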

Rats received long delay conditioning with a 60-s CS (CS A) that signaled delivery of sucrose pellets 55 s after CS onset. Conditioned responding was assessed in terms of number of head entries (nose pokes) into a niche containing the feeding cup. This procedure resulted in animals' producing few nose poking responses during the initial segment of CS A, and a gradually increasing number of responses as the CS presentation progressed, with a response peak at about the time of US expectation (i.e., inhibition of delay). In Experiment 1a, the initial segment of the CS was then paired with the US (delivered 10 s after CS onset) to assess retardation of acquisition of the nose poking conditioned response. Experiment 1b assessed whether the initial segment of the long delay CS could attenuate responding to an independently trained, discrete excitor (i.e., a summation test). Experiment 2 assessed the possibility that the initial segment of the long delay CS had developed into a latent inhibitor rather than a conditioned inhibitor.

Notably, we selected an appetitively- rather than an aversively motivated preparation for the present studies. Conditioned inhibition is often studied in the framework of fear conditioning, although there are a few demonstrations of inhibition in appetitive preparations (e.g., Nelson, 2002; Williams et al., 2008a). An appetitive preparation was chosen because appetitive conditioning better allows for online measuring of the development of temporally specific behavior; thus, we could readily measure the development of inhibition and excitation across the duration of the CS.

# EXPERIMENTS 1A AND 1B

Experiments 1a and b investigated whether the initial segment of a long delay CS becomes inhibitory by using retardation and summation tests. Experiment 1a assessed retardation in the development of excitatory response potential to the initial segment of the long delay CS. Experiment 1b tested whether the initial segment of the long delay CS attenuated responding to a transfer excitor trained to predict US delivery during its initial segment. If retarded acquisition of a response to the initial segment of the long delay CS was observed in Experiment 1a, the possibility that it is neutral can be rejected. Furthermore, if attenuation of responding to the initial segment of the transfer excitor was also observed in Experiment 1b, the view that long delay conditioning results in the development of inhibition to this stimulus segment would be supported. However, if retardation was observed in Experiment 1a but summation was not observed in Experiment 1b, the assumption that the initial segment of the long delay CS becomes inhibitory would not be supported and an alternative mechanism would be suggested. As a possibility, the response potential of the initial segment of the CS may be attenuated due to a process akin to latent inhibition, which should result in a retarded acquisition of responding to the initial segment of the stimulus and no attenuation of responding to the transfer excitor (Rescorla, 1971).

In both Experiments 1a and b, CS A was trained as a long delay predictor of the US. Thus, the US (sucrose pellets) was delivered 55 s after onset of the 60-s CS. The US was delivered at 55 s to ensure that subjects used CS duration rather than CS termination cues to determine time of delivery of the US. A second CS, C, was trained as a short delay predictor of the US with US delivery 10 s after onset of the 60-s CS (see **Figure 1**). Training with the short delay CS had two purposes: first, it allowed for a measure of response discrimination (responses were expected to peak during the initial segment of CS C and the final segment of CS A); second, the initial segment of CS C could be used as a transfer excitor for summation testing (for consistency, the C-US trials were included in all studies of the series). Because the development of inhibition appears to be highly dependent on the number of non-reinforced training presentations of the putative inhibitor (e.g., Yin et al., 1994; Stout et al., 2004), nine times more CS A than CS C trials were delivered, for a total of 450 and 50 trials, respectively. After training was completed, conditioned inhibition to the initial segment of CS A was assessed.

In Experiment 1a, after completion of the long delay conditioning training phase, subjects began retardation training (see **Figure 1**). During these trials, CS A, which had been trained to predict US delivery 55 s after CS onset, predicted US delivery 10 s after onset (LongDelay group). In the Control condition, subjects received equivalent training with a novel CS, B. Thus, development of conditioned responding to the initial segment of the putative inhibitor, CS A, was compared to development of conditioned responding to the equivalent segment of a neutral stimulus. Experiment 1b assessed inhibition using a summation test. In this test, following completion of the long delay conditioning training phase, subjects in the Long Delay condition received presentations of the compound of CSs A and C. CS C predicted US delivery 10 s after onset; thus, if the initial segment of CS A was inhibitory, it should attenuate responding to the initial segment of CS C. In the Control condition, subjects were presented with the compound of CS C and novel CS B. Because B had received no previous training, it was not expected to affect responding to CS C beyond external inhibition; thus, if inhibition had developed, responding should be attenuated in the initial segment of the AC compound more than in the initial segment of the BC compound.

# Materials and Methods

#### Subjects

The subjects were 48 male (227–264 g in Experiment 1a, 219–257 g in Experiment 1b) albino rats (Holtzman stock, Harlan Labs). The 24 subjects in each experiment were randomly assigned to one of two groups, LongDelay or Control (*n*s = 12). Subjects were housed in pairs in standard plastic cages with wire lids in a vivarium maintained on a 12-hr light/12-hr dark cycle. All experimental manipulations occurred during the light portion of the cycle. Water was available *ad lib* to all subjects. A food deprivation schedule was imposed during the week preceding the initiation of the experiment such that feedings were gradually reduced to maintain animals at approximately 85% of their free-feeding weight. Food (regular rat chow) was provided approximately 1 h after completion of each experimental session. From the time of arrival to the laboratory until initiation of the study, animals were handled for 30 s every other day. All cagemates were assigned to different groups. The research was conducted in accordance with the "Principles of laboratory animal care" (NIH publication No. 86-23, revised 1985) and all procedures were approved by the Auburn University Institutional Animal Care and Use Committee.

#### Apparatus

The apparatus consisted of eight Med Associates standard rat chambers (30.5 cm long × 24.1 cm wide × 21.0 cm high). The sidewalls of each chamber were made of aluminum sheet metal, and the front wall, back wall, and ceiling of the chamber were made of transparent polycarbonate. The floor was constructed of 4.8-mm stainless steel rods, spaced 1.6 cm center-to-center.

FIGURE 1 | Design of Experiments 1a and b. The white, light gray, and dark gray rectangles represent CSs A, B, and C, respectively. CSs A and B were a 60-s white noise or a 60-s flashing light, counterbalanced within groups. CS C was a 60-s 2,900-Hz tone. Sucrose pellet USs (represented by black rectangles under the time line) were delivered at either 55 or 10 s after CS onset (represented at the appropriate location under the time). Onset of CSs A and C (Group LongDelay) and B and C (Group Control) during summation testing (Experiment 1b) were simultaneous. Slashes represent intermixed presentation of CSs during training. See text for details.

Each chamber was housed in a melamine sound attenuation cubicle equipped with an exhaust fan that provided a constant, 70 dB background noise (this and all subsequent sound pressure level measures were performed using the A scale). All chambers were equipped with a pellet dispenser that could deliver 45 mg sucrose pellets into a cup located inside a niche (5.1 cm long × 5.1 cm wide × 5.1 cm high). The niche was placed on a side wall, 1.5 cm above the grid floor, and was equipped with an infrared photo beam, which when disrupted could be used to detect the number of head entries into the niche; this was used as our dependent variable. All chambers were also equipped with a speaker mounted above the pellet dispenser and two speakers mounted on the opposite wall. These speakers could produce a 2,900-Hz tone or a 4,500-Hz tone, and a white noise or an 800-Hz tone, respectively. All auditory stimuli were delivered at an intensity of 10 dB above background. A 1.12-Watt (#1820) flashing houselight (0.50 s on/0.50 s off) was used as a visual stimulus. All stimulus events were programmed and all data were recorded using MedPC software.

#### Procedure

**Figure 1** presents the critical aspects of Experiments 1a and b. CSs A and B were the white noise and the flashing houselight, counterbalanced within groups. CS C was the 2,900-Hz tone. When delivered, CSs A, B, and C were 60 s in duration. The US consisted of the delivery of two 45-mg sucrose pellets 55 s after CS A onset and 10 s after CS C onset. All sessions were 120 min in duration, except for the test session, which was 60 min in duration. Throughout training and testing, the chamber was dark, with the exception of presentations of the houselight as a stimulus. The procedure was identical for both studies, with the exception of the treatment provided on Day 27 (retardation training in Experiment 1a, summation testing in Experiment 1b), which is detailed below. Note that counterbalancing the noise and houselight as CSs A and B and using the tone as CS C resulted in some animals receiving summation testing with an auditory-auditory compound and other animals being tested with an auditory-visual compound. Pilot research in our laboratory suggested that the noise and houselight acquire equivalent control of nose poke behavior, and no differences in responding based on stimulus nature were observed in any of the present studies.

#### *Acclimation*

On Day 1, all subjects were acclimated to the experimental context and to retrieving pellets from the niche. Sucrose pellets were delivered on a fixed-time 5-min schedule. During this session subjects were exposed to two presentations of all stimuli in the order houselight, tone, noise, houselight, noise, tone, with a mean intertrial interval of 15 min in order to reduce unconditioned fear to these stimuli.

#### *Long delay conditioning training*

On Days 2–26, all subjects received 18 daily CS A-US pairings (US delivery occurred 55 s after onset of CS A) as well as 2 daily CS C-US pairings (US delivery occurred 10 s after onset of CS C). Thus, A was trained as a long delay CS and C was trained as a short delay CS. Although the C-US pairings were irrelevant to the retardation studies, they were included in all studies of the series for consistency of treatment with the summation study. Two schedules of training were used on alternate days. Trials 3 and 14 (Schedule 1) or 7 and 18 (Schedule 2) were designated as C trials. In both schedules, the mean intertrial interval was 6 (±3) min. Probe trials (presentation of the CS without US delivery) were included to test for acquisition of the response without contamination from US presentation. The 90th, 180th, 270th, 360th, and 450th presentations of A and the 10th, 20th, 30th, 40th, and 50th presentations of C were designated as probe trials.
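The trial counts and probe positions described above can be checked with a short sketch that lays out the 25 training days. The text does not state which schedule was used on which day, so assigning Schedule 1 to odd-numbered training days is an assumption made here for illustration, and the function and constant names are hypothetical.

```python
N_TRAINING_DAYS = 25        # Days 2-26 of the experiment
TRIALS_PER_DAY = 20         # 18 CS A trials + 2 CS C trials
C_POSITIONS = {1: (3, 14), 2: (7, 18)}  # 1-based positions of C trials per schedule
A_PROBE_EVERY = 90          # 90th, 180th, ..., 450th presentation of A
C_PROBE_EVERY = 10          # 10th, 20th, ..., 50th presentation of C


def build_training_plan():
    """Return a list of (training_day, position, cs, presentation_number, is_probe)."""
    plan = []
    a_count = 0
    c_count = 0
    for day in range(1, N_TRAINING_DAYS + 1):
        # ASSUMPTION: Schedule 1 on odd training days, Schedule 2 on even ones.
        schedule = 1 if day % 2 == 1 else 2
        for position in range(1, TRIALS_PER_DAY + 1):
            if position in C_POSITIONS[schedule]:
                c_count += 1
                plan.append((day, position, "C", c_count, c_count % C_PROBE_EVERY == 0))
            else:
                a_count += 1
                plan.append((day, position, "A", a_count, a_count % A_PROBE_EVERY == 0))
    return plan


if __name__ == "__main__":
    plan = build_training_plan()
    print(sum(1 for t in plan if t[2] == "A"), "A trials")
    print(sum(1 for t in plan if t[2] == "C"), "C trials")
    print([t[3] for t in plan if t[2] == "A" and t[4]], "A probe presentations")
```

With 18 A trials and 2 C trials per day, each probe falls on every fifth training day for both stimuli, which yields the 450 and 50 total presentations reported in the text.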

#### *Experiment 1a: retardation training*

On Day 27, subjects received either 20 CS A (Group LongDelay) or 20 CS B (Group Control) presentations, with a mean intertrial interval of 6 (±3) min. During 19 of the A-US and B-US trials, US delivery occurred 10 s after onset of the 60-s stimuli; thus, conditioned responding was expected to develop to the initial segment of the CS. The 20th trial was designated as a probe trial (i.e., it was not reinforced).

#### *Experiment 1b: summation test*

On Day 27, subjects received two presentations of either AC (Group LongDelay) or BC (Group Control), with an intertrial interval of 15 min. All stimuli were 60 s in duration and no stimulus presentation was followed by delivery of the US.

#### Data Analysis

Number of head entries during all training and test sessions was recorded in 5-s bins. For purposes of analysis, the first three and last three bins (i.e., the first and last 15 s) of the 60-s CS were used as a measure of responding during the initial and final segments of the CS. The occurrence of long delay conditioning was assessed by analyzing the last probe trial, retardation (Experiment 1a) was assessed by analyzing the retardation training probe trial, and summation was assessed by analyzing the average responding to the two test trials. Outlier scores for each statistical analysis were excluded using Grubbs' test (e.g., Grubbs and Beck, 1972), with the constraint that a maximum of one data point would be excluded from any analyzed segment (i.e., the test was performed with no iterations). In Experiment 1a, application of the outlier criterion resulted in the data from one subject in Group LongDelay being excluded from the long delay conditioning analysis, and the data from one subject in Group LongDelay being excluded from the retardation test analyses. In Experiment 1b, the data from two subjects in Group LongDelay were excluded from the long delay conditioning analysis, and the data from one subject in Group Control were excluded from the summation test analyses.
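The scoring and outlier steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: summing head entries over the three bins of a segment is assumed (the text does not say whether bins were summed or averaged), the critical value for Grubbs' test is left as a caller-supplied parameter because the significance level used is not reported, and all names are hypothetical.

```python
from statistics import mean, stdev

BINS_PER_CS = 12        # a 60-s CS recorded in 5-s bins
BINS_PER_SEGMENT = 3    # first and last 15 s of the CS


def segment_scores(bins):
    """Return (initial, final) segment scores for one 60-s CS presentation.

    ASSUMPTION: a segment score is the sum of head entries over its three bins.
    """
    if len(bins) != BINS_PER_CS:
        raise ValueError("expected twelve 5-s bins for a 60-s CS")
    return sum(bins[:BINS_PER_SEGMENT]), sum(bins[-BINS_PER_SEGMENT:])


def grubbs_single_pass(scores, critical_value):
    """Return the index of the one score flagged as an outlier, or None.

    Computes G = max|x - mean| / s (s = sample standard deviation) once,
    with no iteration, and flags the most extreme score only if G exceeds
    the caller-supplied critical value.
    """
    centre = mean(scores)
    spread = stdev(scores)
    if spread == 0:
        return None
    deviations = [abs(x - centre) for x in scores]
    index = max(range(len(scores)), key=deviations.__getitem__)
    g_statistic = deviations[index] / spread
    return index if g_statistic > critical_value else None


if __name__ == "__main__":
    example_bins = [0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up counts
    print(segment_scores(example_bins))
```

Running the test once per analyzed segment, and removing at most the single flagged score, corresponds to the "no iterations" constraint described above.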

# Results and Discussion

#### Experiment 1a

Long delay conditioning training occurred as expected, with subjects in both groups exhibiting more responding to the final than the initial segment of CS A and more responding to the initial than the final segment of CS C (see **Figures 2** and **3**). The data of greatest interest were obtained during the retardation training phase. Acquisition of conditioned responding to the initial segment of long delay CS A was slower than acquisition of conditioned responding to the equivalent segment of novel CS B. That is, the initial segment of the long delay CS exhibited retarded acquisition of excitatory behavioral control. The following analyses support these conclusions.

#### *Long delay conditioning training*

Responding to long delay CS A and short delay CS C acquired temporal-specific properties as training progressed. **Figure 2** presents the bin-by-bin data collected during the first four probe trials with both stimuli (equivalent data for the last probe trial of the long delay conditioning phase is presented in **Figures 3A,B** described below). The figure evidences that responding initially occurred throughout the duration of both CSs A and C, and as training progressed became constrained to the segment of the stimulus more contiguous with reinforcement. Acquisition occurred in a similar fashion in all subsequent experiments. To ensure that responding to the long and short delay CSs had reached equivalent levels at the end of training across groups, a 2 (group, between groups factor) × 2 (stimulus: A vs. C, within-subjects factor) × 2 (segment: initial vs. final, within-subjects factor) analysis of variance (ANOVA) was conducted on the last probe trial administered during the long delay conditioning training phase. This analysis revealed a main effect of segment, *F*(1,21) = 4.63, *MSE* = 13.07, *p <* 0.05, as well as an interaction of Stimulus × Segment, *F*(1,21) = 87.42, *MSE* = 21.98, *p <* 0.0001 (see **Figures 3A–C**). No other main effect or interaction was significant, all *p*s *>* 0.28. *Post hoc* comparisons using the Bonferroni correction revealed that subjects in both groups exhibited more responding to the final segment of CS A than the equivalent segment of CS C, and more responding to the initial segment of CS C than the equivalent segment of CS A, all *p*s *<* 0.0005 (see **Figures 3A–C**).

#### *Retardation test*

A 2 (group) × 2 (segment) ANOVA was performed on the number of head entries recorded during the test stimulus probe presentation. This analysis yielded a main effect of segment and a Group × Segment interaction, *F*s(1,21) = 5.71 and 22.24, *MSE* = 10.10, *p*s *<* 0.05 and 0.0005, respectively. The main effect of group was not significant, *F <* 1. A series of pair-wise comparisons derived from the 2 × 2 ANOVA were conducted to analyze responding in the initial and final segments of the test stimulus. Groups LongDelay and Control differed in level of responding to the initial segment of the test stimulus, *F*(1,21) = 7.49, *MSE* = 16.63, *p <* 0.05, reflecting retarded acquisition of conditioned responding in Group LongDelay. Groups LongDelay and Control also differed in level of responding to the final segment of the test stimulus, *F*(1,21) = 24.27, *MSE* = 4.15, *p <* 0.0001; that is, Group LongDelay continued to exhibit more responding to the final segment of CS A than Group Control (**Figure 3D**). A further analysis included responding during Bins 1 and 2 of all CS A retardation trials, which represent conditioned responding prior to presentation of the US (Bin 3, which was included for the probe trial analyses, occurred after US presentation and thus represents both conditioned and unconditioned responding). A 2 (group) × 5 (block of 4 trials) ANOVA revealed main effects of group, *F*(1,21) = 8.44, *MSE* = 53.12, *p <* 0.01, and block, *F*(4,84) = 5.49, *MSE* = 2.97, *p <* 0.001. The interaction was borderline significant, *F*(4,84) = 2.43, *p* = 0.05. Responding in the two groups did not differ in the first block, *p >* 0.20, but differed in all subsequent blocks, *p*s *<* 0.05, suggesting that behavioral control by CS A was equivalent in the two groups at the initiation of retardation training, and retardation emerged as training progressed.
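The degrees of freedom reported for these analyses follow from the number of subjects retained after outlier exclusion. As a hedged worked example, the sketch below applies the standard degrees-of-freedom formulas for a mixed design with one between-subjects and one within-subjects factor, assuming 23 subjects were analyzed (24 minus the one LongDelay subject excluded from the retardation test analyses); the function name is hypothetical.

```python
def mixed_anova_dfs(n_subjects, n_groups, n_within_levels):
    """Degrees of freedom for a one-between, one-within mixed ANOVA.

    Returns (effect df, error df) pairs for the between-subjects factor,
    the within-subjects factor, and their interaction.
    """
    between_error = n_subjects - n_groups
    within_effect = n_within_levels - 1
    within_error = between_error * within_effect
    return {
        "group": (n_groups - 1, between_error),
        "within": (within_effect, within_error),
        "interaction": ((n_groups - 1) * within_effect, within_error),
    }


if __name__ == "__main__":
    # ASSUMPTION: 23 subjects entered the retardation analyses.
    print(mixed_anova_dfs(23, 2, 2))  # group x segment
    print(mixed_anova_dfs(23, 2, 5))  # group x block of 4 trials
```

With 23 subjects and two groups, the sketch gives 21 error degrees of freedom for the group and segment effects and (4, 84) for the block effect and the Group × Block interaction, in line with the values reported above.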

#### Experiment 1b

Long delay conditioning resulted in subjects in both groups exhibiting more responding to the initial than the final segment of CS C and more responding to the final than the initial segment of CS A (see **Figure 4**). However, low responding to the initial segment of CS A was not indicative of inhibition, as assessed with a summation test. CS A failed to attenuate responding to the initial segment of CS C beyond any attenuation produced by a stimulus that did not receive long delay conditioning training, CS B. These conclusions are supported by the following statistical analyses.

#### *Long delay conditioning training*

A 2 (group, between groups factor) × 2 (stimulus: A vs. C, within-subjects factor) × 2 (segment: initial vs. final, within-subjects factor) ANOVA conducted on the last probe trial administered during long delay conditioning training revealed an interaction of Stimulus × Segment, *F*(1,20) = 58.25, *MSE* = 33.77, *p <* 0.0001. No other main effect or interaction was significant, all *p*s *>* 0.13. *Post hoc* comparisons using the Bonferroni correction revealed that subjects in both groups exhibited more responding to the final segment of CS A than the equivalent segment of CS C, and more responding to the initial segment of CS C than the equivalent segment of CS A, all *p*s *<* 0.05 (see **Figures 4A–C**).

#### *Summation test*

A 2 (group) × 2 (segment) ANOVA was conducted on the mean of the data collected during the two test compound presentations to assess the occurrence of conditioned inhibition. This analysis yielded a main effect of group, *F*(1,21) = 19.11, *MSE* = 19.09, *p <* 0.0005, and a Group × Segment interaction, *F*(1,21) = 15.74, *MSE* = 14.80, *p <* 0.001. Planned comparisons derived from the 2 × 2 ANOVA revealed no differences between groups during the initial segment of the test compound presentation, *F <* 1. Indeed, there was an ordinal difference between groups in the direction *opposite* to inhibition; that is, the LongDelay group responded (non-significantly) more than the Control group to the test compound (see the bottom-right panel of **Figure 4**). This difference between groups most likely reflects generalization decrement affecting the Control group more than the LongDelay group. Indeed, in the Control group, responding based on Stimulus C dropped from a mean (±SEM) of 12.36 (±2.14) responses to a mean of 7.91 (±1.08) responses. One could argue that this large generalization decrement in the Control group made it difficult to detect inhibition in the LongDelay group, and a more appropriate control condition would have included a comparison between responding to the AC compound and responding to Stimulus C alone. However, decrements in responding during a summation test may come from three sources: external inhibition produced by the added stimulus, generalization decrement due to the change from the training stimulus configuration to the test stimulus configuration, and conditioned inhibition *per se*. Adding a novel stimulus to the transfer excitor (Stimulus C) controls for decrements due to external inhibition and generalization decrement; thus, to be considered inhibitory, the putative inhibitor should attenuate responding to C beyond the attenuation produced by these factors. Furthermore, a comparison of responding to the initial segment of Stimulus C during the last probe trial and responding to the initial segment of the test compounds (AC and BC) revealed a main effect of stimulus (C vs. test compound), *F*(1,22) = 14.80, *MSE* = 14.22, *p <* 0.001, but neither a main effect of Group nor a Group × Stimulus interaction, *F*s *<* 1. Thus, generalization decrement occurred in both groups and the failure to detect any response attenuation beyond generalization decrement suggests a failure to obtain evidence of inhibition with the summation test.

A possible concern with the present study was that half of the subjects were tested on the compound of two auditory stimuli and, the remaining half, on the compound of one auditory and one visual stimulus. The ANOVA was repeated including the nature of stimulus A (visual vs. auditory) as a factor. This analysis replicated the results of the previous analysis, yielding a main effect of group and a Group × Segment interaction, *F*s(1,18) *>* 10.02, *p*s *<* 0.01, but neither a main effect of stimulus nor an interaction of stimulus with any of the other factors, *F*s *<* 1.35, *p*s *>* 0.26.

A comparison of responding to the final segment of the test compounds revealed more head entries during presentation of AC (Group LongDelay) than BC (Group Control), *F*(1,21) = 46.17, *MSE* = 12.79, *p <* 0.0001. This difference was expected because CS A predicted US delivery 55 s after CS onset. In contrast, an expectation of US delivery during the second half of the BC compound presentation was unlikely because neither B nor C predicted US delivery during this period. Thus, presentation of the AC compound seemingly resulted in responding based on the expected time of US delivery provided by both CS C (initial segment) and CS A.

#### Conclusion

Long delay conditioning training established CS A as a predictor of the US in its final segment. Consequently, subjects exhibited little responding to the initial segment of CS A, which constitutes the typical inhibition of delay response pattern (**Figures 2**, **3A,B** and **4A,B**). When inhibition was assessed, the initial segment of CS A was slower than the initial segment of a novel stimulus in acquiring control of conditioned responding; that is, the initial segment of CS A passed a retardation test for conditioned inhibition (Experiment 1a). However, the initial segment of CS A did not pass a summation test (Experiment 1b). Thus, Experiment 2 focused on alternative perspectives that use mechanisms other than conditioned inhibition to explain the response pattern that characterizes long delay conditioning.

# EXPERIMENT 2

Experiment 1a detected retardation of acquisition of a conditioned response to the initial segment of a CS that had undergone long delay conditioning training. That is, the stimulus passed a retardation test for conditioned inhibition. This same training, however, did not result in the initial segment of the long delay CS attenuating responding to the initial segment of an excitor trained to predict the US 10 s after CS onset (Experiment 1b). That is, the initial segment of the long delay CS failed to pass a summation test for conditioned inhibition. Although one could argue that passing both summation and retardation tests for conditioned inhibition is not a necessary requirement to consider a stimulus inhibitory (e.g., Williams et al., 1992; Papini and Bitterman, 1993; see the Discussion for elaboration), one should wonder whether the initial segment of the CS indeed became a 'true' conditioned inhibitor.

As an alternative to conditioned inhibition, consider the possibility that latent inhibition developed to the initial segment of the long delay CS. In a latent inhibition procedure (cf. Lubow and Moore, 1959), a stimulus that is repeatedly presented in the absence of a US is later retarded in acquiring or expressing an association with the US. Notably, latent inhibition has long been regarded as distinct from conditioned inhibition because latent inhibitors do not pass a summation test for conditioned inhibition (i.e., they fail to attenuate responding to a known excitor; Rescorla, 1967) and CS preexposure results not only in retardation of excitatory responding, but also in retardation of inhibitory responding (e.g., Baker and Mackintosh, 1977; Friedman et al., 1998). Experiments 1a and b provide some support for this hypothesis because, as is the case with latent inhibitors, the initial segment of the CS passed a retardation but not a summation test. Furthermore, one of the defining characteristics of latent inhibition is that it is highly context-specific. That is, when preexposure treatment (i.e., CS alone presentations) is given in Context 1 and conditioning treatment (i.e., CS-US pairings) is given in Context 2, latent inhibition is greatly attenuated (e.g., Channell and Hall, 1983; Hall and Minor, 1984; Rosas and Bouton, 1997). Indeed, the context-specificity of latent inhibition has served to differentiate among different theoretical approaches to the phenomenon (for a review, see Escobar and Miller, 2010). Applied to the present studies, if retardation of acquisition following our long delay conditioning training resulted from preexposure to the initial segment of the CS, then changing the context between long delay conditioning training and retardation training should greatly attenuate the retardation effect.

In Experiment 2, three groups of rats received long delay conditioning training of CS A in Context 1. Then, all groups received retardation training such that the initial segment of a 60-s CS came to signal US delivery 10 s after onset. In the Long Delay condition, these retardation training trials involved pairings of CS A and the US, whereas in the Control condition, they involved pairings of CS B and the US. Retardation training occurred either in the same or a different context from that of long delay conditioning. Group LongDelay.Same received CS A-US pairings in Context 1, Group LongDelay.Diff received CS A-US pairings in Context 2, and Group Control received CS B-US pairings in either Context 1 or Context 2 (CS B was a novel stimulus used to provide a baseline of acquisition; see **Figure 5**). Based on the results of Experiment 1a, retardation of acquisition was expected in Group LongDelay.Same relative to Group Control. Furthermore, if long delay conditioning results in latent inhibition of the initial segment of the long delay CS, no evidence of retardation would be expected in Group LongDelay.Diff relative to Group Control.

# Materials and Methods

#### Subjects and Apparatus

The subjects were 32 male (355–546 g) albino rats acquired, housed, and maintained as in the previous studies. All animals had previously served as subjects in a study using a different set of stimuli and an aversively motivated task. Eleven subjects were assigned to each of the LongDelay.Same and LongDelay.Diff groups, and the remaining 10 subjects were assigned to the Control group. The apparatus was the same as that described for the previous studies.

#### Procedure and Data Analysis

**Figure 5** presents the critical aspects of Experiment 2. The procedure was the same as that described for Experiment 1a, with the following exceptions. Two contexts (the grid and plexi enclosures) were used in this study. The *grid enclosure* was the chamber as described in Experiment 1a. The *plexi enclosure* used the same chambers as the grid enclosure, but a smooth, transparent, Plexiglas sheet was used to cover the grids and a pattern of alternating black and white stripes was used to cover the clear walls of the enclosure. For each animal, different physical chambers were used as the grid and plexi enclosures. Designation of the grid and plexi enclosures as Contexts 1 and 2 was counterbalanced within groups. The flashing light and noise served as CSs A and B, counterbalanced within groups. The 800-Hz tone served as CS C for all subjects.

On Day 1, all subjects were acclimated to the experimental context as described for Experiments 1a and b, except that they received a 60-min exposure to each of Contexts 1 and 2. On Days 2–26, all subjects received long delay conditioning training as described for Experiments 1a and b. Long delay conditioning training occurred in Context 1 for all subjects. On Day 27, subjects received either 20 A-US pairings (Groups LongDelay.Same and LongDelay.Diff) or 20 B-US pairings (Group Control) following the same procedure described for Experiment 1a. These pairings took place in Context 1 for Group LongDelay.Same and half the subjects in Group Control, and in Context 2 for Group LongDelay.Diff and the remaining half of the Group Control subjects.

All statistical analyses were performed following the guidelines outlined for Experiments 1a and b. The data from one subject in Group LongDelay.Same were excluded from the long delay conditioning analyses because its final segment CS A score met the outlier criterion. No other long delay conditioning or test score met the outlier criterion.

# Results and Discussion

Long delay conditioning resulted in all groups exhibiting more responding to the final than the initial segment of CS A and more responding to the initial than the final segment of CS C (see **Figures 6A–C**). Groups LongDelay.Same and Control replicated the results of Experiment 1a: retardation of acquisition of a conditioned response to the initial segment of CS A was observed. Importantly, this difference was not observed when Groups LongDelay.Diff and Control were compared. That is, changing the context between long delay conditioning training and retardation training attenuated the retardation effect. The following analyses support these conclusions.

FIGURE 5 | Design of Experiment 2. The white, light gray, and dark gray rectangles represent CSs A, B, and C, respectively. CSs A and B were a 60-s white noise or a 60-s flashing light, counterbalanced within groups. CS C was a 60-s 800-Hz tone. Sucrose pellet USs (represented by black rectangles under the time line) were delivered at either 55 or 10 s after CS onset (represented at the appropriate location under the time). Long delay conditioning training for all groups occurred in Context 1 (white background). Retardation training occurred in Context 1 for Group LongDelay.Same, Context 2 (shaded background) for Group LongDelay.Diff, and in either Context 1 or Context 2 for Group Control. Slashes represent intermixed presentation of CSs during training. See text for details.

#### Long Delay Conditioning Training

A 3 (group, between-groups factor) × 2 (stimulus: A vs. C, within-subjects factor) × 2 (stimulus segment: initial vs. final, within-subjects factor) ANOVA conducted on the last probe trial of long delay conditioning training revealed a Stimulus × Segment interaction, *F*(1,28) = 70.63, *MSE* = 15.20, *p* < 0.0001, and a marginally significant main effect of stimulus, *F*(1,28) = 4.13, *MSE* = 13.30, *p* = 0.052. Unexpectedly, there was also a significant interaction of Segment × Group, *F*(2,28) = 7.15, *MSE* = 7.88, *p* < 0.005. No other main effects or interactions were significant, all *p*s > 0.35. The Segment × Group interaction suggests differential acquisition across groups. To further investigate the source of this interaction and take into consideration the marginal main effect of stimulus, a series of 3 (group) × 2 (segment) ANOVAs was conducted on the data collected during the last probe trial of each individual stimulus. An analysis of CS A revealed a main effect of segment, *F*(1,28) = 49.73, *MSE* = 12.62, *p* < 0.0001, but neither a main effect of group nor an interaction, both *p*s > 0.21. A similar analysis of CS C revealed a main effect of segment, *F*(1,28) = 43.32, *MSE* = 10.46, *p* < 0.0001. Importantly, this analysis also revealed a Segment × Group interaction, *F*(1,28) = 4.88, *MSE* = 10.46, *p* < 0.05. The source of this interaction appears to be the relatively similar responding to the two segments of CS C in Group LongDelay.Same, which resulted in lower responding to its initial segment and higher responding to its final segment than in the other two groups (notably, the pattern of responding was as expected across trials; see **Figure 6**). Because CS C was not of relevance in the present study and responding was equivalent across groups when CS A was analyzed, we concluded that acquisition had been equivalent across groups (see **Figures 6A,C**).

#### Retardation Test

A preliminary analysis of the retardation training data revealed that the relative novelty of the context used in the Different condition adversely affected overall rates of responding. Specifically, subjects receiving retardation training in a context different from that used for long delay conditioning produced about 15% fewer responses throughout the retardation test session than subjects receiving retardation training in the same context (such differences in responding between contexts were not observed during long delay conditioning training). Consequently, retardation data were analyzed using overall rate of responding as a covariate in a 3 (group) × 2 (segment) analysis of covariance (ANCOVA), which yielded a Group × Segment interaction, *F*(2,28) = 13.89, *MSE* = 5.85, *p* < 0.0001, and marginal main effects of group, *F*(2,28) = 3.02, *MSE* = 17.83, *p* = 0.07, and segment, *F*(1,28) = 3.58, *MSE* = 5.85, *p* = 0.07. Planned comparisons revealed that responding to the initial segment of the test CS was lower in Group LongDelay.Same than in Groups LongDelay.Diff and Control, *F*s(1,28) = 6.18 and 9.06, *p*s < 0.05 and 0.01, respectively. Groups Control and LongDelay.Diff did not differ, *F*(1,28) < 1. That is, acquisition of responding to the initial segment was retarded only when the context of retardation training was the same as the context of long delay conditioning. Analyses of responding during the final segment of the test CS revealed that Groups LongDelay.Diff and Control differed, while this difference was marginal for Groups LongDelay.Same and Control, *F*s(1,28) = 12.04 and 3.71, *p*s < 0.005 and 0.065, respectively. The unadjusted means for responding during the initial/final segment of the test CS were 5.36/7.27, 7.72/8.26, and 9.40/4.00, for Groups LongDelay.Same, LongDelay.Diff, and Control, respectively (least squares means adjusted for the covariate, presented in **Figure 6D**, were 4.96/6.69, 8.40/9.33, and 9.10/3.57, respectively).
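
To make the logic of the covariate adjustment concrete, the following minimal Python sketch computes covariate-adjusted group means for a simplified one-way ANCOVA. It is an illustration under stated assumptions, not the analysis reported above: it ignores the within-subjects segment factor, the function name `adjusted_means` is hypothetical, and the data are invented toy values rather than data from Experiment 2.

```python
def mean(values):
    return sum(values) / len(values)


def adjusted_means(groups):
    """Covariate-adjusted group means for a one-way ANCOVA.

    groups maps a group name to a list of (covariate, response) pairs.
    Returns the pooled within-group slope and the adjusted mean per group.
    """
    grand_x = mean([x for pairs in groups.values() for x, _ in pairs])

    # Pooled within-group regression slope of the response on the covariate.
    sxy = sxx = 0.0
    stats = {}
    for name, pairs in groups.items():
        mx = mean([x for x, _ in pairs])
        my = mean([y for _, y in pairs])
        stats[name] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx

    # Shift each group mean to the value expected at the grand covariate mean.
    adjusted = {name: my - slope * (mx - grand_x) for name, (mx, my) in stats.items()}
    return slope, adjusted


# Toy data (hypothetical): (overall response rate, responses to the initial segment).
toy = {
    "Same":    [(10, 5.0), (12, 6.0), (14, 7.0)],
    "Diff":    [(7, 6.5), (9, 7.5), (11, 8.5)],
    "Control": [(10, 8.0), (12, 9.0), (14, 10.0)],
}
slope, adj = adjusted_means(toy)
```

In these toy data the "Diff" group responds less overall, so its unadjusted mean (7.5) falls below that of the "Control" group (9.0), whereas the adjusted means coincide once overall response rate is taken into account.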

# DISCUSSION

The term inhibition of delay has been used to describe either situations in which there is minimal or no conditioned responding to the initial segment of a long delay CS, or situations in which conditioned responding to the initial segment of a long delay CS decreases as training progresses. The former use of the term would suggest that the initial segment of the CS acquires no behavioral control, whereas the latter would suggest that the initial segment of the CS acquires behavioral control that decreases as acquisition of the temporal relationship between CS and US progresses, possibly due to inhibitory processes (cf. Pavlov, 1927). In three studies, rats were trained to retrieve food pellets delivered during the final segment of a long delay CS. Throughout training, behavior became temporally specific, with most of the conditioned responding occurring during the final segment of the long delay CS. The initial segment of the CS elicited little conditioned responding, which is the characteristic pattern that results from long delay conditioning. Despite this pattern of behavior being commonly known as *inhibition of delay*, we obtained little evidence of conditioned inhibition developing to the initial segment of the CS. The initial segment of the stimulus passed a retardation test for conditioned inhibition (Experiments 1a and 2) but failed to pass a summation test for conditioned inhibition (Experiment 1b). In Experiment 2, a context change was imposed between the long delay conditioning and retardation training phases. When the context of long delay conditioning training was different from the context of retardation training, retardation was attenuated. These three characteristics (passing a retardation test, failure to pass a summation test, and context-dependence of retardation) closely resemble the defining characteristics of the CS-preexposure effect (for a review see, e.g., Escobar et al., 2003) and suggest that response attenuation in the initial segment of a long delay CS may be the result of the cumulative effects of repeated exposures to a non-reinforced stimulus segment. That is, at least in the present preparation, long delay conditioning appears to result in the development of latent inhibition, rather than 'true' conditioned inhibition, to the initial segment of the CS.

Our observation that the initial segment of the long delay CS failed to pass a summation test in Experiment 1b should not be viewed as definitive evidence against the possibility that conditioned inhibition develops to the initial segment of a long delay CS for several reasons. First, some authors have regarded passing of a retardation test as sufficient evidence of conditioned inhibition, as long as the alternative explanation of attenuated attention to the inhibitor can be precluded with appropriate control conditions (Papini and Bitterman, 1993). Furthermore, other authors have claimed that passing just a summation or a retardation test should be considered sufficient evidence of inhibition because multiple variables (e.g., collateral excitation) may preclude a stimulus from passing both tests (Williams et al., 1992). However, passing both tests is still regarded by most researchers as necessary to conclude that a treatment is conducive to the development of conditioned inhibition (e.g., Cole et al., 1997), and this two-test strategy (cf. Rescorla, 1969) is frequently used as a behavioral definition of inhibition (Savastano et al., 1999).

Previous studies on inhibition of delay have reported evidence of inhibition as assessed with a summation test. For example, Rescorla (1967) trained dogs to fear a 30-s tone that coterminated with a 5-s shock. This tone was later used as a warning signal in a discriminated avoidance preparation. The long delay conditioning treatment resulted in the initial 5 s of the CS attenuating fear to the training context; that is, the number of avoidance responses during the initial segment of the CS fell below baseline level. Thus, one can conclude that the long delay CS passed a summation test for conditioned inhibition. We did not observe a similar attenuation: mean number of head entries during the period of time immediately preceding CS onset was 1.5 and during the initial segment of the CS was 2.25 (these numbers were taken from the last probe trial data across studies, using 15 s as window for both measures). This discrepancy may be due to conditioned inhibition developing at different rates in aversive (Rescorla's) vs. appetitive (Pavlov, 1927; the present studies) conditioning. Because conditioned inhibition appears to be a positive function of the number of training trials administered (e.g., Heth, 1976; Yin et al., 1994; Cole and Miller, 1999; Stout et al., 2004), further increasing the number of long delay conditioning trials might favor the development of conditioned inhibition in the present preparation. Notably, in a preparation similar to that used in the present experiments, Williams et al. (2008b) observed attenuated responding to the initial segment of a long delay CS, as compared to the preceding ITI. In Williams et al.'s (2008b) preparation, the CS-US contingency was degraded by delivering unsignaled pellet USs during the intertrial intervals; thus, this response attenuation may have been due to the degraded contingency rather than (or in addition to) the long delay conditioning training.

It is possible that the development of robust inhibition in the long delay procedure requires that the training excitor be highly excitatory. A problem with this latter possibility is to determine what stimulus acts as the training excitor for the initial segment of the long delay CS. A likely candidate is the final segment of the CS, which is contiguous with the US. However, conditioned inhibition is most readily obtained when the training excitor is presented in a non-reinforced simultaneous compound with the candidate inhibitor. Although there are some reports of inhibition when the training excitor and the putative inhibitor are presented serially (e.g., Rescorla, 1985; Stout et al., 2004), this type of inhibition generally develops more slowly, requiring many more compound trials than the simultaneous type of inhibitory training (Stout et al., 2004) and sometimes fails to result in inhibition at all (e.g., Holland, 1985). Because the initial and final segments of a long delay CS are separated by time, neither compound training nor close proximity training can take place. A second candidate excitor is the training context. During long delay conditioning training, the context is paired *both* with the US and the initial segment of the CS. Assuming that the context plays the role of training excitor can explain some of our results as well as some of the apparent discrepancies between the previous literature and the present studies. For example, because of the relatively extensive exposure to the training context in our preparation, the context may not have been excitatory enough to support robust inhibition. During each of the 25 training sessions, the context was paired with the US approximately 20 times (18 times during the 5 probe days, 20 times the remaining 20 days) and was presented in the absence of the US for about 118 min in each of these 25 training sessions. 
This relatively massive context extinction should have attenuated the impact of the context-US pairings, thus resulting in low excitation to the context (see Gibbon and Balsam, 1981; Gallistel and Gibbon, 2000, for a description of how long durations of exposure to the context without the US might attenuate contextual excitation). In Rescorla's study, both the final segment of the CS and the training context could have been more effective as training excitors because, in his study, the CS duration was 30 s, which is half of the duration of the CSs in the present studies. Thus, it is more likely that the excitatory properties of the final segment of the CS affected learning about the initial segment of the CS (closer temporal proximity), and the significantly shorter non-reinforced exposure to the context made it more excitatory than in the present studies (but see Detert et al., 2008, for evidence that rats acquire little fear of the context during long-delay conditioning training).
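
The scale of this context extinction can be made explicit with simple arithmetic on the figures given above. The short Python sketch below is only a back-of-the-envelope calculation; the per-session value of 118 min is the approximate figure stated in the text, so the totals are approximate as well, and the variable names are ours.

```python
PROBE_DAYS, PROBE_PAIRINGS = 5, 18        # sessions with probe trials
REGULAR_DAYS, REGULAR_PAIRINGS = 20, 20   # remaining sessions
UNREINFORCED_MIN_PER_SESSION = 118        # "about 118 min", as stated in the text

sessions = PROBE_DAYS + REGULAR_DAYS
total_pairings = PROBE_DAYS * PROBE_PAIRINGS + REGULAR_DAYS * REGULAR_PAIRINGS
total_unreinforced_min = sessions * UNREINFORCED_MIN_PER_SESSION
minutes_per_us = total_unreinforced_min / total_pairings

print(sessions, total_pairings, total_unreinforced_min, round(minutes_per_us, 2))
# prints: 25 490 2950 6.02
```

Under these figures, each context-US pairing was accompanied by roughly 6 min of exposure to the context without the US.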

It has been long known that animals can delay responding until the expected time of US delivery. Furthermore, recent reports suggest that animals also encode the specific time of US omission (e.g., Denniston et al., 1998a,b; Burger et al., 2001; Williams et al., 2008a). The present results support the view that time is a relevant variable both for expectation of US delivery and expectation of US omission. Furthermore, they support the view that different segments of a CS may acquire different associative meanings (i.e., excitatory and inhibitory response potentials). Consistent with this view, Romaniuk and Williams (2000) observed that backward conditioning results in excitation to the CS segment that is contiguous with US delivery, even if the CS as a whole was inhibitory. Similarly, Williams et al. (2008a) observed that conditioned inhibition is specific to the expected time of US omission, which is determined by the time of reinforcement of the training excitor. Taken together, these results indicate that information about US delivery and US omission can coexist through the duration of a CS, and development of excitation and inhibition, respectively, is determined by the temporal information encoded as part of the association. Our results add to this body of research by demonstrating that a segment of a CS that *predicts* the future delivery of the US can signal a relationship with the US different from excitation.

The present experiments suggest that a stimulus (in the present case, a long delay CS) should not be viewed as a unitary event but as a sequence of events that occurs over time. The initial segment of the CS has distinct properties, including the change in stimulation that accompanies stimulus onset. Similarly, the final segment of the CS has distinct properties, including CS duration and termination. Real-time associative learning theories should be well suited to account for this perspective and the present results, especially if they assume that CS perception is achieved through some sort of stimulus sampling process (cf. Estes and Burke, 1953). For example, recent extensions of Wagner's (1981) SOP model (e.g., C-SOP; Wagner and Brandon, 2001) propose combining a componential representation view with a competitive learning rule to account for the response pattern characteristic of inhibition of delay. According to Vogel et al. (2003; also see Brandon et al., 2003), the sampling of CS components is not random, but determined by a temporal process, such that CS components with greater proximity to the US become more strongly associated to the US than others (Wagner's original model assumed random sampling). When the componential representation and competitive learning views are combined, the model predicts the pattern of responding characteristic of inhibition of delay. Briefly, C-SOP assumes that delayed responding in long delay conditioning should be viewed as an instance of an AX+, BX− discrimination, where A, B, and X represent CS components that are differentially activated due to their association with the US. A-elements are those uniquely activated during US presence, B-elements are those uniquely activated during US absence, and X-elements are those activated during either CS presence or CS absence. 
In consequence, A-elements are assumed to become strongly excitatory, X-elements moderately excitatory, and B-elements just inhibitory enough to counter the excitation from X-elements during BX joint activation.
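
The qualitative outcome of an AX+, BX− discrimination under a competitive learning rule can be illustrated with a short simulation. The sketch below uses the Rescorla-Wagner rule as a stand-in for a competitive rule; it is not an implementation of C-SOP, which is a real-time model with additional activation dynamics. Equal salience for all elements, strict alternation of trial types, and the parameter values are assumptions made for illustration.

```python
def simulate_ax_bx(n_cycles=2000, rate=0.1):
    """Rescorla-Wagner learning on alternating AX+ and BX- trials.

    rate is a hypothetical combined learning-rate parameter (alpha * beta),
    assumed equal for all elements.
    """
    v = {"A": 0.0, "B": 0.0, "X": 0.0}
    trials = [
        (("A", "X"), 1.0),  # AX+: US present, asymptote lambda = 1
        (("B", "X"), 0.0),  # BX-: US absent, asymptote lambda = 0
    ]
    for _ in range(n_cycles):
        for elements, lam in trials:
            # All elements present on a trial share one prediction error.
            error = lam - sum(v[e] for e in elements)
            for e in elements:
                v[e] += rate * error
    return v


strengths = simulate_ax_bx()
```

Under these assumptions the associative strengths settle near 2/3 for A, 1/3 for X, and −1/3 for B, so that B exactly cancels the excitation of X on BX trials, in line with the verbal description above.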

The C-SOP approach appears consistent with our data because it represents long-delay conditioned stimuli as neutral, rather than inhibitory, in their initial stages (BX elements) and excitatory in their final stages (AX elements). However, the model would need to account for *both* retarded acquisition of responding to the (presumably neutral) initial segment of the CS, and the context-specificity of the effect observed in Experiment 2. Retardation could be addressed by assuming that the B elements are inhibitory enough that their associative strength would grow at a slower rate after long delay conditioning than for a novel stimulus. Context specificity could be addressed by assuming that the B elements become strongly associated to the context, which would lead to the prediction of latent inhibition, a phenomenon explained by SOP in terms of strong CS-context association making some CS elements unavailable to later enter into associations with the US (Wagner, 1981). However, latent inhibition should be weak in this situation because at least some of the B elements would be inhibitory and unavailable to enter into associations with the context.

There are potential alternative explanations for our data. For example, it is possible that the initial segment of the long delay CS underwent habituation, which could have in turn resulted in retardation (a habituated stimulus segment may be less ready to enter into associations with the US when reinforced) and a failure of summation (a habituated stimulus would detract little responding from the transfer excitor). Long-term habituation and latent inhibition evolve from similar operations (see Lubow, 1989, for an extensive discussion), and some theoretical frameworks suggest that the two may be the result of a common underlying process (e.g., Wagner, 1976). The present studies do not allow for a full dissociation of latent inhibition and long-term habituation, and further research would be needed to obtain evidence for such a dissociation. However, Experiment 2 may provide greater support to the hypothesis that latent inhibition rather than habituation developed to the initial segment of the long delay CS because, at least under some circumstances, latent inhibition exhibits greater context dependence than habituation (see e.g., Hall, 1991). Another possible explanation for our data is that the initial segment of the CS develops weak associations to later segments of the CS or the US due to the relatively long duration of the inter-stimulus interval (ISI, defined as the interval between CS onset and US onset). However, duration of the ISI alone does not account for the development of long delay conditioning. For example, situations in which the ISI is maintained constant but the CS is of short duration (i.e., if a trace is introduced between CS termination and US onset) result in similar levels of responding and develop at similar ontogenetic times as long delay conditioning (Barnet and Hunt, 2005). However, trace conditioning and long delay conditioning appear to be mediated by different physiological systems. 
Trace conditioning is disrupted by the cholinergic antagonist scopolamine and enhanced by the cholinesterase inhibitor physostigmine, neither of which has an effect on long delay conditioning (Hunt and Richardson, 2007). These observations can be viewed as problematic for our conclusions, considering that latent inhibition is disrupted by low doses (<0.5 mg/kg) of scopolamine (Barak and Weiner, 2007). Nonetheless, the reports that scopolamine spares long delay conditioning used a higher dose of scopolamine (1.0 mg/kg), and at high doses scopolamine does not disrupt but actually enhances latent inhibition (Barak and Weiner, 2009). These latter observations are consistent with our conclusion that long delay conditioning results in at least some degree of latent inhibition, likely mediated by the development of strong associations between the CS and the context during long delay exposure to the CS (cf. Escobar et al., 2002).

The present data suggest that stimuli should not be viewed as unitary events, but as a sequence of temporally linked events that can carry different types of information about the conditions that precede and follow them. The use of associative strength as a singular summary statistic for the associative value of a stimulus may be an oversimplification that ignores that different segments of the CS may carry different associative meanings with respect to US delivery and US omission.

# FUNDING

This research was partially supported by a Research Fellowship from Auburn University and NIH Grant 81269 to Martha Escobar. Elizabeth Rahn was supported by an Undergraduate Research Fellowship from Auburn University and a Psi Chi Undergraduate Research Grant.

# ACKNOWLEDGMENTS

ME is now at Oakland University, MI. This research was supported by a Research Fellowship from Auburn University and NIH Grant 81269 to ME. ER was supported by an Undergraduate Research Fellowship from Auburn University and a Psi Chi Undergraduate Research Grant. Experiment 1b was part of a thesis submitted by ER to the Honors College of Auburn University. We thank Ralph R. Miller for their comments on a preliminary version of this manuscript. Thanks are also due to Daniel Bradford, Whitney Kimble, Tyson Platt, Heather Sissons, Dale Smith, and Seth Wilhelmsen for their assistance with data collection.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Escobar, Suits, Rahn and Arcediano. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Dynamics of auditory working memory**

#### *Jochen Kaiser\**

*Institute of Medical Psychology, Goethe University, Frankfurt am Main, Germany*

Working memory denotes the ability to retain stimuli in mind that are no longer physically present and to perform mental operations on them. Electro- and magnetoencephalography allow investigating the short-term maintenance of acoustic stimuli at a high temporal resolution. Studies investigating working memory for non-spatial and spatial auditory information have suggested differential roles of regions along the putative auditory ventral and dorsal streams, respectively, in the processing of the different sound properties. Analyses of event-related potentials have shown sustained, memory load-dependent deflections over the retention periods. The topography of these waves suggested an involvement of modality-specific sensory storage regions. Spectral analysis has yielded information about the temporal dynamics of auditory working memory processing of individual stimuli, showing activation peaks during the delay phase whose timing was related to task performance. Coherence at different frequencies was enhanced between frontal and sensory cortex. In summary, auditory working memory seems to rely on the dynamic interplay between frontal executive systems and sensory representation regions.

#### *Edited by:*

*Timothy M. Ellmore, The City College of New York, USA*

#### *Reviewed by:*

*Jonathan R. Folstein, Florida State University, USA; Christine Lefebvre, Centre de Recherche de L'institut Universitaire de Gériatrie de Montréal, Canada*

#### *\*Correspondence:*

*Jochen Kaiser, Institute of Medical Psychology, Goethe University, Heinrich-Hoffmann-Strasse 10, 60528 Frankfurt am Main, Germany; j.kaiser@med.uni-frankfurt.de*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 26 March 2015; Accepted: 25 April 2015; Published: 11 May 2015*

#### *Citation:*

*Kaiser J (2015) Dynamics of auditory working memory. Front. Psychol. 6:613. doi: 10.3389/fpsyg.2015.00613*

**Keywords: review, event-related potentials, spectral activity, gamma, coupling, spatial processing, non-spatial processing**

# **Introduction**

Working memory allows the temporary storage of relevant information and its task-dependent manipulation. It is involved in many higher cognitive functions and thus constitutes a fundamental function of our brain. While most previous research has focused on visual working memory (Drew et al., 2006; Luck and Vogel, 2013), less is known about the neural correlates of auditory working memory (AWM). This brief review summarizes some of the main findings from studies of auditory short-term or working memory (both terms will be used interchangeably) in humans. The focus will be on the dynamics of working memory-related processes; therefore the review is limited to studies assessing non-invasive measures of neural activation with a high temporal resolution, i.e., electro- or magnetoencephalography (EEG and MEG). Most of this work has considered event-related potentials (ERPs), but some investigations have looked at spectral activity and at oscillatory coupling between cortical sources.

Evidence from both types of studies speaks against the existence of a single working memory store for auditory information. Instead, activation patterns vary with the type of memorized auditory information, suggesting that working memory involves the same systems that underlie perceptual processing. Sound feature-specific activation differences were particularly obvious for comparisons between sound identity and location, i.e., stimulus parameters that are processed in topographically distinct cortical regions (Rauschecker and Tian, 2000).

# **Auditory Working Memory for Non-spatial Sound Features**

The short-term retention of pitch elicits a load-dependent frontal negative wave. Using non-verbal, pure-tone stimuli to avoid phonological or semantic processing, memory load effects were tested by presenting either one 200-ms pure tone to both ears or two different stimuli to each ear (Guimond et al., 2011). A sustained anterior negative wave (SAN) during the 2-s delay interval showed higher amplitudes for two than for one to-be-remembered stimulus. Control experiments confirmed the role of the SAN for short-term memory processing by excluding a mere sensory-driven response or internal rehearsal. Comparison with a visual short-term memory paradigm showed that the SAN during retention was specific to the auditory task (Lefebvre et al., 2013). A memory load-sensitive SAN was also observed during the retention of sounds differing in timbre instead of pitch (Nolden et al., 2013). The cortical generators of this wave were assessed with MEG. During AWM for tone sequences, source localization revealed memory load-dependent activations in bilateral superior temporal, superior parietal and frontal cortex (Grimault et al., 2009). A study involving the comparison of tone sequences of different lengths identified several brain areas whose activation correlated with the number of successfully memorized items (Grimault et al., 2014). These included bilateral superior/middle temporal cortex and several regions in bilateral frontal cortex. This source topography partly overlapped with fMRI results (Gaab et al., 2003; Koelsch et al., 2009) and suggested that the retention of simple acoustic features involves the sustained activation of sensory representations in addition to frontal executive regions.

The frontal negativity is a robust phenomenon that was also observed in ERP studies employing verbal sounds that may elicit semantic processing beyond low-level acoustic storage. A sustained frontal negative shift was larger for aurally than visually presented digits (Lang et al., 1992). Similarly, a memory load-dependent frontal negativity was larger for spoken than written syllables (Ruchkin et al., 1997), whereas visual stimuli gave rise to a posterior positivity. The role of the prefrontal cortex for AWM was further supported by a study in patients with frontal cortex lesions. They showed reduced activations both in auditory areas and prefrontal cortex and failed to attenuate their responses to distracting tones during the delay period of an AWM task (Chao and Knight, 1998).

While ERP investigations focus on time-locked broad-band activity, spectral analysis is typically performed on a single-trial basis, thereby retaining activity that is not phase-locked to a defined event. Analyses of spectral activity in different frequency bands may reveal aspects of processing not captured by ERPs. For example, activity in the alpha band (8–12 Hz) has been related to active inhibition of interfering processing (Klimesch et al., 2007; Jensen and Mazaheri, 2010), and gamma activity (>30 Hz) has been linked to object representations, attention and memory (Kaiser and Lutzenberger, 2003; Jensen et al., 2007). Moreover, coherence or phase synchronization calculated on the basis of spectral signals provides information about cortico-cortical interactions.

Increases of spectral power and synchronization over frontal cortex characterized AWM for different types of non-spatial sounds. During the maintenance phase of an AWM task requiring the memorization of sound durations, we found increased gamma activity (70–80 Hz) over prefrontal cortex (Kaiser et al., 2007b). A similar result was obtained for artificial syllables varying in voice onset time and formant structure. Here gamma activity (65–70 Hz) was increased over left anterior temporal/inferior frontal cortex (Kaiser et al., 2003). Gamma coherence between the putative sensory representation regions and prefrontal cortex showed a sustained increase across the delay phase (Kaiser et al., 2005), possibly reflecting enhanced cross-talk between storage and executive networks underlying stimulus maintenance. Right frontal alpha and right temporal beta activity correlated positively with memory load during the delay period of a Sternberg-type task using natural syllables (Leiberg et al., 2006b). The alpha increase was consistent with other auditory (Luo et al., 2005; Kaiser et al., 2007a; Kawasaki et al., 2010) and visual short-term memory studies (Sauseng et al., 2005, 2009) and may have reflected increased executive demands and/or the suppression of irrelevant processing.

# **Auditory Working Memory for Spatial Sound Features**

MEG studies investigating spatial AWM tasks with filtered noise sounds found gamma activity over regions of the putative auditory dorsal space processing stream (Rauschecker and Tian, 2000). When comparing auditory spatial working memory with a non-memory control task, both maintenance and retrieval of lateralized sounds were accompanied by increased parietal gamma activity (55–70 Hz) (Lutzenberger et al., 2002; Leiberg et al., 2006a). In addition, enhanced frontal gamma activity was found during the final 100 ms of the maintenance period. As in our study with artificial syllables described above (Kaiser et al., 2003), gamma coherence between the putative sensory representation regions and frontal cortex was increased during the delay phase.

Inspired by the hypothesized role of gamma activity for sensory representations (Jensen et al., 2007), we searched for spectral signatures of the short-term maintenance of individual auditory stimuli by contrasting delay-period activations between individual memory stimuli. We performed Fast Fourier Transforms on single trials for about 1.5 Hz-wide frequency bins across the gamma range. The problem of multiple testing was addressed by applying a statistical probability mapping based on permutation tests. When frequency ranges showing significant differences between stimuli were identified, the data were filtered in these frequencies to assess spectral activity time courses.
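The analysis steps described above can be illustrated with a minimal sketch. It assumes single-trial signals for two memory stimuli, uses a direct discrete Fourier transform instead of an optimized FFT, and handles multiple testing across frequency bins with a max-statistic permutation test. This is one common permutation approach and not necessarily the statistical probability mapping procedure used in the original studies. The function names, the sampling rate (300 Hz), the trial length (200 samples, giving 1.5 Hz bins) and the simulated 60 Hz signal are all hypothetical.

```python
import cmath
import math
import random

def band_power(trial, fs, f_lo, f_hi):
    """Power per DFT bin between f_lo and f_hi (Hz) for one single-trial signal.

    Uses a direct DFT for clarity; the bin width is fs / len(trial).
    """
    n = len(trial)
    out = {}
    for k in range(math.ceil(f_lo * n / fs), math.floor(f_hi * n / fs) + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(trial))
        out[k * fs / n] = abs(coeff) ** 2 / n
    return out

def max_stat_permutation(spectra_a, spectra_b, n_perm=500, seed=0):
    """Corrected p-value per frequency bin for |mean(A) - mean(B)|."""
    rng = random.Random(seed)
    freqs = sorted(spectra_a[0])
    pooled = spectra_a + spectra_b
    n_a = len(spectra_a)

    def abs_diffs(trials):
        a, b = trials[:n_a], trials[n_a:]
        return [abs(sum(s[f] for s in a) / len(a) - sum(s[f] for s in b) / len(b))
                for f in freqs]

    observed = abs_diffs(pooled)
    # Null distribution: largest difference over all bins after shuffling stimulus labels.
    null_max = []
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        null_max.append(max(abs_diffs(shuffled)))
    return {f: (1 + sum(m >= obs for m in null_max)) / (n_perm + 1)
            for f, obs in zip(freqs, observed)}

def simulate_trial(rng, fs, n, gamma_amp):
    """Hypothetical trial: Gaussian noise plus an optional 60 Hz component."""
    phase = rng.uniform(0, 2 * math.pi)
    return [gamma_amp * math.sin(2 * math.pi * 60 * t / fs + phase) + rng.gauss(0, 1)
            for t in range(n)]

rng = random.Random(1)
FS, N = 300, 200  # assumed values; 300 / 200 = 1.5 Hz per bin
stim_a = [band_power(simulate_trial(rng, FS, N, 1.0), FS, 50, 90) for _ in range(20)]
stim_b = [band_power(simulate_trial(rng, FS, N, 0.0), FS, 50, 90) for _ in range(20)]
p_values = max_stat_permutation(stim_a, stim_b)
significant = [f for f, p in p_values.items() if p < 0.05]
```

Comparing each bin against the distribution of the maximum difference over all bins is what keeps the family-wise error rate under control in this sketch. Frequency ranges identified in this way could then be used to define the filter for the time course analysis.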

We identified stimulus-specific components of gamma activity during the maintenance of different sound lateralization angles (Kaiser et al., 2008). Sample stimuli were 200-ms noises convolved with head-related transfer functions to create virtual lateralization angles of either 15° or 45° with respect to the midsagittal plane. After an 800-ms delay period, these stimuli had to be compared with test stimuli that could be presented at the same, a more medial or a more lateral angle. Participants were assigned to two groups who were presented with only right- or left-lateralized stimuli, respectively. For both groups, stimulus-specific gamma activity (55–70 Hz) was found over occipito-parietal cortex contralateral to stimulation. This topography could be considered consistent with the auditory dorsal "where" stream, but might also indicate an involvement of visual spatial imagery. Gamma activity was most pronounced at latencies of 200–500 ms after sound offset, i.e., in the middle of the 800-ms delay phase.

This timing of stimulus-specific gamma activity could either have reflected delayed responses to memory sounds or preparatory activations preceding the test stimuli. To decide between these possibilities, a follow-up study used delay durations of either 800 or 1200 ms in separate recording blocks (Kaiser et al., 2009b). The main results of this study are depicted in **Figure 1**. We replicated stimulus-specific gamma activity (75–100 Hz) over contralateral posterior cortex. For the shorter delay duration, this activity peaked again in the middle of the maintenance phase, i.e., about 400 ms after the offset of the memory stimulus. In contrast, stimulus-specific activity was clearly delayed for the longer delay duration, peaking at around 800 ms after memory stimulus offset. In other words, gamma activity reached its maximum 400 ms before the onset of the test stimulus for both delay durations. The time course of stimulus-specific activity thus seemed to reflect the activation of task-relevant information in preparation for comparison with the test sound.
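The timing argument can be checked with a few lines of arithmetic. The sketch below uses the approximate peak latencies quoted above (about 400 and 800 ms after memory stimulus offset); the variable names are illustrative only.

```python
# Delay duration (ms) mapped to the approximate gamma peak latency (ms after
# memory-stimulus offset), as quoted in the text.
peak_latency_by_delay = {800: 400, 1200: 800}

# Interval between the gamma peak and the onset of the test stimulus.
lead_before_test = {delay: delay - peak for delay, peak in peak_latency_by_delay.items()}
```

Under these values the interval is the same for both delay durations, which is what ties the peak to the upcoming test stimulus and not to the preceding memory stimulus.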

We also examined the relationship between stimulus-specific gamma activity and task performance. If these signals reflect the activation of task-relevant information, they should predict the accuracy of the comparison with the test stimuli. In both studies (Kaiser et al., 2008, 2009b), we found positive correlations between task performance and gamma activity during the final part of the delay phase. Exploring the nature of this relationship further, we compared gamma activity time courses between better and poorer performers. Interestingly, the groups differed not in the absolute magnitude of stimulus-specific activations but in their timing. As shown in **Figure 2**, better performers showed a more sustained representation of the memorized information until the end of the delay period. Correlations between gamma activity and performance have been reported in a wide variety of paradigms (Rieder et al., 2011). Here they supported the functional relevance of activating representations of the sample sounds for accurate comparisons with the test stimuli.

# **Direct Comparisons of Auditory Spatial Versus Non-spatial Working Memory**

Studies that compared working memory for sound locations and sound patterns directly supported the notion of dorsal and ventral streams for the processing of auditory spatial and nonspatial information, respectively (Rauschecker and Tian, 2000). In line with this dual-stream model, positive ERP deflections at 300–500 ms after both memory and test stimuli were found at fronto-temporal electrodes for a non-spatial AWM task and at centro-parietal electrodes for a spatial task with 500-ms noise bursts (Alain et al., 2001). Positive maintenance-related ERP shifts during the non-spatial task are at odds with the SAN reported above (e.g., Guimond et al., 2011; Lefebvre et al., 2013). However, several differences between studies make it hard to compare these findings directly: Alain et al. (2001) used longer and spectrally richer sounds and a much shorter delay duration than the studies reporting an SAN (500 versus 2000 ms, respectively), raising the possibility that echoic memory may have been involved rather than short-term memory. Moreover, data were shown from a few selected (e.g., fronto-temporal) electrode sites only, whereas the SAN was most pronounced at midline fronto-central sites.

Differences between auditory location and pitch working memory were found also for the N1 component to pure tones serving as test stimuli, suggesting an early onset of segregated processing at about 100 ms (Anourova et al., 2001). The N1 findings were replicated in a subsequent study requiring the memorization of either location or frequency of short sound sequences (Anurova et al., 2003). In addition, sample sounds elicited more negative ERPs at 200 and 400 ms in the frequency than location task and more positive ERPs at 450–650 ms for the location than frequency task. Source analysis of late positive potentials to probe stimuli revealed a predominant involvement of middle temporal cortex in pitch and of occipito-temporal regions in location processing (Anurova et al., 2005). In contrast, a late slow wave was modulated by memory load but did not differ between tasks.

In line with the studies reported above that used simple sounds, an *n*-back working memory task with environmental sounds presented at different virtual locations revealed segregation between spatial and non-spatial processing from about 200 ms onwards in auditory association cortex and fronto-parietal cortex (Alain et al., 2009). In summary, these ERP studies showed an early topographical segregation during encoding and retrieval of spatial versus non-spatial auditory information in accordance with the dual-stream model.

Following up our studies on stimulus-specific gamma activity by comparing non-spatial and spatial AWM directly, we demonstrated the task-dependence of stimulus-specific activations (Kaiser et al., 2009a). The same filtered noise sounds that could differ in frequency and perceived lateralization were used in both tasks. Separate components of gamma activity (50–90 Hz) during the delay phase distinguished between both stimulus features. Different lateralization angles were represented by posterior gamma activity, and different sound frequencies, by fronto-central components. These feature-specific activations peaked at 200–300 ms before the onset of the test stimulus and showed a clear task-dependence: amplitude modulations were observed only when the represented feature was task-relevant. Task performance was correlated both with enhanced activity for the task-relevant stimulus attribute and reduced activity for the task-irrelevant feature. This study showed that representations of auditory features are reactivated depending on task demands and that performance benefits from activating task-relevant and attenuating task-irrelevant representations.

### **Summary**

The present findings are consistent with the notion of working memory as an emergent property relying on the dynamic interplay between attentional and sensory systems (Pasternak and Greenlee, 2005). EEG and MEG provide measures of neural activity with a sufficiently high temporal resolution to distinguish encoding, maintenance and retrieval in AWM. While there is some evidence for task-specific differences in ERP responses during encoding (Anurova et al., 2003; Lehnert and Zimmer, 2006), most of the studies have focused on the short-term retention of acoustic information. Stimulus maintenance is reflected by sustained ERP deflections whose topography varies with the task-relevant stimulus feature. The maintenance of non-spatial sound attributes like pitch is accompanied by a fronto-central negativity (Guimond et al., 2011). This slow wave reflects variations in memory load and is topographically distinct from more posterior activations during visual working memory (Lefebvre et al., 2013). Source analysis has demonstrated generators in auditory and frontal areas, suggesting that the short-term retention of pitch is partially accomplished by the prolonged activation or the reactivation of the brain regions underlying the perceptual processing of pitch (Grimault et al., 2014). In contrast, sound location seems to be processed by more posterior, parieto-occipito-temporal regions. The topographical differences between sound frequency versus location processing in AWM are consistent with the model of segregated auditory ventral and dorsal streams, respectively (Alain et al., 2001; Kaiser and Lutzenberger, 2003). ERP work comparing individual sound features has demonstrated differential processing of spatial versus non-spatial sound parameters starting from 100 ms after stimulus onset. These differences pertained mainly to encoding, early maintenance and retrieval but were less evident during the later part of a longer retention period (Anurova et al., 2003). 
Analyses of spectral signals have demonstrated sound feature-specific increases of gamma activity both during maintenance and retrieval. However, representations of task-relevant information were not sustained across the delay period but were temporally related to the onset of the test stimulus (Kaiser et al., 2009b). In contrast, coherence between sensory representation regions and prefrontal cortex showed a sustained increase across the maintenance phases of spatial and non-spatial AWM paradigms (Lutzenberger et al., 2002; Kaiser et al., 2003). In summary, both encoding and retrieval are characterized by the enhanced processing of task-relevant stimuli or stimulus attributes. Maintenance relies on a combination of a prolonged activation or a reactivation of sensory representations and an activation of frontal executive networks with increased coupling between both sets of regions.

While the majority of studies have focused on the maintenance aspect of working memory, research on mental operations on stored sounds is very limited. Working memory operations include the selection of one stored item amongst others, updating the focus of attention or the content of working memory with new items, rehearsal and coping with interference (Bledowski et al., 2010). Shifts of attention to auditory objects held in working memory were associated with the activation of fronto-parietal attention systems, and further temporal and parietal activations distinguished between spatial and category-related attention cues (Backer et al., 2015). Mental transformation and updating of auditory memory contents involved increased frontal and temporal theta power and enhanced fronto-temporal theta phase synchrony (Kawasaki et al., 2010, 2014).

While we have gained substantial knowledge about EEG/MEG signals sensitive to the number of auditory items held in short-term memory, future studies may focus on the neuronal signature coding the precision of individual items (Kumar et al., 2013; Ma et al., 2014). This requires clever experimental designs, sophisticated behavioral analyses and fine-grained analyses of EEG/MEG signals. Furthermore, analyzing connectivity measures in EEG/MEG may help to identify the mechanisms underlying dynamic interactions between the fronto-parietal "working" system that prioritizes, modifies and protects auditory items from interference and the storage system that codes each item representation by a singular activity pattern. These analyses may help to reveal further commonalities and differences between visual and auditory working memory.

# **Acknowledgment**

I am grateful to Christoph Bledowski for helpful comments.

# **References**

of pitch objects in acoustic short-term memory. *Psychophysiology* 48, 1500–1509. doi: 10.1111/j.1469-8986.2011.01234.x


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Kaiser. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# To Switch or Not to Switch: Role of Cognitive Control in Working Memory Training in Older Adults

*Chandramallika Basak\* and Margaret A. O'Connell*

*Center for Vital Longevity, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, TX, USA*

It is currently not known which working memory training strategies best offset the age-related declines in fluid cognitive abilities. In this randomized clinical double-blind trial, older adults were randomly assigned to one of two types of working memory training – one group was trained on a predictable memory updating task (PT) and another group was trained on a novel, unpredictable memory updating task (UT). Unpredictable memory updating, compared to predictable, places greater demands on cognitive control (Basak and Verhaeghen, 2011a). Therefore, the current study allowed us to evaluate the role of cognitive control in working memory training. All participants were assessed on a set of near and far transfer tasks at three different testing sessions – before training, immediately after the training, and 1.5 months after completing the training. Additionally, individual learning rates for a comparison working memory task (performed by both groups) and the trained task were computed. Training on unpredictable memory updating, compared to predictable, significantly enhanced performance on a measure of episodic memory, immediately after the training. Moreover, individuals with faster learning rates showed greater gains in this episodic memory task and another new working memory task; this effect was specific to UT. We propose that the unpredictable memory updating training, compared to predictable memory updating training, may be a better strategy to improve selective cognitive abilities in older adults, and future studies could further investigate the role of cognitive control in working memory training.

#### *Edited by:*

*Timothy Michael Ellmore, The City College of New York, USA*

#### *Reviewed by:*

*Tilo Strobach, Medical School Hamburg, Germany; Claudia C. Von Bastian, University of Colorado Boulder, USA*

*\*Correspondence:*

*Chandramallika Basak cbasak@utdallas.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 30 October 2015; Accepted: 04 February 2016; Published: 02 March 2016*

#### *Citation:*

*Basak C and O'Connell MA (2016) To Switch or Not to Switch: Role of Cognitive Control in Working Memory Training in Older Adults. Front. Psychol. 7:230. doi: 10.3389/fpsyg.2016.00230*

Keywords: working memory training, cognitive control, healthy aging, strategies of training, individual differences

# INTRODUCTION

In order to maintain quality of life until late adulthood and decrease the health burden of a rapidly aging society, it is important that we develop an understanding of the principles of cognitive optimization, because gains in longevity have not been matched by maintenance of cognitive function into very old age. Fluid cognition, which includes abilities such as episodic memory, reasoning, and multi-tasking, declines rapidly with age, particularly after 60 years (Park and Bischof, 2010; Stine-Morrow and Basak, 2011). A plausible reason for impairments in these cognitive abilities with age is the disruption of the fronto-parietal brain networks that underlie working memory and cognitive control (Park and Reuter-Lorenz, 2009; Raz et al., 2010). One proposed principle of cognitive optimization is the enhancement of cognitive control in working memory, particularly in older adults (Basak and Zelinski, 2013).

Both cognitive control and working memory have been argued to be the underlying "core" components of fluid cognition (Stine-Morrow and Basak, 2011). Working memory, the ability to concurrently store and actively transform information (e.g., Mayr et al., 1996), is related to many complex cognitive skills, e.g., reasoning (Kyllonen and Christal, 1990). It also underlies many age-related deficits in fluid cognition, including episodic memory (Verhaeghen and Salthouse, 1997; Verhaeghen et al., 2005; Lewis and Zelinski, 2010). Moreover, significant and early declines of verbal episodic memory have long been considered to be the best cognitive marker of the earliest stages of Alzheimer's disease (Dubois et al., 2007). Therefore, training cognitively healthy older adults in these "core" cognitive components may not merely improve their fluid cognitive abilities, but can also potentially delay the onset of memory-related disorders, such as Alzheimer's disease.

The primary aim of the current study was to evaluate two different strategies to optimize cognition in older adults over a short period of time by using theory-driven, simple cognitive training protocols. An understanding of the role of cognitive control in working memory, including the variations in the retrieval-related temporal dynamics and its adaptability with extensive practice, can inform us about the best strategies to use to enhance cognitive vitality into late adulthood. Such informed principles of cognitive optimization may potentially delay the onset of pathological memory-related disorders in the healthy aging population, and in turn, decrease the medical-care burden of a rapidly aging society.

# Predictability of Focus Switching and Aging in Working Memory

Cowan's hierarchical model of working memory (Cowan, 1988, 2001) posited a two-tier hierarchy based on the accessibility of information via a zone of immediate access, labeled the *focus of attention* (FoA), and a larger activated portion of long-term memory (LTM), where the items are stored in a readily available but not immediately accessible state. One of the most intriguing findings in cognitive psychology has been the limited capacity of the FoA. For tasks requiring serial attention processes, e.g., the continuous memory updating paradigms (McElree, 2001; Oberauer, 2002, 2006; Verhaeghen et al., 2004; Verhaeghen and Basak, 2005; Vaughan et al., 2008; Basak and Verhaeghen, 2011a,b), the capacity of FoA has been limited to just one item. If more than one information unit was to be processed, the other information units were temporarily stored in the outer store, while the current information in the FoA was updated. To process an item stored in the outer store, a retrieval operation was required that shifted the item from the outer store into the FoA (*focus switch*). This focus switch process increased the retrieval latency of that information (Verhaeghen and Basak, 2005). Therefore, measurement of the capacity of the FoA has typically involved the assessment of the *focus switch costs*, which are considered to be a measure of cognitive control (Garavan, 1998; Verhaeghen and Basak, 2005). Retrieval dynamics of the zone outside the FoA have been disputed between two prominent theories. One theory has proposed that these retrieval dynamics, viz., focus switch costs, are constant (McElree, 2001), whereas the other theory has argued that they increase as a function of the number of items in the outer store (Oberauer, 2002). Due to this disagreement between the two theories, we shall here refer to the zone outside the FoA as the "*outer store*" (Verhaeghen et al., 2004; Verhaeghen and Basak, 2005).
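As a rough illustration of how a focus switch cost can be quantified, the sketch below takes the difference in mean response time between trials that require a focus switch and trials that do not. This is one common operationalization and is assumed here for illustration; the function name, the trial format and the response times are hypothetical and are not data from the studies cited above.

```python
from statistics import mean

def focus_switch_cost(trials):
    """Mean RT on focus-switch trials minus mean RT on non-switch trials (ms)."""
    switch = [t["rt_ms"] for t in trials if t["switch"]]
    stay = [t["rt_ms"] for t in trials if not t["switch"]]
    return mean(switch) - mean(stay)

# Hypothetical response times, for illustration only.
example_trials = [
    {"rt_ms": 600, "switch": False},
    {"rt_ms": 640, "switch": False},
    {"rt_ms": 620, "switch": False},
    {"rt_ms": 900, "switch": True},
    {"rt_ms": 940, "switch": True},
    {"rt_ms": 920, "switch": True},
]
cost = focus_switch_cost(example_trials)
```

Under the constant-cost account this difference would stay flat as more items are held in the outer store, whereas under the set-size account it would grow with the number of items.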

The current study was guided by a previously published hierarchical theory of working memory (Verhaeghen et al., 2004; Verhaeghen and Basak, 2005; Basak, 2006; Vaughan et al., 2008; Basak and Verhaeghen, 2011a,b; Basak and Zelinski, 2013), henceforth referred to as the Theory of Working Memory Adaptability (ToWMA; see **Figure 1**). This theory is both significant and novel in integrating three different families of results regarding the hierarchies of working memory (Cowan, 1988; McElree, 2001; Oberauer, 2002, 2006; Verhaeghen et al., 2004; Verhaeghen and Basak, 2005; Basak, 2006; Vaughan et al., 2008; Basak and Verhaeghen, 2011a,b) by accounting for the probe-cue expectancy that is missing from the previous hierarchical models. Importantly, this theory makes specific predictions regarding the retrieval-related temporal dynamics in the outer store, the change in these dynamics over time, and the best strategies to improve a variety of untrained fluid cognitive skills in both younger and older adults.

According to ToWMA, the three-tier working memory architecture consists of an inner *focus of attention* of one information unit, an *outer store* where information that needs active manipulation and subsequent updating is maintained, and a *passive store* where information that does not need any updating is held for subsequent retrieval (Basak and Zelinski, 2013). ToWMA also posits that the passive store is firewalled against the active zones, viz., FoA and outer store. ToWMA differs from other models regarding the functions of FoA as well as the retrieval dynamics of the outer store. According to ToWMA, focus of attention has three functions – directing attention to the relevant information, retrieving the information, and subsequently updating the information (Basak and Verhaeghen, 2011a). The predictability of probe-cue expectancy has been hypothesized to affect FoA's ability to direct attention to the relevant target. This, in turn, can affect the focus switch cost of information units in the outer store.

As mentioned before, there is an ongoing debate regarding the retrieval dynamics of information units in the outer store (McElree, 2001; Oberauer, 2002). According to Oberauer (2002), the focus switch cost increases as a function of set size. This is supported by models of serial processing, which posit that when searching for a specific item from a set of multiple items held in memory, response latency increases as a function of the number of items in the memory set. This suggests that the items in the memory set are examined individually until the target item is found. However, such increasing focus switch costs have not been replicated in other studies (McElree, 2001; Verhaeghen and Basak, 2005; Vaughan et al., 2008).

According to ToWMA, increases in the focus switch costs (or the lack thereof) are hypothesized to be related to probe-cue expectancies. Unpredictable, compared to predictable, probe-cue expectancies engendered greater demands on cognitive control, indexed by the focus switch costs (Basak and Verhaeghen, 2011b). For example, in the N-back paradigms (McElree, 2001; Verhaeghen and Basak, 2005; Vaughan et al., 2008), where the probe-cue was always preceded by the same N positions, the expectancy was fixed or predictable. This predictable expectancy allowed the focus to be directed to the relevant target without much overhead cost of search or interference from other competing cues. This resulted in a constant focus switch cost from *N* = 2 to 5 (Verhaeghen and Basak, 2005; see top right of **Figure 1**). This pattern of constant focus switch cost for items in the outer store was unchanged even when the task difficulty was increased (Vaughan et al., 2008). Yet, in the unpredictable N-back task, where the position of the probe-cue within an N was random, the focus switch cost increased with N for N *>* 1 (Oberauer, 2002; Basak and Verhaeghen, 2011b; see bottom right of **Figure 1**). Such an increase in latencies was considered to be evidence of either a search process or increased interference between competing active cues; alternative explanations, such as the lag of the last switch, were ruled out (Basak and Verhaeghen, 2011a). Also, the focus switch costs (*N* = 2 vs. 1) were of greater magnitude for the unpredictable, compared to the predictable, paradigms (Garavan, 1998; Verhaeghen and Basak, 2005; Basak and Verhaeghen, 2011a,b).
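The contrast between predictable and unpredictable probe-cue expectancies can be sketched as two ways of generating the sequence of cued positions. This is a simplified illustration under assumed task details: positions are indexed from 0, the predictable sequence cycles through them in a fixed order, and the unpredictable sequence draws a position at random on every trial. The function names are hypothetical and do not reproduce the actual experimental software.

```python
import random

def predictable_cues(n_positions, n_trials):
    """Cued positions cycle in a fixed order: 0, 1, ..., N-1, 0, 1, ..."""
    return [t % n_positions for t in range(n_trials)]

def unpredictable_cues(n_positions, n_trials, seed=0):
    """Cued position is drawn at random on every trial (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.randrange(n_positions) for _ in range(n_trials)]

def is_focus_switch(cues):
    """For each trial after the first, True if the cued position changed."""
    return [cur != prev for prev, cur in zip(cues, cues[1:])]

fixed = predictable_cues(3, 7)
shuffled = unpredictable_cues(3, 7)
```

In the fixed sequence with *N* > 1, every trial requires a focus switch, but the next position is known in advance. In the random sequence, the next position cannot be anticipated, which is the property assumed here to drive the additional demand on cognitive control.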

These results indicated that the unpredictable probe-cue expectancies engender greater cognitive control than the predictable probe-cue expectancies. Since age-related deficits are marked in cognitive control, ToWMA proposed that training in unpredictable versions of memory updating paradigms would engender greater transfer to tasks of fluid cognition in older adults, particularly those subserved by cognitive control (Basak and Zelinski, 2013).

# Working Memory Training and Aging

Results are mixed regarding transfer of N-back training to fluid abilities, where the training was typically adaptive (Jaeggi et al., 2008, 2010; Redick et al., 2013). Also, a lack of transfer in older adults, compared to younger adults, from working memory training has been attributed to age-related differences in patterns of brain activation for the trained and transfer tasks. In an fMRI study, younger adults showed overlap in the left striatum (thought to serve as a gate-keeping function for working memory) between the trained memory updating task and the N-back transfer task. In contrast, older adults showed no such overlap (Dahlin et al., 2008). In keeping with this argument, we have found that individual differences in striatal volume in younger adults (Erickson et al., 2010) and in prefrontal cortex volume in older adults (Basak et al., 2011) are predictive of complex skill acquisition.

On the other hand, a recent meta-analysis (Karbach and Verhaeghen, 2014), which evaluated the effects of task switch training versus working memory training on both younger and older adults, found that both types of training, particularly working memory, engendered transfer in both age groups. However, this meta-analysis did not evaluate the role of cognitive control in working memory training. It is plausible that any cognitive improvements to untrained tasks caused by working memory training resulted from training the ability to sustain and effectively control attention to the relevant information. Importantly, prior studies have only focused on immediate performance gains, compared to baseline performance, of the trained individuals, leaving the long-term benefits of working memory training in older adults unknown.

Although studies focusing on cognitive training in older individuals have failed to provide evidence for broad transfer to untrained cognitive skills (for a review, see Stine-Morrow and Basak, 2011), broader transfer has been observed in younger adults where cognitive control was trained through shifting task priorities in complex video games (Gopher et al., 1989; Kramer et al., 1995, 1999b; Boot et al., 2010; Lee et al., 2012a,b; Prakash et al., 2012; Voss et al., 2012). This approach, called variable priority training, compared to fixed priority training, relies more on the attentional control networks in the brain (Voss et al., 2012). Variable-, compared to fixed-, priority training in dual-task improved performance in a near transfer task (another dual-task) and two far transfer tasks (a running memory task requiring memory updating and a scheduling task requiring cognitive control) in both younger and older adults (Kramer et al., 1995). Variable-priority training also reduced age-related differences in the trained task and yielded greater long-term benefits after 6–8 weeks of completion of training (Kramer et al., 1999b). Moreover, fMRI studies in younger adults have shown differential increases in functional connectivity in the attentional control networks (e.g., fronto-parietal, fronto-executive) favoring variable training on both the trained task and a near transfer dual-task, suggesting that the variable-priority training taught generalizable cognitive control skills (Voss et al., 2012).

Unlike fixed priority training, variable priority training is typically individualized and adaptive, and it engages greater cognitive control by unpredictably shifting task priorities. Therefore, it is not possible to determine which of the two – greater cognitive control or the individualized adaptive nature of the training – is the mechanism of transfer for variable-priority training. Moreover, new evidence suggests that working memory training may be more beneficial than dual-task training in inducing far transfer in older adults (Karbach and Verhaeghen, 2014). To date, studies conducted on variable priority training in older adults have used dual-task as the training paradigm, not working memory (Kramer et al., 1995; Erickson et al., 2007). Moreover, dual-task training in older adults has typically shown limited transfer, usually to another dual-task situation (Kramer et al., 1995; Bherer et al., 2005; Strobach et al., 2015).

On the other hand, working memory updating, compared to dual-tasking, has been argued to be more predictive of fluid intelligence (Friedman et al., 2006). Therefore, unpredictable memory updating training may have the potential to engender broader transfer to fluid cognitive skills than dual-task training. But we are not aware of any research that has explored the potential of unpredictable memory updating training – an approach where items to be updated are randomly retrieved. Existing research utilizing working memory updating tasks to train cognition has consistently employed predictable probe-cue paradigms, such as the N-back task. These studies have yielded mixed results regarding transfer of training (Jaeggi et al., 2008, 2010; Redick et al., 2013).

The current study aimed to fill the aforementioned gap in the literature by explicitly assessing the prediction that unpredictable probe-cue memory updating training would be maximally effective for older adults in engendering broader transfer to fluid cognition. We manipulated probe-cue predictability – unpredictable vs. predictable – across the two types of training. Neither type of training was individualized adaptive. That is, all participants had to undergo all set-sizes in each training session, irrespective of their level of performance. The main research goal was to investigate the predictions of ToWMA, by comparing two different strategies of training working memory – one engendering greater demands on cognitive control than the other. According to ToWMA, the degree of probe-cue predictability affects the demands on cognitive control, such that the unpredictable training (UT) paradigm required more cognitive control than the predictable training (PT) paradigm. Therefore, if cognitive control is the underlying mechanism of transfer in working memory training, then immediate post-testing gains (i.e., just after completion of the training) and, to a lesser extent, delayed post-testing gains (i.e., 1.5 months after completion of the training) were expected to be greater for the UT in the tasks of fluid cognition.

# MATERIALS AND METHODS

# Participants

Forty-six older adults, between 60 and 86 years old, were recruited for this randomized clinical double-blind experiment. Twenty-nine participants were female. All adults were right handed. Inclusion criteria included a minimum of a high school education, normal corrected vision of 20/30, and normal or prehypertension range of blood pressure (*<*140/90 mm Hg) with or without medication. Exclusion criteria included color blindness, low familiarity with computer use, Mini-Mental State Examination (MMSE-2) *<* 25, and prior involvement in any type of cognitive training studies. Recruitment was conducted through flyers posted around the University of Texas at Dallas campus and surrounding businesses, and through advertisements posted in community newspapers. Participants were compensated at \$10/h for their time and effort and were provided a bonus (Wave 1: \$50; Wave 2 that included delayed post-testing: \$100) for completing the multi-session experiment.

### Power Analysis

A total sample size of 46 (combining across the two training groups) provides us with more than 90% power to detect a moderate effect size (f) of 0.25 at the 0.05 alpha level for the interaction term in a 2(Training\_type) × 2(Session) ANOVA (http://www.psycho.uni-dusseldorf.de/abteilungen/gpower3).

Therefore, data collection was stopped when 46 participants were recruited for the study.

**Figure 2** shows the flow of participants and timeline of the study. This study was conducted in two waves, Wave 1 and Wave 2. Participants in Wave 2 were recruited over a longer period that included delayed post-testing. Forty-three participants were used in the analyses (*M*age = 68.81, *SD*age = 5.18, *M*education = 15.05, *SD*education = 2.54), because three participants withdrew from the experiment due to personal reasons (e.g., health).

# Apparatus

All computer-based cognitive tests were programmed in E-prime (Psychology Software Tools, Pittsburgh, PA, USA). The tests were administered on networked PCs with 22" Dell P2213 monitors, set to a resolution of 1920 × 1080 at 60 Hz.

# Procedure

Participants were randomly assigned to one of the two types of training – PT or UT. Participants in the two training groups did not differ in age, education, gender, or mental status (see **Table 1**). The paradigms used for the two types of training were exactly the same with the exception of the temporal dynamics of the probe sequence, which were predictable for PT and unpredictable for UT. The differences in the temporal dynamics of probe retrieval between the two types of training engendered different levels of cognitive control – high control in UT and low control in PT. Participants were trained on the respective paradigms over five 1 h training days spanning 2 weeks, such that 1 week had two sessions and the other week had three sessions. Prior to training, all participants completed a 2 h battery of transfer tasks that assessed their *baseline* pre-training performance. They completed these transfer tasks again after their 2-week training (*immediate post-testing*), allowing us to assess differential immediate post-test training benefits between the two groups. In addition to these two assessment sessions, in Wave 2, participants were asked to return for a final assessment of the transfer tasks after a 1.5 month retention period (*delayed post-testing*). This allowed us to assess differential long-term benefits of training between the two groups. A small attrition rate of 6.52% was observed in this longitudinal 2-month study. No participant was provided any additional training during the retention period in Wave 2. Both groups of participants were told that they were participating in a training study. Therefore, there was no difference in motivation provided to the participants.

# Training Tasks: The N-Match Paradigm

The N-Match paradigm was adapted from our previously designed modified N-back (Verhaeghen et al., 2004; Verhaeghen and Basak, 2005) or random N-back (Basak and Verhaeghen, 2011a) paradigms. In these prior studies, the N differently colored digits were presented in N virtual columns, allowing for both color and location to act as retrieval cues. Multiple cues (location and color) in our modified N-back task facilitated retrieval latency and performance accuracy, when compared to a typical N-back task, where the items were presented in the same color and location (Vaughan et al., 2008). But the location cues require saccadic eye movements that can increase as a function of memory set-size, i.e., N. Therefore, in the current paradigm, all digits appeared in the same location at the center of the screen, with only color as a retrieval cue. That is, the current digit had to be compared with the digit presented immediately before in the same color. This allowed us to compare response times (RTs) of the predictable and the unpredictable versions of the N-Match task without saccadic latencies confounding the RTs, which in turn could exaggerate any differences between the two types of trained tasks. The N in the N-Match task represented the number of different colored information units that a participant had to simultaneously maintain and update during a trial run.

Before each trial run of 40 trials, distinct encoding digits were shown sequentially in N different colors. N varied from 1 to 4, with the probe-color for *N* = 1 as yellow, *N* = 2 as yellow and pink, *N* = 3 as yellow, pink, and red, and *N* = 4 as yellow, pink, red, and green. After the N encoding digits were presented, probe digits were presented on the screen one at a time in one of the N colors. Participants had to compare the identity of the current digit with the digit shown *immediately* before in the same color. If the current digit matched the previously presented same-colored digit, the participant had to press the 'z' key with their left forefinger. If the two digits did not match, the 'm' key needed to be pressed with their right forefinger, and the previous digit in this color needed to be updated with the new one for subsequent comparison. Half of the trials required such updating. The task was self-paced, with a blank screen of 300 ms appearing before the onset of the next digit. This blank mask caused jittering, allowing the participants to distinctly perceive two subsequent identical digits of the same color. A digit stayed on the screen until a key press response was made (see **Figure 3A** for an illustration of a trial run). In line with previous research, the first N encoding-only trials were discarded and only the probe-recognition trials were retained for the analyses (Verhaeghen et al., 2004; Verhaeghen and Basak, 2005; Basak and Verhaeghen, 2011a). The digits were shown in size 18, Courier New font against a black background. Computers were placed approximately 80 cm from the participants.

For *N* = 1, the digits were presented in just one color. Therefore, the participants had to compare the current digit with the previously shown digit, making this equivalent to a typical 1-back task.
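The compare-and-update rule described above can be sketched in a few lines of Python. This is only an illustration of the task logic, not the E-prime implementation used in the study; the function name `expected_responses` and the example digits are hypothetical.

```python
from typing import Dict, List, Tuple

Trial = Tuple[str, int]  # (color, digit)


def expected_responses(encoding: List[Trial], probes: List[Trial]) -> List[str]:
    """Return the correct key for each probe: 'z' for a match, 'm' for a mismatch."""
    # Memory holds the most recent digit seen in each color (set by the encoding trials).
    memory: Dict[str, int] = {color: digit for color, digit in encoding}
    keys: List[str] = []
    for color, digit in probes:
        if memory[color] == digit:
            keys.append("z")        # match: the stored digit stays as it is
        else:
            keys.append("m")        # mismatch: the stored digit must be updated
            memory[color] = digit
    return keys


# Hypothetical N = 3 run: three encoding digits followed by five probes.
encoding = [("red", 3), ("yellow", 7), ("pink", 1)]
probes = [("red", 3), ("yellow", 2), ("yellow", 2), ("pink", 4), ("red", 5)]
print(expected_responses(encoding, probes))  # ['z', 'm', 'z', 'm', 'm']
```

Each color acts as an independent memory slot, so a participant at set-size N has to maintain N digits and update one of them on every mismatch trial.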

#### Unpredictable N-Match Paradigm

For *N >* 1, more than one colored digit was presented, necessitating a switch from one color cue to another. Multiple colored units also varied the temporal dynamics of probe presentation as a function of the type of training. The probe (i.e., the color) sequence in the unpredictable N-Match task was random. For example, for *N* = 3, after the initial encoding stimuli in three colors, viz., red (R), yellow (Y), and pink (P), were shown, a probe-color sequence could be RYYYPPYRRR (see **Figure 3A** for an example of a UT trial). Half of the trials in a trial run were switch trials, where the probe-color for the current digit was different from that of the previously presented digit in the sequence. As mentioned before, half of the trials in a trial run were also update trials.

#### TABLE 1 | Demographic information including mean (SD) of age, education and mental status.


*Mean differences for UT and PT groups were tested using independent samples t-test for continuous variables and chi-square tests for categorical variables. None of the tests were significant at p < 0.05.*

digits appeared in the same color for two consecutive trials. The first three trials were "encoding only" trials where no response was required, because there were no prior trials to compare with. The subsequent trials required responses. "Blank" represents a black screen presented for 300 ms to allow for distinction between two consecutively presented same-colored digits of equal identities.

#### Predictable N-Match Paradigm

The main difference between the unpredictable and predictable versions of the task was that in the latter the probe sequences were predictable. Again, half of the trials in a trial run were switch trials. This was achieved in each trial run by presenting each probe-color twice in a sequence before switching to another probe-color. For example, for *N* = 3, a probe-color sequence could be RRYYPPRRYY (see **Figure 3B** for an example of a PT trial). Such a sequence allowed for alternating switch and non-switch trials, unlike the predictable N-back paradigms used in prior research, which have always necessitated a switch from one color to another, and, therefore, had no non-switch trials (Verhaeghen et al., 2004; Verhaeghen and Basak, 2005). In sum, the only difference between the two types of training, UT vs. PT, was the probe-cue sequencing. The probability of switch, the proportion of trials requiring updating, the set-size (N), the type of retrieval-cue (color), the number of blocks, and the amount of training (5 h) were equivalent for the two groups.
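The two probe-color schedules can be illustrated with a short Python sketch. The predictable generator follows the "each color twice, then switch" rule stated above. The unpredictable generator is an assumption: the text says only that the sequence was random with half of the trials being switch trials, so the sketch draws a switch with probability 0.5 on each trial and is meant for *N* > 1. All function names are hypothetical.

```python
import random
from typing import List


def predictable_sequence(colors: List[str], n_trials: int) -> List[str]:
    """Cycle through the colors in a fixed order, showing each one twice before switching."""
    return [colors[(i // 2) % len(colors)] for i in range(n_trials)]


def unpredictable_sequence(colors: List[str], n_trials: int, rng: random.Random) -> List[str]:
    """Random color order with a switch probability of 0.5 per trial (assumed; needs N > 1)."""
    seq = [rng.choice(colors)]
    while len(seq) < n_trials:
        if rng.random() < 0.5:
            # Switch: pick any color other than the one just shown.
            seq.append(rng.choice([c for c in colors if c != seq[-1]]))
        else:
            # Non-switch: repeat the previous color.
            seq.append(seq[-1])
    return seq


def switch_proportion(seq: List[str]) -> float:
    """Share of trials, from the second onward, whose color differs from the previous trial."""
    switches = sum(a != b for a, b in zip(seq, seq[1:]))
    return switches / (len(seq) - 1)


print("".join(predictable_sequence(["R", "Y", "P"], 10)))  # RRYYPPRRYY
print("".join(unpredictable_sequence(["R", "Y", "P"], 10, random.Random(1))))
```

In the predictable schedule the next color is always known in advance, whereas in the unpredictable one both the timing of a switch and its target color are uncertain, even though the long-run switch rate is the same.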

Irrespective of the type of training, each 1 h training day was divided into five blocks. In each block, the set-size (i.e., N) varied from small-to-large-to-small (i.e., 1-Match, 2-Match, 3-Match, 4-Match, 4-Match, 3-Match, 2-Match, 1-Match). This yielded 80 trials per N. In each training day, the first four blocks were on the training task, but the fifth block was on the predictable N-Match task (i.e., the *comparison task*). The motivation behind this block was to assess any difference in learning across the 5 days between the two groups in the lower cognitive control version of the task. This block of the *comparison task* was analyzed separately from the other four training task blocks.

# Transfer Tasks

Transfer tasks were selected using a construct approach; see **Table 2** for details about the tasks, constructs and forms used. The paper pencil tasks had two parallel forms to allow for multiple assessments.

#### Near Transfer Tasks

#### *Backward span*

Both forward and backward digit span tasks were taken from the Working Memory Index (WMI) in the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 1939). Digits were presented verbally in an incremental set-size. Participants had to repeat the digits in the same order (forward) or the reverse order (backward) of the presentation. Backward span, in addition to encoding, storage and retrieval involved in forward span, requires coordination of information units. Therefore, it is considered to be a measure of working memory.


TABLE 2 | Transfer tasks completed by both training groups at baseline, immediately post training, and delayed post testing.

*A and B refer to parallel forms of the task. \*Computerized Tasks. CC* = *Cognitive Control.*

#### *Task switching*

The task switching paradigm utilized in this study is similar to that used in cognitive training (e.g., Kramer et al., 1999a; Basak et al., 2008), where the background color of the stimuli determined the task at hand. If the background was blue, participants indicated whether the digit presented was higher ('z' key) or lower ('/' key) than the digit 5. If the background was pink, participants indicated whether the digit was odd ('z' key) or even ('/' key). The digit 5 was never used and participants were required to use two hands to respond. The stimuli were presented in the center of the screen for 1500 ms. Participants completed two single task blocks, one for each task. This was followed by two multi-tasking blocks – one where the two tasks were interleaved predictably (e.g., Blue Blue Pink Pink Blue Blue*...*) and another where the tasks were interleaved randomly. The primary measure was the switch cost, i.e., the residuals obtained from regressing the average RT of switch trials on the average RT of non-switch trials. The *DualSwitchCost* was obtained from data of all trials, and the *UnpredSwitchCost* was calculated from data of just the unpredictable trials. These two switch costs were considered to be measures of cognitive control.
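A residualized switch cost of this kind can be sketched as follows. The sketch assumes an ordinary least-squares regression across participants with mean switch RT as the outcome and mean non-switch RT as the predictor; the exact regression specification is not given in the text, and the function name and RT values are hypothetical.

```python
from statistics import mean
from typing import List


def residual_switch_costs(nonswitch_rt: List[float], switch_rt: List[float]) -> List[float]:
    """Residuals from regressing mean switch RT on mean non-switch RT across participants."""
    mx, my = mean(nonswitch_rt), mean(switch_rt)
    sxx = sum((x - mx) ** 2 for x in nonswitch_rt)
    sxy = sum((x - mx) * (y - my) for x, y in zip(nonswitch_rt, switch_rt))
    slope = sxy / sxx
    intercept = my - slope * mx
    # A positive residual means slower switch RTs than predicted from non-switch RTs.
    return [y - (intercept + slope * x) for x, y in zip(nonswitch_rt, switch_rt)]


# Hypothetical mean RTs (ms) for four participants.
nonswitch = [600.0, 700.0, 800.0, 900.0]
switch = [850.0, 900.0, 1100.0, 1150.0]
print([round(r, 1) for r in residual_switch_costs(nonswitch, switch)])
```

Unlike a simple difference score (switch minus non-switch RT), the residual removes the part of the switch RT that is linearly predictable from baseline speed, so generally slower participants are not automatically assigned larger switch costs.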

#### Far Transfer Tasks

#### *Raven's Advanced Progressive Matrices*

In Raven's Advanced Progressive Matrices (RAPM), participants were instructed to find the missing abstract pattern from a 3 × 3 matrix of complex visual designs. The missing pattern was one of eight possible choices that the participant was presented with. The full version of 36 items was divided into two sub-sets of 18 questions of the same difficulty level. Version A included the even questions from the first half and the odd questions from the second half. Version B included the opposite combination of task questions. Participants were given 30 min to complete as many of the 18 abstract puzzles as possible. RAPM is an abstract reasoning task (Raven, 1942).

#### *Story Recall*

A short story from the MMSE-2: Expanded Version (with two parallel versions) was read out to the participants, who were prompted to remember as many details from the story as possible. The number of correctly recalled details (with a maximum possible score of 25) was used as a measure of episodic memory (Folstein et al., 1975; Folstein et al., 2010).

Additional tasks, viz., Forward Span, Digit Symbol Substitution Test (DSST; from the MMSE-2: Expanded Version) and single RTs (from the single task blocks of the task switching paradigm), were assessed to establish whether training-related changes were due to improvements in processing speed or short-term memory capacity.

# Analysis Techniques to Assess Learning-related Changes in the Temporal Dynamics in the Trained Tasks

Individual learning rates for items within the FoA (i.e., 1-Match trials) and for items outside of the FoA (i.e., 2-, 3-, and 4-Match trials) were calculated by fitting power functions ($Y = aX^{b}$) to the RTs across the five training days. Parameter b represented the rate of learning. Individual learning rates were assessed for the first four blocks of each training day to evaluate the differences in learning both inside and outside the FoA for the two types of training. This resulted in four learning rates, viz., inside the FoA for *N* = 1 trials for PT (*inFoA PT lng*), inside the FoA for *N* = 1 trials for UT (*inFoA UT lng*), outside the FoA for PT (*outFoA PT lng*), and outside the FoA for UT (*outFoA UT lng*). Moreover, data from the fifth block of each training day yielded learning rates for all participants on the comparison task, i.e., the predictable N-Match task.
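One simple way to estimate the parameters of such a power function is a least-squares fit on log-transformed values, since $\log Y = \log a + b \log X$. The sketch below takes that route as an assumption; the text does not state which fitting procedure was used, and a nonlinear least-squares fit on the raw RTs could give slightly different estimates. The function name and RT values are hypothetical.

```python
import math
from typing import List, Tuple


def fit_power_function(days: List[float], rts: List[float]) -> Tuple[float, float]:
    """Fit Y = a * X**b by least squares on log(Y) = log(a) + b * log(X)."""
    lx = [math.log(x) for x in days]
    ly = [math.log(y) for y in rts]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx                 # learning rate: negative when RTs speed up with practice
    a = math.exp(my - b * mx)     # predicted RT on Day 1 (X = 1)
    return a, b


# Hypothetical mean RTs (ms) for one participant across the five training days.
days = [1.0, 2.0, 3.0, 4.0, 5.0]
rts = [1210.0, 1095.0, 1040.0, 990.0, 965.0]
a, b = fit_power_function(days, rts)
print(f"a = {a:.1f} ms, b = {b:.3f}")
```

A more negative b corresponds to a steeper drop in RT over days, which is how the learning rates reported in the Results (e.g., −0.14 for PT and −0.18 for UT inside the FoA) can be read.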

# RESULTS

Outlier corrections on a participant-by-participant basis for each condition were conducted by deleting trials with RTs more than 3 *SD* below or above the mean; trials with RTs *<* 200 ms were also removed. Average RT for each individual, for each condition, was derived from accurate trials only. The alpha level for statistical significance was 0.05; *p*-values were Greenhouse-Geisser corrected for sphericity.
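The trimming rule can be sketched for the trials of one participant in one condition. Two details are assumptions, since the text does not specify them: the 200 ms floor is applied before the mean and *SD* are computed, and the 3 *SD* criterion is applied in a single pass rather than iteratively. The function name and RT values are hypothetical.

```python
from statistics import mean, stdev
from typing import List


def trim_rts(rts: List[float], n_sd: float = 3.0, floor_ms: float = 200.0) -> List[float]:
    """Drop RTs below the floor, then RTs more than n_sd SDs from the mean (single pass)."""
    kept = [rt for rt in rts if rt >= floor_ms]
    if len(kept) < 2:
        return kept               # too few trials to estimate an SD
    m, sd = mean(kept), stdev(kept)
    return [rt for rt in kept if abs(rt - m) <= n_sd * sd]


# Hypothetical RTs (ms): 20 ordinary trials, one anticipation, and one very slow trial.
rts = [600.0 + 10 * i for i in range(20)] + [150.0, 5000.0]
clean = trim_rts(rts)
print(len(rts), "->", len(clean))  # 22 -> 20
```

With only a handful of trials a single extreme value cannot exceed 3 *SD* of its own sample, so a rule like this is only informative when each condition has a reasonable number of trials, as the 40-trial runs here do.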

# Temporal Dynamics of Predictable vs. Unpredictable N-Match Task at Baseline

The temporal dynamics of the unpredictable and predictable tasks were investigated using Day 1's RT performance in the four different Ns. First, a separate univariate repeated measures ANOVA, with set-size (*N* = 1, 2, 3, and 4) as a factor, was conducted for each training group. The main effect of N for the predictable group was significant, *F*(1.20,23.95) = 5.48, *p* = 0.02, *MSE* = 135840.88, $\eta_p^2$ = 0.22. *Post hoc* multiple comparisons, using the repeated contrasts, indicated that RTs of *N* = 2 were significantly slower than RTs of *N* = 1 by approximately 250 ms, *F*(1,20) = 8.82, *p <* 0.01, *MSE* = 159084.84, $\eta_p^2$ = 0.31. In the unpredictable task, the main effect of N was also significant, *F*(1.52,31.87) = 5.00, *p* = 0.02, *MSE* = 55489.48, $\eta_p^2$ = 0.19. RTs of *N* = 2 were significantly slower than RTs of *N* = 1 by approximately 200 ms, *F*(1,21) = 10.50, *p* = 0.004, *MSE* = 77863.52, $\eta_p^2$ = 0.33 (see **Figure 4A**: Day 1). Therefore, in both versions of the task, we found evidence of focus switch costs, indicated by a significant increase in RTs from *N* = 1 to 2. They are comparable to the focus switch costs from past studies on older adults in both the N-back and the random N-back tasks (240–500 ms; Verhaeghen and Basak, 2005; Vaughan et al., 2008; Basak and Verhaeghen, 2011a).

To test whether focus switching affected the temporal dynamics of items in the outer store, two separate 3 (*N* = 2, 3, and 4) × 2 (Switch\_type: switch vs. non-switch) ANOVAs, one for UT and another for PT, were conducted on the Day 1 RTs. For the predictable task, the main effect of N was not significant, *F*(1.42,28.34) = 1.26, *p* = 0.28, *MSE* = 30260.24, $\eta_p^2$ = 0.06, suggesting that the focus switch cost remained unchanged outside the FoA. But the main effect of Switch\_type was significant, *F*(1,20) = 10.26, *p* = 0.004, *MSE* = 198324.4, $\eta_p^2$ = 0.34, suggesting that the switch RTs were slower than the non-switch RTs. Importantly, the N × Switch\_type interaction was not significant, *F*(1.30,25.93) = 2.17, *p* = 0.15, *MSE* = 10040.62, $\eta_p^2$ = 0.10, suggesting no difference in RT slopes between the switch and non-switch trials. For the unpredictable task, both main effects of N, *F*(1.21,25.30) = 4.61, *p* = 0.04, *MSE* = 38685.07, $\eta_p^2$ = 0.18, and Switch\_type, *F*(1,21) = 10.52, *p* = 0.004, *MSE* = 123867.72, $\eta_p^2$ = 0.33, were significant. Switch trials were slower than non-switch trials. *Post hoc* comparisons using repeated contrasts indicated that the RTs for *N* = 2 and *N* = 3 were the same (*F <* 1), but were significantly faster for *N* = 4 compared to *N* = 3, *F*(1,21) = 8.99, *p <* 0.01, *MSE* = 9334.36, $\eta_p^2$ = 0.30; the latter unexpected result could be due to speed-accuracy tradeoff evidenced by near-chance performance at *N* = 4 (60% accuracy). Importantly, the interaction was not significant, *F*(1.34,28.16) = 1.10, *p* = 0.32, *MSE* = 15778.77, $\eta_p^2$ = 0.05, indicating that the RT slopes of the switch and non-switch trials were the same.

# Learning-Related Changes in the Temporal Dynamics in the Trained Tasks

Differential learning-related changes in the temporal dynamics (RTs) of the probe retrieval across the 5 days were investigated using a 2 (Training\_type) × 5 (Day) × 4 (N) ANOVA, where Day and N were within-subjects factors and Training\_type was a between-subjects factor. The main effect of Day was significant, *F*(2.53,101.04) = 5.81, *p* = 0.002, *MSE* = 272139.62, $\eta_p^2$ = 0.13, suggesting that significant learning on the trained task happened over 5 days. *Post hoc* comparisons using repeated contrasts revealed significant differences between Days 2 and 3, *F*(1,40) = 4.50, *p* = 0.04, *MSE* = 101953.72, $\eta_p^2$ = 0.10, and a marginally significant difference between Days 3 and 4, *F*(1,40) = 3.59, *p* = 0.07, *MSE* = 15668.44, $\eta_p^2$ = 0.08, suggesting rapid learning after Day 2. The main effect of N was significant, *F*(1.24,49.73) = 15.28, *p <* 0.001, *MSE* = 370949.00, $\eta_p^2$ = 0.28. *Post hoc* comparisons using repeated contrasts indicated a significant focus switch cost from *N* = 1 to 2 regardless of the type or day of training, *F*(1,40) = 22.93, *p <* 0.001, *MSE* = 87390.72, $\eta_p^2$ = 0.36. The main effect of Training\_type and its interactions with other variables were non-significant (*p*'s *>* 0.34). This suggests that the learning rates for the two versions of the N-Match task were equivalent over 5 days of training. This was corroborated by non-significant differences between the learning rates of the two types of training for both inside the FoA [−0.14 for PT vs. −0.18 for UT, *t*(41) = 0.85, *p* = 0.22] and outside the FoA [−0.12 for PT vs. −0.11 for UT, *t*(41) = 0.47, *p* = 0.65] (Supplementary Figure S1)<sup>1</sup>.

To test whether the switch and non-switch RTs changed differentially with extensive practice, a 2 (Training\_type) × 5 (Day) × 3 (*N* = 2, 3, and 4) × 2 (Switch\_type) ANOVA was conducted (see **Figure 4B**). The main effect of Day, *F*(2.58,103.35) = 3.90, *p* = 0.02, *MSE* = 661181.2, $\eta_p^2$ = 0.09, main effect of Switch\_type, *F*(1,40) = 22.19, *p <* 0.001, *MSE* = 400251, $\eta_p^2$ = 0.36, Day × Switch\_type interaction, *F*(2.04,81.67) = 3.96, *p* = 0.02, *MSE* = 41204.88, $\eta_p^2$ = 0.09, and N × Switch\_type interaction, *F*(1.35,54.00) = 5.57, *p* = 0.01, *MSE* = 33036.89, $\eta_p^2$ = 0.12, were found to be significant. All other main effects and interactions were not significant. These results suggest that although the focus switch cost was greater for larger set-sizes, extensive practice brought greater improvements to the switch latencies than the non-switch latencies.

To assess whether the two training groups differed in the comparison (predictable) task, we conducted a 2 (Training\_type) × 5 (Day) × 4 (*N* = 1, 2, 3, and 4) ANOVA. Neither the main effect of Training\_type nor its interactions with the other variables were significant, suggesting that UT was not worse than PT in the predictable N-Match task (**Figure 4C**).

To evaluate whether the FoA expanded with 5 h of practice, defined by negligible difference in switch and non-switch RTs at *N* ≥ 2, we conducted separate 3 (*N* = 2, 3, and 4) × 2 (Switch\_type) ANOVAs for predictable and unpredictable RTs from the final day of the training (Day 5). The results were similar to those from Day 1. For both PT and UT groups, the main effect of N and the N × Switch\_type interaction were not significant. But the main effect of Switch\_type was significant; PT: *F*(1,19) = 6.97, *p* = 0.02, *MSE* = 73033.68, $\eta_p^2$ = 0.27; UT: *F*(1,21) = 16.18, *p* = 0.001, *MSE* = 50003.70, $\eta_p^2$ = 0.44. That is, even after 5 h of practice, the focus switch cost was significant in both versions of the task, suggesting that FoA still held only one information unit. This was corroborated by a significant difference in RTs between *N* = 1 and 2 at Day 5; PT: *t*(19) = −2.86, *p* = 0.01; UT: *t*(21) = −3.14, *p <* 0.01.

<sup>1</sup>Also, the two learning rates in the comparison task, viz., *inFoA Comp lng* and *outFoA Comp lng*, did not differ between the two types of training.

# Transfer of Training

Although learning-related changes in the two types of training did not vary, it is plausible that the high cognitive control training (UT) may engender greater transfer to unrelated untrained constructs than low cognitive control training (PT). Our effect sizes, similar to previous training studies, were expected to range from medium ($\eta_p^2$: 0.06 to 0.14) to large ($\eta_p^2$ *>* 0.14; Cohen, 1988; Tabachnick and Fidell, 2007). Means and standard deviations of the measures of the transfer tasks for PT and UT are provided in Supplementary Table S1. No significant difference was observed between UT and PT in any of the transfer measures at baseline (see Supplementary Materials).

In line with previous intervention studies, separate analyses of covariance (ANCOVAs) were conducted on each transfer task measure to evaluate differences between the two training groups (PT vs. UT) at both immediate post-testing and delayed post-testing (e.g., Boot et al., 2010). In repeated measures analyses, any significant Training\_type × Session interaction may be due to regression toward the mean, and not due to the differential training effects. Therefore, to ensure that any post-testing performance gains in the UT, compared to the PT, were not due to regression toward the mean, we used ANCOVA that accounted for individual differences at baseline. In addition, inFoA lng was used as a covariate, because it accounted for individual differences in changes in learning speed in the easiest condition (*N* = 1). This condition was common across the two training groups. Results are depicted in Supplementary Table S2. A medium effect, albeit of marginal significance, in favor of the UT group immediately after the training was found for two far transfer tasks: story recall, *F*(1,39) = 3.09, *p* = 0.09, *MSE* = 11.04, $\eta_p^2$ = 0.07, and RAPM, *F*(1,39) = 2.73, *p* = 0.10, *MSE* = 4.27, $\eta_p^2$ = 0.07 (**Figure 5**).

Another analysis technique to assess transfer effects is to pool scores from different testing sessions and then conduct a rank-ordered Blom transformation (Blom, 1958). Rank-order transformations, e.g., Tukey and Blom transformations, have previously been used to correct for within-variable errors (Knoke, 1991). Blom transformations, however, produce a better fit to the normal distribution than Tukey transformations (Bonate, 2000). Pooling of baseline and immediate post-test dependent measures and pooling of the dependent measures from all three testing sessions were conducted separately to take into account the varied number of participants in the immediate post-testing and the delayed post-testing sessions. This resulted in a slight difference in the ranking within the dependent variables. Blom transformed data allowed us to conduct repeated measures ANOVAs for each transfer task measure on normally distributed data. Separate analyses were conducted to compare immediate post vs. baseline and delayed post vs. baseline, with inFoA lng as a covariate. Thus, multiple 2 (Session) × 2 (Training\_type) repeated measures ANOVAs were conducted on these data to assess the interaction effects. These analytic methods followed those conducted in previous cognitive training studies involving older adults, both for short-term targeted interventions lasting for 10 h (e.g., ACTIVE Study; Ball et al., 2002) and for long-term nontargeted interventions spanning months that were directed at changing the lifestyle of older adults (Chan et al., 2014).
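The pooled Blom transformation can be sketched as follows. The sketch uses the standard Blom formula, $z = \Phi^{-1}\big((r - 3/8)/(n + 1/4)\big)$, where $r$ is the rank of a score among the $n$ pooled scores, and assigns tied scores their average rank. The tie handling, the function name, and the example scores are assumptions made for illustration.

```python
from statistics import NormalDist
from typing import List


def blom_transform(scores: List[float]) -> List[float]:
    """Rank-based normal scores: z = Phi^-1((r - 3/8) / (n + 1/4)), ties share the mean rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with the score at sorted position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


# Hypothetical story recall scores for four participants at two sessions.
baseline = [12.0, 15.0, 9.0, 18.0]
post = [14.0, 15.0, 13.0, 20.0]
pooled = blom_transform(baseline + post)     # rank across both sessions together
baseline_z, post_z = pooled[:len(baseline)], pooled[len(baseline):]
gains = [p - b for b, p in zip(baseline_z, post_z)]
print([round(g, 2) for g in gains])
```

Because the ranks are computed on the pooled scores, a participant's transformed value depends on which sessions are pooled, which is why pooling two versus three sessions gives slightly different rankings.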

The results of the main effects and interactions are provided in **Table 3**. A medium effect size was observed for the Session × Training\_type interaction at immediate post-test session favoring UT for the story recall task, *F*(1,40) = 4.23, *p <* 0.05, *MSE* = 0.39, $\eta_p^2$ = 0.10, and for the RAPM, *F*(1,40) = 2.38, *p* = 0.13, *MSE* = 0.23, $\eta_p^2$ = 0.06. No other tasks showed medium effects for the interaction term, either at immediate or delayed post-test sessions<sup>2</sup>.

# Individual Differences in Gain Scores, Type of Training, and Learning Rates

In this secondary set of analyses, gain scores for each individual in all transfer tasks were calculated by subtracting their baseline Blom transformed score from their immediate post-test Blom transformed score. Positive scores indicated a training-related gain in the performance of the transfer task, whereas negative scores indicated a training-related loss. A larger number of UT participants had positive gain scores, compared to PT participants, in the two tasks that had yielded medium effect sizes of transfer in the prior analyses, viz., Story Recall (**Figure 6C**) and RAPM (**Figure 6B**). In contrast, we found no observable differences in gain scores between UT and PT in backward span, a near-transfer task of working memory (**Figure 6C**). The difference in the number of participants who exhibited a positive gain score was greatest for story recall.

It is plausible that the individuals who showed positive gains also started with higher cognitive abilities or had a greater general learning ability. On the other hand, if UT is a better strategy to induce transfer, then the correlation between gain scores and learning rate may be significantly higher for UT than PT. Individual gain scores for the transfer tasks, baseline performances of the transfer tasks, the four previously described learning rates, and the two learning rates for the comparison task (*inFoA Comp lng*, *outFoA Comp lng*) were subjected to non-parametric bivariate correlation analyses, represented by a heat map (**Figure 7**), where the strength of these relationships varies by the warmth of the color. Darker reds indicate a stronger positive correlation, whereas darker blues indicate a stronger negative correlation. All measures of RT were reverse coded (since slower RTs represent poorer performance). Therefore, in **Figure 7**, larger individual scores on each variable represented better performance.

In the UT group, significant positive correlations were observed between learning rates for the outer store and some of the measures of the transfer tasks, particularly, gain in story recall (*r* = 0.56, *p* = 0.01), baseline unpredictable switch cost (*r* = 0.52, *p* = 0.02), and baseline dual switch cost (*r* = 0.55, *p* = 0.01). Additionally, a marginally significant correlation was observed between learning rates for the outer store and gains in backward span (*r* = 0.39, *p* = 0.07). No significant positive relationships were observed for individuals in the PT group, even at *p* = 0.01. Therefore, individuals with greater changes in the temporal dynamics of the unpredictable task showed larger gains in story recall and, to some extent, backward span.

Using Fisher *r*-to-*z* transformation, we assessed whether the correlation coefficients for UT were higher than PT. For baseline scores in task-switching, the correlation coefficients were significantly higher for UT than PT: dual switch cost (*z* = 1.61, *p* = 0.05), unpredictable switch cost (*z* = 1.88, *p* = 0.03). For gain scores in the transfer tasks, the correlation coefficient was significantly higher for UT than PT in the backward span (*z* = 2.36, *p <* 0.01), and was marginally higher for UT than PT in the story recall (*z* = 1.28, *p* = 0.10).
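The comparison of two independent correlations via the Fisher transformation can be sketched as below. Each coefficient is transformed with $z_r = \operatorname{atanh}(r)$, and the difference is divided by its standard error, $\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$. The one-tailed *p*-value is an assumption, chosen because the *z* and *p* pairs reported above (e.g., *z* = 1.88 with *p* = 0.03) appear consistent with one-tailed tests. The function name and the PT coefficient in the example are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Test whether r1 (group 1) exceeds r2 (group 2) for two independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)           # Fisher r-to-z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # SE of the difference z1 - z2
    z = (z1 - z2) / se
    p_one_tailed = 1.0 - NormalDist().cdf(z)          # H1: r1 > r2 (assumed direction)
    return z, p_one_tailed


# UT: r = 0.56 (reported above for gain in story recall), assuming n = 22;
# PT: r = 0.20 is a made-up value for illustration, assuming n = 21.
z, p = compare_correlations(0.56, 22, 0.20, 21)
print(f"z = {z:.2f}, one-tailed p = {p:.3f}")
```

The transformation is derived for Pearson correlations; applying it to the non-parametric coefficients used here is a common approximation rather than an exact test.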

When the two groups were combined in the comparison task, it provided us with more power to assess positive correlations between general learning rate in the N-Match task and the other cognitive tasks. Significant positive correlations were observed between learning rates for the outer store in the comparison task and performance gains in selected transfer tasks (story recall, *r* = 0.37, *p* = 0.01; backward span, *r* = 0.32, *p* = 0.04), although these results could be driven by the UT group.

# DISCUSSION

We found selective transfer effects to the tasks of fluid cognition in the older adults who were trained on our novel memory updating approach, i.e., unpredictable probe-cue expectancies, compared to those older adults who were trained on the standard predictable paradigm. Our conclusions on group differences in the transfer effects were based on the effect sizes, because our research questions were formulated in estimation terms (Cumming, 2014). This is in line with recent research that has brought up issues of replication in psychology (Open Science Collaboration, 2015). Immediate post-testing transfer effects of medium effect size favoring the UT were found in *story recall*, a verbal episodic memory task, as evidenced by two different types of planned analyses that have typically been used in prior studies of cognitive training, viz., repeated measures ANOVA on the Blom transformed data and analyses of covariance (ANCOVA), where baseline scores are accounted for. We also observed a medium effect size of transfer, albeit not significant at *p* = 0.05, immediately after training favoring the UT in *RAPM* in the ANCOVA analyses. Therefore, we consider these results to be weaker in comparison to those from the story recall. These differential selective transfer effects could not be merely explained by speeded learning of the trained task by the UT group, compared to the PT group. Because the two training tasks were similar in all aspects (e.g., switch probability, updating probability, set-size, stimulus-response relationship, identity judgment) but one (probe-cue expectancy), we expected to find smaller effect sizes of transfer than those in typical working memory training studies. Yet, for story recall, a medium effect size of immediate transfer was observed in both the ANCOVA and repeated measures analyses, favoring the UT group.

<sup>2</sup>A JZS Bayes factor Repeated Measure ANOVA (Rouder et al., 2012; Love et al., 2015; Morey and Rouder, 2015) was implemented to support these findings. For story recall, the interaction model between Session (baseline vs. immediate post) and Training\_type was preferred to the main effects models by a Bayes factor of 1.452.


TABLE 3 | Repeated measures ANOVA calculated using Blom transformed measures at baseline vs. immediate-post training session, and baseline vs. delayed-post sessions.

*Session refers to Testing\_session.*

These results are in accordance with the Theory of Working Memory Adaptability (ToWMA) that proposes that cognitive control is the mechanism of transfer in working memory training. ToWMA predicts that the focus switch costs encountered when shifting between actively manipulated items are greater for sequences where probe-cue expectancies are more unpredictable. That is, unpredictable probe-cue sequences are hypothesized to require greater cognitive control than predictable probe-cue sequences, especially in older adults who have a marked deficit in cognitive control. If enhanced cognitive control in working memory is the mechanism of transfer in older adults, then unpredictable, when compared to predictable, training may show greater benefit to tasks of fluid cognition that are argued to be subserved by cognitive control. This hypothesis was supported by our ANCOVA/ANOVA analyses on the differential improvements in the transfer tasks, where UT had a more selective effect of transfer, which was medium-sized, to story recall than PT. Additionally, individual gain plots in **Figure 6** showed that more individuals in the novel, UT group had positive immediate performance gains in story recall (and RAPM) than the standard, PT group. These results indicate that just 5 h of training on unpredictable memory updating probe-cue expectancies may engender greater transfer to selective untrained skills than PT. Since the differential effect sizes of transfer favoring the novel training were at most medium, more studies are needed to further test the role of cognitive control in working memory training. They could explore the effects of training dose (e.g., 10 h instead of 5 h), consider multiple tasks of episodic memory, and/or compare our novel training paradigm to other types of working memory training approaches, where improved cognitive control could not be argued as the underlying mechanism of transfer.

In addition to conducting group-difference analyses to assess performance improvements at post training compared to the baseline by using ANOVAs/ANCOVAs, we also investigated the relationships between an individual's ability to learn the trained tasks and gains in untrained cognition. It is plausible that individuals with higher cognitive abilities or learning ability may show the most cognitive gains irrespective of the type of working memory training. On the other hand, if UT is a better strategy to enhance cognition, then the correlation between the gain scores and the learning rates may be limited to the individuals who received the UT. Since accuracy for all individuals for the items inside the focus of attention (i.e., *N* = 1) was near perfect from Day 1, we were interested in the individual differences in the learning rates for items outside the focus of attention, where focus switching was necessitated between the actively held information units. In the novel UT group, greater efficiency in the temporal dynamics for manipulating and updating items in the unpredictable task was found to be significantly associated with larger gains in story recall, a verbal episodic memory task, and marginally significantly associated with larger gains in backward span, a near transfer working memory task. To counteract the issue of small sample size for these correlation analyses, Fisher's *z*-tests were conducted to compare the correlation coefficients between the two training groups. The correlation coefficient in UT, compared to the PT, was significantly higher for backward span and marginally higher for story recall. Given that none of the measures of learning rates were significantly different between UT and PT (see Supplementary Figure S1A), the results from the *z*-tests supported ToWMA's prediction that UT may be a better strategy to improve fluid cognition (e.g., backward span, story recall).
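The Fisher's *z*-test used to compare correlations from two independent groups can be sketched as follows. Each correlation is converted with the Fisher transformation (the inverse hyperbolic tangent), and the difference is divided by the standard error sqrt(1/(n1 − 3) + 1/(n2 − 3)). This is a minimal illustration of the standard procedure; the function name and the input values are hypothetical and are not the correlations or sample sizes reported in the study.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Two-tailed test of whether two independent correlations differ."""
    # Fisher transformation of each correlation coefficient.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference between the transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical correlations and group sizes, for illustration only.
z_stat, p_value = fisher_z_test(r1=0.55, n1=20, r2=-0.10, n2=20)
```

Because the standard error depends only on the two group sizes, small groups yield wide uncertainty, so a difference between correlations must be large before the test flags it.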

Also, according to ToWMA, unpredictable probe-cue expectancy training engenders greater demands on cognitive control than PT. In line with this theory, we found that the participants who exhibited more efficient cognitive control before the training, as evidenced by the unpredictable switch costs and dual switch costs of the task switching paradigm, learned the UT task faster. No such relationships were observed in the PT group. Furthermore, correlation coefficients between these baseline measures and learning rates of the trained tasks were significantly higher for UT than PT. Though these were exploratory analyses and the lack of correlation in the PT group can be attributed to small sample size, we argue that such analyses should be incorporated in future studies. They would provide a better understanding of who would benefit most from a specific training strategy, and allow us to devise the best strategies to improve cognition in older adults. So far, our exploratory analyses suggest that older adults with more efficient cognitive control may show larger gains in selective tasks of fluid cognition with unpredictable memory updating training.

Although training working memory has gained popularity in the last decade, the mechanism of transfer that would allow us to determine the best strategies to improve cognition has not yet been systematically studied. In training studies, be it cognitive or fitness training, participants are typically recruited and assigned to either the training or a control group. Unlike fitness training studies where the controls are trained on a different type of fitness (e.g., aerobic compared to anaerobic; Erickson et al., 2012), cognitive training studies are fraught with issues regarding the use of an appropriate control group. The type of control group in any clinical trial allows us to make inferences about the power of the training-related benefits (Boot et al., 2011). If the control group consists of a no-contact control, then we cannot determine the mechanism of transfer. It is possible that mere participation in a training program induces cognitive benefits and, therefore, comparing working memory training with a passive control group does not provide us with knowledge on whether it is the training of working memory *per se* that is improving cognition. Yet, many recent studies have continued to use no-contact controls to compare against the working memory training group (Zinke et al., 2012, 2014; Heinzel et al., 2014; Stepankova et al., 2014). This methodological issue is not evident in other types of cognitive training, e.g., video game training (for a review, see Boot et al., 2011). In contrast, if a study uses a placebo group as a control, e.g., social engagement or questionnaires (Borella et al., 2010, 2014; Carretti et al., 2013), it is plausible that differences in cognitive benefits between the training group and the control group are associated with the participant's belief that the experimental treatment should have an effect. Moreover, often in active control studies, the experimental group is trained on an individualized adaptive paradigm, whereas the control group is trained on a non-adaptive paradigm (e.g., variable priority training vs. fixed-priority training). Choosing an appropriate control group will, therefore, have implications for the perceived benefits of the training. Although the results of transfer in younger adults have been mixed, plausibly driven by the type of control groups, working memory training compared to active controls does result in improvements on tasks of *near transfer* (see Melby-Lervåg and Hulme, 2013, for a meta-analytic review of study-specific effect sizes for different measures of transfer tasks). Importantly, a recent meta-analysis on age-related differences and cognitive training found that older adults trained in either working memory or task switching paradigms, both "core" abilities, benefitted in both near and far transfer tasks when compared to active controls (Karbach and Verhaeghen, 2014).

The results from the current study further our understanding of cognitive optimization in older adults by comparing two different types of working memory training: one novel approach that requires greater cognitive control (unpredictable probe-cue sequence) and another similar to previous training paradigms (e.g., N-back) that requires predictable focus switching. It is important to note that, in the current study, the two training paradigms were not individualized adaptive and used the same updating task, differing in only one dimension, viz. probe-cue predictability. Therefore, any differences in our outcome variables favoring the novel training approach cannot be merely attributed to motivation or perceived benefits of training.

This study did not aim to resolve the ongoing debate about whether or not working memory training in younger adults improves intelligence. We did not have an additional control group that would allow for assessment of the overall benefits of working memory training over a different type of training (e.g., Redick et al., 2013). Moreover, we consider our results indicating differential benefits to RAPM, a measure of nonverbal intelligence, to be weak at best. In the future, studies could compare multiple training groups that would allow for such assessments in addition to further investigation of the role of cognitive control in working memory training. These studies should also include a larger number of participants and multiple measures of psychological constructs to allow for an investigation of individual differences in learning and transfer.

This study is important for the field of cognitive training because it furthers our understanding of theory-driven principles of cognitive optimization, expounds the role of cognitive control in working memory, and helps us develop better cognitive training strategies for older adults. Moreover, the N-Match task was easier to learn and its training benefits were observed after a short training period (5 h over 2 weeks), unlike other successful cognitive training studies in older adults where either the training task was complex (e.g., video games; Basak et al., 2008) or intensive training over a long period was required (e.g., 14 weeks at 15 h/week; Park et al., 2014). Although we failed to observe any long-term benefits of UT compared to PT, future studies could investigate how the length of training (5 vs. 10 h), amount of feedback, and frequency of training influence long-term benefits of UT. Because improvements in episodic memory can delay the onset of Alzheimer's disease, and our UT approach benefitted a task of episodic memory, future studies could use this novel training approach with at-risk individuals, such as patients with mild cognitive impairment or older adults with lower education, to investigate whether varying the temporal dynamics of memory probe-cues during cognitive training can delay the onset of Alzheimer's disease.

# AUTHOR CONTRIBUTIONS

CB designed the research and programmed the experiments; CB and MO performed the research, analyzed the data and wrote the paper.

# FUNDING

This research was supported in part by Faculty Research Initiative grants from University of Texas at Dallas to CB.

# ACKNOWLEDGMENT

We are grateful to Juan Mijares, Maria I. Cunha, and Evan Smith for their assistance in data collection and scoring.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00230

## REFERENCES

Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D., Marsiske, M., et al. (2002). Effects of cognitive training interventions with older adults - a randomized controlled trial. *J. Am. Med. Assoc.* 288, 2271–2281. doi: 10.1001/jama.288.18.2271

Basak, C. (2006). Capacity limits of the focus of attention and dynamics of the focus switch cost in the working memory. [Dissertation Abstract]. *Diss. Abstr. Int. Sec. B Sci. Eng.* 66, 5717.


of training on hemispheric asymmetry. *Neurobiol. Aging* 28, 272–283. doi: 10.1016/j.neurobiolaging.2005.12.012


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Basak and O'Connell. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Building metamemorial knowledge over time: insights from eye tracking about the bases of feeling-of-knowing and confidence judgments

#### *Elizabeth F. Chua1,2\* and Lisa A. Solinger1*

*<sup>1</sup> Department of Psychology, Brooklyn College of the City University of New York, Brooklyn, NY, USA, <sup>2</sup> The Graduate Center of the City University of New York, New York, NY, USA*

#### *Edited by:*

*John Magnotti, Baylor College of Medicine, USA*

#### *Reviewed by:*

*Rosanna Kathleen Olsen, Rotman Research Institute at Baycrest, Canada Logan Thomas Trujillo, Texas State University, USA*

#### *\*Correspondence:*

*Elizabeth F. Chua, Department of Psychology, Brooklyn College of the City University of New York, 2900 Bedford Avenue, Brooklyn, NY 11210, USA echua@brooklyn.cuny.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 15 April 2015 Accepted: 29 July 2015 Published: 18 August 2015*

#### *Citation:*

*Chua EF and Solinger LA (2015) Building metamemorial knowledge over time: insights from eye tracking about the bases of feeling-of-knowing and confidence judgments. Front. Psychol. 6:1206. doi: 10.3389/fpsyg.2015.01206*

Metamemory processes depend on different factors across the learning and memory time-scale. In the laboratory, subjects are often asked to make prospective feeling-of-knowing (FOK) judgments about target retrievability, or are asked to make retrospective confidence judgments (RCJs) about the retrieved target. We examined distinct and shared contributors to metamemory judgments, and how they were built over time. Eye movements were monitored during a face-scene associative memory task. At test, participants viewed a studied scene, then rated their FOK that they would remember the associated face. This was followed by a forced choice recognition test and RCJs. FOK judgments were less accurate than RCJs, showing that the addition of mnemonic experience can increase metacognitive accuracy over time. However, there was also evidence that the given FOK rating influenced RCJs. Turning to eye movements, initial analyses showed that higher cue fluency was related to both higher FOKs and higher RCJs. However, further analyses revealed that the effects of the scene cue on RCJs were mediated by FOKs. Turning to the target, increased viewing time and faster viewing of the correct associate related to higher FOKs, consistent with the idea that target accessibility is a basis of FOKs. In contrast, the amount of viewing directed to the chosen face, regardless of whether it was correct, predicted higher RCJs, suggesting that choice experience is a significant contributor to RCJs. We also examined covariates of the change in RCJ rating from the FOK rating, and showed that increased and faster viewing of the chosen face predicted raising one's confidence above one's FOK. Taken together, these results suggest that metamemory judgments should not be thought of only as distinct subjective experiences, but as complex processes that interact and evolve as new psychological bases for subjective experience become available.

Keywords: metamemory, feeling-of-knowing, confidence, recognition, eye tracking, memory, metacognition

# Introduction

In our day-to-day functioning, we rely both on our memory, and our knowledge of our memory, referred to as our metamemory (Nelson and Narens, 1990). For example, we may encounter someone on the street, and even though we cannot immediately recall his or her name, we may feel that we know it. This feeling is the result of monitoring our memory. Because we have this feeling-of-knowing (FOK), we may continue to rack our brains for their name, showing that the knowledge of our memory influences our behavior. Once we generate a name, we then monitor and decide if we are confident enough to use that name, again showing that our knowledge of our memory influences our behavior. Memory monitoring involves evaluating the current state or ongoing progress of any aspect of memory (Nelson and Narens, 1990). In the laboratory, memory monitoring is typically measured by having individuals make subjective judgments about their memory, and can be assessed during any stage of learning and remembering. Prospective FOK judgments require participants to predict their future ability to correctly recognize some previously learned information (e.g., Shimamura and Squire, 1986; Maril et al., 2003; Perrotin et al., 2006). In contrast, *retrospective confidence judgments* (RCJs), elicited after participants have indicated their memory response, require participants to rate the likelihood that they correctly remembered the target information (e.g., Stretch and Wixted, 1998; Brewer et al., 2002; Ranganath et al., 2004; Chua et al., 2006). The typical FOK procedure uses a recall-judgment-recognition paradigm (Hart, 1965). Participants are first presented with a cue and asked to recall the corresponding target from memory. If they are unable to do so, they are asked to make a FOK judgment, followed by a recognition test. During RCJ tasks, participants are given a memory test (either recall or recognition), and subsequently asked to rate their confidence that their response is accurate.

Research has resulted in a general consensus that, when monitoring memory, individuals use an inferential process to evaluate whether a particular response will be, or has been, remembered based on the inputs that are readily available (e.g., Schwartz et al., 1997; Koriat, 2008a). However, the particular inputs that are utilized differ depending on the time at which memory is assessed (Schwartz et al., 1997). Prospective FOK judgments are thought to be based on familiarity of the cue (Metcalfe et al., 1993), accessibility of information about the target (Koriat, 1993, 1994), or a combination of the two (Koriat and Levy-Sadot, 2001). In contrast, it is widely accepted that RCJs are based on the target retrieval experience–that is, the on-line experience of remembering some previously studied item (Nelson and Narens, 1990; Chua et al., 2012; Koriat, 2012). In this study, we measured FOKs and RCJs during an associative recognition memory task, while participants had their eye movements tracked, in order to examine the sources of information that form the basis of metamemory judgments prior to, and following, recognition memory. Our goal was to examine the extent to which particular factors influence memory monitoring judgments across time.

The idea that memory monitoring is a dynamic process is central to the theoretical framework of metamemory introduced by Nelson and Narens (1990). According to the model, metacognition contains two interrelated levels, an object level and a meta-level, which correspond to memory and metamemory, respectively. Information from the object level is available to the meta-level via monitoring processes, and the meta-level can modify the object level via control processes. The idea is that the meta-level contains a dynamic and imperfect model of the current state of the object level and, in the case of metamemory, monitoring is updated on-line while learning and memory processes occur. Our primary goal was to better understand this dynamic aspect of memory monitoring, and to that end, the model makes several predictions. First, we can expect that FOKs and RCJs depend on different inputs because they are made at different times over the course of retrieval. Indeed, metamemory judgments made at different points in time are often weakly correlated, implying that, at least to some extent, different information is being used as the basis of these judgments (Leonesio and Nelson, 1990; Schwartz, 1994). Second, although assessing memory at different times may depend on different sources of information, there may also be factors that influence the construction of the meta-model in similar ways across learning and remembering. That is, there may be sources of information that similarly influence both FOKs and RCJs. Finally, the model stipulates that monitoring informs the meta-level, which can subsequently control the object level, resulting in changes that are dynamically mapped onto the meta-level via subsequent monitoring. In other words, monitoring that takes place prior to retrieval might indirectly influence monitoring that takes place after retrieval. Therefore, we can expect that FOKs may influence RCJs.

Our primary goal and novel contribution was to examine the dynamic nature of memory monitoring using eye movement indices of memory to indirectly assess the influence of available mnemonic information on FOKs and RCJs, as well as the influence of FOKs on RCJs. Although the theoretical model describes memory monitoring as a dynamic process, few studies have investigated how metamemory judgments change over time, or how multiple factors contribute to metamemory judgments. Some studies have examined how prospective judgments of learning (JOLs), another type of metamemory judgment that has participants predict their future memory performance, influenced each other in a multiple trial learning paradigm, and showed that JOLs influence each other from trial to trial (Tauber and Rhodes, 2012; Serra and Ariel, 2014), but they did not examine different types of metamemory judgments. Other studies have shown that FOKs and RCJs correlate, but did so in the service of using RCJs to index recollection or mnemonic strength to better understand FOKs (Hertzog et al., 2010, 2014), and not to understand the updating of metamemory judgments over time. We used eye movements to indirectly measure relevant factors that are known to contribute to metamemory judgments because using an indirect measure allowed us to examine several relevant factors within the same study, and on a continuous scale. This is in contrast to the majority of work investigating the basis of metamemory judgments by experimentally manipulating a single factor, and showing that the manipulation led to increases or decreases in metamemory accuracy (e.g., Begg et al., 1989; Koriat and Ma'ayan, 2005). In this study, we did not rely on a manipulation of a single factor, but instead used eye movement indices of memory to provide a sensitive measure of multiple relevant sources of information over time (Chua et al., 2012).

The first relevant factor we investigated was the cue used to elicit a target memory. Much research has focused on the influence of *cue familiarity* or *cue fluency* on FOKs (Schwartz and Metcalfe, 1992; Metcalfe et al., 1993). Increased cue familiarity has been consistently shown to lead to higher FOK ratings, even when there is no corresponding increase in memory accuracy (Reder, 1987, 1988; Schwartz and Metcalfe, 1992). This effect appears to be specific to cue familiarity because increases in target familiarity had no influence on FOKs, but did improve test performance (Schwartz and Metcalfe, 1992). Although the effects of cue fluency on FOKs have been well studied (Reder and Ritter, 1992; Schwartz and Metcalfe, 1992; Metcalfe et al., 1993; Paynter et al., 2009), less research has been devoted to considering the role of cue fluency in RCJs (but see Diller et al., 2001). In one study, however, Chua et al. (2012) showed that increased cue familiarity led to increased RCJs. This shared basis raises the question of whether cue familiarity is an independent influence on RCJs, or whether it is mediated by FOKs. In other words, does cue familiarity influence the meta-model at the time FOK is assessed in much the same way as it does at the time RCJ is assessed, or does cue familiarity influence FOK, which then exerts a bias over the RCJ?

To measure cue familiarity, we capitalized on the ability of eye movements to indirectly measure mnemonic processing (for review, see Hannula et al., 2010). We used the *item reprocessing effect* to examine cue fluency (Althoff and Cohen, 1999; Ryan et al., 2000). Previous studies have shown that as items are processed more fluently, typically because of repeated exposure, participants make longer fixations directed to fewer regions of the picture compared to initial processing (Althoff and Cohen, 1999; Ryan et al., 2000). We have previously used eye fixations to demonstrate that cue fluency influences RCJs (Chua et al., 2012), and in this study will use it to examine both prospective FOKs and RCJs.

The next relevant factor we investigated as a basis of metamemory judgments was *partial access* to the target. FOK judgments are thought to be driven by the accessibility of information about the target (Koriat, 1993). In other words, when participants fail to recall some target information, their FOK judgments are based on the amount and fluency of partial information that is accessed while searching for the target. Indeed, partial access to various aspects of the to-be-remembered stimuli, including the emotional content (Schacter and Worling, 1985; Thomas et al., 2011), and the number of letters in a string (Koriat, 1993), led to higher FOK judgments. Further, the latency before recalling partial information has been shown to correlate with FOKs such that shorter RTs were associated with higher FOKs, suggesting that the ease with which information is accessed also influences the FOK (Koriat, 1993).

To measure accessibility we relied on another eye movement based memory effect, *the rapid onset of viewing effect*. After a cue is presented, followed by a forced choice recognition task, the eyes are automatically drawn to the associated target, and this can provide an index of associative memory even independent of an explicit response (Hannula et al., 2007; Richmond and Nelson, 2009). This viewing effect has also been shown to emerge rapidly and is thought to be obligatory (Hannula et al., 2007; Ryan et al., 2007). Thus we can use rapid onset of viewing to the target as an index of memory, even if the choice differs from the target. Here we use speed of viewing directed to the target to test whether FOKs and RCJs are related to target accessibility.

The last relevant factor that we investigated as a basis of metamemory judgments is *the target recognition experience*, which has been shown to influence RCJs (Nelson and Narens, 1990; Kelley and Lindsay, 1993; Koriat and Goldsmith, 1996). Multiple components comprise the target recognition experience. For example, Kelley and Lindsay (1993) showed that confidence ratings on a general knowledge test were higher when the chosen answer had been pre-exposed, suggesting that response fluency influenced RCJs, regardless of whether it was correct or incorrect. Similarly, Chandler (1994) examined whether the amount of information accessed at retrieval influenced confidence. After studying a series of images, participants viewed a second set of images, which were similar to a subset of the studied images. During a subsequent recognition test, RCJs were higher for items from that subset even though memory performance was impaired. Finally, Busey et al. (2000) demonstrated that manipulating luminance also affects confidence, such that brighter faces (as compared to dim) lead to higher overall confidence. Taken together, these studies support the idea that the target recognition experience is often used as a basis of RCJs. Target information that is retrieved more quickly, more easily, or with more vividness is associated with higher RCJs than target information that is more effortful to retrieve (Kelley and Lindsay, 1993; Chandler, 1994; Busey et al., 2000).

The target recognition experience can also be studied using eye movements in forced choice paradigms (Hannula and Ranganath, 2009; Chua et al., 2012). Previous research has shown a *disproportionate viewing effect* in that participants look longer at the studied stimulus compared to the non-studied distractors (Hannula et al., 2007; Ryan et al., 2007). Proportion of viewing among the different choices can be examined in relation to the correct face (which is chosen for hits and not misses), and also to the chosen face (which is correct for hits and not misses). Disproportionate viewing of the correctly chosen target is a memory effect that occurs above and beyond viewing related to choice (Hannula et al., 2007), and thus can be used to index retrieval. Additionally, increased viewing of the chosen stimulus has been related to choice certainty (Chua et al., 2012). Thus, we use disproportionate viewing to examine how the recognition experience relates to metamemory judgments.
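The two eye movement measures described above, proportion of viewing directed to each face and speed of viewing directed to a given face, can be sketched as follows. This is only an illustrative computation under stated assumptions: the `Fixation` record, the region labels (`"left"`, `"right"`, `"bottom"`, `"none"`), and the example fixations are hypothetical and do not represent the authors' analysis pipeline or data.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    roi: str            # hypothetical region label: "left", "right", "bottom", "none"
    onset_ms: float     # fixation onset relative to onset of the three-face display
    duration_ms: float  # fixation duration


def viewing_proportions(fixations, rois=("left", "right", "bottom")):
    """Share of total face-viewing time spent on each face location."""
    totals = {roi: 0.0 for roi in rois}
    for fix in fixations:
        if fix.roi in totals:  # fixations outside any face are ignored
            totals[fix.roi] += fix.duration_ms
    total = sum(totals.values())
    if total == 0:
        return {roi: 0.0 for roi in rois}
    return {roi: t / total for roi, t in totals.items()}


def first_fixation_latency(fixations, roi):
    """Onset of the earliest fixation on `roi`, or None if never fixated."""
    onsets = [fix.onset_ms for fix in fixations if fix.roi == roi]
    return min(onsets) if onsets else None


# Hypothetical trial: the correct face is at the bottom and was also chosen.
trial = [
    Fixation("left", 180, 250),
    Fixation("bottom", 470, 600),
    Fixation("none", 1100, 150),
    Fixation("bottom", 1300, 150),
]
props = viewing_proportions(trial)
latency_to_correct = first_fixation_latency(trial, "bottom")
```

Computing the same proportions once with respect to the correct face and once with respect to the chosen face is what allows viewing related to memory to be separated from viewing related to choice on miss trials, where the two faces differ.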

In our study, to examine the basis of memory monitoring at differing points across the learning and memory time scale, subjects completed a face-scene associative memory task while having their eye movements monitored (Hannula et al., 2007; Hannula and Ranganath, 2009; Chua et al., 2012). After studying face-scene pairs, participants were presented with a scene cue, and then rated their FOK that they would remember the associated face. This was followed by a forced choice recognition test for the face that had been studied with the scene and retrospective confidence ratings. Using this paradigm we tested whether eye movement measures of cue fluency, accessibility, and the recognition experience contributed to FOKs and RCJs. We further tested whether relevant factors, namely cue fluency, had a direct influence on RCJs or were mediated by FOKs.

# Materials and Methods

#### Participants

Seventy English speaking Brooklyn College students participated in this research in exchange for course credit (1 credit/h) or for pay (\$10/h). All participants reported normal or corrected-to-normal vision. Only data from 66 participants (47 females/19 males; mean age = 21, range 18–34 years) were analyzed; data from other participants were excluded because eye position could not be reliably calibrated, or time constraints and technical difficulties did not allow for sufficient data to be collected. Each participant provided written informed consent in a manner approved by the Human Research Protection Program at Brooklyn College.

#### Behavioral Paradigm

PsychoPy software was used to present stimuli and record responses (version 1.81; http://www.psychopy.org; Peirce, 2007). Participants viewed stimuli on a secondary 22-inch monitor with an integrated eye tracking camera unit controlled by a Windows PC.

Stimuli consisted of 180 full-color face images (90 females/90 males) selected from a previously normed faces database (Althoff and Cohen, 1999) and 180 full-color scenes from Brand X© photography. Each face was sized to 256 × 256 pixels and placed upon a 270 × 270 pixels uniform black background; scenes were sized to 800 × 600 pixels. The size of the face displayed on the monitor was 9.7 cm × 9.7 cm, and the size of the scenes on the display was 29.6 cm × 22.2 cm. Nine additional face images and three additional scene images were used for instruction and practice of the behavioral paradigm.

Data were obtained in five blocks, each of which comprised an encoding phase and a self-paced three-alternative forced-choice recognition (3AFC) test phase that also included assessments of FOK and confidence (**Figure 1**). Of the 66 participants included in the analyses, 39 completed all five blocks while 19 completed four blocks due to time constraints. Each study phase consisted of encoding 36 face-scene pairs (18 female/18 male). The face-scene combinations were chosen randomly and randomly assigned to a specific block. The presentation order of the blocks, as well as the trials within each block, was independently randomized. Each study trial began by presenting a scene on the screen for 3000 ms, followed by a gaze-contingent fixation cross for 500 ms, and then a face appeared that was superimposed on the scene for 4000 ms. To ensure that participants attended to each face-scene pair, participants were instructed to rate how well they thought the face "fit" the scene, using the 11 keys across the top of the keyboard (from the symbol ∼ to the number 0) to indicate responses from 0 to 100 in intervals of 10. Participants were also instructed to try to remember each face-scene pair for a subsequent memory test. Each study block was immediately followed by the corresponding test block, which consisted of 12 trials. Each test trial began by presenting a previously studied scene on the screen for 3000 ms (*scene cue*). After seeing the scene, participants made a FOK judgment on an 11-point percentage scale ranging from 0 to 100 in intervals of 10. Participants were instructed that a rating of 0 meant they were "absolutely certain" that they would not recognize the correct face, a rating of 100 meant they were "absolutely certain" that they would recognize the correct face, and ratings from 10 to 90 indicated intermediate levels of certainty. Participants were encouraged to use the entire scale.
This was followed by a gaze-contingent fixation for 500 ms, to ensure that participants began the 3AFC recognition test with eye position in the same place and equidistant from each of the three alternatives. Once a fixation was measured, three faces were superimposed on the scene, and subjects were asked to indicate, via button press, which face had been paired with the scene during the study phase. One of the faces was correct and had been previously paired with the scene, whereas the other two had been previously paired with other scenes. Thus each face was familiar to the participants, and the recognition task could not be solved on familiarity alone. The correct face appeared in the left, right, and bottom position an equal number of times across each block. Participants had a maximum of 10000 ms to indicate their recognition response. After 10000 ms or the button press indicating their choice, the trial advanced and participants made a RCJ, once again using an 11-point percentage scale ranging from 0 to 100 in intervals of 10, to indicate how certain they were that they chose the correct face. Note that each studied face was only viewed once during the test block, resulting in one-third of the studied face-scene pairs being tested. The face-scene pairs were counterbalanced across participants such that each face-scene pair was tested equally often. This paradigm deviated from the typical recall-judgment-recognition paradigm used to test FOKs in that it did not test for recall of the specific face paired with the scene, because faces are hard to label verbally. We nevertheless chose to use it because it has been well characterized in terms of eye movement based memory effects related to confidence and accuracy (Hannula et al., 2007; Hannula and Ranganath, 2009; Richmond and Nelson, 2009; Chua et al., 2012).

#### Eye Tracking Acquisition and Analysis

Eye position was measured using an SMI iView RED eye tracker (SensoMotoric Instruments, Teltow, Germany) controlled by iView X version 2.5 software (SensoMotoric Instruments) that recorded binocularly at a rate of 60 Hz. Prior to the presentation of the study items, participants' eye position was calibrated using five-point calibration plus validation. If the error in the estimated position was greater than 0.75° of visual angle, the experiment was stopped and calibration restarted.

We previously used the number of fixations during the scene cue at test as an indirect measure of cue fluency, with fewer fixations indexing greater fluency (Chua et al., 2012). Unlike Chua et al. (2012), in the present study we recorded eye movements during the study phase, enabling us to calculate the reprocessing effect. Therefore, we also conducted analyses using a measure of cue fluency based on the reprocessing effect (Althoff and Cohen, 1999; Ryan et al., 2000), and indexed cue fluency based on the difference in the number of fixations during the scene cue at study and the scene cue at test. This measure may better capture the change in fluency for a particular scene from study to test, with a bigger drop in fixations from study to test indicating a bigger change in fluency. Furthermore, it helps account for variation in the number of fixations due to stimulus differences. We analyzed data using both the total number of fixations and the difference in fixations from study to test. Because the analyses showed similar results, we only report results using the *difference in fixations from study to test* as our measure of cue fluency. Fixations were calculated offline using SMI's BeGaze 3.5 Software (Teltow, Germany).
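The difference measure described above can be illustrated with a minimal Python sketch. The dictionary layout, the scene labels, and the fixation counts are hypothetical and serve only to show the arithmetic; in the study itself, fixations were extracted with BeGaze.

```python
def cue_fluency(study_fixations, test_fixations):
    """Return study-minus-test fixation counts for each tested scene.

    Both arguments map a scene identifier to the number of fixations made
    while that scene was shown alone (hypothetical data layout). A larger
    positive value means a bigger drop in fixations from study to test,
    which is read here as a bigger change in fluency.
    """
    return {
        scene: study_fixations[scene] - n_test
        for scene, n_test in test_fixations.items()
        if scene in study_fixations
    }


# Made-up fixation counts for two scenes
study = {"scene_01": 11, "scene_02": 9}
test = {"scene_01": 7, "scene_02": 9}
fluency = cue_fluency(study, test)
```

In this toy case the first scene shows a drop of four fixations and the second shows no change.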

For the 3AFC recognition test, we took an area of interest (AOI) approach, and examined viewing behavior among four different AOIs: the three faces and the scene. For each face, we characterized the AOI as the *Correct Face, Incorrect Face 1,* and *Incorrect Face 2*. We also characterized an additional face AOI as the *Chosen Face*, which was the face indicated via button press by the participant during the recognition task. For hits the chosen face was the correct face, and for misses the chosen face was the incorrectly selected face (i.e., if the correct answer was in position '1' and the participant indicated '2,' the chosen face would be '2'). Our primary face AOIs of interest were the Correct Face and the Chosen Face, and viewing directed toward these faces was analyzed separately. Analyses of viewing directed to the correct face provide information about trace access regardless of behavioral response (Hannula et al., 2007; Richmond and Nelson, 2009), whereas viewing directed to the chosen face can reveal differences in viewing behavior related to the recognition decision, and also reveal memory effects above and beyond choice (Hannula et al., 2007; Hannula and Ranganath, 2009). For our purposes, the proportion of viewing directed at the chosen face can reveal information about the recognition decision. For example, we would expect that participants would spend more time viewing faces selected with high confidence, whereas for lower confidence responses we would expect participants to direct viewing more evenly among the choices, resulting in a lower proportion of viewing to the chosen face. We calculated the *proportion of viewing* directed to each AOI based on fixation durations as an overall metric of viewing behavior.

Previous research using a similar face-scene associative memory paradigm has indicated that rapid eye movements to the target face are obligatory and provide evidence of the previously encoded association (Hannula et al., 2007). Therefore, in addition to proportion of viewing directed at each AOI, we also examined how rapidly participants fixated within the AOI and analyzed whether the *onset of the first fixation* in the AOI differed by FOK, RCJ, or recognition accuracy.
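The two viewing measures can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis pipeline used in the study: the tuple layout, the AOI labels, and the choice of total fixation time across all AOIs as the denominator are all hypothetical.

```python
def proportion_of_viewing(fixations):
    """Share of total fixation duration falling in each AOI.

    `fixations` is a list of (aoi_label, onset_ms, duration_ms) tuples for
    one test display (hypothetical layout). The denominator is the summed
    duration of all listed fixations, which is an assumption of this sketch.
    """
    total = sum(duration for _, _, duration in fixations)
    if total == 0:
        return {}
    shares = {}
    for aoi, _, duration in fixations:
        shares[aoi] = shares.get(aoi, 0.0) + duration / total
    return shares


def first_fixation_onset(fixations, aoi):
    """Onset (ms from display onset) of the earliest fixation in `aoi`, or None."""
    onsets = [onset for label, onset, _ in fixations if label == aoi]
    return min(onsets) if onsets else None


# Made-up fixation sequence for a single test trial
trial = [
    ("scene", 0, 200),
    ("correct", 250, 400),
    ("incorrect_1", 700, 200),
    ("correct", 950, 200),
]
shares = proportion_of_viewing(trial)
onset = first_fixation_onset(trial, "correct")
```

Here the correct face receives 600 of 1000 ms of fixation time, and its first fixation begins 250 ms after the faces appear.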

#### Data Analysis

Two-tailed paired *t*-tests were used to examine task performance in terms of recognition and metamemory performance. Although participants were instructed to make metamemory judgments on a scale ranging from 0 to 100, the actual scale was an 11-point percentage scale ranging from 0 to 100 in intervals of 10, and thus metamemory judgments were analyzed on the 11-point scale.
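For readers who want to see the arithmetic behind a paired *t*-test, a minimal Python sketch follows. The ratings are made up, the function name is hypothetical, and the sketch returns only the test statistic and degrees of freedom; it does not compute a *p*-value or handle the degenerate case in which all differences are identical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired t-test on two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    # Standard error of the mean difference (sample SD with n - 1 in the denominator)
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Made-up per-participant mean ratings (0-10 scale) for hits and misses
hits = [7.0, 6.5, 8.0, 6.0]
misses = [5.0, 5.5, 6.0, 5.0]
t_stat, df = paired_t(hits, misses)
```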

We analyzed the relative accuracy of FOKs and RCJs in two ways: using the Goodman–Kruskal gamma correlation and using *d*<sub>a</sub>, a measure derived from signal-detection theory. We calculated the Goodman–Kruskal gamma coefficient because, historically, it has been the most commonly used correlation to measure metamnemonic accuracy (Nelson, 1984; Gonzalez and Nelson, 1996). Gamma values range from −1 to 1, with 1 indicating that metamemory judgments perfectly predict accuracy, 0 indicating no correlation, and −1 indicating that metamemory judgments negatively predict accuracy. However, gamma has been criticized and may be less optimal than other measures because: (1) it treats metamnemonic judgments as an ordinal measure, thus making it insensitive to differences in the magnitude of the judgments (Masson and Rotello, 2009), (2) it has very low levels of stability across split halves and alternative forms of tests (Nelson, 1988; Thompson and Mason, 1996), and (3) it does not allow for interval-level interpretations of the data, which are necessary to accurately assess between-group manipulations or interactions (Benjamin and Diaz, 2008). Recent research has suggested that *d*<sub>a</sub>, a measure derived from the signal-detection framework, may be superior to gamma (Benjamin and Diaz, 2008; Toth et al., 2011). Therefore, we also computed *d*<sub>a</sub>. To compute *d*<sub>a</sub>, we used the procedure described by Benjamin and Diaz (2008) and the formula $d_a = \sqrt{2}\,y_0/\sqrt{1 + m^2}$, where $y_0$ and $m$ represent the y-intercept and slope, respectively, of a normal deviate isosensitivity function. Some researchers use a similar SDT-derived statistic, *d*′, on metamemory judgments (sometimes referred to as Type 2 decisions, resulting in Type 2 *d*′) as a measure of metamnemonic accuracy (Fleming and Lau, 2014). Both *d*′ and *d*<sub>a</sub> can be conceptualized as distance-based measures, and thus range from −∞ to +∞, with zero representing chance performance. Unlike *d*<sub>a</sub>, *d*′ assumes common variance of the underlying distributions, an assumption that is often found to be incorrect (Swets, 1986). Still, *d*′ is commonly used because it can be calculated when the rating scale has as few as two discrete choices. At least three discrete choices must be used for calculating *d*<sub>a</sub>.
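The two accuracy measures can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of Benjamin and Diaz (2008): gamma is computed by brute-force pair counting, and the intercept and slope of the normal deviate isosensitivity function are obtained here by an ordinary least-squares fit, which is an assumption of the sketch. All function names and data values are hypothetical.

```python
from math import sqrt


def goodman_kruskal_gamma(ratings, correct):
    """Gamma = (C - D) / (C + D) over all trial pairs; tied pairs are ignored."""
    concordant = discordant = 0
    n = len(ratings)
    for i in range(n):
        for j in range(i + 1, n):
            product = (ratings[i] - ratings[j]) * (correct[i] - correct[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        return None  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


def fit_line(points):
    """Ordinary least-squares (slope, intercept) for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def d_a(intercept, slope):
    """d_a = sqrt(2) * y0 / sqrt(1 + m^2) for intercept y0 and slope m."""
    return sqrt(2) * intercept / sqrt(1 + slope ** 2)


# Made-up ratings (0-10) and recognition accuracy (1 = hit, 0 = miss)
ratings = [9, 8, 3, 2, 7, 4]
correct = [1, 1, 0, 0, 0, 1]
gamma = goodman_kruskal_gamma(ratings, correct)

# Made-up z-transformed (x, y) points of an isosensitivity function
zroc = [(-1.0, 0.2), (0.0, 1.0), (1.0, 1.8)]
m, y0 = fit_line(zroc)
accuracy = d_a(y0, m)
```

When the slope equals 1, the equal-variance case, *d*<sub>a</sub> reduces to the intercept and therefore coincides with *d*′.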

#### Relating Eye Movement Data and Metamemory

As a first step, we did two sets of multi-level models, examining whether different eye movement measures covaried with (1) FOKs and (2) RCJs. As a subsequent analysis step, we examined whether specific covariates were still significant predictors of one metamemory judgment, when controlling for the other metamemory judgment (i.e., RCJs were included as a covariate in the FOK models and FOKs were included as a covariate in the RCJ models).

We used multi-level modeling in SPSS 22.0 to model both trial level and subject level variability in FOKs and RCJs. Trials in which subjects failed to provide a button response were excluded. Subjects and stimuli were treated as random effects with a varying intercept (Judd et al., 2012). All other effects were fixed effects. Recognition Accuracy was entered as a factor in the model (hits = 1 and misses = 0). Continuous variables (e.g., eye movement measures, metamemory ratings) were mean centered at the subject level, and entered as covariates. Models were estimated using Maximum Likelihood Estimation. Models were compared using likelihood ratio tests. Significant two-way interactions were followed up using simple slope tests (Aiken et al., 1991; Dawson, 2013). The simple slopes were evaluated for hits and misses (values 1 and 0, respectively) and 1 SD above or below the mean of the independent variable.
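Subject-level mean centering, as applied to the continuous covariates, can be illustrated with the following Python sketch. The row layout, field names, and values are hypothetical; the modeling itself was done in SPSS and is not reproduced here.

```python
from collections import defaultdict


def center_within_subject(rows, key):
    """Subtract each subject's own mean of `key` from that subject's trials.

    `rows` is a list of dicts with a "subject" field (hypothetical layout).
    Returns new dicts with an added "<key>_c" field; the input is not modified.
    """
    values = defaultdict(list)
    for row in rows:
        values[row["subject"]].append(row[key])
    means = {subject: sum(v) / len(v) for subject, v in values.items()}
    return [{**row, key + "_c": row[key] - means[row["subject"]]} for row in rows]


# Made-up trial-level data for two subjects
trials = [
    {"subject": "s01", "prop_view": 0.50},
    {"subject": "s01", "prop_view": 0.70},
    {"subject": "s02", "prop_view": 0.20},
    {"subject": "s02", "prop_view": 0.40},
]
centered = center_within_subject(trials, "prop_view")
```

After centering, each subject's values sum to zero, so the covariate reflects trial-to-trial deviations from that subject's own average rather than differences between subjects.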

In one case, we used mediation analysis on our multilevel data, taking an approach that combines the dependent variable and mediator into a stacked variable and then uses that in the multilevel model (Bauer et al., 2006). We used the mixed procedure in SPSS, to obtain values for the indirect effect of the mediator on the dependent variable. To assess the mediation, a Monte Carlo resampling method was used with 20000 simulations to obtain 95% Confidence Intervals for the indirect effects using an R web utility (Preacher and Selig, 2010).
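The Monte Carlo approach to a confidence interval for an indirect effect can be sketched as follows. This is a simplified Python illustration, not the R web utility used in the study: it assumes the two path estimates are normally distributed and independent, ignoring any covariance between them, and the path values, standard errors, and function name are hypothetical.

```python
import random


def monte_carlo_indirect_ci(a, se_a, b, se_b, n_sims=20000, level=0.95, seed=1):
    """Percentile CI for the indirect effect a*b from simulated draws.

    Assumes a and b are normally distributed and independent; a covariance
    between them, if available, is ignored in this sketch.
    """
    rng = random.Random(seed)
    products = sorted(
        rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(n_sims)
    )
    tail = (1 - level) / 2
    lower = products[int(tail * n_sims)]
    upper = products[int((1 - tail) * n_sims) - 1]
    return lower, upper


# Hypothetical path estimates and standard errors (not taken from the study)
low, high = monte_carlo_indirect_ci(a=0.5, se_a=0.05, b=0.4, se_b=0.05)
```

If the resulting interval excludes zero, the indirect effect is conventionally treated as significant.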

# Results

#### Task Performance

#### Memory Performance

Participants (*N* = 66) performed well on the recognition task, choosing the correct face 71% ± 0.02% of the time (Mean ± SEM).

#### Metamemory Performance

First we computed the mean FOK and RCJ ratings for both hits and misses. These data show that both FOKs and RCJs were meaningfully related to memory in that participants gave higher ratings (scale 0–10) for hits than misses, for both FOKs [FOK for hits: 6.77 ± 0.18, FOK for misses: 5.51 ± 0.20, *t*(65) = 10.70, *p <* 0.00001, 95% CI of the difference for hits and misses (1.02,1.49)] and RCJs [RCJ for hits: 7.60 ± 0.16, RCJ for misses: 5.05 ± 0.19, *t*(65) = 15.21, *p <* 0.00001, 95% CI of the difference for hits and misses (2.21,2.88)]. However, consistent with the idea that metamnemonic judgments are made online and depend on different inputs at different times over the course of retrieval, compared to FOKs, RCJs were higher for hits [*t*(65) = 9.67, *p <* 0.00001, 95% CI of the difference between FOKs and RCJ for hits (0.66,1.0)] and lower for misses [*t*(65) = 3.38, *p <* 0.001, 95% CI of the difference between FOKs and RCJ for misses (0.19,0.73)]. Furthermore, the difference between the mean metamnemonic judgments for hits and misses was greater for RCJs than for FOKs [mean difference for RCJs: 2.59 ± 0.17, mean difference for FOKs: 1.29 ± 0.12, *t*(65) = 9.67, *p <* 0.00001, 95% CI of the difference between FOKs and RCJs (1.04,1.58)], showing that RCJs made after retrieval better reflected the true difference between hits and misses than did FOKs elicited prior to retrieval.

The mean ratings provide compelling evidence that RCJs better reflected true memory than did FOKs; however, that analysis considers groups of items and does not capture the accuracy of metamnemonic judgments at the item-by-item level. Traditionally, the gamma correlation has been used to measure relative metamnemonic accuracy, that is, the extent to which individuals' metamnemonic judgments reflect their own memory performance for one item relative to another. FOKs and RCJs were reasonably accurate, as shown by gammas (FOK: 0.43 ± 0.03; RCJ: 0.61 ± 0.03), and RCJs were more accurate than FOKs [*t*(65) = 7.05, *p <* 0.00001, 95% CI of the difference between FOKs and RCJs (0.13,0.23)]. Similarly, FOKs and RCJs were reasonably accurate as measured by *d*<sub>a</sub> (FOK: 0.62 ± 0.057; RCJ: 1.12 ± 0.072)<sup>1</sup>, and RCJs were more accurate than FOKs [*t*(64) = 9.14, *p <* 0.00001, 95% CI of the difference between FOKs and RCJs (0.40,0.62)].

#### Metamemory Judgments Over Time

One way to examine metamemory across the learning and memory time scale is to examine the difference between FOKs and RCJs. As one would expect, metamemory judgments change with additional mnemonic experience (i.e., retrieval attempts, weighing of alternatives, and making a recognition decision). Examination of trial-by-trial changes in ratings (FOK–RCJ) showed that the recognition experience led to a change in ratings [*t*(65) = −9.69, *p <* 0.00001, 95% CI of FOK–RCJ (−1.55,−1.02)]; for hits, RCJs were 0.83 ± 0.09 higher than FOKs, whereas for misses, RCJs were 0.46 ± 0.14 lower than FOKs.

The change in ratings makes it clear that our subjective evaluation of our memory changes with more mnemonic experience, but it is also possible that prior metamemory judgments could influence subsequent metamemory judgments. To test this, we used multilevel modeling to examine FOK rating as a predictor of RCJs, and compared the addition of this covariate to a model without it. In the first model, we entered recognition accuracy and reaction time, and their interaction, as covariates. As expected, recognition accuracy and speed of retrieval were significant predictors of RCJs (**Table 1**). To test whether FOK ratings also influenced RCJs, we added FOK rating to the model. FOK rating was also a significant predictor of RCJs (**Table 1**). Furthermore, a likelihood ratio test showed that the model including FOK ratings fit better than the one without FOK ratings [χ<sup>2</sup>(1) = 777, *p <* 0.00001]. Thus, individuals' prospective metamemory judgments, combined with their retrieval experience, predict RCJs.

<sup>1</sup>Data from 1 subject were not included in the analysis of *d*<sub>a</sub> because their data did not allow for the calculation of *d*<sub>a</sub>.

*Accuracy, Recognition accuracy; RT, Reaction time;* −*2LL,* −*2 Restricted log likelihood.*

∗*p < 0.05,* ∗∗∗*p < 0.001.*
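A likelihood ratio test of the kind used to compare the nested models can be sketched in Python as follows. The sketch covers only the one-degree-of-freedom case, for which the chi-square tail probability has a closed form in the standard library; the −2 log-likelihood values and the function name are hypothetical and are not taken from Table 1.

```python
from math import erfc, sqrt


def likelihood_ratio_test_1df(neg2ll_reduced, neg2ll_full):
    """Chi-square statistic and p-value for nested models differing by one parameter.

    For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    """
    chi_square = neg2ll_reduced - neg2ll_full
    if chi_square < 0:
        raise ValueError("the fuller model should not fit worse than the reduced one")
    return chi_square, erfc(sqrt(chi_square / 2))


# Hypothetical -2 log-likelihood values (not taken from Table 1)
chi_square, p_value = likelihood_ratio_test_1df(1000.00, 996.16)
```

A difference of about 3.84 in −2 log-likelihood corresponds to *p* ≈ 0.05 with one degree of freedom.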

#### Eye Movements

#### Fixations during the Scene Cue and Cue Fluency

Based on prior research suggesting that cue fluency contributes to FOKs (Schwartz and Metcalfe, 1992; Metcalfe et al., 1993; Koriat and Levy-Sadot, 2001) and RCJs (Koriat et al., 2008; Chua et al., 2012), we capitalized on the *reprocessing effect* (Althoff and Cohen, 1999; Ryan et al., 2000) and used the difference in the number of fixations to the scene cue at study and test to examine cue fluency. We used multi-level modeling to examine whether cue fluency covaried with (1) FOKs and (2) RCJs (**Table 2**). The model also included recognition accuracy and the accuracy × cue fluency interaction. Cue fluency was related to higher FOKs (*p <* 0.001) and RCJs (*p <* 0.001), and there was no significant interaction with accuracy (FOKs: *p >* 0.9; RCJs: *p >* 0.2). For FOKs, cue fluency was still a significant predictor (*p <* 0.001), even when controlling for RCJs, suggesting that cue fluency makes a direct contribution to FOKs. However, for RCJs, cue fluency was no longer a significant predictor (*p >* 0.2) when controlling for FOKs, suggesting that the influence of cue fluency on RCJs may be indirect and occur via prospective metamemory judgments (**Table 2**).

To test the idea that the effects of cue fluency on RCJs are mediated by FOKs, we ran a mediation analysis. When FOK was included as a mediator, the effect of cue fluency on RCJs became non-significant [*b* = 0.013 ± 0.015, *t*(2987) = 0.90, *p >* 0.35; **Figure 2**]. The indirect effect was significant, with FOKs mediating the effect of cue fluency on RCJs (ab: 0.054 ± 0.0086, 95% CI of ab (0.038, 0.072), Z = 6.34, *p <* 0.0001).

#### Viewing Directed to the Correct Face and Target Accessibility

Target accessibility has been thought to subserve both FOKs (Koriat, 1993, 1994, 1995) and RCJs (Koriat, 2008b, 2012), and we examined this using two eye movement measures: overall proportion of viewing time directed to the correct face and the onset of the first fixation to the correct face. We used multi-level modeling to examine whether each variable covaried with (1) FOKs and (2) RCJs (**Table 3**). Each model also included recognition accuracy and the accuracy × eye movement interaction.

We first examined the relationship between proportion of viewing directed to the correct face and FOKs (**Table 3**; **Figure 3**). Consistent with the hypothesis that FOKs are related to target accessibility regardless of whether subsequent recognition is accurate or not, a higher proportion of viewing directed to the correct face was associated with higher FOKs for correct and incorrect recognition (*p <* 0.001), and there was no significant interaction with accuracy (*p >* 0.25; **Figure 3A**). To determine whether this effect remained when controlling for RCJs, we ran a subsequent model including RCJs as a covariate; there was only a trend for higher proportion of viewing directed to the correct face associating with higher FOKs (*p <* 0.08).

We also examined the relationship between onset of the first fixation to the correct face and FOKs (**Table 3**), reasoning that the speed of memory-based attentional capture indexed target accessibility (Hannula et al., 2007; Richmond and Nelson, 2009). Like the proportion of viewing to the correct face, analyses of onset of the first fixation to the correct face were consistent with the hypothesis that FOKs are related to target accessibility. Faster onsets of the first fixation to the correct face were associated with higher FOKs for correct and incorrect recognition (*p <* 0.05), and there was no significant interaction with accuracy (*p >* 0.07). As shown by subsequent models that included RCJs as a covariate, faster onsets of the first fixation to the correct face were still associated with higher FOKs when controlling for RCJs (*p <* 0.05; **Table 3**), and there was no interaction with accuracy (*p >* 0.1).

TABLE 2 | Multi-level modeling of metamemory judgments by cue fluency (measured as the number of fixations to the scene cue during study minus the number of fixations to the scene cue at test) and recognition accuracy.

*Accuracy, Recognition accuracy;* −*2LL,* −*2 Restricted log likelihood.* ∗∗∗*p < 0.001.*

We next examined the relationship between proportion of viewing directed to the correct face and RCJs (**Table 3**; **Figure 3B**). Demonstrating that RCJs for correct and incorrect recognition have different relationships to target accessibility (**Figure 3B**), there was a significant accuracy × proportion of viewing directed to the correct face interaction (*p <* 0.001) such that increased viewing led to higher RCJs for correct recognition [*B* = 2.70; *t*(3451) = 4.87, *p <* 0.001], but not incorrect recognition [*B* = 0.370; *t*(3451) = 0.435, *p >* 0.65]. As shown by subsequent models that included FOKs as a covariate, this effect remained when controlling for FOKs (*p <* 0.001; **Table 3**). Also consistent with the idea that RCJs are not based on target accessibility, the onset of the first fixation to the correct face was not a significant predictor of RCJs (*p >* 0.2), nor was its interaction with accuracy (*p >* 0.3; **Table 3**).

TABLE 3 | Multi-level modeling of metamemory judgments by viewing directed to the correct face and recognition accuracy.

*Accuracy, Recognition accuracy;* −*2LL,* −*2 Restricted log likelihood; PropVT, proportion of viewing; First fix ons, onset of the first fixation.* ∗*p < 0.05,* ∗∗∗*p < 0.001.*
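The logic of the simple slope follow-up to an accuracy × viewing interaction can be illustrated with a short Python sketch. It assumes a model of the form y = b0 + b1·x + b2·z + b3·x·z with accuracy dummy-coded (misses = 0, hits = 1). The interaction coefficient of 2.33 is inferred here as the difference between the two reported slopes (2.70 − 0.370) and is an assumption of the sketch, not a coefficient reported in the tables.

```python
def simple_slope(b_predictor, b_interaction, moderator_value):
    """Slope of the predictor at a given value of the moderator.

    For a model y = b0 + b1*x + b2*z + b3*x*z, the slope of x at z is b1 + b3*z.
    """
    return b_predictor + b_interaction * moderator_value


# With accuracy dummy-coded (misses = 0, hits = 1), b1 is the slope for misses.
# The interaction value 2.33 is inferred (2.70 - 0.370), not a reported coefficient.
slope_misses = simple_slope(0.370, 2.33, 0)
slope_hits = simple_slope(0.370, 2.33, 1)
```

Under this coding, the slope for misses equals the predictor's coefficient alone, and the slope for hits adds the interaction term to it.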

#### Viewing Directed to the Chosen Face

To determine the influence of the ease of the recognition decision on FOKs and RCJs, we examined the proportion of viewing directed to the chosen face and the onset of the first fixation to the chosen face (**Table 4**). The chosen face was the face that the subject indicated via button press to be the face that was originally paired with the scene. It is worth noting that for correct recognition, the chosen face, and the correct face are the same, whereas for incorrect responses, the chosen face was an incorrect face. Thus the values for the proportion of viewing and onset of the first fixation directed to the correct and chosen face are the same for correct recognition, but not incorrect recognition. Therefore, the added value of examining how these viewing measures directed to the chosen face relate to metamemory judgments is for incorrect recognition.

For FOKs (**Table 4**; **Figure 3C**), there was a significant viewing directed to the chosen face × accuracy interaction (*p <* 0.001) such that a higher proportion of viewing directed to the chosen face was associated with higher FOKs for correct recognition [*B* = 1.78; *t*(3451) = 3.34, *p <* 0.001], but not incorrect recognition [*B* = 0.021; *t*(3451) = 0.026, *p >* 0.95; **Figure 3C**]. This interaction remained significant when controlling for RCJs (*p <* 0.005). This suggests that recognition choice does not significantly relate to FOKs overall, and is consistent with the idea that FOKs are related to target accessibility (see section Viewing Directed to the Correct Face and Target Accessibility) rather than accessibility of any choice.

To further test whether FOKs were related to accessibility of any choice, we examined the relationship between onset of the first fixation to the chosen face and FOKs. Faster onsets of the first fixation to the chosen face were associated with higher FOKs (*p <* 0.05; **Table 4**), and did not interact with accuracy, and this remained significant when controlling for RCJs (*p <* 0.05). Unlike our previous analyses of the proportion of viewing directed to the chosen face, the finding that there is faster viewing of the chosen face, regardless of accuracy, is more consistent with partial access theories of FOK.

We also examined viewing directed to the chosen face for RCJs (**Table 4**; **Figure 3D**). The proportion of viewing directed to the chosen face was associated with higher RCJs overall (*p <* 0.001; **Figure 3D**), and did not significantly interact with accuracy (*p >* 0.06). When controlling for FOKs, the main effect, with increased viewing directed to the chosen face predicting higher RCJs for correct and incorrect recognition, remained significant (*p <* 0.001). Although overall proportion of viewing to the chosen face was a significant predictor of RCJs, the onset of the first fixation to the chosen face was not (*p >* 0.2). Nevertheless, the finding that increased viewing of the chosen face is associated with higher RCJs is consistent with the idea that recognition confidence is based, at least in part, on the ease of decision-making and choice behavior.

TABLE 4 | Multi-level modeling of metamemory judgments by viewing directed to the chosen face and recognition accuracy.

*Accuracy, Recognition accuracy;* −*2LL,* −*2 Restricted log likelihood; PropVT Chosen, proportion of viewing to the chosen face; First fix ons Chosen, onset of the first fixation to the chosen face.*

∗*p < 0.05,* ∗∗*p < 0.01,* ∗∗∗*p < 0.001.*

#### Changes in Metamemory Judgments

Given that FOKs influence RCJs, one question that arises is what leads to changing one's rating of certainty after making that first metamemory judgment. That is, if you've given an FOK of 7, what would happen during recognition that would lead to lowering one's confidence to a 5, maintaining one's confidence at a 7, or raising it to a 10? To examine this we constructed a model with the change in metamemory rating (FOK–RCJ) as the dependent variable, and included different factors related to target accessibility and the recognition decision as predictors (**Table 5**). In one model, we focused on viewing directed to the correct face to examine target accessibility, and entered the proportion of viewing directed to the correct face, its interaction with accuracy, the onset of the first fixation to the correct face, its interaction with accuracy, the initial FOK rating, and recognition accuracy in the model. In addition to accuracy (*p <* 0.001) and FOKs (*p <* 0.001) being significant predictors of the change in metamemory rating, increased viewing of the correct face interacted with accuracy such that for hits it led to raising one's RCJ higher than the FOK [*B* = −2.32, *t*(3448) = −4.34, *p <* 0.001] and for misses it had no effect [*B* = 0.28, *t*(3448) = 0.33, *p >* 0.7]. There was also an interaction of first fixation onset to the correct face and accuracy (*p <* 0.05), but the simple slopes were not significant. Thus, more accurate confidence judgments (i.e., those where higher confidence ratings are given to hits) are based on target accessibility, whereas less accurate confidence ratings (i.e., those where higher confidence ratings are given to misses) are not.

We also examined the possibility that information related to the recognition decision was driving the change in confidence for both hits and misses, and therefore, we focused on viewing directed to the chosen face. We constructed a second model with the change in metamemory rating (FOK–RCJ) as the dependent variable, and entered the proportion of viewing directed to the chosen face, its interaction with accuracy, the onset of the first fixation to the chosen face, its interaction with accuracy, the initial FOK rating, and recognition accuracy in the model. In addition to accuracy (*p <* 0.001) and FOKs (*p <* 0.001) being significant predictors, increased (*p <* 0.001) and faster (*p <* 0.001) looking at the chosen face predicted raising one's confidence above one's FOK, with no interaction with accuracy (**Table 5**). The reliable association of viewing directed to the chosen face with raising one's confidence above the initial FOK is consistent with the idea that, in addition to target accessibility, the recognition decision-making experience is driving confidence and, unlike target accessibility, does so regardless of accuracy.

# Discussion

To examine how memory monitoring changes over time, and the basis for those changes, we used eye movement indices of memory to examine how cue fluency, target accessibility, and choice behavior influence (1) FOKs, and (2) RCJs. We showed that early metamemory judgments, namely FOKs, are based on cue fluency and accessibility. Later metamemory judgments, namely RCJs, are based on the decision-making experience, and the earlier metamemory judgment.


TABLE 5 | Multi-level modeling of change in metamemory judgments by viewing directed to correct/chosen face and recognition accuracy.

*Accuracy, Recognition accuracy; FOK, Feeling-of-knowing; RCJ, Retrospective confidence judgment;* −*2LL,* −*2 Restricted log likelihood; PropVT correct, proportion of viewing to the correct face; First fix ons correct, onset of the first fixation to the correct face; PropVT chosen, proportion of viewing to the chosen face; First fix ons Chosen, onset of the first fixation to the chosen face.*

∗*p < 0.05,* ∗∗∗*p < 0.001.*

#### Feeling-of-Knowing Judgments: Cue Fluency and Target Access

After viewing a cue, but before the 3AFC recognition test, participants were asked to indicate their certainty about their future recognition performance by indicating their FOK. Consistent with prior research showing that cue-related processing influences FOKs (Schwartz and Metcalfe, 1992; Metcalfe et al., 1993), we showed that a greater change in fixations to the scene cue from study to test, which indexes more fluent processing, was related to higher FOKs.

A more controversial basis of FOKs relates to accessibility. Although early models of FOKs proposed that they were based on direct access to the target (Metcalfe, 2000), there is evidence against such an account of FOKs (Koriat, 2000; Koriat and Levy-Sadot, 2000). Here, we used eye movements directed to the correct face as an indirect measure of direct access to the target. Rapid viewing of the correct face is thought to be an obligatory effect of memory on eye movements (Hannula et al., 2007; Ryan et al., 2007), and higher FOKs were associated with faster fixations to the correct face, for both correct and incorrect responses. Although our measure of target access is indirect, this is consistent with the idea that for higher FOKs subjects had access to the target, and this led to faster fixations to the target. Thus, it appears that direct access can serve as a basis of FOKs.

However, we also showed that faster fixations to the incorrectly chosen face for misses predicted higher FOKs, which is difficult for direct access theories to explain. This kind of illusory FOK could be based on partial access to the target, but could also be based on erroneous, yet accessible information. For example, after viewing a scene cue, a participant might recall that the target has red hair and, as a result, give a high FOK, and look more quickly to a face with red hair, even if the correct face has brown hair. In such a case, the FOK appears to be based on the amount or fluency of accessible information, which is different than direct access to the target (Koriat, 1995). Thus it appears that to make FOKs, individuals may monitor accessibility without respect to accuracy (Koriat, 1995, 2000; Koriat and Levy-Sadot, 2001). However, not all accessible features will increase FOKs; the information must be relevant and meaningful (Thomas et al., 2012). For example, if a person recognizes a scene, they may recall they imagined the individual acting in the scene, "I remember imagining the face picking the corn in the field," but an individual may not use this information as the basis of their FOK if they recognize that remembering the action does not indicate whether they will remember the face. Overall, it appears that FOKs are based on accessibility of information, but this access does not necessarily have to be accurate.

One consideration for our findings is that typical FOK paradigms use a Recall-Judgment-Recognition format (Hart, 1965), and FOKs are made about unrecallable items only (Nelson and Narens, 1990). In our paradigm, we did not explicitly test for recall, and participants made a FOK judgment after every trial. Because participants were presented with a scene cue, and likely attempted to recall the associated face, we may have some trials in which participants recalled the face associated with the scene. This could explain why we get faster fixations to the target face. Although this poses some limitation for comparisons with other studies, it is unlikely to change our interpretation of the basis for FOKs. First, our analyses were based on continuous variables ranging from 0 to 10, and tested for linear effects, so it is unlikely that recalled items, which presumably would be given the highest FOK rating, would alter our results. Second, because our findings are consistent for both correct and incorrect recognition, it seems unlikely that removing successfully recalled items would change the results.

#### Retrospective Judgments: FOK and Choice Experience

Whereas most studies examine FOKs and RCJs in isolation, we examined metamemory judgments across time. Our results showed that FOKs predict RCJs, which suggests that a participant's expectations about his or her performance influenced his or her confidence. At face value, FOKs appear to measure expected performance. Empirically, they have also been related to beliefs about what should be known (i.e., expectations) in the general knowledge domain (Costermans et al., 1992; Marquié and Huet, 2000). As one might expect given that FOKs predict confidence, general beliefs about ability have been shown to predict confidence in both general knowledge and eyewitness memory paradigms (Perfect, 2004). The effects of expectations on confidence have also been tested more directly. For example, using an episodic memory cueing paradigm, participants were told that a stimulus was either "Likely Old" or "Likely New," and when this "Likely Old" cue was invalid, participants had decreased confidence in correct rejections (i.e., saying the item was new) compared to when the cue was "Likely New" (Jaeger et al., 2012). Thus, our findings that FOKs influence confidence are consistent with prior research showing that, in some cases, expectations can influence confidence.

The fact that expectations influence confidence may have relevance for findings showing that cue-related processing influences confidence (e.g., Koriat et al., 2008; Chua et al., 2012) because it suggests that the effect of cues on confidence may be indirect. Indeed, in our analyses, we initially showed that cue fluency predicted RCJs. However, subsequent mediation analyses revealed this was an indirect effect, with FOKs mediating the effect of cue fluency on confidence. This mediating effect is unlikely to rely on explicit FOK judgments; in a previous study, using a similar paradigm, but without FOK judgments, we showed that fixations to the scene cue predicted confidence (Chua et al., 2012). Similarly, general knowledge tasks, which show domain specific increases in confidence (Marquié and Huet, 2000), suggest that the effects of domain familiarity may be indirect in that participants give higher confidence ratings because they expect to know more in that domain.

The idea that expectations, or prospective metamemory judgments, can influence retrospective memory judgments may relate broadly to findings that metamemory judgments are inferential in nature (Schwartz et al., 1997; Koriat et al., 2008; Koriat and Ackerman, 2010). An FOK judgment may be another source of information that people use when making inferential RCJs. Work examining memory for deceptive and non-deceptive items, in which the deceptive information is based on gist or partial access has consistently shown that gist or partial access leads to higher retrospective confidence and decreased metacognitive accuracy (Brewer and Sampaio, 2006, 2012). What drives this effect is the inference that gist or partial access relates to accuracy. Because FOKs may also be based on partial access (Koriat, 1994), it is reasonable to think that participants may make the same inference that FOKs relate to accuracy.

Clearly, expectations are not the only basis for retrospective confidence; the recognition experience is a major contributor to confidence (Koriat et al., 2008; Koriat and Ackerman, 2010; Chua et al., 2012). This makes intuitive sense, and we also show that participants raise or lower their confidence after their recognition experience, and that RCJs are more accurate than FOKs (Watier and Collin, 2011). Furthermore, when we control for FOKs, the only covariates that significantly predict RCJs, or changing one's metamemory rating, relate to the recognition experience. The variables that predicted confidence, and did not interact with accuracy, all related to choice behavior, namely the proportion of viewing time and the onset of the first fixation directed to the chosen face. Viewing directed to the correct face did not vary by confidence for incorrect recognition responses. Thus, it appears that confidence comes from decision-making behavior rather than direct access to the memory trace alone. This is consistent with experience-based models of RCJs (Koriat et al., 2008) in that the online feedback about the recognition experience, as indexed by eye movements, predicted confidence and changes in metamemory ratings. One recent experience-based model, the self-consistency model of confidence, proposes that confidence in forced-choice tasks comes from the evidence for the choice based on the amount of conflict experienced when making a decision (Koriat, 2012), and our finding that confidence for correct and incorrect recognition tracks viewing of the chosen face is consistent with this.

An additional merit of the *self-consistency model of confidence* is that it may also explain the correlation between FOKs and RCJs. The model posits that confidence tracks reliability (Koriat, 2012). Reliability would come from consistent mnemonic information retrieved during the scene cue period, which would form the basis of the FOK, and the recognition task, which forms the basis of RCJs. To call on our earlier example, if the scene cue elicited a memory of a person with red hair, and there was a red-haired individual in the three-face display, one might have higher confidence because there was consistency across time. If there were no individual with red hair in the three-face display, the participant might then drastically lower his or her confidence. Thus, an alternative explanation for why FOKs predict confidence is that confirmatory evidence over multiple time points leads to higher confidence (Koriat, 2012).

It is also possible that individual differences in motivation, beliefs, and biases could drive the relationship between FOKs and RCJs. One such bias may be the confirmation bias, which has some commonalities with the self-consistency model of confidence (Koriat, 2012), in which individuals selectively look for information that confirms their hypotheses (Nickerson, 1998), and for which there are known to be individual differences (Rassin, 2008). Similarly, individual differences in beliefs about one's memory (Lineweaver and Hertzog, 1998; Magnussen et al., 2006) may also influence the degree to which participants are willing to update their confidence judgments based on the target retrieval experience. These beliefs may apply across the learning and memory timescale, and similarly bias individuals' FOKs and RCJs. Our study did not involve any manipulation that might affect participants' efforts to be consistent, nor did it measure individual differences in biases, but this idea should be considered in future research.

Another reason that prospective and retrospective metamemory ratings may interact is that they share a common neural basis (Chua et al., 2009; Ryals et al., 2015). For example, recent fMRI work has shown that the act of making FOKs and RCJs shows common activation compared to other memory and non-memory tasks (Chua et al., 2009). Furthermore, experimental manipulation of brain activity by theta burst TMS over the frontopolar cortex selectively improved both prospective (in this case JOLs) and retrospective metamemory judgments (Ryals et al., 2015), indicating a shared neural basis for metamemory judgments that occur at different times across the learning and memory timescale.

It is worth mentioning that, in this study, viewing directed to the chosen face was indicative of confidence for both correct and incorrect recognition, unlike in our previous study (Chua et al., 2012), in which it tracked confidence for correct responses only. Differences in the paradigm and analyses may explain the discrepancy. In this study, we used a larger response scale and analyzed confidence as a continuous variable, whereas in the previous study we examined confidence as high, medium, and low. Additionally, in our previous study we examined differences for a set period of time, whereas in this study we examined viewing before the recognition response, which may have increased our sensitivity to detect effects.

# Conclusion

Memory monitoring is an ongoing process that involves a dynamic model that changes across time. Memory monitoring assessed prior to recognition is based on cue fluency, and target accessibility leads participants to have expectations about their future performance. The experience during the recognition task, in particular the experience related to choice behavior, gives rise to subjective feelings of confidence in one's answer. However, the target recognition experience only accounts for some of the variance, and the metamemory judgment made prior to recognition also influences the metamemory judgment that follows recognition. These results indicate that metamemory judgments should not be thought of as distinct subjective experiences in time but as an evolving awareness that incorporates past metamnemonic judgments with new information into a dynamic model of memory.

# Acknowledgments

The authors thank Sergio Zenisek for help with data collection. This research was supported by National Institute on Aging of the National Institutes of Health under award number SC2AG046910. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Chua and Solinger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The Effects of Changing Attention and Context in an Awake Offline Processing Period on Visual Long-Term Memory

Timothy M. Ellmore<sup>1,2</sup>\*, Anna Feng<sup>1</sup>, Kenneth Ng<sup>1</sup>, Luthfunnahar Dewan<sup>1</sup> and James C. Root<sup>3</sup>

*<sup>1</sup> Department of Psychology, The City College of New York, City University of New York, New York, NY, USA, <sup>2</sup> Program in Behavioral and Cognitive Neuroscience, The Graduate Center, City University of New York, New York, NY, USA, <sup>3</sup> Department of Psychiatry and Behavioral Sciences, Memorial Sloan Kettering Cancer Center, New York, NY, USA*

#### Edited by:

*Antonino Vallesi, University of Padua, Italy*

#### Reviewed by:

*Elisa Di Rosa, University of Padua, Italy; Francesco Amico, Newcastle Hospital, Ireland*

> \*Correspondence: *Timothy M. Ellmore*

*tellmore@ccny.cuny.edu*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *26 June 2015* Accepted: *25 November 2015* Published: *07 January 2016*

#### Citation:

*Ellmore TM, Feng A, Ng K, Dewan L and Root JC (2016) The Effects of Changing Attention and Context in an Awake Offline Processing Period on Visual Long-Term Memory. Front. Psychol. 6:1902. doi: 10.3389/fpsyg.2015.01902*

There is accumulating evidence that sleep as well as awake offline processing is important for the transformation of new experiences into long-term memory (LTM). Yet much remains to be understood about how various cognitive factors influence the efficiency of awake offline processing. In the present study we investigated how changes in attention and context in the immediate period after exposure to new visual information influence LTM consolidation. After presentation of multiple naturalistic scenes within a working memory paradigm, recognition was assessed 30 min and 24 h later in three groups of subjects. One group of subjects engaged in a focused attention task [the Revised Attentional Network Task (R-ANT)] in the 30 min after exposure to the scenes. Another group of subjects remained in the testing room during the 30 min after scene exposure and engaged in no goal- or task-directed activities. A third group of subjects left the testing room and returned 30 min later. A signal detection analysis revealed no significant differences among the three groups in hits, false alarms, or sensitivity on the 30-min recognition task. At the 24-h recognition test, the group that performed the R-ANT made significantly fewer hits compared to the group that left the testing room and did not perform the attention task. The group that performed the R-ANT and the group that remained in the testing room during the 30-min post-exposure interval made significantly fewer false alarms on the 24-h recognition test compared to the group that left the testing room. The group that stayed in the testing room and engaged in no goal- or task-directed activities exhibited significantly higher sensitivity (*d*′) compared to the group that left the testing room and the group that performed the R-ANT task.
Staying in the same context after exposure to new information and resting quietly with minimal engagement of attention results in the best ability to distinguish old from novel visual stimuli after 24 h. These findings suggest that changes in attentional demands and context during an immediate post-exposure offline processing interval modulate visual memory consolidation in a subtle but significant manner.

Keywords: offline processing, attention, context, consolidation, intermediate memory, working memory, recognition memory

# INTRODUCTION

The human capacity to store complex visual information in long-term memory (LTM) is vast and well documented. In an early notable set of experiments (Standing et al., 1970), subjects recognized 90% of pictures from a set of over 2500 stimuli with only 1 s of display at initial encoding after 30 min and 24 h. While the capacity of visual information in LTM is high, much less is understood about the temporal dynamics and cognitive factors influencing the transformation of visual information from short-term memory (STM) to LTM. The type of processing after learning certainly is one important modulator; it is known that sleep and napping after learning improve the consolidation (i.e., permanent storage) of information in LTM (Stickgold, 2013), and that humans exhibit deficits in the ability to form new memories without regular periods of sleep (Yoo et al., 2007).

Aside from sleep or a nap, simple rest or quiet wakefulness without being engaged in any goal- or task-related processing facilitates memory strengthening (Ellenbogen et al., 2007) as it allows for "offline processing" including, most importantly for declarative memory, the replay of hippocampal-cortical traces and strengthening of horizontal cortico-cortical connections (O'Neill et al., 2010). Through this process, one idea is that memories stabilize or become resistant to interference during rest or quiet wakefulness and then consolidate during sleep (Walker, 2005). Another idea is that part of the process of consolidation takes place during wakefulness and periods of relative inactivity after the acquisition of new information (Buzsaki, 1989; Frankland and Bontempi, 2005).

In more typical learning situations outside a controlled testing environment, periods of wakefulness following the acquisition of new information are not accompanied by inactivity, but rather by a plethora of different cognitive processes and other experiences. How other active cognitive processes and behaviors influence the offline persistence of memory traces is a question beginning to be addressed by cognitive neuroscientists (Peigneux et al., 2006; Tambini et al., 2010; Staresina et al., 2013), but much still remains to be learned about how different types of active wakefulness following new learning affect the stabilization and consolidation of memories. This is an important area for future research, as understanding the variables that influence how memories are made permanent in everyday scenarios has important implications for developing more effective education and training strategies, not to mention informing the experimental design of future basic memory investigations.

In the present study, we tested the hypothesis that changing attention and spatial context during an awake offline processing period immediately following exposure to new visual stimuli would affect the consolidation of stimuli into LTM. We predicted that engaging attention on an unrelated task and switching context would impair scene recognition by reducing the opportunity and efficiency of offline processing during wakeful activity.

# MATERIALS AND METHODS

# Subjects

Data were obtained from a total of 41 subjects (mean 19.75 years old, STD 2.39, 16 males, 4 left-handed). Each subject provided informed consent and completed the present study, which was approved by the Institutional Review Board of the City College of New York Human Research Protection Program. Subjects were recruited through the Sona Systems scheduling system of the Psychology Department. Each student received course extra credit for a total of 2 h of participation over the course of 2 days.

# Design and Apparatus

A mixed within- and between-groups design was used. Subjects completed cognitive tasks inside a sound-attenuated booth (IAC Acoustics) to minimize auditory and visual distractions. Both the memory and attention tasks were programmed in Superlab 5 (Cedrus Corporation). Before performance of the actual tasks, subjects were given short (less than 5 min) demonstration tasks, with different stimuli for the scene tasks, to ensure that they understood the task instructions. The stimuli in the memory tasks were naturalistic scenes in 24-bit color sampled from the SUN database (Xiao et al., 2010). Each scene was displayed on a 27-inch LED monitor with a refresh rate of 60 Hz and a screen resolution of 1920-by-1080. Participants sat 83.5 cm from the monitor and maintained stable viewing using a combined forehead/chin rest. Each scene measured 800-by-600 pixels on the screen, and from the subject's point of view occupied a horizontal viewing angle of 17.2° and a vertical viewing angle of 12.7°.
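The relation between on-screen size, viewing distance, and visual angle described above can be illustrated with a short calculation. This is a minimal sketch, not the authors' procedure: it assumes square pixels on a 16:9 panel whose 27-inch diagonal matches the stated resolution, and the function name `visual_angle_deg` is hypothetical. Under these assumptions the result lands within a few tenths of a degree of the reported angles; the physical dimensions of the actual panel are not given in the text, so exact agreement should not be expected.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by an extent viewed head-on."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Assumed panel geometry (not stated in the text): square pixels, and the
# 27-inch diagonal spans the full 1920-by-1080 pixel grid.
DIAGONAL_CM = 27 * 2.54
RES_X, RES_Y = 1920, 1080
pixel_pitch_cm = DIAGONAL_CM / math.hypot(RES_X, RES_Y)

# Values taken from the text: viewing distance and scene size in pixels.
DISTANCE_CM = 83.5
horizontal_deg = visual_angle_deg(800 * pixel_pitch_cm, DISTANCE_CM)
vertical_deg = visual_angle_deg(600 * pixel_pitch_cm, DISTANCE_CM)

print(f"horizontal: {horizontal_deg:.2f} deg, vertical: {vertical_deg:.2f} deg")
```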

On the first day, each participant was exposed to a set of scenes by completing a 40-trial Sternberg working memory (WM) paradigm (see **Figure 1** for example task timeline). Each trial consisted of a presentation of two to five scenes, each scene lasting 2 s (10 trials for each of the 4 loads). A 6 s blank screen delay period followed the presentation of the set of scenes. After the delay, a probe stimulus was presented for 2 s and, if the stimulus was considered to belong to the previously presented set (50% chance), the subject was required to press the right (green) button on a RB-530 response pad (Cedrus Inc.); if the probe did not belong to the previous set, the subject was required to press the left (red) button on the response pad. The scenes were presented within a modified Sternberg WM paradigm, rather than requiring subjects to passively view them. Following participation in the WM task, subjects were randomly assigned to one of the three groups. One group spent the following 30 min performing the revised version (Fan et al., 2009) of the Attentional Network Task (R-ANT) starting immediately after completion of the WM task.

In the R-ANT task, the subject is asked to determine the direction of the center arrow in a flanker array that may appear either to the left or right of fixation. The target is preceded by one of four cue conditions: no cue (no cue is presented); double cue (both cue boxes flash briefly to indicate when but not where the target will appear); valid cue (the left or right cue box flashes when and where the target will appear); and invalid cue (the cue box opposite to where the target will appear flashes). Two conditions of flanker targets are included: congruent (all arrows point in the same direction) and incongruent (the center arrow points in the opposite direction from that of the flankers). These conditions allow an attention network to be identified through component decomposition. An alerting component is calculated by subtraction of the no cue condition from the double cue condition; an orienting component is calculated by subtraction of the double cue condition from the valid cue condition; and the executive component is calculated by subtraction of the incongruent target condition from the congruent target condition. The alerting component represents the earliest attention-related process and reflects arousal. The orienting component represents attentional shifting. The executive component represents the resolution of conflicting stimulus information. Subjects in the second group (No R-ANT, leave testing room) were dismissed from the testing room and told to return in 30 min; subjects in the third group (No R-ANT, stay in testing room) remained seated quietly in the testing room and did not engage in any goal-directed or task-related activity. None of the subjects in any of the three groups was told that there would be a recognition test after 30 min.
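The component subtractions described for the R-ANT can be written out as a small sketch. Several things here are assumptions rather than details from the text: the measure per condition is taken to be a mean reaction time in milliseconds, the function and key names are hypothetical, and the example values are invented for illustration. "Subtraction of A from B" is read as B minus A, so the sign of each score simply follows the order worded in the text.

```python
def attention_components(mean_rt: dict[str, float]) -> dict[str, float]:
    """Component scores using the subtraction order as worded in the text.

    "Subtraction of A from B" is interpreted as B - A.
    """
    return {
        # no cue subtracted from double cue
        "alerting": mean_rt["double_cue"] - mean_rt["no_cue"],
        # double cue subtracted from valid cue
        "orienting": mean_rt["valid_cue"] - mean_rt["double_cue"],
        # incongruent subtracted from congruent
        "executive": mean_rt["congruent"] - mean_rt["incongruent"],
    }


# Hypothetical mean reaction times (ms), for illustration only.
example_rt = {
    "no_cue": 600.0,
    "double_cue": 560.0,
    "valid_cue": 520.0,
    "congruent": 540.0,
    "incongruent": 640.0,
}
scores = attention_components(example_rt)
print(scores)
```

Reversing the order of any subtraction flips the sign of that score without changing its magnitude, so the convention matters only when comparing scores across reports.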

A total of 21 subjects who completed the R-ANT task after the scene WM task included 10 males, 2 left handed, with mean age 19.29 (STD 2.37). A total of 10 subjects who did not complete the R-ANT task and instead left the testing room for 30 min after completing the scene WM task included 3 males, 1 left handed, mean age 19.8 (STD 2.04). Another 10 subjects who did not complete the R-ANT task and instead remained in the testing room for 30 min after completing the scene WM task included 3 males, 1 left handed, mean age 20.6 (STD 2.72).

Following the 30-min interval, subjects in all three groups completed a 56-trial recognition memory test in which 30 randomly selected scenes from the WM task were randomly intermixed with 26 new scene images. Subjects were required to press the right (green) button if they thought they had already seen the scene (50% chance, with old scenes sampled from the WM task presentation), and the left (red) button if they decided that they had not seen the image (50% chance, new scenes sampled from SUN database). Scenes were presented one at a time for 2 s. After the 30-min recognition test, each subject was told to return the following day at the same time. None of the subjects was informed of the nature of the test taking place the following day.

On the second testing day, all subjects completed a new 56 trial recognition memory test in which 26 of the stimuli had already been presented in the previous WM test. Importantly, none of the old stimuli presented during the 24-h recognition test was presented during the recognition test on the first day of testing; none of the novel stimuli presented during the 24-h recognition test was presented previously. Subjects were required to press the right (green) button if they thought they had already seen a stimulus (50% chance, with old stimuli sampled from the WM task presentation), and the left (red) button if they decided that they had not seen a stimulus previously (50% chance, new stimuli sampled from SUN database).

Also on the second day of testing, subjects who had completed the R-ANT task on the previous day and subjects who had remained in the testing room and engaged in no task completed a 14-item questionnaire of demonstrable reliability (Ellis et al., 1981) in which they rated the quality and duration of their previous night's sleep. The sleep ratings were obtained only for 10 of the subjects who during the previous day remained in the testing room and performed the R-ANT task and the 10 subjects who during the previous day remained in the testing room and did not complete the R-ANT task.

## Analysis

Performance across the forty WM trials was computed as percent correct to ensure that subjects were viewing the scenes and making decisions with a high level of accuracy. Percent correct accuracy was also computed for the R-ANT task for subjects who completed it between WM and recognition testing on day 1. A signal detection analysis (Stanislaw and Todorov, 1999) was performed on the 30 min and 24 h recognition task data. A hit was counted when a previously presented stimulus was signaled by the subject pressing a button indicating, correctly, that the stimulus had been previously seen (an old stimulus correctly classified as old). A false alarm was counted when a (new) stimulus not previously presented was indicated by the subject pressing a button indicating, incorrectly, that the image had been previously presented (a new stimulus incorrectly classified as an old stimulus). Total hits and false alarms were expressed as proportions in each subject and used to compute a measure of sensitivity as the difference in standardized normal deviates of hits minus false alarms: *d*′ = Z(hit rate) − Z(false alarm rate). The proportion of hits, false alarms, and sensitivity measures were analyzed using a mixed-design ANOVA in SPSS (v.21) with a within-subjects factor of time (time 0: working memory, time 1: recognition after 30 min, time 2: recognition after 24 h) and a between-subject factor of group (R-ANT task, leave testing room, stay in testing room). Pairwise comparisons of the proportion of hits, false alarms, and *d*′ for the 30 min and 24 h recognition tasks were made using independent sample t-tests, also in SPSS, among the subjects who performed the R-ANT task, the subjects who left the testing room for the 30 min between the WM and recognition test administered on the first day of testing, and the subjects who remained in the testing room for the 30 min between the WM and the recognition task administered on the first day of testing.
Post hoc statistical power, expressed as 0 to 100%, for paired comparisons was computed according to Rosner (2011),

$$\text{Power} = \Phi \left\{-Z_{1-\alpha/2} + \frac{|\mu_2 - \mu_1|}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \right\}$$

where n is the sample size for a given group, µ is the group mean, σ<sup>2</sup> is the group variance, α is the probability of type I error (0.05), Z<sub>1−α/2</sub> is the critical Z value for α, and Φ is the standard normal cumulative distribution function, which converts a Z value to power.
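The sensitivity measure and the power expression above can be sketched in a few lines using only the Python standard library. This is an illustrative sketch under stated assumptions, not the SPSS procedure used in the study: the function names `d_prime` and `posthoc_power` are hypothetical, the denominator of the power expression is taken to be the usual two-sample standard error √(σ₁²/n₁ + σ₂²/n₂), and the numeric inputs are invented. Hit or false alarm rates of exactly 0 or 1 have no finite Z score, so the sketch rejects them rather than applying a correction.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity: Z(hit rate) - Z(false alarm rate)."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(false_alarm_rate)


def posthoc_power(mean1: float, mean2: float, var1: float, var2: float,
                  n1: int, n2: int, alpha: float = 0.05) -> float:
    """Power as a proportion (0 to 1) for a two-sample comparison of means."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # critical Z for alpha
    std_error = sqrt(var1 / n1 + var2 / n2)      # assumed denominator
    return STD_NORMAL.cdf(-z_crit + abs(mean2 - mean1) / std_error)


# Hypothetical inputs, for illustration only.
example_d = d_prime(hit_rate=0.60, false_alarm_rate=0.15)
example_power = posthoc_power(1.7, 1.3, 0.16, 0.16, n1=10, n2=10)
print(f"d' = {example_d:.2f}, power = {100 * example_power:.1f}%")
```

Multiplying the returned proportion by 100 gives power on the 0 to 100% scale used in the text.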

# RESULTS

# Mixed-Model ANOVA

For the analysis of hits, Mauchly's test indicated the assumption of sphericity had not been violated [χ<sup>2</sup>(2) = 1.24, p = 0.54], therefore the degrees of freedom were not corrected for the main effect of time [F(2,76) = 117.85, p < 0.0001, η<sub>p</sub><sup>2</sup> = 0.76], the effect of group [F(2,38) = 0.37, p = 0.69, η<sub>p</sub><sup>2</sup> = 0.02], or the interaction between time and group [F(4,76) = 1.78, p = 0.14, η<sub>p</sub><sup>2</sup> = 0.09].

For the analysis of false alarms, Mauchly's test indicated no violation of the sphericity assumption [χ<sup>2</sup>(2) = 0.98, p = 0.71], so the degrees of freedom were not corrected for the main effect of time [F(2,76) = 20.04, p < 0.0001, η<sub>p</sub><sup>2</sup> = 0.34], the effect of group [F(2,38) = 1.81, p = 0.18, η<sub>p</sub><sup>2</sup> = 0.09], or the interaction between time and group [F(4,76) = 1.82, p = 0.13, η<sub>p</sub><sup>2</sup> = 0.09].

For the analysis of *d*′ (sensitivity), Mauchly's test indicated no violation of sphericity [χ<sup>2</sup>(2) = 0.95, p = 0.42], so the degrees of freedom were not corrected for the main effect of time [F(2,76) = 179.80, p < 0.0001, η<sub>p</sub><sup>2</sup> = 0.83], the effect of group [F(2,38) = 0.66, p = 0.52, η<sub>p</sub><sup>2</sup> = 0.03], or the interaction between time and group [F(4,76) = 0.93, p = 0.45, η<sub>p</sub><sup>2</sup> = 0.05].

# Exposure to Scenes

Subjects in all three groups performed with similar accuracy on the WM task. Subjects assigned to leave the testing room (different context, no attention task) achieved 94.00% accuracy (STD 3.76) on the WM task. Subjects assigned to remain in the testing room (same context, no attention task) performed at 93.75% (STD 4.89) on the WM task. Subjects assigned to remain in the testing room and perform the R-ANT task during the retention interval (same context, attention task) performed at 93.93% (STD 4.23) on the WM task. The subjects who stayed in the testing room and performed the R-ANT task completed the R-ANT with an accuracy of 96.92% (STD 2.74).

# Recognition After 30 min

There were no significant differences among the three groups of subjects in either the proportion of hits (**Figure 2A**) or sensitivity (d-prime, **Figure 2C**) during the 30-min recognition test. Subjects who stayed in the testing room and did not engage in any task during the 30 min interval made fewer false alarm responses compared to subjects who stayed in the testing room and completed the R-ANT task (**Figure 2B**) but this tendency did not reach the a priori threshold for statistical significance [0.08 versus 0.13, t(29) = 1.32, p = 0.10, STD error of difference = 0.04, power = 39.5%].

# Recognition After 24 h

Subjects who left the testing room during the 30-min interval after exposure to the scenes on the first day obtained significantly more hits (**Figure 3A**) during the recognition test 24 h later compared to subjects who stayed in the testing room and performed the R-ANT task [0.63 versus 0.54, t(29) = 1.91, p = 0.03, STD error of difference = 0.05, power = 54.5%].

Subjects who left the testing room during the 30-min interval after exposure to the scenes on the first day also made significantly more false alarms (**Figure 3B**) during the recognition test 24 h later compared both to the subjects who stayed in the testing room and performed the R-ANT task [0.20 versus 0.13, t(29) = 1.67, p = 0.05, STD error of difference = 0.04, power = 40.3%] and to the subjects who remained in the testing room and did not perform the R-ANT task [0.20 versus 0.10, t(18) = 2.24, p = 0.02, STD error of difference = 0.04, power = 61.3%].

Subjects who remained in the testing room and did not perform the R-ANT task during the 30-min interval after exposure to the scenes on the first day exhibited the highest sensitivity (*d*′) during the recognition test 24 h later (**Figure 3C**) compared to subjects who left the testing room [1.73 versus 1.28, t(18) = 2.52, p = 0.03, STD error of difference = 0.18, power = 71.1%] and also compared to subjects who remained in the testing room and performed the R-ANT task [1.73 versus 1.39, t(29) = 1.91, p = 0.03, STD error of difference = 0.18, power = 53.3%].

### Sleep Duration and Quality

Sleep duration and quality for the night between day 1 scene exposure and day 2 recognition testing indicated no differences between subjects who remained in the testing room and performed the R-ANT task and subjects who remained in the testing room and did not complete the R-ANT task [self-reported sleep quality from 1 "very light" to 8 "very deep": mean 5.30 ± 1.49 (STD) versus 5.62 ± 1.77, independent sample t-test p = 0.68; self-reported number of times awoke during the night from 0 "not at all" to 7 "more than 6 times": 0.60 ± 0.84 versus 0.50 ± 0.76, p = 0.80; self-reported number of hours of sleep: 6.77 ± 1.60 versus 6.16 ± 1.75, p = 0.44].

FIGURE 2 | Ability to Discriminate Old from New Scenes at the 30 min Recognition Test is Not Affected by Changing Attention or Context during Awake Offline Processing. Average hits (± SEM) do not differ among groups (A). There is a trend (#*p* < 0.1) for subjects who remained in the testing room to make fewer false alarms compared to subjects who performed the R-ANT (B). There is no difference in sensitivity (C) among the three groups.

FIGURE 3 | Ability to Discriminate Old from New Scenes at the 24 h Recognition Test is Better for Subjects Who Remained in the Same Context and Rested Quietly. Average hits (± SEM) are significantly lower for the group that performed the R-ANT task (A). Subjects who left the testing room made significantly (\**p* < 0.05) more false alarms (B) compared to subjects who performed the R-ANT task and subjects who remained in the testing room and rested quietly. Subjects who stayed in the testing room and rested quietly showed significantly higher sensitivity compared to the subjects who performed the attention task and those who changed spatial context by leaving the testing room during the 30-min post-exposure period (C).

# DISCUSSION

Entering a sleep state makes it more likely that information encountered during the immediately preceding awake period will be remembered (Walker, 2005; Stickgold, 2013). The beneficial effect of sleep for memory consolidation is hypothesized to depend in part on offline reprocessing of the information to create a stronger memory trace. Offline processing does not just occur during sleep; it is also thought to occur during awake periods, like quiet rest (Tambini et al., 2010; Staresina et al., 2013; Vilberg and Davachi, 2013; Schlichting and Preston, 2014). In the present study we tested the hypothesis that changing cognitive demands in the awake period immediately after the exposure to information would affect LTM for that information. We studied three groups of subjects, all of whom were exposed to the same set of naturalistic color scenes. In the 30 min immediately following the exposure, one group remained in the testing room and engaged in an attention-demanding task. Another group was dismissed from the testing room, thereby experiencing a switch in the context for offline processing, and was told to return 30 min later, with no constraints on how they spent the 30 min as long as they remained outside the testing room. A third group remained in the testing room and waited quietly, engaging in no particular task- or goal-directed activity.

All subjects in each group completed a recognition task after the post-exposure interval, but no group showed any significant difference in hits, false alarms, or sensitivity to distinguish old from novel scenes during the 30-min recognition test. All participants came back 24 h later for another recognition test involving unique subsets of new scenes and old scenes encountered during the previous exposure on the first day. There were significant differences among the groups in the 24-h recognition performance. The group of participants who performed the attention task during the 30-min interval on the first day showed the poorest hit rate at the 24-h recognition test. The group that left the testing room during the 30-min interval on the previous day showed the highest false alarm rate, significantly higher than the group that performed the attention task and the group that remained in the testing room. The group that remained in the testing room and waited quietly, engaging in no particular task- or goal-directed activity, demonstrated the highest sensitivity for distinguishing old from new scenes after 24 h. These results indicate that varying attentional engagement and the context in which awake offline processing occurs in the immediate period following exposure to novel visual stimuli affect LTM for those stimuli.

Engaging attention in an unrelated task immediately after exposure to new stimuli could reduce the efficiency of offline processing in a period of time that is critical to start the neurobiological process of memory trace replay thought necessary to consolidate information (Rosenzweig et al., 1993; McGaugh, 2000; Frankland and Bontempi, 2005; O'Neill et al., 2010). Switching the context from where the initial exposure to new information occurred could increase the likelihood of encountering similar visual stimuli, which could increase interference (Keppel and Underwood, 1962) thereby also decreasing the efficiency of offline processing. While it is plausible that the groups of subjects who underwent the attention focus and context switches experienced reduced efficiency of offline processing during the immediate post-exposure period, there are several caveats and limitations to this study that must be highlighted.

First, we exposed our subjects to the set of scenes by presenting them within a modified Sternberg WM paradigm (Sternberg, 1969). This task required subjects to keep these scenes in STM for a brief period and then make a decision about whether a probe matched one of the previously presented scenes. We hypothesized that this task would engage subjects more than merely requiring them to passively view the scenes, but we did not have a passive viewing group in the present study to test this assumption. While subjects in all of the groups tested performed the WM task above 90% accuracy, it is not possible to say definitively that subjects actually learned or encoded all of these scenes equally well. However, we obtained some evidence for learning because subjects identified previously presented scenes from the set well above chance on average (hit rate for all groups > 70%), and there were no differences across groups for the hit rates in the first recognition test administered after the 30 min interval.

Does offline processing in the awake state immediately after exposure to visual stimuli result in equally good LTM if a subject's attention is engaged in an unrelated task? We addressed this question by having one group of subjects perform the R-ANT task and comparing their performance to that of a group of subjects who remained in the testing room and engaged in no task- or goal-oriented activity that demanded attention. In the R-ANT task, the subject is asked to determine the direction of the center arrow in a flanker array that may appear either to the left or right of fixation. On some trials the subject is given valid or invalid cues but these cues only last 100 ms before a 400 ms gap separating the presentation of the set of arrows. This rapid presentation requires subjects to pay attention, and completion of the dozens of trials over a half hour can be quite taxing especially if performance on the task is high, which it was on average (> 90%) in the subjects we tested. The task consists of boxes, a fixation cross and sets of arrows, which are stimuli that are not complex and in no way resemble the color scenes which we used in the WM and recognition memory tasks. Therefore interference from the R-ANT stimuli should be low. Subjects who completed the R-ANT task and took the recognition memory task at 30 min performed as well as subjects who remained in the testing room and engaged in no attention tasks or goal-directed activity. When subjects who performed the R-ANT completed the 24 h recognition test, they exhibited reduced sensitivity (d′) for discriminating old from new scenes. This finding suggests that engaging attention in the immediate period after exposure to new stimuli subtly but significantly impacts LTM at 24 h but not 30 min.
To rule out differences in sleep between the two groups as a potential confound which could explain a difference in performance at 24 h but not 30 min, we administered a sleep questionnaire (Ellis et al., 1981) to subjects when they returned for the 24 h recognition test. We found no significant differences between those who performed the R-ANT task and those who remained in the testing room and waited quietly. We interpret these results as indicating that there is a slight advantage for visual LTM in allowing subjects to rest quietly in the awake state rather than occupy their time with a demanding albeit unrelated cognitive task.

We also conducted a comparison in a group of subjects who were allowed the opportunity to experience offline processing in an uncontrolled condition in which they were not allowed to remain in the testing room after exposure to the scenes. This group of participants experienced the 30 min between exposure to the scenes and the first recognition task outside the testing room in a different context, engaging in whatever activity they so desired. This condition is more ecologically valid, as offline processing after exposure to new stimuli often takes place in the real world in new contexts during highly variable cognitive conditions. Surprisingly, even after experiencing such an uncontrolled condition, the different context group achieved similar hit rates, false alarm rates, and sensitivity at the 30 min recognition test. They did, however, show some significant differences at the 24 h recognition test, including higher false alarms compared to the group that performed the R-ANT task and compared to the group that remained in the testing room and did no task. They also showed reduced sensitivity (d′) compared to the group that remained in the testing room and did no task. The higher false alarm rate could be attributed to the opportunity for these subjects to encounter a highly variable set of other visual stimuli during the intervening 30 min, which could have interfered with the scene stimuli presented during the task. A major limitation of this study is that for this group we did not control precisely what these subjects saw or which other contexts they experienced outside the testing room. There is an opportunity for future studies to systematically vary both the exposure to similar stimuli (thereby increasing interference) and the spatial context in a more controlled fashion by allowing subjects to experience offline processing in virtual worlds through the use of immersive virtual reality technology.
Similar manipulations have proved extremely successful in extending a basic understanding of how place, context, and temporal order are represented behaviorally and at the neural level (Burgess et al., 2001, 2002).

Finally, these are data from small samples of unequal groups of young adults. Our paired comparisons do not survive strict Bonferroni multiple comparisons corrections and post hoc power computations indicate variable power ranging from 40 to 70%. Therefore further study in a larger sample with additional controls will need to be conducted to understand why exactly focusing attention and changing context in the immediate post-learning period impacts LTM consolidation for visual information in a subtle but statistically significant manner.
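The Bonferroni point can be made concrete with a short sketch that divides the conventional alpha of 0.05 by the number of comparisons. The family size used here is an assumption: it counts only the five 24-h pairwise comparisons whose p-values are reported in the Results, and the actual family of tests may have been larger.

```python
# p-values for the 24-h pairwise comparisons as reported in the Results.
reported_p = {
    "hits: left room vs. R-ANT": 0.03,
    "false alarms: left room vs. R-ANT": 0.05,
    "false alarms: left room vs. quiet rest": 0.02,
    "d': quiet rest vs. left room": 0.03,
    "d': quiet rest vs. R-ANT": 0.03,
}

alpha = 0.05
m = len(reported_p)  # assumption: the family is just these five tests
threshold = alpha / m  # Bonferroni-corrected per-test threshold

survives = {name: p <= threshold for name, p in reported_p.items()}
for name, ok in survives.items():
    print(f"{name}: p = {reported_p[name]:.2f}, survives = {ok}")
```

Under this assumed family size the corrected threshold is 0.01, which none of the reported p-values meets. A larger family would only lower the threshold further.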

Some neurobiological evidence implicates the hippocampus as orchestrating a process of memory trace replay that may, for some memories, strengthen horizontal connections among cortical areas (Buzsaki, 1989; Eichenbaum et al., 1994; Dudai et al., 2015). As a result of this process, the trace may be stabilized and consolidated into a more durable state (Frankland and Bontempi, 2005). Opportunities for reconsolidation or forgetting (Hardt et al., 2013) exist if the stimuli are re-experienced in the same or different context. The optimal time duration for offline processing is not known but may last from minutes to a day or more, with one or more periods of sleep thought to facilitate consolidation (Drosopoulos et al., 2005; Yoo et al., 2007). Our experimental manipulation of offline processing involved a relatively brief 30-min period after exposure to the novel visual information. Although we saw some significant effects among our groups at the 24-h recognition test, these effects were small, likely due at least in part to the brief period of time in which we manipulated cognitive demands during offline processing. Increasing the sample size, adding more tightly controlled cognitive constraints including a sleep or nap group, understanding more thoroughly the thought processes accompanying quiet wakefulness (Hurlburt et al., 2015), and combining the behavioral manipulations with electrophysiological monitoring like EEG (Huber et al., 2004) or neuroimaging (Tambini et al., 2010; Spadone et al., 2015) will help to clarify the contribution of awake offline processing to memory consolidation.


# CONCLUSION

In conclusion, the present findings suggest that changes in the focusing of attention and context during offline processing in the minutes after exposure to novel visual stimuli modulate LTM consolidation in a subtle but significant way. During the offline processing period, remaining in the same context and resting quietly with minimal attention demands results in the best sensitivity for distinguishing old from novel visual stimuli after 24 h.

# ACKNOWLEDGMENTS

We thank Sara Murphy and Daniel Schor for help with task and stimulus creation and Cindy Lin for help with data analysis. This work was supported by PSC-CUNY award TRADA-44-206. Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number SC2GM109346. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer Elisa Di Rosa and handling Editor Antonino Vallesi declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Ellmore, Feng, Ng, Dewan and Root. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Levels of attention and task difficulty in the modulation of interval duration mismatch negativity**

*Alana M. Campbell <sup>1</sup> and Deana B. Davalos <sup>2</sup> \**

*<sup>1</sup> Department of Psychiatry and the UNC Carolina Institute for Developmental Disabilities, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, <sup>2</sup> Department of Psychology, Colorado State University, Fort Collins, CO, USA*

Time perception has been described as a fundamental skill needed to engage in a number of higher level cognitive processes essential to successfully navigate everyday life (e.g., planning, sequencing, etc.). Temporal processing is often thought of as a basic neural process that impacts a variety of other cognitive processes. Others, however, have argued that timing in the brain can be affected by a number of variables such as attention and motivation. In an effort to better understand timing in the brain at a basic level with minimal attentional demands, researchers have often employed the mismatch negativity (MMN). MMN, specifically duration MMN (dMMN) and interval MMN (iMMN), have been popular methods for studying temporal processing in populations for which attention or motivation may be an issue (e.g., clinical populations, early developmental studies). There are, however, select studies which suggest that attention may in fact modify both temporal processing in general and the MMN event-related potential. It is unclear to what degree attention affects MMN or whether the effects differ depending on the complexity or difficulty of the MMN paradigm. The iMMN indexes temporal processing and is elicited by introducing a deviant interval duration amid a series of standards. A greater degree of difference in the deviant from the standard elicits a heightened iMMN. Unlike past studies, in which attention was intentionally directed toward a closed-captioned movie, the current study had participants partake in tasks involving varying degrees of attention (passive, low, and high) with varying degrees of deviants (small, medium, and large) to better understand the role of attention on the iMMN and to assess whether level of attention paired with changes in task difficulty differentially influence the iMMN electrophysiological responses. Data from 19 subjects were recorded in an iMMN paradigm.
The amplitude of the iMMN waveform showed an increase with attention, particularly for intervals that were the most distinct from a standard interval (*p <* 0.02). Results suggest that the role of attention on the iMMN is complex. Both the degree of attention paid as well as the level of difficulty of the MMN task likely influence the neuronal response within a timing network. These results suggest that electrophysiological perception of time is modified by attention and that the design of the iMMN study is critical to minimize the possible confounding effects of attention. In addition, the implications of these results for future studies assessing interval duration-based MMN in clinical populations are also addressed.

#### *Edited by:*

*John Magnotti, Baylor College of Medicine, USA*

*Reviewed by: Natasha Matthews, University of Queensland, Australia Antonia Thelen, Vanderbilt University, USA*

> *\*Correspondence: Deana B. Davalos davalos@colostate.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 01 May 2015 Accepted: 07 October 2015 Published: 27 October 2015*

#### *Citation:*

*Campbell AM and Davalos DB (2015) Levels of attention and task difficulty in the modulation of interval duration mismatch negativity. Front. Psychol. 6:1619. doi: 10.3389/fpsyg.2015.01619*

**Keywords: temporal processing, time perception, dMMN, mismatch negativity, temporal perception, iMMN**

# **INTRODUCTION**

We constantly rely on time: from information processing to executing action plans. But is there a difference between the ability to perceive time and to use it? Clock models of time suggest that there are basic biological components of time. When one perceives interval durations, the beginning and end of the interval are marked neurologically and these marks are cognitively compared or computed to be used in perception, information processing, or higher order cognitive processes (Gibbon et al., 1997; Matell and Meck, 2000). Two recent questions in this model pertain to the role of attention and to the duration of the interval to be timed. Much of the research in the past has focused on the importance of distinct neural substrates based on interval duration. Specifically, these substrates reflect at least two distinct processes, dependent upon the length of the duration. Timing related to sub-second intervals is referred to as time perception and conceived of as a more basic and automatic process which can be studied on both a behavioral and a physiological level (Ivry and Spencer, 2004). The sub-second processes have been argued to be more automatic and potentially more motor in nature (Lewis and Miall, 2003). In contrast, supra-second interval timing is thought to require more cognitive engagement and attention, in addition to recruiting different brain circuitry than sub-second intervals (Lewis and Miall, 2003; Wiener et al., 2010a). However, recent work has suggested that sub-second interval timing can be modulated by and interact with cognition.

Recently, Coull and Nobre (2008) have suggested that observed differences between sub- and supra-second intervals may not be completely explained by differences in the duration of the interval itself, but rather by how that interval information is to be used. Explicit timing generally refers to tasks during which participants attend to the *duration* of a stimulus, specifically. Implicit timing, on the other hand, generally requires subjects to engage in tasks in which timing is a key component, but not the primary focus (Wiener et al., 2010b). Implicit timing is crucial to develop predictive patterns. Coull and Nobre describe two types of implicit timing, exogenous and endogenous, that differ only in the awareness of, or cue toward, the predictive pattern. Exogenous timing occurs passively, without a cue toward or awareness of temporal patterns in a task, while endogenous timing cues attention to temporal features within a task.

One way to increase awareness of temporal information is to allocate attention to that information. Attention has been a key cognitive mechanism of interest in terms of differentiating among the various measures of time. Research to date suggests that attention plays a large role in overall perception of time. Studies suggest that attending to the duration of a stimulus rather than another feature leads to greater accuracy in estimating the stimulus duration (Corbetta et al., 1993; Coull et al., 2004). Specifically, Coull et al. (2004) varied the degree to which attention was paid to the duration of a stimulus rather than its hue; they found that increased attention to time increased accuracy in a behavioral response and was associated with greater brain activation in several regions of a corticostriatal network, including the pre-supplementary motor area, right frontal operculum and right dorsolateral prefrontal cortex. The current study investigates the role of attention via exogenous implicit timing tasks on electrophysiological indices of temporal perception. Specifically, the role of attention (via task goals) was assessed electrophysiologically to determine whether time processing as part of a task goal affected early passive preattentive components or only those that occur later and have been associated with attentional processes in past studies. Event related potentials (ERPs) have been utilized in the past to highlight differences in how the brain processes temporal information. Of the previous electrophysiological work on time processing, the Contingent Negative Variation (CNV) and P3 components have been used to assess the neural response to temporal information when one is actively attending to and involved in a timing task (Macar et al., 1999; Macar and Vidal, 2003; Pfeuty et al., 2005; Gibbons and Stahl, 2008). Unlike traditional behavioral measures of time processing, select ERPs can also measure the brain response to stimuli during tasks that vary in attentional demands. In particular, the MMN is used in paradigms requiring attention, but has also been observed in the absence of attention, as in sleeping infants (Martynova et al., 2003) and comatose patients (Tzovara et al., 2015). Thus, to investigate the roles of attention on time processing we can employ the mismatch negativity (MMN) event-related potential to provide information. The MMN is a component elicited in response to a deviant stimulus embedded in a series of standard stimuli. The MMN is a difference wave computed by subtracting the average waveform in response to a standard stimulus from the averaged waveform in response to a deviant stimulus (Näätänen et al., 2007). The component is thought to reflect sensory echoic memory and is believed to be involved in determining whether changes in stimuli in the environment are different enough to warrant guiding attention to the stimuli (Näätänen et al., 2007).
In clinical studies, MMN has been shown to have ecological validity in terms of predicting performance on select measures of memory and as a predictor of multiple measures of functional status (e.g., social, psychological and occupational; Kiang et al., 2007; Atkinson et al., 2012; Light and Braff, 2005).

In the case of studying time-based information, MMN is elicited by either a deviant inter-stimulus interval duration or a deviant stimulus duration. For testing the role of attention in time processing, MMN is ideal as it can be used to measure neural responses to changes in temporal information without attention to the task (Davalos et al., 2003). The MMN also allows one to assess variations in brain response based on the magnitude of changes in temporal information, with intervals that are more distinct from the standard eliciting greater neural responses (Kisley et al., 2004). Therefore we can use the MMN to compare the neural responses to time in an endogenous implicit task with those in an exogenous one. In this case, an unattended or passive MMN would reflect activity related to an exogenous implicit task; that is, there is no task goal specifically related to time. In contrast, a comparable endogenous condition would require attention to the timed intervals and an awareness and expectation of those intervals in the task. The goal of the current study was to assess the interaction between degrees of attention as it varied based on task demand (from exogenous to endogenous) and duration of timed interval to test the cognitive temporal interaction in time processing.

Previous studies suggest that attention can modulate accuracy and brain activity during temporal estimation (Coull et al., 2004). Additionally, in select studies, attention has been shown to modulate the MMN for auditory stimuli (Sussman and Winkler, 2001; Grimm et al., 2004; Grimm and Schröger, 2005; Muller-Gass et al., 2006). These studies report enhanced MMN to attended deviants for both frequency and duration auditory deviants (in which the stimulus was presented for a longer or shorter period of time). While these results suggest that attention can modulate the MMN for tones of varying durations or of complex temporal physical information, this modulation could be due to both auditory as well as temporal features. To isolate temporal features, an alternate technique could be to use the same physical tone to denote the beginning and end of an interval to be timed and modify the duration between tones. This technique would allow manipulation of timing information and could be used to assess the interaction of timing with attention. The current study sought to test if attention can also modulate the MMN elicited in response to purely sub-second temporal information. If so, this would (a) provide evidence that sub-second intervals do interact with cognitive processes and (b) provide support to the theory that timed information is processed separately from, but in a manner similar to, sensory information. Further, by varying the magnitude of the temporal deviants we can test the influence of attention on electrophysiological indices of temporal processing. It has been documented that the MMN is larger for deviants that are more distinct from the standard (Kisley et al., 2004; Näätänen et al., 2007), yet this interaction has not been tested for temporal processing. We predicted that endogenous attention tasks would elicit an interval MMN (iMMN) with greater amplitude than a passive, exogenous one.
We tested increasing degrees of attention demand over three increasing deviant interval durations. We predicted that, consistent with prior research, deviant intervals that are more distinct from the standard would elicit a greater amplitude across ERP components. In addition, the greater degree of attention necessary for the endogenous tasks would elicit greater amplitudes across ERP components. To test the effect of attention, we also investigated the negative component (N2) and the P3 in accordance with earlier work. Research suggests that the N2-P3 complex, resulting from the deviant stimuli, varies with attention (Sussman and Winkler, 2001), particularly the P3, which has been well studied as an index of cognitive control and attention and varies with the rarity, probability and level of attention (Polich, 2007). While the second hypothesis is more clearly supported in the literature for the N2-P3 complex, the degree of effect of increased attention on the MMN amplitude is not as well supported. For that reason, it is hypothesized that the effects of an endogenous task versus an exogenous task on the MMN component will be less pronounced than the N2 and P3 components.

# **MATERIALS AND METHODS**

## **Participants**

Participants were 19 undergraduate students who were recruited from the University's psychology research pool and volunteered in exchange for partial course credit. Participants were screened for abnormal hearing, any history of traumatic brain injury with loss of consciousness, and any current or past neurological or psychiatric condition. All exclusion information was obtained using a demographic questionnaire. This study was carried out in accordance with the requirements of 45 CFR 46 and the Institutional Review Board at Colorado State University. All participants gave signed informed consent and completed the study; however, four were excluded from the passive condition due to artifacts. Full analysis was completed on the remaining 15 participants. The mean age was 19.59 years (SD = 1.65); nine participants were female.

## **Electrophysiological Recordings**

Electrophysiological recordings were acquired using a SynAmps2 system and Scan 4.1 software (Compumedics Neuroscan, Charlotte, NC, USA). Ag/AgCl electrodes were placed by hand at scalp locations Fz, Cz, and Pz and referenced to the left mastoid (off-line, the average of the right and left mastoids was used as the reference). The forehead served as ground. Electrodes were positioned according to the international 10–20 system (Jasper, 1958). The recordings were sampled at a rate of 1000 Hz with a 0.1 Hz high-pass and 200 Hz low-pass recording filter. Ocular movements were monitored with superior and lateral eye electrodes. Impedances were below 10 kΩ.

## **Procedure**

Participants completed three interval MMN tasks at three levels of attention (passive, during which attention was diverted away from the MMN task; low attentional load; and high attentional load) while EEG data were recorded. Participants were seated comfortably with speakers placed binaurally at a distance of 85 cm. In all tasks, a 50 ms 1000 Hz pure tone with a 5 ms rise and fall played at 75 dB HL marked the separation of the intervals, with a 400 ms standard inter-stimulus interval (as in our previous work, e.g., Davalos et al., 2003). As noted in our previous studies, interval durations are used to minimize the effects of non-temporal information on judgments, as previous research has suggested that time-based judgments can be affected by non-temporal information (e.g., sounds or words; Poynter and Homa, 1983). Deviant intervals were 310, 355, or 370 ms in duration, selected based on previous research suggesting that interval duration differences between approximately 10 and 20% are challenging, yet appropriate for assessing variability in performance in healthy controls (Davalos et al., 2003, 2005). Deviant interval durations were presented at an occurrence rate of 6.67% and in counterbalanced blocks (only one deviant type per block). Forty-five deviant intervals of each type were presented amidst 630 standard intervals per block. In the passive condition, participants were told to watch a silent, closed-captioned video and ignore the tones (Davalos et al., 2003). In the low attention load condition, the recording block was separated into five blocks and participants had to report, via a yes or no keypress on a keyboard, whether deviant intervals had occurred within the block that was presented, thus requiring them to pay attention to the intervals between the tones. The high attention condition mimicked the low, except that it required participants to keep track of and report via keypress the number of deviant intervals in each block, requiring a greater degree of engagement in the task (Schwartze et al., 2011). While we cannot rule out that working memory may have also been employed during the high attention task, we selected the task because counting auditory stimuli has often been used to assess both sustained attention and selective attention in past studies (Dinkelbach et al., 2015). For the attention conditions, participants were also presented with blocks in which no deviant interval occurred. In these blocks, 45 standards were selected to create the ERP in response to the standard interval. Both the order of the levels within the tasks and the task order were counterbalanced. Task blocks were counterbalanced using a pseudo Latin square design.
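The stimulus parameters given in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are our own, and the numbers are taken directly from the text (400 ms standard; 310, 355, and 370 ms deviants; 45 deviants amidst 630 standards per block). Expressed as a percentage of the standard, the three deviants are shorter by 22.5, 11.25, and 7.5%, and the stated counts reproduce the reported occurrence rate.

```python
# Values taken from the Procedure text; all names are illustrative.
STANDARD_MS = 400
DEVIANTS_MS = (310, 355, 370)
N_DEVIANTS = 45
N_STANDARDS = 630


def percent_shorter(deviant_ms, standard_ms=STANDARD_MS):
    """Percentage by which a deviant interval is shorter than the standard."""
    return 100.0 * (standard_ms - deviant_ms) / standard_ms


# Degree of deviance for each deviant interval, in percent of the standard.
deviance = {d: percent_shorter(d) for d in DEVIANTS_MS}

# Share of deviant intervals among all intervals in a block, in percent.
occurrence_rate = 100.0 * N_DEVIANTS / (N_DEVIANTS + N_STANDARDS)

for d, pct in deviance.items():
    print(f"{d} ms deviant: {pct:.2f}% shorter than the standard")
print(f"occurrence rate: {occurrence_rate:.2f}%")
```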

# **EEG Data Analysis**

Recordings were epoched from −100 to 500 ms relative to stimulus onset. Trials exceeding ±100 µV and trials containing blinks were excluded from further analysis. A minimum of 60% of the trials remained after artifact rejection. The remaining epochs were baseline corrected, averaged, and filtered between 0.1 and 30 Hz with a zero-phase-shift filter and a 24 dB/octave rolloff, for both standard and deviant interval trials. For each participant, the average waveforms in response to the standard intervals (all for the passive condition and exemplars from each block for the low and high attention conditions) and the deviant intervals were calculated. The N2 was defined as the negative-most peak occurring between 120 and 250 ms post stimulus onset at electrode Fz. The P3b peak was defined as the greatest positive peak between 250 and 500 ms post stimulus onset at electrode Pz. The MMN was defined as the greatest negative peak occurring between 120 and 300 ms post stimulus onset at electrode Fz, where the MMN component is most prominent, in the difference waveform obtained by subtracting the standard from the deviant interval waveform. The grand-average low and high attended waveforms compared to passive are presented in **Figure 1**. The mean individual amplitudes are reported in **Table 1**.
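The component definitions above amount to a windowed peak search on an averaged waveform. The following is a minimal sketch under stated assumptions, not the analysis pipeline actually used (which ran in the Scan software): the function name `peak_in_window` is hypothetical, the "peak" is simplified to the most extreme sample in the window, and the waveform is synthetic (a Gaussian deflection of −3 µV centered at 180 ms) rather than recorded data.

```python
import math


def peak_in_window(times_ms, waveform_uv, start_ms, end_ms, polarity):
    """Return (latency_ms, amplitude_uv) of the most extreme sample in a window.

    polarity=-1 selects the most negative sample (e.g., N2, MMN);
    polarity=+1 selects the most positive sample (e.g., P3b).
    """
    candidates = [
        (t, v) for t, v in zip(times_ms, waveform_uv) if start_ms <= t <= end_ms
    ]
    if not candidates:
        raise ValueError("window contains no samples")
    return max(candidates, key=lambda tv: polarity * tv[1])


# Synthetic epoch from -100 to 500 ms at 1000 Hz (one sample per ms).
times = list(range(-100, 501))
standard = [0.0] * len(times)
deviant = [-3.0 * math.exp(-((t - 180) ** 2) / (2 * 20.0 ** 2)) for t in times]

# Difference waveform: deviant minus standard, as described in the text.
difference = [d - s for d, s in zip(deviant, standard)]

# MMN window from the text: 120 to 300 ms, negative polarity.
mmn_latency, mmn_amplitude = peak_in_window(times, difference, 120, 300, polarity=-1)
print(mmn_latency, mmn_amplitude)
```

The same helper would cover the N2 (120 to 250 ms, negative) and the P3b (250 to 500 ms, positive) by changing the window and polarity arguments.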

# **Statistical Analyses**

A 3 (attention: passive, low, high) × 3 (deviance difficulty: 310, 355, 370 ms) repeated measures analysis of variance (ANOVA) assessed the interaction and main effects of attention and deviance duration. In the case of violated sphericity, the Huynh-Feldt correction was used.


#### **TABLE 1 | Amplitude of the MMN, N2, and P3 peaks.**


*The amplitudes of the MMN for passive, low, and high attention from the MMN difference wave at electrode Fz. The N2 and P3 peaks from the deviant interval duration waveforms across conditions.*

# **RESULTS**

# **EEG: Attention and Interval Duration Influence on the MMN**

A repeated measures ANOVA revealed a main effect of attention [*F*(2,22) = 4.03, *p* = 0.05, *η*<sub>p</sub><sup>2</sup> = 0.27]. Corrected follow-up comparisons showed that both high and low levels of attention elicited larger MMNs than passive (*t* = 3.84, *p* = 0.003 and *t* = 2.24, *p* = 0.046). However, the high and low attention conditions did not differ from each other (*t* = 0.58, *p* = 0.57). There was also a main effect of interval duration [*F*(2,22) = 9.56, *p* = 0.001, *η*<sub>p</sub><sup>2</sup> = 0.47]. The deviant 310 ms interval elicited a larger MMN than the 355 ms (*t* = 2.82, *p* = 0.02) and the deviant 370 ms interval (*t* = 3.71, *p* = 0.003). Importantly, there was an interaction of attention with interval duration [*F*(4,44) = 3.22, *p* = 0.02, *η*<sub>p</sub><sup>2</sup> = 0.23]. The deviant 310 ms intervals showed larger responses than 355 ms (*t* = 3.33, *p* = 0.007) or 370 ms (*t* = 3.48, *p* = 0.005) for low attention. The deviant 310 ms interval also induced larger responses than 355 ms (*t* = 2.21, *p* = 0.05) or 370 ms (*t* = 3.70, *p* = 0.004) deviant intervals in the high attention condition. Within the passive attention level, no differences emerged between interval deviant durations.
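As a consistency check on the effect sizes reported above, partial eta squared can be recovered from an *F* value and its degrees of freedom via the standard identity η<sub>p</sub><sup>2</sup> = (*F* · df<sub>effect</sub>) / (*F* · df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies this identity to the three MMN effects; the function and dictionary names are our own. Because the reported *F* values are themselves rounded to two decimals, the recomputed values are only expected to agree with the reported ones to within about 0.01.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared) from the MMN results.
reported = {
    "attention": (4.03, 2, 22, 0.27),
    "interval duration": (9.56, 2, 22, 0.47),
    "attention x interval duration": (3.22, 4, 44, 0.23),
}

recomputed = {
    name: partial_eta_squared(f, df1, df2)
    for name, (f, df1, df2, _) in reported.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.3f}, reported {reported[name][3]:.2f}")
```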

# **EEG: Attention and Interval Duration Influence on the N2P3**

Repeated measures ANOVAs were conducted on the peaks extracted from the waves in response to the deviant intervals for the N2 and P3. For the N2, there was a main effect for interval duration [*F*(2,22) = 4.08, *p* = 0.03, *η*<sub>p</sub><sup>2</sup> = 0.27] with the 310 ms deviant duration eliciting a wave with a greater amplitude than the 370 ms deviant (*t* = 2.33, *p* = 0.04). There was also an attention by interval duration interaction for the N2 [*F*(4,44) = 3.37, *p* = 0.02, *η*<sub>p</sub><sup>2</sup> = 0.23]. Follow-up comparisons revealed the N2 to have the greatest amplitude in response to the 310 ms deviant, particularly in the passive condition compared to the low attention (*t* = 2.43, *p* = 0.03) and to a lesser degree the high attention condition (*t* = 2.08, *p* = 0.06). The P3 analysis revealed a marginal effect of attention [*F*(2,22) = 3.24, *p* = 0.059, *η*<sub>p</sub><sup>2</sup> = 0.23]. No other main effects or interactions were found to be significant for the P3.

# **Behavioral Results**

We recorded behavioral data as a manipulation check of attention level, but behavioral performance was not the primary focus of the study. Nevertheless, a relationship was observed between level of difficulty of the task and accuracy in detecting deviants. In the low attention condition, participants correctly identified 49.12% of the duration deviant intervals. In the high attention condition, participants were able to report a count of the number of deviants detected on only 19.37% of the trials. The chi-square test comparing behavioral performance across conditions yielded a value of 14.05 (*p* = 0.0001).

# **DISCUSSION**

The roles of task difficulty and attention have long been examined in the temporal processing literature. In the current study we report an interaction between level of attention and task difficulty. While previous studies have focused on the role of sub-second temporal perception versus supra-second temporal estimation as a means of better understanding overall time processing, recent studies have suggested that there may be a different factor that warrants consideration. Specifically, Coull and Nobre discuss the importance of knowing how that interval information is to be used (Coull and Nobre, 2008). Although changes in the MMN in response to auditory stimuli of varying durations have been noted (see Näätänen et al., 2007 or Sussman, 2007 for reviews), the interaction between level of attention and the iMMN has not been examined.

In the current study, we sought to examine the role of task and attention in implicit timing by varying the degree of attention to temporal duration information that differed in level of difficulty. The main effect of interval duration reported in this study is consistent with previous research in which MMN responses depend on the likelihood of the stimulus and its degree of deviance from the standard (Kisley et al., 2004). Specifically, deviant stimuli that occur more infrequently and that are more distinct from the standard stimuli elicit the greatest responses. In addition, an interaction was detected, whereby the neurophysiological responses to the timed intervals were amplified by cognitive preparedness and directed attention to detect changes in interval durations in the attention conditions. The pattern of increased amplitude to a greater degree of deviance when paired with greater attention to the stimuli is similar to that of MMN responses to deviants in sensory modalities such as audition (Sussman et al., 2002). The current results provide further support for the idea that temporal processing is akin to basic sensory processing. Furthermore, the heightened electrophysiological responses of the MMN in the attended conditions suggest that attention or cognitive control can facilitate detection and processing of deviant temporal information.

The observed results suggest that endogenous types of timing tasks may receive a boost in neuronal response due to the task goals, as the attended conditions had increased responses, particularly for the most deviant intervals. Based on previous findings in which the MMN was elicited passively to temporal deviants, it is arguable that the increased response in the attended conditions is most likely an effect of the task goals above and beyond the responses in the passive condition. Thus the observed difference between the attended and passive MMN responses suggests a difference between endogenous and exogenous implicit timing tasks. Moreover, the endogenous, attended condition allows for greater influence on and modulation of timed intervals, as evidenced by the interaction, than the exogenous, passive one. That is, deviants that were less distinct from the standard were more readily detected, as indexed by neuronal responses, in the endogenous tasks.

The current findings suggest that attention may modulate the MMN amplitude in terms of responses to temporal information, specifically when the task requires a greater degree of attentional resources than are required for a passive condition or a condition for which the overall goal is less demanding (low attention condition). It is hard to disentangle what role attention plays in the increased MMN amplitude, but the results suggest that increased attention, utilized as part of what might be considered an endogenous timing paradigm, affected what has generally been viewed as an early brain response that can be elicited passively to changes in temporal information (Sussman et al., 2014). These findings are interesting in that they suggest that when one is involved in a goal-directed paradigm or engaging in endogenous time processing, neuronal responses may be better prepared to track temporal and deviant information. It may be, as Sussman et al. (2014) describe, that the neural representations of stimuli in memory that are used in the MMN response are altered by task goals rather than by how well one listens to the stimuli. But the current findings, along with additional studies of attention and/or task goals, suggest that the MMN may be affected by context to a greater degree than once thought. Specifically, while many studies of the MMN in the past have supported what Cheour (2007) describes as the "automatic, bottom-up" nature of the MMN, which orients attention toward stimuli and relies on the passive creation of the echoic trace and expectation, the current study provides evidence that the MMN can be affected by "top down" processes (Cheour, 2007; Boutros et al., 2014).

The implications regarding the changes across components suggest that the neurophysiology of timing may be more malleable than once thought. Specifically, rather than interventions aimed at adapting to poor timing at the behavioral level, the current results suggest that improving timing skills should be addressed both at the behavioral level and at the neurophysiological level. It may be that individuals are simply more accurate when timing is taught or processed in the context of a goal. It may also be that the goal piece is secondary to the influence of the attentional load. What the results appear to support is that, in addition to prior findings suggesting improvement in behavioral temporal accuracy via increased attention, the neural underpinnings of time processing are also strengthened by attention.

Interestingly, the N2 and P3 results highlight differences in sensory and attentional aspects of the task paradigms, with the frontal N2 responding to stimulus driven novelty effects (Folstein and Van Petten, 2008) and the P3 responding to attentional and higher order demands (Polich, 2007). The MMN amplitude appeared to exhibit responses to both stimulus and attentional features. This supports the view that the MMN indexes both the establishment of a pattern and a violation, cued from new stimulus features, of that pattern. The MMN has been conceptualized as a component marking a shift in attention arising from novel or different sensory information (Näätänen et al., 2007). The current study reports the interaction at this sensory-attentional intersection. Further, the N2 exhibited a sensory influence and the P3 showed a marginal modification with attention. Thus, deficits observed in the MMN response may be able to be disentangled from stimulus features or attentional demands by tracking N2 and P3 in conjunction with the MMN. For example, it is possible that the development of the prediction model could falter in some clinical disorders whereas attentional influences may differentially influence the MMN in others.

The idea that different patterns of temporal performance or temporal dysfunction may affect different populations based on neural underpinnings is not a new idea. Wiener et al. (2010b) followed up on the work of Coull and Nobre by assessing brain function associated with implicit versus explicit temporal processing. Their research suggests that there are likely shared neural substrates associated with both types of temporal processing, but more important to the current study, there are also different patterns of brain activation elicited based on task features. While the current study only begins to inform us about the electrophysiological correlates of exogenous and endogenous temporal tasks, the findings at least suggest that greater investigation into this topic is warranted. Additionally, one limitation of the current study is the exclusion of a greater range of duration deviants. While it was clear participants were engaged in the tasks, their behavioral performance suggested that the more difficult tasks may have been too difficult to achieve high rates of accuracy. Future studies including a wider range of difficulty in the behavioral tasks could distinguish levels of temporal information that are challenging at both a behavioral and a neuronal level. In addition, future work should manipulate task goals within populations who struggle with temporal information to assess if the endogenous/exogenous nature of the task may alleviate temporal processing problems.

# **ACKNOWLEDGMENTS**

The authors would like to thank our undergraduate research team at Colorado State University and acknowledge support from the Colorado State University Libraries Open Access Research and Scholarship Fund.

## **REFERENCES**

Atkinson, R. J., Michie, P. T., and Schall, U. (2012). Duration mismatch negativity and P3a in first-episode psychosis and individuals at ultra-high risk of psychosis. *Biol. Psychiatry* 71, 98–104. doi: 10.1016/j.biopsych.2011.08.023

Boutros, N. N., Mucci, A., Vignapiano, A., and Galderisi, S. (2014). Electrophysiological aberrations associated with negative symptoms in schizophrenia. *Curr. Top. Behav. Neurosci.* 21, 129–156. doi: 10.1007/7854\_2014\_303


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Campbell and Davalos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Rhythm and Attention: Does the Beat Position of a Visual or Auditory Regular Pulse Modulate T2 Detection in the Attentional Blink?

Christina Bermeitinger <sup>1</sup> \* and Christian Frings <sup>2</sup>

<sup>1</sup> Experimental Psychology, Institute of Psychology, University of Hildesheim, Hildesheim, Germany, <sup>2</sup> Experimental Psychology, Department of Psychology, University of Trier, Trier, Germany

The attentional blink (AB) is one impressive demonstration of limited attentional capacities in time: a second target (T2) is often missed when it should be detected within 200–600 ms after a first target. According to the dynamic attending theory, attention cycles in an oscillatory manner. Regular rhythms (i.e., pulses) should evoke expectations regarding the point of the next occurrence of a tone/element in the rhythm. At this point, more attentional resources should be provided. Thus, if rhythmic information can be used to optimize attentional release, we assume a modulation of the AB when an additional rhythm is given. We tested this idea in two experiments with a visual (Experiment 1) or an auditory (Experiment 2) rhythm. We found large AB effects. However, the rhythm did not modulate the AB. If the rhythm had an influence at all, then Experiment 2 showed that an auditory rhythm (or stimulus) falling on T2 might generally boost visual processing, irrespective of attentional resources as indexed by the AB paradigm. Our experiments suggest that oscillatory cycling attention does not affect temporal selection as tapped in the AB paradigm.

Keywords: attentional blink, temporal attention, rhythm, pulse, alerting signals, audition and vision, multisensory processing

#### Edited by:

Timothy Michael Ellmore, The City College of New York, USA

#### Reviewed by:

Michiel M. Spapé, Helsinki Institute for Information Technology HIIT, Finland; Wolfgang Klimesch, University of Salzburg, Austria; Luca Ronconi, University of Padova, Italy

#### \*Correspondence:

Christina Bermeitinger bermeitinger@uni-hildesheim.de

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 28 July 2015; Accepted: 13 November 2015; Published: 01 December 2015

#### Citation:

Bermeitinger C and Frings C (2015) Rhythm and Attention: Does the Beat Position of a Visual or Auditory Regular Pulse Modulate T2 Detection in the Attentional Blink? Front. Psychol. 6:1847. doi: 10.3389/fpsyg.2015.01847

# INTRODUCTION

A fundamental function of attention is the ability to select information in space or time given the limited capacities of the cognitive system. One impressive demonstration of limited attentional capacities in time is the attentional blink (AB) (for reviews see e.g., Shapiro et al., 1997; Dux and Marois, 2009; Martens and Wyble, 2010): within a rapid stream of irrelevant stimuli, a second relevant stimulus (target 2, T2) is often missed when it should be detected within 200–600 ms after a first relevant stimulus (target 1, T1; see below for more details). To optimize attentional precision, it would be beneficial if resources are provided at the right time. To determine when attentional resources should be provided, it might be helpful to use additional information such as previous knowledge, cues, primes, or context stimuli. In general, it is assumed that the incoming stream of events is partitioned with the help of anticipated as well as actually presented stimuli (e.g., Klauer and Dittrich, 2010) to optimize the distribution of processing and response resources.

According to Barnes, Jones and colleagues (e.g., Large and Jones, 1999; Barnes and Jones, 2000; Jones et al., 2002), attention cycles in an oscillatory manner (see also e.g., Klimesch, 2012) when a rhythm is given. When the cognitive system is adapted to a given (auditory) regular rhythm (i.e., a pulse), the largest "attentional energy" is provided at that point in time at which the next rhythmic stimulus (i.e., a beat given by a tone) is expected. In other words, the rhythm is used to optimize the release of attentional resources. The experiments of Barnes and Jones (2000) investigated time perception. In several experiments, the authors presented isochronous auditory rhythms/pulses. The stimulus onset asynchrony (SOA) between two tones of the rhythm was always 600 ms. At the end of the rhythm, a standard interval was presented which was varied between 524 and 676 ms. Thereafter a comparison interval was presented, which was equally often shorter, equal to, or longer than the standard interval. Participants had to decide whether the comparison interval was shorter, equal to, or longer than the standard interval of the rhythm. Accuracy in categorizing the comparison interval was greatest for the expected SOA, that is, if the standard interval was exactly the same as the intervals in the preceding rhythm (600 ms). Accuracy was worst for very long or very short (i.e., very unexpected) standard intervals. The authors interpreted their results in the context of their dynamic attending theory.

Evidence for their theory of attentional deployment in time comes from neurophysiological studies in monkeys and humans as well as from behavioral studies (for reviews see e.g., Schroeder and Lakatos, 2009; Calderone et al., 2014). For example, in macaque monkeys, it was shown, first, that neural oscillations modulate responses to stimuli and, second, that they respond to external rhythmic stimuli. In the latter case, intrinsic rhythms are entrained and shifted by extrinsic rhythms, resulting in optimization of neural responses when task-relevant events are expected (e.g., Lakatos et al., 2008). Predictable rhythmic beats are more easily perceived and detected faster than unpredictable (non-rhythmic) stimuli (e.g., Rohenkohl et al., 2012). Further, selective attention also seems closely related to entrainment to rhythms (Calderone et al., 2014). There are some recent studies showing that rhythm can drive the temporal allocation of attention and that orienting of attention is not modality dependent but even cross-modal (for uni-modal evidence see, for example, Doherty et al., 2005; Sanabria et al., 2011).

First, Bolger et al. (2013) used simple auditory and visual detection and discrimination tasks. They introduced a rhythm sequence (either with simple isochronous meter or with complex musical stimuli) prior to the occurrence of the stimuli which had to be detected. Reaction times depended on the metrical positions at which the stimuli were presented. The authors interpreted their results as evidence that metrical entrainment can enhance stimulus processing. Second, Miller et al. (2013) also found cross-modal influences of an auditory rhythm on the temporal attentional allocation to visual stimuli. These authors used regular or irregular tone sequences either synchronous or asynchronous to visual targets. Results showed faster saccadic detection responses (Experiments 1, 2) and improved accuracy in a discrimination task (Experiment 3) to visual targets coinciding with a tone of a regular rhythm compared to asynchronous (i.e., tone preceded or followed the visual target) as well as irregular rhythms.

Previous studies in which the influence of rhythms on attention and perception was investigated focused on simple reaction time tasks (and sometimes accuracy tasks) to target stimuli. That is, one central aspect of attention (its limited capacity, which is thought to change over time depending on the stimuli that have to be processed) is not sufficiently touched by previous research on entrainment and/or rhythmic influences on attention. As already mentioned above, the AB paradigm is a suitable tool to investigate limitations of attention. In the visual domain, the AB reflects a robust deficit to correctly detect a second target (T2) appearing approximately 200–600 ms after a correctly identified first target (T1; e.g., Raymond et al., 1992). As a paradigm, the AB is most often studied by use of rapid serial visual presentation (RSVP) of briefly presented (distracting) stimuli, most often (strings of) letters, and varying the lag or SOA between the first and the second target. Typically, the first target has to be identified and the second target has to be detected or both targets have to be identified. Several theories might explain the AB (for an integration see e.g., Hommel et al., 2006). Whereas early theories suggested a perceptual locus of the phenomenon (Raymond et al., 1992), later theories explained the AB at later, postperceptual stages of processing (e.g., Vogel et al., 1998; Jolicoeur and Dell'Acqua, 2000). The core element of most theories on the AB is based on capacity limitations of short-term memory or working memory. It is supposed that there are problems transferring and consolidating new information into working memory as long as preceding information is not processed to a certain level and that these processes related to the working memory draw on attentional resources (Hommel et al., 2006). Most likely, several mechanisms work together to result in an AB (Chun and Potter, 2001).

There are two studies in which entrainment and the AB were related. First, it was found that alpha entrainment (without an additional external rhythm except the RSVP rhythm) is larger for trials in which T2 cannot be reported than for trials in which T2 can be reported (Zauner et al., 2012). The authors argue that for stimuli presented with a frequency of about 10 Hz (i.e., approximately like the alpha frequency) those processes that underlie the generation of the P1 of the visual event related potential in the EEG (and that are related to alpha) interfere with those processes that enable the encoding of stimuli, specifically of T2. Second, there is recent work by Ronconi et al. (2015) who studied the influence of an acoustic or visual rhythmic stream before the RSVP stream, but with the same frequency. The authors presented entraining stimuli before the RSVP stream either with a regular rhythm, that is with the same frequency as the RSVP stimuli, or with an irregular rhythm, that is with variable interstimulus intervals between the entraining stimuli. Their results showed reduced AB effects with a regular (compared to an irregular) rhythm.

However, until now, there is no study in which the dependence of the AB effect on an additional rhythm like that used by Barnes and Jones (2000; see above) is studied. If other information, especially rhythmic information, can be used to optimize attentional release, we assume there should be a modulation of the AB when an additional rhythm is given. Specifically, we assume that the AB could be diminished by introducing a rhythm which peaks at the point in time when T2 is presented. In this case, the rhythm should evoke expectations regarding the point of T2 and more attentional resources should be provided at this point. All current theories concerning the AB lead to the prediction that a peak of additional attentional resources corresponding to the onset of T2 should diminish the AB. That is, when a tone is expected at position T2, this should lead to a simultaneous release of attentional resources which in turn would lead to a diminished AB effect, given that a rhythm is able to release additional attentional resources. The general aim of the present experiments is to examine whether the assumed cyclical oscillating nature of attention in the presence of a rhythm can be manipulated to release attentional resources at peak times in the RSVP cycle, as would be shown by a reduction of the AB effect (for the general idea and procedure see also **Figure 1**).

In Experiment 1, we tested this prediction by using a visual rhythm before and during the RSVP stream. In Experiment 2, we used an auditory rhythm. (Please note that, in contrast to Ronconi et al., 2015, we did not investigate the question whether a regular or irregular rhythm, induced by entraining stimuli before the RSVP stream, enhances T2 performance. In their experiments and due to their research question, attention to each stimulus should be enhanced with a regular rhythm. In contrast, we tested the specific effect of rhythms falling at T2 vs. rhythms falling at stimuli surrounding T2.)

FIGURE 1 | General idea and procedure (with an auditory rhythm, i.e., Experiment 2). Please note that the time information is given with rounded values. With a refresh rate of 75 Hz, the exact timing is 26.66…, 106.66…, 133.33…, 506.66…, 533.33… ms. (A,B) show the auditory rhythm presentation (the white squares indicate that there is no acoustic event at this time), the visual presentation (especially the RSVP with letters) with T2 at lag 3, as well as the attending/attentional rhythm. According to Barnes and Jones (2000), "an expected point in time corresponds to the peak of the attentional pulse carried by the oscillator" (p. 262). It is assumed that the oscillator adapts to stimulus time structure. (A) The auditory critical cue stimulus appears together with T2 which should result in a reduction of the AB (i.e., better T2 detection rates at lag 3; cf. C). (B) The auditory critical cue stimulus appears one position before T2. (C) Shown is our hypothesis for the AB effect depending on the rhythm and critical cue stimulus (which is either at T2 or at another position). The picture shows a reduction of the AB for T2s appearing together with the auditory critical cue stimulus (A better than B). We did not explicitly predict a general modulation of T2 detection by a rhythm; there might be a general enhancement or reduction of T2 detection also at lags 1 and 5. The main prediction, however, refers to the AB effect.

# EXPERIMENT 1 (VISUAL RHYTHM)

In Experiment 1, we used a visual rhythm (red symbols or letters) which was presented before the RSVP stream and continued during the RSVP stream. The last rhythm stimulus (=critical cue stimulus) appeared either one position before T2, at T2, one position after T2, or two positions after T2. Participants had no task regarding the rhythm. Their task was to indicate T1 identity and detect T2.

# Methods

#### Participants

The sample consisted of 29 students from Saarland University. Participants had normal or corrected-to-normal vision. They were paid for their participation or participated in exchange for course credit and gave informed written consent before participation. We excluded two participants as they made overall more than 60% errors in either the T1 or the T2 task. Of the remaining participants, 8 were male and 19 were female. Their median age was 24 years, ranging from 20 to 33 years.

The experiment was run in conformity with the ethical standards of our field and the AB task was approved by the ethical committee of the University of Hildesheim.

#### Design

Essentially, we used a 4 (position of the critical cue stimulus: one position before T2, at T2, one position after T2, two positions after T2) × 3 (lag: 1, 3, 5) design. Note that the factor Lag also determined the position of T1 (10, 8, 6) in the RSVP stream. Additionally, it was varied whether a T2 probe was presented or not. All factors were varied within participants. In the tradition of AB experiments, we used correct T2 probe detections when a T2 probe was presented (after correct T1 responses) as the dependent variable.
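The factorial structure described above can be enumerated explicitly. The sketch below is illustrative only; the labels and variable names are our own. It uses the T2 position of 11 stated in the Procedure to derive the T1 position for each lag, and it omits the additional T2 probe present/absent variation for brevity.

```python
from itertools import product

# Factor levels from the Design paragraph; labels are illustrative.
CUE_POSITIONS = ("one before T2", "at T2", "one after T2", "two after T2")
LAGS = (1, 3, 5)
T2_POSITION = 11  # position of T2 in the RSVP stream, as stated in the Procedure

# One entry per cell of the 4 x 3 within-participant design.
cells = [
    {"cue": cue, "lag": lag, "t1_position": T2_POSITION - lag}
    for cue, lag in product(CUE_POSITIONS, LAGS)
]

for cell in cells:
    print(cell)
```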

## Material

#### **Attentional blink task**

The stimuli in the RSVP stream consisted of the letters of the alphabet except the letters I, O, and Q. Each letter except the X could appear at each position of the stream. X served exclusively as T2 probe letter and was presented in half of the trials. Stimuli were written in Courier New font (pt. 18, bold). Most of the letters were presented in black. T1 was presented in white. In Experiment 1, some letters were presented in red according to the rhythm. All letters were presented at the center of a gray background.

#### **Critical cue stimulus and rhythm stimuli**

The critical cue stimulus was embedded in a visually presented rhythm. For the visual rhythm, the critical cue stimulus and the other rhythm stimuli were realized by colored letters or symbols. Before the RSVP stream began, participants saw rarely used symbols (e.g., U, Ø) written in red and with the same font and size as the letters in the RSVP stream. The symbols appeared in the same manner as the AB stimuli (i.e., at the center of the gray background; written in Courier New font, pt. 18, bold) and one after the other, to realize the rhythm. Overall, we used 14 different symbols. Seven randomly chosen symbols were presented in each trial. With the beginning of the RSVP stream, the rhythm continued by coloring the respective letters of the RSVP stream in red (or in white in the cases in which the rhythm coincided with T1). When the rhythm appeared together with T2, T2 was colored in red.

#### Procedure

Participants were individually tested in sound-attenuated chambers. The experiment was run using E-Prime software (version 1.3) with standard PCs connected to 17′′ CRT monitors with a refresh rate of 75 Hz and standard QWERTZ-keyboards. Stimulus presentation was synchronized with the vertical retrace signal of the monitor. Viewing distance was about 60 cm. Instructions were given on the CRT screen. Participants had two tasks which were to be answered after each RSVP stream. First, participants answered the question (T1 identification): Which one was the white letter? They used the standard keyboard and entered the corresponding key. Second, participants answered the question (T2 detection): Was there an X after the white letter? Participants pressed the M-key (marked with JA = yes) or the C-key (marked with NEIN = no).

The sequence of each trial was as follows (see **Figure 1** for an auditory variant): Participants started each trial self-paced by pressing the space-key. Then, a fixation stimulus (+) appeared at the center of the screen for 506.66... ms. Next, the first rhythm stimulus appeared for 26.66... ms. With an SOA of 533.33... ms, the next rhythm stimulus appeared. In each trial, seven rhythm stimuli were presented before the RSVP stream. After the seventh rhythm stimulus, there was an interval of 106.66... ms (critical cue stimulus one position after T2), 240 ms (critical cue stimulus at T2), 373.33... ms (critical cue stimulus one position before T2), or 506.66... ms (critical cue stimulus two positions before = after T2). Then, the first letter of the RSVP stream appeared for 26.66... ms, followed by a blank screen for 106.66... ms. Thereafter, the next letter appeared (letter-to-letter SOA = 133.33... ms). Each RSVP stream contained 15 letters. The rhythm was continued during the RSVP stream with an SOA of 533.33... ms between two successive rhythm stimuli until the critical cue stimulus. The rhythm stimulus appeared simultaneously with a letter of the RSVP stream. There were three letters between two succeeding rhythm stimuli. T2 was always presented at position 11 of the RSVP stream. T1 was presented at position 10 (lag 1), 8 (lag 3), or 6 (lag 5) of the stream. There were 9, 7, or 5 distractor letters before T1, respectively, and 4 distractor letters after T2.
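The four pre-stream intervals follow from the frame timing. The sketch below is a reconstruction and not the authors' E-Prime script: it assumes that all durations are whole numbers of frames at 75 Hz (2 frames per stimulus, 10 frames per letter SOA), that the rhythm period equals four letter SOAs, and that the seventh pre-stream rhythm stimulus is the last beat falling at or before a notional stream position 0. All names are hypothetical.

```python
from fractions import Fraction

REFRESH_HZ = 75                        # refresh rate reported in the Procedure
FRAME_MS = Fraction(1000, REFRESH_HZ)  # one frame = 13.33... ms
STIM_FRAMES = 2                        # assumed: 26.66... ms = 2 frames
LETTER_SOA_FRAMES = 10                 # assumed: 133.33... ms = 10 frames
RHYTHM_PERIOD_LETTERS = 4              # 533.33... ms = 4 letter SOAs
T2_POSITION = 11


def pre_stream_interval_ms(cue_offset):
    """Blank interval between the 7th pre-stream rhythm stimulus and letter 1.

    cue_offset is the position of the critical cue relative to T2
    (-1 = one position before, 0 = at T2, +1 / +2 = after T2).
    """
    cue_position = T2_POSITION + cue_offset
    # Step back in whole rhythm periods to the last beat at or before position 0.
    periods_back = -(-cue_position // RHYTHM_PERIOD_LETTERS)  # ceiling division
    last_pre_stream_beat = cue_position - periods_back * RHYTHM_PERIOD_LETTERS
    # Onset-to-onset gap between that beat and letter 1, minus the beat's duration.
    onset_gap_frames = (1 - last_pre_stream_beat) * LETTER_SOA_FRAMES
    return (onset_gap_frames - STIM_FRAMES) * FRAME_MS


for offset in (-1, 0, 1, 2):
    print(offset, round(float(pre_stream_interval_ms(offset)), 2))
```

Under these assumptions, a cue two positions before T2 and a cue two positions after T2 lie on the same beat grid, which is one way to read the "before = after" note for the longest interval.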

Each participant worked through five experimental blocks with 48 trials each. There was a short pause after each block. Before the first experimental block, there was a practice phase with 14 trials. Each experimental block consisted of 16 trials in which T2 was at lag 1 (i.e., directly after T1), 16 trials in which T2 was at lag 3, and 16 trials in which T2 was at lag 5. At position T2, half of the trials contained an X and the other half of the trials did not contain an X. Additionally, the critical cue stimulus appeared equally often one position before T2, at T2, one position after T2, and two positions after T2 in each lag (1, 3, 5) × T2 probe present (yes/no) condition. Within each block, conditions were presented in random order. Participants' task was to indicate first, which letter the white letter was and second, whether there was an X after the white letter or not.
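The factorial structure described above can be enumerated directly. The following is a minimal sketch with hypothetical names; it assumes that cells were balanced within every block, which the text implies but does not state explicitly.

```python
from itertools import product

CUE_POSITIONS = ("one before T2", "at T2", "one after T2", "two after T2")
LAGS = (1, 3, 5)
PROBE_PRESENT = (True, False)
BLOCKS = 5
TRIALS_PER_BLOCK = 48
T2_POSITION = 11

# Every combination of cue position, lag, and probe presence is one design cell.
cells = list(product(CUE_POSITIONS, LAGS, PROBE_PRESENT))

# Assumed: each cell occurs equally often within each block.
trials_per_cell_per_block = TRIALS_PER_BLOCK // len(cells)
trials_per_cell = trials_per_cell_per_block * BLOCKS

# The lag fixes the stream position of T1, because T2 is always at position 11.
t1_position = {lag: T2_POSITION - lag for lag in LAGS}

print(len(cells), trials_per_cell_per_block, trials_per_cell, t1_position)
```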

## Results

Mean error rates were 11.9% (SD = 10.1) in the T1 task and 25.3% (SD = 12.2) in the T2 task. We first excluded trials with incorrect T1 responses. For the remaining trials (for each lag × position of critical cue stimulus there were between M = 8.3 and M = 9.3 observations after removal of trials with inaccurate T1 responses), we calculated mean correct T2 probe detections (in percent) when a T2 probe was presented. These mean correct T2 probe detections were subjected to a 4 (position of the critical cue stimulus) × 3 (lag) repeated measures ANOVA. The main effect of lag was significant, F(2, 52) = 9.60, MSE = 1073.05, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.27. This main effect reflected the AB: Repeated contrasts showed that there was a significant difference in correct T2 detections between lag 5 and lag 3, F(1, 26) = 14.07, p = 0.001, but no difference between lag 3 and lag 1, F(1, 26) < 1, p > 0.44.
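As a plausibility check on the reported effect size, partial eta squared can be recovered from the F value and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (F · df<sub>effect</sub>) / (F · df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies it to the lag effect; the function name is hypothetical and this is not the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of lag in Experiment 1: F(2, 52) = 9.60
lag_effect = partial_eta_squared(9.60, 2, 52)
print(round(lag_effect, 2))
```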

Neither the main effect of "position of critical cue stimulus," F < 1, p > 0.65, nor the interaction effect, F < 1, p > 0.94, was significant. That is, there was no evidence for an influence of the rhythm on the general T2 detection rate or the AB. As shown in **Figure 2**, there was no better (but also no worse) T2 detection performance if the critical cue stimulus appeared simultaneously with T2.

As we only used trials with correct T1 performance for further analysis, of course, T1 performance in these trials was the same across conditions.

# Discussion

Using rhythmically and repeatedly presented colored visual stimuli before and during an RSVP stream (in which a critical stimulus could appear either at the position of T2, one position before T2, one position after T2, or two positions after T2), we found a significant visual AB with better detection rates at lag 5 than at lag 3 (or lag 1). However, there were no significant influences of the rhythm, neither in general nor in interaction with the AB. That is, the visual rhythm did not induce specific expectations or act as a general alerting signal. However, our results also show that the position of the critical cue stimulus does not hamper T2 detection, as there were no differences between the different positions of the critical cue stimulus.

# EXPERIMENT 2 (AUDITORY RHYTHM)

In Experiment 2, we attempted to make the rhythm more salient/relevant and to approximate the rhythm to that of the experiments by Barnes and Jones (2000). Therefore, we used an auditory instead of a visual rhythm and added a task regarding the rhythm, to ensure that participants could not fully ignore the rhythm.

# Methods

#### Participants

The sample consisted of 43 students (9 male) from Saarland University with a median age of 22 years (ranging from 19 to 28). Participants had normal or corrected-to-normal vision. They were paid for their participation or participated in exchange for course credit and gave informed written consent before participation.

The experiment was run in conformity with the ethical standards of our field and the AB task was approved by the ethical committee of the University of Hildesheim.

#### Design, Material, and Procedure

The experiment was identical to Experiment 1 with the following exceptions. First, the rhythm was now realized auditorily with 1000 Hz tones presented via headphones for 27 ms each. The fixation cross remained on the screen until the onset of the first letter of the RSVP stream. Second, at the end of each trial and after the T1 and T2 responses, participants indicated whether the rhythm was regular or not (note that each rhythm was actually regular); again, the answer was given by the M- or C-key. For this task, participants worked through a second practice phase directly after the first practice phase (with T1/T2 task) in which they practiced all three tasks (T1/T2/rhythm task). Third, each participant worked through five experimental blocks with only 24 trials each. Each block consisted of 8 trials in which T2 was at lag 1 (i.e., directly after T1), 8 trials in which T2 was at lag 3, and 8 trials in which T2 was at lag 5.

# Results

Mean error rates were 22.3% (SD = 11.4) in the T1 task and 30.0% (SD = 11.0) in the T2 task. Again, we first excluded trials with incorrect T1 responses. For the remaining trials (for each lag × position of critical cue stimulus there were between M = 3.7 and M = 4.0 observations after removal of trials with inaccurate T1 responses), we calculated mean correct T2 probe detections (in percent) when a T2 probe was presented. These mean correct T2 probe detections were subjected to a 4 (position of the critical cue stimulus) × 3 (lag) repeated measures ANOVA. If necessary, the Greenhouse-Geisser correction was applied, and corrected values are reported. The main effect of lag was significant, F(1.48, 61.94) = 5.27, MSE = 2805.62, p = 0.01, η<sub>p</sub><sup>2</sup> = 0.11. This main effect reflected the AB: Repeated contrasts showed that there was a significant difference in correct T2 detections between lag 5 and lag 3, F(1, 42) = 22.08, p < 0.001, but no significant difference between lag 3 and lag 1, F(1, 42) = 1.88, p = 0.18.
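The reported numbers of observations per cell can be related to the design and the T1 error rates. The sketch below is a rough consistency check with hypothetical names. It assumes that T1 errors were spread evenly across cells and that each experiment had 24 cells (4 cue positions × 3 lags × 2 probe conditions), which is an inference from the design description.

```python
def expected_trials_per_cell(blocks, trials_per_block, t1_error_rate, n_cells=24):
    """Expected number of usable trials per design cell after T1 exclusion."""
    trials_per_cell = blocks * trials_per_block / n_cells
    # Assumed: T1 errors are distributed evenly over all cells.
    return trials_per_cell * (1 - t1_error_rate)


exp1_expected = expected_trials_per_cell(5, 48, 0.119)  # Experiment 1
exp2_expected = expected_trials_per_cell(5, 24, 0.223)  # Experiment 2
print(round(exp1_expected, 2), round(exp2_expected, 2))
```

Under these assumptions, both values fall inside the ranges of mean observations reported for the two experiments.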

The main effect of "position of critical cue stimulus" was not significant, F(3, 126) = 1.64, p = 0.18. However, the planned contrast showed that T2 detection was marginally better if the critical cue stimulus appeared at T2 position compared to the other positions, F(1, 42) = 3.77, p = 0.059. This revealed a tendency for enhanced attention when the critical cue stimulus appeared at T2. The interaction effect was not significant, F < 1, p > 0.54. That is, the rhythm had—if at all—a general effect on T2 detection, but was not able to modulate the AB. **Figure 3** clearly shows that, especially at lag 3, there was no difference between the positions at which the critical cue stimulus appeared.

## Discussion

By use of an auditory rhythm before the critical auditory stimulus (again either coinciding with T2, or preceding or following T2), we again found a significant AB. Although the main effect of "position of critical cue stimulus" was again not significant, planned contrast revealed slight evidence for enhanced attention when the critical cue stimulus appeared at the point in time at which T2 was presented (compared to the other possible positions of the critical cue stimulus). Most interesting seems to be that there was no difference between the positions at which the critical cue stimulus appeared at lag 3 (which is the position with the largest AB). That is, the AB was again not modulated by the rhythm; if at all, the rhythm and the critical cue stimulus improved T2 detection irrespective of lag. This result might be interpreted as visual boosting due to an auditory stimulus. For example, better detection rates of visual stimuli were found with simultaneous presentation of an irrelevant auditory accessory stimulus (Frassinetti et al., 2002). Chen and Yeh (2009) could reduce or even reverse repetition blindness in a visual RSVP stream by presenting an auditory stimulus together with the stimuli of interest. We hasten to add that we created a cross-modal situation by using a visual AB task and an auditory rhythm. Perhaps, this might be a crucial difference to the experiments by Barnes and Jones (2000). However, when comparing the results of Experiment 1 (only visual) and 2 (visual and auditory), there were no large differences (see also below).

## GENERAL DISCUSSION

We analyzed the possible influence of oscillatory cycling attention on the AB. In particular, following Jones and colleagues (e.g., Barnes and Jones, 2000), we presented visual and auditory rhythms in a typical AB task. If attention adapts to the presented rhythm, the AB should depend on whether T2 is presented at a point in time when the attentional resources are at a maximum (due to the rhythm). However, although we found clear and large AB effects, we found not even the slightest hint of modulation of the AB effect by rhythm. If the rhythm had an influence at all, then Experiment 2 showed that an auditory rhythm (or stimulus) might generally boost visual processing at this particular point in time—irrespective of attentional resources as indexed by the AB paradigm.

Thus, the idea of oscillatory cycling attention as a model for the allocation of attentional resources in temporal selection (as in the AB task) does not hold. Participants obviously did not "use" (which is not necessarily meant in the controlled and/or conscious sense) the rhythm as a cue for increasing the allocation of attention, although our rhythms were always perfectly reliable. In addition, note that we used two different variations of presenting the rhythm (visual and auditory) and also followed the procedures used by Jones and colleagues. This is important, because one may argue that it matters whether the rhythm is presented in the same modality as the to-be-attended stimuli (see Arend et al., 2006, who also concluded that the same AB attenuation effects resulted when additional stimuli were presented in the same or in another modality than the AB stimuli) or whether the modality in which the rhythm is presented "fits" rhythm processing in general (Welch et al., 1986). Of course, it still might be the case that a particular combination of the modality of the rhythm and the modality of the RSVP stimuli is a precondition for an effect of oscillatory cycling attention on the AB (e.g., maybe only rapid serial auditory streams are affected by auditory rhythms?). In addition, we must admit two possible caveats. First, we did not check in the same experiment whether our rhythm actually manipulated attention, but just failed to manipulate the AB (in other words, some kind of manipulation check concerning the effect of the rhythm would have been desirable). Second, the experiments conducted by Barnes, Jones, and colleagues (e.g., Large and Jones, 1999; Barnes and Jones, 2000; Jones et al., 2002) focused mainly on time perception or pitch judgments, which surely tap different attentional resources than the AB. Thus, our data do not speak against these previous findings but only suggest that the model of oscillatory cycling attention is not easily applied to other tasks like the AB. It is clear that more research in different paradigms is needed to analyze whether the oscillatory cycling attention model could be applied to domains other than time perception and pitch judgments.

The fact that we observed, if anything, generally slightly better T2 detection when an auditory stimulus coincided with the visual T2 fits with previous observations in RSVP streams of visual boosting due to auditory stimuli (Frassinetti et al., 2002; Olivers and Van der Burg, 2008; Chen and Yeh, 2009). In particular, Olivers and Van der Burg (2008) found better T2 detection when an irrelevant bleep was presented together with T2 but not when it was presented directly before T2. This pattern suggests that the visual boosting is not due to alerting (because one might expect to find better detection performance if the auditory signal is presented shortly before T2) but due to multisensory enhancement. In fact, Busse et al. (2005) investigated whether neurophysiological signals to an irrelevant auditory stimulus were altered by a simultaneously presented, spatially (mis-)aligned visual stimulus. They found the strongest neurophysiological response to the irrelevant tone when the simultaneously presented visual stimulus was attended, suggesting some kind of multisensory enhancement of visual processing due to auditory stimulation (see Vroomen and de Gelder, 2000 for a discussion of when auditory signals enhance or decrease visual processing).

There are a few papers in which a general enhancing effect of music/rhythm was found on T2 detection rates. Olivers and Nieuwenhuis (2005) found better T2 detection rates when participants listened simultaneously to a continuous rhythmic tune compared to the standard condition without music. The beat was not synchronized to the presentation of the stimuli in the RSVP stream. Better T2 detection rates were also found when participants were asked to think about their holidays or their shopping plans for a dinner with friends while performing the AB task. Task-irrelevant visual motion and flicker also attenuate the AB (Arend et al., 2006). The authors suggested that a more diffuse attentional state causes better T2 detection rates, either via arousal or via positive affective state (see also Olivers and Nieuwenhuis, 2006). Ronconi et al. (2015) also found reduced AB effects when an auditory (but not when a visual) rhythm preceded the RSVP stream in the same frequency as the RSVP items. In general, however, there are also single reports that the effect of music could not be replicated (Spalek and Di Lollo, unpublished data, as cited by Colzato et al., 2014). Further, the studies on entrainment and the AB (Zauner et al., 2012; Ronconi et al., 2015) used rhythms touching alpha. This difference also might explain differences in results. In this context, it also might be that the items of the RSVP stream themselves induce a rhythm, too, which could generally enhance performance (in our experiments and all experiments using fixed time intervals between items in an RSVP stream).

We ran a control experiment of Experiment 2 in which we removed the rhythm and presented only the critical cue stimulus. The experiment was a replication of Experiment 2 except that we did not present any rhythm but only single tones as critical cue stimuli. (Please find the detailed description of the control experiment in the Appendix in Supplementary Material.) The critical cue stimuli were tones between 750 and 1250 Hz and participants had to compare (same/different decision) the tone pitch of the critical cue stimulus with a 1000 Hz standard tone presented at the beginning of each trial. With 24 student participants, we again found a significant main effect of lag, i.e., an AB effect, F(2, 46) = 9.30, MSE = 1050.33, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.29. The main effect of "position of critical cue stimulus" as well as the interaction of both factors were not significant (ps > 0.40; for the results see also **Figure 4**). In addition, comparing the control experiment and Experiment 2, we did not find statistical evidence for a general enhancement or impairment by the rhythm (i.e., there was no main effect of experiment/rhythm, F < 1, p = 0.85), and the interaction of experiment/rhythm and lag also missed the criterion for being significant, F(2, 128) = 1.99, p = 0.14 (all other effects including the factor experiment/rhythm were also not significant, ps > 0.70). Thus, comparing Experiment 2 with a control condition in which no rhythm was used, we did not find evidence for a general enhancement/influence of the rhythm (of course, the lack of significance does not prove the H0). We interpret this as evidence that the results in our rhythm experiment(s) are not due to a specific entrainment by the rhythm. As long as one does not argue that the presence of a critical cue stimulus effect and the rhythm modulation do interact in a disordinal way, the critical cue stimulus only adds a main effect and as a result the net effect of (any) critical cue stimulus effect and the rhythm modulation would still be usable for testing whether rhythms modulate the AB.

Why did we find no attenuation of the AB as it was found by Olivers and Nieuwenhuis (2005; 2006; but see Spalek and Di Lollo, unpublished data, as cited by Colzato et al., 2014) or Arend et al. (2006) when introducing a second task or enriching the material by further stimuli? One possible way (besides that some of the effects could not be replicated by Spalek and Di Lollo) to explain the difference between our experiments and that of Olivers and Nieuwenhuis or Arend et al. is that our additional task was not affectively positive (like shopping plans or music) and not as demanding as a flicker task. As a result, attentional resources were not allocated to the rhythm and thus the AB was not attenuated. In contrast to most of the other experiments, in which influences of rhythms/entrainment on perception and attention were found, we used an accuracy measure instead of reaction time measures. This difference might lead to differences in results. However, as Barnes and Jones (2000) also used accuracy measures, we should have found modulations of the AB effect.

**FIGURE 4 |** [...] identification and when the T2 probe was present) in the control experiment with an auditory critical cue stimulus without a preceding rhythm, depending on lag and position of the critical cue stimulus. Bars indicate the standard error of the mean.

Taken together, our experiments suggest that oscillatory cycling attention induced by the rhythms used does not affect temporal selection as tapped in the AB paradigm. Our results might also be interpreted as evidence that the tasks and materials used require different attentional networks with different oscillator frequencies (e.g., Fan et al., 2007; Posner, 2012). Future research could test whether regular and various kinds of irregular rhythms differ in their influence on the AB effect and whether longer/stronger entrainment phases lead to modulations of the AB, also in cases in which no beat is presented at T2 positions.

## ACKNOWLEDGMENTS

We thank Marleen Stelter, Freddy Neumann, Nadine Johr, Charlotte Schwedes, Michaela Wanke, Pinar Yapkac, Markus Streb, Jana Linstedt, Miriam Storz, and Yvonne Hoffmeister for their help in data collection. Many thanks go to Ryan Hackländer who improved the readability of this article. We thank Kathrin Lange for her helpful comments on this work.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01847


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Bermeitinger and Frings. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Action dynamics in multitasking: the impact of additional task factors on the execution of the prioritized motor movement

#### Stefan Scherbaum\*, Caroline Gottschalk, Maja Dshemuchadse and Rico Fischer

Department of Psychology, Technische Universität Dresden, Dresden, Germany

#### Edited by:

Timothy Michael Ellmore, The City College of New York, USA

#### Reviewed by:

Miriam Gade, Catholic University of Eichstatt-Ingolstadt, Germany

Nick Duran, Arizona State University, USA

#### \*Correspondence:

Stefan Scherbaum, Department of Psychology, Technische Universität Dresden, Zellescher Weg 17, 01062 Dresden, Germany stefan.scherbaum@tu-dresden.de

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 30 April 2015 Accepted: 22 June 2015 Published: 06 July 2015

#### Citation:

Scherbaum S, Gottschalk C, Dshemuchadse M and Fischer R (2015) Action dynamics in multitasking: the impact of additional task factors on the execution of the prioritized motor movement. Front. Psychol. 6:934. doi: 10.3389/fpsyg.2015.00934

In multitasking, the execution of a prioritized task is in danger of crosstalk by the secondary task. Task shielding allows minimizing this crosstalk. However, the locus and temporal dynamics of crosstalk effects and further sources of influence on the execution of the prioritized task are to date only vaguely understood. Here we combined a dual-task paradigm with an action dynamics approach and studied how and according to which temporal characteristics crosstalk, previously experienced interference and previously executed responses influenced participants' mouse movements in the prioritized task's execution. Investigating continuous mouse movements of the prioritized task, our results indicate a continuous crosstalk from secondary task processing until the endpoint of the movement was reached, although the secondary task could only be executed after finishing execution of the prioritized task. The motor movement in the prioritized task was further modulated by previously experienced interference between the prioritized and the secondary task. Furthermore, response biases from previous responses of the prioritized and the secondary task in movements indicate different sources of such biases. The bias by previous responses to the prioritized task follows a sustained temporal pattern typical for a contextual reactivation, while the bias by previous responses to the secondary task follows a decaying temporal pattern indicating residual activation of previously activated spatial codes.

Keywords: action dynamics, mouse movements, crosstalk, dual task, task shielding, cognitive control, conflict adaptation

# Introduction

Multitasking seems to be ubiquitous in today's world. The execution of multiple tasks at the same time, however, runs the risk of the prioritized task's performance being affected by the additional tasks. For example, even highly practiced and prioritized driving performance might suffer from additional task execution (e.g., Levy et al., 2006; Strayer and Drews, 2007).

In the present study, we thus asked to what extent continuous motor movements of a prioritized task are affected by determinants of a multitasking context (e.g., programming of a subsequently executed motor task). To study how a prioritized task (e.g., Task 1: a number magnitude judgment) is influenced by simultaneous processing of additional task components (e.g., Task 2: a tone frequency judgment), most experiments use dual-task paradigms in which (1) the stimulus (S2) of Task 2 is presented in various intervals (stimulus onset asynchronies, SOA) after the stimulus (S1) of Task 1 (Pashler, 1994) and (2) both tasks share dimensional overlap (Navon and Miller, 2002). In these settings, many studies reported that programming of the response (R1) in Task 1 is affected by simultaneous programming of the response (R2) in Task 2, reflected in so-called crosstalk effects. For example, Hommel (1998a) demonstrated crosstalk effects on RT in Task 1 (RT1) when the response codes of Task 1 and 2 overlapped, i.e., responding to colors in Task 1 with a left or right key-press and to letters in Task 2 by saying "left" and "right." In this case, RT1 decreased when Task 2 indicated the same response category, while it increased when Task 2 indicated a different response category.

Although the amount of between-task interference, reflected in crosstalk effects, has often been taken to indicate the effectiveness of cognitive control in shielding the prioritized task processing, i.e., small crosstalk effects reflecting strong task shielding (Logan and Gordon, 2001; Fischer and Hommel, 2012; Plessow et al., 2012; Fischer et al., 2014), the locus and temporal dynamics of crosstalk effects are to date only vaguely understood. In addition, while most studies investigated how response programming in one task affects response programming in another, only a few studies targeted execution-related interference between tasks. For example, Bratzke et al. (2009) used continuous motor movements in Task 1 and found propagation effects of movement distance in Task 1 on choice RT in Task 2, indicating Task 1 motor execution-related interference in Task 2. In a similar setup, Ulrich et al. (2006) found that prolonging response execution in Task 1 also increased choice RT on Task 2. While these studies investigated effects of continuous motor movements on additional task processing, we pursued the opposite approach by focusing on the quality of the prioritized continuous motor execution and crosstalk effects due to additional task processing on these movements. More specifically, we applied a crosstalk approach to test in which time windows and by which factors the continuous motor execution of a prioritized Task 1 is affected. For this we designed a dynamic dual-task paradigm, in which participants had to move a computer mouse to respond in both tasks. Importantly, a continuous analysis of mouse movements allows, first, tracking the accuracy/quality of the movement parameters and, second, determining the temporal characteristics of the influencing factors (Spivey et al., 2005; Freeman et al., 2008; Scherbaum et al., 2010).

We hypothesized at least three important factors to determine motor execution in Task 1. First, we predicted that simultaneous programming of an additional motor response affects the execution of the prioritized motor movement: interference and hence deviations of the prioritized movement could be expected if the additional motor response points in a different direction than the prioritized movement. Participants responded to the magnitude of a presented number (S1) by moving the mouse to a pre-defined target region (see **Figure 1**). To ensure R2 programming while executing Task 1, S2 (high vs. low tone) was presented briefly with different SOAs following S1. Both tasks were performed sequentially using the same response device. R1 was given by moving the mouse toward target regions at the upper left and the upper right of the screen, while R2 was given by subsequently moving the mouse toward target regions at the lower left and the lower right of the screen. The brief presentation of S2 required encoding and possibly programming of R2. Yet, the execution of R2 was not to start until execution of R1 was completed. If R2 programming starts prior to completion of R1 execution, programming R2 that entails opposite directional movement parameters (e.g., a spatial code for the target region on the right side of the screen) should critically affect the quality of the continuous motor execution of R1 (e.g., movement in direction of the left target region). Importantly, and in contrast to previous crosstalk studies employing discrete button presses, the continuous performance measure allows determining whether and up to which time point R2 programming affects the movement of R1. This is not trivial, as findings from single tasks indicate time-sensitive profiles of interference. Whereas in the Simon task interference effects decrease over time due to either decay or suppression of conflicting information (Hommel, 1994; Stürmer et al., 2002; Band et al., 2003; Scherbaum et al., 2010), in the Stroop task and the Flanker task the opposite temporal pattern was observed (Pratte et al., 2010; Ulrich et al., 2015). In addition, R1 movement execution might reach a ballistic dynamic in the direction of the aimed target region at which it might become immune to influences of additional Task 2 processing.

Second, in a recent study, we found that previously executed responses influenced motor movement parameters in the current trial (Scherbaum et al., 2010). At the start of a trial, participants showed a movement bias in the direction of the previously executed response. In the present dual-task situation, two responses from the previous trial could influence Task 1 execution: the previous response to Task 1 and the previous response to Task 2, respectively. Because the movement pattern of R2 (downward movement) differs substantially from the movement pattern of R1 (upward movement), a bias on the current R1 by the previous R2 could be driven by residual activation of the spatial code (left vs. right target region) of the previous R2. This residual activation should drop off quickly after response execution and hence, one could expect this influence to decay quickly.

Given the similarity of previous and current R1 (both upward movements) we assume that a bias may consist of the reactivation of the entire motor program including the previously targeted spatial code. This reactivation of the previously applied motor program should be reflected in a more sustained influence on the current response to Task 1. Both influences and the described temporal patterns can be detected by the analysis of mouse movements.

As a third influential parameter on current prioritized movement execution we hypothesize that the extent of between-task interference in the previous trial (i.e., the level of crosstalk in trial N-1) will determine shielding of the currently executed motor response from crosstalk. This assumption is derived from the influential conflict monitoring theory (Botvinick et al., 2001) which proposes that an experienced response conflict triggers a recruitment of cognitive control to optimize subsequent performance. As a consequence, interference effects are usually reduced when following a conflict (Gratton et al., 1992; Stürmer et al., 2002; Egner and Hirsch, 2005; Ullsperger et al., 2005). In a dual-task context, crosstalk interference from Task 2 onto Task 1 processing can be interpreted as conflict that in turn shows sequential dependencies (e.g., Fischer et al., 2014). Hence, it is conceivable that the present form of crosstalk in a continuous motor execution task leads to similar sequential modulations of interference. Demonstrating trial-to-trial modulations of dual-task specific crosstalk effects extends the idea of conflict adaptation to yet another form of conflict. This is not trivial, as conflicts in single tasks usually contain task-irrelevant features/stimuli that can be suppressed by mechanisms of selective attention. Dual-task processing, however, differs considerably as all features/stimuli are task-relevant and a simple selective attention mechanism might not be adaptive (see also Fischer et al., 2014).
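The logic of a crosstalk effect and its sequential modulation can be sketched as follows. The trial values are invented purely for illustration, and the "deviation" score is a hypothetical stand-in for any movement measure; neither is taken from the study. The sketch treats a trial as compatible when R1 and R2 point to the same side, and it computes the crosstalk effect separately for trials that follow compatible and incompatible trials.

```python
from statistics import mean

# Hypothetical trial sequence, invented for illustration only:
# (side of R1, side of R2, deviation score of the Task 1 movement).
trials = [
    ("left", "left", 0.10),
    ("left", "right", 0.30),
    ("right", "left", 0.22),
    ("right", "right", 0.12),
    ("left", "left", 0.10),
    ("right", "left", 0.32),
    ("left", "right", 0.20),
    ("left", "left", 0.14),
]


def crosstalk_effect(subset):
    """Mean deviation on incompatible trials minus compatible trials."""
    incompatible = [dev for r1, r2, dev in subset if r1 != r2]
    compatible = [dev for r1, r2, dev in subset if r1 == r2]
    return mean(incompatible) - mean(compatible)


overall_effect = crosstalk_effect(trials)

# Split trials 2..N by the compatibility of the preceding trial (trial N-1).
pairs = list(zip(trials, trials[1:]))
after_compatible = [cur for prev, cur in pairs if prev[0] == prev[1]]
after_incompatible = [cur for prev, cur in pairs if prev[0] != prev[1]]

effect_after_compatible = crosstalk_effect(after_compatible)
effect_after_incompatible = crosstalk_effect(after_incompatible)

print(overall_effect, effect_after_compatible, effect_after_incompatible)
```

In this made-up sequence the effect is smaller after incompatible trials, which is the pattern a conflict adaptation account would predict; real data may of course look different.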

Furthermore, by applying continuous motor movements to study these modulations in dual-task situations, our study extends previous studies investigating temporal dynamics of congruence sequence effects (e.g., Notebaert et al., 2006; Mayr and Awh, 2009; Egner et al., 2010). Beyond these studies, the analysis of continuous movements enables a within-trial approach yielding precise temporal patterns by which previous interference affects current Task 1 execution.

# Methods

#### Participants

Twenty students (17 female, mean age = 23.52 years, SD = 4.41) of the Technische Universität Dresden took part in the experiment<sup>1</sup>. All participants had normal or corrected-to-normal vision. They received class credit or 5 € payment.

#### Ethics Statement

The study was approved by the institutional review board of the Technische Universität Dresden and conducted in accordance with the ethical standards of the 1964 Declaration of Helsinki and of the German Psychological Society. All participants were informed about the purpose and the procedure of the study and gave written informed consent prior to the experiment. All data were analyzed anonymously.

#### Apparatus and Stimuli

Target stimuli in Task 1 (numbers 1–4 and 6–9) were presented in white on a black background in the center of a 17-inch screen running at a resolution of 1280 × 1024 pixels (75 Hz refresh frequency). S1 had a width of 6.44°. Response boxes (11.55° in width) in Task 1 were presented at the top left and top right of the screen. S2 were sine tones (low: 440 Hz, high: 880 Hz, sampled at 44,100 Hz), presented for 200 ms binaurally via headphones. Response boxes (11.55° in width) in Task 2 were presented at the bottom left and bottom right of the screen.

For presentation, we used Psychophysics Toolbox 3 (Brainard, 1997; Pelli, 1997), Matlab 2006b (the Mathworks Inc.), and Windows XP. Tones were presented via the Portaudio driver on high precision ASIO enabled soundcards. Responses were carried out by moving a computer mouse (Logitech Wheel Mouse USB), sampled with a frequency of 92 Hz.

#### Procedure

After onscreen instructions and demonstration by the experimenter, participants practiced 20 trials, followed by the main experiment. The experiment consisted of four blocks and 1028 trials overall (see Design).

Each trial consisted of three stages (see **Figure 1**). In the first stage, participants had to click on a red box (11.55° in width) at the bottom of the screen within a deadline of 1500 ms. This served to produce a comparable starting area for each trial. After participants clicked within this box, the second stage started and two response boxes were presented at the upper left and upper right corners of the screen.

<sup>1</sup>Descriptive data of one student participant were lost after data collection; hence, gender and age describe data of 19 participants. The experimental data are reported completely for all 20 participants.

Participants were required to start the mouse movement upwards within a deadline of 1500 ms. We chose this procedure, which forced participants to be already moving when entering the decision process, to ensure that they did not decide first and only then execute the final movement. Hence, only after participants had moved at least 4 pixels in each of two consecutive time steps did the third stage start, containing the actual Tasks 1 and 2. The target stimulus of Task 1 (the number) was presented. The stimulus of Task 2 (the tone) was presented with a stimulus onset asynchrony (SOA) of 0, 250, or 500 ms relative to S1.
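The movement-onset criterion described above (at least 4 pixels in each of two consecutive time steps) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `movement_onset` is hypothetical, and measuring displacement along the vertical axis only is an assumption, since the text does not state whether vertical or total displacement was used.

```python
import numpy as np

def movement_onset(y, min_step=4, n_consecutive=2):
    """Return the index of the sample that completes `n_consecutive`
    successive steps of at least `min_step` pixels each, or None."""
    # Assumption: displacement is measured along the vertical axis only.
    steps = np.abs(np.diff(np.asarray(y, dtype=float)))
    run = 0
    for i, step in enumerate(steps):
        # Count consecutive qualifying steps; reset on a step that is too small.
        run = run + 1 if step >= min_step else 0
        if run >= n_consecutive:
            return i + 1  # index of the sample that ends the qualifying run
    return None

# Hypothetical cursor y-positions (pixels) at successive samples.
print(movement_onset([500, 499, 495, 490, 484]))
```

With a criterion of this kind, a single large jump of the cursor does not trigger stimulus presentation; the movement has to be sustained over two samples.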

To minimize "noise" in the data due to incompatible stimulus-space representations (e.g., Dehaene et al., 1993), we used a constant spatially compatible mapping between stimuli and responses for all participants. More precisely, for the first task, participants were instructed to respond to the number by moving the cursor into the upper left response box for digits smaller than five and into the upper right response box for digits larger than five. After giving this response to Task 1, participants executed their response to Task 2 by moving the cursor into the bottom left response box for the low tone and into the bottom right response box for the high tone. The number and the tone either indicated the same side of response (congruent condition) or opposite sides (incongruent condition).

The trial ended when the cursor had been moved into the respective response boxes or when the response deadline of 1500 ms was exceeded in either task (see **Figure 1**). If participants missed the deadline of one of the three stages, the next trial started with the presentation of the red start box. Response times (RT) were measured as the interval between the onset of the target stimulus (number in the first task, tone in the second task) and the time at which the mouse cursor reached the respective response box (top boxes for the first task, bottom boxes for the second task).

#### Design

Across trials, we varied the following independent variables: for the current trial, number<sub>N</sub> (1, 2, 3, 4, 6, 7, 8, and 9) and tone<sub>N</sub> (low/high), and for the previous trial, number<sub>N-1</sub> and tone<sub>N-1</sub>. The sequence of trials was balanced within each block by pseudo-randomization, resulting in a balanced trial<sub>N</sub> (16) × trial<sub>N-1</sub> (16) = 256-trial transition matrix (+1 trial to conclude the sequence of balanced transitions) for each of four blocks of trials, and thus 1028 trials overall. On this balanced sequence of trials, the SOA between number and tone was distributed evenly across congruency<sub>N</sub> and congruency<sub>N-1</sub> by pseudo-randomization. Overall, this leads to a 2 (congruency<sub>N</sub>) × 2 (congruency<sub>N-1</sub>) × 3 (SOA) design with 85–86 trials per condition.
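The text does not specify how the balanced sequence was generated. One possible construction, shown here purely as an assumption, treats the 16 trial types (8 numbers × 2 tones) as nodes of a complete directed graph with self-transitions and walks an Eulerian circuit through it (Hierholzer's algorithm). Such a walk uses each of the 256 ordered transitions exactly once and needs 257 trials, which matches the "+1 trial" per block. The function name `balanced_sequence` is hypothetical.

```python
import random
from collections import Counter

def balanced_sequence(n_types=16, seed=0):
    """Return n_types**2 + 1 trial types in which every ordered transition
    (previous type -> current type) occurs exactly once."""
    rng = random.Random(seed)
    # Unused successors of each trial type in random order (repetitions included).
    unused = {t: rng.sample(range(n_types), n_types) for t in range(n_types)}
    stack, circuit = [0], []
    # Hierholzer's algorithm: follow unused transitions, backtrack when stuck.
    while stack:
        current = stack[-1]
        if unused[current]:
            stack.append(unused[current].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]

block = balanced_sequence()
transitions = Counter(zip(block[:-1], block[1:]))
print(len(block), len(transitions), 4 * len(block))
```

Dividing the 1028 trials by the 12 cells of the 2 × 2 × 3 design gives about 85.7 trials per cell, consistent with the 85–86 trials per condition stated above.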

#### Data Preprocessing

We excluded trials missing deadlines or containing erroneous responses and the trial following erroneous responses in one or both of the two tasks (15.47%). For all analyses of discrete measures, Greenhouse-Geisser adjustments were applied when appropriate.

For the analysis of mouse movements, we aligned all movements to a common starting position and normalized each movement to 100 equal time slices<sup>2</sup> (Spivey et al., 2005; Scherbaum et al., 2010). To quantify the deviation of mouse movements from the shortest path to the target box, we subtracted the X-coordinates of each movement from an ideal X-coordinate line.
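The alignment and time normalization can be illustrated with a short sketch. It assumes linear interpolation between recorded samples, which the text does not specify, and the function name `time_normalize` is hypothetical.

```python
import numpy as np

def time_normalize(x, y, n_slices=100):
    """Shift a trajectory to start at (0, 0) and resample it to n_slices
    equally spaced time points (assumption: linear interpolation)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x, y = x - x[0], y - y[0]                    # common starting position
    old_t = np.linspace(0.0, 1.0, len(x))        # original, relative time axis
    new_t = np.linspace(0.0, 1.0, n_slices)      # normalized time axis
    return np.interp(new_t, old_t, x), np.interp(new_t, old_t, y)

# Hypothetical trial with 62 recorded samples (about 682 ms at 92 Hz).
raw_x = np.linspace(640, 200, 62)
raw_y = np.linspace(900, 100, 62)
norm_x, norm_y = time_normalize(raw_x, raw_y)
print(len(norm_x), norm_x[0], norm_x[-1])
```

At a sampling rate of 92 Hz, 100 samples span 100/92 s, i.e., about 1087 ms, so shorter trials are stretched and longer trials are compressed by this resampling.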

# Results

#### Response Times in Task 1 (RT1)<sup>3</sup>

We first analyzed the impact of between-task crosstalk in the current trial (congruency<sub>N</sub>) and the between-task crosstalk of the previous trial (congruency<sub>N-1</sub>) on RT1 (see **Figure 2A**). A repeated measures analysis of variance (ANOVA) on RT1 with the factors congruency<sub>N</sub>, congruency<sub>N-1</sub>, and SOA revealed significant crosstalk from Task 2 to Task 1, as expressed in the main effect of congruency<sub>N</sub> on RT1, F(1, 19) = 11.61, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.38. RT1 was shorter when the present trial was congruent (709 ms) than when it was incongruent (724 ms). Post-conflict trials were slightly faster (713 ms) than trials following congruent trials (720 ms), as indicated by the significant main effect of congruency<sub>N-1</sub>, F(1, 19) = 6.32, p < 0.05, η<sub>p</sub><sup>2</sup> = 0.25. A main effect of SOA indicated longer RT1 (692, 710, 747 ms) with increasing SOA, F(2, 38) = 33.52, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.64.

<sup>2</sup>At 92 Hz sampling frequency, 100 samples correspond to an RT of 1087 ms. This means that all trials below this RT, including trials of average RT (M = 682 ms, SE = 19 ms, 62 samples at 92 Hz), are stretched to 100 samples. Only trials longer than 1087 ms (5.7% of all trials) are compressed.

<sup>3</sup>For RT2, please see the Supplementary Material and the Supplementary Figure 1.

Frontiers in Psychology | www.frontiersin.org July 2015 | Volume 6 | Article 934 |

Furthermore, there was a significant two-way interaction congruency<sub>N</sub> × congruency<sub>N-1</sub>, F(1, 19) = 77.53, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.8, reflecting a conflict adaptation effect that was also present for all individual SOA levels (individual ANOVAs for each SOA, all ps < 0.01). Yet, the significant three-way interaction congruency<sub>N</sub> × congruency<sub>N-1</sub> × SOA on RT1 shows that the expression of conflict adaptation varied across SOA levels, F(2, 38) = 11.49, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.38 (see **Figure 2A**). No other interactions reached statistical significance (all p > 0.3).
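As a plausibility check, the reported partial eta squared values can be recomputed from the F statistics and their degrees of freedom using the standard identity η<sub>p</sub><sup>2</sup> = (F · df<sub>effect</sub>) / (F · df<sub>effect</sub> + df<sub>error</sub>). The sketch below is an illustration added for the reader, not part of the original analysis; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (effect, F, df_effect, df_error) as reported in the text above.
reported = [
    ("congruency N", 11.61, 1, 19),
    ("congruency N-1", 6.32, 1, 19),
    ("SOA", 33.52, 2, 38),
    ("congruency N x congruency N-1", 77.53, 1, 19),
    ("congruency N x congruency N-1 x SOA", 11.49, 2, 38),
]
for label, f_value, df_effect, df_error in reported:
    print(f"{label}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

All five recomputed values agree with the reported effect sizes after rounding to two decimals.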

To establish a comparison for the continuous regression analysis performed on mouse movements (see next section), we studied the full pattern of hypothesized effects with respect to SOA by performing regression analysis on RT1. We performed the regression separately for each SOA with regressors for all four hypothesized influences, namely congruency<sub>N</sub>, conflict adaptation (congruency<sub>N</sub> × congruency<sub>N-1</sub>), first response in previous trial (current R1 as repetition or switch of previous R1), and second response in previous trial (current R1 as repetition or switch of previous R2). All regressors were normalized to a range of [0, 1]. To exclude multicollinearity, we checked variance inflation factors to stay below 1. We tested the resulting 12 beta-weights (3 SOA × 4 regressors) for statistically significant influence by t-tests against zero.
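The two-step logic of this analysis (one multiple regression per participant, followed by a t-test of the resulting beta-weights against zero across participants) can be sketched as follows. The data are simulated, the regressors are generic binary codes rather than the actual coding used in the study, and all function names are hypothetical; ordinary least squares with an intercept is assumed.

```python
import numpy as np

def minmax(v):
    """Rescale a regressor to the range [0, 1]."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def fit_betas(regressors, outcome):
    """Ordinary least squares with an intercept; returns one beta per regressor."""
    X = np.column_stack([np.ones(len(outcome))] + [minmax(r) for r in regressors])
    coef, *_ = np.linalg.lstsq(X, np.asarray(outcome, dtype=float), rcond=None)
    return coef[1:]  # drop the intercept

def t_against_zero(betas):
    """One-sample t statistics (df = n - 1) for a participants x regressors array."""
    betas = np.asarray(betas, dtype=float)
    n = betas.shape[0]
    return betas.mean(axis=0) / (betas.std(axis=0, ddof=1) / np.sqrt(n))

# Simulated data only: 20 participants, 300 trials each, 4 binary regressors.
rng = np.random.default_rng(0)
true_betas = np.array([-10.0, -25.0, -30.0, 0.0])   # arbitrary illustration values
all_betas = []
for _ in range(20):
    regs = rng.integers(0, 2, size=(4, 300))
    rt1 = 700 + true_betas @ regs + rng.normal(0, 50, 300)
    all_betas.append(fit_betas(regs, rt1))
t_values = t_against_zero(all_betas)
print(np.round(t_values, 2))
```

In the study, this procedure would be repeated for each of the three SOA levels, yielding the 12 beta-weights mentioned above.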

Results (see **Figure 2B**) show significant beta-weights across all SOA for congruency<sub>N</sub> (all β < −9, all t < −2.10, all p < 0.05), conflict adaptation (all β < −23, all t < −3.154, all p < 0.05), and first response in previous trial (all β < −28, all t < −4.760, all p < 0.05), but not for second response in previous trial (all p > 0.12). Hence, congruency within a trial, conflict adaptation, and a repetition of the previous response in Task 1 significantly influenced the response in the current trial across all SOA.

#### Mouse Movements in Task 1

In the next step, we analyzed mouse movements to investigate the temporal patterns of the different influences as a function of SOA. To this end, we performed time-continuous multiple regression (Notebaert and Verguts, 2007; Scherbaum et al., 2010; but see Mirman et al., 2008 for a multilevel approach) on the deflection of mouse movements on the (horizontal) X-axis (see Supplementary Figure 2): For each trial, we calculated deflection as the difference between the real movement and a straight line from the start-point to the end-point of the real movement (a hypothetical direct movement). Compared to raw movements on the X-axis, this measure removes random variance resulting from different start- and end-points of movements and instead focusses on the deviation of the movement away from an ideal movement due to influences during movement execution. Compared to movement angles (Scherbaum et al., 2010), deviation is more robust to noise, as it integrates influences across time, though at the cost of temporal resolution.
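A minimal sketch of the deflection measure is given below. It assumes that the straight reference line is evaluated at the same time slices as the recorded movement, i.e., the ideal X-coordinate changes linearly over normalized time from the first to the last sample; the text leaves this detail open, and the function name `deflection` is hypothetical.

```python
import numpy as np

def deflection(x):
    """Difference between an ideal straight X-coordinate line (first to last
    sample, linear over time slices) and the recorded X-coordinates."""
    x = np.asarray(x, dtype=float)
    ideal = np.linspace(x[0], x[-1], len(x))
    return ideal - x  # ideal minus real movement

# Hypothetical X-coordinates of a short movement that lags behind at the fourth sample.
print(deflection([0, 5, 10, 10, 20]))
```

By construction, the measure is zero at the first and last sample, so differences in start- and end-points between trials do not enter the regression.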

For regression on this continuous measure of deflection, we used the same regressors as for RT1, namely congruency<sub>N</sub>, conflict adaptation (congruency<sub>N</sub> × congruency<sub>N-1</sub>), first response in previous trial (current R1 as repetition or switch of previous R1), and second response in previous trial (current R1 as repetition or switch of previous R2). For each time slice, we calculated a multiple regression analysis (100 time slices, hence 100 multiple regression analyses) with the four defined regressors, yielding four time-varying beta-weights (4 weights across 100 time slices) for each participant. For each of these four beta-weights, we computed grand averages representing the time-varying strength of influence for each predictor. To detect significant temporal segments of influence, we calculated t-tests against zero for each time step of these beta-weights (Scherbaum et al., 2010; Dshemuchadse et al., 2012), compensating for multiple comparisons of temporally dependent data by only accepting segments of more than 10 consecutive significant t-tests (see Appendix for a Monte Carlo analysis on this issue, based on Dale et al., 2007).
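The segment criterion described above can be illustrated with a short run-length sketch. It takes one p-value per time slice and keeps only runs of more than 10 consecutive significant slices. The significance threshold of 0.05 is an assumption (the text does not state the per-slice alpha level), and the function name `significant_segments` is hypothetical.

```python
import numpy as np

def significant_segments(p_values, alpha=0.05, min_length=11):
    """Return inclusive (start, end) slice indices of runs of consecutive
    p < alpha that are at least min_length slices long."""
    sig = np.asarray(p_values) < alpha
    segments, start = [], None
    for i, is_sig in enumerate(sig):
        if is_sig and start is None:
            start = i                              # a new run begins
        elif not is_sig and start is not None:
            if i - start >= min_length:            # run ended; keep it if long enough
                segments.append((start, i - 1))
            start = None
    if start is not None and len(sig) - start >= min_length:
        segments.append((start, len(sig) - 1))     # run reaching the last slice
    return segments

# Hypothetical p-values for 100 time slices.
p = np.ones(100)
p[5:15] = 0.01     # 10 slices: rejected, because more than 10 are required
p[40:60] = 0.01    # 20 slices: accepted
p[89:100] = 0.01   # 11 slices up to the end of the movement: accepted
print(significant_segments(p))
```

A run-length criterion of this kind trades sensitivity to very brief effects for protection against isolated false positives in temporally correlated data.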

As can be seen in **Figure 3** and **Table 1**, congruency<sub>N</sub> showed a significant influence across all SOAs, and the temporal onset of significant influence by congruency<sub>N</sub> strictly followed the SOA. Furthermore, conflict adaptation was only present for the first two SOAs, with a slight time-lag relative to the onset of congruency<sub>N</sub>. A long-lasting influence of first response in previous trial was present across all SOA. In contrast to RT1, second response in previous trial also influenced mouse movements, but only at the start of the trial, decaying quickly in the course of the trial<sup>4</sup>.

These results confirm our expectation that information from the previous tasks and from the current Task 2 influenced the execution of Task 1. Furthermore, crosstalk from Task 2 on Task 1 was not limited to a specific stage of Task 1 execution, but was present for all SOA and followed the timing of the arrival of information from Task 2 as reflected in the SOA.

# Discussion

In the present study we investigated to which extent continuous motor movements of a prioritized task (Task 1) are affected by determinants of a multitasking context, namely crosstalk from a secondary task (Task 2), previously executed responses of Task 1 and Task 2, and the extent of previously experienced interference between Task 1 and Task 2 (crosstalk in trial N-1). We found evidence for an influence of all four factors, following specific temporal patterns.

First, the results from RT and mouse movements indicate that Task 1 execution is influenced by crosstalk through the information necessary to program the response of Task 2 (Hommel, 1998a; Logan and Schulkind, 2000; Fischer et al., 2007). While the analysis of RT1 indicated that crosstalk weakened with increasing SOA, the analysis of mouse movements revealed that crosstalk strictly followed the timing of the onset of the information for Task 2 as determined by the SOA. For the shortest SOA, the influence of crosstalk started the earliest and accumulated most in the deflection of mouse movements. For the longest SOA, the influence of crosstalk was limited to the final part of the movement and could accumulate only briefly. Thus, while varying in degree, crosstalk was not limited to specific critical time-windows that might be related to certain processing stages, i.e., response selection and/or movement execution of Task 1.

The finding of crosstalk on Task 1 movement execution is not trivial. First of all, although the movement of the mouse itself started the trial (with S1 presentation) so that R1 programming was forced to occur online during movement, it would have been conceivable that the movement becomes a ballistic process at some point, rendering it insensitive to influences of additional stimulus encoding and classification. In contrast, however, we could show that crosstalk from additional Task 2 processing affected the movement quality in the prioritized task throughout the entire movement period: even for the longest SOA, we found crosstalk, as indicated by regression analysis of RT1 and mouse movements. Furthermore, even though the execution of both tasks was temporally segregated (due to using the same response device for Task 1 and Task 2), crosstalk did occur whenever S2 was presented. This shows that the brief presentation of S2 resulted in an immediate stimulus feature encoding and response selection process that interfered with motor execution in Task 1. Put differently, despite the sequential motor execution inherent in the mouse paradigm, crosstalk from Task 2 onto Task 1 could not be prevented. Since R1 and R2 movements were executed in different vertical directions (upwards for R1, downwards for R2), one could assume that the crosstalk found here does not stem directly from the programming of movements, but from the spatial (horizontal) codes for the target areas (left/right) that overlapped between Task 1 and Task 2.

<sup>4</sup>These analyses did not change qualitatively when removing all trials with RT lower than 500 ms.

TABLE 1 | Significant temporal segments (normalized time) from continuous regression analysis and respective mean RT1.

[Figure caption fragment] ... (in ms) is plotted in separate panels. The SOAs' time segment relative to ... beta-weights indicate a support of movement into the correct direction (smaller deflection to the incorrect target box). Shaded areas indicate standard errors.

Second, the analysis of RT1 and mouse movements yielded different results about the influence on R1 by previously executed responses in Task 1 and Task 2. The analysis of RT1 only revealed an influence of the previous response in Task 1, while mouse movements revealed an influence of the previous response in both Task 1 and Task 2. This indicates that RT as a discrete measure was not as sensitive as mouse movements and missed the smaller effects of the previous R2. Mouse movements further revealed distinct temporal patterns for both influences. That is, the previous response to Task 1 led to a strong and sustained influence, which can be interpreted as a retrieval of the previously activated response by the context of Task 1 (cf. Hommel, 1998b; Hommel et al., 2002). Since this effect was present across the whole trial, it was also reflected in RT1. The previous response to Task 2 led to a weaker and quickly decaying influence. This could be interpreted as a passively decaying residual activation of the response executed directly before the current R1. Notably, the differences in the exact movements of R1 (upwards) and R2 (downwards) suggest that this residual activation stems from spatial codes used for programming R2, but not from the completely programmed movement of the previous R2 itself. Since the effect decayed quickly, it was not reflected in RT1, indicating the advantage of analyzing the continuous data. The finding of these two effects from previously executed responses also sheds light on a similar effect found in single-task situations (Scherbaum et al., 2010, 2013): here, the strength of the response repetition bias might result from an inseparable mixture of the retrieval and residual activation of the same response in the previous trial.

Third, congruence relations in the previous trial affected crosstalk in the current trial. More specifically, previous conflict reduced the current impact of Task 2 processing on Task 1. Hence, experiencing crosstalk resulted in increased levels of prioritized task shielding to protect Task 1 processing from Task 2 interference, which is comparable to conflict adaptation in single-task situations (Botvinick et al., 2001).

Our findings support the view of a continuous process of scheduling and capacity sharing (Tombu and Jolicœur, 2003). The crosstalk between Task 1 and 2 and the influence of previous interference indicate flexible task shielding of Task 1 from Task 2 following a pattern similar to conflict adaptation in single task situation (Botvinick et al., 2001). The effects of conflict adaptation indicate that task shielding can be parameterized by previous experience of crosstalk interference (Fischer et al., 2014; compare e.g., Logan and Gordon, 2001). However, adjustments to task shielding showed a time-lag leading to conflict adaptation in mouse movements being only present for the first two SOAs. For the longest SOA, the temporal pattern shows an onset of conflict adaptation that fails to reach significance before the end of R1.

The continuous nature of our task might have supported the flexible time sharing compared to the usual key-press-based setups. We forced participants to start the movement of R1 before S1 was presented and this could have forced participants to choose a continuous processing mode that might not be chosen in a key-press based paradigm. While this procedure was necessary to ensure that response selection is reflected in the movement of R1, one could also argue that most actions in the real world demand the continuous adaptation of response movements to occurring stimuli and hence, our results imply a higher ecological validity compared to the strongly constrained key-press setups found in other studies.

Notably, our study is not the first one to apply continuous movements to respond to Task 1. However, previous studies used the continuous movements in Task 1 mainly to influence responding in Task 2 (e.g., Ulrich et al., 2006; Bratzke et al., 2008, 2009). An important variable in these studies was movement distance in Task 1, leading to higher movement times of R1. Especially for long movement distances, these studies found increasing Task 1 RT in dependence of the SOA, comparable to the results of the current study (although focusing on the consequences on Task 2; e.g., Bratzke et al., 2009).

A negative side-effect of our focus on Task 1 and the chosen response setup of our paradigm is that responses in Task 2 are hard to interpret. In the case of incongruent Task 1 and Task 2 responses, participants had a longer way to reach the opposite response box in Task 2 and hence longer RT per se. However, what could be taken from the pattern of RT in Task 1 and Task 2 is that our manipulation of SOA was effective. If higher SOA had shown smaller slopes in RT of Task 2 (see Supplement), it would have been possible that Task 2 information was too late to influence Task 1 (compare e.g., Ulrich et al., 2006).

In conclusion, the dynamic investigation of Task 1 execution in a dual-task setting yielded three findings. First, crosstalk from Task 2 interfered with Task 1 execution. Although this influence clearly depended on the temporal proximity between S1 and S2 presentation, its impact on Task 1 processing was continuous, i.e., irrespective of any critical windows of influence. This indicates a continuous process of task execution that does not end in a ballistic automatic movement, but is prone to interference until reaching its final destination. Second, the modulation of crosstalk by previous interference indicates a flexible adaptation of task shielding. Third, the execution of Task 1 was also influenced by previously executed responses of both Task 1 and Task 2; however, these influences showed different temporal patterns, indicating a sustained reactivation of the previous response of Task 1 and a decaying residual activation that is most likely related to the spatial codes of the previous execution of Task 2.

# Funding

This research was supported by the German Research Foundation (DFG grant SCH1827/11 to SS and DFG grant SFB 940/1 Project A3 to RF). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.

# Acknowledgments

We acknowledge support by the German Research Foundation and the Open Access Publication Funds of the TU Dresden.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.00934

# References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Scherbaum, Gottschalk, Dshemuchadse and Fischer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Action dynamics reveal two types of cognitive flexibility in a homonym relatedness judgment task

Maja Dshemuchadse\*, Tobias Grage and Stefan Scherbaum

Department of Psychology, Technische Universität Dresden, Dresden, Germany

Cognitive flexibility is a central component of executive functions that allow us to behave meaningfully in an ever-changing environment. Here, we support a distinction between two different types of cognitive flexibility, shifting flexibility and spreading flexibility, based on independent underlying mechanisms commonly subsumed under the ability to shift cognitive sets. We use a homonym relatedness judgment task and combine it with mouse tracking to show that these two types of cognitive flexibility follow independent temporal patterns in their influence on participants' mouse movements during relatedness judgments. Our results are in concordance with the predictions of a neural field based framework that assumes the independence of the two types of flexibility. We propose that future studies about cognitive flexibility in the area of executive functions should take independent types into account, especially when studying moderators of cognitive flexibility.

Keywords: cognitive flexibility, priming, homonym, dynamics, continuous time, mouse movements, executive functions, relatedness judgment

#### Edited by:

Timothy Michael Ellmore, The City College of New York, USA

#### Reviewed by:

Thomas Kleinsorge, Leibniz Research Centre for Working Environment and Human Factors, Germany; Peter Dixon, University of Alberta, Canada; Natalie A. Kacinik, City University of New York, USA

#### \*Correspondence:

Maja Dshemuchadse, Department of Psychology, Technische Universität Dresden, Zellescher Weg 17, 01062 Dresden, Germany; maja.dshemuchadse@tu-dresden.de

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 30 April 2015; Accepted: 04 August 2015; Published: 28 August 2015

#### Citation:

Dshemuchadse M, Grage T and Scherbaum S (2015) Action dynamics reveal two types of cognitive flexibility in a homonym relatedness judgment task. Front. Psychol. 6:1244. doi: 10.3389/fpsyg.2015.01244

# Introduction

As humans, we are able to adapt our goals and our behavior to a constantly changing environment. The flexibility to shift cognitive sets is a central part of executive functions (e.g., Miyake et al., 2000; Diamond, 2013). It requires either identifying alternatives related to the original set or overcoming perseveration on previous sets by shifting to a new alternative set. In the research on executive functions, cognitive flexibility is usually studied in two ways. The first approach focusses on task-switching (e.g., Monsell, 2003), set-shifting (e.g., Dreisbach and Goschke, 2004), or related paradigms that reveal switch costs (higher response times) in switch trials, where one has to switch from one task to another, compared to repeat trials, where the same task is performed again. Several influences on this type of cognitive flexibility have been identified. For instance, it has been found that positive mood increases cognitive flexibility and thereby reduces switch costs (Dreisbach and Goschke, 2004). Interestingly, this influence of positive mood on cognitive flexibility provides the link to the second approach, which focusses on the breadth of attention, as studied, e.g., with the flanker task (Eriksen and Eriksen, 1974) or with semantic tasks such as the remote associates task (Mednick et al., 1964). The flanker task, in which one has to respond to a central target while ignoring flanking stimuli, shows the congruency effect: incongruent trials, in which center and flanking stimuli point to different responses, are slower than congruent trials, in which center and flanking stimuli point to the same response. This congruency effect increases with decreasing distance between center and flanking stimuli (Eriksen and Eriksen, 1974). Interestingly, positive mood also increases the congruency effect, similar to the effects of positive mood on the detection of remote associates in the remote associates task (Rowe et al., 2007). These findings are interpreted as an increase in the breadth of attention and hence increased cognitive flexibility. The link between the two approaches (shifting and attentional breadth), as indicated by the effects of positive mood, is also inherent in other conceptualizations of executive functions, such as the shielding-shifting dilemma, which assumes the shifting of goals and the shielding from distraction as opposing constraints of the cognitive system that span the dimension of cognitive flexibility (Goschke and Bolte, 2014).

Here, we argue that, despite the common influence of positive mood and the apparent similarities of these conceptualizations, the two approaches tackle different types of cognitive flexibility based on independent underlying mechanisms: the ability to identify related alternatives and the ability to completely shift to new alternatives constitute two distinct and independent aspects of cognitive flexibility that become evident when referring to neural mechanisms. We will call the first type shifting flexibility: the ability to overcome perseveration by switching from an active neural pattern representing a certain cognitive set to a new one. We will call the second type spreading flexibility: the ability to identify related alternative cognitive sets based on the spread of activation to nearby neural patterns. A classic example for these two abilities stems from the field of problem solving: Duncker's candle task (Duncker, 1945). Participants are given a box of tacks, matches, and a candle. Then they are instructed to attach the candle to a wall of corkboard and to light it (compare Isen et al., 1987). The "correct" solution is to use the tacks' box as a platform for the candle and to attach this platform to the wall using the tacks. However, a typical phenomenon is functional fixedness: participants are not able to leave the functional cognitive set associated with the single items and only explore different functions in the vicinity of the original functions (for example, using the melted wax of the candle as glue). Within this candle task, spreading flexibility describes the exploration of all ideas related to the original function of the items, while shifting flexibility would mean shifting to a completely new function, thereby overcoming functional fixedness.

Decomposing these two types of cognitive flexibility and their underlying cognitive processes experimentally, however, poses a methodological challenge: Typical approaches either analyze inter-individual variance in different tasks (e.g., Miyake et al., 2000) or analyze patterns of lesions or neural activation between different brain regions (e.g., Koechlin et al., 2003; Stuss and Alexander, 2007). Both approaches require a large number of participants and inter-individual variability to differentiate components, and both approaches are correlational in that they rely on given inter-personal variation. For example, in their seminal study on components of executive functions, Miyake and colleagues measured 137 participants, all undergraduate students, on 14 tasks, identifying three specialized components and one general component of executive functions via factor analysis (Miyake et al., 2000). However, in later studies measuring participants between 20 and 81 years, an additional fourth specialized component was identified (Fisk and Sharp, 2004). While this does not question the original three components, it illustrates that variance between participants is necessary in this approach to identify specific components.

To go beyond such correlative analyses, experimental approaches manipulate task-requirements of different components and measure the resulting differences in response times and error rates (e.g., Braver, 2012). We will embrace and add to this experimental approach by adopting a dynamic continuous perspective (Spivey and Dale, 2004, 2006; Spencer et al., 2009). We manipulate the requirement for the two types of flexibility, shifting and spreading, within one task and compare their effects continuously in time by using mouse tracking (Spivey et al., 2005; Freeman et al., 2008; Scherbaum et al., 2010; Dshemuchadse et al., 2012), which will enable us to distinguish both types by their temporal profiles (Scherbaum et al., 2010). To manipulate and measure the two types within the same task, we will use a relatedness judgment task (Zwaan and Yaxley, 2003). In this task, participants have to detect whether two words are semantically related or not. Hence, participants have to search within their semantic network to detect associations between words (Faust and Lavidor, 2003). For this search process, one can manipulate the level of spreading flexibility required to solve the task by varying the strength of relatedness between words. To also manipulate and assess shifting flexibility, we extended this task by using homonyms (Dshemuchadse, 2009), i.e., words with two or more different meanings, e.g., bank as a place to store and retrieve money or as the embankment of a river. Shifting flexibility is presumed to allow individuals to switch from one activated meaning of a homonym to another (Simpson and Kang, 1994; Gorfein et al., 2000) when required by the task (see **Figure 2** for a sketch of the resulting task).

To illustrate our reasoning about the processes involved, **Figure 1** sketches a simplified neural framework that builds on previous competitive activation models of ambiguity resolution (e.g., Plaut and Booth, 2000; Twilley and Dixon, 2000; Rodd et al., 2004). The framework assumes word meanings to be represented by patterns of activation in a dynamic neural field (Amari, 1977; Erlhagen and Schöner, 2002; Erlhagen and Bicho, 2006). The neural field represents the meaning of words distributed (Rodd et al., 2004) along a topographic semantic space (Plaut and Booth, 2000) with related word meanings positioned near to each other.

For homonyms, the two meanings are represented in separate subspaces of the field, assuming that a homonym's meanings are distinct. For example, the homonym band would have two separate peaks representing the meaning of a small orchestra and the meaning of a tie. Activation within the field is not limited to single points, but spreads from one unit to neighboring units. Representations of concepts are, hence, diffuse and always co-activate related concepts that lie nearby in semantic space (compare Quillian, 1967). The more related two concepts are, the nearer they are to each other in the field and the stronger they co-activate each other. For example, the words band and music are nearer to each other and, hence, spread more activation to each other when activated, than the words band and chord. The priming of one of the two meanings of the homonym results in stronger initial activation of the primed semantic subspace of the whole homonym meaning space. For example, when the homonym band is shown after the context of music was primed, the peak representing the small orchestra would be supported by a decaying peak representing this priming.

Within this simplified neural framework with spreading activation within the field and the decaying effect of priming, we can derive two predictions for how fast a related word (the associate in the following) could be identified by the model. First, the nearer the associate is located to the homonym's core meaning, the better its identification is supported by the spreading activation from the homonym. For example, the relation between band and music is identified more quickly than the relation between band and chord. This difference represents the spreading flexibility: broader spread of activation should support the identification of words less related to the core concept activated by the homonym.

Second, an associate from within the primed subspace can be identified more easily than an associate from within the unprimed subspace, although the influence of priming is expected to decay over time. For example, the word music would be identified more rapidly than the word ribbon. This difference represents the shifting flexibility: The more stable the neural representation of the primed homonym meaning is, the more difficult it is to shift to a completely different meaning (this closely matches the implementation of shielding in common neural network models, e.g., O'Reilly, 2006; Herd et al., 2014).
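The two predictions can be made concrete with a deliberately simplified toy calculation. The sketch below is not the dynamic neural field model from the Supplementary Material; it only assumes that support for an associate is the sum of a Gaussian spread term (falling off with semantic distance) and an exponentially decaying priming term. The function name and all parameter values (`sigma`, `prime_gain`, `tau`) are hypothetical and chosen for illustration only.

```python
import math

def associate_support(distance, primed, t, sigma=1.0, prime_gain=0.5, tau=300.0):
    """Toy activation support for an associate at time t (in ms).

    distance: semantic distance to the homonym's core meaning (arbitrary units)
    primed:   True if the associate lies in the primed subspace
    sigma, prime_gain, tau: hypothetical parameters, not taken from the article
    """
    # Spreading activation: nearer associates receive more support.
    spread = math.exp(-distance ** 2 / (2 * sigma ** 2))
    # Priming: extra support for the primed subspace that decays over time.
    priming = prime_gain * math.exp(-t / tau) if primed else 0.0
    return spread + priming

strong, weak = 0.5, 1.5  # hypothetical distances for strong and weak associates

# Prediction 1: strong associates are better supported than weak ones.
print(associate_support(strong, False, 100) > associate_support(weak, False, 100))
# Prediction 2: primed associates are better supported than unprimed ones.
print(associate_support(weak, True, 100) > associate_support(weak, False, 100))
```

Because the two terms are simply added, this toy version builds in the independence of the two effects by construction; whether real data show such additivity is the empirical question the experiment addresses.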

To tackle these two types of cognitive flexibility empirically, we created a task that asked participants to identify associates that were, first, either primed or unprimed and, second, either strongly or weakly related to the homonym. From the theory as sketched in the neural framework, we can derive the following empirical hypotheses (for a formal implementation and simulation of the framework in a dynamic neural field model and the resulting predictions, see Supplementary Material Data Sheet 1). First, we expect independent main effects of priming and association in response times and error rates, indicating the independence of the two types of cognitive flexibility. Second, we expect that the two types will influence participants' mouse movements at different points in time: the priming should be influential at an earlier time than the strength of association, and the two should not interact while active in parallel. This difference in timing and the temporal independence would provide evidence for the independence of the underlying processes. Third, on an inter-individual level, we expect no correlation across participants between the two types. Notably, due to the deliberately chosen sample size, this can only be seen as weak and additional, but not crucial evidence.

# Method

### Participants

Twenty students (10 female, mean age = 24.7 years) of the Technische Universität Dresden took part in the experiment. All participants had normal or corrected-to-normal vision. They received class credit or 5 € payment. The study was performed in accordance with the guidelines of the Declaration of Helsinki and of the German Psychological Society. Ethical approval was not required since the study did not involve any risk or discomfort for the participants. All participants were informed about the purpose and the procedure of the study and gave written informed consent prior to the experiment. All data were analyzed anonymously.

#### Materials, Apparatus, and Stimuli

The presented material consisted of 252 related/associated and 252 unrelated word-pairs (some of these pairs existed in several versions for the different conditions, see below).

The trials of interest were so-called homonym trials (84 in number) that consisted of a German homonym (target word) paired with a related word (the associate), which existed in four versions derived from two factors: the associate was either strongly or weakly associated with the respective homonym meaning, and the associate was related either to one meaning of the homonym or to the other. Each of the homonym trials was preceded by a prime trial (84 in number) consisting of two related words, which existed in two versions: one prime trial version primed one meaning of the homonym; the other version primed the other meaning (see **Figure 2B**).

To avoid participants learning and adapting to this prime-homonym structure in the sequence of trials, we added filler trials using further material: associated filler trials (84 in number), with two associated words; homonym catch trials with a homonym and an unrelated word (84 in number); filler catch trials with two unrelated words (168 in number). The order of the different trial types was random with the exception that prime trials and homonym trials always followed each other. The experimental material (prime and homonym trials) was evaluated in earlier studies (compare Dshemuchadse, 2009) and can be found in the Supplementary Material Table 1.

Additionally, we had 10 independent raters evaluate all of the material (including the filler and catch trials) for relatedness. All words presented together as target and associate in the different conditions were rated for relatedness on a scale from 1 ("no relatedness") to 9 ("very strong relatedness"). The order of presentation was randomized across raters. **Table 1** shows the average results across word pairs. The data indicate the intended strong relatedness for priming trials, strong homonym trials and filler trials (6.19–6.92), medium relatedness for weak homonym trials (4.43–4.77) and almost no relatedness for filler catch and homonym catch trials (1.04–1.17). Furthermore, we calculated for each pair of strong associates (trial set A and B) the difference between their homonym relatedness to estimate the degree to which the homonym meanings were balanced. For the average difference, the low absolute value of 0.15 indicates well balanced homonym meanings across the two homonym trial sets A and B. For the average of the absolute difference, the relatively low value of 1.16 indicates a sufficient general balance of the homonym meanings (for the complete specific word pair ratings, see the Supplementary Material Table 1).

TABLE 1 | Relatedness Ratings (1 = unrelated, 9 = strongly related) for the experimental homonym trials and the other four types of trials (priming, homonym catch, filler, and filler catch) for both sets of homonym meanings (A and B).

For homonym trials, the average difference of strong associates indicates balancing of the two homonym meanings across the two sets of meaning A/B; the average absolute difference indicates the degree of general balance between the two homonym meanings.

Stimuli (target word and associate) were presented in white on a black background on a 17-inch screen running at a resolution of 1280×1024 pixels (75 Hz refresh frequency). Words were printed in Arial (font size: 48 pt) in the horizontal center of the screen. The target word was presented 100 px above the vertical center, the associate 35 px below the vertical center (this upwards bias was chosen to suit the upwards direction of mouse movements). Response boxes (200 px in width) were presented at the top left and top right of the screen. For presentation, we used Psychophysics Toolbox 3 (Brainard, 1997; Pelli, 1997), Matlab 2006b (the Mathworks Inc.), and Windows XP. Responses were carried out by moving a computer mouse (Logitech Wheel Mouse USB), sampled with a frequency of 92 Hz.

#### Procedure

Participants' task on each trial was to judge whether the two words presented were related or not. After onscreen instructions and demonstration by the experimenter, participants practiced 20 trials, followed by the main experiment (material of these practice trials did not appear again in the experiment).

Each trial consisted of three stages (see **Figure 2A**), following an established mouse task procedure (Scherbaum et al., 2010; Dshemuchadse et al., 2012). In the first stage, participants had to click into a red box (140 px in width) at the bottom of the screen within a deadline of 1.5 s. This served to produce a comparable starting area for each trial. After clicking within this box, the second stage started and two response boxes at the right and left upper corner of the screen were presented. Participants were required to start the mouse movement upwards within a deadline of 1 s. We chose this procedure, which forces participants to be already moving when entering the decision process, to ensure that they had not already decided before simply executing the final movement. Only after participants had moved the cursor at least 4 pixels in each of two consecutive time steps did the third stage start with the appearance of the target word (e.g., the homonym) and the associate (hence, the time for stages 1 and 2 could be conceptualized as the inter-trial-interval). Participants were instructed to move the cursor into the upper left response box to indicate that both words were related and into the upper right box to indicate that both words were not related (directions were balanced across participants).
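The movement-onset criterion of stage 2 can be sketched as follows. This is only an illustration of the rule stated above (at least 4 pixels in each of two consecutive time steps); the function name is hypothetical, and measuring the step size as Euclidean distance is an assumption, since the text does not specify how displacement was computed.

```python
import math

def movement_onset(xs, ys, min_step_px=4.0, n_consecutive=2):
    """Return the sample index at which the onset criterion is first met.

    The criterion is met once the cursor has moved at least min_step_px
    in each of n_consecutive successive time steps. Returns None if the
    criterion is never met. Euclidean step size is an assumption.
    """
    run = 0  # number of consecutive steps that were large enough
    for i in range(1, len(xs)):
        step = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
        run = run + 1 if step >= min_step_px else 0
        if run >= n_consecutive:
            return i
    return None

# Hypothetical cursor samples: one slow step, then two fast steps upwards.
print(movement_onset([0, 0, 0, 0], [0, 1, 6, 11]))  # index of the second fast step
```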

The trial ended after moving the cursor into one of the response boxes within a deadline of 2.5 s (see **Figure 2**). If participants missed the deadline of one of the three stages, the next trial started with the presentation of the red start box. Response times (RT) were measured as the duration of the third stage, reflecting the interval between the onset of the target stimulus and reaching the response box with the mouse cursor.

#### Design

The experiment consisted of 5 types of trials (prime, homonym, filler, catch homonym, and catch filler—see Materials, Apparatus, and Stimuli). The main experimental manipulation concerned the consecutive prime and homonym trials. Prime trials primed one of the two meanings of the homonym in the following homonym trials. Homonym trials consisted of the homonym and an associate that was either strongly associated or weakly associated with either one or the other of the homonym's meanings. Across two blocks of trials, participants experienced each homonym twice, with each meaning primed once (order of primed meanings was balanced across participants). Priming and strength of association were manipulated orthogonally (randomized), leading to a 2 (primed/unprimed) × 2 (strong/weak) design with 42 trials per condition. As a control factor, we included repetition, the first and second presentation of the homonym in the analysis.

#### Data Preprocessing

For analysis of RT and mouse movements, we excluded erroneous trials (compare **Figure 7**) and trials aborted because of participants clicking too late into the start box (0.14%) or starting their movement too late (0.01%). On average, the inter-trial-interval (stages 1 and 2 of the mouse procedure) lasted 1068 ms (SE = 41 ms).

We aligned all movements to a common starting position (within the range of the start box) and normalized each movement to 100 equal time slices (Spivey et al., 2005; Scherbaum et al., 2010). For a detailed analysis of the time course of influence on mouse movements, we used the mouse movement trajectory angle on the XY plane (compare Scherbaum et al., 2010) as dependent variable. We calculated the angle relative to the y-axis for each difference vector between two time steps.<sup>1</sup> Following this, we prepared the temporal analysis by introducing temporal correlation between the single data points. To this end, we filtered the data with an 8-point Gaussian smoothing window across time steps (the 8-point window was chosen to equal the correction criterion for multiple testing as explained in the Results section).
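The preprocessing steps described above (time normalization, angle computation, Gaussian smoothing) might look roughly like the following NumPy sketch. It is not the authors' Matlab code. The function names are hypothetical, linear interpolation is assumed for the time normalization, the y-axis is assumed to point upwards, and the width of the Gaussian (`sigma`) is an assumption because only the window length of 8 points is reported.

```python
import numpy as np

def time_normalize(t, x, y, n_slices=100):
    """Resample a trajectory to n_slices equally spaced time points
    (linear interpolation is an assumption)."""
    t = np.asarray(t, dtype=float)
    grid = np.linspace(t[0], t[-1], n_slices)
    return np.interp(grid, t, x), np.interp(grid, t, y)

def movement_angles(x, y):
    """Angle (radians) of each difference vector relative to the y-axis.
    0 means straight up; positive values mean a rightward deviation."""
    dx, dy = np.diff(x), np.diff(y)
    return np.arctan2(dx, dy)

def gaussian_smooth(values, n_points=8, sigma=2.0):
    """Smooth with a normalized n_points Gaussian window.
    sigma is a hypothetical choice; only the window length is reported."""
    k = np.arange(n_points) - (n_points - 1) / 2.0
    window = np.exp(-0.5 * (k / sigma) ** 2)
    window /= window.sum()
    # mode="same" zero-pads, so values near both edges are attenuated.
    return np.convolve(values, window, mode="same")

# Hypothetical trajectory moving diagonally up and to the right.
xn, yn = time_normalize([0, 10, 20, 30], [0, 1, 2, 3], [0, 1, 2, 3])
angles = gaussian_smooth(movement_angles(xn, yn))
```

Note that 100 time slices yield 99 difference vectors; how the original analysis handled this off-by-one and the window edges is not stated in the text.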

# Results

#### Discrete Results for Homonym Trials

Our main experimental interest was in the dynamics of the two influences within homonym trials. However, as a first check for successful manipulation and for independence of the two factors priming and association, we performed a repeated measures analysis of variance (ANOVA) on RT and on error rates with the factors priming (primed/unprimed), association (strong/weak), and repetition (first experience/second experience). For RT (see **Figure 3**, left), this revealed significant main effects for priming, F(1, 19) = 129.93, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.87, and association, F(1, 19) = 84.28, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.82, but no effect for repetition, F(1, 19) = 0.02, p = 0.9. There was no significant interaction (all p > 0.1). For error rates (see **Figure 3**, right), the analysis yielded comparable results (priming and association: p < 0.001, all other p > 0.09). Hence, priming and association influenced processing independently of each other. Furthermore, the non-significant control factor indicates that the second experience of the word material did not influence participants' processing substantially.
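As a reading aid, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (F · df<sub>effect</sub>) / (F · df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies this identity to the RT statistics reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values and degrees of freedom as reported for RT.
print(round(partial_eta_squared(129.93, 1, 19), 2))  # priming
print(round(partial_eta_squared(84.28, 1, 19), 2))   # association
```

Rounded to two decimals, both results agree with the effect sizes reported in the text (0.87 and 0.82).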

<sup>1</sup>Mouse movement angle has two advantages over the raw trajectory data as a measure of movement tendency. First, it better reflects the instantaneous tendency of the mouse movement since it is a differential measure compared to the cumulative measure represented by raw trajectory data. Second, it integrates the movement on the XY plane into a single measure.

Since the error rates were at the 50% chance level in the condition unprimed-weak, we aimed to ensure that participants performed the task correctly and did not simply respond randomly. Therefore, we performed a signal detection analysis (Green and Swets, 1966) with these most difficult homonym trials and the homonym catch trials: We coded relatedness as signal and participants' choices as decision. If participants decided randomly in difficult trials, we would expect them to show sensitivity near zero. However, the analysis revealed a mean sensitivity of 1.41 (SE = 0.11, t = 12.95, p < 0.001). We interpret this as evidence that participants performed the task correctly, even when they erred often in the most difficult condition.
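The sensitivity index of signal detection theory is commonly computed as d′ = z(hit rate) − z(false-alarm rate). The sketch below assumes this standard definition; the text does not state which exact formula or rate corrections were used. The example rates are rough group-level values (a hit rate of about 0.5 in the unprimed-weak condition and the homonym catch error rate of 8.72% as false-alarm rate), so the result need not match the reported mean of per-participant sensitivities.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates must lie strictly between 0 and 1; any correction for
    extreme rates would be an additional, unstated choice.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Rough group-level illustration, not the per-participant analysis.
print(d_prime(0.5, 0.0872))
```

A participant responding randomly would have equal hit and false-alarm rates and hence a sensitivity of zero, which is the baseline against which the reported value was tested.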

With this basic pattern as expected, corroborating the independence of the factors priming and association would usually mean looking at inter-individual variability and calculating correlations for these factors across participants. Although we measured only 20 participants, we performed this analysis for RT and error rates (see **Figure 4**). Results for RT, r = −0.08, p = 0.75, and error rates, r = 0.04, p = 0.88, indicated no correlation between the two effects across participants. As noted in the introduction, this correlational approach usually relies on a larger number of participants. The analysis of within-trial dynamics as indicated by mouse movements circumvents this limitation, as described in the following sections.

#### Continuous Results for Homonym Trials

As a first validating analysis, we analyzed mouse movements for the same effects as found in RT and errors (see **Figure 5**). To this end, we calculated the degree of curvature as the area under the curve between a direct straight line movement and the real curved movement in a trial. An ANOVA on the degree of curvature for the factors priming (primed/unprimed), association (strong/weak), and repetition (first experience/second experience) yielded significant main effects for priming, F(1, 19) = 100.96, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.84, and association, F(1, 19) = 60.01, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.76, but no effect for repetition, F(1, 19) = 0.27, p = 0.61. There was no significant interaction (all p > 0.1).
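One way to compute such an area measure is the shoelace formula applied to the polygon formed by the trajectory and the straight line from its end point back to its start point. The sketch below assumes this approach; the text does not specify how the area was computed, and the function name is hypothetical.

```python
def curvature_area(xs, ys):
    """Signed area enclosed by the trajectory and the straight line
    connecting its end point back to its start point (shoelace formula).

    The sign depends on the direction of the deviation. Portions on
    opposite sides of the straight line partly cancel each other.
    """
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # the last point connects back to the first
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return twice_area / 2.0

# Hypothetical trajectory: straight up, then sideways to the response box.
print(abs(curvature_area([0, 0, 2], [0, 2, 2])))
```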

We then performed the main analysis for mouse movements in homonym trials by time continuous multiple regression (Notebaert and Verguts, 2007; Scherbaum et al., 2010) on mouse movement angles (see **Figure 6**, left and middle panel; additionally, see the Figure in the **Supplementary Material Image 1** for the other types of trials) with three regressors: association (strong/weak), priming (primed/unprimed) and the interaction association × priming. The first two regressors were normalized to a range of [−1, 1]. To exclude multicollinearity as a source of artifacts, we verified that variance inflation factors stayed below 1.1, indicating the necessary low level of multicollinearity.
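The variance inflation factor of a regressor is 1 / (1 − R²), where R² comes from regressing that regressor on all others. The NumPy sketch below illustrates the check with a perfectly balanced, hypothetical 2 × 2 design coded as −1/+1; the function name is an assumption, and the actual trial counts per participant differed because error trials were excluded.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2), with R^2 from regressing
    that column on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Hypothetical balanced design: association, priming, and their product.
association = np.tile([-1.0, -1.0, 1.0, 1.0], 10)
priming = np.tile([-1.0, 1.0, -1.0, 1.0], 10)
X = np.column_stack([association, priming, association * priming])
print(variance_inflation_factors(X))
```

In a perfectly balanced design the three regressors are orthogonal and every VIF equals 1; unequal cell sizes after excluding error trials would push the values slightly above 1, which is presumably why the check was needed.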

We calculated 100 multiple regression analyses (100 time slices → 100 multiple regressions) yielding three time-dependent beta weights (three weights across 100 time slices) for each participant. For each of these three beta-weights, we computed grand averages representing the time-varying strength of influence for each predictor (see **Figure 6**, right). To analyze the properties of these three beta-weights, we checked for relevant temporal segments of influence by calculating t-tests against zero for each time step of these beta-weights (Scherbaum et al., 2010; Dshemuchadse et al., 2012). To compensate for multiple comparisons of temporally dependent data, we followed previous studies (Scherbaum et al., 2010; Dshemuchadse et al., 2012) and chose as a criterion of reliability a minimum of eight consecutive significant t-tests (see Dale et al., 2007 for Monte Carlo analyses on this issue).
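The analysis pipeline described above can be outlined in three steps: a regression per time slice within each participant, a t-test against zero per time slice across participants, and a search for runs of at least eight consecutive significant slices. The NumPy sketch below is a hedged illustration rather than the authors' implementation; all function names and array layouts are assumptions.

```python
import numpy as np

def betas_over_time(angles, X):
    """One multiple regression per time slice for a single participant.

    angles: (n_trials, n_slices) movement angles
    X:      (n_trials, n_regressors) design matrix
    Returns (n_regressors, n_slices) beta weights (intercept dropped).
    """
    A = np.column_stack([np.ones(len(X)), X])
    # lstsq solves all time slices at once (one column of angles each).
    coef, *_ = np.linalg.lstsq(A, angles, rcond=None)
    return coef[1:]

def t_against_zero(betas):
    """One-sample t-value per time slice.
    betas: (n_participants, n_slices) weights of one regressor."""
    n = betas.shape[0]
    return betas.mean(axis=0) / (betas.std(axis=0, ddof=1) / np.sqrt(n))

def reliable_segments(significant, min_run=8):
    """Inclusive (start, end) indices of runs of at least min_run
    consecutive significant time slices."""
    segments, start = [], None
    for i, sig in enumerate(list(significant) + [False]):
        if sig and start is None:
            start = i
        elif not sig and start is not None:
            if i - start >= min_run:
                segments.append((start, i - 1))
            start = None
    return segments

# Hypothetical significance pattern: a run of 7 is ignored, a run of 8 counts.
print(reliable_segments([False] * 10 + [True] * 7 + [False] * 3 + [True] * 8))
```

With 20 participants, a two-sided test at α = 0.05 corresponds to |t| > 2.093 (19 degrees of freedom), so a boolean significance vector could be obtained as `np.abs(t_values) > 2.093`.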

The results (see **Figure 6**, right, and **Table 2**) indicate that association and priming followed different time courses. The influence of priming started earlier than association [M(RT) = 552 ms vs. M(RT) = 716 ms], as we expected. The interaction between both factors did not show any significant temporal segments of influence.

#### Results across All Types of Trials

Our main focus was on the experimental investigation of the dynamics within the homonym trials. To check for the validity of the overall experiment, we also analyzed RT and error rates of the different types of trials (see **Figure 7**).

We performed an ANOVA on RT for the factor trial type (priming, homonym, filler, homonym catch, filler catch), revealing significant differences, F(1.62, 30.75)<sup>2</sup> = 14.83, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.44. Post-hoc t-tests revealed this difference to be located between priming, homonym, and filler trials on the one side and homonym catch and filler catch trials on the other side (all p < 0.01, uncorrected). Hence, this effect indicates that finding an association was easier than rejecting any association. Concerning error rates, an ANOVA also revealed significant differences for the factor trial type, F(1.42, 26.94)<sup>2</sup> = 33.57, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.64. Post-hoc t-tests revealed priming (M = 16.64%, SE = 2.03%) and filler (M = 18.51%, SE = 2.28%) trials to be similar, as well as homonym catch (M = 8.72%, SE = 2%) and filler catch (M = 7.72%, SE = 1.68%) trials. Homonym (M = 30.03%, SE = 2.05%) trials differed significantly from all other trials (all p < 0.001). Homonym catch and filler catch trials differed from all non-catch trials (all p < 0.01). Again, finding an association differed from rejecting any association. The lower error rate for catch trials indicates, however, a speed-accuracy trade-off: instead of risking missing an association, participants seemed to double check before responding that no association was present.

<sup>2</sup>Greenhouse-Geisser corrected.

FIGURE 4 | Scatter plots of the effect of association (strong-weak) and the effect of priming (primed-unprimed) for RT (left) and error rates (right) of each participant.

# Discussion

The aim of our study was to decompose cognitive flexibility as a component of executive functions into two distinguishable types, namely shifting flexibility (the readiness to switch between cognitive sets) and spreading flexibility (the ability to identify related cognitive sets). To overcome the limitations of inter-individual correlational approaches, we chose an experimental approach using a homonym relatedness judgment task combined with mouse movements. The former served to manipulate the two types within the same task; the latter allowed us to dissociate the two types by their temporal variance within trials.

TABLE 2 | Timing of the influence of priming and association on mouse movement angles.

Our results indicate that the manipulation of the two types leads to independent effects. This independence is reflected on the one hand in distinct temporal patterns of influence within trials as measured with mouse movements, and on the other hand in independent correlational patterns of RT and error rates across participants. These results match the predictions theoretically based on a neural framework assuming continuous representations of word meanings in a neural field (for a formal implementation and simulation of the predictions, see Supplementary Material Data Sheet 1).

Our use of filler trials, in addition to the central homonym trials, yielded several further findings that validate the results presented here. First, homonym trials showed a higher error rate than all other trials, indicating that working against the priming of the wrong context and identifying weakly related associates was a difficult task for our participants compared to the standard association trials. Second, in catch trials, when no relatedness between the two words was present, participants were slower but made fewer errors. This finding supports the assumption that participants aimed at identifying associations (instead of the opposite strategy of excluding associations) and only responded with judgments of no relatedness after checking twice. This also indicates that, despite the high error rates, participants did not simply guess in the most difficult trials (unprimed, weakly related associates), but still performed the task as instructed, so that mouse movements in correct trials still contain the trace of the processes of interest.

It has to be noted that our participants experienced all homonyms twice to increase the absolute number of trials. In light of previous findings, e.g., of primacy effects in ambiguity resolution (Gorfein et al., 2000), it was important that our analyses indicated no substantial difference between the first and the second experience.

Could our results also be explained by priming effects as they would also be present in a simple word/non-word recognition task? While this would question our interpretation of the priming effect as an indicator of shifting flexibility, results from pretests (Dshemuchadse, 2009) contradict such an interpretation. When we varied the temporal presentation order of the associate and the homonym, priming effects were much larger when the homonym was presented first (as in the study reported here). Hence, processing the homonym built up an expectation that participants had to overcome to identify the unprimed associate, and this exactly matches our definition of shifting flexibility.

Beside the central findings of our study, three further theoretical and methodological implications for psychological research can be discussed.

First, our results and the underlying theoretical framework question the reduction of cognitive flexibility to one single construct, namely shifting flexibility. Representing cognitive flexibility with only one parameter, e.g., the neural gain parameter in both task switching (O'Reilly, 2006; Herd et al., 2014) and ambiguity resolution (Plaut and Booth, 2000), confounds the two subtypes: In neural field models, manipulating the gain parameter leads to changes in both stability and breadth (specificity) of activation and hence to a dependence of shifting and spreading flexibility as defined here. In the light of our results, this dependence can be questioned. In our framework, we assumed shifting flexibility to be related to the stability of neural activation patterns as reflected in the strength of neural self-excitation. In contrast, spreading flexibility was related to the breadth of the spread of neural activation. Concerning the conceptualization of control dilemmas, more complex elaborations (Goschke, 2012) indeed distinguish between a shielding-shifting and a selection-monitoring dilemma, pointing to different underlying parameters. Our results support this distinction and show that although the common effects of positive mood might indicate only one dilemma (Goschke and Bolte, 2014), the distinction should be made. With our paradigm, we show that it is possible to meet both demands: to shift cognitive sets and to spread across cognitive sets. Furthermore, we suggest that the selection-monitoring dilemma can be extended from the breadth of attention to breadth of activation in the semantic space (compare Rowe et al., 2007). Notably, the complementary nature of the dilemmas implies that any benefits of a certain configuration, for both shifting and spreading flexibility, also come with costs: In accordance with this, we assume for shifting flexibility that a smaller effect of priming implies not only easier switching but also less benefit when staying; concerning spreading flexibility, we expect that a broader spreading could lead to difficulties when focusing on one concept or when distinguishing concepts is necessary.

Second, the way we implemented shifting and spreading flexibility suggests that the difference between these two types in our experiment cannot be reduced to other distinctions of cognitive flexibility or cognitive control. Concerning the latter, a distinction between proactive and reactive control has been proposed previously (Brown et al., 2007; Braver, 2012). However, whether subjects had to switch to the unprimed meaning of the homonym (shifting flexibility) or whether they had to search for the weak association (spreading flexibility) was only evident when the word-pairs appeared on the screen. Hence, both demands were unforeseeable and in both cases triggered externally, i.e., under reactive control. In terms of cognitive control processes, the proposed difference between shifting and spreading flexibility could be mapped to a distinction between a switching component and a searching component. This latter distinction is related to two types of cognitive flexibility as proposed by Eslinger and Grattan, namely reactive flexibility and spontaneous flexibility (Eslinger and Grattan, 1993). However, these types are assumed to be more general than the types of flexibility we propose here. In particular, while reactive flexibility refers to the instructed or demanded shifting of cognitive sets as we specify for the shifting flexibility, spontaneous flexibility refers to the free search for knowledge bypassing automatic responses in order to attend to more divergent ideas, thus going beyond our definition of spreading flexibility. This difference could be due to the different methods: while Eslinger and Grattan's types of flexibility are based on the study of brain lesions, we used an experimental approach that was based on neural parameters. It is an open question whether a combination of the different types of cognitive flexibility might lead to a finer differentiation matching both definitions.

Third, we provided a methodological alternative to the correlational approach common in research on executive functions by combining within-task manipulations with mouse movement analyses. Instead of correlations across participants, the analysis of variance and temporal dynamics in trials allowed us to examine the distinction of shifting and spreading flexibility with a small sample of participants.

Overall, we argue that distinguishing two types of cognitive flexibility, namely shifting and spreading flexibility, as components of executive functions could reveal new insight into the process underlying flexible, goal oriented behavior. Furthermore, we present a continuous relatedness judgment task facilitating further research in the intersecting field of executive functions and semantic processing.

# Acknowledgments

This research was partly supported by the German Research Council (DFG grant SCH1827/11 to SS and DFG grant SFB 940/1 2014). We acknowledge support by the German Research Foundation and the Open Access Publication Funds of the TU Dresden. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01244

Image 1 | Mouse movement angle for the interaction of the two experimental conditions association and priming in the homonym trials and the two additional conditions filler trial (not associated) and prime trial (associated). Shaded areas indicate standard-errors.

# References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Dshemuchadse, Grage and Scherbaum. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Change blindness in pigeons (*Columba livia*): the effects of change salience and timing

#### *Walter T. Herbranson\**

*Department of Psychology, Whitman College, Walla Walla, WA, USA*

Change blindness is a well-established phenomenon in humans, in which plainly visible changes in the environment go unnoticed. Recently a parallel change blindness phenomenon has been demonstrated in pigeons. The reported experiment follows up on this finding by investigating whether change salience affects change blindness in pigeons the same way it affects change blindness in humans. Birds viewed alternating displays of randomly generated lines back-projected onto three response keys, with one or more line features on a single key differing between consecutive displays. Change salience was manipulated by varying the number of line features that changed on the critical response key. Results indicated that change blindness is reduced if a change is made more salient, and this matches previous human results. Furthermore, accuracy patterns indicate that pigeons' effective search area expanded over the course of a trial to encompass a larger portion of the stimulus environment. Thus, the data indicate two important aspects of temporal cognition. First, the timing of a change has a profound influence on whether or not that change will be perceived. Second, pigeons appear to engage in a serial search for changes, in which additional time is required to search additional locations.

#### *Edited by:*

*John Magnotti, Baylor College of Medicine, USA*

#### *Reviewed by:*

*Jeffrey Katz, Auburn University, USA; Daniel Ian Brooks, Tufts University, USA*

#### *\*Correspondence:*

*Walter T. Herbranson, Department of Psychology, Whitman College, 345 Boyer Avenue, Walla Walla, WA 99362, USA herbrawt@whitman.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 01 April 2015 Accepted: 20 July 2015 Published: 03 August 2015*

#### *Citation:*

*Herbranson WT (2015) Change blindness in pigeons (Columba livia): the effects of change salience and timing. Front. Psychol. 6:1109. doi: 10.3389/fpsyg.2015.01109*

Keywords: change detection, change blindness, attention, pigeon, timing

# Introduction

One fundamental consequence of the temporal aspects of cognition is the notion of change. Change differs from the related concepts of difference and motion in part because of the central role played by time (see Rensink, 2002). The world may become an importantly different place as time passes, and the ability to detect such changes across time must be fundamental to survival: a changed environment may require different sorts of responses. On a behavioral level, it is well established that the appearance of a discriminative stimulus elicits different behavior from an animal well-trained in the relevant contingencies. The concept is also true on a basic physiological level. Phasic receptors are one concrete indicator of the fundamental importance of change detection. Many sensory receptors respond not to the presence or absence of a stimulus (as tonic receptors do), but to any change in the stimulus environment (Knibestol and Valbo, 1970). By responding specifically to changes, phasic receptors create a neural signal that enables individuals to notice and attend to novel aspects of the world as they appear. Thus, change detection is a fundamental process of temporal cognition that seems to be built into the nervous system at its most basic level. Yet paradoxically, changes (even important ones) are not always detected.

One somewhat surprising illustration is the phenomenon of change blindness. Change blindness occurs when a clearly visible change to a stimulus display goes unnoticed. One particularly striking example of change blindness has been provided by Simons and Levin (1998). An experimenter stopped unsuspecting individuals on a college campus to ask for directions. During the ensuing conversation, two confederates carrying a door walked between the two conversants, and during the brief visual interruption the experimenter was replaced by a different person. About half of their participants did not notice the change in their conversation partner. Surprisingly, change blindness can occur even when an individual is looking directly at the location of the change, and when a participant is expecting and actively searching for changes (see Simons and Ambinder, 2005).

Change blindness, of course, can occur under a variety of circumstances and with a diverse range of stimuli. A convenient way of studying change blindness in the laboratory is the "flicker task," developed by Rensink et al. (1997). They presented participants with two continuously alternating images, with consecutive presentations separated by a brief, blank interstimulus interval (ISI). The images were identical with the exception of a single feature, and participants were instructed to search for the difference as they alternated. Participants had difficulty finding even large changes, and normally required many repetitions before eventual successful identification. In contrast, when the same images were presented without the ISI, the change was immediately apparent. Thus, the timing of a change has a powerful influence over whether or not it will be detected. The difference in change detection between trials with and without an ISI provides a convenient and specific operational definition of change blindness, and underscores the importance of timing in change detection.

One of the appealing features of the flicker task is that it can be implemented in a laboratory setting, and Herbranson et al. (2014) developed a variation of the task to investigate a possible change blindness effect in pigeons. They presented pigeons with stimulus displays consisting of randomly generated lines across three response keys. As in other versions of the flicker task, alternating displays were identical except for one feature (a single line that was present in one display but absent in the other), and pecks to the location of the change were reinforced at the end of a trial. Pigeons displayed the expected change blindness effect, in that accuracy was better on trials with no ISI than on trials with an ISI between consecutive displays. Their results also showed some other complex patterns reflecting the importance of time. In particular, the duration of the ISI had a powerful influence over the magnitude of the change blindness effect. As the ISI was shortened, accuracy on ISI trials rose toward the higher accuracy of no-ISI trials. In addition, pigeons showed evidence of using a serial search strategy over time. As the number of repetitions of the change increased, accuracy also increased, as did the effective search area. With few repetitions, pigeons produced overall low accuracy, and could reliably detect changes appearing on only two of the three response keys. With more repetitions, accuracy was higher overall, and better than chance on each of the three response keys.

Pigeons from Herbranson et al. (2014) showed accuracy that was above chance, but not always particularly high (especially on trials that featured an ISI). Nevertheless, some aspects of the procedure increased overall accuracy by systematically increasing accuracy on the more difficult ISI trials: number of repetitions, and ISI duration. It is likely that there are numerous other factors that would similarly influence change detection accuracy. Another plausible way to improve performance is to manipulate the salience of the displayed change. Smilek et al. (2000) used a flicker paradigm with alternating displays consisting of arrays of block characters. As is usually the case, one character differed between displays, and the change was characterized as either large or small, depending on the number of line features that differed. For example, a change of a character from F to L (three features) was considered a large change, whereas a change from F to E (one feature) was a small change. Their human participants were faster to detect changes involving more features than they were to detect changes involving fewer features.

In the modified pigeon version of the flicker task developed by Herbranson et al. (2014), the possible change locations are limited and fixed, corresponding to the three keys in an operant chamber. A change in any spatial location (i.e., on any particular key) is therefore likely to be roughly as salient as any other: they are the same size, brightness, color, and pecking on each has been reinforced with approximately equal frequency. However, the discrete stimulus features (lines) do permit one to make a change more prominent using the same logic as Smilek et al. (2000): by increasing the number of line features that constitute a change. Whereas Herbranson et al. (2014) presented two successive displays that differed by a single line feature on one key, the procedure is not limited to changes involving a single feature; up to eight changes (all of the possible line features) can be made to change on a single key. A difference of a single feature on a key would presumably be a smaller or more subtle change than a difference involving multiple features. As the number of changes increases, one would expect change detection to become proportionally easier, producing better accuracy, and requiring fewer repetitions.

# Materials and Methods

#### Animals

Four White Carneaux pigeons (*Columba livia*) were purchased from Double-T Farm (Glenwood, IA, USA). Each bird was fed mixed grain and maintained at 80–85% of free-feeding weight to approximate the condition of healthy wild birds (Poling et al., 1990). Birds were housed in individual cages in a colony room with a 14:10-h light:dark cycle and had free access to water and grit. All four had previous experience with a serial response time task (Herbranson and Stanton, 2011) and a change detection task (Herbranson et al., 2014). Animal care and all procedures described below were approved by Whitman College's Institutional Animal Care and Use Committee.

#### Apparatus

Four identical BRS/LVE operant chambers were used. Each had three circular response keys (2.5 cm in diameter) located in a horizontal row on the center of the front wall and a food hopper located directly below the middle key. A houselight located on the front wall, directly above the middle key, was illuminated for the duration of each experimental session.

#### Stimuli

Stimuli consisted of straight white lines back-projected onto each response key using stimulus projectors (Industrial Electronic Engineers, Van Nuys, CA, USA) that had been retrofitted with LED light sources (Martek Industries, Cherry Hill, NJ, USA). The LED light modifications were necessary because their onset and offset times (∼30 μs) are much faster than incandescent bulbs, allowing for precise control of even very fast stimulus presentations and ISIs. The three keys each could display up to eight radial lines, with each line spanning the full diameter of the key. The lines appeared at evenly spaced orientations corresponding to 0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, and 157.5° from vertical. On each trial, a base stimulus was generated according to the following parameters: each of the eight lines on each of the three keys independently had a 0.5 chance of being present and a 0.5 chance of being absent. Consequently, each stimulus could consist of anywhere from 0 to 24 lines across the keys (0–8 per key). A modification of that base stimulus was then generated by reversing the display status of 1, 2, 4, or 8 of the lines on a single key, depending on the experimental condition (see below). If the line to be reversed was present in the base display, then it was not present in the modified display. Conversely, if it was not present in the base display, then it was present in the modified display. The number of changes in the stimulus presentation was generated randomly, and each change was equally likely to occur in any of the eight orientations on the key.
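The stimulus-generation rules above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the program used in the experiment: the function names are hypothetical, the critical key is assumed to be drawn uniformly from the three keys, and the reversed lines are assumed to be distinct so that exactly the intended number of features differs.

```python
import random

ORIENTATIONS_DEG = [0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5]
N_KEYS = 3

def make_base_display(rng):
    """Each of the 8 lines on each of the 3 keys is present with probability 0.5."""
    return [[rng.random() < 0.5 for _ in ORIENTATIONS_DEG] for _ in range(N_KEYS)]

def make_modified_display(base, n_changes, rng):
    """Reverse the display status of n_changes lines on one key.

    Assumptions (not stated in this exact form in the text): the critical key
    is chosen uniformly at random, and the reversed lines are distinct.
    """
    critical_key = rng.randrange(N_KEYS)
    flipped = rng.sample(range(len(ORIENTATIONS_DEG)), n_changes)
    modified = [row[:] for row in base]  # copy so the base display is untouched
    for line_index in flipped:
        modified[critical_key][line_index] = not modified[critical_key][line_index]
    return modified, critical_key

rng = random.Random(0)  # seeded only to make the illustration repeatable
base = make_base_display(rng)
modified, critical_key = make_modified_display(base, n_changes=4, rng=rng)
```

Representing each key as a list of eight booleans makes a "change" a simple negation, which mirrors the present/absent reversal described in the text.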

Each trial consisted of alternating 250-ms presentations of the original and modified displays. The alternating displays were presented 1, 2, 4, 8, or 16 times (randomly determined on each trial with *p* = 0.2 for each). Each presentation of the original display was followed by the modified display, and each presentation of the modified display was followed either by a repetition of the original display or by a trial-terminating display consisting of three white key lights (if and only if it was the final repetition of the trial).

Half of the trials presented the two alternating displays with no time delay in between. The modified display was presented immediately after the base display, so that there was no time when one of the two displays was not present on the response keys until the end of the trial. The other half of the trials contained a 30 ms ISI between the two displays, during which the keys were completely dark, and no lines were visible. The ISI was then followed immediately by the modified display. Thus, on trials with an ISI, the same number of repetitions took longer because each 250-ms stimulus presentation was followed by an ISI delay. The 30 ms ISI duration was selected based on the results from Herbranson et al. (2014) in order to produce an intermediate level of accuracy on ISI trials so that changes in accuracy could not be masked by floor or ceiling effects. **Figure 1** depicts two sample trials (both featuring two changes on the critical key), one with an ISI and one without.
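To make the timing difference between the two trial types concrete, the following sketch builds an event list for a single trial. It assumes that one repetition means one original display followed by one modified display, and that on ISI trials a 30-ms blank follows every 250-ms presentation, including the last one; the text does not spell out the final blank, so treat that detail as an assumption. The function names are hypothetical.

```python
DISPLAY_MS = 250  # duration of each stimulus presentation
ISI_MS = 30       # blank interval used on ISI trials

def trial_timeline(repetitions, with_isi):
    """Return (event, duration_ms) pairs for one trial.

    Assumes each repetition is original + modified, and that on ISI trials
    every presentation is followed by a blank interval.
    """
    events = []
    for _ in range(repetitions):
        for display in ("original", "modified"):
            events.append((display, DISPLAY_MS))
            if with_isi:
                events.append(("blank", ISI_MS))
    return events

def trial_duration_ms(repetitions, with_isi):
    """Total stimulus time before the white trial-terminating display."""
    return sum(duration for _, duration in trial_timeline(repetitions, with_isi))

for reps in (1, 2, 4, 8, 16):
    print(reps, trial_duration_ms(reps, False), trial_duration_ms(reps, True))
```

Under these assumptions an ISI trial is 12% longer than a no-ISI trial with the same number of repetitions, which is why longer viewing time alone cannot explain any accuracy advantage on no-ISI trials.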

#### Procedure

The experiment was conducted daily over four blocks of 10 days each (40 days total). Each session consisted of 120 trials, each separated by a 5-s intertrial interval (ITI). During this ITI, the computer program generated the original and modified displays and determined the number of repetitions and whether to include an ISI. Pecks during stimulus presentation were not recorded and had no programmed consequences. Following completion of the entire stimulus display, all three keys were uniformly illuminated with white light, and the first peck on any key was automatically recorded. If the peck corresponded to the location of the stimulus change, then the bird was presented with approximately 3-s access to mixed grain (this varied between birds in order to maintain individual running weights). If the peck corresponded to either of the other two, unchanging locations, a 10-s error signal was presented, during which the houselight flashed on and off every 0.5 s. After either the reinforcement or the error signal, the session continued with a normal ITI, followed by the next trial.

#### Conditions

The procedure during each block was identical with the exception of the number of changes displayed on the critical key during each trial. During the first block, a single line feature was reversed on each trial, regardless of any other stimulus characteristics (ISI, number of repetitions, etc.). This was the baseline condition, and paralleled the procedure from Herbranson et al. (2014), with the exception of the ISI duration. The subsequent three blocks displayed changes consisting of 2, 4, and finally 8 reversed line features on the critical key on each trial.

Because all four birds had previous experience on a slightly different version of the flicker task, no pretraining was necessary and data collection could begin immediately.

# Results

A 4 (changes: 1, 2, 4, 8) × 5 (repetitions: 1, 2, 4, 8, 16) × 2 (ISI: present, absent) × 10 (session: 1–10) repeated measures ANOVA was run on average change detection accuracy. The main effect of session was not significant, nor were any of the interactions involving session, *F* < 1.526, *p* > 0.061. These results indicate that performance was relatively stable across the 10 days that constituted each condition. The remaining analyses therefore consider the other factors collapsed across days.

All three experimental factors (changes, repetitions, and ISI) yielded significant main effects, and the influence of each variable can be seen in **Figure 2**. The main effect of changes indicates that accuracy was better when there were more features that changed on a trial, *F*(3,15) = 41.953, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.894. Mean accuracy increased as number of changes increased from 1 (*M* = 51.80, SE = 3.45) to 2 (*M* = 58.19, SE = 5.33) to 4 (*M* = 64.08, SE = 5.53) to 8 (*M* = 70.54, SE = 4.89). The main effect of repetition indicated that accuracy was better when more repetitions of the stimulus displays were presented, *F*(4,20) = 51.329, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.911. Mean accuracy increased as number of repetitions increased from 1 (*M* = 42.68, SE = 4.45) to 2 (*M* = 53.87, SE = 5.83) to 4 (*M* = 63.90, SE = 5.40) to 8 (*M* = 70.44, SE = 4.78) to 16 (*M* = 74.87, SE = 4.48). Finally, the main effect of ISI indicated that accuracy was better when an ISI was absent (*M* = 65.23, SE = 4.27) than when it was present (*M* = 57.07, SE = 5.31), *F*(1,5) = 25.852, *p* = 0.004, η<sub>p</sub><sup>2</sup> = 0.838. This final main effect constitutes a basic replication of the change blindness effect seen in previous experiments using the flicker task.
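As a consistency check on the reported effect sizes, partial eta squared can be recovered from an *F* ratio and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies that identity to the three main effects; the function name is hypothetical and the numbers are copied from the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (F, df_effect, df_error, reported partial eta squared), taken from the text
reported_main_effects = {
    "changes": (41.953, 3, 15, 0.894),
    "repetition": (51.329, 4, 20, 0.911),
    "ISI": (25.852, 1, 5, 0.838),
}

for name, (f_value, df1, df2, reported) in reported_main_effects.items():
    recomputed = partial_eta_squared(f_value, df1, df2)
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

The recomputed values agree with the reported ones to three decimal places, so the *F* ratios, degrees of freedom, and effect sizes for the main effects are mutually consistent.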

In addition to the main effects reported above, all three 2-way interactions were significant: changes × repetition, *F*(12,60) = 2.287, *p* = 0.018, η<sub>p</sub><sup>2</sup> = 0.314; changes × ISI, *F*(3,15) = 5.677, *p* = 0.008, η<sub>p</sub><sup>2</sup> = 0.532; and repetition × ISI, *F*(4,20) = 7.806, *p* = 0.001, η<sub>p</sub><sup>2</sup> = 0.610. Note that the second of these 2-way interactions was particularly important for our purposes, as it indicates that the additional changes increased accuracy on ISI trials more than they did on no-ISI trials, thus decreasing the magnitude of the change blindness effect. Finally, the 3-way interaction was not significant, *F*(12,60) = 1.498, *p* = 0.150, η<sub>p</sub><sup>2</sup> = 0.230.

In order to assess the incremental accuracy associated with presentation of additional changing line features, the increase in accuracy was computed by subtracting accuracy on the baseline (one-feature change) condition from each of the subsequent conditions (2-, 4-, and 8-feature changes). Then a 3 (additional changes: 1, 3, 7) × 5 (repetitions: 1, 2, 4, 8, 16) × 2 (ISI: present, absent) repeated measures ANOVA was run on the calculated increases. All three main effects were significant, and the influence of each variable can be seen in **Figure 3**. The main effect of additional changes indicated that additional changes on each trial produced progressively greater increases in accuracy, *F*(2,10) = 49.568, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.908. The main effect of repetition indicated that there was greater improvement on longer trials, *F*(4,20) = 3.506, *p* = 0.025, η<sub>p</sub><sup>2</sup> = 0.412. Finally, the main effect of ISI indicated that trials with an ISI showed more improvement than trials with no ISI, *F*(1,5) = 10.957, *p* = 0.021, η<sub>p</sub><sup>2</sup> = 0.687.
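The baseline subtraction can be illustrated with the condition means reported earlier in this section. The analysis itself was run on per-bird, per-cell increases, which are not available here, so this sketch uses only the grand means and is meant to show the arithmetic rather than reproduce the ANOVA input. The variable names are hypothetical.

```python
# Mean accuracy (%) by number of changing features, as reported above
mean_accuracy = {1: 51.80, 2: 58.19, 4: 64.08, 8: 70.54}

baseline = mean_accuracy[1]  # one-feature (baseline) condition

# Key each increase by the number of *additional* changes (1, 3, 7)
increase_over_baseline = {
    n_changes - 1: round(accuracy - baseline, 2)
    for n_changes, accuracy in mean_accuracy.items()
    if n_changes != 1
}

for additional, gain in increase_over_baseline.items():
    print(f"{additional} additional change(s): +{gain:.2f} percentage points")
```

At the level of grand means, the gains over baseline are about 6.4, 12.3, and 18.7 percentage points for 1, 3, and 7 additional changes, respectively, in line with the progressively greater increases described by the main effect of additional changes.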

The 2-way interaction between repetitions and ISI was significant, indicating that additional repetitions benefitted trials with an ISI more than trials without one, *F*(4,20) = 2.936, *p* = 0.046, η<sub>p</sub><sup>2</sup> = 0.370. The other 2-way interactions and the 3-way interaction were not significant: additional changes × repetition, *F*(8,40) = 1.601, *p* = 0.155, η<sub>p</sub><sup>2</sup> = 0.243; additional changes × ISI, *F*(2,10) = 1.748, *p* = 0.223, η<sub>p</sub><sup>2</sup> = 0.259; additional changes × repetition × ISI, *F*(8,40) = 0.904, *p* = 0.522, η<sub>p</sub><sup>2</sup> = 0.153.

A secondary pattern not reflected in the above analyses is that each of the birds developed a position bias, distributing their responses unevenly across the three response keys. Vertical bars in **Figure 4** show this position bias as key preferences on trials of different lengths (numbers of repetitions) and during each of the four conditions. Key preferences were determined separately for each bird based on the overall proportions of responses during the entire experiment. As can be seen in the figure, the position bias was quite strong on shorter trials, and gradually weakened as trials became longer. This is confirmed by a 3 (key: first, second, and third preferred) × 5 (repetitions: 1, 2, 4, 8, 16) × 4 (changes: 1, 2, 4, 8) × 10 (session: 1–10) ANOVA. The main effect of key preference was significant, *F*(2,10) = 14.299, *p* = 0.001, η<sub>p</sub><sup>2</sup> = 0.741. The interaction between key preference and repetition was also significant, *F*(8,40) = 13.345, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.727. The main effect of changes and all interactions involving changes were not significant, *F* < 1.073, *p* > 0.411. The main effect of session and all interactions involving session were not significant, *F* < 1.102, *p* > 0.304. Thus, it appears that pigeons adjusted their distributions of responses as time elapsed during a trial, but those key preferences were not influenced by number of changes, and the pattern of key preferences did not change significantly across days within a condition.

In order to assess the influence of this dissipating position bias on accuracy, accuracy was computed separately for each response key based on the same individually identified key preferences (shown as lines in **Figure 4**). Note that while chance accuracy is 33% overall, the bars depicted in the figure make for a more sophisticated indicator of chance that takes into account the uneven distribution of responses across keys. A bird responding randomly within the constraints of its position bias should produce correct responses on a key (points on lines) that approximate its overall allocation of responses to that key (bars). For example, in the 1-change condition (top left panel) birds were correct on approximately 60% of single-repetition trials that presented changes on their preferred key, a figure that is reliably better than chance performance of 33% (see the depicted 95% confidence interval relative to the 33% reference line). However, they were able to do so only because they allocated a similarly high percentage of responses on that preferred key. This is in contrast to trials with more repetitions, where birds maintained comparable levels of accuracy on the preferred key, but did so while allocating fewer pecks to that key. Note that because the 95% confidence intervals (error bars) for two or more repetitions do not overlap with the bars depicting response bias, accuracy level is unlikely to be solely based on that response bias.
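The logic of the bias-adjusted chance level can be sketched as follows. If a bird ignores the stimuli and simply pecks each key in proportion to its position bias, its expected accuracy on trials where the change appears on a given key equals the proportion of pecks it allocates to that key, while overall accuracy still averages to one third when changes are equally likely on each key. The allocation below is hypothetical (only the roughly 60% figure for the preferred key on single-repetition trials comes from the text), and the function names are illustrative.

```python
import random

def overall_chance(response_allocation):
    """Expected overall accuracy of a stimulus-blind bird when the change is
    equally likely on every key: each key's peck share weighted by 1/n_keys."""
    n_keys = len(response_allocation)
    return sum(share / n_keys for share in response_allocation.values())

def simulate_blind_bird(response_allocation, trials_per_key, rng):
    """Simulate per-key accuracy for a bird that pecks according to its
    position bias alone, ignoring where the change actually is."""
    keys = list(response_allocation)
    weights = [response_allocation[key] for key in keys]
    accuracy = {}
    for change_key in keys:
        pecks = rng.choices(keys, weights=weights, k=trials_per_key)
        accuracy[change_key] = sum(peck == change_key for peck in pecks) / trials_per_key
    return accuracy

# Hypothetical position bias; only the ~60% preferred-key share echoes the text
allocation = {"first": 0.60, "second": 0.30, "third": 0.10}

rng = random.Random(0)  # seeded only so the illustration is repeatable
print(simulate_blind_bird(allocation, trials_per_key=20000, rng=rng))
print(overall_chance(allocation))
```

This is why accuracy near 60% on the preferred key is not, by itself, evidence of change detection when about 60% of pecks go to that key: the informative comparison is between each key's accuracy and that key's share of responses.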

The pattern of data depicted in **Figure 4** shows how key preference, change salience, and repetition all contribute to change detection. Pertaining to key preference, note that accuracy in detecting changes on the first- and second-preferred keys is relatively close throughout. Change detection accuracy on the least-preferred key lags behind the first two, but catches up by the sixteenth repetition. Simultaneously, change salience has a powerful effect: as the number of changing features (change salience) increases, the ceiling for each key (accuracy achieved at the maximum number of repetitions) rises. Finally, accuracy generally rises along with the number of repetitions, but the increase is modulated by the other factors. When four or more changes are presented, accuracy on the first- and second-preferred keys is better than chance by the second repetition, whereas the least-preferred key requires at least eight repetitions. When two or fewer changes are presented, additional repetitions (four or more) are required for the first- and second-preferred keys to exceed chance. Thus, it appears that change detection is a process that occurs over time. As time elapses over the presentation of additional repetitions, pigeons are able to more reliably identify changes (accuracy increases), and can do so across a wider range of locations (three keys rather than two).

# Discussion

This experiment investigated the importance of change salience and timing on pigeons' ability to detect changes. Results indicated that larger changes were easier to detect, and this basic pattern is the same one that has been previously demonstrated in human participants: if the change in a flicker task involves a greater number of features, it is more likely to be noticed by human participants (Smilek et al., 2000). Our results lead us to the same conclusion: pigeons were better able to identify changes involving additional features. Furthermore, change detection became more effective over time. As additional repetitions were presented, overall accuracy increased, and pigeons' ability to detect changes expanded to encompass more locations.

The experiment reported here is an extension of the first demonstration of change blindness in pigeons (Herbranson et al., 2014). Though these are the same birds as in that initial research, several important variables were manipulated without changing the primary finding (better accuracy without an ISI), supporting the idea that the phenomenon of change blindness is robust, and not dependent on a precise set of conditions. Furthermore, the present experiment refines those methods, and provides a factor (change salience) that can be used to increase low levels of accuracy on what is a fairly complex task.

Change salience in this experiment was manipulated as the number of changing lines on a particular trial. Note, however, that the number of changing lines might covary with other stimulus characteristics. Consider that on some trials, birds might have been able to detect change based on overall differences in brightness between the original and modified displays if they consisted of different numbers of lines (see bottom panel of **Figure 1**). While this is indeed a possibility, it is tenuous as an overall explanation of pigeons' performance for several reasons. First, the lines were quite thin, and contributed little to the overall brightness of the operant chamber, especially in the context of a comparably bright houselight. Second, such a strategy would be completely useless on a large percentage of trials – specifically those on which the original and modified displays consisted of the same numbers of lines at different orientations (i.e., when the generation of the alternate display involved adding some line features to the original display while subtracting others; see top panel of **Figure 1**). Third, performance on ISI trials was greater than chance, even though the two different stimulus displays were separated by a blank ISI. The ISI is critical since its brightness would by definition differ from each display by more than the displays could possibly differ from each other. Thus, large changes in brightness would be present and detectable on literally every key during every ISI trial. Finally and most importantly, even if pigeons used a strategy that was based either partly or entirely on stimulus brightness, the major conclusions pertaining to change detection still hold. That is, this remains a *change detection* task, whether birds are detecting changes in the presence of line features or changes in brightness. Additional changing line features make the difference between the two displays more salient, whether that salience is due to additional visible features or due to a greater difference in illumination (or some other factor).

The previous point underscores some aspects of this experiment that remain uncertain. First, our results do not reveal exactly what aspect of the stimulus pigeons were using to identify changes. Each line feature was different from others in both orientation (the angle of the line) and location (the space occupied by the line), and birds could have used either (or both) to identify changes. It is also possible that some stimuli featuring multiple changes could have produced apparent motion (consider, for example, a modified stimulus created by adding one line feature to the original and deleting an adjacent one). Future research using different kinds of stimuli may help to disentangle these various possibilities. Second, it is not certain what cognitive processes were utilized by pigeons. Pigeons could, for example, be using visual short-term memory (or visual working memory) to compare stimuli across the ISI. Similar kinds of change detection tasks have been used quite effectively to study visual short-term memory in pigeons and monkeys (Cook et al., 2003; Elmore et al., 2012; Leising et al., 2013), though with different stimuli and time intervals. However, given the short ISI duration in the present experiment, it is possible that pigeons instead used sensory memory to compare successive images. The difference might have some important implications pertaining to the role of attention in change detection and the nature of representation of objects and scenes (see Rensink, 2002).

Despite those uncertainties, these data might provide some initial insight into the cognitive processes that are at work as pigeons detect changes to their visual environment. Tovey and Herdman (2014) proposed a 3-stage model for human change detection (with pre-processing, feature-extraction, and identification stages operating in sequence). They concluded that large changes could be identified at the second, feature-extraction stage, whereas small changes were identified later, in the identification stage. Indeed, our data are also consistent with such a model. Small changes (those consisting of one feature) produced a gradual and consistent increment in accuracy with repetition, extending to even the longest (16 repetition) trials (see **Figure 2**). This is what one would expect from the identification stage, which requires focused attention, and would presumably operate until either a specific changing feature is identified, or a trial ends. Repetitions of larger changes, on the other hand (four or eight features) produce little improvement in accuracy beyond four repetitions (see **Figures 2** and **3**). Note that a value of four repetitions coincides with the minimum number tested here that would allow for even brief individual consideration of each of the three display keys. The likelihood that the feature extraction stage is sufficient to identify the correct key would be naturally dependent on repetitions and the number of changes (supported by the data), but should not be assisted by additional repetitions once each location has been processed. Thus, it would seem possible that a similar stage-based model might apply to pigeon change detection. Further research on the interactions between easily manipulable factors that presumably operate at early stages (such as stimulus quality or discriminability) and at later stages (such as familiarity or learned associations) would be an ideal test of the model's applicability to pigeons.

Since change by definition occurs over time, this change detection task tells us some important things about temporal cognition in pigeons. First, the timing of a change has a powerful influence over detection. Instantaneous changes are easier to detect than those obscured by a temporally coordinated ISI. Furthermore, pigeons (like humans) appear to engage in a serial search process, allowing them to detect changes in a larger region of space as time passes. Note however, that additional time does not constitute an absolute advantage. ISI trials are longer in duration than no-ISI trials involving the same number of repetitions, yet they produce lower levels of change detection accuracy. Thus, there appear to be two aspects of time that contribute to successful change detection. Additional repetitions (occurring over time) enhance change detection, whereas interruption of continuity by an ISI (also occurring over time) inhibits it. It is not yet entirely clear how these two factors interact. Herbranson et al. (2014) showed that shorter ISI durations had a progressively weaker effect on accuracy. If pigeons do engage in a serial search process, then increasing the rate of presentation might weaken the repetition effect by limiting the number of locations that can be considered per repetition. Hagmann and Cook (2013) studied a related temporal aspect of change detection, using a more precise manipulation of change. Unlike the discrete changes in the present experiment, they presented pigeons with stimuli that changed continuously in brightness, at various rates. Birds were able to discriminate changing stimuli from constant ones, but accuracy was significantly influenced by rate of change. Future research should delve further into the temporal aspects of change detection to create a more complete understanding of the cognitive process at work.

Given the results indicating that effective search area expanded with additional repetitions, this procedure might also be an effective one for investigating sequential search strategies. While there was no indication that pigeons' strategies changed across days, individual birds did have different stable key preferences, indicating that they went about the search process in different ways (i.e., beginning on different keys and then progressing to others in different orders). Presumably those individual key preferences arose during pretraining and persisted through the conditions and sessions reported here. Note that because changes were equally likely to appear on any of the three keys, such variations in search strategies should have no effect on accuracy over the long run. However, if birds are indeed performing a systematic search, then probabilistically cueing upcoming change locations (either through base-rate manipulations or trial-by-trial priming) might bias birds toward a specific strategy that would increase accuracy by allowing them to begin their search at the most likely change location. This could be another possible means of increasing accuracy on the task, and perhaps expanding it to study other cognitive processes.

# Conclusion

Change detection is a fundamental aspect of temporal cognition that can be, and has been, investigated in both humans and pigeons. So far, the factors that influence change detection and failures of change detection (i.e., change blindness) appear to be similar in both species. These factors include the timing and salience of a change. Furthermore, the influence of these factors changes over the course of a trial, indicating that change detection may provide some important insight into the temporal aspects of cognition.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Herbranson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Temporal Dynamics of the Integration of Intention and Outcome in Harmful and Helpful Moral Judgment

*Tian Gan1,2, Xiaping Lu1,3, Wanqing Li1,3, Danyang Gui1,3, Honghong Tang1,3, Xiaoqin Mai4, Chao Liu1,3\* and Yue-Jia Luo5,6\**

*<sup>1</sup> State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China, <sup>2</sup> Department of Psychology, Zhejiang Sci-Tech University, Hangzhou, China, <sup>3</sup> Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing, China, <sup>4</sup> Department of Psychology, Renmin University of China, Beijing, China, <sup>5</sup> Institute of Affective and Social Neuroscience, Shenzhen University, Shenzhen, China, <sup>6</sup> Collaborative Innovation Center of Sichuan for Elder Care and Health, Chengdu Medical College, Chengdu, China*

#### *Edited by:*

*Timothy Michael Ellmore, The City College of New York, USA*

#### *Reviewed by:*

*Gethin Hughes, University of Essex, UK Jonathan Scott Phillips, Harvard University, USA Rita Anne McNamara, University of British Columbia, Canada*

*\*Correspondence:*

*Chao Liu liuchao@bnu.edu.cn; Yue-Jia Luo luoyj@szu.edu.cn*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 06 August 2015 Accepted: 17 December 2015 Published: 11 January 2016*

#### *Citation:*

*Gan T, Lu X, Li W, Gui D, Tang H, Mai X, Liu C and Luo Y-J (2016) Temporal Dynamics of the Integration of Intention and Outcome in Harmful and Helpful Moral Judgment. Front. Psychol. 6:2022. doi: 10.3389/fpsyg.2015.02022*

The ability to integrate moral intention information with the outcome of an action plays a crucial role in mature moral judgment. Functional magnetic resonance imaging (fMRI) studies have indicated that both prefrontal and temporo-parietal cortices are involved in moral intention and outcome processing. Here, we used the event-related potential (ERP) technique to investigate the temporal dynamics of the integration of intention and outcome information in harmful and helpful moral judgment. In two experiments, participants were asked to make moral judgments about agents who produced either negative/neutral outcomes with harmful/neutral intentions (harmful judgment) or positive/neutral outcomes with helpful/neutral intentions (helpful judgment). Significant ERP differences between attempted and successful actions over prefrontal and bilateral temporo-parietal regions were found in both harmful and helpful moral judgment, which suggests a possible time course of integration processing in the brain, starting from the right temporo-parietal area (N180), moving to the left temporo-parietal area (N250), then to the prefrontal area (FSW), and returning to the right temporo-parietal area (TP450 and TPSW). These results highlight the fast moral intuition reaction and the late integration processing over the right temporo-parietal area.

Keywords: morality, event-related potential (ERP), integration of intention and outcome, harmful moral judgment, helpful moral judgment, temporo-parietal junction

# INTRODUCTION

The ability to integrate mental states such as intentions with outcome information plays an important role in moral judgment. People judge attempted harms (e.g., a person who intends to kill someone but fails) as less permissible and more morally blameworthy than accidental harms (e.g., a person who accidentally kills someone) (Young et al., 2007; Young and Saxe, 2008). Recently, the integration of intention and outcome in moral judgment has been systematically investigated (Young et al., 2007, 2011; Young and Saxe, 2008, 2009). In these experiments, participants read scenarios in a 2 × 2 design: agents produced either a negative or neutral outcome while intending either the negative outcome ("negative" intention) or the neutral outcome ("neutral" intention). Thus, the combination of intention and outcome yielded four conditions: successful harm, attempted harm, accidental harm and no harm. Behavioral results from healthy adults demonstrated a significant interaction between intention and outcome (Young et al., 2007, 2011, 2012). These findings reveal that mature moral judgments depend crucially on integrating an agent's intention with the actual results of the action.

Functional magnetic resonance imaging (fMRI) studies investigating the processing of moral intention and outcome have revealed a network of brain regions, including the medial prefrontal cortex, precuneus and temporo-parietal junction (TPJ) (Young et al., 2010; Young and Dungan, 2012). However, the temporal dynamics of this network are still unknown. In particular, some studies found right TPJ activation during theory of mind (TOM), mentalizing and integration processing (Saxe and Kanwisher, 2003; Young et al., 2007, 2010), whereas others reported right TPJ activation associated with moral intuition (Harenski et al., 2010). These fMRI findings imply that the right TPJ may be involved in both early moral intuition and late moral reasoning during the integration of intention and outcome. However, the time course of this integration processing at the right TPJ is poorly understood.

Against this background, the first aim of the present study was to investigate the electrophysiological mechanisms of the integration between moral intention and outcome using event-related potentials (ERPs). The measurement of ERPs provides an excellent method to investigate the temporal features of information processing in moral cognition because of the high temporal resolution of the ERP signal. Previous ERP studies of moral judgment focused on the interaction between cognition and emotion in moral processing (Chen et al., 2009; Sarlo et al., 2012, 2014; Wang et al., 2014; Gui et al., 2015). For instance, in our previous study, we found a larger N1 for moral pictures than for non-moral pictures, which may reflect early moral intuition processing without emotional impact. For mental state attribution, Proverbio and Riva (2009) and Proverbio et al. (2010) reported an enhanced posterior negative component that peaked around 250 ms (N250), which reflected the early recognition of comprehensible behaviors and the early processing of an action's purpose. Another study further suggested that the N250 may reflect the early cognitive processing needed to understand private intention (Wang et al., 2012). During the late time windows (from 300 to 800 ms), slow wave effects over frontal and parietal areas were reported to reflect the specific cognitive process of monitoring others' beliefs (Liu et al., 2009; McCleery et al., 2011; Geangu et al., 2013). Specifically, McCleery et al. (2011) found a positive component over bilateral temporo-parietal areas peaking between 400 and 500 ms (TP450) and suggested that this component could reflect the calculating and representing processes in visual perspective taking. However, most of these studies focused only on mental state reasoning during classical TOM tasks; few have investigated intention information processing in moral judgment.
So far, only three ERP studies have investigated the neural correlates of moral intention and valence processing. They reported significant ERP effects indicating early automatic and late controlled processes (Van Berkum et al., 2009; Decety and Cacioppo, 2012; Yoder and Decety, 2014). Decety and Cacioppo (2012) found that people could distinguish between intentional and accidental harm as early as 62 ms post-stimulus. Van Berkum et al. (2009) reported a rapid ERP response, within 200 ms, to the first word that indicated a clash with the reader's value system. Significant differences in N1 amplitude between morally good and bad actions were reported by Yoder and Decety (2014). However, the tasks used in these studies were related only to intention decoding or moral valence, and the process of integrating intention with outcome information has not yet been considered. Exploring the time course of integration between intention and outcome may reveal different integration processing stages in moral judgment.

The second aim of the present study was to explore the cognitive and neural mechanisms of helpful moral judgment. Helpful behaviors are critically important for human social development (Graham, 2014; Hofmann et al., 2014). However, most moral neuroscience studies have concentrated on immoral and negative behaviors such as killing, murder and harm (Greene and Haidt, 2002; Young et al., 2007; Greene, 2009; Graham, 2014). Only a few studies have explored the neural processing during moral judgment of helpful behaviors (Loke et al., 2011; Young et al., 2011; Paulus et al., 2012; Yoder and Decety, 2014). The neural correlates of helpful intention processing, especially the integration of helpful intention and outcome in moral judgment, thus remain to be explored. Previous studies have found that humans respond differently to negative and positive information (Baumeister et al., 2001). For instance, when people make decisions, they typically exhibit greater sensitivity to losses than to equivalent gains, which is called loss aversion (Tom et al., 2007). Is there a similar asymmetry between the moral judgment of good and bad actions? Will judgments of helpful behaviors differ from judgments of harmful behaviors? To explore these questions, positive intention and outcome processing in helpful moral judgment were considered in the present study in addition to harmful moral processing.

Based on previous work, we used ERPs to investigate the temporal dynamics of the integration processing of intention and outcome information of harmful and helpful moral judgment in two experiments. We predicted that: (a) In accord with previous findings (Van Berkum et al., 2009; Decety and Cacioppo, 2012; Yoder and Decety, 2014; Gui et al., 2015), the fast moral intuition might be revealed by the early ERP effects which are sensitive to the successful harm and help. Besides, late ERP effects in accidental and attempted conditions might reflect the integration in late processing stage; (b) According to the fMRI studies which reported the significant activation of right TPJ, left TPJ and mPFC during the integration of intention and outcome (Koenigs et al., 2007; Young et al., 2007; Funk and Gazzaniga, 2009; Young and Dungan, 2012; FeldmanHall et al., 2014), we predicted that ERP effects would be elicited over frontal and temporo-parietal electrodes, especially the right TPJ area; (c) The previous fMRI studies have found similar brain activation in judging harmful vs. helpful actions (Young et al., 2011). Based on these findings, we predicted that the temporal processing stages of the integration of intention and outcome in harmful and helpful moral judgment would be similar, which would be shown by similar ERP patterns in the current two experiments. However, the interaction effects between intention and outcome of ERPs would be different in harmful and helpful moral judgment because of the possible asymmetry between positive and negative processing (Baumeister et al., 2001).

# MATERIALS AND METHODS

#### Participants

Fifty-five undergraduate students from Beijing Normal University participated in the study. Twenty-seven students (12 males, *M*age = 22.8 years, *SD* = 2.1) participated in Experiment 1 and a different group of 28 students (13 males, *M*age = 22.4 years, *SD* = 1.6) participated in Experiment 2. All participants were right-handed, with no history of neurological/psychiatric illness. All the experimental procedures used in Experiments 1 and 2 were approved by the institutional review board (IRB) of Beijing Normal University (School of Brain and Cognitive Sciences) and informed written consent was obtained from each participant in accordance with the Declaration of Helsinki.

#### Stimuli and Experimental Procedures

Stimuli consisted of four variations of 40 harm scenarios selected from Young's studies (Young et al., 2010) for Experiment 1 and 40 help scenarios selected from a pilot study for Experiment 2 (**Figure 1A**), for a total of 160 stories for each experiment. In the pilot study, we wrote 51 help scenarios and asked 25 students (13 males, *M*age = 23, *SD* = 2.95) to evaluate how much moral praise the agent deserves for his or her action from 1 (*None at all*) to 7 (*Very much*). Forty scenarios with significant rating differences (no help, 2.61 ± 0.83; accidental help, 3.77 ± 1.16; attempted help, 5.23 ± 0.74; successful help, 6.35 ± 0.47) and high internal consistency (Cronbach's alpha: no help, 0.905; accidental help, 0.960; attempted help, 0.921; successful help, 0.891) were chosen as the materials used in Experiment 2. For harm scenarios, agents produced either a negative outcome or a neutral outcome with the intention that they were causing a negative outcome ("negative" intent) or a neutral outcome ("neutral" intent). For help scenarios, agents produced either a positive outcome or a neutral outcome with the intention that they were causing a positive outcome ("positive" intent) or a neutral outcome ("neutral" intent). Harmful outcomes referred to injury to others and helpful outcomes referred to saving others' lives. Consistent with fMRI studies (Young et al., 2007), each story consisted of background, foreshadow, intention, action and outcome segments.


The experiments were conducted in a dimly lit, sound-proof room. Participants were seated on a comfortable chair with their eyes approximately 90 cm away from a 17-in computer screen. The timeline of a trial was adapted from fMRI studies (Young et al., 2007, 2011). Intention, action and outcome segments were separated for the analysis of ERPs time-locked to the key information (**Figure 1B**). In each trial, background and foreshadow information were presented cumulatively until participants pressed the space key or for a maximum of 10 s. Then, the intention segment was shown without the keywords indicating the valence of the agent's intention. After participants pressed the space key, the keywords of the intention segment (e.g., "safe") were presented for 2 s after a 300 ms delay. The action and outcome segments were then presented cumulatively, and the keywords of the action and outcome segments were presented in a manner similar to the intention segment keywords. At the end of the trial, participants were required to judge the moral permissibility of the agent's action from 1 (*Forbidden*) to 7 (*Permissible*) in Experiment 1, and to judge how much moral praise the agent deserved for the action from 1 (*None at all*) to 7 (*Very much*) in Experiment 2. In both experiments, following eight practice trials, participants completed 160 test trials and could take a short break after every 20 trials. The sequence of stories was pseudo-randomly ordered and no scenario was repeated in five consecutive stories.
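
The ordering constraint described above can be illustrated with a short sketch. It assumes that "no scenario was repeated in five consecutive stories" means that any window of five consecutive trials contains five different scenarios. The function name, the seed handling and the restart strategy are illustrative assumptions, not the randomization procedure actually used in the experiments.

```python
import random

def pseudo_random_order(n_scenarios=40, n_versions=4, window=5, seed=None):
    """Return (scenario, version) trials such that no scenario occurs
    twice within any `window` consecutive trials (hypothetical helper)."""
    rng = random.Random(seed)
    while True:  # restart whenever the greedy construction hits a dead end
        remaining = [(s, v) for s in range(n_scenarios) for v in range(n_versions)]
        order = []
        while remaining:
            # Scenarios used in the last (window - 1) trials are not allowed next.
            start = max(0, len(order) - (window - 1))
            recent = {s for s, _ in order[start:]}
            candidates = [t for t in remaining if t[0] not in recent]
            if not candidates:
                break  # dead end: abandon this attempt and start over
            choice = rng.choice(candidates)
            order.append(choice)
            remaining.remove(choice)
        if not remaining:
            return order

trials = pseudo_random_order(seed=1)  # 160 trials: 40 scenarios x 4 versions
```

A greedy draw with restarts is only one possible way to satisfy such a constraint; it was chosen here because it is short and easy to verify.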

#### ERP Recordings

The electroencephalogram (EEG) was recorded from 64 scalp sites using tin electrodes mounted in an elastic cap (NeuroScan Inc.). The reference was placed at the vertex by default. The horizontal electrooculogram (EOG) was recorded from electrodes placed at the outer canthi of both eyes. The vertical EOG was recorded from electrodes placed above and below the left eye. All inter-electrode impedances were maintained below 5 kΩ. EEG and EOG signals were amplified with a 0.05–100-Hz bandpass filter and continuously sampled at 500 Hz/channel.

During off-line analysis, EEG was re-referenced to the average reference. Ocular artifacts were removed from the EEG signal using a regression procedure implemented in the Neuroscan software (Semlitsch et al., 1986). The EEG was averaged in 1400-ms epochs (200-ms baseline) time-locked to the presentation of the keywords in the intention segment. These averages were digitally filtered with a 30-Hz low-pass filter and were baseline corrected by subtracting from each sample the average activity of that channel during the baseline period. Any trials in which EEG voltages exceeded a threshold of ±80 μV during the recording epoch were excluded from the analysis.
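
The baseline correction and amplitude-based rejection described above can be sketched as follows. This is a minimal NumPy illustration, not the Neuroscan pipeline used in the study: the array layout (trials × channels × samples), the function name and the synthetic data are assumptions, the ±80 μV criterion is applied here to baseline-corrected single trials (the text does not specify this), and ocular correction, re-referencing and low-pass filtering are omitted.

```python
import numpy as np

def average_clean_epochs(epochs_uv, sfreq=500.0, baseline_ms=200.0, reject_uv=80.0):
    """epochs_uv: array (n_trials, n_channels, n_samples) in microvolts,
    each epoch starting `baseline_ms` before keyword onset (assumed layout)."""
    n_base = int(round(baseline_ms / 1000.0 * sfreq))
    # Subtract each channel's mean pre-stimulus activity from every sample.
    baseline = epochs_uv[:, :, :n_base].mean(axis=2, keepdims=True)
    corrected = epochs_uv - baseline
    # Keep only trials in which no channel exceeds +/- reject_uv at any sample.
    keep = np.abs(corrected).max(axis=(1, 2)) <= reject_uv
    return corrected[keep].mean(axis=0), keep

# Synthetic example: 20 trials, 3 channels, 700 samples (1400 ms at 500 Hz).
rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 5.0, size=(20, 3, 700))
epochs[0, 1, 300:320] += 200.0  # simulated artifact in the first trial
erp, keep = average_clean_epochs(epochs)
```

Because baseline subtraction is linear, correcting single trials before averaging gives the same waveform as correcting the average of the retained trials.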

# RESULTS

# Behavioral Results

Permissibility and praise judgments of harm and help scenarios, respectively, as well as reaction times, were analyzed using separate 2 (intention: valenced, neutral) × 2 (outcome: valenced, neutral) repeated-measures ANOVAs.

#### Moral Judgments of Harm Scenarios

For the values of permissibility, the main effects of intention and outcome were both significant. Agents with negative intentions were judged less permissible than those with neutral intentions [negative, 1.81 ± 0.09; neutral, 4.67 ± 0.17; *F*(1,26) = 216.12, *p* < 0.001, partial η<sup>2</sup> = 0.89]. Agents producing negative outcomes were judged less permissible than those causing neutral outcomes [negative, 2.38 ± 0.11; neutral, 4.09 ± 0.12; *F*(1,26) = 78.54, *p* < 0.001, partial η<sup>2</sup> = 0.89]. The interaction between intention and outcome was significant [*F*(1,26) = 129.56, *p* < 0.001, partial η<sup>2</sup> = 0.83]. *Post hoc* tests showed that the difference between no harm and accidental harm was larger than that between attempted harm and successful harm (no harm, 5.86 ± 0.16; accidental, 3.48 ± 0.20; attempted, 2.32 ± 0.15; successful, 1.29 ± 0.06) (**Figure 2**).

#### Moral Judgments of Help Scenarios

Predicted main effects of intention and outcome were observed. Agents with positive intentions were judged more praiseworthy than agents with neutral intentions [positive, 5.27 ± 0.18; neutral, 3.31 ± 0.16; *F*(1,27) = 149.15, *p* < 0.001, partial η<sup>2</sup> = 0.85]. Agents producing positive outcomes were judged more praiseworthy than those causing neutral outcomes [positive, 5.01 ± 0.17; neutral, 3.57 ± 0.15; *F*(1,27) = 163.37, *p* < 0.001, partial η<sup>2</sup> = 0.86]. However, there was no significant interaction between intention and outcome [*F*(1,27) = 0.44, *p* = 0.514, partial η<sup>2</sup> = 0.02] (**Figure 2**).
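
The effect sizes reported alongside each *F* test can be related to the test statistic through a standard identity, partial η<sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies it to the three tests reported for the help scenarios. The function name is illustrative, and the identity is a general property of ANOVA rather than something specific to this study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values for the praise judgments, each with 1 and 27 degrees of freedom.
reported = {"intention": 149.15, "outcome": 163.37, "interaction": 0.44}
effect_sizes = {
    name: round(partial_eta_squared(f_value, 1, 27), 2)
    for name, f_value in reported.items()
}
print(effect_sizes)
```

Rounded to two decimals, the results are 0.85, 0.86 and 0.02, matching the values given in the text.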

#### Reaction Time of Harm Scenarios

The main effect of intention was significant [*F*(1,26) = 23.83, *p* < 0.001, partial η<sup>2</sup> = 0.48]. Judgments of negative intentions were faster than those of neutral intentions (negative, 1347 ± 89 ms; neutral, 1628 ± 117 ms). The interaction between intention and outcome was also significant [*F*(1,26) = 41.58, *p* < 0.001, partial η<sup>2</sup> = 0.62]. *Post hoc* tests showed that, when the intention of the agent was negative, judgments of neutral outcomes took longer than those of negative outcomes (neutral, 1516 ± 110 ms; negative, 1177 ± 74 ms). However, when the agent had a neutral intention, judgments of neutral outcomes were significantly faster than those of negative outcomes (neutral, 1431 ± 104 ms; negative, 1824 ± 136 ms).

#### Reaction Time of Help Scenarios

There was a main effect of outcome [*F*(1,27) = 29.49, *p* < 0.001, partial η<sup>2</sup> = 0.52] and a significant interaction between intention and outcome [*F*(1,27) = 25.98, *p* < 0.001, partial η<sup>2</sup> = 0.49]. *Post hoc* tests showed that only when the agents had positive intention, the judgments were faster for positive outcomes (1281 ± 100 ms) than neutral outcomes (1799 ± 135 ms). When the agents had neutral intention, no difference was found in reaction times between positive outcomes and neutral outcomes.

# ERP Results

#### ERP Components and Analysis

The grand averaged ERP patterns of the harmful and helpful experiments were similar (see Supplementary Materials). Following the methodology of previous studies and after examination of the grand average ERPs, the ERP components at three areas were selected: frontal area (average amplitudes of electrodes FPZ, FP1, FP2, AF3, and AF4), left temporo-parietal area (average amplitudes of electrodes CP5, P3, P5, P7, and TP7) and right temporo-parietal area (average amplitudes of electrodes CP6, P4, P6, P8, and TP8). Based on the hypothesis and visual inspection, two early components and three late waves were analyzed: two early negative components peaking from 160 to 210 ms (N180) and from 230 to 270 ms (N250) over bilateral temporo-parietal areas; a positive component peaking between 400 and 500 ms (TP450) and a late slow-wave from 580 to 780 ms (TPSW) over bilateral temporo-parietal areas; and a frontal slow-wave from 380 to 780 ms (FSW) recorded over the frontal area. **Figure 3** presents the mean ERPs over frontal and bilateral temporo-parietal areas during the four experimental conditions.

For the statistical analysis, we analyzed the differences in ERP waveforms recorded during the four experimental conditions in both experiments. At left and right temporo-parietal areas, the peak amplitudes of N180 and N250 and the mean amplitudes of TP450 and TPSW were computed. At the frontal area, the mean amplitude of FSW was computed. For the temporo-parietal ERP components, the amplitudes of each component were analyzed with a three-way repeated-measures analysis of variance (ANOVA) of 2 (intention: valenced, neutral) × 2 (outcome: valenced, neutral) × 2 (hemisphere: left, right). For the FSW, a two-way repeated-measures ANOVA of 2 (intention: valenced, neutral) × 2 (outcome: valenced, neutral) was conducted. The Greenhouse–Geisser correction was applied to adjust the degrees of freedom of the *F* ratios. The statistical results and effects are listed in **Table 1** and **Figure 4**.

#### ERP Component Effects

#### *Early ERP effects over TPJ area for harm scenarios*

For the N180, the repeated-measures ANOVA only revealed a three-way interaction [*F*(1,26) = 4.42, *p* = 0.045, partial η<sup>2</sup> = 0.15]. *Post hoc* tests showed that only in the right hemisphere, the N180 was more negative for successful harm than attempted harm.

For the N250, the main effect of hemisphere was significant [*F*(1,26) = 15.25, *p* = 0.001, partial η<sup>2</sup> = 0.37], with the N250 amplitude more negative in the left hemisphere than the right hemisphere. The three-way interaction was also significant [*F*(1,26) = 9.65, *p* = 0.005, partial η<sup>2</sup> = 0.27]. *Post hoc* tests showed that only in the left hemisphere, the N250 was more negative for attempted harm than successful harm.

#### *Early ERP effects over TPJ area for help scenarios*

For the N180, the interaction between outcome and hemisphere was significant [*F*(1,27) = 4.67, *p* = 0.04, partial η<sup>2</sup> = 0.15]. Most importantly, there was a significant three-way interaction [*F*(1,27) = 7.95, *p* = 0.009, partial η<sup>2</sup> = 0.23]. *Post hoc* tests showed that only in the right hemisphere, the N180 was more negative for successful help than attempted help.

For the N250, the main effects of outcome and hemisphere were both significant. The N250 was more negative for neutral outcome than positive outcome [*F*(1,27) = 5.27, *p* = 0.03, partial η<sup>2</sup> = 0.16] and was more negative in the left hemisphere than the right hemisphere [*F*(1,27) = 9.075, *p* = 0.006, partial η<sup>2</sup> = 0.25]. The interaction between outcome and hemisphere was significant [*F*(1,27) = 5.142, *p* = 0.032, partial η<sup>2</sup> = 0.16], such that in the left hemisphere, the N250 was more negative for neutral outcome than positive outcome. The three-way interaction was also significant [*F*(1,27) = 5.83, *p* = 0.023, partial η<sup>2</sup> = 0.18]. Again, only in the left hemisphere, the N250 was more negative for attempted help than successful help.

#### *Late ERP effects over TPJ area for harm scenarios*

For the TP450, there was a significant interaction between intention and hemisphere [*F*(1,26) = 5.04, *p* = 0.033, partial η<sup>2</sup> = 0.16], whereby amplitudes were significantly larger for neutral compared with negative trials only in the left hemisphere. Most importantly, TP450 amplitudes exhibited a three-way interaction [*F*(1,26) = 5.47, *p* = 0.027, partial η<sup>2</sup> = 0.17]. *Post hoc* tests showed that only in the right hemisphere, the TP450 was larger for attempted harm than successful harm.

For the TPSW, the main effect of intention was significant [*F*(1,26) = 5.34, *p* = 0.029, partial η<sup>2</sup> = 0.17], indicating that the TPSW amplitude was more positive for the neutral intention than the negative intention. The interaction between intention and outcome was significant [*F*(1,26) = 8.79, *p* = 0.006, partial η<sup>2</sup> = 0.25], indicating that the TPSW amplitude was more positive for attempted harm than successful harm. Most importantly, the three-way interaction was significant [*F*(1,26) = 15.97, *p* < 0.001, partial η<sup>2</sup> = 0.38]. *Post hoc* tests showed that only in the right hemisphere, the TPSW amplitude was more positive for attempted harm than successful harm and was more positive for accidental harm than no harm (accidental, 1.12 ± 0.29 μV; no harm, 0.39 ± 0.38 μV).

*L, left; R, right; M, mean amplitudes;* ∗*p* < *0.01;* ∗∗*p* < *0.05;* ∗∗∗*p* < *0.001; CI, confidence interval.*

#### *Late ERP effects over TPJ area for help scenarios*

For the TP450, there was a significant main effect of hemisphere, with the TP450 more positive in the right hemisphere than the left hemisphere [*F*(1,27) = 5.88, *p* = 0.022, partial η<sup>2</sup> = 0.18]. The interactions of outcome × hemisphere [*F*(1,27) = 4.43, *p* = 0.045, partial η<sup>2</sup> = 0.14] and intention × outcome [*F*(1,27) = 5.75, *p* = 0.024, partial η<sup>2</sup> = 0.18] were both significant. Most importantly, the TP450 amplitudes exhibited a three-way interaction [*F*(1,27) = 5.81, *p* = 0.023, partial η<sup>2</sup> = 0.18]. *Post hoc* tests showed that only in the right hemisphere, the TP450 was larger for attempted help than successful help.

For the TPSW, the interaction between outcome and hemisphere was significant [*F*(1,27) = 6.47, *p* = 0.017, partial η<sup>2</sup> = 0.19]. Most importantly, the three-way interaction was also significant [*F*(1,27) = 6.98, *p* = 0.014, partial η<sup>2</sup> = 0.21]. *Post hoc* tests showed that in the left hemisphere, the TPSW was more positive for successful help than attempted help. In contrast, in the right hemisphere, the TPSW was more positive for attempted help than successful help.

#### *Late ERP effects over frontal area for harm scenarios*

For the FSW, there was a significant interaction of intention × outcome [*F*(1,26) = 4.98, *p* = 0.035, partial η<sup>2</sup> = 0.16]. *Post hoc* tests showed that the FSW amplitude was more negative for attempted harm than successful harm.

FIGURE 4 | Mean amplitudes of the N180, N250, TP450 and TPSW recorded from selected left and right TPJ electrodes, and mean amplitudes of the FSW recorded from selected prefrontal electrodes for the four experimental conditions in harmful (left) and helpful (right) moral judgments. Error bars represent standard error. ∗*p* < 0.05; ∗∗*p* < 0.01; ∗∗∗*p* < 0.001.

#### *Late ERP effects over frontal area for help scenarios*

Similar to the harmful experiment, there was a significant interaction of intention × outcome for the FSW [*F*(1,27) = 6.86, *p* = 0.014, partial η<sup>2</sup> = 0.20]. *Post hoc* tests showed that the FSW was more negative for attempted help than successful help.

# DISCUSSION

One aim of the present study was to examine ERP responses to keywords integrating the information of harmful/helpful intention and outcome in moral judgment. The ERPs were time-locked to the keywords of intention. In each trial, the foreshadow information had already implied the possible outcome of the agent's action. Thus, when the keyword of intention was presented, participants not only decoded the intention but also integrated it with outcome information. This integration processing has led to the significant interaction effect between intention and outcome in previous studies (Young et al., 2007, 2011; Young and Saxe, 2008) and in the present study. Consistent with our hypothesis, the ERPs showed differences between conditions in early and late time windows over prefrontal and temporo-parietal electrodes. We presume that these frontal and bilateral temporo-parietal effects reflect, at least in part, the participants' online processing of the integration of intention and outcome information to make moral judgments.

During early time windows, two negative ERP components, N180 and N250, were found over bilateral temporo-parietal areas. These components differentiated between successful and attempted conditions in both experiments. Because the intention keywords in these two conditions were the same, the differences in ERP activity cannot be attributed to physical differences between stimuli but rather to mental processes. During the time window from 160 to 210 ms, the successful harm/help conditions induced a more negative N180 than the attempted conditions. This early effect is consistent with the fast automatic reaction to moral valence information found in previous studies (Van Berkum et al., 2009; Decety and Cacioppo, 2012; Yoder and Decety, 2014; Gui et al., 2015). Yoder and Decety (2014) reported amplitude differences in the N1 component between morally good and bad actions. In our previous study, we found that the N1 amplitudes elicited by moral pictures were significantly more negative than those elicited by non-moral pictures and suggested that this early ERP effect reflected moral intuition processing without emotional impact (Gui et al., 2015). Using high-density ERP techniques, Decety and Cacioppo (2012) found that people could distinguish between intentional and accidental harm as early as 62 ms post-stimulus. Van Berkum et al. (2009) reported a rapid ERP response, within 200 ms, to the first word that indicates a clash with the reader's value system. According to the social intuitionist model (Haidt, 2001), when dealing with moral information, people automatically give harsher evaluations to moral transgressors and more favorable evaluations to those who uphold moral standards. In the conditions of successful harm and help, the agent kills or saves a human life intentionally and successfully. Subjects may make a quick, automatic evaluation that assigns harsher punishment to violators and greater reward to people who behave morally.
This kind of rapid, automatic reaction may be reflected by the more negative N180 in the successful conditions of both experiments. In other words, the condition of successful harm was more morally "bad" than the condition of attempted but failed harm, and the actions with higher moral valence induced a more negative N180 amplitude. In the same way, the successful help condition induced a larger N180 because this condition was more morally "good" than the attempted help condition. These effects only reached significance over right temporo-parietal areas, which is consistent with many previous findings: ERP studies have found significant activity in right TPJ during spontaneous trait inference and spontaneous goal inference processing (Van Duynslaeger et al., 2007; Van der Cruyssen et al., 2009). fMRI studies have also reported that right TPJ contributes more to moral intuition than to moral deliberation (Young and Saxe, 2009; Harenski et al., 2010). In short, the N180 effect demonstrated a rapid response over right temporo-parietal electrodes to morally extreme information.

Over the left hemisphere, there was an early negative component peaking at 250 ms, which we named the N250. In contrast to the N180 effects, the N250 in the attempted conditions was more negative than that in the successful conditions. This effect is consistent with previous ERP findings. One ERP study showed that decoding mental states from pictures of eyes induced a negative component that started 270 ms post-stimulus (Sabbagh et al., 2004). Another study reported a larger N250 amplitude over parietal electrodes for private intention compared to communicative and physical intentions and suggested that the N250 was related to the early cognitive processing of intention information (Wang et al., 2012). Proverbio and Riva (2009) and Proverbio et al. (2010) found that pictures of comprehensible behaviors induced a larger posterior N250 than incomprehensible behaviors and suggested that this component reflected the recognition and comprehension of action intention. Moreover, the N250 effect only reached significance over left temporo-parietal electrodes. Previous studies showed that left TPJ may index differences in generic perspective processing during mentalizing (Perner et al., 2006; Young et al., 2011). Based on these results, we suggest that the N250 effect might reflect the early representation and integration of moral intention and outcome information.

During late time windows, two late waves, TP450 and TPSW, over bilateral temporo-parietal electrodes and one slow wave, FSW, over frontal electrodes differentiated between conditions, especially between attempted and successful harm/help conditions. Over the right temporo-parietal area, a positive component appeared during the 400–500 ms time window (TP450) and was exceptionally large for the attempted harm/help conditions. This component has been described as an index of complex mental state processing in a previous theory-of-mind (ToM) study, in which the activity of right TPJ could account for most of its variance (McCleery et al., 2011). In the present study, the significant intention × outcome interaction on the TP450 over the right hemisphere might reflect the complex integration of intention and outcome information.

The TPSW component, in the later time window from 580 to 780 ms, distinguished not only between the attempted and successful harm/help conditions over the right hemisphere, but also between no harm and accidental harm in Experiment 1 and between attempted help and successful help over the left hemisphere in Experiment 2. Similar late positive components, specifically associated with belief processing, have been reported in many previous ERP studies (Liu et al., 2009; Geangu et al., 2013). Over the prefrontal area, consistent with previous studies, a slow wave from 380 to 780 ms (FSW) was found, which was larger in the attempted harm/help conditions than in the successful harm/help conditions. Frontal ERP effects have been related to mental state processing and inhibitory control in many previous studies (Chen et al., 2009; Liu et al., 2009; McCleery et al., 2011; Geangu et al., 2013). Here, our results suggest that the prefrontal activity started 380 ms after stimulus presentation and that the prefrontal effects were paralleled by similar findings over the temporo-parietal area, which have been specifically associated with differentiating and reasoning between conditions.
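The component effects discussed above are defined by mean amplitudes within fixed time windows. A minimal sketch of that measurement is given below, assuming a single averaged waveform sampled at 1 ms. The window boundaries are those stated in this Discussion (the N250 window is not given here and is therefore omitted); the waveform is synthetic and the function and variable names are hypothetical, not the authors' code.

```python
import numpy as np

# Time windows in ms as stated in the Discussion; the N250 window is not
# specified in this section, so it is left out rather than guessed.
WINDOWS_MS = {
    "N180": (160, 210),
    "TP450": (400, 500),
    "TPSW": (580, 780),
    "FSW": (380, 780),
}


def mean_amplitude(erp, times_ms, window):
    """Average an ERP waveform over an inclusive time window (in ms)."""
    start, stop = window
    mask = (times_ms >= start) & (times_ms <= stop)
    return float(erp[mask].mean())


# Synthetic waveform (assumption): 1 ms sampling from -200 to 799 ms with a
# single negative deflection centred on 180 ms.
times_ms = np.arange(-200, 800)
erp = -2.0 * np.exp(-0.5 * ((times_ms - 180) / 20.0) ** 2)

amplitudes = {name: mean_amplitude(erp, times_ms, win) for name, win in WINDOWS_MS.items()}
```

In practice the same function would be applied per condition, electrode cluster, and participant before the values enter the intention × outcome analysis.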

Another aim of this study was to explore the cognitive mechanism and neural temporal dynamics of the integration of helpful intention and outcome. For the behavioral results, unlike in the harm context, we did not find a significant interaction between intention and outcome but only the main effects of these two factors. For the ERP results, in the help experiment, significant outcome × hemisphere interactions were found for the N180, N250, TP450 and TPSW components, while these did not reach significance in the harm experiment. We suggest two possible reasons for these differences between helpful and harmful judgments. One reason is that in help scenarios, the importance of the outcome factor was greater, which neutralized the interaction between intention and outcome. Error management theory suggests that when people make a judgment (for instance, judging whether sticks are snakes or not), they can make either a false-positive error (inferring that it is a snake when it is actually not) or a false-negative error (inferring that it is not a snake when it actually is). People more often make the false-positive error because the cost of this kind of error has been lower over evolutionary time (Haselton and Buss, 2000; Haselton and Nettle, 2006). In the attempted and accidental help conditions of the present study, agents in the described stories either had positive intentions to help others, or their actions led to positive consequences. According to this theory, although agents in these two conditions made false-positive errors, subjects still gave them more favorable evaluations, which may neutralize the interaction of intention and outcome in the helpful condition. Another reason is the difference between the tasks. In the help contexts, we asked participants to judge how much moral praise the agent deserves, but in the harm condition, the task was to judge the moral permissibility of the agent's action.
Different question types could induce different psychological processing for moral judgments (Christensen and Gomila, 2012). When participants were asked about the permissibility of a moral transgression, they appeared to rely primarily on intention information. However, for the assignment of praise and blame, both mental states and the causal link between the agent's actions and the harmful consequences are important (Cushman, 2008). A similar asymmetry between moral judgments of good and bad has been reported in a previous study: negative impulsive actions elicited a discounting of moral blame, but positive impulsive actions did not elicit a discounting of moral praise (Pizarro et al., 2003).

To the best of our knowledge, this is the first study using ERPs to assess the temporal dynamics of integration processing of intention and outcome in harmful and helpful moral judgment contexts. The findings suggest that the neural mechanism of the integration processing of moral intention and outcome may involve three stages. The first stage involves a fast response over right temporo-parietal area, reflected in our finding that successful harm/help conditions induced more negative N180 than attempted harm/help conditions. The second stage involves representation of intention information and early integration processing over left temporo-parietal area, indicated by the larger N250 for attempted harm/help conditions. The third stage involves late integration and reasoning processing over prefrontal and temporo-parietal areas, reflected by the TP450, TPSW and FSW effects.

There are some limitations of the present study. First, when the intention keyword was presented, participants might have encoded the intention information before completing the integration process. Future studies should manipulate the sequence of intention and outcome presentation to clearly separate these two processing phases. Second, as the stories in harmful moral scenarios were adopted from previous fMRI studies (Young et al., 2007, 2010, 2011), the background information in some stories might give readers a clue about the tendency of the agent's intention before the intention keyword was presented to them (e.g., Because the white powder is labeled "sugar," Grace believes that it is "safe"). However, we found similar patterns of ERP components in judgments about helpful moral scenarios when we kept the information constant across different conditions of the same scenarios, so the significant differences in the early N180 component between conditions with the same moral keywords might not be attributable to differences in those intention clues in the background information. Nevertheless, further studies are still required to obtain a purer intention component when investigating related questions. Finally, the present study demonstrated that prefrontal and temporo-parietal activities could reflect the integration of intention and outcome. However, the spatial resolution of the ERP technique is limited. Future studies should use high-density electrical techniques combined with fMRI to identify the sources of these ERP components more accurately, which would be a meaningful contribution to the understanding of the neural mechanisms of moral judgment.

# CONCLUSION

Our findings suggest a possible time course of neural activation during the integration of moral intention and outcome. Starting from the right temporo-parietal area, the more negative N180 in the successful harm/help conditions reflected a rapid intuitionist response in the earliest time window. Then the larger N250 for attempted harm/help over the left temporo-parietal area implied representation and early integration processing. The late ERP effects (FSW, TP450 and TPSW) implied integration and reasoning processing over frontal and bilateral temporo-parietal areas in the late time window. These results highlight the critical role of neural activation over right temporo-parietal areas in both early automatic responses to moral actions and late moral integration processing.

# FUNDING

This work was supported by 973 Program (2014CB744600, 2013CB837300), the National Natural Science Foundation of China (NSFC) (31400878, 31530031, 81471376, 31170971, 61210010, 31400888), the Major Project of National Social Science Foundation (12&ZD228), MOE (Ministry of Education in China) Project of Humanities and Social Sciences (14YJC190005), Beijing Municipal Science & Technology Commission No. Z151100003915122, the Project of Zhejiang Federation of Humanities and Social Sciences Circles (2014N145), the Project Grants 521 Talents Cultivation of Zhejiang Sci-Tech University (521 talent project of ZSTU) and Science Foundation of Zhejiang Sci-Tech University (ZSTU) under Grant No.13062172-Y.

# ACKNOWLEDGMENTS

We thank Liane Young, Ruolei Gu, Tingting Wu for their constructive and helpful comments. We also thank Boqi Du and Haiyan Wu for their assistance with EEG recording and analysis.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.02022


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Gan, Lu, Li, Gui, Tang, Mai, Liu and Luo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Development of grouped icEEG for the study of cognitive processing

Cihan M. Kadipasaoglu<sup>1</sup>, Kiefer Forseth<sup>1</sup>, Meagan Whaley<sup>1,2</sup>, Christopher R. Conner<sup>1</sup>, Matthew J. Rollo<sup>1</sup>, Vatche G. Baboyan<sup>1</sup> and Nitin Tandon<sup>1,3</sup>\*

*<sup>1</sup> Vivian Smith Department of Neurosurgery, University of Texas Health Science Center at Houston, Houston, TX, USA, <sup>2</sup> Department of Computational and Applied Mathematics, Rice University, Houston, TX, USA, <sup>3</sup> Texas Medical Center, Mischer Neuroscience Institute, Memorial Hermann Hospital, Houston, TX, USA*

Invasive intracranial EEG (icEEG) offers a unique opportunity to study human cognitive networks at an unmatched spatiotemporal resolution. To date, the contributions of icEEG have been limited to individual-level analyses or to cohorts whose data are not integrated in any way. Here we discuss how grouped approaches to icEEG overcome challenges related to sparse sampling, correct for individual variations in response, and provide statistically valid models of brain activity in a population. By generating whole-brain activity maps, grouped icEEG enables the study of intra- and interregional dynamics between distributed cortical substrates exhibiting task-dependent activity. In this fashion, grouped icEEG analyses can provide significant advances in understanding the mechanisms by which cortical networks give rise to cognitive functions.

Keywords: icEEG, ECoG, cortical network dynamics, distributed cortical networks, ventral temporal cortex, face perception, fusiform face area (FFA), parahippocampal place area (PPA)

# Introduction

The exponential growth in whole-brain neuroimaging studies has produced an overwhelming amount of data, and the conceptual frameworks for the neurobiology of human cognition have undergone tremendous change. These data have produced a consensus that complex cognitive functions—such as language—cannot be understood through the isolated study of specialized, cortical regions (Hagoort, 2014). Currently, a major focus of cognitive neuroscience is to understand how cognition emerges from transient, coordinated neural interactions in distributed large-scale cortical networks (Felleman and Van Essen, 1991; Bressler, 1995; Sporns et al., 2005; Martin, 2007; Patterson et al., 2007; Poeppel, 2014). Driven largely by fMRI, PET, and lesion-based analyses, significant advances have been made in identifying anatomical substrates that form the neural architecture of these distributed networks (Damasio et al., 2004; Dronkers and Ogar, 2004; Binder et al., 2009; Price, 2010; Friederici, 2011; Kanwisher, 2011). However, the limited temporal resolution of these neuroimaging modalities has hindered our understanding of how intra- and interregional cortical interactions give rise to cognition (Lachaux et al., 2003a; Jerbi et al., 2009; Friederici and Singer, 2015).

# Introducing icEEG

A unique opportunity to study cognitive function is presented in patients undergoing intracranial EEG (icEEG) recordings as part of their pre-surgical evaluations for medically refractory focal epilepsy (Mukamel and Fried, 2012). In order to delineate their epileptogenic networks, these patients are implanted with either subdural electrodes that record from the cortical surface or penetrating depth electrodes that record from below the cortical surface, and in some cases both (McGonigal et al., 2007; Van Gompel et al., 2008; Tandon et al., 2009). As such, icEEG recordings yield multi-lobar, high spatio-temporal resolution sampling of disseminated brain regions, providing optimal coverage and signal fidelity in comparison to the poor temporal resolution of fMRI/PET and poor spatial resolution of surface EEG/MEG (Jerbi et al., 2009; Lachaux et al., 2012). Importantly, high-frequency broadband gamma activity (BGA, 40–200 Hz) captured by icEEG yields precise estimates of task-related cortical activity, thereby permitting the study of local and long-distance networks at the millisecond time-scales relevant to neural processes (Jacobs and Kahana, 2010; Lachaux et al., 2012).

#### Edited by:

*Timothy Michael Ellmore, The City College of New York, USA*

#### Reviewed by:

*Jean-Philippe Lachaux, Centre de Recherche en Neurosciences de Lyon, France; John M. Zempel, Washington University, USA; Xiao Liu, National Institute of Neurological Disorders and Stroke, National Institutes of Health, USA*

#### \*Correspondence:

*Nitin Tandon, University of Texas Health Science Center at Houston, 6431 Fannin Street Suite G.550D, Houston, TX 77030, USA; nitin.tandon@uth.tmc.edu*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

> Received: *03 May 2015* Accepted: *06 July 2015* Published: *21 July 2015*

#### Citation:

*Kadipasaoglu CM, Forseth K, Whaley M, Conner CR, Rollo MJ, Baboyan VG and Tandon N (2015) Development of grouped icEEG for the study of cognitive processing. Front. Psychol. 6:1008. doi: 10.3389/fpsyg.2015.01008*

# The Need for Grouped icEEG Analysis: The Sparse-sampling Problem

Despite its remarkable advantages, the widespread acceptance of icEEG by cognitive neuroscience has been hindered by difficulties in data representation and analysis at the individual and population levels (for review see Lachaux et al., 2003b; Conner et al., 2013; Kadipasaoglu et al., 2014; Chang, 2015). Relevant to the current discussion are the challenges arising from spatially variable and limited electrode coverage in patients, termed the sparse-sampling problem. This issue is unique to icEEG research, as electrode implantation, and therefore the sites where data are actually collected, are dictated solely by clinical criteria. Therefore, only a fraction of the total brain volume is sampled in any one patient (Halgren et al., 1998), precluding comprehensive investigation of cortical networks at the individual level.

To address the sparse-sampling problem, and thereby develop icEEG for the study of large-scale, distributed networks, different methods for the grouped analysis of icEEG data have recently been proposed (Miller et al., 2009; Dykstra et al., 2012; Burke et al., 2013; Conner et al., 2013; Davidesco et al., 2013; Kadipasaoglu et al., 2014). Because of the discrete nature of recordings, icEEG activity will likely underestimate functional activity at the individual level. Therefore, a primary goal in all of these methods is to accurately combine data across large numbers of subjects to generate continuous brain activity maps, which leverage the spatiotemporal advantages of icEEG toward providing a more comprehensive view of cortical function. One such method—developed by our lab—employs topologically accurate representations of subdural electrode coverage and BGA on subject-specific cortical models. By integrating this approach with surface-based normalization to precisely align datasets across subjects (Argall et al., 2006; Saad and Reynolds, 2012), and a mixed effects multilevel analysis (MEMA) to correct for unsampled cortical regions (i.e., missing data) (Chen et al., 2011), we are able to perform statistically valid and topologically accurate grouped analyses of icEEG data. In this fashion, our surface-based MEMA (SB-MEMA) can generate continuous brain-activity maps to fully leverage the unique spatio-temporal properties of icEEG in the study of network function (Kadipasaoglu et al., 2014).
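To make the idea of a mixed-effects group estimate with missing data concrete, the sketch below combines per-subject effect estimates at a single surface node, weighting each subject by its within-subject variance plus an estimated between-subject variance, and skipping subjects with no coverage at that node. This is a simplified illustration and not the published SB-MEMA implementation: it swaps in the DerSimonian-Laird moment estimator for the between-subject variance, which may differ from the estimator used by MEMA, and all names and numbers are hypothetical.

```python
import numpy as np


def random_effects_node(effects, variances):
    """Combine per-subject effects at one surface node.

    NaN in `effects` marks a subject with no electrode coverage at this node.
    Returns (group_mean, z_score, n_subjects_contributing).
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    sampled = ~np.isnan(effects)
    y, v = effects[sampled], variances[sampled]
    k = y.size
    if k < 2:
        return np.nan, np.nan, k  # too few subjects to estimate a group effect

    w = 1.0 / v                                   # fixed-effects (inverse-variance) weights
    fixed_mean = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - fixed_mean) ** 2)         # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)            # between-subject variance (DerSimonian-Laird)

    w_star = 1.0 / (v + tau2)                     # random-effects weights
    group_mean = np.sum(w_star * y) / np.sum(w_star)
    z = group_mean * np.sqrt(np.sum(w_star))      # group mean divided by its standard error
    return float(group_mean), float(z), k


# Synthetic example: four subjects, the second has no coverage at this node.
group_mean, z, k = random_effects_node(
    effects=[12.0, np.nan, 8.0, 10.0],
    variances=[4.0, np.nan, 4.0, 4.0],
)
sparse_mean, sparse_z, sparse_k = random_effects_node([5.0, np.nan], [1.0, np.nan])
```

Running the function over every node of a standardized mesh would give a continuous group map, with nodes covered by fewer than two subjects left undefined rather than treated as zero activity.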

To illustrate how such grouped icEEG approaches can contribute to cognitive neuroscience, we discuss SB-MEMA in the context of cortical networks relating to visual object recognition and reading. We note here that the following analyses are intended only as illustrative examples. Therefore, we have not provided detailed experimental methods or statistical interpretations of our results. Furthermore, all results presented in this manuscript are intended solely to highlight the potential application of such grouped icEEG approaches. Importantly, these results will be elaborated in subsequent publications, and this manuscript is not the definitive representation of those analyses.

### Visual Object Recognition

Visual object recognition is believed to be mediated by neural substrates in the ventral temporal cortex (VTC) capable of categorizing visual inputs within a few hundred ms (Thorpe et al., 1996; Grill-Spector and Kanwisher, 2005). Yet the role of the VTC in accomplishing these complex functional computations remains a mystery. Non-invasive neuroimaging studies have demonstrated a consistent relationship between cortical topology (and white matter connectivity) and functional representations in the VTC (Saygin et al., 2012; Pyles et al., 2013; Grill-Spector and Weiner, 2014; Gomez et al., 2015). Specifically, the mid-fusiform sulcus predicts transitions in the location of cytoarchitectonic regions, receptor architectonics, and large-scale functional maps in the VTC (e.g., eccentricity bias/domain-specificity/animacy/real-world object size) (Grill-Spector and Weiner, 2014; Weiner et al., 2014; Gomez et al., 2015). This has led to a hypothesis that the VTC's anatomical organization is spatially optimized for the computational processes of the distinct functional networks subserving object recognition (Grill-Spector and Weiner, 2014). Grouped icEEG studies are uniquely suited to investigate such hypotheses, which require the differentiation of functional networks at millimeter resolution and millisecond time-scales.

To demonstrate this, we use SB-MEMA to investigate category-specific differences in the fusiform gyrus. We applied SB-MEMA to icEEG data collected in a large cohort (n = 27, left hemisphere only) as they performed visual confrontation naming of famous faces and places. Importantly, we were able to achieve comprehensive fusiform coverage using the precise intersubject co-registration afforded by SB-MEMA (**Figure 1**, top). To focus on early perceptual processes, the analysis was constrained to a window from 50 to 500 ms after stimulus presentation. Consistent with previously reported domain-specificity maps, significant BGA for faces and places was localized in a lateral-to-medial fashion, respectively, along the mid-fusiform sulcus (**Figure 1**, middle; Kanwisher et al., 1997; Epstein and Kanwisher, 1998; Nasr et al., 2011; Grill-Spector and Weiner, 2014).

FIGURE 1 | Top: icEEG data were collected in 27 patients, implanted with subdural electrodes (SDEs) in the left hemisphere, as they performed visual confrontation naming of famous faces, places, and scrambled control images. Surface-based representations of SDE coverage and high-frequency broadband gamma activity (BGA; 60–120 Hz) were generated for each subject. We utilize cortical surface models that have been reconstructed from each subject's pre-implantation high-resolution anatomical MRI scans (Philips Medical; T1-weighted, 1 mm isotropic resolution; using FreeSurfer software), and subsequently imported to the SUMA module of AFNI. Surface-based datasets of SDE coverage and BGA are generated with respect to each subject's cortical model using geodesic metrics to correct for local gyral and sulcal folding patterns. By spatially transforming data to the cortical surface, we integrate SUMA's surface-based normalization strategy to convert individual datasets to a standardized cortical surface (N27). To achieve this, SUMA resamples individual cortical models (and therefore their associated datasets) to a standardized mesh and enables a one-to-one correspondence between anatomical locations across subjects. Group maps for electrode (left) and surface-based coverage (right) are shown for the ventral temporal cortex. SDEs are modeled as spheres, with red spheres indicating SDEs that were excluded due to 60 Hz line noise or epileptiform activity. By grouping data in this fashion, comprehensive cortical coverage is obtained, and cognitive function can be critically evaluated at spatio-temporal scales relevant to neural processes. Middle: SB-MEMA derived significant grouped effects estimates by comparing composite BGA percent change (50–500 ms post-stim; with respect to pre-stimulus baseline of −700 to −200 ms) for each stimulus category against its scrambled control. Notably, BGA to faces was localized lateral to the mid-fusiform sulcus, while peak BGA to places was localized medially. Anterior to the mid-fusiform sulcus BGA for both conditions converged in magnitude and spatial extent. Bottom: Subject electrodes localized over the three regions in the fusiform (Fusi.) gyrus with significant activity to faces, places, or both stimuli as revealed by SB-MEMA (see Middle). SDEs are color-coded by region and displayed on a common brain surface (N27). Notably, SDEs are spatially arranged with respect to the mid-fusiform sulcus: laterally (purple), medially (blue), or anteriorly (red). Below, group time-series of percent change in BGA for face (orange) and place (cyan) stimuli can be seen. Of note, traces colored green indicate a region of activity overlap. Percent change is relative to a pre-stimulus baseline (−700 to −200 ms). Stimulus onset at 0 ms. Shading denotes 1 SEM. All figures display the ventro-medial aspect of the left hemisphere (N27 cortical surface model).

Given that SB-MEMA computes grouped effects estimates by summing BGA over time, a temporal smoothing of the data is still present. This precludes the evaluation of certain cortical response properties (e.g., onset latency/response duration), which may otherwise provide valuable insight into a given region's functional role (Lachaux et al., 2012). To evaluate the temporal profile of BGA, time-series representations of averaged BGA can be generated from all electrodes contributing to significant loci seen in SB-MEMA (**Figure 1**, bottom; Conner et al., 2013; Kadipasaoglu et al., 2014). In contrast to summing BGA over a time window, time-series representations instead compute percent change in BGA at each data point (which can be on the order of ms, depending on sampling rate) and plot these changes over time (Yoshor et al., 2007; Kadipasaoglu et al., 2014). Alternatively, data from time-series representations can be spatially transformed back onto the cortical surface to generate 4-dimensional, whole brain representations of cortical activity (**Movie 1**). Such visualization of time-varying BGA (shown as cortical surface heat maps) relative to cortical anatomy facilitates insights into dynamic network behavior that may not be readily appreciable in static images, and is complementary to SB-MEMA.
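The contrast between a time-resolved percent-change series and a single windowed value can be sketched as follows. The baseline (−700 to −200 ms) and analysis window (50 to 500 ms) are taken from the text; everything else is an assumption made for illustration, including the synthetic band-power array, the function names, and the use of a window mean (rather than a sum) to collapse the series.

```python
import numpy as np


def percent_change(power, times_ms, baseline=(-700, -200)):
    """Percent change of band power at each sample relative to the pre-stimulus baseline.

    power: array of shape (n_trials, n_samples); times_ms: array of shape (n_samples,).
    """
    base_mask = (times_ms >= baseline[0]) & (times_ms <= baseline[1])
    trial_mean = power.mean(axis=0)               # average across trials first
    base = trial_mean[base_mask].mean()
    return 100.0 * (trial_mean - base) / base


def composite_change(pct, times_ms, window=(50, 500)):
    """Collapse the time series into one value over the analysis window (mean, by assumption)."""
    mask = (times_ms >= window[0]) & (times_ms <= window[1])
    return float(pct[mask].mean())


# Synthetic band power (assumption): 5 trials at 1 ms sampling, with power
# doubling between 100 and 400 ms after stimulus onset.
times_ms = np.arange(-1000, 1000)
power = np.ones((5, times_ms.size))
power[:, (times_ms >= 100) & (times_ms < 400)] = 2.0

pct = percent_change(power, times_ms)
composite = composite_change(pct, times_ms)
```

The series `pct` retains the onset and offset of the response, whereas `composite` reduces it to one number per electrode, which is the information lost when activity is collapsed over the window.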

Once cortical regions of interest have been identified, more sophisticated measures for assessing functional connectivity and information flow can then be applied to understand how these regions interact during cognitive operations (Bruns et al., 2000; Canolty et al., 2006; Nir et al., 2008; Korzeniewska et al., 2011; Vidal et al., 2012; Watrous et al., 2013; Flinker et al., 2015). To illustrate this, we discuss one such connectivity measure, the short-time direct directed transfer function (SdDTF; Korzeniewska et al., 2008, 2011), in the context of cortical reading networks. Of note, the electrodes for this example were identified using SB-MEMA for the evaluation of a word-completion task (not shown).

#### Network Dynamics of Reading

The neural substrates that comprise the reading network include cortical areas traditionally associated with language production (e.g., Broca's area), as well as a ventrally positioned region in the fusiform gyrus, which demonstrates preferential responses to visually presented words and pseudowords (w-FG) (McCandliss et al., 2003). Cognitive approaches are divided on the connectivity patterns during word reading that facilitate the visual processing of orthographic stimuli (Carreiras et al., 2014). While it is agreed that w-FG is crucial to word reading, some models predict that strictly feed-forward connectivity patterns accompany word reading, while other models stress the presence of bi-directional interactions between ventral visual and higher-level frontal cortex (Price and Devlin, 2011; Carreiras et al., 2014). Given that the anatomical sources and temporal evolution of top-down control are not well-established, a data-driven connectivity measure, such as SdDTF, is necessary to investigate the timing and directionality of information transmission during word reading. SdDTF quantifies connectivity across multi-dimensional networks, and can derive directed information flow between any two network nodes, while controlling for the contributions from all other sources (Korzeniewska et al., 2008, 2011). Applied to our icEEG data, patient-specific information flows were computed for subsets of task-relevant electrodes identified through SB-MEMA. It is important to note that connectivity between any two regions can only be derived in patients with electrodes recording simultaneously from both regions. In other words, connectivity measures must first be performed within subject, before individual connectivity estimates can be combined across subjects to yield a grouped connectivity estimate. In this fashion, flows derived from SdDTF were averaged over patient and region, and were able to isolate top-down information flow from Pars Triangularis to w-FG during a word completion task (**Figure 2**).
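The two-step logic described here, computing connectivity within each subject and then combining only those subjects with simultaneous coverage of both regions, can be sketched as below. The sketch does not compute SdDTF itself; it assumes that a directed flow value per region pair has already been obtained for each patient. The data structure, function name, and numerical values are hypothetical and purely illustrative.

```python
import numpy as np


def group_flow(subject_flows, source, target):
    """Average a directed flow across subjects that have electrodes in both regions.

    subject_flows: list of dicts, one per subject, mapping
    (source_region, target_region) -> within-subject flow estimate. A pair is
    present only if both regions were recorded simultaneously in that subject.
    Returns (mean_flow, n_subjects_contributing).
    """
    values = [flows[(source, target)] for flows in subject_flows if (source, target) in flows]
    if not values:
        return np.nan, 0  # no subject covers both regions
    return float(np.mean(values)), len(values)


# Made-up within-subject estimates for three patients; the third has no
# simultaneous coverage of the two regions and so cannot contribute.
subject_flows = [
    {("ParsTriangularis", "w-FG"): 0.4, ("w-FG", "ParsTriangularis"): 0.1},
    {("ParsTriangularis", "w-FG"): 0.2},
    {},
]
mean_flow, n_subjects = group_flow(subject_flows, "ParsTriangularis", "w-FG")
```

Reporting the number of contributing subjects alongside the mean makes explicit how much of the cohort actually supports a given grouped connectivity estimate.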

# Conclusion

The study of icEEG has been able to generate novel insights into a wide range of cognitive functions (Jacobs and Kahana, 2010; Lachaux et al., 2012). Within the past decade alone, it has significantly advanced diverse areas of neuroscience research, including cognitive control (Wessel et al., 2013), working and episodic memory (Fell et al., 2001; Axmacher et al., 2007; Watrous et al., 2013), sensorimotor integration (Brovelli et al., 2005; Hermes et al., 2011; Bouchard et al., 2013), brain-machine interfaces (Leuthardt et al., 2004, 2011; Miller et al., 2007), perception (Allison et al., 1994, 1999; McCarthy et al., 1999; Privman et al., 2007, 2011; Fisch et al., 2009; Liu et al., 2009; Engell and McCarthy, 2010, 2014; Vidal et al., 2010; Chan et al., 2011; Davidesco et al., 2014; Ghuman et al., 2014), and language processing (Crone et al., 2001; Sahin et al., 2009; Chang et al., 2011; Mesgarani and Chang, 2012; Conner et al., 2013; Flinker et al., 2015). With the development of robust techniques for grouped analysis, icEEG analyses are provided with a powerful new tool to investigate the architecture and interregional dynamics of distributed cortical networks. Yet despite their significant advantages, grouped approaches to icEEG still suffer from a number of limitations. Most notably, group size and degree of cortical coverage limit the applicability of methods like SB-MEMA. As mentioned earlier, the discrete nature of the recordings may underrepresent functional activity. A failure to find significant effects may be due to the absence of such effects in a given region (true negative) or the lack of sufficient coverage in that region (false negative). Furthermore, as discussed in Section Network Dynamics of Reading, connectivity measures are also dependent on individuals with coverage in all regions of interest. For these reasons, it is critical that population-level analyses continue to be supported by data at the individual level.

A final concern that arises with any icEEG study is whether the results found in these patient populations are applicable to the normal human brain. Such concerns are generally addressed using a variety of inclusion criteria, for both the patients and the data analyzed (e.g., restricting analysis to data free of electrophysiological abnormalities and not arising from pathological cortex) (Halgren et al., 1998; Lachaux et al., 2003b; Crone et al., 2006; Jerbi et al., 2009). The development of grouped icEEG provides a new environment in which to voice these concerns, but also a new opportunity to resolve them. Work from our lab has previously compared patient fMRI and icEEG recordings against fMRI obtained in healthy volunteers, under identical task conditions (Conner et al., 2013). Critically, we identified no significant difference in activity, further validating the reliability of such icEEG recordings. Additionally, work from other groups has begun to investigate the potential of multi-modal analyses by critically examining grouped icEEG and grouped fMRI analyses from the same patient populations (Mukamel et al., 2005; Privman et al., 2007; He et al., 2008; Conner et al., 2011; Esposito et al., 2012). In doing so, these studies aim to better understand the electrophysiological basis of the BOLD signal. Such multi-modal approaches also provide a method for resolving concerns arising from the lack of global coverage in grouped icEEG studies. By integrating data from grouped fMRI and icEEG analyses, it could be confirmed that all relevant components of a given cognitive network have indeed been sampled within the icEEG cohort, prior to subjecting these data to a population-level analysis.

# Author Contributions

NT designed research; CK, CC, MR, VB acquired data; CK, MW, KF, CC, VB performed research; CK, KF, MW, CC analyzed data; CK, MW, and NT wrote the paper.

# Acknowledgments

Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health (5TL1TR000369-07), the Clinical and Translational Award from the National Center for Research Resources (K12-KL2 RR0224149), the National Institute of Biomedical Imaging and Bioengineering (NIBIB; T32EB006350 to CC), the Keck Center for Interdisciplinary Bioscience Training of the Gulf Coast Consortia (Grant No. T15 LM007093), the Vivian Smith Foundation for Neurologic Research, the Keck Center of the Gulf Coast Consortium, and the Memorial Hermann Foundation. We thank Bartlett Moore IV, Suganya Karunakaran, Eleonora Bartoli, and Kamin Kim for their comments on earlier drafts of the manuscript. We are especially grateful to all the patients who participated in this study, the neurologists at the Texas Comprehensive Epilepsy Program (Jeremy Slater, Giridhar Kalamangalam, Omotola Hope and Melissa Thomas) who participated in the care of these patients, Vips Patel, and all of the nurses and technicians in the Epilepsy Monitoring Unit at Memorial Hermann Hospital who helped make this research possible.

# References

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01008

Movie 1 | Intracranial recordings provide unparalleled insights into rapidly evolving patterns of cortical activity across distributed neural substrates. Electrocorticographic movies of grouped percent change in high-frequency broadband gamma activity (BGA; 60–120 Hz; *n* = 27 subjects) during a face and place-naming task are visualized on the N27 cortical surface model. Hotter colors denote an increase in mid-gamma band power, while cooler colors denote a decrease (color-scale ranges from −50 to 50% change). The movie begins 100 ms before stimulus onset and continues until 700 ms after stimulus onset, in 5 ms steps (stimulus onset at 0 ms). Notably, for face stimuli (left surface) increases in BGA are localized to the lateral aspect of the mid-fusiform sulcus. In contrast, place stimuli produce more widespread activations in the medial aspects of the fusiform gyrus.


in human visual cortex. Neuron 62, 281–290. doi: 10.1016/j.neuron.2009.02.025


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Kadipasaoglu, Forseth, Whaley, Conner, Rollo, Baboyan and Tandon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.