# ADVANCES IN MODERN MENTAL CHRONOMETRY

EDITED BY: José M. Medina, Willy Wong, José A. Díaz and Hans Colonius PUBLISHED IN: Frontiers in Human Neuroscience

#### *Frontiers Copyright Statement*

*© Copyright 2007-2015 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-566-4 DOI 10.3389/978-2-88919-566-4


# **ADVANCES IN MODERN MENTAL CHRONOMETRY**

Topic Editors:

**José M. Medina,** Universidad de Granada, Spain **Willy Wong,** University of Toronto, Canada **José A. Díaz,** Universidad de Granada, Spain **Hans Colonius,** Carl von Ossietzky Universität Oldenburg, Germany

Image courtesy of Dr. José M Medina and Dr. José A Díaz, University of Granada, Spain. Copyright José M Medina and José A Díaz.

Mental chronometry encompasses all aspects of time processing in the nervous system and constitutes a standard tool in many disciplines including theoretical and experimental psychology and human neuroscience. Mental chronometry has represented a fundamental approach to elucidate the time course of many cognitive phenomena and their underlying neural circuits over more than a century. Nowadays, mental chronometry continues to evolve and to expand our knowledge and understanding of the temporal organization of the brain, in combination with different neuroscience techniques and advanced methods in mathematical analysis. In research on mental chronometry, human reaction/response times play a central role. Together with reaction times, other topics in mental chronometry include vocal, manual and saccadic latencies, subjective time, psychological time, interval timing, time perception, internal clock, time production, time representation, time discrimination, time illusion, temporal summation, temporal integration, temporal judgment, redundant signals effect, perceptual, decision and motor time, etc.

The aim of this research topic is to provide an overview of the state of the art in this field—its relevance, recent findings, current challenges, perspectives and future directions. Thus, as a result, a collection of 14 original research and opinion papers from different experts have been gathered together in a single volume.

We hope this research topic will provide a useful framework and an up-to-date set of papers for further discussion on mental chronometry within the human brain. We are grateful to all the referees for their valuable support, effort, and time during the creation of the research topic.

**Citation:** Medina, J. M., Wong, W., Díaz, J. A., Colonius, H., eds. (2015). Advances in Modern Mental Chronometry. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-566-4

# Table of Contents


*A theory of power laws in human reaction times: insights from an information-processing approach*

José M. Medina, José A. Díaz and Kenneth H. Norwich


# **Reaction time distributions**


# **Human vision system**

*Attentional spreading to task-irrelevant object features: experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation*

Detlef Wegener, Fingal Orlando Galashan, Maike Kathrin Aurich and Andreas Kurt Kreiter

*Visual evoked potentials to change in coloration of a moving bar*

Carolina Murd, Kairi Kreegipuu, Nele Kuldkepp, Aire Raidvee, Maria Tamm and Jüri Allik

# **Human auditory system and time perception**


Giovanna Mioni, Simon Grondin and Franca Stablum

*Does time ever fly or slow down? The difficult interpretation of psychophysical data on time perception*

Miguel A. García-Pérez

# Advances in modern mental chronometry

José M. Medina<sup>1</sup> \*, Willy Wong<sup>2</sup> , José A. Díaz <sup>1</sup> and Hans Colonius <sup>3</sup>

<sup>1</sup> Departamento de Óptica, Facultad de Ciencias, Universidad de Granada, Granada, Spain, <sup>2</sup> Department of Electrical and Computer Engineering, Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON, Canada, <sup>3</sup> Department für Psychologie, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany

Keywords: mental chronometry, reaction time, timing and time perception, sensory perception, cognition, human performance, stochastic processes, decision making

Mental chronometry encompasses all aspects of time processing in the nervous system and constitutes a standard tool in many disciplines including theoretical and experimental psychology and human neuroscience. Mental chronometry has represented a fundamental approach to elucidate the time course of many cognitive phenomena and their underlying neural circuits over more than a century. Nowadays, mental chronometry continues to evolve and to expand our knowledge and understanding of the temporal organization of the brain, in combination with different neuroscience techniques and advanced methods in mathematical analysis. In research on mental chronometry, human reaction/response times (RTs) play a central role. Together with RTs, other topics in mental chronometry include vocal, manual and saccadic latencies, subjective time, psychological time, interval timing, time perception, internal clock, time production, time representation, time discrimination, time illusion, temporal summation, temporal integration, temporal judgment, redundant signals effect, perceptual, decision and motor time, etc. It is worth noting that there have been well over 37,000 full-length journal papers published in the last decade on a variety of topics related to simple and choice RTs, etc. This amounts to approximately 3800 papers per year, or roughly 10 papers per day (source: PubMed, similarly Thomson Reuters Web of Science). There are comprehensive reviews that deal extensively with the history of mental chronometry, experimental methods and paradigms, stochastic models, etc., as well as its relationship to other psychological and physiological variables, neuroscience methods and clinical applications (Laming, 1968; Posner, 1978, 2005; Welford and Brebner, 1980; Townsend and Ashby, 1983; Luce, 1986; Meyer et al., 1988; Robbins and Brown, 1990; Schall, 2001; Mauk and Buonomano, 2004; Smith and Ratcliff, 2004; Jensen, 2006; Gold and Shadlen, 2007; Linden, 2007; Grondin, 2010; Merchant et al., 2013; Allman et al., 2014).

#### Edited and reviewed by:

Hauke R. Heekeren, Freie Universität Berlin, Germany

> \*Correspondence: José M. Medina, jmedinaru@cofis.es

Received: 26 February 2015 Accepted: 21 April 2015 Published: 06 May 2015

#### Citation:

Medina JM, Wong W, Díaz JA and Colonius H (2015) Advances in modern mental chronometry. Front. Hum. Neurosci. 9:256. doi: 10.3389/fnhum.2015.00256

The aim of this research topic is to provide an overview of the state of the art in this field—its relevance, recent findings, current challenges, perspectives and future directions. Thus, as a result, a collection of 14 original research and opinion papers from different experts have been gathered together in a single volume. They outline a selection of unsolved problems and topics in mental chronometry mainly within the context of the human visual system as well as the auditory system. One of the unsolved problems is the functional role of power laws in RT variability and in the study of timing. Power laws are ubiquitous in many complex systems, and their experimental validity and theoretical support represent a fundamental aspect in many disciplines, such as in biology, physics, finance, etc. In this theme issue, the papers of Ihlen (2014), Medina et al. (2014), Rigoli et al. (2014) and Shouval et al. (2014) address different aspects of power laws, namely, multifractal analysis on RT series; an information theoretic basis of RT power law scaling; Fourier-based power law correlations ("1/f noise") in a tapping task and its comparison with other physiological processes (e.g., heartbeat intervals); and a log-power law model of the firing rate of neurons in interval timing.

A second unsolved problem involves RT-based methods and research into RT distributions. RT distributions are typically positively skewed and often exhibit long right-tails in the time-domain. A long-standing issue deals with the shape of RT distributions, their intrinsic stochastic latency mechanisms and neural basis. Sequential-sampling models are a common approach widely used in human RTs and simple decision making (Smith and Ratcliff, 2004). Diederich and Oswald present a RT sequentialsampling model for multiple stimulus features based on an Ornstein–Uhlenbeck diffusion process (Diederich and Oswald, 2014). In a different type of analysis, the work of Harris et al. introduces an alternative approach to examine very long RTs in the rate-domain (i.e., 1/RT). These authors investigate the shape of choice RT distributions and sequential correlations using autoregressive techniques (Harris et al., 2014). In general, RT distributions exhibit faster RTs under summation/facilitation tasks when two or more redundant signals are available as compared with a single signal or sensory modality (e.g., binocular vs. monocular vision), usually called redundant signals effect. The work of Lentz et al. examines binaural vs. monaural hearing performance under noise masking tasks using modeling techniques based on the concept of workload capacity and different processing mechanisms (e.g., serial vs. parallel, etc.) and stopping rules (Lentz et al., 2014). Within the same redundant signals paradigm, Zehetleitner et al. study bimodal (audio-visual) facilitation effects using sequential-sampling models (Zehetleitner et al., 2015).
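To make the link between a diffusion process and a skewed RT distribution concrete, the following is a minimal sketch of a single-boundary Ornstein-Uhlenbeck first-passage simulation using an Euler-Maruyama scheme. It is not the multi-feature model of Diederich and Oswald (2014) nor the analysis of Harris et al. (2014); the function names and all parameter values (`drift`, `decay`, `noise`, `threshold`, `dt`) are illustrative assumptions. It also computes the rate-domain transform 1/RT mentioned above.

```python
import numpy as np


def simulate_ou_rts(n_trials=1000, drift=1.0, decay=0.5, noise=1.0,
                    threshold=1.0, dt=0.001, t_max=10.0, seed=0):
    """First-passage times of dX = (drift - decay * X) dt + noise dW, X(0) = 0.

    A trial ends when X first reaches `threshold`. Parameter values are
    hypothetical and chosen only so that every trial terminates quickly.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    rt = np.full(n_trials, np.nan)          # NaN marks unfinished trials
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(t_max / dt) + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        # Euler-Maruyama update for the trials that are still running
        dw = rng.normal(0.0, np.sqrt(dt), size=n_active)
        x[active] += (drift - decay * x[active]) * dt + noise * dw
        crossed = active & (x >= threshold)
        rt[crossed] = step * dt
        active &= ~crossed
    return rt[~np.isnan(rt)]


def skewness(values):
    """Sample skewness (third standardized moment)."""
    v = np.asarray(values, dtype=float)
    c = v - v.mean()
    return float(np.mean(c ** 3) / np.mean(c ** 2) ** 1.5)


rts = simulate_ou_rts()
rates = 1.0 / rts                           # rate-domain representation
print("skewness in the time domain:", skewness(rts))
print("skewness in the rate domain:", skewness(rates))
```

The simulated first-passage times show the long right tail described in the text; whether the rate-domain values look more symmetric depends on the assumed parameters and should be checked rather than taken for granted.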

Regarding the human vision system, the work of Wegener et al. examines visual attention mechanisms using colored stimuli (random dot patterns) and presents a novel three-step model of attention to predict the corresponding RT distributions (Wegener et al., 2014). The work of Murd et al. exemplifies the use of RTs in conjunction with visual evoked potentials in the detection of colored visual stimuli (Murd et al., 2014). There are also studies focusing on the auditory system, including the work of Nakajima et al., which investigates the foundations of time perception using a time illusion based on an overestimation of a second time interval preceded by a first time interval or time-shrinking effect (Nakajima et al., 2014). Mitsudo et al. present magnetoencephalogram signals recorded in tasks that require judging temporal gaps in tones and discuss their implications for the organization of the auditory cortex (Mitsudo et al., 2014). Within the same time perception paradigm, Mioni et al. present a detailed review of temporal dysfunctions in traumatic brain injury patients (Mioni et al., 2014). The present theme issue also includes the work of García-Pérez, who introduces a unified model to analyze different psychophysical tasks in time perception and the estimation of the psychometric function (García-Pérez, 2014).

We hope this research topic will provide a useful framework and an up-to-date set of papers for further discussion on mental chronometry within the human brain.

# Acknowledgments

We acknowledge all the referees for their valuable support, effort, and time.

# References

model of attention for object-based selection and feature-based processing modulation. Front. Hum. Neurosci. 8:414. doi: 10.3389/fnhum.2014.00414


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Medina, Wong, Díaz and Colonius. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Multifractal analyses of human response time: potential pitfalls in the interpretation of results

# *Espen A. F. Ihlen\**

*Department of Neuroscience, Norwegian University of Science and Technology, Trondheim, Norway \*Correspondence: espen.ihlen@ntnu.no*

#### *Edited by:*

*José M. Medina, Universidad de Granada, Spain*

#### *Reviewed by:*

*Fred Hasselman, Radboud University Nijmegen, Netherlands Helmut Ahammer, Medical University of Graz, Austria*

**Keywords: response times, 1/***f* **noise, multifractal, variability, long-range dependency, fractal**

# **INTRODUCTION**

Analyses of response time series have provided insight into mental organization and cognitive processes used in a wide variety of tasks such as simple reaction time, word naming, choice decision, visual search, memory search, and lexical decision (Gilden, 2001). One of the new and frequently used sets of analyses is the numerical definition of the scale invariant structure of response time series, also called 1/*f* fluctuations. Component-oriented theories suggest that this scale invariant structure originates from an idiosyncratic mechanism in the cognitive system, whereas interaction-oriented theories argue that scale invariant structure in response time series arises from self-organizing interaction between different sources and mechanisms (cf. Diniz et al., 2011). In this short commentary, new analyses of human response time called multifractal analyses will be introduced, and potential pitfalls of interpreting the results of these analyses will be discussed.

Multifractal analyses quantify the intermittent structure of response time series that is created by interactions between temporal scales of the response series (Ihlen and Vereijken, 2010, 2013). Even though these analyses have only recently been introduced in the analysis of human behavior, their mathematical foundation was introduced four decades ago (Yaglom, 1966; Mandelbrot, 1974). Typically, response time series with a large number of trials will contain intermittent periods with a higher number of slow response latencies than the rest of the response series (e.g., Holden et al., 2009). These intermittent periods of slow response latencies might indicate shifts in the participant's attention to the stimulus source or active periods of response error correction (Ihlen and Vereijken, 2010, 2013). In order to quantify the intermittent structure of response time series, multifractal analyses combine two fundamental classes of analyses: (1) model-based analyses of the response time distribution and (2) analyses of the dependency of the time ordering of the responses. Class 1 analyses have shown that response time distributions across cognitive tasks are unimodal, positively skewed, and have a heavy right tail containing the slow response latencies (e.g., Luce, 1986; Holden et al., 2009). Class 2 analyses have shown that response times have long-range dependency across hundreds and even thousands of trials and, consequently, that the response times cannot be considered the independent random variables assumed by Class 1 analyses (Gilden, 2001). The long-range dependency (i.e., monofractal structure) of the response time series is numerically defined as a single scaling exponent by spectral analyses, autocorrelation analyses, detrended fluctuation analysis, and dispersion analysis, to mention but a few (cf. Diniz et al., 2011). However, Class 2 analyses assume that the response times are Gaussian distributed, whereas Class 1 analyses indicate that they have a non-Gaussian heavy tail toward slow response latencies. Multifractal analyses are able to parameterize the non-Gaussian heavy tails that are created by intermittent variation by assessing the complete spectrum of scaling exponents. Thus, multifractal analyses are important extensions of monofractal analyses of response time series.
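As an illustration of how a single (monofractal) scaling exponent can be obtained by spectral analysis, the sketch below fits a straight line to the low-frequency part of the log-log periodogram. The function name `spectral_exponent`, the `low_fraction` cut-off, and the synthetic test signals are assumptions made for this example; they are not taken from the studies cited above.

```python
import numpy as np


def spectral_exponent(series, low_fraction=0.25):
    """Estimate beta in S(f) ~ 1 / f**beta from the raw periodogram.

    Only the lowest `low_fraction` of the positive frequencies is used,
    a common but arbitrary choice (assumed here) to focus on long-range
    structure.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))
    keep = (freqs > 0) & (freqs <= low_fraction * freqs.max())
    slope = np.polyfit(np.log(freqs[keep]), np.log(power[keep]), 1)[0]
    return float(-slope)


rng = np.random.default_rng(0)
white = rng.normal(size=4096)     # uncorrelated trials: beta near 0
walk = np.cumsum(white)           # random walk: beta near 2
beta_white = spectral_exponent(white)
beta_walk = spectral_exponent(walk)
print("white noise:", beta_white)
print("random walk:", beta_walk)
```

A 1/*f* series would fall between these two reference cases, with an exponent near 1. Note that this single number says nothing about the shape of the response time distribution, which is the limitation the paragraph above points to.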

All multifractal analyses are based on a decomposition of the response time series into a scale-dependent measure that identifies the periods of intermittent variation (see upper panel of **Figure 1**). The scale-dependent measure is the basis for computation of the multifractal spectra along two formalisms (see arrows A and B in **Figure 1**). In the Legendre formalism, the scale-dependent measure $\mu_{s,t}$ is used in the computation of the *q*-order moment. $\mu_{s,t}$ is amplified by the positive *q*-orders in the periods with large variation, whereas $\mu_{s,t}$ is amplified by the negative *q*-orders in periods with small variation. An exponent $\zeta_q$ is then estimated from the scaling of each of the *q*-order moments before the multifractal spectra are computed from $\zeta_q$ (see Ihlen and Vereijken, 2013 for further details). In the large deviation formalism, local exponents are computed from the scale-dependent measure $\mu_{s,t}$, and the multifractal spectrum is estimated from the distribution of the local exponents. An increased width of the multifractal spectra will reflect more distinct periods of intermittent variation in response time series (see example in Figure 2 in Ihlen and Vereijken, 2013). Additional surrogate tests also detect the periods influenced by multiplicative interactions between temporal scales (Ihlen and Vereijken, 2010). The different multifractal analyses like *structure function approach*, *entropy analyses*, *wavelet transformation modulus maxima*, *gradient modulus wavelet projection*, and *multifractal detrended fluctuation analysis* are defined by the particular way the scale-dependent measures are computed (Ihlen, 2013a; Ihlen and Vereijken, 2013). The Legendre and large deviation formalisms contain statistical assessments of multifractality. Various geometrical assessments have been suggested in the literature that estimate the box counting dimension of the time series (e.g., Russel et al., 1980; Chaudhuri and Sarkar, 1995). However, these methods are only numerically stable for positive *q* orders and, consequently, only estimate the left tail of the multifractal spectrum. Technical details for the computation of different multifractal analyses within the Legendre and large deviation formalisms, their parameter settings, Matlab codes, and comparison of their performance can be found elsewhere (Kantelhardt et al., 2002; Turiel et al., 2006; Kantelhardt, 2011; Ihlen, 2013a). Multifractal analyses have been applied to several cognitive tasks like simple reaction time, word naming, choice decision, and feedback manipulation (Ihlen and Vereijken, 2010; Kuznetsov and Wallot, 2011). All results from these studies indicate that response time series have multifractal properties that are not described by conventional monofractal analyses and that some of these properties might be task dependent.

**Figure 1.** … amplify the large $\mu_{s,t}$ (i.e., *red contours*), whereas the statistical moments with negative *q*'s amplify the small $\mu_{s,t}$ (i.e., *blue contours*). The scaling exponent $\zeta_q$ numerically defines the power law relation of the intermittent periods with large (i.e., positive *q*'s) and small variation (i.e., negative *q*'s). The panel below the bottom arrow A illustrates a multifractal spectrum $D_h$ estimated from $\zeta_q$. The panel below the top arrow B illustrates the direct estimation of the local singularity exponent $h_t$ as the local slope of $\log(\mu_{s,t})$ vs. $\log(s)$ for each time instant *t*. The panel below the bottom arrow B illustrates the multifractal spectrum $D_h$ estimated from the distribution of the local singularity exponent $h_t$. Adapted from Ihlen and Vereijken (2013).
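The Legendre-formalism steps can be sketched with multifractal detrended fluctuation analysis, one of the methods named above. This is a simplified, non-authoritative reading of the procedure in Kantelhardt et al. (2002), not the Matlab code referred to in the text. The scale-dependent measure is taken to be the detrended variance of each segment, the exponent written here as `h_q` is the generalized Hurst exponent (so the *q*-order moment scales with exponent `q * h_q`), and the spectrum follows from the Legendre transform of `tau = q * h_q - 1`. The function names, scales, *q*-range, and the binomial cascade test signal are assumptions for illustration.

```python
import numpy as np


def mfdfa(series, scales, q_values, order=1):
    """Return (h_q, alpha, spectrum) from a basic MFDFA (Legendre formalism)."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())
    q_values = np.asarray(q_values, dtype=float)
    log_fq = np.empty((len(q_values), len(scales)))
    for j, s in enumerate(scales):
        n_seg = len(profile) // s
        segs = profile[: n_seg * s].reshape(n_seg, s).T       # shape (s, n_seg)
        design = np.vander(np.arange(s, dtype=float), order + 1)
        coef, *_ = np.linalg.lstsq(design, segs, rcond=None)   # polynomial trend
        var = np.mean((segs - design @ coef) ** 2, axis=0)     # measure per segment
        for i, q in enumerate(q_values):
            if q == 0:
                log_fq[i, j] = 0.5 * np.mean(np.log(var))      # limit q -> 0
            else:
                log_fq[i, j] = np.log(np.mean(var ** (q / 2.0))) / q
    log_s = np.log(scales)
    h_q = np.array([np.polyfit(log_s, row, 1)[0] for row in log_fq])
    tau = q_values * h_q - 1.0
    alpha = np.gradient(tau, q_values)        # local singularity exponent
    spectrum = q_values * alpha - tau         # Legendre transform
    return h_q, alpha, spectrum


def binomial_cascade(levels, p=0.3):
    """Deterministic multiplicative cascade of length 2**levels."""
    x = np.array([1.0])
    for _ in range(levels):
        x = np.column_stack([x * p, x * (1.0 - p)]).ravel()
    return x


scales = [16, 32, 64, 128, 256]
q_values = np.arange(-5, 6)
noise = np.random.default_rng(0).normal(size=4096)
h_noise, alpha_noise, _ = mfdfa(noise, scales, q_values)
h_casc, alpha_casc, _ = mfdfa(binomial_cascade(12), scales, q_values)
width_noise = float(alpha_noise.max() - alpha_noise.min())
width_cascade = float(alpha_casc.max() - alpha_casc.min())
print("spectrum width, white noise:", width_noise)
print("spectrum width, cascade:", width_cascade)
```

The white-noise series is monofractal in theory, so any non-zero width it shows here is a finite-sample effect; this is one reason the width of a single estimated spectrum should not be read in isolation.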

### **POTENTIAL PITFALLS IN THE INTERPRETATION OF MULTIFRACTAL ANALYSES**

The interpretation of multifractal spectra of response time series has potential pitfalls. First, the multifractal spectra alone do not indicate that intermittent response time variation is generated by interaction between temporal scales. Wide multifractal spectra of response time series can reflect a power-law response time distribution and not intermittency generated by multiplicative interactions (Ihlen, 2013b). Surrogate tests have to be used to properly identify multiplicative interactions between temporal scales. In these tests, surrogate versions of the response time series are created that eliminate the interaction between temporal scales but preserve all other statistical properties. Multiplicative interaction is present when there is a significant difference between response time series and its surrogate series (e.g., Ihlen and Vereijken, 2010).
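A surrogate test of this kind can be sketched as follows. The source does not specify the surrogate scheme at this point, so the iterated amplitude-adjusted Fourier transform (IAAFT), a commonly used scheme that keeps the value distribution exactly and the power spectrum approximately, is assumed here. For brevity the test statistic is a simple proxy for intermittency (the lag-1 autocorrelation of absolute deviations) rather than a multifractal spectrum width, and the intermittent test series, the function names, and all parameter values are hypothetical.

```python
import numpy as np


def iaaft_surrogate(series, rng, n_iter=50):
    """Surrogate with the same values and approximately the same spectrum."""
    x = np.asarray(series, dtype=float)
    sorted_x = np.sort(x)
    target_amp = np.abs(np.fft.rfft(x))
    s = rng.permutation(x)
    for _ in range(n_iter):
        # impose the original amplitude spectrum, keep the current phases
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=len(x))
        # impose the original value distribution by rank ordering
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s


def abs_lag1_autocorr(series):
    """Lag-1 autocorrelation of absolute deviations (proxy statistic)."""
    a = np.abs(series - np.mean(series))
    a = a - a.mean()
    return float(np.sum(a[:-1] * a[1:]) / np.sum(a * a))


def surrogate_test(series, statistic, n_surrogates=39, seed=0):
    """One-sided rank test of the statistic against IAAFT surrogates."""
    rng = np.random.default_rng(seed)
    observed = statistic(series)
    surrogate_stats = np.array(
        [statistic(iaaft_surrogate(series, rng)) for _ in range(n_surrogates)]
    )
    p_value = (1 + np.sum(surrogate_stats >= observed)) / (n_surrogates + 1)
    return observed, surrogate_stats, float(p_value)


# Hypothetical intermittent series: noise scaled by a slowly varying amplitude
rng = np.random.default_rng(1)
n = 2048
log_amp = np.zeros(n)
for t in range(1, n):
    log_amp[t] = 0.98 * log_amp[t - 1] + 0.15 * rng.normal()
x = np.exp(log_amp) * rng.normal(size=n)

example = iaaft_surrogate(x, np.random.default_rng(2))
observed, surrogate_stats, p_value = surrogate_test(x, abs_lag1_autocorr)
print("observed statistic:", observed)
print("largest surrogate statistic:", surrogate_stats.max())
print("rank-based p-value:", p_value)
```

Because the surrogates share the heavy-tailed value distribution of the original series, a difference in the statistic cannot be attributed to the distribution alone, which is the distinction the first pitfall draws.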

Second, response time series of 1000 trials might be too short to establish the presence of multifractality. An ideal monofractal signal will have an infinite number of scales, whereas a response series of 1000 trials will only span three orders of scale (i.e., 10, 100, and 1000 trials). However, in contrast to an ideal monofractal signal, a multifractal signal has scale invariant properties only up to a maximum scale (Bacry et al., 2001). The *q*-order moments and scale-dependent measure converge into a single point on this maximum scale. Thus, in contrast to monofractal analyses, it is sufficient for multifractal analyses to include scales up to the maximum order. Assuming that the signal originates from a prototypical multifractal process, called a multiplicative cascade, the maximum scale could be assessed by analysis of the autocorrelation function (Bacry et al., 2001). Nevertheless, the estimation error of the multifractal spectra related to the number of trials in the response series will also depend on the chosen *q*-range for the methods within the Legendre formalism and on the unknown degree of multifractality. A large degree of multifractality will need a large number of trials for a robust assessment of the tails of the multifractal spectra. Consequently, multifractal analysis is quite sensitive in differentiating between monofractal and multifractal response time series, but not between response time series with a large degree of multifractality. Furthermore, multifractal analysis of moderately sized response time series will be more susceptible to both noise and non-stationarities than analysis of longer time series (Ihlen, 2013a). A possible solution is to compare the results of two or more multifractal analyses before interpreting the results. Large deviations between the results of two multifractal analyses indicate that the response time series deviates from multifractality and that the results from these analyses must be interpreted with caution.

Third, no single multifractal analysis seems to have superior performance in assessing the multifractal spectra of response time series. In previous studies, statistical methods based on wavelet transformation, like wavelet transform modulus maxima, have been shown to be superior to conventional methods based on the structure function (Muzy et al., 1993). Furthermore, both multifractal detrended fluctuation analysis and gradient modulus wavelet projection have shown superior performance to wavelet transform modulus maxima on moderately sized time series (Kantelhardt et al., 2002; Oświęcimka et al., 2006; Turiel et al., 2006). Kelty-Stephen et al. (2013) have suggested that an entropy-based analysis is the best method to assess the multifractal spectrum from response time series and that other multifractal analyses have inferior performance compared to this method using their choice of a scale-dependent measure. However, a recent systematic comparison of multifractal analyses shows that all multifractal analyses have different pros and cons and that no single analysis seems to be superior to the others (Ihlen, 2013a).

Fourth, the origin of multifractal and intermittent variation in response time series is still debated. Intermittent variation in response time has been suggested to be caused by changes in the participants' attention to stimuli or intermittent error corrections (Ihlen and Vereijken, 2010) and linked to cognitive phenomena like strong anticipation (Stephen and Dixon, 2011). Furthermore, multifractal spectra have been suggested to reflect to a greater extent the presence of self-organization and interaction-dominant dynamics compared to the outcomes of conventional monofractal analyses (Ihlen and Vereijken, 2010; Kelty-Stephen et al., 2013). The interaction-dominant view has been suggested to contrast with explicit models of an idiosyncratic mechanism in the cognitive system specific to cognitive tasks or the dynamics of particular localized components (e.g., Van Orden et al., 2003). However, idiosyncratic mechanisms for multifractal variations have been suggested for human locomotion and cardiac function, which indicates that intermittent variations can be generated by task-specific components (Ivanov et al., 1998; West and Scafetta, 2003). It is unlikely that any analysis or model will provide conclusive evidence on the generating processes of multifractal variation in response time series (Hasselman, 2013; cf. Kantz and Schreiber, 2004). The generating processes of multifractal and intermittent variation should be decided by experimentation under conditions of strong inference (Hasselman, 2013). Consequently, experimental design should be used to confirm predicted changes in the multifractal spectra. Predicted covariation between local scaling exponents of the response time series and other psychological measures will indicate a common generating process of the multifractality of these signals. As an example, intermittent changes in attention and error correction could be verified by multifractal analyses of gaze fixation and eye movements during the same cognitive task (e.g., Kelty-Stephen and Mirman, 2013).

In summary, caution should be exercised when inferring that response time series are multifractal in a strict mathematical sense. Nevertheless, the width of the multifractal spectra could still be a sensitive index of the intermittency of the response time series even though the intermittency is not prototypically multifractal. The main advantage of multifractal analyses of response time series is their ability to assess the temporal changes in their scale invariant structure. Further studies should focus on the assessment of the generating processes of multifractality by experimentation under strong inference. This might include the assessment of temporal changes in the local scaling exponent (i.e., the local structure of response time variation) in more heterogeneous and real-life experiments where the task conditions and characteristics of the stimuli change across trials. Furthermore, the correlation between the temporal changes in the structure of the response time variation and other neurophysiological and psychological measurements can be assessed through multifractal analyses by correlating the temporal change of the scaling exponents (see example in Figure 7 in Ihlen and Vereijken, 2013). Time series from different levels of the cognitive and neurophysiological system are more likely to correlate in their scale-independent structure than in their unit-dependent magnitude. Thus, multifractal analyses might provide new insight into the interaction and coordination of multiple levels of cognitive performance and human behavior.

#### **REFERENCES**


structure function approach versus the wavelet-transform modulus-maxima method. *Phys. Rev. E* 47, 875–884. doi: 10.1103/PhysRevE.47.875


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 April 2014; accepted: 27 June 2014; published online: 21 July 2014.*

*Citation: Ihlen EAF (2014) Multifractal analyses of human response time: potential pitfalls in the interpretation of results. Front. Hum. Neurosci. 8:523. doi: 10.3389/fnhum.2014.00523*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Ihlen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A theory of power laws in human reaction times: insights from an information-processing approach

#### *José M. Medina<sup>1</sup>\*, José A. Díaz<sup>1</sup> and Kenneth H. Norwich<sup>2</sup>*

*<sup>1</sup> Departamento de Óptica, Facultad de Ciencias, Universidad de Granada, Granada, Spain*

*<sup>2</sup> Department of Physics, Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON, Canada*

*\*Correspondence: jmedinaru@cofis.es*

#### *Edited by:*

*John J. Foxe, Albert Einstein College of Medicine, USA*

*Reviewed by:*

*Andreas Klaus, National Institute of Mental Health, USA*

**Keywords: human reaction time, intrinsic variability, power laws, information transfer, Piéron's law**

Human reaction time (RT) can be defined as the time elapsed from stimulus presentation until a reaction/response occurs (e.g., manual, verbal, saccadic, etc.). RT has been a fundamental measure of the sensory-motor latency at suprathreshold conditions for more than a century and is one of the hallmarks of human performance in everyday tasks (Luce, 1986; Meyer et al., 1988). Some examples are the measurement of RTs in sports science, driving safety or in aging. Under repeated experimental conditions the RT is not a constant value but fluctuates irregularly over time. Stochastic fluctuations of RTs are considered a benchmark for modeling neural latency mechanisms at a macroscopic scale (Luce, 1986; Smith and Ratcliff, 2004). Power-law behavior has been reported in at least three major types of experiments. (1) RT distributions exhibit extreme values. The probability density function (pdf) is often heavy-tailed and can lead to an asymptotic power-law distribution in the right tail (Holden et al., 2009; Moscoso del Prado Martín, 2009; Sigman et al., 2010). (2) RT variability (e.g., variance) is not bounded and usually shows a power relation with the mean, with an exponent β close to unity (Luce, 1986; Wagenmakers and Brown, 2007; Holden et al., 2009; Medina and Díaz, 2011, 2012). This relationship is a manifestation of Taylor's law (also called "fluctuation scaling") (Taylor, 1961; Eisler et al., 2008), although departures from power law have been reported (Eisler et al., 2008; Schmiedek et al., 2009). And (3), the mean RTs decay as the stimulus strength increases (Cattell, 1886), an issue that is well-described by a truncated power function written in the form of Piéron's law (Piéron, 1914, 1920; Luce, 1986):

$$t_{n+1} = t_n + \frac{d}{S^p} \tag{1}$$

Here *t*<sub>*n*+1</sub> indicates the mean RT, *S* is the stimulus strength (e.g., loudness intensity, odor concentration, etc.), *t*<sub>*n*</sub> represents the asymptotic component of the mean RT reached at very high stimulus strength, and *d* and *p* are two parameters (Luce, 1986). The sub-index *n* denotes the time step or order and it indicates a causal process: *t*<sub>*n*+1</sub> grows from the previous stage *t*<sub>*n*</sub> by an additive factor that depends on the stimulus strength *S* (Medina, 2009). The previous stage *t*<sub>*n*</sub> contains those processes at the threshold at an earlier time and *t*<sub>*n*+1</sub> in Equation (1) describes those processes at suprathreshold conditions at a later time (Norwich et al., 1989; Medina, 2009). The origin of power-law behavior in RTs has been a long-standing issue. Considerable effort has been dedicated to modeling each power relation separately. While it might be plausible that power laws in RTs could share a limited number of mechanisms, a successful unifying theory is still lacking. The ubiquity of power laws in many biological and physical systems has revealed the existence of multiple generative mechanisms (Mitzenmacher, 2004; Newman, 2005; Sornette, 2007; Frank, 2009). Research on a unifying framework that links power laws in RTs is an important issue for better understanding the emergent complex behavior of neural activity in simple decisions and in dysfunctional states.
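Equation (1) can be evaluated numerically to see how the mean RT falls toward its asymptote as the stimulus strength grows. The snippet below is a minimal sketch: the function name `pieron_mean_rt` and the parameter values are hypothetical, chosen only for illustration, and are not taken from any data set discussed here.

```python
import numpy as np

def pieron_mean_rt(S, t_n, d, p):
    """Mean RT from Equation (1): t_{n+1} = t_n + d / S**p."""
    S = np.asarray(S, dtype=float)
    return t_n + d / S**p

# Hypothetical values: asymptote t_n = 0.20 s, d = 0.10, exponent p = 0.5.
strengths = np.array([0.5, 1.0, 2.0, 8.0, 32.0])
mean_rts = pieron_mean_rt(strengths, t_n=0.20, d=0.10, p=0.5)

# Mean RT shrinks monotonically and stays above the asymptote t_n.
for S, rt in zip(strengths, mean_rts):
    print(f"S = {S:5.1f}  mean RT = {rt:.4f} s")
```

With these assumed values the mean RT decreases with every increase in *S* but never drops below the asymptotic term, which is the "truncated" character of the power function.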

We propose that type (3) power laws govern the threshold for RT; and it follows consequently that power laws govern suprathreshold fluctuations in RT. Piéron's law is valid for each sensory modality (Chocholle, 1940; Banks, 1973; Luce, 1986; Overbosch et al., 1989; Pins and Bonnet, 1996; Bonnet et al., 1999), and in both simple and choice reaction times (Schweickert et al., 1988; Pins and Bonnet, 1996). Instead of diffusion models (Luce, 1986; Smith and Ratcliff, 2004), we use elements from information theory and statistical physics as the principal conceptual tools. We also discuss random multiplicative processes as an important approach to Piéron's law and power laws in RTs.

In our information-theoretic formalism, the information entropy function *H* always expresses a measure of uncertainty within a sensory neural network. High information entropy values indicate high uncertainty and vice versa. Information is related to the drop of uncertainty (measured, e.g., in bits). It is postulated that sensory perception is not an instantaneous act but always takes time (Norwich, 1993). Initially, for a given external input signal, the sensory system encodes the stimulus efficiently and then adapts and transfers information over time. Therefore, the *H*-function depends explicitly on time to represent a continuous process of sensory adaptation (Norwich, 1993). The human RT can be re-defined as the time needed to accumulate Δ*H* bits of information after efficient encoding (Norwich et al., 1989; Norwich, 1993):

$$
\Delta H = H\left(\frac{1}{t_0}\right) - H\left(\frac{1}{t_{n+1}}\right) > 0 \tag{2}
$$

**Figure 1A** represents the entropy function *H* in Equation (2). At least two stages can be differentiated. The *H*-function evolves from a previous state of maximum uncertainty reached at the encoding time *t*<sub>0</sub>, *H*(1/*t*<sub>0</sub>), to a final adapting stage with a lower uncertainty *H*(1/*t*<sub>*n*+1</sub>) where a reaction occurs (*t*<sub>*n*+1</sub> > *t*<sub>0</sub>). Maximum production of entropy and then, a reduction of uncertainty in *H* as a function of time are concepts introduced from statistical physics, the latter as expressed by Boltzmann (Norwich, 1993). Based on an analytical model of the *H*-function (Norwich, 1993), the gain of information Δ*H* is connected with the formation of an internal threshold in Equation (1) (Norwich et al., 1989; Medina, 2009):

$$d = t_n S_0^{p} \tag{3}$$

Substituting Equation (3) into Equation (1), Piéron's law can be written as follows:

$$t_{n+1} = (b_n + 1)\, t_n, \tag{4}$$

where *b*<sub>*n*</sub> = (*S*<sub>0</sub>/*S*)<sup>*p*</sup>. The parameter *S*<sub>0</sub> represents an estimate of the internal threshold that controls the RT: an external incoming signal *S* exceeding *S*<sub>0</sub> leads to an RT response (Norwich et al., 1989). Furthermore, *S*<sub>0</sub> varies based on several factors and provides the sensitivity (1/*S*<sub>0</sub>) of the sensory system (e.g., in vision the human contrast sensitivity function) (Felipe et al., 1993; Murray and Plainis, 2003). The model of Piéron's law in Equation (4) sets a number of important properties. First property, Equation (4) indicates the existence of multiplicative interactions in a cascade between different time scales: the mean RT is expressed in terms of the asymptotic time, *t*<sub>*n*</sub>, and Piéron's law is written in multiples of the threshold *S*<sub>0</sub>. That is, we work with dimensionless ratios *S*<sub>0</sub>/*S* (Norwich, 1993). Different interpretations of the exponent *p* have been reported. *S*<sub>0</sub><sup>*p*</sup> could be interpreted as the transfer or transducer function between neurons (Copelli et al., 2002; Billock and Tsou, 2011) at the threshold. The exponent *p* usually takes non-integer values and could indicate a signature of self-organized criticality in a phase transition (Kinouchi and Copelli, 2006). Here the concept of phase transition does not deal with the classical view of different states of matter in thermodynamics (e.g., liquid vs. gas), but with different states of connectivity between neurons as modeled by branching processes (Kinouchi and Copelli, 2006). Alternatively, power functions *S*<sub>0</sub><sup>*p*</sup> can be derived from Mackay transforms (Mackay, 1963) and the exponent *p* could represent oscillatory synchronization states between neurons (Billock and Tsou, 2005, 2011). The model of Piéron's law in Equation (4) is a useful alternative approach in which optimal information transfer is related to the entropy function *H* (Norwich, 1993). Low values of *p* will promote a minimum in *H* after efficient encoding, i.e., an *Infomin* principle at the macroscopic scale (Medina, 2011, 2012).

**FIGURE 1 | (A)** Schematic representation of the information entropy function *H*(1/*t*) (in bits) as a function of the time *t* (Norwich, 1993). The transfer of information Δ*H* is defined in Equation (2) from the encoding time *t*<sub>0</sub> until a reaction occurs at *t*<sub>*n*+1</sub>. (a.u.) = arbitrary units. **(B)** Schematic representation of a model of hyperbolic growth in reaction times based on Piéron's law and analogous to Michaelis-Menten kinetics in biochemistry (i.e., the Hill equation) (Pins and Bonnet, 1996). In Michaelis-Menten kinetics, an enzyme *E* binds to a substrate *U* to form a complex *EU* that is converted into a product *D* and the enzyme *E*. In Piéron's law, those neurons tuned at the time *t*<sub>*n*</sub> are bound to those neurons that perform the formation of an internal threshold *S*<sub>0</sub> in *b*<sub>*n*</sub> = (*S*<sub>0</sub>/*S*)<sup>*p*</sup> to form the term *t*<sub>*n*</sub>*b*<sub>*n*</sub> that is converted into the product *t*<sub>*n*</sub>*b*<sub>*n*</sub> plus the time *t*<sub>*n*</sub>. Red double arrows indicate that the "reaction" is reversible whereas green single arrows indicate that the "reaction" goes only in one way.
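As a worked step, the multiplicative form in Equation (4) follows from inserting the threshold relation of Equation (3) into Equation (1). The factoring below uses only the symbols already defined and is offered as a reading aid, not as a reproduction of the original derivation:

$$
t_{n+1} = t_n + \frac{d}{S^{p}} = t_n + \frac{t_n S_0^{p}}{S^{p}} = \left[1 + \left(\frac{S_0}{S}\right)^{p}\right] t_n = (b_n + 1)\, t_n, \qquad b_n = \left(\frac{S_0}{S}\right)^{p}.
$$

Two limiting cases make the role of the threshold explicit: for $S \gg S_0$, $b_n \to 0$ and the mean RT approaches the asymptote $t_n$; for $S = S_0$, $b_n = 1$ and the mean RT equals $2t_n$.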

Second property, the threshold barrier *S*<sub>0</sub> is not a fixed static value but is unstable and fluctuates over time due to the presence of endogenous or internal noise (Faisal et al., 2008). Consequently, RTs are influenced and modified by neural noise. Therefore, Equation (4) is not deterministic and belongs to a general class of discrete-time stochastic equations that has been used in many applications such as epidemics, finance, etc. (Levy and Solomon, 1996; Sornette and Cont, 1997; Takayasu et al., 1997; Newman, 2005; Sornette, 2006). The term *b*<sub>*n*</sub> is a random and positive multiplicative factor that depends on the temporal fluctuations of *S*<sub>0</sub> and thus, on *H*. It has been demonstrated that the model of Piéron's law in Equation (4) produces type (1) power laws. RT pdfs obey a transition from a log-normal distribution into a power law in the right tail (Medina, 2012). If RTs are longer than the asymptotic term, *t*<sub>*n*</sub>, the RT pdf is distributed as a power law with an exponent γ that depends on the exponent *p* of Piéron's law (Medina, 2012): γ = 1 + *c*/*p*, *c* being a constant. Two different regimes are observed: for values *p* > 0.6 the central moments diverge and for *p* ≤ 0.6 they are finite (Medina, 2012). Therefore, long RTs compared to the asymptotic term *t*<sub>*n*</sub> are considered intermittent events over time. Their distribution is characterized by power law pdfs that might have finite or infinite variance. A cautionary note should be mentioned here. The magnitude of *p* could also depend on the metric selected for the stimulus strength *S*, and values different from the boundary *p* ≅ 0.6 might be possible. For instance, this is important when testing power law RT pdfs in color vision because an appropriate color contrast metric has not been established (Medina and Díaz, 2010).
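The stochastic reading of Equation (4) can be illustrated with a small simulation in which the threshold *S*<sub>0</sub> fluctuates from trial to trial. This is only a sketch under a loud assumption: the log-normal jitter on *S*<sub>0</sub>, the function name `simulate_rts`, and all parameter values are hypothetical and are not the noise model of Medina (2012). A single multiplicative step like this one yields right-skewed RTs bounded below by *t*<sub>*n*</sub>, but it is not claimed to reproduce the power-law tail described above.

```python
import numpy as np

def simulate_rts(n_trials, t_n, S, p, s0_median, s0_sigma, rng):
    """Draw RTs from t_{n+1} = (b_n + 1) * t_n with a fluctuating threshold S0."""
    # Assumed noise model: multiplicative log-normal jitter around a median threshold.
    s0 = s0_median * rng.lognormal(mean=0.0, sigma=s0_sigma, size=n_trials)
    b_n = (s0 / S) ** p          # random, strictly positive multiplicative factor
    return (b_n + 1.0) * t_n

rng = np.random.default_rng(0)
rts = simulate_rts(10_000, t_n=0.20, S=4.0, p=0.5,
                   s0_median=1.0, s0_sigma=1.0, rng=rng)

# Every RT exceeds the asymptote, and the mean sits above the median (right skew).
print(rts.min(), np.median(rts), rts.mean())
```

Because *b*<sub>*n*</sub> is strictly positive, no simulated RT can fall below *t*<sub>*n*</sub>, which mirrors the role of the asymptotic term as an irreducible floor.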

Third property, the reciprocal of Piéron's law is invariant under rescaling (Chater and Brown, 1999; Medina, 2009). Taking the reciprocal of the mean RT, *R* = 1/*t*<sub>*n*+1</sub>, and the reciprocal of the irreducible asymptotic term, *R*<sub>max</sub> = 1/*t*<sub>*n*</sub>, in Equation (4) gives *R* = *R*<sub>max</sub>/[1 + (*S*<sub>0</sub>/*S*)<sup>*p*</sup>]. Therefore, the reciprocal of Equation (4) defines an affine transformation over multiple time scales that can be mapped into the Naka-Rushton equation at the cellular level (Naka and Rushton, 1966) and the Michaelis-Menten equation in enzyme reactions at the sub-cellular level (Michaelis and Menten, 1913; Pins and Bonnet, 1996). This suggests that some general properties of RT patterns governed by Piéron's law could be mirrored in part in the dynamics of the Naka-Rushton equation and/or Michaelis-Menten kinetics (Medina, 2009, 2012). The Naka-Rushton equation represents a canonical form of non-linear gain control in neural responses before saturation (Albrecht and Hamilton, 1982; Billock and Tsou, 2011; Carandini and Heeger, 2012). Threshold normalization in the Naka-Rushton equation is often modeled as a pool of many neurons tuned to different stimulus properties (Heeger, 1992; Carandini and Heeger, 2012). In the Michaelis-Menten equation, the normalization factor is the Michaelis constant and indicates the substrate concentration at a reference value. The Michaelis constant is related to the substrate's affinity for the enzyme and depends on many factors (Murray, 2002). **Figure 1B** represents a schematic model of RT growth based on Piéron's law and an analogy with enzyme kinetics.
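The mapping between the reciprocal of Equation (4) and the Naka-Rushton form can be checked numerically. In the sketch below the function names and parameter values are hypothetical; the Naka-Rushton expression is written in its usual form *R*<sub>max</sub>*S*<sup>*p*</sup>/(*S*<sup>*p*</sup> + *S*<sub>0</sub><sup>*p*</sup>), with *S*<sub>0</sub> playing the role of the semi-saturation constant.

```python
import numpy as np

def reciprocal_rt(S, t_n, S0, p):
    """R = 1 / t_{n+1}, with t_{n+1} = (b_n + 1) * t_n and b_n = (S0 / S)**p."""
    b_n = (S0 / S) ** p
    return 1.0 / ((b_n + 1.0) * t_n)

def naka_rushton(S, R_max, S0, p):
    """Naka-Rushton gain control: R = R_max * S**p / (S**p + S0**p)."""
    return R_max * S**p / (S**p + S0**p)

# Hypothetical values used only to compare the two expressions.
S = np.array([0.25, 1.0, 4.0, 64.0])
t_n, S0, p = 0.20, 1.0, 0.5

same = np.allclose(reciprocal_rt(S, t_n, S0, p),
                   naka_rushton(S, 1.0 / t_n, S0, p))
print(same)
```

At *S* = *S*<sub>0</sub> both expressions give *R*<sub>max</sub>/2, so the internal threshold of Piéron's law corresponds to the half-saturation point of the gain-control curve.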

The exponent *p* of Piéron's law could be related to the scaling exponent β of the variance-mean relationship in type (2) power law. A power law relationship between variance and mean of the stimulus population has been proposed in the *H*-function (Norwich, 1993) and this relationship could be compatible with the RT variance-mean relationship in the regime around *p* > 0.6 (Medina, 2011, 2012). Alternative approaches have explored α-stable processes to relate type (1) power laws and long-range correlations (Ihlen, 2013). Tweedie exponential dispersion models are also able to describe type (2) power laws in many biological and physical processes (Eisler et al., 2008; Kendal and Jørgensen, 2011; Moshitch and Nelken, 2014). However, a connection between Piéron's law and α-stable and Tweedie models remains unknown.

In summary, maximum entropy *H* and then, adaptation over time in Equation (2) leads to a type (3) power law, Piéron's law in Equation (1). The *H*-function also explains many empirical relations of sensory perception (Norwich, 1993). An important message of the entropy function *H* is that the term *d* in Piéron's law depends explicitly on a sensory threshold *S*<sub>0</sub> through the power law in Equation (3). There is also experimental evidence that RTs and threshold-based sensitivities are mediated by common sensory processes (Felipe et al., 1993; Murray and Plainis, 2003). Therefore, temporal fluctuations at the sensory threshold *S*<sub>0</sub> affect RT fluctuations at suprathreshold conditions and this can be described by means of a simple random multiplicative process in Equation (4). The same multiplicative process produces non-Gaussian RT distributions and type (1) power law RT pdfs. The model of Piéron's law in Equation (4) also generates fractal-like behavior that extends to smaller time scales. The reciprocal of Equation (4) provides a direct link with neural gain control in single neurons as exemplified by the Naka-Rushton equation and a possible analogy with enzyme kinetics within neurons as exemplified by Michaelis-Menten kinetics.

# **REFERENCES**


identification: Piéron's law and choice reaction time. *Percept. Psychophys.* 44, 383–389. doi: 10.3758/bf03210422


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 May 2014; accepted: 24 July 2014; published online: 12 August 2014.*

*Citation: Medina JM, Díaz JA and Norwich KH (2014) A theory of power laws in human reaction times: insights from an information-processing approach. Front. Hum. Neurosci. 8:621. doi: 10.3389/fnhum.2014.00621*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Medina, Díaz and Norwich. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Spectral convergence in tapping and physiological fluctuations: coupling and independence of 1/f noise in the central and autonomic nervous systems

# **Lillian M. Rigoli, Daniel Holman, Michael J. Spivey and Christopher T. Kello\***

Cognitive and Information Sciences, University of California, Merced, CA, USA

#### **Edited by:**

José Antonio Díaz, Universidad de Granada, Spain

#### **Reviewed by:**

Klaus Linkenkaer-Hansen, Neuroscience Campus Amsterdam, Netherlands Vadim Nikulin, Charite University Hospital, Germany Gerardo Aquino, Imperial College London, UK

#### **\*Correspondence:**

Christopher T. Kello, Cognitive and Information Sciences, University of California, 5200 N. Lake Road, Merced, CA 95343, USA e-mail: ckello@ucmerced.edu

When humans perform a response task or timing task repeatedly, fluctuations in measures of timing from one action to the next exhibit long-range correlations known as 1/f noise. The origins of 1/f noise in timing have been debated for over 20 years, with one common explanation serving as a default: humans are composed of physiological processes throughout the brain and body that operate over a wide range of timescales, and these processes combine to be expressed as a general source of 1/f noise. To test this explanation, the present study investigated the coupling vs. independence of 1/f noise in timing deviations, key-press durations, pupil dilations, and heartbeat intervals while tapping to an audiovisual metronome. All four dependent measures exhibited clear 1/f noise, regardless of whether tapping was synchronized or syncopated. 1/f spectra for timing deviations were found to match those for key-press durations on an individual basis, and 1/f spectra for pupil dilations matched those in heartbeat intervals. Results indicate a complex, multiscale relationship among 1/f noises arising from common sources, such as those arising from timing functions vs. those arising from autonomic nervous system (ANS) functions. Results also provide further evidence against the default hypothesis that 1/f noise in human timing is just the additive combination of processes throughout the brain and body. Our findings are better accommodated by theories of complexity matching that begin to formalize multiscale coordination as a foundation of human behavior.

**Keywords: complexity matching, long-range correlations, interdependent coordination, tapping, spectral analysis**

# **INTRODUCTION**

All behaviors of biological organisms can be viewed as phenomena of coordination, including human behaviors. Neurons work together to create temporal patterns of neural activity, and those patterns play important roles in motor activities leading to overt human behaviors. Likewise, behaviors result in changes to sensory and proprioceptive inputs that affect patterns of neural activity. Thus coordination happens amongst the components of brains and bodies, and also between brains, bodies, and their environments.

Perhaps the most fundamental expression of coordination in human behavior is found in the relative timing of events. Movements of the eyes must be timed relative to those of the hands to draw a picture (Huette et al., 2013); movements of hands must be timed relative to those of the vocal tract to gesture during speech (Kelly et al., 2010); and movements of the legs must be timed with movement of the ball in soccer (Bartlett et al., 2007), just to name a few examples. These are all exquisite phenomena of timing and coordination, but it is often useful to study simpler cases to formulate basic principles and theories.

From this perspective, one of the simplest cases of coordination occurs when brief, individual behaviors are timed in direct relation to clearly delineated events in the environment—stepping on the gas or brake pedal in response to a traffic light, for instance, or working on an assembly line to perform the same action repeatedly on each widget being transported along a conveyor belt. The experimental analogs to these illustrative examples are simple response times to individual stimuli, and tapping in time with a metronome. These experimental paradigms have been used in thousands of studies, with response times figuring most prominently in experimental psychology (Holden et al., 2009), and tapping in motor control (Rosenbaum, 2009).

Despite the intimate relationship between timing and coordination, theoretical approaches to response times and tapping times do not usually refer to coordination. Instead, response times are usually studied in terms of information processing, where time is theorized to reflect the number of processing steps needed to identify each stimulus, and then choose and prepare a response (Sternberg, 1998). Tapping times are usually studied in terms of timing mechanisms (Georgopoulos, 2000), where questions focus on the nature of internal clocks and their associated neural machinery.

#### **THE PUZZLE OF 1/f NOISE**

Studies of information processing and clocks have led to many advances in theories of cognitive and neural processes for decades, and this progress is likely to continue for some time. However, a general property of response times and tapping times has been established over the years, and it is not easily explained within these paradigms. Responses and taps vary from one to the next, even when there is no overt change in stimuli, the metronome, or any other task conditions. This is not surprising in itself, given that humans are not robots or machines in any traditional sense of the words. One should expect a certain amount of imprecision in human timing that could be described as "noise".

The puzzling property concerns the kind of noise observed in human timing, and many other fluctuations in biological and complex systems. A default assumption of most statistical models used in experimental psychology is that noise (i.e., error variance) in repeated measures is Gaussian and uncorrelated. The term "uncorrelated" means that current and previous measured values of noise provide no information about future measurements. This simplifying assumption is useful for statistical models, but we know it is incorrect because memory is universal to all human and biological systems—not memory as storage of information, but the more general sense that system states are conditioned on their past, and therefore carry some of their history forward in time.

The memory inherent to human and biological systems suggests that noise in human timing should be correlated in some way. For instance, homeostatic systems are often expected to exhibit negative correlations in their fluctuations, as a result of negative feedback. When synchronizing to a metronome, if one tap falls behind the beat, the next tap would be adjusted earlier in time, and vice versa. One can measure such negative feedback by taking a time series of tap intervals, and a copy of the same time series where values are shifted backward in time, and then correlating the time series with itself at different lags. Such *autocorrelation* analyses show evidence to support the presence of negative correlations in tapping data (Wing and Kristofferson, 1973), but negative correlations account for only a small amount of the noise variance. Most of the noise in human response times and tapping times exhibits positive auto-correlations (Pressing and Jolley-Rogers, 1997).
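The lagged self-correlation described above can be written in a few lines. This is a minimal sketch using NumPy; the function name `autocorrelation` and the toy alternating series are hypothetical, and the estimator shown (biased, normalized by the lag-0 sum of squares) is one common choice rather than the specific procedure of the cited studies.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (lag 0 equals 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                     # remove the mean before correlating
    denom = np.dot(x, x)                 # lag-0 sum of squares
    return np.array([np.dot(x[:x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Toy series that over-corrects on every tap: late, early, late, early, ...
alternating = np.tile([1.0, -1.0], 50)
acf = autocorrelation(alternating, max_lag=3)
print(acf)
```

For the alternating toy series the lag-1 value is strongly negative, which is the signature of tap-to-tap error correction; positive values at many lags would instead indicate the persistence discussed next.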

In general, positive auto-correlations can be understood in terms of hysteresis: simply put, systems are sluggish to change their states. For instance, if a response is relatively fast on one trial, then whatever system conditions caused the fast deviation will still be in play on the next trial, at least to some degree. This principle can explain positive auto-correlations in general, but it is the specific pattern of auto-correlations that is puzzling. In particular, they decay slowly as an *inverse power law* function of increasing lag, *C*(*k*) ∼ 1/*k*<sup>λ</sup>, where *C*() is the auto-correlation function, *k* is the integer lag, and λ is the power law exponent. This kind of positive auto-correlation is known as *long-range correlation* because the power law indicates that *all* past states play a role in determining any given current state. It is also known as *1/f noise* because the auto-correlation function can be expressed in the frequency domain as a relation between spectral power and frequency, *P*(*f*) ∼ 1/*f*<sup>α</sup>, where *P*() is the spectral function, *f* is frequency, and α is the power law exponent. Exponents estimated from timing data have varied across studies and conditions, but most estimates have been near 1.
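The spectral exponent α can be estimated as the negative slope of a straight-line fit to the periodogram in log-log coordinates. The sketch below is illustrative only: the function names are hypothetical, and the test signal is synthesized with fixed amplitudes *f*<sup>−α/2</sup> and random phases, an assumption that makes the recovered exponent essentially exact. Real timing data would scatter around the fitted line, and published analyses typically add windowing, averaging, or a restricted frequency range.

```python
import numpy as np

def synth_power_law_noise(n, alpha, rng):
    """Signal whose periodogram follows 1/f**alpha (fixed amplitudes, random phases)."""
    freqs = np.fft.rfftfreq(n)                    # cycles per sample
    amp = np.zeros_like(freqs)
    amp[1:] = freqs[1:] ** (-alpha / 2.0)         # power = amp**2 ~ 1/f**alpha; DC set to 0
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.size)
    return np.fft.irfft(amp * np.exp(1j * phases), n=n)

def spectral_exponent(x):
    """Estimate alpha in P(f) ~ 1/f**alpha by a log-log least-squares fit."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size)
    keep = slice(1, -1)                           # drop the DC and Nyquist bins
    slope, _ = np.polyfit(np.log10(freqs[keep]), np.log10(power[keep]), 1)
    return -slope

rng = np.random.default_rng(1)
series = synth_power_law_noise(1024, alpha=1.0, rng=rng)
alpha_hat = spectral_exponent(series)
print(round(alpha_hat, 3))
```

An estimate near 0 would indicate uncorrelated (white) noise and an estimate near 2 a random walk, so values near 1 sit between the two.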

The presence of 1/f noise has been reported in many studies of human response times (Van Orden et al., 2003) and tapping times (Ding et al., 2002), as well as other repeated measures of human behavior (Gilden, 2001) and neural systems (Allegrini et al., 2009). These reports have stirred up much debate. Some of this debate has concerned the veracity of findings (Farrell et al., 2006), with opponents arguing that observed auto-correlations actually may not be long-range but short-range instead (i.e., fall off exponentially with lag, instead of an inverse power law function). However, recent studies have compared these two statistical models and found 1/f noise to better account for the data (Gilden, 2009).
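The difference between the two candidate models is easiest to see at large lags. The following sketch compares an exponentially decaying autocorrelation with an inverse power law; the decay parameter 0.9 and the exponent 0.5 are arbitrary illustrative choices, not estimates from any of the cited studies.

```python
import numpy as np

lags = np.array([1.0, 10.0, 100.0, 1000.0])
short_range = 0.9 ** lags        # exponential decay, as in a short-range model
long_range = lags ** -0.5        # inverse power law, as in a long-range model

for k, s, l in zip(lags, short_range, long_range):
    print(f"lag {int(k):5d}  exponential {s:.2e}  power law {l:.2e}")
```

At short lags the two curves can look similar, but at lag 100 and beyond the exponential model has effectively vanished while the power law still predicts a measurable correlation, which is why long time series are needed to tell the accounts apart.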

Accepting that the noise in human timing follows a 1/f scaling relation, most of the debate has focused on theoretical explanations. One reason for debating 1/f noise is that the theoretical constructs of clocks and information processing yield no ready insights in and of themselves. Certainly one can add mechanisms to information processing models and clock models that exhibit 1/f noise, and this has been done (Torre and Wagenmakers, 2009). Perhaps the most general mechanism thus far has been *strategy shifting* (Diebold and Inoue, 2001), whereby a perturbation is added to each response time or tapping time that reflects discrete shifts among distinct plateaus in response strategies. The varying duration of these plateaus, and non-stationarity of shifting among them, has been shown to yield 1/f noise under certain parameterizations.

One problem with strategy shifting and similar accounts is that they appear *post hoc*, in that they do not provide principled answers as to why information processing and clock models would include such processes. Another problem is that such domain-specific processes are difficult to generalize to other repeated measures of human activity that exhibit 1/f noise, such as speech acoustics (Kello et al., 2008) and affect ratings (Delignières et al., 2004), and repeated measures of human neural activity as well (Linkenkaer-Hansen et al., 2001). We believe that progress will continue to be made by improving and expanding domain-specific accounts related to clocks and information processing, but here we investigate two domain-general accounts aimed at the broader range of 1/f phenomena in human and biological systems.

#### **TWO DOMAIN-GENERAL ACCOUNTS OF 1/f NOISE**

The generality and ubiquity of 1/f noise has led some researchers to formulate two classes of domain-general explanations: a *process summation* account and an *interdependent coordination* account. The process summation account is based on sums of processes across various timescales, and the interdependent coordination account is based on the interdependence of processes necessary for coordination. Here we describe each in turn, and then present an experiment designed to test these alternative accounts of 1/f noise in human timing and physiology.

Regarding the process summation account, a 1/f-like signal can be created by sampling from three or more uncorrelated noises generated over different timescales and amplitudes (with timescale inversely related to amplitude), and summing the samples together (Wagenmakers et al., 2004). A 1/f signal can be similarly created by summing independent processes whose exponential decay rates span a range of timescales (Granger, 1980). In either case, the signal will be 1/f-like only within the range of timescales sampled. Then again, 1/f noise in human behavior can be observed only within a limited range of timescales due to limits on measurement (Van Orden et al., 2005).
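The summation idea can be sketched by adding independent first-order autoregressive processes whose time constants are spread over several orders of magnitude. The code below is an illustration under stated assumptions: the function names are hypothetical, each component is given unit variance, and the time constants (1, 10, 100, and 1000 samples) are arbitrary. It is not the specific procedure of Wagenmakers et al. (2004) or Granger (1980).

```python
import numpy as np

def ar1(n, tau, rng):
    """Unit-variance AR(1) series whose autocorrelation decays as exp(-k / tau)."""
    phi = np.exp(-1.0 / tau)
    shocks = rng.normal(scale=np.sqrt(1.0 - phi**2), size=n)
    x = np.empty(n)
    x[0] = rng.normal()                  # start from the stationary distribution
    for i in range(1, n):
        x[i] = phi * x[i - 1] + shocks[i]
    return x

def summed_processes(n, taus, rng):
    """Sum of independent AR(1) processes, one per time constant in taus."""
    return sum(ar1(n, tau, rng) for tau in taus)

rng = np.random.default_rng(0)
series = summed_processes(4096, taus=[1.0, 10.0, 100.0, 1000.0], rng=rng)

# Each component alone has short-range memory; the sum is persistent across lags.
lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]
print(round(lag1, 3))
```

Each component on its own has an exponentially decaying autocorrelation, yet the sum behaves like a slowly decaying, 1/f-like signal within the range of time constants included, and reverts to short-range behavior outside that range.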

The human brain and body are composed of processes that unfold over a wide range of timescales, from fast ion channel dynamics to slower changes in neurotransmitters, cardiovascular, and various homeostatic processes, and even slower changes in hormones, circadian dynamics, and developmental processes in general (Bassingthwaighte et al., 1994). A similar claim can be made regarding cognitive processes, from the millisecond dynamics of perception, to the waxing and waning of attention that may span seconds to minutes, to processes of decision-making, planning, and learning that may span anywhere from seconds to years (Ward, 2002). It seems quite plausible that any measurement of human behavior may be influenced by any combination of these ongoing processes. If the magnitude of influence (i.e., amplitude) is generally inversely related to timescale, then one would expect these processes to sum up to 1/f noise in repeated measurements of response times, tapping times, and any other measure of human behavior.

The interdependent coordination account is based on interactions among system components, rather than summations of independent processes. The coordination of behavioral activity requires interactions among whatever components and events are being coordinated. The same is true for neural and physiological activities, the difference being that the components and events are different and reside on shorter spatial and temporal scales. We can say further that interdependencies among system components must strike a balance between too much and too little coupling as a result of interactions (Kello and Van Orden, 2009). Too much coupling would result in interlocked patterns of activity that are unable to differentiate or adapt to changes in conditions. Too little coupling would fail to support the emergence of coordination patterns that extend in space and time. Instead, adaptive systems exhibiting coordination need loosely coupled components that support the formation of many different potential patterns of activity.

The balance of coupling and its relationship to pattern formation has been formalized in statistical mechanics in terms of *metastability* (Kelso, 1995), and the dynamics of interactions that underlie metastability have been shown to produce 1/f noise (Usher et al., 1995). Metastability appears to be a useful property for biological and behavioral systems in general, because it endows them with an ability to respond and adapt to their ever-changing conditions (Sasaki et al., 2007; Pinder et al., 2012). On this account, 1/f noise reflects fluctuations across multiple timescales that result from patterns being organized and reorganized across multiple timescales. Thus 1/f noise should be a general property of any metastable system, including human systems involved in response times and tapping times.

This approach to 1/f noise and other power laws in nature was made famous by models of *self-organized criticality* (SOC; Bak et al., 1987). The ubiquity of power laws in nature, like 1/f noise, led physicists to hypothesize that *critical points* may be common attractors of complex systems in nature (Bak, 1996). Critical points are associated with (second-order) phase transitions in systems of many interdependent elements, where dynamics take on unique properties of memory and symmetry-breaking (Stanley, 1987). Original models of SOC were criticized as models of human behavior because they more closely resemble models of avalanches, forest fires, and other physical complex systems (Wagenmakers et al., 2005). However, a large body of work has shown how SOC may be a functional principle of neural networks and other physiological networks (see Kello, 2013).

The *interdependent coordination* account is similar to the *process summation* account, in that both provide a rationale for the ubiquity of 1/f noise. However, they make different predictions when it comes to taking multiple repeated measurements of human behavior. Kello et al. highlighted this distinction between accounts by measuring two aspects of key-press dynamics (Kello et al., 2007). Participants made repeated simple responses to series of visual cues, and both response times and key-press durations were recorded. A key-press duration is the very brief period of time (∼100–150 ms) that a key remains in contact with its sensor for a normal, ballistic keystroke. Both domain-general accounts predict 1/f noise in both time series of measurements.

The accounts differ in whether they predict the *same* 1/f signal to appear in each time series, or whether distinct 1/f signals may arise from simultaneous yet distinct measures of behavior. The process summation account predicts the same 1/f signal because key-press response times and durations should draw from roughly the same set of summed processes, especially at the larger timescales (e.g., waxing and waning of attention and circadian rhythms). The reasons are that the two measurements are inextricably paired for each keystroke, are produced by overlapping sets of muscles, and effectively occur at the same time relative to the timescales of 1/f noise spanning dozens and hundreds of responses. It is difficult to hypothesize how these measurements could tap into distinct sets of component processes spanning the same timescales as 1/f noise. By contrast, interdependent coordination holds that any given system or subsystem can exhibit 1/f noise on its own, or in coupling with other systems. The reason is that interdependence can hold for components at all scales, and criticality can create dynamics with long-range memory (i.e., correlations) for any given subsystem. In other words, 1/f noise is hypothesized to pervade the heterogeneous networks of interacting processes in human systems.

Results from four experiments reported by Kello et al. (2007) showed that key-press response times and durations were independent of each other, in terms of exhibiting 1/f noises that were uncorrelated with each other, and also in terms of independently manipulating the 1/f noise in response times while leaving key-press durations unaffected. The authors argued that the data provided evidence against the process summation account, but were consistent with the interdependent coordination account. However, a subsequent reanalysis of these data indicated more subtle, nonlinear relationships between the time series (Moscoso del Prado Martín, 2011). Thus while the process summation account is called into question, more experiments and analyses are needed to investigate the nature of coupling and independence among simultaneous measures of 1/f noises (e.g., Kello et al., 2008).

# **COMPLEXITY MATCHING AS MEASURED BY SPECTRAL CONVERGENCE**

The present experiment and analyses were designed to further investigate the nature of 1/f noise in human behavior by measuring coupling based on a recently formulated theoretical principle known as *complexity matching* (West et al., 2008; Aquino et al., 2010, 2011). Theoretical analyses using statistical mechanics have shown that, when two complex systems become coupled, there is a maximal rate of information exchange between them when their power laws converge. This formalization of coupling is different from more standard measures like synchronization. Complexity matching between two signals does not refer to phase relations; instead, it refers to convergence of the two power spectra. Thus coupling in terms of complexity matching means that each system retains its own distinct phase dynamics, yet the systems affect the statistical character of each other's dynamics. This effect is equivalent to an exchange of information between two given systems, in the sense of mutual information.

Complexity matching is a theoretical construct general to all complex systems, but it has already garnered some empirical support in studies of dyadic coordination, which can be viewed in terms of informational coupling between two human complex systems. Marmelat and Delignières (2012) conducted an experiment in which each participant in a dyad swung a hand-held pendulum, with instructions to swing in synchrony. Synchronization is a direct phase relation, but deviations from synchrony were analyzed for 1/f noise. Results showed that the spectral shape of 1/f noise for each member of a dyad converged to the extent that coupling was facilitated by visual and physical contact. This convergence could not be explained in terms of simple phase relations because there were no reliable cross-correlations in the time series of deviations from synchrony. Other more recent experiments showed the same basic effect, but in the speech signals of dyads engaged in conversation (Fusaroli et al., 2013; Abney et al., 2014).

Dyadic coordination is one example of two interacting systems, but as we discussed at the outset, humans are composed of many components across many scales that must coordinate in order to function. Complexity matching suggests that the coordination of two subsystems in a single individual may manifest as a convergence in their 1/f noise spectra when repeated measures are taken. Evidence for spectral convergence in 1/f noise would provide further evidence against the process summation account, provided that this convergence was not simply a product of correlated time series. The present experiment tested this hypothesis by measuring tapping deviations and key-press durations while either synchronizing or syncopating with a metronome. Previous studies have shown slightly stronger 1/f noise in timing deviations during syncopation (Chen et al., 2001), so we varied tapping between synchronization and syncopation to test whether an effect on timing deviations would dissociate from an effect on key-press durations.

To further investigate coupling in terms of complexity matching, we wanted to compare 1/f noise in key-presses with other fluctuations in physiological activity that either were or were not responsive to the metronome. For the former, we presented a flash of light with each auditory beat of the metronome, and measured fluctuations in the pupil dilation response across audiovisual beats of the metronome. In the synchronization condition, pupil and key-press responses occurred to the same stimuli, and roughly at the same time. If this co-occurrence leads to coordination between the neural and physiological systems underlying key-press and pupil responses, then we should observe coupling in their 1/f noise signals. However, reflexive pupil dilation is coordinated by the autonomic nervous system (ANS), whereas learned motor responses are coordinated by the central nervous system (CNS). These two physiological systems may not measurably be coupled when the body is at rest, as it is while sitting quietly during a tapping task.

For physiological fluctuations that were not responsive to the metronome, we measured heartbeat intervals. Resting heart rate should not be driven by the negligible effort required to execute each key-press. Yet healthy heartbeat intervals are known to exhibit 1/f noise (Peng et al., 1995), and both heart rate and pupil dilation are known to be coordinated by the ANS. Therefore, if the ANS is not driven by the tapping task, then we expect 1/f noise in pupil responses and heartbeat intervals to be coupled with each other, but independent of 1/f noise in key-press durations and timing deviations in tapping. The latter should be coupled with each other through the CNS and the tapping task.

# **EXPERIMENTAL METHODS**

### **Participants**

Thirteen female and 13 male UC Merced students 18–30 years of age participated in the experiment for course credit or as volunteers. All reported having normal hearing and either normal or corrected vision. Four participants were left-handed. Data from two participants were removed due to equipment malfunction.

### **Apparatus**

Pupil dilations were recorded using an Eye-Link II head mounted video-based eye-tracker (SR Research Ltd.) with a temporal resolution of 500 Hz and a spatial resolution of 0.025°. The eye-tracker uses two infrared LEDs mounted on the headband to illuminate each eye, and pupil dilations were recorded from whichever eye had the more accurate track. ECG samples were recorded at 250 Hz using a Zephyr™ Bioharness 3 (Zephyr Technology, Auckland, New Zealand) fastened around each participant as a chest belt. Taps were recorded using a keyboard and MAX 6 (Cycling 74) experiment software. The audiovisual metronome was presented using a 22-inch ThinkVision LCD monitor with 1280 × 1024 resolution, and Koss over-the-ear headphones. The metronome consisted of a 200 ms tone played at a loud but comfortable volume, synchronized with the display of a white circle for 200 ms with 25 cm diameter on a blank screen viewed from a 60 cm distance. A moderate level of light in the room was held constant across all participants.

### **Procedure**

Participants were instructed to sit quietly for 10 min at the beginning of the experiment to allow the heart to settle to its resting rate. Participants were instructed how to fasten the Bioharness 3 to themselves, and the eye-tracker was calibrated using the standard nine-point calibration method. Participants were randomly assigned to either the synchronization condition (i.e., tapping in-phase with the metronome beats) or syncopation condition (i.e., tapping in between the metronome beats). Participants were instructed that they would see and hear a metronome beat presented at a constant, comfortable pace, and that they should tap the spacebar either in time with the beat or in between the beats. They were also instructed to keep their eyes fixated on the screen for the duration of the experiment. Each participant tapped to 1100 beats, presented at a constant 800 ms interbeat interval. This interval was set to be within the range of the healthy resting heart rate for young adults, and also to allow for 1100 beats to be administered in about 15 min (1100 × 0.8 s = 880 s).

### **Data pre-processing**

The keyboard and heart rate apparatus directly produced time series of key-press durations and heartbeat intervals. Timing deviations were computed by subtracting each key-press time from its corresponding metronome beat. The interval timing of beats was known with high precision, but the phase of the metronome relative to key-press times was estimated for each participant. Any error in this estimate was constant across each time series of key-presses, and therefore not a factor.

The eye-tracker produced a sampled time series of pupil size that did not demarcate pupil responses to the flashes of light. However, pupil dilation responses could be seen as a clear waveform that rose and fell with roughly the same frequency as the metronome. We wrote a simple signal processing algorithm that found each peak value and trough value of the waveform. The algorithm iterated through the sampled time series from beginning to end, and determined "peak periods" and "trough periods" relative to prior minima and maxima. Each peak period started when the signal rose 100 units (approximately 5 µm per unit) above the previous minimum, and ended when the signal fell 100 units below its current peak value. Trough periods were defined conversely, and minima below half the previous maximum were discarded to remove eye blinks. The algorithm produced one time series of peak values and a corresponding time series of trough values for each participant. Analyses showed no qualitative difference in results between peak and trough time series, so here we report only analyses for peak dilation values.
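The peak and trough rule described above can be sketched as a small hysteresis detector. This is a hedged reconstruction from the prose, not the authors' code: the function name `find_peaks_troughs` is hypothetical, the 100-unit threshold comes from the text, and two details are assumptions, namely that the minimum preceding the first peak is kept as a trough and that a discarded blink simply leaves a gap in the trough series.

```python
def find_peaks_troughs(signal, delta=100.0):
    """Return (peaks, troughs) found by a rise/fall threshold of `delta` units.

    Assumption: `signal` is a sequence of pupil-size samples in tracker units.
    """
    peaks, troughs = [], []
    cur_min = cur_max = signal[0]
    in_peak = False            # True while inside a "peak period"
    last_peak = None           # most recent accepted peak value
    for v in signal[1:]:
        if in_peak:
            if v > cur_max:
                cur_max = v                    # still climbing: update running peak
            elif v <= cur_max - delta:
                peaks.append(cur_max)          # fell `delta` below the peak: period ends
                last_peak = cur_max
                in_peak = False
                cur_min = v
        else:
            if v < cur_min:
                cur_min = v                    # still falling: update running minimum
            elif v >= cur_min + delta:
                # Rose `delta` above the minimum: trough period ends.
                # Minima below half the previous maximum are treated as blinks.
                if last_peak is None or cur_min >= 0.5 * last_peak:
                    troughs.append(cur_min)
                in_peak = True
                cur_max = v
    return peaks, troughs

# Toy trace: three dilation peaks, with a blink-like drop to 300 after the second.
trace = [1000, 1200, 1500, 1300, 1100, 1000, 1250, 1600, 1400, 300, 1400, 1650, 1200]
print(find_peaks_troughs(trace))
```

Because the detector only commits to a peak after the signal has fallen back by the threshold, small sample-to-sample jitter near the top of the waveform does not create spurious extra peaks.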

The same trimming procedure was applied to all four time series for each participant: values more than 2.5 standard deviations above or below the mean were removed. Then, if the remaining time series was shorter than 1024 measurements, it was padded with mean values to reach a length of 1024. If the remaining time series was longer than 1024, equal numbers of values were trimmed from the beginning and end to reach 1024 (with an extra value trimmed at the start for odd numbers).
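As a minimal sketch of this trimming step, assuming NumPy and a hypothetical function name `trim_to_length`: the text does not say where the mean-valued padding was placed, so appending it at the end is an assumption.

```python
import numpy as np

def trim_to_length(series, n_target=1024, z_cut=2.5):
    """Remove outliers beyond `z_cut` SDs, then pad or trim to `n_target` values."""
    x = np.asarray(series, dtype=float)
    z = (x - x.mean()) / x.std()
    x = x[np.abs(z) <= z_cut]                  # drop values beyond +/- 2.5 SD
    if len(x) < n_target:
        # Assumption: padding with the mean is appended at the end of the series.
        x = np.concatenate([x, np.full(n_target - len(x), x.mean())])
    elif len(x) > n_target:
        excess = len(x) - n_target
        n_start = (excess + 1) // 2            # extra value comes off the start if odd
        n_end = excess // 2
        x = x[n_start:len(x) - n_end]
    return x

print(len(trim_to_length(np.arange(1100))))    # 1024
```

A length of 1024 is a power of two, which yields 512 non-zero frequencies and therefore divides evenly into nine octave-wide bins in the spectral analysis that follows.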

# **RESULTS**

Each individual time series was submitted to spectral analysis, and each resulting spectrum was logarithmically binned to create nine estimates of spectral power in nine evenly spaced frequency bins on a logarithmic scale (see also Thornton and Gilden, 2005). Logarithmic binning ensures that the same amount of data goes into each power estimate, and it also facilitates our spectral matching analyses reported below.
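The binning step might look like the following sketch, which assumes a plain FFT periodogram and averages power within each logarithmic bin. The function name `log_binned_spectrum`, the use of the mean within a bin, and the geometric-mean bin frequency are illustrative assumptions rather than details given in the text.

```python
import numpy as np

def log_binned_spectrum(x, n_bins=9):
    """Periodogram averaged into `n_bins` bins evenly spaced in log frequency.

    Assumes the series is long enough that no bin is empty (true for 1024
    samples and nine bins, where the bins are roughly one octave wide).
    """
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2 / len(x)
    k = np.arange(1, len(power))               # harmonic numbers, DC term excluded
    power = power[1:]
    freqs = k / len(x)                         # cycles per sample
    # Position of each harmonic on a log axis scaled to run from 0 to n_bins.
    pos = n_bins * np.log(k) / np.log(k[-1])
    idx = np.minimum(pos.astype(int), n_bins - 1)
    bin_freq = np.array([np.exp(np.mean(np.log(freqs[idx == b]))) for b in range(n_bins)])
    bin_power = np.array([power[idx == b].mean() for b in range(n_bins)])
    return bin_freq, bin_power

rng = np.random.default_rng(0)
bin_freq, bin_power = log_binned_spectrum(rng.standard_normal(1024))
print(len(bin_freq), len(bin_power))           # 9 9
```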

Mean spectra are plotted in **Figure 1** for each of the four dependent measures, separated by synchronization vs. syncopation. The graphs show that fluctuations for all measures in both conditions followed a clear 1/f scaling relation. 1/f exponents were estimated by fitting regression lines (and reversing their signs to account for the inverse relationship) to spectra for individual participants: mean values combining the two metronome conditions were 0.76 for timing deviations, 0.83 for key-press durations, 0.90 for pupil dilations, and 0.81 for heartbeat intervals, where 1.0 is ideal 1/f noise. Estimated exponents for the synchronization condition were not reliably different from the syncopation condition—all *t*-tests were within-subjects and had 12−1 = 11 degrees of freedom, and all yielded *t*-values less than 1, *t*(11) < 1. Thus we did not replicate a previous study showing larger 1/f exponents for timing deviations when syncopating vs. synchronizing to a metronome (Chen et al., 2001). However, we found a trend in this direction (0.72 vs. 0.80, respectively), and we used an audiovisual metronome whereas the previous study used an audio-only metronome. Timing with pulsed visual signals is known to be less accurate than for audio signals (Chen et al., 2002), which might explain the small difference between our results and previous results (but see Hove et al., 2010). In any case, because there were no reliable effects of metronome condition, we combined them in subsequent analyses.
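The exponent estimate described above amounts to a straight-line fit in log-log coordinates with the sign of the slope reversed. A minimal sketch, assuming NumPy and a hypothetical function name `one_over_f_exponent`:

```python
import numpy as np

def one_over_f_exponent(bin_freq, bin_power):
    """Negated slope of log10(power) regressed on log10(frequency)."""
    slope, _intercept = np.polyfit(np.log10(bin_freq), np.log10(bin_power), 1)
    return -slope                              # sign reversed: power falls as 1/f^alpha

# Synthetic check with an exact power law of exponent 0.8 over nine octaves.
freqs = 2.0 ** np.arange(-9, 0)
print(round(one_over_f_exponent(freqs, freqs ** -0.8), 3))   # 0.8
```

On this scale an exponent of 0 corresponds to white noise and 1.0 to ideal 1/f noise, so the reported means of 0.76 to 0.90 sit close to the ideal value.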

To test for coupling among 1/f noises, we used a measure of spectral convergence as an expression of complexity matching. In particular, log power estimates were subtracted per frequency bin for two given signals *a* and *b*, and the sum of their absolute values served as our measure of spectral convergence:

$$C_{a,b} = \sum_{f} \left|\log\left(\mathcal{S}_{f,a}\right) - \log\left(\mathcal{S}_{f,b}\right)\right|.$$

Smaller values corresponded with more similar (i.e., convergent) spectra. A measure of convergence was chosen over correlation of estimated 1/f exponents because the former is sensitive to idiosyncrasies in the individual 1/f-like spectra that converge towards 1/f in the average. Individual signals were compared because convergence is hypothesized to occur for the motor processes and ANS functions within individuals, as products of coordination, rather than across individuals. We also compared spectral convergence with cross-correlation to test whether coupling could be explained in terms of linear phase relations. **Figures 2** and **3** each show two example signals from one participant in the syncopation condition, along with two of the corresponding cross-correlation functions and two pairs of spectra to visualize their differences.
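The convergence measure is a sum of absolute log differences over frequency bins, which can be written in a few lines. In this sketch the function name `spectral_convergence` is hypothetical, and base-10 logarithms are an assumption because the text does not state the base.

```python
import numpy as np

def spectral_convergence(power_a, power_b):
    """Sum over bins of |log S_a - log S_b|; smaller means more similar spectra."""
    log_a = np.log10(np.asarray(power_a, dtype=float))   # assumption: base-10 logs
    log_b = np.log10(np.asarray(power_b, dtype=float))
    return float(np.sum(np.abs(log_a - log_b)))

# Two toy three-bin spectra differing by one decade and two decades in two bins.
print(spectral_convergence([1, 10, 100], [10, 10, 1]))   # 3.0
```

Unlike a correlation between fitted exponents, this quantity responds to bin-by-bin departures from a straight line, which is the property the text gives as the reason for choosing it.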

These measures of coupling cannot be interpreted without a baseline for comparison. With regards to spectral convergence, a spectral difference of zero is the absolute maximal similarity, but this measure does not have an inherent value or formula corresponding to chance similarity. We created baselines from surrogate pairings between signals from different participants. In particular, for each original comparison between time series *A* and *B*, a corresponding mean surrogate coupling was created by pairing each original time series with all *other* participants, i.e., 23 surrogate comparisons with *A* and 23 with *B*. Spectral convergence values were averaged for each set of 46 surrogate pairings to create a single baseline control for each pairing.
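The surrogate baseline can be sketched as follows, assuming the binned spectra for two measures are stored as arrays with one row per participant. The names `convergence` and `original_and_baseline` and the array layout are illustrative assumptions, as is the base-10 logarithm.

```python
import numpy as np

def convergence(spec_a, spec_b):
    """Sum of absolute log10 differences between two binned spectra."""
    return float(np.sum(np.abs(np.log10(spec_a) - np.log10(spec_b))))

def original_and_baseline(spectra_a, spectra_b, i):
    """Original coupling for participant `i` and its mean surrogate baseline.

    spectra_a, spectra_b: arrays of shape (n_participants, n_bins).
    Surrogates pair participant i's A with every other participant's B,
    and every other participant's A with participant i's B.
    """
    original = convergence(spectra_a[i], spectra_b[i])
    surrogates = []
    for j in range(len(spectra_a)):
        if j == i:
            continue
        surrogates.append(convergence(spectra_a[i], spectra_b[j]))
        surrogates.append(convergence(spectra_a[j], spectra_b[i]))
    return original, float(np.mean(surrogates)), len(surrogates)

# With 24 participants each original pairing has 23 + 23 = 46 surrogates.
flat = np.ones((24, 9))
print(original_and_baseline(flat, flat, 0)[2])           # 46
```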

Comparisons between original and surrogate coupling values showed a clear and consistent pattern of results: there was reliable spectral coupling between key-press timing deviations and key-press durations, and also between peak pupil dilations and heartbeat intervals. However, there were no reliable couplings across key-press and ANS measures. To examine the two observed effects of spectral convergence, **Figure 4** plots the absolute log differences as a function of frequency, averaged for originals and baseline controls. Differences are generally greater in the lower frequencies, which appears to be attributable to more overall variability in spectral power relative to higher frequencies. Aside from this effect, original pairings are seen to be more similar to each other (i.e., smaller differences) compared with controls across the range of measured frequencies, indicative of coupling across timescales.

Statistical reliability of coupling was assessed using paired-samples *t*-tests with *C<sub>a,b</sub>* values for original pairings vs. their yoked controls. Spectra for timing deviations were reliably more similar to those for key-press durations compared with baseline controls, *t*(23) = 2.52, *p* < 0.01, and the same was true for pupil dilations and heartbeat intervals, *t*(23) = 2.18, *p* < 0.05. No other comparisons for spectral convergence approached significance, all *t*(23) < 1. Mean *C<sub>a,b</sub>* values (with standard errors) for non-significant comparisons were the following for originals and controls, respectively: 0.83 (0.06) and 0.81 (0.02) for timing deviations × pupil dilations, 0.81 (0.05) and 0.81 (0.02) for timing deviations × heartbeat intervals, 0.88 (0.07) and 0.87 (0.03) for key-press durations × pupil dilations, and 0.87 (0.04) and 0.87 (0.02) for key-press durations × heartbeat intervals. Altogether, these tests show clear evidence of spectral convergence for key-press activity and for ANS activity, but not between the two.
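For reference, the paired-samples *t* statistic used in such comparisons can be computed directly from the per-participant differences. The sketch below assumes NumPy, uses a hypothetical function name `paired_t`, and runs on made-up toy numbers rather than the study's data; it returns only the statistic and degrees of freedom, not a *p*-value.

```python
import numpy as np

def paired_t(controls, originals):
    """Paired-samples t statistic and degrees of freedom for controls - originals.

    A positive t means originals are smaller (spectra more similar) than controls.
    """
    d = np.asarray(controls, dtype=float) - np.asarray(originals, dtype=float)
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # mean difference over its standard error
    return float(t), n - 1

# Toy values only; 24 participants would give 23 degrees of freedom as reported.
t_value, dof = paired_t([2, 2, 5, 5], [1, 2, 3, 4])
print(round(t_value, 3), dof)                     # 2.449 3
```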

Spectral convergence is not sensitive to the phase relation between the signals because phase information is discarded by spectral analysis. However, it is possible that phase relations played a role in the observed effects because two highly correlated signals (i.e., strong linear phase relation) will also have highly similar spectra. Here we show that signals may appear to be phase related, but that further analyses reveal these relations to be purely spectral in nature. Linear cross-correlation is perhaps the simplest and most common type of phase analysis, which tests for phase relations across the range of available lags. Given that we did not know a priori at what lag signals might be related, we simply took the peak *negative* correlation as a measure of maximal phase coupling. Preliminary analyses showed that peak negative correlations were slightly stronger than peak positive correlations, although both were weak: mean magnitudes varied within the small range of 0.15–0.21 across pairwise comparisons, at mean lags from about 140–340 beats apart. We report results with peak negative correlations, but results were not qualitatively different from peak positive correlations.
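A peak negative cross-correlation of this kind can be sketched by scanning lags and keeping the most negative Pearson correlation. The function name `peak_negative_xcorr`, the default lag range of half the series length, and the choice to correlate only the overlapping segments at each lag are assumptions; the text does not specify these details.

```python
import numpy as np

def peak_negative_xcorr(a, b, max_lag=None):
    """Most negative Pearson correlation between a and b over a range of lags.

    At lag L the pairs are (a[t + L], b[t]), using only the overlapping samples.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    if max_lag is None:
        max_lag = n // 2                       # assumption: scan up to half the series
    best_r, best_lag = np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:n - lag]
        else:
            x, y = a[:n + lag], b[-lag:]
        r = np.corrcoef(x, y)[0, 1]
        if r < best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag

# Toy check: b is a sign-flipped copy of a, offset by five samples.
base = np.random.default_rng(0).standard_normal(205)
r, lag = peak_negative_xcorr(base[5:], -base[:200], max_lag=50)
print(round(r, 3), lag)                        # -1.0 -5
```

Taking the extreme value over many lags biases the result away from zero even for unrelated series, which is why the comparison against a baseline matters for interpreting it.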

**FIGURE 2 | Time series of peak key-press response times and durations for one participant (above), where the x-axis is the sequence of over 1000 key-presses, and the y-axis is normalized times and durations with 0 mean and showing +/−2.5 standard deviations**. Corresponding cross-correlation function and spectra are shown below. The red dashed circle shows the peak negative correlation, and the dashed lines between spectra show absolute log differences.

**function and spectra**. The red dashed circle shows the peak negative correlation, and the red dashed lines between spectra show absolute log differences.

We conducted the same baseline control analysis as for spectral convergence, and we found the same pattern of effects: peaks were significantly more negative between timing deviations and key-press durations compared with baseline controls, *t*(23) = 1.82, *p* < 0.05, and the same was true for pupil dilations and heartbeat intervals, *t*(23) = 2.18, *p* < 0.05. And again, no other comparisons for peak cross-correlation approached significance, all *t*(23) < 1.93, *p* < 0.05. A summary of the correlational and spectral coupling results is shown in **Table 1**, which contains mean differences between coupling measures for original pairings minus baseline controls, for all pairwise comparisons. The table shows that, for the two reliable comparisons, differences from baseline were proportionally greater for spectral coupling than for correlational coupling.

**Table 1 | Mean correlational (top) and spectral (bottom) coupling effects for all pairwise comparisons between the four dependent measures (TD = Timing Deviations, KD = Key-Press Durations, PD = Pupil Dilations)**.

Statistically reliable effects are in bold, and the signs are reversed for correlational effects for consistent interpretation with spectral effects.

These results suggest that simple linear phase relations may have contributed to the observed effects of spectral convergence, but it is curious that peak lags were so far apart. We do not know what type of phase coupling would explain phase relations offset by 2–4 min and well over 100 responses. An alternate possibility is that effects of spectral convergence can lead to spurious phase coupling when measured using our surrogate baseline analysis. We tested this alternative by using iterated amplitude adapted Fourier transform (IAAFT; Theiler et al., 1992; Schreiber and Schmitz, 1996), which scrambles phase relations in a given time series while preserving its spectral properties. If there is truly phase coupling, then cross-correlations for original comparisons should be stronger than those for the corresponding scrambled time series. Each surrogate pair had one original time series and one scrambled time series, and each original time series was paired with 100 scrambled series. We used paired-samples *t*-tests to compare each original peak cross-correlation with the mean of its corresponding surrogate set.
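The IAAFT procedure referred to above alternates between imposing the original amplitude spectrum and restoring the original value distribution. The following is a minimal sketch of that general algorithm, assuming NumPy; the function name `iaaft`, the fixed iteration count, and the random starting shuffle are illustrative choices, not details reported in the text.

```python
import numpy as np

def iaaft(x, n_iter=100, seed=None):
    """Phase-scrambled surrogate of x with the same values and a similar spectrum."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    target_amp = np.abs(np.fft.rfft(x))        # amplitude spectrum to preserve
    sorted_x = np.sort(x)                      # value distribution to preserve
    s = rng.permutation(x)                     # start from a random shuffle
    for _ in range(n_iter):
        # Step 1: keep the current phases but impose the target amplitudes.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=len(x))
        # Step 2: rank-order remap so the surrogate holds exactly x's values.
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s

# Example on a toy random-walk series.
series = np.cumsum(np.random.default_rng(0).standard_normal(256))
surrogate = iaaft(series, seed=1)
print(np.array_equal(np.sort(surrogate), np.sort(series)))   # True
```

Calling such a function 100 times with different seeds would produce a set of scrambled series like the one paired with each original series in the analysis.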

Results from the IAAFT surrogate analysis revealed that there was no reliable linear phase coupling among any of the four dependent measures, as measured by peak cross-correlations. Surrogates were no different from originals for timing deviations and key-press durations, *t*(23) = 1.5, *p* > 0.14, nor for pupil dilations and heartbeat intervals, *t*(23) = 1.4, *p* > 0.17. The remaining comparisons were all near *t*(23) ∼ 1.4 or less. The IAAFT surrogate analysis provides additional evidence that the observed couplings in key-press responses and in measures of ANS functions (but not between the two) were expressed in terms of their power law spectral distributions, and not their phase relations (see also Kello et al., 2007).

# **DISCUSSION**

The aim of the present experiment was to add to the body of evidence on the origins of 1/f noise in human timing, particularly with respect to two domain-general explanations. We employed a standard tapping task with synchronization and syncopation conditions, and we measured deviations in timing from a metronome. Our contribution was to record and analyze three additional repeated measures that varied in their relationship to timing deviations and the metronome. All four measures exhibited clear 1/f noise, consistent with previous studies suggesting that 1/f noise will manifest for any repeated measure of human behavior that is minimally perturbed and minimally constrained from one measurement to the next (Kello et al., 2010).

Our goal in eliciting these 1/f signals was to examine the relationships among them, as a way to test and elaborate upon the process summation vs. interdependent coordination accounts. The process summation account has served as a default explanation for many researchers over the years, in part because repeated measures of human timing and other behaviors might plausibly "pick up" on fluctuations in physiological and cognitive processes ranging across spatial and temporal scales of the brain and body. However, the idea of process summation has been called into question by a number of recent results. Our findings cast further doubt on this account because all four dependent measures exhibited distinct 1/f signals in terms of their phase relations: none were reliably cross-correlated relative to IAAFT controls. These findings are difficult to explain under process summation because at least some of these measures should pick up on the same summation of processes, which should result in reliable near-lag zero correlations. This is not what we found.

One could argue that each of our dependent measures tapped into a (mostly) distinct set of processes that each summed to produce distinct 1/f noises. However, while the 1/f signals had mostly unique phase profiles, their spectra were not fully distinct. Instead we found that spectra converged for timing deviations and key-press durations, and separately for pupil dilations and heartbeat intervals. These results indicate that 1/f fluctuations in different aspects of key-presses were coordinated across timescales, and likewise for ANS activity.

Interdependent coordination is in a better position to accommodate these results. We started with the basic premise that human timing is part and parcel with coordination, and that coordination requires a balanced, flexible coupling among whatever components are being coordinated. Flexible coupling is hypothesized to support the *soft-assembly* of sensorimotor function, and other types of biological and cognitive functions (Kello and Van Orden, 2009). A defining feature of highly adaptive systems is that their components can play multiple functional roles depending on context. In order to take on these different roles, components need to fall into different interdependent relationships under different conditions.

It is challenging to understand how biological and cognitive systems are so flexible. One valid and necessary approach is to study very particular examples and develop domain-specific theories to explain them. For instance, there are specific mechanisms of plasticity that re-organize sensorimotor maps in prism adaptation studies (Redding et al., 2005) or amputation cases (Sanes and Donoghue, 2000). However, it is equally valid and necessary to study basic principles from which many or even all mechanisms of sensorimotor function draw their flexibility. Metastability is one such principle that explicitly predicts 1/f noise to be a pervasive feature of systems of interdependent components poised near critical points. Theories of SOC have been formulated to explain why critical points appear to be so common to complex systems.

Metastability can explain 1/f noise in all four dependent measures, and it is consistent with the finding that two and only two pairs of these measures were coupled. However, the concept of metastability alone does not explain how spectral coupling can occur across timescales distinct from any phase coupling, nor does it explain the particular couplings of dependent measures that were observed. To explain the particular couplings observed, we will ultimately need domain-specific theories of manual sensorimotor control, and ANS function. For now, we can say that timing deviations and key-press durations measured two aspects of key-press dynamics that were coupled by the tapping task, and that coupling between pupil dilation and heartbeat intervals is "hard-wired" into the ANS. Moreover, tapping to a metronome while at rest did not enforce any physiological or informational demands on coupling between key-presses and the ANS. We conjecture that these systems would couple under more strenuous conditions, such as a sport with intense hand-eye coordination.

Finally, to explain spectral coupling across timescales, we refer to formal analyses of complexity matching that show maximal information exchange between complex systems with convergent power laws, yet distinct phase portraits. It is reasonable to assume that coordination is facilitated by maximal information exchange, and that key-press responses require information exchange among neural and motor processes involved in depressing and releasing the key on each response. It is also reasonable to assume that information must be exchanged among components of the ANS. As mentioned in the Introduction section, we do not mean information exchange in the sense of sending bits between components and subsystems. Instead we mean that components rely on each other to support sensorimotor and physiological functions (Kello and Van Orden, 2009). These functions are inherently multiscale, and hence the mutual interdependence that underlies them must span a range of spatial and temporal scales. Computational models based in metastability, such as critical branching networks (Kello, 2013), are needed to express formal theories of complexity matching in terms of neural, sensorimotor, and cognitive functions.

# **ACKNOWLEDGMENTS**

The experiment reported herein was approved by the UC Merced Institutional Review Board, and conformed to the board's regulatory standards. We thank the reviewers for their helpful comments and suggestions for additional analyses.

# **REFERENCES**


Bassingthwaighte, J. B., Liebovitch, L. S., and West, B. J. (1994). *Fractal Physiology.* New York: Oxford University Press.


Long-range correlations and their breakdown with disease. *J. Electrocardiol.* 28, 59–65. doi: 10.1016/s0022-0736(95)80017-4


Rosenbaum, D. A. (2009). *Human Motor Control.* San Diego, CA: Academic Press.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 May 2014; accepted: 26 August 2014; published online: 11 September 2014*.

*Citation: Rigoli LM, Holman D, Spivey MJ and Kello CT (2014) Spectral convergence in tapping and physiological fluctuations: coupling and independence of 1/f noise in the central and autonomic nervous systems. Front. Hum. Neurosci. 8:713. doi: 10.3389/fnhum.2014.00713*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Rigoli, Holman, Spivey and Kello. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# What does scalar timing tell us about neural dynamics?

#### *Harel Z. Shouval<sup>1</sup>\*, Marshall G. Hussain Shuler<sup>2</sup>, Animesh Agarwal<sup>1,3</sup> and Jeffrey P. Gavornik<sup>4</sup>*

*<sup>1</sup> Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA*

*<sup>2</sup> Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA*

*<sup>3</sup> Department of Biomedical Engineering, The University of Texas at Austin, Austin, TX, USA*

*<sup>4</sup> Department of Brain and Cognitive Sciences, The Picower Institute of Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA*

#### *Edited by:*

*Willy Wong, University of Toronto, Canada*

#### *Reviewed by:*

*José M. Medina, Universidad de Granada, Spain; Willy Wong, University of Toronto, Canada; Marc Howard, Boston University, USA*

#### *\*Correspondence:*

*Harel Z. Shouval, Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, 6431 Fannin St., Suite MSB 7.046, Houston, TX 77030, USA*

*e-mail: harel.shouval@uth.tmc.edu*

The "Scalar Timing Law," which is a temporal domain generalization of the well-known Weber's law, states that the errors in estimating temporal intervals scale linearly with the durations of the intervals. Linear scaling has been studied extensively in human and animal models and holds over several orders of magnitude, though to date there is no agreed-upon explanation for its physiological basis. Starting from the assumption that behavioral variability stems from neural variability, this work shows how to derive firing rate functions that are consistent with scalar timing. We show that firing rate functions with a *log-power* form, and a set of parameters that depend on spike count statistics, can account for scalar timing. Our derivation depends on a linear approximation, but we use simulations to validate the theory and show that *log-power* firing rate functions result in scalar timing over a large range of times and parameters. Simulation results match the predictions of our model, though our initial formulation results in a slight bias toward overestimation that can be corrected using a simple iterative approach to learn a decision threshold.

**Keywords: scalar timing, Weber's law, temporal intervals, temporal coding, neural dynamics**

# **1. INTRODUCTION**

Errors estimating the intensity of a stimulus commonly scale linearly with the magnitude of the stimulus. This relationship, called *Weber's Law*, has proven to be a surprisingly general property of the brain that accurately describes perception across sensory modalities (Weber, 1843; Coren et al., 1984). We have previously used basic principles to argue that this scaling naturally emerges if neural processes representing stimulus magnitudes have tuning curves with a specific mathematical form, and that the generality of the law implies that this is a fundamental organizing principle of neural computation (Shouval et al., 2013).

An analog of Weber's law in the temporal domain, called *linear scaling* or *scalar timing*, states that errors estimating temporal intervals scale linearly with the duration of the intervals (Gibbon, 1977; Church, 2003). Temporal perception has been extensively studied by psychologists and neuroscientists for over 150 years, starting in the 1860s with Fechner (Fechner, 1966), leading to considerable knowledge about the behavioral aspects of temporal perception. Much less is known, however, about the underlying neural substrate responsible for engendering observed timing behavior.

Over the years many theories have been proposed to account for scalar timing. The scalar expectancy theory (Gibbon, 1977; Church, 2003) is based on a counter and accumulator model, conceptually similar to counting the ticks of a mechanical clock, and variability arises from comparison errors with remembered reference values. Another class of models assumes an ensemble of neurons oscillating at different frequencies, and timing is produced by decision neurons which become active only when a precise set of the oscillating neurons are coactive (Matell and Meck, 2000). These models are akin to a Fourier transform of the desired temporal response profile. Variability stems from the addition of stochastic noise to the ensemble dynamics, and it has recently been shown analytically that general addition of noise at various levels in the model can result in scalar timing (Oprisan and Buhusi, 2011, 2014). Note that these models are currently derived based on continuous dynamical systems, not spiking neural models. Drift diffusion models have also been proposed to provide a mechanistic basis of interval timing, though with spike statistics that are inconsistent with scalar timing. Recent derivations of this model, with drift that is driven by opponent inhibitory and excitatory processes, can account for scalar timing (Rivest and Bengio, 2011; Simen et al., 2011). A final class of models, including this work, assumes that timing is derived from the state of dynamic neural responses. For example, time can be estimated from the threshold crossing of a decaying neural response (Staddon et al., 1999), or from a precisely designed set of leaky integrators (Shankar and Howard, 2012).

Here, by extending our earlier analysis (Shouval et al., 2013) to the temporal domain, we explore the relationship between neural dynamics and temporal perception and propose a theory of scalar timing based on experimentally verifiable physiological processes. Our approach is based on the assumption that estimates of a temporal interval vary on a trial-by-trial basis due to spike count variability (Dean, 1981; Tolhurst et al., 1983; Churchland et al., 2010). Although our analysis is mathematically homologous to the intensity variable case, the physiological substrates of time and intensity estimation are quite different. Here we show that neural processes with activity levels that dynamically progress with log-power temporal profiles can account for scalar timing, in much the same way that our previous work showed that Weber's law results from log-power tuning curves. Our analysis is based on a linear approximation, with results that are less precise than we found when analyzing intensity variables. Though the log-power model does produce scalar timing, our initial formulation of the model with the linear approximation results in a small bias toward underestimation and slightly less variability than found in simulations. In this paper we also derive a discrete approximation, which is applicable only in the temporal domain, that precisely estimates simulated neural variability. We also demonstrate that the bias is a consequence of our initial threshold selection criterion and can be easily eliminated with a simple algorithm that learns the correct threshold value to accurately decode desired intervals.

# **2. METHODS AND RESULTS**

We start with the assumption that the brain uses the temporal evolution of neural activity, which progresses with predictable stochastic dynamics, to estimate intervals. Specifically, we assume that some stimulus at time *t* = 0 initiates a neural process (describing either a single neuron or, more likely, a neural ensemble) that is characterized by a spike rate function *r*(*t*) (see **Figures 1B,C**), which either increases (Roitman and Shadlen, 2002) or decreases (Shuler and Bear, 2006) monotonically. The average spike count within a window τ is:

$$R(t) = \int_{t-\tau/2}^{t+\tau/2} r(t')\,dt' \tag{1}$$

which can be approximated as *R*(*t*) ≈ τ*r*(*t*) for small τ values. As illustrated in **Figure 2**, the temporal interval described by this process is defined as the time required for *R*(*t*) to reach some threshold *R*<sub>0</sub>, which can be set as *R*<sub>0</sub> = *r*(*t*<sub>tar</sub>) for a target time *t*<sub>tar</sub>. Noise-driven fluctuations of *r*(*t*) result in variability of trial-by-trial estimates, *t*<sub>est</sub>, of the encoded target time.

In this framework there is a direct relationship between the magnitude of temporal estimate errors (which can be easily recorded using standard psychophysical methods) and spike count statistics that can be used to infer a mathematical form of *r*(*t*), and thus the underlying physiology, subject to the linear constraint on estimate errors as a function of *t*<sub>tar</sub> specified by the scalar timing law. A simple linear approximation (illustrated in **Figure 1A**) of this relationship between the interval estimate and spike count has the form:

$$\sigma_t(t_{\rm est}) \approx \frac{\sigma_R(t_{\rm tar})}{|R'(t_{\rm tar})|} = \frac{\sigma_R(t_{\rm tar})}{|\tau \, r'(t_{\rm tar})|} \tag{2}$$

where σ<sub>*R*</sub>(*t*) is the standard deviation of the average spike count at time *t*, and *R*′(*t*<sub>tar</sub>) is the derivative of the spike count curve with respect to time, evaluated at the target time. Note that σ<sub>*t*</sub>(*t*<sub>est</sub>) is the standard deviation of the estimate of the time *t* over many trials.

The scalar law states that errors estimating *t* scale linearly with *t*. Using standard deviation as the error measure:

$$
\sigma_t(t) = \alpha \cdot t \tag{3}
$$

where α specifies the slope of linear scaling, equivalent to the "Weber fraction."

Combining Equations 2 and 3:

$$
\tau \frac{dr(t)}{dt} = \frac{dR(t)}{dt} = \pm \frac{\sigma_R(t)}{\alpha t} \tag{4}
$$

where the + sign is valid when the slope of *r*(*t*) is positive and the − sign when it is negative.

We assume that spike count variability can be characterized using a power-law model with the form:

$$
\sigma_R(t) = \beta \left( \tau r(t) \right)^{\rho} \tag{5}
$$

where the parameters β and ρ specify the specific noise model. This power-law model can account for many forms of spike-count variability. For example, ρ = 1/2 and β = 1 result in Poisson noise, and the ρ = 0 case is the constant noise case, which means spike count variability does not depend on the spike count. Experimentally, spike count variability is found to be close to Poisson and often with somewhat larger variability than Poisson (ρ ≳ 1/2). Although the power-law noise is a relatively general model, it obviously cannot account for all forms of noise.

Applying this form to Equation 4, we obtain a differential equation relating the neural firing rate to specific noise and estimate error models:

$$\frac{dr(t)}{dt} = \pm \left(\frac{\beta \tau^{\rho - 1}}{\alpha}\right) \frac{r(t)^{\rho}}{t} \tag{6}$$

The solution of Equation 6 has a *log-power* form:

$$r(t) = K \cdot \left(\pm \log(t/t_0)\right)^n \tag{7}$$

where *K* = (1/τ)(β(1 − ρ)/α)<sup>*n*</sup>, and *n* = 1/(1 − ρ). This relationship holds whether *r*(*t*) rises (*t* ≥ *t*<sub>0</sub>, "+" case) or falls (*t* < *t*<sub>0</sub>, "−" case) monotonically. The integration constant *t*<sub>0</sub> has a simple interpretation: it is the minimal (or maximal) time that can be estimated using this specific monotonically increasing (or decreasing) firing rate function (as shown in **Figures 1B,C** for different values of *t*<sub>0</sub>). Note that all the parameters of the *log-power* function are determined by measurable spike statistics and behavioral performance; none of them are free parameters.
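For readers who want the intermediate step, the integration that takes Equation 6 to Equation 7 can be written out as follows. This is a sketch for ρ < 1, and it assumes the boundary condition *r*(*t*<sub>0</sub>) = 0, which is implied by the form of Equation 7 rather than stated explicitly in the text.

$$
\begin{aligned}
r^{-\rho}\,dr &= \pm \frac{\beta \tau^{\rho-1}}{\alpha}\,\frac{dt}{t} \\
\frac{r(t)^{1-\rho}}{1-\rho} &= \pm \frac{\beta \tau^{\rho-1}}{\alpha}\,\log(t/t_0) \\
r(t) &= \left[\frac{\beta(1-\rho)\,\tau^{\rho-1}}{\alpha}\right]^{\frac{1}{1-\rho}} \left(\pm\log(t/t_0)\right)^{\frac{1}{1-\rho}}
= \frac{1}{\tau}\left(\frac{\beta(1-\rho)}{\alpha}\right)^{n} \left(\pm\log(t/t_0)\right)^{n}
\end{aligned}
$$

The last equality uses $\tau^{(\rho-1)n} = \tau^{-1}$ for $n = 1/(1-\rho)$.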

The specific shape of the general *log-power* form depends primarily on the spike count statistics. In the constant noise case (ρ = 0) this equation reduces to Fechner's law (Fechner, 1966). Hence, Fechner's law can be seen as making an implicit constant noise assumption. In the special and unrealistic case of proportional noise (ρ = 1) a power-law solution is obtained (Stevens, 1961).
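As a numerical sanity check on the derivation, the following minimal Python sketch evaluates the *log-power* rate and confirms that, under the linear approximation and the power-law noise model, the predicted ratio σ<sub>*t*</sub>/*t* equals α. The function names, the finite-difference step `h`, and all parameter values are illustrative assumptions and do not come from the original work.

```python
import math

def log_power_rate(t, t0, alpha, beta=1.0, rho=0.5, tau=0.1, increasing=False):
    """Log-power firing rate r(t) = K * (+/- log(t / t0))**n, valid for rho < 1."""
    n = 1.0 / (1.0 - rho)
    K = (beta * (1.0 - rho) / alpha) ** n / tau
    sign = 1.0 if increasing else -1.0
    arg = sign * math.log(t / t0)
    if arg < 0:
        raise ValueError("t lies outside the range covered by this branch")
    return K * arg ** n

def predicted_weber_fraction(t, t0, alpha, beta=1.0, rho=0.5, tau=0.1,
                             increasing=False, h=1e-6):
    """sigma_t / t from the linear approximation with power-law count noise."""
    def r(s):
        return log_power_rate(s, t0, alpha, beta, rho, tau, increasing)
    slope = (r(t + h) - r(t - h)) / (2.0 * h)   # central difference for r'(t)
    sigma_R = beta * (tau * r(t)) ** rho        # power-law spike count noise
    sigma_t = sigma_R / abs(tau * slope)        # linear approximation
    return sigma_t / t

if __name__ == "__main__":
    # Poisson-like noise (rho = 1/2), decreasing branch, assumed alpha = 0.1.
    for t in (0.5, 2.0, 8.0):
        print(t, predicted_weber_fraction(t, t0=10.0, alpha=0.1))
```

Setting `rho=0.0` gives the logarithmic (constant noise) case with *n* = 1, and the same check applies to the increasing branch with `increasing=True` and *t* > *t*<sub>0</sub>.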

Experimental recordings are often characterized by a nearly linear relationship between mean spike count and variance (Dean, 1981; Tolhurst et al., 1983; Churchland et al., 2010). In the near-Poisson case (ρ = 1/2), the *log-power* form has an exponent of *n* = 2 (note, the examples in **Figures 1B,C** assume Poisson statistics). We have previously shown for the case of magnitude estimation that Weber's law can be based on the tuning curves of either single neurons or neural ensembles (Shouval et al., 2013). Similarly here, while scalar timing can result if a single neuron's time-varying activity follows a *log-power* function, it is more likely to arise from the combined activity of a heterogeneous population of neurons whose collective activity has the appropriate form.

**FIGURE 1 | Scalar timing and neural statistics. (A)** A local linear approximation (green line, Equation 2) of the average firing rate *R*(*t*) (real distribution shown schematically by the gradient as a function of *t*, mean and standard deviations indicated by dashed-white and solid red lines) together with the scalar timing law leads to Equation 4, the solution of which (Equation 7 for the case of Poisson noise) is the firing rate curve *r*(*t*). Note, *R*′ is the slope of the linear approximation to *R*(*t*). **(B,C)** Example firing rate curves with Poisson spike statistics for different values of the integration constant *t*<sub>0</sub>. **(B)** Increasing solutions are defined above minimal values at *t*<sub>0</sub>. **(C)** Decreasing solutions are defined below maximal values at *t*<sub>0</sub>.

**FIGURE 2 | Temporal interval estimation. (A)** A stimulus (at time *t* = 0) initiates a neural process with a mean firing rate (black line, determined by linear approximation theory) that decreases with time. In each trial the actual number of spikes varies stochastically; three trial-by-trial examples of the spike count variable are shown by the colored lines. The time estimate in each trial is determined by the first threshold crossing (*R*<sub>T</sub>, horizontal dashed line) of the spike count variable. The actual estimated time for one trial (*t*<sub>est</sub>) is shown in comparison to the target time (*t*<sub>tar</sub>). **(B)** The mean time predicted by the model (*t*<sub>est</sub>, averaged over 200 trials) as a function of the target time. Blue circles based on simulations, red circles using discrete approximation. **(C)** The standard deviation of the time estimate (σ<sub>*T*</sub>) as a function of the mean predicted interval. **(D)** When rescaled by the mean estimated time (values specified by the color-code shown in the legend), the cumulative distributions of the actual response times overlap and are statistically indistinguishable (KS-test). These distributions were generated using Poisson statistics, a decreasing *log-power* function, *t*<sub>0</sub> = 10 and τ = 0.1 s.

To test the validity of our theory, we simulated a stochastic neural process with a monotonically falling spike rate in time (the increasing case, not discussed, is similar). Specifically, as per the derivations above, simulations were performed by generating spikes using a non-homogeneous Poisson process with a firing-rate parameter that decreased as a *log-power* function of time. Firing rates were determined by convolving the resultant spike trains with a square window of width τ = 100 ms. The estimate of the temporal interval (*t*<sub>est</sub>) was defined, on a per-trial basis, as the time at which the firing rate first reached threshold (*R*<sub>0</sub> set to *r*(*t*<sub>tar</sub>)). The results of these simulations are shown in **Figures 2B,C**. The mean value of *t*<sub>est</sub> is close to, but a bit shorter than, that predicted by the linear approximation theory (**Figure 2B**, blue circles), and the standard deviation is a linear function of the mean estimated time (**Figure 2C**, blue circles), although with a slope lower than that predicted by the linear approximation. Nevertheless, the rescaled distributions are almost completely overlapping (**Figure 2D**) and paired Kolmogorov-Smirnov tests find that the differences between these distributions are not statistically significant (although small differences may emerge with longer time spans, larger α values, or more trials). This shows that the *log-power* firing rate function indeed produces scalar timing, but that the theory described above results in an overestimate of error and a small bias of the mean.
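A simulation of this kind can be sketched in a few lines of standard-library Python. The sketch below is not the authors' code: the step size `dt`, the Weber fraction α = 0.1, the centred square window, the Knuth-style Poisson sampler, and all function names are assumptions made for illustration, while *t*<sub>0</sub> = 10 and τ = 0.1 s follow the values quoted in the caption of Figure 2.

```python
import math
import random
import statistics
from collections import deque

def log_power_rate(t, t0=10.0, alpha=0.1, tau=0.1):
    """Decreasing log-power rate for Poisson noise (rho = 1/2, beta = 1, n = 2)."""
    K = (0.5 / alpha) ** 2 / tau
    return K * math.log(t0 / t) ** 2 if t < t0 else 0.0

def poisson_sample(mean, rng):
    """Knuth's multiplicative method; adequate for the modest means used here."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def estimate_interval(t_tar, rng, t0=10.0, alpha=0.1, tau=0.1, dt=0.005):
    """Time at which the windowed spike count first falls to the threshold."""
    width = round(tau / dt)                          # window length in steps
    threshold = tau * log_power_rate(t_tar, t0, alpha, tau)
    window = deque(maxlen=width)
    total = 0
    for k in range(int(t0 / dt)):
        mid = (k + 0.5) * dt                         # rate taken at the step centre
        count = poisson_sample(log_power_rate(mid, t0, alpha, tau) * dt, rng)
        if len(window) == width:
            total -= window[0]                       # oldest step leaves the window
        window.append(count)
        total += count
        if len(window) == width and total <= threshold:
            return (k + 1) * dt - tau / 2            # centre of the current window
    return t0                                        # no crossing before t0

def simulate(t_tar, trials=200, seed=0):
    """Mean and standard deviation of the estimates over repeated trials."""
    rng = random.Random(seed)
    estimates = [estimate_interval(t_tar, rng) for _ in range(trials)]
    return statistics.mean(estimates), statistics.stdev(estimates)

if __name__ == "__main__":
    for t_tar in (1.0, 2.0, 4.0):
        mean, sd = simulate(t_tar)
        print(t_tar, round(mean, 3), round(sd, 3), round(sd / mean, 3))
```

Printing the ratio of standard deviation to mean for several target times gives a quick way to inspect whether the estimates scale approximately linearly.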

It is possible to obtain a discrete approximation that better captures the simulation results. This approximation is obtained by dividing time into non-overlapping time bins of length τ, such that the average spike count within a time bin (designated by the integer *i*) is *R*<sub>*i*</sub> ≈ τ · *r*((*i* − 0.5) · τ). Under the assumptions that the bins are non-overlapping and have no significant temporal correlations, the spike counts in each bin are conditionally independent. Then, for a given spike generation model, the probability of a threshold crossing within a time bin as time unfolds is:

$$P_c(i|R_0) = \sum_{n \le R_0} P_s(n|R_i), \tag{8}$$

where *P*<sub>s</sub>(*n*|*R*) is the probability of emitting *n* spikes given the mean spike count *R* (note that this formulation assumes a decreasing function *r*(*t*); for the increasing function the sum is over *n* ≥ *R*<sub>0</sub>). The probability that the first threshold crossing occurs in time bin *j* is

$$P_{\rm FC}(j|R_0) = P_c(j|R_0) \prod_{i=1}^{j-1} \left(1 - P_c(i|R_0)\right). \tag{9}$$

This distribution can be used to calculate the mean, *t*<sub>est</sub>, and standard deviation, σ<sub>*T*</sub>, of elapsed time estimates. Results of these calculations (**Figures 2B,C**, red circles) agree closely with the numerical simulation results. The small discrepancy between this calculation and the actual simulations arises from partitioning time into non-overlapping time bins. Coarse-grained simulations in which threshold crossings are allowed only at these discrete points agree perfectly with the results of this discrete approximation.
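The discrete approximation lends itself to a direct calculation. The sketch below evaluates the per-bin crossing probability and the first-crossing distribution for Poisson counts and a decreasing *log-power* rate. Assigning each bin its centre time (*i* − 0.5)τ, the value α = 0.1, and the function names are assumptions of this illustration.

```python
import math

def mean_count(i, t0=10.0, alpha=0.1, tau=0.1):
    """Mean spike count R_i in bin i for a decreasing log-power rate (n = 2)."""
    t = (i - 0.5) * tau
    return (0.5 / alpha) ** 2 * math.log(t0 / t) ** 2   # equals tau * r(t)

def poisson_cdf(k, mean):
    """P(N <= k) for a Poisson count, summed in log space to avoid overflow."""
    if mean <= 0.0:
        return 1.0
    log_mean = math.log(mean)
    total = sum(math.exp(n * log_mean - mean - math.lgamma(n + 1))
                for n in range(int(math.floor(k)) + 1))
    return min(total, 1.0)

def first_crossing_distribution(t_tar, t0=10.0, alpha=0.1, tau=0.1):
    """List of (bin centre time, probability that the first crossing is there)."""
    threshold = (0.5 / alpha) ** 2 * math.log(t0 / t_tar) ** 2   # tau * r(t_tar)
    survive = 1.0                        # probability of no crossing so far
    dist = []
    for i in range(1, round(t0 / tau) + 1):
        p_cross = poisson_cdf(threshold, mean_count(i, t0, alpha, tau))
        dist.append(((i - 0.5) * tau, survive * p_cross))
        survive *= 1.0 - p_cross
    return dist

def mean_and_sd(dist):
    """Mean and standard deviation of the first-crossing time."""
    total = sum(p for _, p in dist)
    mean = sum(t * p for t, p in dist) / total
    var = sum((t - mean) ** 2 * p for t, p in dist) / total
    return mean, math.sqrt(var)

if __name__ == "__main__":
    for t_tar in (1.0, 2.0, 4.0):
        print(t_tar, mean_and_sd(first_crossing_distribution(t_tar)))
```

Because the calculation is deterministic, it can be compared against Monte Carlo estimates without any sampling noise of its own.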

Despite the close agreement between theory and simulations, the model as described consistently underestimates the mean target time. This bias, which results from the somewhat arbitrary decision to select *R*<sub>0</sub> = *r*(*t*<sub>tar</sub>), can be corrected if the thresholds are learned rather than chosen directly from the spike count function. To learn the threshold (*R*<sub>0</sub>) we used a simple iterative learning rule:

$$\frac{dR_0}{dt} = \pm \eta \, (t_{\rm est} - t_{\rm tar}) \tag{10}$$

where the + sign corresponds to the monotonically falling firing rate case, the − sign is used for the monotonically increasing case, and η ≪ 1 is the learning rate. With this sign convention, an estimate that falls short of the target lowers the threshold in the falling case, so that the threshold is reached later. This procedure quickly converges to provide an unbiased estimate of the target times (**Figure 3A**); error still scales linearly with time (**Figure 3B**) and the discrete approximation accounts well for the slope.
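An iterative threshold update of this kind can be illustrated with a trial-by-trial sketch for the falling-rate case, in which the threshold is nudged by η(*t*<sub>est</sub> − *t*<sub>tar</sub>) after each trial, so that an early estimate lowers the threshold and delays later crossings. The bin-based trial sampler, the value of `eta` (in spikes per second, chosen only so that the example converges in a few thousand trials), α = 0.1, and all names are assumptions of the sketch and are not taken from the text.

```python
import math
import random
from functools import lru_cache

T0, ALPHA, TAU = 10.0, 0.1, 0.1      # illustrative values; Poisson noise assumed

def mean_count(i):
    """Mean spike count in bin i for a decreasing log-power rate (n = 2)."""
    t = (i - 0.5) * TAU
    return (0.5 / ALPHA) ** 2 * math.log(T0 / t) ** 2

@lru_cache(maxsize=None)
def crossing_probability(i, max_count):
    """P(count <= max_count) in bin i under Poisson statistics."""
    mean = mean_count(i)
    log_mean = math.log(mean)
    return min(1.0, sum(math.exp(n * log_mean - mean - math.lgamma(n + 1))
                        for n in range(max_count + 1)))

def sample_estimate(threshold, rng):
    """Centre time of the first bin whose count is at or below the threshold."""
    for i in range(1, round(T0 / TAU) + 1):
        if rng.random() < crossing_probability(i, math.floor(threshold)):
            return (i - 0.5) * TAU
    return T0

def learn_threshold(t_tar, eta=0.5, trials=2000, seed=0):
    """Run the update for a number of trials; return threshold and error trace."""
    rng = random.Random(seed)
    threshold = (0.5 / ALPHA) ** 2 * math.log(T0 / t_tar) ** 2   # tau * r(t_tar)
    errors = []
    for _ in range(trials):
        error = sample_estimate(threshold, rng) - t_tar
        threshold += eta * error     # falling rate: early estimate lowers threshold
        errors.append(error)
    return threshold, errors

if __name__ == "__main__":
    threshold, errors = learn_threshold(2.0)
    late = errors[len(errors) // 2:]
    print("learned threshold:", round(threshold, 2))
    print("mean error over late trials:", round(sum(late) / len(late), 4))
```

The mean error over the later trials is the quantity of interest: once the threshold has settled, it should hover near zero.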

# **3. DISCUSSION**

Relating behavior to its underlying physiological mechanism is a fundamental aim of neuroscience. Here we have shown how to relate scalar timing to the time-varying firing rate of neurons in a class of models in which time is estimated from the dynamic state of neural processes. We show that, given firing rate statistics characterized by a power law, scalar timing arises from a log-power firing rate function with parameters that depend on the spike statistics. Our derivation relies on a linear approximation, but we have also shown that a log-power function results in scalar timing irrespective of this approximation. The initial method for setting the detection threshold using the mean of the firing rate function causes a small estimate bias, but this can be corrected using an iterative procedure to find an appropriate detection threshold. Further, we have shown how to use a discrete approximation to calculate better estimates of encoded time and variability given a log-power firing rate function. These results depend on a mathematical analysis similar to the one used in our previous analysis of Weber's law for intensity variables (Shouval et al., 2013). Though it produces scalar timing, the linear approximation is less precise in the temporal domain than it was when we used it to analyze the coding of stimulus intensity. Accordingly, it was necessary to introduce a discrete approximation in order to accurately calculate simulated variability. We also showed how the decision-selection threshold can bias the encoded interval. Most models that account for Weber's law in the intensity domain are completely distinct from models that account for scalar timing. Our results show that a single mathematical approach can provide a unified explanation for these two distinct observations.

Although we derived firing-rate functions for the case of perfectly linear scalar timing, the same procedure used to generate Equation 4 can also be used when the relationship between estimation error and time is non-linear (Grondin, 2012). Similar derivations are possible for other functional forms of Equation 3. Indeed there are experiments showing that scalar timing is precisely linear only in a limited range (Getty, 1975; Bizo et al., 2006; Grondin, 2012), and the exact forms of scaling observed in these experiments could be used to replace the linear scaling assumed here. There is no guarantee that an analytical result can be derived in such cases, but numerical solutions are always possible. Similarly, the same type of approach can be used to obtain a firing rate function, either analytically or numerically, assuming non-power-law forms for neural spike statistics.

Our analytical derivation produces monotonically falling functions that are valid only below an upper threshold (*t*<sub>0</sub>) and monotonically increasing functions valid only above a lower threshold (*t*<sub>0</sub>). There is a lower threshold below which we cannot evaluate time intervals, which would indicate that the monotonically increasing results are possibly more realistic. However, experimental results showing different Weber fractions at different temporal intervals could also be interpreted as indications that different processes are used for different time scales (Getty, 1975). One possibility is that falling functions could be used for very short temporal durations, on the order of a second or less, and increasing functions for longer durations.

As we outlined above, various models of interval timing have been proposed over the years to account for scalar timing (Gibbon, 1977; Matell and Meck, 2000; Church, 2003; Durstewitz, 2003; Oprisan and Buhusi, 2011, 2014; Rivest and Bengio, 2011; Simen et al., 2011; Shankar and Howard, 2012) and some share key properties of the model proposed here (Staddon et al., 1999; Durstewitz, 2003). An entirely different class of models is based on the idea that time can be read from the dynamic state of circuits in the cortical network (Buonomano and Mauk, 1994; Karmarkar and Buonomano, 2007), though the conditions for scalar timing in these models have not been analyzed. Some of the previously developed models of scalar timing are based on abstract entities such as counters and accumulators (Gibbon, 1977; Church, 2003), and some are dependent on continuous variables (Matell and Meck, 2000; Karmarkar and Buonomano, 2007; Oprisan and Buhusi, 2011), while others can be interpreted in terms of spiking inhibitory and excitatory neurons (Rivest and Bengio, 2011; Simen et al., 2011) and require nearly perfect integration for the decision process. The model presented here is formulated directly in terms of populations of spiking neurons with experimentally measurable variables. There are no free parameters in our model, since all depend directly on neural and behavioral statistics. Therefore, our theory has the advantage that it can be tested experimentally at the physiological level.

Our analysis indicates a very precise log-power form for the firing rate function. One might wonder, rightly, if it is realistic to expect a neural process to have such a precise formulation. It is important to realize that our analysis does not require or claim that any single neuron should display precise log-power dynamics, though to get true linear scaling the relevant population of neurons must possess this form. The population can be composed of individual neurons with diverse response dynamics, as we demonstrated in the intensity domain (Shouval et al., 2013). A question not answered here is how single neurons or a population of neurons can develop firing rate functions with a desired form. Possible answers are provided by previous work showing how single neurons with active conductances (Durstewitz, 2004; Shouval and Gavornik, 2011) or networks of interacting neurons (Gavornik et al., 2009; Gavornik and Shouval, 2011) can be tuned to, or even learn de novo, specific temporal dynamics. An additional possibility is that decision neurons can select (in the Hebbian sense) a sub-population of existing neurons with a combined spike rate that has a log-power form without requiring that the dynamics of any of the individual neurons change at all, though we cannot here propose a biologically realistic mechanism for making this choice.

Recent experiments (Leon and Shadlen, 2003; Shuler and Bear, 2006; Chubykin et al., 2013) have made physiological recordings from cortical cells in animals as they learn temporal discrimination tasks. These results show that the firing rate functions of cells change when animals learn different temporal intervals, and theoretical models have been devised to account for them (Reutimann et al., 2004; Gavornik et al., 2009). Analyzing these cases, and determining a single framework that leads to scalar timing, is quite different in many respects from the analysis carried out here. We are currently studying this issue. Experimentally, it requires many trials to change the firing rate function. One possibility, as mentioned above, is that the model presented here describes mechanisms used to discriminate times in a manner that requires little or no learning, whereas other models would be required to describe how representations of specific temporal intervals are encoded over many trials. Regardless, the work here makes a strong prediction that any neural process used to encode temporal intervals that display scalar timing, with our minimal assumptions, will have firing rates that evolve as a log-power of time.

# **AUTHOR CONTRIBUTIONS**

Harel Z. Shouval and Jeffrey P. Gavornik developed the original work on scalar timing and Weber's law based on experimental work by Marshall G. Hussain Shuler. Harel Z. Shouval and Animesh Agarwal performed analysis and simulations; the analysis was confirmed by Jeffrey P. Gavornik. Harel Z. Shouval and Jeffrey P. Gavornik wrote the paper with the help of Marshall G. Hussain Shuler and Animesh Agarwal.

# **FUNDING**

This publication was partially supported by R01MH093665. Jeffrey P. Gavornik is supported by K99MH099654.

# **ACKNOWLEDGMENT**

The authors would like to thank Leon Cooper for reading and commenting on an early version of the manuscript.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 March 2014; accepted: 31 May 2014; published online: 19 June 2014.*

*Citation: Shouval HZ, Hussain Shuler MG, Agarwal A and Gavornik JP (2014) What does scalar timing tell us about neural dynamics? Front. Hum. Neurosci. 8:438. doi: 10.3389/fnhum.2014.00438*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Shouval, Hussain Shuler, Agarwal and Gavornik. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Sequential sampling model for multiattribute choice alternatives with random attention time and processing order

#### *Adele Diederich<sup>1</sup>\* and Peter Oswald<sup>2</sup>*

*<sup>1</sup> Cognitive Psychology, School of Humanities and Social Sciences, Jacobs University, Bremen, Germany*

*<sup>2</sup> Mathematics, Modeling, and Computing Center, School of Engineering and Science, Jacobs University, Bremen, Germany*

#### *Edited by:*

*José Antonio Díaz, Universidad de Granada, Spain*

#### *Reviewed by:*

*Chris Donkin, University of New South Wales, Australia José Antonio Díaz, Universidad de Granada, Spain*

#### *\*Correspondence:*

*Adele Diederich, Cognitive Psychology, School of Humanities and Social Sciences, Jacobs University, Campus Ring 1, Bremen 28759, Germany*

*e-mail: a.diederich@jacobs-university.de*

A sequential sampling model for multiattribute binary choice options, called the *multiattribute attention switching* (MAAS) model, assumes a separate sampling process for each attribute. During the deliberation process attention switches from one attribute consideration to the next. The order in which attributes are considered, as well as how long each attribute is considered (the attention time), influences the predicted choice probabilities and choice response times. Several probability distributions for the attention time with different variances are investigated. Depending on the time and order schedule the model predicts a rich choice probability/choice response time pattern including preference reversals and fast errors. Furthermore, the difference between finite and infinite decision horizons for the attribute considered last is investigated. For the former case the model predicts a probability *p*<sub>0</sub> > 0 of not deciding within the available time. The underlying stochastic process for each attribute is an Ornstein-Uhlenbeck process approximated by a discrete birth-death process. All predictions are also true for the widely applied Wiener process.

**Keywords: sequential sampling, multiattribute, attention time, time schedule, order schedule, finite time horizon, Ornstein-Uhlenbeck, Wiener**

# **1. INTRODUCTION**

Sequential sampling models are powerful models to account simultaneously for choice probabilities and choice response times. They have become the dominant approach to modeling decision processes in cognitive science. Their application includes a variety of psychological tasks from basic perceptual decisions to complex preferential choice tasks. Early on they were applied to identification and discrimination tasks (e.g., Edwards, 1965; Laming, 1968; Pike, 1973; Link and Heath, 1975; Heath, 1981; Ashby, 1983); memory retrieval (e.g., Stone, 1960; Ratcliff, 1978; Van Zandt et al., 2000); and classification (e.g., general recognition theory, Ashby, 2000; exemplar-based random walk models of classification, Nosofsky and Palmeri, 1997) to account for speed-accuracy data. They have also been used for preferential decision tasks (e.g., decision field theory (DFT), Busemeyer and Townsend, 1993; multiattribute dynamic decision model, Diederich, 1997; Diederich and Busemeyer, 1999) to account for choice response times and choice probabilities interpreted as preference strength; for judgment and confidence ratings (Pleskac and Busemeyer, 2010); and to account for selling prices, certainty equivalents, and preference reversal phenomena (Busemeyer and Goldstein, 1992; Johnson and Busemeyer, 2005). More recently, they have been applied to combining perceptual decision making and payoffs (Diederich and Busemeyer, 2006; Diederich, 2008; Rorie et al., 2010; Gao et al., 2011). Furthermore, these models have been closely linked to measures from neuroscience like multi-cell electrode recordings (e.g., Ditterich, 2006; Gold and Shadlen, 2007; Churchland et al., 2008).

Sequential sampling models assume that (1) stimulus and choice alternative characteristics can be mapped onto a hypothetical numerical value representing the instantaneous level of evidence (activation, information, or preference—the wording often depends on the context), (2) some random fluctuation of this value over time occurs, (3) this evidence is accumulated over time, and (4) a final choice is made as soon as the evidence reaches a threshold. Therefore, sequential sampling can be described as a stochastic process. Two quantities are of foremost interest: (1) the probability that the process eventually reaches one of the thresholds or boundaries for the first time (the criterion to initiate a response), i.e., *first passage probability*; (2) the time it takes for the process to reach one of the boundaries for the first time, i.e., *first passage time*. The former quantity is related to the observed relative frequencies, the latter usually to the observed mean choice response times or the observed choice response time distribution.

Two classes of sequential sampling models have been predominantly used in psychology: Random walk/diffusion models and accumulator/counter models. The former are typically applied to a binary choice task, so that evidence for one choice alternative is at the same time evidence against the other. A decision is made as soon as the process reaches one of two preset criteria. In the latter, an accumulator/counter is established for each choice alternative separately, and evidence is accumulated in parallel. A decision is made as soon as one counter wins the race to reach one preset criterion. The accumulators/counters may or may not be independent. In the following we focus on random walk/diffusion models. For a review of both diffusion models and counter models see Ratcliff and Smith (2004).

To be more precise and to introduce notation, let *X*(*t*) denote the accumulation process. For a binary choice, say between choice options A and B (**Figure 1**), the models assume that the decision process begins with an initial state of evidence *X*(0). This initial state may either favor option A (*X*(0) > 0) or option B (*X*(0) < 0) or may be neutral with respect to A or B (*X*(0) = 0). Upon presentation of the choice options, the decision maker sequentially samples information from the stimulus display over time, retrieves information from memory, or forms preferences, depending on the context. The small increments of evidence sampled at any moment in time are such that they either favor option A (*dX*(*t*) > 0) or option B (*dX*(*t*) < 0). The evidence is accumulated from one moment in time to the next by summing the current state with the new increment: *X*(*t* + *h*) ≈ *X*(*t*) + μ(*X*(*t*), *t*) *h* + σ(*X*(*t*), *t*) (*W*(*t* + *h*) − *W*(*t*)). Here, μ(*x*, *t*) is called the *drift rate* and describes the expected value of increments per unit time. The factor σ(*x*, *t*) in front of the increments *W*(*t* + *h*) − *W*(*t*) of a standard Wiener process *W*(*t*) is called the *diffusion rate*, and relates to the variance of the increments. This process continues until the magnitude of the cumulative evidence exceeds a threshold criterion, θ. The process stops and option A is chosen as soon as the accumulated evidence reaches a criterion value for choosing A (here, *X*(*t*) = θ<sub>*A*</sub> > 0), or it stops and option B is chosen as soon as the accumulated evidence reaches a criterion value for choosing B (here, *X*(*t*) = θ<sub>*B*</sub> < 0). The probability of choosing A over B is determined by the accumulation process reaching the threshold for A before reaching the threshold for B. The criterion is assumed to be set by the decision maker prior to the decision task.
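The update rule above can be simulated directly. The following is a minimal Python sketch, assuming an Euler-type discretization with a fixed step `h`; the function name `simulate_trial`, the cap `t_max`, and the parameter values in the usage lines are illustrative assumptions and not part of the model specification.

```python
import math
import random


def simulate_trial(mu, sigma, theta_a, theta_b, x0=0.0, h=0.01, t_max=100.0, rng=None):
    """Simulate one trial of X(t); return (choice, decision_time).

    mu and sigma are callables of (x, t). choice is "A", "B", or None
    when neither threshold is reached before t_max.
    """
    rng = rng or random.Random()
    x, t = x0, 0.0
    while t < t_max:
        # Increment W(t + h) - W(t) of a standard Wiener process: Normal(0, h).
        dw = rng.gauss(0.0, math.sqrt(h))
        x += mu(x, t) * h + sigma(x, t) * dw
        t += h
        if x >= theta_a:
            return "A", t
        if x <= theta_b:
            return "B", t
    return None, t_max


# Illustrative parameters (assumed, not from the text): constant drift 0.5, unit diffusion.
rng = random.Random(1)
trials = [
    simulate_trial(lambda x, t: 0.5, lambda x, t: 1.0, theta_a=1.0, theta_b=-1.0, rng=rng)
    for _ in range(2000)
]
times_a = [t for choice, t in trials if choice == "A"]
p_a = len(times_a) / len(trials)          # estimate of the first passage probability for A
mean_rt_a = sum(times_a) / len(times_a)   # estimate of the mean first passage time for A
```

Averaging over many simulated trials gives Monte Carlo estimates of the first passage probability and the mean first passage time; these estimates carry sampling error as well as a discretization bias that shrinks with `h`.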


The Wiener process with drift, lately called *drift-diffusion model* in the psychological literature (Bogacz et al., 2006), is the most widely applied model. Different versions reflect additional assumptions for specific psychological domains. Ratcliff (1978) proposed a diffusion model for memory retrieval that is used for various psychological decision tasks. It is based on the work by Laming (1968) and Link and Heath (1975) and assumes variability in the starting point (i.e., *X*(0) follows a uniform distribution), and the drift rate μ = μ(*t*) of the Wiener process is normally distributed (cf. Laming). The residual time, i.e., the time other than the decision time, such as stimulus encoding and motor response, is assumed to be uniformly distributed and added to the decision time, i.e., response time equals the decision time plus a residual (non-decision) time. For a recent overview with applications see Voss et al. (2013). Other approaches include the Ornstein-Uhlenbeck model that linearly accumulates evidence with decay (Busemeyer and Townsend, 1993; Diederich, 1997), and the leaky competing accumulator model (Usher and McClelland, 2001) that non-linearly accumulates evidence with decay.

Common to almost all of these approaches is the assumption that a single integrated source of evidence generates the evidence during the deliberation process leading to a decision. In particular, the integrated source may be based on multiple features or attributes, but all of these features or attributes are assumed to be combined and integrated into a single source of evidence, and this single source is used throughout the decision process until a final decision is reached. Diederich (e.g., Diederich, 1995, 1997, 2003, 2008), however, assumed a separate process for each attribute<sup>1</sup>. The decision maker switches attention from one attribute to the next during the time course of one trial. For instance, in a crossmodal task (visual, auditory, tactile), Diederich (1995) assumed serial processing controlled by stimulus input at given stimulus onset asynchronies (SOA). That is, the order of attributes (here a light, followed by a tone, followed by a tactile vibration) as well as the point in time when a new attribute was added (here the tone presented at *t*<sub>1</sub>, i.e., *t*<sub>1</sub> ms after the light onset, and the tactile vibration at *t*<sub>2</sub>, i.e., *t*<sub>2</sub> ms after the light onset) were determined externally by the experimental setup. In the following we will call attention switching at predetermined, fixed times and in a predefined order of attributes a *deterministic time and order schedule*. Often, however, neither the processing order of attributes nor the point in time when the decision maker switches attention from one attribute to the next is known or can be inferred from the experimental setup. For those cases, Diederich (1997) proposed a specific model in which attention switches from one attribute to the next with some probability. This is an instance of a *random time and order schedule*, which will be investigated more systematically in the present study.

<sup>1</sup>The notion of *attributes* is defined here in a broad sense. For example, it includes dimensions such as color and size of a visual target; amplitude and frequency of a tone; different modalities in a crossmodal task; payoff information and perceptual information; attitudinal evidence and perceptual evidence; price and quality of a consumer product; and more.

The purpose of this paper is to present a unified treatment of sequential sampling models for both deterministic and random time and order schedules. To do so we start with deriving expressions for mean choice response times and choice probabilities for a deterministic time and order schedule before we show how they extend to random time and order schedules, including Poisson, binomial, geometric, and uniform distributions for the attention time devoted to each attribute in the sequence before attention switches to the next randomly or deterministically chosen attribute. We will provide first numerical evidence on the influence of various properties of a schedule on the predictions for mean choice response times and choice probabilities.

# **2. PRELIMINARIES**

The model applies to any finite number of attributes that the decision maker may consider, i.e., *k* = 1,..., *K*. For convenience we first describe the process for one attribute. As underlying information process for each attribute we assume an Ornstein-Uhlenbeck process *X*(*t*) defined by

$$dX(t) = \left(\delta_k - \gamma_k X(t)\right)dt + \sigma_k \, dW(t),\tag{1}$$

where *W*(*t*) is a standard Wiener process. The parameters δ<sub>*k*</sub>, γ<sub>*k*</sub>, and σ<sub>*k*</sub> are characteristics of the *k*-th attribute. The attribute characteristics may affect the quality of the extracted evidence for choosing *A* over *B*, and this quality of evidence determines the drift rate δ<sub>*k*</sub>. That is, the better an attribute discriminates between *A* and *B*, the larger is δ<sub>*k*</sub>. The parameter γ<sub>*k*</sub>, which induces a change of the drift rate depending on the current state in the state space, is often connected to memory processes (e.g., primacy and recency effects), conflict situations (e.g., approach-avoidance), or similarities between choice alternatives. Thus, the effective drift δ<sub>*k*</sub> − γ<sub>*k*</sub>*X*(*t*) determines the direction and the velocity of the process when considering the *k*-th attribute at time *t*. Note that setting γ<sub>*k*</sub> to 0 results in a Wiener process with drift. That is, all the analysis we perform in the following is also valid for the Wiener process with drift. The diffusion coefficient σ<sub>*k*</sub> determines the variance of the increments of the process; for simplicity, we will set σ<sub>*k*</sub> = σ for all *k*.

#### **2.1. MATRIX APPROACH**

Stochastic processes such as the above *X*(*t*) can be approximated by a discrete-time, finite-state-space Markov chain. We use the matrix approach since it is simple to implement, sufficient for determining the quantities of interest, i.e., choice probabilities and choice response times, and flexible enough to accommodate non-stationary and non-linear properties one may wish to include in the decision making process in the future. The continuous state space [θ<sub>*B*</sub>, θ<sub>*A*</sub>] of the piecewise Ornstein-Uhlenbeck process *X*(*t*) is replaced by a finite state space *S* = {−*m*<sub>*B*</sub>,..., *m*<sub>*A*</sub>} with *m* = *m*<sub>*A*</sub> + *m*<sub>*B*</sub> + 1 states. The diffusion process {*X*(*t*), *t* ≥ 0} is approximated by a discrete random walk {*X̃*(*n*), *n* ≥ 0} with values in *S* such that *X*(*n*τ) ≈ Δ · *X̃*(*n*), θ<sub>*A*</sub> ≈ Δ*m*<sub>*A*</sub>, and θ<sub>*B*</sub> ≈ −Δ*m*<sub>*B*</sub>, where Δ is the step size of change in evidence. To achieve convergence in the limit, the discretization parameters (Δ for state space, and τ for time) are tied to each other by the relation Δ = σ√τ.

The attribute-related matrices *P*<sub>*k*</sub>, *k* = 1,..., *K*, are given in their canonical form by

$$P_k = \left[ \begin{array}{c|c} I & 0 \\ \hline R_k & Q_k \end{array} \right] = \left[ \begin{array}{cc|ccccc} 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 & 0 \\ \hline p_{21}^{(k)} & 0 & p_{22}^{(k)} & p_{23}^{(k)} & \cdots & 0 & 0 \\ 0 & 0 & p_{32}^{(k)} & p_{33}^{(k)} & \cdots & 0 & 0 \\ 0 & 0 & 0 & p_{43}^{(k)} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & p_{m-2,m-2}^{(k)} & p_{m-2,m-1}^{(k)} \\ 0 & p_{m-1,m}^{(k)} & 0 & 0 & \cdots & p_{m-1,m-2}^{(k)} & p_{m-1,m-1}^{(k)} \end{array} \right] \tag{2}$$

where

$$p_{i,\,i-1}^{(k)} = \frac{1}{2} \left( 1 - (\delta_k - \gamma_k i \Delta) \frac{\sqrt{\tau}}{\sigma} \right),$$

$$p_{i,\,i+1}^{(k)} = \frac{1}{2} \left( 1 + (\delta_k - \gamma_k i \Delta) \frac{\sqrt{\tau}}{\sigma} \right),$$

for *i* = 2,..., *m* − 1 (here, the index *i* corresponds to the state *i* − 1 − *m*<sub>*B*</sub>). As Δ → 0 (or, equivalently, τ → 0), the decision probabilities and mean choice response times obtained from the Markov chain model converge to the values obtained from the underlying continuous process *X*(*t*). The identity matrix *I* corresponds to the two absorbing states (−*m*<sub>*B*</sub> and *m*<sub>*A*</sub>) associated with the two decision thresholds, one for each choice alternative; the matrix *Q*<sub>*k*</sub> contains the transition probabilities among the transient states, corresponding to the evidence updating process; and the matrix *R*<sub>*k*</sub> contains the one-step transition probabilities from the transient to the absorbing states. In particular, the first column vector of the matrix *R*<sub>*k*</sub> (denoted by *R*<sub>*B*,*k*</sub>) contains the one-step probabilities of reaching alternative *B*, while the second, *R*<sub>*A*,*k*</sub>, contains those for alternative *A*. For details and derivations see Diederich (1997) and Diederich and Busemeyer (2003).
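As an illustration of the canonical form, the following Python sketch assembles the blocks *Q* and *R* for a single attribute with NumPy. It is a sketch under stated assumptions: the function name `build_qr` is hypothetical, the term *i*Δ in the transition probabilities is read as the evidence level of the state, and the self-transition probabilities are zero because the two step probabilities sum to one. The parameter values in the usage lines are those quoted for the first attribute in the simulation section.

```python
import numpy as np


def build_qr(delta, gamma, sigma, tau, m_a, m_b):
    """Return (Q, R) of the canonical form for one attribute.

    Transient states are s = -m_b + 1, ..., m_a - 1 (evidence level s * Delta).
    R has two columns: absorption in B (state -m_b) and in A (state m_a).
    """
    step = sigma * np.sqrt(tau)                 # Delta = sigma * sqrt(tau)
    states = np.arange(-m_b + 1, m_a)
    drift = (delta - gamma * states * step) * np.sqrt(tau) / sigma
    up, down = 0.5 * (1.0 + drift), 0.5 * (1.0 - drift)
    if np.any(up < 0.0) or np.any(down < 0.0):
        raise ValueError("tau is too large: a transition probability is negative")
    n = len(states)
    Q = np.diag(up[:-1], k=1) + np.diag(down[1:], k=-1)
    R = np.zeros((n, 2))
    R[0, 0] = down[0]     # lowest transient state steps down into B
    R[-1, 1] = up[-1]     # highest transient state steps up into A
    return Q, R


Q, R = build_qr(delta=0.2, gamma=0.03, sigma=1.0, tau=1 / 16, m_a=40, m_b=40)
row_sums = Q.sum(axis=1) + R.sum(axis=1)   # each transient row of [R Q] should sum to 1
```

Checking that every row of [*R* *Q*] sums to one is a cheap guard against indexing mistakes before the matrices are used further.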

#### **2.2. TIME AND ORDER SCHEDULE**

For *K* attributes, each to be considered for some specific time in some specific order, it is convenient to introduce a formal schedule of both time and order. A finite time and order schedule consists of a set of *L* consecutive time intervals {[*t*<sub>*l*−1</sub>, *t*<sub>*l*</sub>]}<sub>*l*=1,...,*L*</sub> and the attribute sequence {*k*<sub>*l*</sub> ∈ {1,..., *K*}}<sub>*l*=1,...,*L*</sub>, which specifies that during the time interval [*t*<sub>*l*−1</sub>, *t*<sub>*l*</sub>] the *k*<sub>*l*</sub>-th attribute is considered. At switching time *t*<sub>*l*</sub>, *l* = 1,..., *L* − 1, attention switches from attribute *k*<sub>*l*</sub> to attribute *k*<sub>*l*+1</sub>. Depending on the situation, the final time *t*<sub>*L*</sub> may be set finite (then the decision process may also finish without deciding for one of the alternatives) or infinite. Consequently, the process *X*(*t*) determined by such a schedule is a piecewise Ornstein-Uhlenbeck process, defined over a finite partition *t*<sub>0</sub> = 0 < *t*<sub>1</sub> < ... < *t*<sub>*L*−1</sub> < *t*<sub>*L*</sub> ≤ +∞ of the time interval [0, *t*<sub>*L*</sub>], where for *t* ∈ [*t*<sub>*l*−1</sub>, *t*<sub>*l*</sub>] the process is determined by (1) with *k* = *k*<sub>*l*</sub>. **Figure 2** shows an example with three different attributes (*K* = 3) and a deterministic time and order schedule of length *L* = 4 with switching times *t*<sub>*l*</sub> independent of the trajectories, and attribute order (1, 2, 1, 3), i.e., *k*<sub>1</sub> = 1, *k*<sub>2</sub> = 2, *k*<sub>3</sub> = 1, *k*<sub>4</sub> = 3 (note that the first attribute is reconsidered once).

For fixed Δ (and thus τ), the *m* × *m* transition probability matrix *P̃*<sub>*n*</sub>, containing the transition probabilities *p̃*<sub>*ii*′</sub> := *P*(*X̃*<sub>*n*+1</sub> = *i*′ | *X̃*<sub>*n*</sub> = *i*) for the *n*-th step of the discrete-time random walk, depends on the currently considered attribute defined by the time and order schedule, i.e., we set *P̃*<sub>*n*</sub> = *P*<sub>*k*<sub>*l*</sub></sub> if *n* = *n*<sub>*l*−1</sub>,..., *n*<sub>*l*</sub> − 1, where *n*<sub>0</sub> = 0 and τ*n*<sub>*l*</sub> ≈ *t*<sub>*l*</sub> for *l* = 1,..., *L* (if *t*<sub>*L*</sub> = ∞, we formally set *n*<sub>*L*</sub> = ∞).

# **3. CHOICE PROBABILITIES AND MEAN CHOICE RESPONSE TIMES**

In this section we derive the choice probabilities and mean choice response times for various time and order schedules. For simplicity we assume an unbiased process, i.e., *X*(0) = 0, and symmetric decision thresholds, i.e., θ<sub>*A*</sub> = −θ<sub>*B*</sub>. Since the diffusion coefficient is a scaling parameter, it will be set to σ = 1 for all attributes throughout. We start with the deterministic time and order schedule.

#### **3.1. DETERMINISTIC TIME AND ORDER SCHEDULE**

The evidence accumulation process for attribute *k*<sub>1</sub>, which is considered first, evolves until time *t*<sub>1</sub>, when the second attribute *k*<sub>2</sub> comes into consideration, triggering a change in the accumulation process. This attribute in turn is considered until time *t*<sub>2</sub>, when a third attribute *k*<sub>3</sub> is considered, and so forth until a decision is initiated (or *t*<sub>*L*</sub> is reached). Let the random variables *T*<sub>*A*</sub> and *T*<sub>*B*</sub> denote the finite time when the process reaches a decision threshold θ<sub>*A*</sub> or θ<sub>*B*</sub>, stops, and a decision response for *A* or *B* is initiated. With the switching times *t*<sub>*l*</sub> replaced by integers *n*<sub>*l*</sub> ≈ *t*<sub>*l*</sub>/τ, the choice probability *Pr*[choose *A*] = *Pr*(*T*<sub>*A*</sub> < ∞) is then approximated by the value *p*<sub>*A*</sub> obtained from the discrete random walk model as

$$\begin{split} Pr(T_A < \infty) \approx p_A := {} & Z' \sum_{i=1}^{n_1} Q_{k_1}^{i-1} R_{A,k_1} \\ & + Z' Q_{k_1}^{n_1} \sum_{i=n_1+1}^{n_2} Q_{k_2}^{i-(n_1+1)} R_{A,k_2} + \dots \\ & + Z' Q_{k_1}^{n_1} \cdots Q_{k_{L-1}}^{n_{L-1}-n_{L-2}} \sum_{i=n_{L-1}+1}^{n_L} Q_{k_L}^{i-(n_{L-1}+1)} R_{A,k_L}, \end{split} \tag{3}$$

where *Z* is the probability distribution for the initial state *X*(0). For instance, for an unbiased process, *Z* would be a coordinate vector with probability 1 at state 0, halfway between the decision thresholds. The remaining vectors and matrices are those defined in (2). The evidence accumulation process for a successive attribute starts with the final evidence state of the previous attribute. Note that *Z*′*Q*<sub>*k*<sub>1</sub></sub><sup>*n*<sub>1</sub></sup> to *Z*′*Q*<sub>*k*<sub>1</sub></sub><sup>*n*<sub>1</sub></sup> ⋯ *Q*<sub>*k*<sub>*L*−1</sub></sub><sup>*n*<sub>*L*−1</sub>−*n*<sub>*L*−2</sub></sup> are defective distributions for the states of the random walk at discrete times *n*<sub>1</sub>,..., *n*<sub>*L*−1</sub>, i.e., the entries of these vectors do not sum up to 1. Further note that the stochastic process is time homogeneous within each time interval [0, *t*<sub>1</sub>) to [*t*<sub>*L*−1</sub>, *t*<sub>*L*</sub>] but non-homogeneous across [0, *t*<sub>*L*</sub>] (see Diederich, 1992, 1995).

Similarly, the mean response time for choosing alternative *A* is approximated as

$$\begin{aligned} E[T_A \mid \text{choose } A] \approx ET_A := \frac{\tau}{p_A} \Bigg[ & Z' \sum_{i=1}^{n_1} i\, Q_{k_1}^{i-1} R_{A,k_1} \\ & + Z' Q_{k_1}^{n_1} \sum_{i=n_1+1}^{n_2} i\, Q_{k_2}^{i-(n_1+1)} R_{A,k_2} + \dots \\ & + Z' Q_{k_1}^{n_1} \cdots Q_{k_{L-1}}^{n_{L-1}-n_{L-2}} \sum_{i=n_{L-1}+1}^{n_L} i\, Q_{k_L}^{i-(n_{L-1}+1)} R_{A,k_L} \Bigg]. \end{aligned} \tag{4}$$

The probability and the mean response time for choosing alternative *B* can be determined similarly. Note that *p*<sub>0</sub> := 1 − (*p*<sub>*A*</sub> + *p*<sub>*B*</sub>), the probability of not making a decision until the final time *t*<sub>*L*</sub>, is strictly positive if *t*<sub>*L*</sub> < ∞. As shown in Diederich (1997), these formulas can be written more compactly. We will do this below for the general case of deterministic and random schedules by deriving an efficient recursion for their evaluation.
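The sums in (3) and (4) need not be formed term by term: propagating the defective state distribution one step at a time gives the same quantities. The Python sketch below illustrates this under stated assumptions. The function names `build_qr` and `deterministic_schedule` are hypothetical, thresholds are symmetric, σ = 1, and the switching steps in the usage lines are illustrative choices; the attribute parameters are those quoted in the simulation section.

```python
import numpy as np


def build_qr(delta, gamma, tau, m_half, sigma=1.0):
    """Q and R for symmetric thresholds at +/- m_half * Delta; R columns are [B, A]."""
    states = np.arange(-m_half + 1, m_half)
    drift = (delta - gamma * states * sigma * np.sqrt(tau)) * np.sqrt(tau) / sigma
    up, down = 0.5 * (1.0 + drift), 0.5 * (1.0 - drift)
    Q = np.diag(up[:-1], k=1) + np.diag(down[1:], k=-1)
    R = np.zeros((len(states), 2))
    R[0, 0], R[-1, 1] = down[0], up[-1]
    return Q, R


def deterministic_schedule(attrs, order, switch_steps, tau, m_half):
    """Evaluate the sums in (3) and (4) step by step.

    attrs: {k: (delta_k, gamma_k)}; order: [k_1, ..., k_L];
    switch_steps: [n_1, ..., n_L] (cumulative step counts, n_L finite).
    Returns (p, mean_rt), each ordered as [B, A].
    """
    mats = {k: build_qr(d, g, tau, m_half) for k, (d, g) in attrs.items()}
    z = np.zeros(2 * m_half - 1)
    z[m_half - 1] = 1.0                  # unbiased start: all mass on state 0
    p, et = np.zeros(2), np.zeros(2)
    step = 0
    for k, n_end in zip(order, switch_steps):
        Q, R = mats[k]
        while step < n_end:
            step += 1
            hit = z @ R                  # probability of absorption exactly at this step
            p += hit
            et += step * hit
            z = z @ Q                    # defective distribution over transient states
    return p, tau * et / p


attrs = {1: (0.2, 0.03), 2: (0.04, 0.003)}
p, mean_rt = deterministic_schedule(attrs, order=[1, 2], switch_steps=[160, 4000],
                                    tau=1 / 16, m_half=40)
p_none = 1.0 - p.sum()                   # p_0: no decision before t_L
```

Each step costs one vector-matrix product, so the total cost grows linearly with the final step count *n*<sub>*L*</sub>.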

#### **3.2. RANDOM TIME AND ORDER SCHEDULE**

The above formulas for choice probabilities and mean response times under a deterministic time and order schedule have counterparts for random schedules, which we describe next in three steps.

#### *3.2.1. Random order schedule*

For generating the attribute order {*k*<sub>*l*</sub>}<sub>*l*=1,...,*L*</sub>, we consider stochastic *K* × *K* matrices *D*<sup>(*l*)</sup> such that *d*<sup>(*l*)</sup><sub>*k*′*k*</sub> ≥ 0 describes the probability with which attention switches from the *k*′-th attribute to the *k*-th attribute at switching time *t*<sub>*l*</sub> ≈ τ*n*<sub>*l*</sub>, *l* = 1,..., *L* − 1. Normally, *d*<sup>(*l*)</sup><sub>*kk*</sub> = 0 would be assumed, to avoid a no-switching situation. For two attributes (*K* = 2), we must then have *d*<sup>(*l*)</sup><sub>11</sub> = *d*<sup>(*l*)</sup><sub>22</sub> = 0 and *d*<sup>(*l*)</sup><sub>12</sub> = *d*<sup>(*l*)</sup><sub>21</sub> = 1, and the attribute sequence is either (1, 2, 1, 2,...) or (2, 1, 2, 1,...), depending on whether *k*<sub>1</sub> = 1 or *k*<sub>1</sub> = 2. For three attributes and *L* = 3, choosing

$$D^{(1)} = \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{bmatrix}, \quad D^{(2)} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 3/4 & 1/4 & 0 \end{bmatrix},$$

would for *k*<sub>1</sub> = 1 result in order sequences (1, 2, 1), (1, 3, 1), (1, 3, 2) with probability 1/2, 3/8, 1/8, respectively. The above matrix *D*<sup>(1)</sup> models the situation in which no preference or bias for considering attributes can be asserted.
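The sequence probabilities quoted for this example can be checked by enumerating all paths through the two switching matrices. The short Python sketch below does this with exact fractions; the function name `order_probabilities` is a hypothetical helper.

```python
from fractions import Fraction as F
from itertools import product


def order_probabilities(k1, D_list):
    """Map each attribute sequence starting at k1 (1-based) to its probability."""
    K = len(D_list[0])
    result = {}
    for tail in product(range(1, K + 1), repeat=len(D_list)):
        seq = (k1,) + tail
        prob = F(1)
        for l, D in enumerate(D_list):
            prob *= D[seq[l] - 1][seq[l + 1] - 1]   # entry d^(l+1) for this switch
        if prob > 0:
            result[seq] = prob
    return result


D1 = [[F(0), F(1, 2), F(1, 2)], [F(1, 2), F(0), F(1, 2)], [F(1, 2), F(1, 2), F(0)]]
D2 = [[F(0), F(1), F(0)], [F(1), F(0), F(0)], [F(3, 4), F(1, 4), F(0)]]
seqs = order_probabilities(1, [D1, D2])
```

Only three sequences have positive probability, and their probabilities add up to one, as they must for stochastic matrices.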

#### *3.2.2. Random time schedule*

We assume that the number of discrete time steps during which attention is paid to the *k*-th attribute is a discrete random variable denoted by *T*<sub>*at*</sub> with given distribution. In principle, this distribution may change its type and may have different parameters, such as expected value, depending on the attribute and the attribute order {*k*<sub>*l*</sub>}<sub>*l*=1,...,*L*</sub>. This can be used to model time pressure and other temporal effects. However, often we assume one and the same distribution type for attention times across all attributes, and allow for different parameters only.

For instance, the *geometric distribution* (as implicitly considered in Diederich, 1997) is given by

$$Pr(T_{at} = n) = (1 - r)^{n - 1}r, \quad n = 1, 2, \dots,$$

and characterized by a single parameter 0 < *r* ≤ 1, with expectation 1/*r* and variance (1 − *r*)/*r*<sup>2</sup>, and the *uniform distribution* is defined as

$$Pr(T_{at} = n) = \frac{1}{2M + 1}, \quad n = N - M, \dots, N + M,$$

with parameters *N* and *M* = 0, 1,..., *N* − 1, expectation *N*, and variance *M*(*M* + 1)/3. Details for other tested distributions (Poisson with parameter λ > 0, and binomial distributions with parameters *n* and *p*) are omitted. For comparable expectation values *E*(*T*<sub>*at*</sub>) (i.e., for parameter choices 1/*r* ≈ *N* ≈ λ ≈ *np*), the geometric distribution has a much larger variance than the Poisson, binomial, and uniform distributions with *M* ≈ √*N* (the latter three are very close to each other). **Figure 3** shows the probability mass function and cdf for different *T*<sub>*at*</sub> distributions with fixed mean value *E*(*T*<sub>*at*</sub>) = 300. The two uniform distributions have *M* = 150 = *N*/2 and *M* = 299 = *N* − 1. Varying the parameter *M* of the uniform distribution allows us to produce intermediate results between the deterministic and geometric distribution cases, as shown in the following.
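To make the variance comparison concrete, the following Python sketch evaluates the variance formulas for a common mean of 300. The binomial parameters (*n* = 600, *p* = 1/2) are a hypothetical choice satisfying *np* = 300, and the helper `pmf_variance` is an illustrative function used only to cross-check the closed form for the uniform case.

```python
import math


def pmf_variance(pmf, support):
    """Variance of a discrete distribution computed directly from its pmf."""
    mean = sum(n * pmf(n) for n in support)
    return sum((n - mean) ** 2 * pmf(n) for n in support)


N = 300
r = 1.0 / N
var_geometric = (1.0 - r) / r ** 2
var_poisson = float(N)                       # Poisson: variance equals the mean
var_binomial = 600 * 0.5 * (1 - 0.5)         # assumed n = 600, p = 1/2
M_small = round(math.sqrt(N))                # M close to sqrt(N)
var_uniform = {M: M * (M + 1) / 3 for M in (M_small, 150, 299)}

# Cross-check of M(M + 1)/3 for M = 150 by summing over the support N - M, ..., N + M.
check_150 = pmf_variance(lambda n: 1.0 / (2 * 150 + 1), range(N - 150, N + 150 + 1))
```

With these values the uniform distributions with *M* = 150 and *M* = 299 fall between the low-variance cases and the geometric case, which is why they are useful as intermediate settings.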

#### *3.2.3. Constructing random time and order schedules*

We create a *random time and order schedule* of length *L* in two steps. First, given an initial distribution of *k*<sub>1</sub> ∈ {1,..., *K*}, we create the attribute sequence {*k*<sub>*l*</sub>}<sub>*l*=2,...,*L*</sub> using a non-stationary Markov chain model with transition probability matrices *D*<sup>(*l*)</sup>, *l* = 1,..., *L* − 1. In a second step, for each *l* = 1,..., *L*, the attention time *T*<sub>*at*</sub><sup>(*l*)</sup> = *n*<sub>*l*</sub> − *n*<sub>*l*−1</sub> is drawn from the discrete random variable responsible for the attention time paid to the *k*<sub>*l*</sub>-th attribute; the draws are independent for different *l*. Consequently, *t*<sub>*l*</sub> − *t*<sub>*l*−1</sub> ≈ τ*T*<sub>*at*</sub><sup>(*l*)</sup> is the real attention time paid to the *k*<sub>*l*</sub>-th attribute. We note that *semi-random schedules*, where the sequence {*k*<sub>*l*</sub>} is given deterministically and only the *T*<sub>*at*</sub><sup>(*l*)</sup> are determined as in the second step outlined above, are covered if we choose the *D*<sup>(*l*)</sup> such that *d*<sup>(*l*)</sup><sub>*k*<sub>*l*</sub>,*k*<sub>*l*+1</sub></sub> = 1.
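The two-step construction can be sketched as a small sampler. The Python code below is an illustration under stated assumptions: the function names `draw_geometric` and `sample_schedule` are hypothetical, attention times are geometric with mean 300 steps, and the switching matrices are those of the three-attribute example given earlier.

```python
import random


def draw_geometric(r, rng):
    """Sample n >= 1 with Pr(n) = (1 - r)**(n - 1) * r."""
    n = 1
    while rng.random() >= r:
        n += 1
    return n


def sample_schedule(L, init_probs, D_list, draw_time, rng):
    """Return (order, switch_steps) = ([k_1..k_L], [n_1..n_L]); attributes are 1-based."""
    attrs = list(range(1, len(init_probs) + 1))
    order, switch_steps, n = [], [], 0
    k = rng.choices(attrs, weights=init_probs)[0]          # first attribute k_1
    for l in range(L):
        if l > 0:
            # Step 1: next attribute from row k of the switching matrix D^(l).
            k = rng.choices(attrs, weights=D_list[l - 1][k - 1])[0]
        n += draw_time(k, rng)                             # Step 2: attention time in steps
        order.append(k)
        switch_steps.append(n)
    return order, switch_steps


D1 = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
D2 = [[0, 1, 0], [1, 0, 0], [0.75, 0.25, 0]]
rng = random.Random(7)
order, switch_steps = sample_schedule(
    L=3, init_probs=[1, 0, 0], D_list=[D1, D2],
    draw_time=lambda k, rng: draw_geometric(1 / 300, rng), rng=rng)
```

A sampled pair `(order, switch_steps)` has the same shape as a deterministic schedule, so it can be fed to any routine that evaluates a fixed time and order schedule; averaging over many samples then approximates the random-schedule predictions.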

To understand the recursive computation of choice probabilities and mean response times in this more general case, we first consider the special cases *L* = 1, 2, and illustrate the derivation for some distribution types of the random variable *T*<sub>*at*</sub> generating attention times by providing concrete formulas. In general, the distribution for *T*<sub>*at*</sub> is given by its probability mass function (pmf) and cumulative distribution function (cdf)

$$\Pr(T_{at} = n) = p_{n,k},\tag{5}$$

$$\Pr(T_{at} \le n) = f_{n,k} := \sum_{i=0}^{n} p_{i,k}, \quad n = 0, 1, \ldots$$

We start with *L* = 1, and will drop the index *l* from the notation introduced in the previous subsection. Since the probability of choosing alternative *A* at the *i*-th step is given by *Z*′*Q*<sub>*k*</sub><sup>*i*−1</sup>*R*<sub>*A*,*k*</sub>, *i* = 1,..., *T*<sub>*at*</sub>, and *T*<sub>*at*</sub> is a random variable distributed according to (5), we get

$$\begin{aligned} p_{A,k} &= \sum_{n=1}^{\infty} p_{n,k} Z' \left( \sum_{i=1}^{n} Q_k^{i-1} \right) R_{A,k} \\ &= Z' \left[ \sum_{i=1}^{\infty} \left( \sum_{n=i}^{\infty} p_{n,k} \right) Q_k^{i-1} \right] R_{A,k} \\ &= Z' \left[ \sum_{i=0}^{\infty} \left( 1 - f_{i,k} \right) Q_k^i \right] R_{A,k} . \end{aligned}$$

A similar formula holds for *p*<sub>*B*,*k*</sub>. To avoid repetition, introduce the row vector *p*<sub>*AB*,*k*</sub> := [*p*<sub>*B*,*k*</sub>, *p*<sub>*A*,*k*</sub>]; then

$$p_{AB,k} = Z'V_k, \quad V_k := \left[\sum_{i=0}^{\infty} (1 - f_{i,k})Q_k^i\right]R_k. \tag{6}$$

The (*m* − 2) × 2 matrix *V*<sub>*k*</sub> depends on the attribute and its parameters via *Q*<sub>*k*</sub> and *R*<sub>*k*</sub>, and on the chosen attention time distribution via its cdf (*f*<sub>*n*,*k*</sub>). For the concrete attention time distributions discussed above, these matrices may be precomputed; in some cases closed-form expressions can be found, e.g., for the geometric distribution with parameter *r* = *r*<sub>*k*</sub> we have

$$\begin{aligned} V_k &= \sum_{i=0}^{\infty} \left( \sum_{j=i+1}^{\infty} r_k (1 - r_k)^{j-1} \right) Q_k^i R_k \\ &= \sum_{i=0}^{\infty} \left( 1 - r_k \right)^i Q_k^i R_k = \left( I - \left( 1 - r_k \right) Q_k \right)^{-1} R_k. \end{aligned}$$

**FIGURE 3 | Probability mass distributions (A) and cumulative distribution functions (B) for commonly used attention time distributions.** All distributions have expected value 300. The uniform distributions with *N* = 300 and *M* = *N*/2 = 150 are labeled as Unif.1 and with *N* = 300 and *M* = *N* − 1 = 299 as Unif.2. Geom. represents the geometric distribution.

Next we discuss choice probabilities for the case *L* = 2, assuming for simplicity that the attention time distribution is the same for all attributes. To save on indices, denote *k*<sub>1</sub> ≡ *k*′, *k*<sub>2</sub> ≡ *k*, and *D*<sup>(1)</sup> ≡ *D* (this matrix is responsible for the random choice of *k* given any *k*′). Then the decision probability vector *p*<sub>*AB*,*k*′,*k*</sub> for reaching alternatives *B* or *A* with attribute order (*k*′, *k*) has two parts: the probabilities of having decided while still considering the *k*′-th attribute (i.e., *T*<sub>*A*</sub>/τ ≤ *T*′<sub>*at*</sub>, where *T*′<sub>*at*</sub> is the randomly generated attention time for the first attribute *k*′) plus the probabilities that *T*′<sub>*at*</sub> < *T*<sub>*A*</sub>/τ ≤ *T*′<sub>*at*</sub> + *T*<sub>*at*</sub>, where *T*<sub>*at*</sub> is the randomly (and independently) generated attention time for the second attribute *k*. On top of this, *k* itself is randomly chosen according to the entries in the *k*′-th row of *D*. Thus, for each fixed *k*<sub>1</sub> = *k*′ and *n*<sub>1</sub> = *T*′<sub>*at*</sub>, according to (6) the probabilities for reaching a decision after *n*<sub>1</sub> are given by

$$\begin{aligned} &\left[\Pr\left(T'_{at} < \frac{T_B}{\tau} < \infty\right), \Pr\left(T'_{at} < \frac{T_A}{\tau} < \infty\right)\right]_{n_1 = T'_{at},\, k_1 = k'} \\ &\quad \approx \sum_{k=1}^K d_{k'k} Z' Q_{k'}^{n_1} V_k = Z' Q_{k'}^{T'_{at}} \left(\sum_{k=1}^K d_{k'k} V_k\right). \end{aligned}$$

Thus, for *L* = 2, the choice probabilities (under the assumption that *k*<sub>1</sub> = *k*′ is fixed) can be obtained as

$$\begin{aligned} [p_B, p_A]|_{k_1=k'} &= Z'V_{k'} + \sum_{n\geq 0} p_{n,k'} Z'Q_{k'}^n \left(\sum_{k=1}^K d_{k'k} V_k\right) \\ &= Z' \left[ V_{k'} + \left(\sum_{n\geq 0} p_{n,k'} Q_{k'}^n \right) \left(\sum_{k=1}^K d_{k'k} V_k \right) \right] \\ &= Z' \left[ V_{k'} + B_{k'} \left(\sum_{k=1}^K d_{k'k} V_k \right) \right], \quad k' = 1, \dots, K, \end{aligned}$$

where

$$B_k = \sum_{n\geq 0} p_{n,k} Q_k^n, \quad k = 1, \ldots, K,\tag{7}$$

are (*m* − 2) × (*m* − 2) matrices depending on the attribute and the attention time distribution type. For example, for the geometric distribution this simplifies to *B*<sub>*k*</sub> = *r*<sub>*k*</sub>*Q*<sub>*k*</sub>(*I* − (1 − *r*<sub>*k*</sub>)*Q*<sub>*k*</sub>)<sup>−1</sup>; closed-form expressions are available for Poisson, binomial, and uniform distributions as well.
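The closed forms for the geometric case can be compared numerically with the truncated series in (6) and (7). The Python sketch below does so on a small toy chain; the helper name `series_vb`, the five-state symmetric random walk, the value *r* = 0.1, and the truncation length are all assumptions made for illustration.

```python
import numpy as np


def series_vb(Q, R, pmf, n_max):
    """Truncate the series in (6) and (7) after n_max terms; pmf(n) = Pr(T_at = n)."""
    size = Q.shape[0]
    v_sum, B = np.zeros((size, size)), np.zeros((size, size))
    power = np.eye(size)          # holds Q^i
    tail = 1.0 - pmf(0)           # holds 1 - f_i
    for i in range(n_max):
        v_sum += tail * power
        B += pmf(i) * power
        power = power @ Q
        tail -= pmf(i + 1)
    return v_sum @ R, B


# Hypothetical toy chain: symmetric random walk with five transient states.
Q = 0.5 * (np.eye(5, k=1) + np.eye(5, k=-1))
R = np.zeros((5, 2))
R[0, 0] = R[-1, 1] = 0.5
r = 0.1
geometric = lambda n: r * (1.0 - r) ** (n - 1) if n >= 1 else 0.0

V_series, B_series = series_vb(Q, R, geometric, n_max=500)
inv = np.linalg.inv(np.eye(5) - (1.0 - r) * Q)
V_closed = inv @ R                # (I - (1 - r) Q)^{-1} R
B_closed = r * Q @ inv            # r Q (I - (1 - r) Q)^{-1}
```

The same `series_vb` routine works for any attention time distribution for which the pmf can be evaluated, which is one way to precompute the matrices when no closed form is at hand.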

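For arbitrary *L*, it is more convenient to write the resulting recursion in terms of block-matrix-vector operations. Denote by

$$\mathbf{p}_{AB} := \begin{bmatrix} [p_B, p_A]|_{k_1=1} \\ \vdots \\ [p_B, p_A]|_{k_1=K} \end{bmatrix}, \quad \mathbf{V} := \begin{bmatrix} V_1 \\ \vdots \\ V_K \end{bmatrix}, \quad \mathbf{B} := \operatorname{diag}(B_1, \dots, B_K), \quad \mathbf{Z}' := \operatorname{diag}(Z', \dots, Z')$$

the block arrays collecting the corresponding quantities for all *K* possible first attributes, and by **I** the identity matrix of matching size.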

Then the above result for *L* = 2 can be compactly written as

$$\mathbf{p}_{AB} = \mathbf{Z}' \left(\mathbf{I} + \mathbf{B}D\right) \mathbf{V}.\tag{8}$$

Note that the product **B***D* of the array **B** with the matrix *D* is interpreted as the *K* × *K* array with *d*<sub>*k*′*k*</sub>*B*<sub>*k*′</sub> as entry in row *k*′ and column *k*. Moreover, by iterating (8), one arrives at the formula for arbitrary *L*:

$$\mathbf{p}_{AB} = \mathbf{Z}' \left(\mathbf{I} + \mathbf{B}D^{(1)} \left(\mathbf{I} + \mathbf{B}D^{(2)} \left( \cdots \left(\mathbf{I} + \mathbf{B}D^{(L-1)}\right) \cdots \right)\right)\right) \mathbf{V}.\tag{9}$$

Formulas for mean response times can be derived similarly. Indeed, for *L* = 1, denote by *ET*<sub>*A*,*k*</sub> the mean response time for reaching alternative *A* when considering the *k*-th attribute for a random time *T*<sub>*at*</sub> distributed according to (5). Then *ET*<sub>*A*,*k*</sub> ≈ τ *et*<sub>*A*,*k*</sub>/*p*<sub>*A*,*k*</sub>, where

$$\begin{aligned} et_{A,k} &= \sum_{n=1}^{\infty} p_{n,k} \left( \sum_{i=0}^{n-1} (i+1)Z'Q_k^i \right) R_{A,k} \\ &= Z' \left[ \sum_{i=0}^{\infty} \left( \sum_{n=i+1}^{\infty} p_{n,k} \right) (i+1)Q_k^i \right] R_{A,k} \\ &= Z' \left[ \sum_{i=0}^{\infty} (1 - f_{i,k})(i+1)Q_k^i \right] R_{A,k}. \end{aligned} \tag{10}$$

Similar expressions hold for *ET*<sub>*B*,*k*</sub> and *et*<sub>*B*,*k*</sub>. Thus, similar to (6), we can write

$$et_{AB,k} := [et_{B,k}, \, et_{A,k}] = Z'W_k, \quad W_k := \left[ \sum_{i=0}^{\infty} (1 - f_{i,k})(i+1) Q_k^i \right] R_k, \quad k = 1, \dots, K. \tag{11}$$

The matrices *W*<sub>*k*</sub> can be precomputed to any accuracy at essentially the same cost as the *V*<sub>*k*</sub>. For particular distributions, the formulas can be turned into closed-form expressions.

Next, let us look at *L* = 2. By using similar notation and arguments as for choice probabilities, the quantities *et*<sub>*A*,*k*′,*k*</sub> and *et*<sub>*B*,*k*′,*k*</sub> each have a part before and a part after *T*′<sub>*at*</sub>. This, together with (10), (11), gives

$$\begin{split} et_{AB}|_{k_1=k'} &= Z'W_{k'} + \sum_{n=0}^{\infty} p_{n,k'} Z'Q_{k'}^{n} \left( \sum_{k=1}^{K} d_{k'k} (nV_k + W_k) \right) \\ &= Z' \left[ W_{k'} + \left( \sum_{i=0}^{\infty} p_{i,k'}\, iQ_{k'}^{i} \right) \left( \sum_{k=1}^{K} d_{k'k} V_k \right) + \left( \sum_{i=0}^{\infty} p_{i,k'} Q_{k'}^{i} \right) \left( \sum_{k=1}^{K} d_{k'k} W_k \right) \right] \\ &= Z' \left[ W_{k'} + C_{k'} \left( \sum_{k=1}^{K} d_{k'k} V_k \right) + B_{k'} \left( \sum_{k=1}^{K} d_{k'k} W_k \right) \right], \end{split}$$

where

$$C_k = \sum_{n\geq 0} p_{n,k}\, n Q_k^n, \quad k = 1, \ldots, K. \tag{12}$$

Thus, the counterpart of (8) is

$$\mathbf{et}_{AB} = \mathbf{Z}'\left( (\mathbf{C}D)\mathbf{V} + (\mathbf{I} + \mathbf{B}D)\mathbf{W} \right). \tag{13}$$

From here, combining with (8), a joint recursion for computing **p**<sub>*AB*</sub> and **et**<sub>*AB*</sub> results:

$$[\mathbf{p}_{AB}, \ \mathbf{et}_{AB}] = [\mathbf{Z}', \ \mathbf{Z}'] \left( \mathbf{I} + \mathbf{M}^{(1)} \left( \mathbf{I} + \mathbf{M}^{(2)} \left( \cdots \left( \mathbf{I} + \mathbf{M}^{(L-1)} \right) \cdots \right) \right) \right) \begin{bmatrix} \mathbf{V} \\ \mathbf{W} \end{bmatrix}, \quad \mathbf{M}^{(l)} := \begin{bmatrix} \mathbf{B}D^{(l)} & \mathbf{0} \\ \mathbf{C}D^{(l)} & \mathbf{B}D^{(l)} \end{bmatrix}. \tag{14}$$
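One way to evaluate the joint recursion is stage by stage, starting from the last attribute and applying the *L* = 2 decomposition at every switch: a decision is either reached while the current attribute is considered, or the process survives it and continues with the next attribute. The Python sketch below follows this reading. The function names `attention_matrices` and `random_schedule`, the truncation of the series, and the toy chain in the usage lines are assumptions made for illustration.

```python
import numpy as np


def attention_matrices(Q, R, pmf, n_max):
    """Truncated series for V_k, W_k, B_k, C_k; pmf(n) = Pr(T_at = n)."""
    size = Q.shape[0]
    sv, sw, B, C = (np.zeros((size, size)) for _ in range(4))
    power, tail = np.eye(size), 1.0 - pmf(0)      # Q^i and 1 - f_i
    for i in range(n_max):
        sv += tail * power
        sw += tail * (i + 1) * power
        B += pmf(i) * power
        C += pmf(i) * i * power
        power = power @ Q
        tail -= pmf(i + 1)
    return sv @ R, sw @ R, B, C


def random_schedule(z, mats, D_list):
    """mats[k] = (V_k, W_k, B_k, C_k), 0-based k; D_list = [D^(1), ..., D^(L-1)].

    Returns arrays p and et whose row k1 holds the [B, A] entries given first attribute k1.
    """
    K = len(mats)
    v = [m[0] for m in mats]                      # last stage: only V_k and W_k remain
    w = [m[1] for m in mats]
    for D in reversed(D_list):
        mix_v = [sum(D[a][b] * v[b] for b in range(K)) for a in range(K)]
        mix_w = [sum(D[a][b] * w[b] for b in range(K)) for a in range(K)]
        v_next = [mats[a][0] + mats[a][2] @ mix_v[a] for a in range(K)]
        w_next = [mats[a][1] + mats[a][3] @ mix_v[a] + mats[a][2] @ mix_w[a]
                  for a in range(K)]
        v, w = v_next, w_next
    return np.array([z @ x for x in v]), np.array([z @ x for x in w])


# Hypothetical toy check: one attribute, fixed attention time of 4 steps, L = 3.
Q = 0.5 * (np.eye(5, k=1) + np.eye(5, k=-1))
R = np.zeros((5, 2))
R[0, 0] = R[-1, 1] = 0.5
z = np.zeros(5)
z[2] = 1.0
n_fix = 4
fixed = lambda n: 1.0 if n == n_fix else 0.0
mats = [attention_matrices(Q, R, fixed, n_max=10)]
p, et = random_schedule(z, mats, D_list=[[[1.0]], [[1.0]]])
```

Mean response times then follow as τ · `et` / `p`, entry by entry. With a single attribute and a fixed attention time, the result must coincide with accumulating absorption probabilities over the first 3 × 4 steps of the plain random walk, which gives a simple consistency check.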

We conclude this section with a few remarks. In Diederich (1997), under the name MADD/pp, a slightly different presentation of random schedules is given for the special case of geometrically distributed attention times. It is not hard to see that (with the notation *r*<sub>*ij*</sub> used in the *K* = 3 example presented in Section 4.2 of Diederich, 1997) our model is equivalent to MADD/pp as *L* → ∞ if we set *r*<sub>*k*</sub> = 1 − *r*<sub>*kk*</sub> for the parameters *r* of the geometrically distributed *T*<sub>*at*</sub>, *k* = 1, 2, 3, and *d*<sub>*kk*</sub> = 0, *d*<sub>*k*′*k*</sub> = *r*<sub>*k*′*k*</sub>/(1 − *r*<sub>*k*′*k*′</sub>), *k*′ ≠ *k*, for the entries of the matrix *D* = *D*<sup>(*l*)</sup>, *l* ≥ 1. The advantage of the MADD/pp model is that it provides closed-form formulas for the case *L* = ∞, a possibility that we did not pursue here for other types of attention time distributions.

In previous sequential decision models with finite *L* (Diederich, 1997), the last attribute was always considered infinitely long (infinite decision horizon) to avoid the situation of no decision, i.e., *p*<sub>0</sub> > 0. This can be incorporated into the current model by modifying the definition of the matrices *V*<sub>*k*</sub>, *W*<sub>*k*</sub> corresponding to the last interval [*t*<sub>*L*−1</sub>, ∞) to

$$V_k = (I - Q_k)^{-1} R_k, \quad W_k = (I - Q_k)^{-2} R_k, \quad k = 1, \dots, K,$$

and modifying the recursion (14) slightly. Alternatively, one can artificially change the parameters of the attention time distribution for *l* = *L* such that its expected value is sufficiently large, making *p*<sub>0</sub> practically negligible. Since infinite decision horizons do not seem to adequately reflect the situation of a real decision process or laboratory experiment, it might be interesting to work with scenarios in which *t*<sub>*L*</sub> is fixed and finite, as described in this paper.
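For the infinite-horizon case, the two matrices can be obtained with linear solves instead of explicit inverses. The Python sketch below illustrates this on a toy chain (a symmetric random walk with thresholds three steps from the start), which is an assumption chosen because its absorption probabilities and expected number of steps are easy to verify by hand.

```python
import numpy as np

# Hypothetical toy chain: symmetric random walk, thresholds three steps from the start.
Q = 0.5 * (np.eye(5, k=1) + np.eye(5, k=-1))
R = np.zeros((5, 2))
R[0, 0] = R[-1, 1] = 0.5
z = np.zeros(5)
z[2] = 1.0                               # start halfway between the thresholds

I = np.eye(5)
V_inf = np.linalg.solve(I - Q, R)        # (I - Q)^{-1} R
W_inf = np.linalg.solve(I - Q, V_inf)    # (I - Q)^{-2} R

p_inf = z @ V_inf                        # [p_B, p_A]
et_inf = z @ W_inf                       # [et_B, et_A], in steps
p0 = 1.0 - p_inf.sum()                   # probability of no decision
```

Here every row of `V_inf` sums to one, so a decision is reached with certainty, and the two entries of `et_inf` add up to the expected number of steps until absorption.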

# **4. SIMULATIONS**

We present some simulations that demonstrate the predictive power of the proposed model. We focus on features that have not been considered in Diederich (1997) for the deterministic case. Throughout this section we fix certain parameters, such as σ = 1, θ<sub>*A*</sub> = −θ<sub>*B*</sub> = 10, Δ = 1/4, τ = 1/16 (this implies a state space size of *m* = 81), and always start at the neutral position *X*(0) = 0 between choice alternatives *A* and *B*.
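The stated state space size can be checked with a few lines of arithmetic. The sketch below assumes that the value 1/4 is the spacing of an evenly spaced grid running from θ<sub>*B*</sub> = −10 to θ<sub>*A*</sub> = 10, boundaries included; the function name `state_space_size` is introduced here for illustration only.

```python
from fractions import Fraction

def state_space_size(theta_a, theta_b, step):
    """Number of grid points from theta_b to theta_a (inclusive), spaced by `step`."""
    span = Fraction(theta_a) - Fraction(theta_b)
    n_steps = span / Fraction(step)
    if n_steps.denominator != 1:
        raise ValueError("the boundaries must lie on the grid")
    return int(n_steps) + 1

# Assumed reading of the parameters: boundaries at +10 and -10, grid spacing 1/4.
m = state_space_size(10, -10, Fraction(1, 4))
print(m)  # 81
```

With these values the grid has 80 steps and therefore 81 states, which agrees with *m* = 81 in the text.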

#### **4.1. IMPACT OF ATTENTION TIME DISTRIBUTIONS**

First, we show how different assumptions on the randomness of the attention time *T<sub>at</sub>* (i.e., the time spent on considering a certain attribute) influence choice probabilities and mean response times. In the first example, we assume just two attributes with parameters δ<sub>1</sub> = 0.2, γ<sub>1</sub> = 0.03, δ<sub>2</sub> = 0.04, γ<sub>2</sub> = 0.003; both attributes favor alternative *A*, the first one more strongly than the second one<sup>2</sup>. The attributes are considered only once (*L* = 2), in the order *k*<sub>1</sub> = 1, *k*<sub>2</sub> = 2. The first attribute is considered for time *t*<sub>1</sub> = τ*n*<sub>1</sub>, where *n*<sub>1</sub> is a random variable distributed as the *T<sub>at</sub>* described above, with given expectation *N*. For the second attribute we compare two situations: (1) we assume an infinitely long decision horizon *t*<sub>2</sub> = ∞, and (2) we set a finite time horizon *t*<sub>2</sub> = τ*n*<sub>2</sub> by choosing *n*<sub>2</sub> = *n*<sub>1</sub> + *T<sub>at</sub>*, where the increment is again *T<sub>at</sub>* distributed with the same expected value *N*. These two situations are depicted in **Figures 4**, **5**. The graphs show choice probabilities and mean response times as functions of the expectation τ*E*(*T<sub>at</sub>*) of the real attention times. Lines of different color represent different distributions. Distributions with a small variance, such as the Poisson distribution, the binomial distribution, and the uniform distribution with *M* ≈ √*N*, produce results indistinguishable from the deterministic case. This holds for all tested situations shown below. In other words, small uncertainties in attention time spans do not influence the observable choice frequencies and mean response times. However, as the variance of the attention times grows, we see quantitative and qualitative changes.
Compared to the deterministic attention time situation, the geometric distribution differs most, and the uniform distributions with *M* = *N*/2 = 150 (Unif.1) and *M* = *N* − 1 = 299 (Unif.2) are intermediate. Moreover, as expected, there is a big difference between finite and infinite decision horizons for small mean attention times. Most importantly, for the finite horizon the model predicts a probability *p*<sub>0</sub> > 0 of not deciding within the available time *t*<sub>2</sub>. We claim that in many situations where an infinite time horizon does not represent reality well enough, our finite schedule model might be more appealing. This aspect will be pursued in further research.
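The ordering of the distributions by variance can be made concrete for *N* = 300, the value implied by *M* = *N*/2 = 150. The sketch below rests on two assumptions that are not stated in this section: the uniform law is taken to be discrete on {*N* − *M*, ..., *N* + *M*}, and the geometric law is taken to have support {1, 2, ...} with success probability 1/*N*. The binomial case is left out because its parameters are not given here.

```python
import math

N = 300  # mean attention time in steps, implied by M = N/2 = 150

def uniform_variance(M):
    """Variance of a discrete uniform law on {N-M, ..., N+M} (assumed support)."""
    return M * (M + 1) / 3

variances = {
    "deterministic": 0.0,
    "Poisson": float(N),                                   # variance equals the mean
    "uniform, M ~ sqrt(N)": uniform_variance(round(math.sqrt(N))),
    "Unif.1 (M = N/2)": uniform_variance(N // 2),
    "Unif.2 (M = N - 1)": uniform_variance(N - 1),
    "geometric": float(N * (N - 1)),                       # assumed support {1, 2, ...}
}

for name, var in variances.items():
    print(f"{name:22s} sd = {math.sqrt(var):6.1f} steps")
```

Under these assumptions the standard deviations grow from the deterministic case through the Poisson and narrow uniform laws to Unif.1, Unif.2, and the geometric law, which matches the ordering of effects described in the text.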

**Figures 6**, **7** show similar simulation results for the situation of considering first an attribute favoring *B* (δ<sub>1</sub> = −0.1, γ<sub>1</sub> = 0) followed by an attribute more strongly favoring *A* (δ<sub>2</sub> = 0.2, γ<sub>2</sub> = 0.03). As expected, the results now look different; however, the main conclusions from the previous example concerning the influence of the randomness type for attention times and the differences for finite vs. infinite time horizons remain the same. Most importantly here, the model predicts a preference reversal (i.e., choice probabilities moving from below 0.5 to above 0.5) as a function of attention time when one attribute is in favor of choosing alternative *A* and the other in favor of choosing alternative *B*. Parameter studies, as in Diederich (1997), will be pursued further elsewhere.

graphs for distribution types with small variance are almost indistinguishable from the graph corresponding to deterministically fixed *t*<sub>1</sub> (variance 0) and therefore are omitted here.

<sup>2</sup>Looking only at the numerical values of the drift parameter δ<sub>1</sub> = 0.2 and the decision criterion θ<sub>*A*</sub> = 10, and assuming that the attention times *t*<sub>1</sub> for the first attribute are large enough, one would expect mean response times in the range *T<sub>A</sub>* ≈ 50 (and very small *p<sub>B</sub>*). However, since γ<sub>1</sub> = 0.03, the effective drift δ<sub>1</sub> − γ<sub>1</sub>*X*(*t*) becomes negative when *X*(*t*) comes close to θ<sub>*A*</sub>, and the mean response times become much longer. This also demonstrates the effect of the parameter γ<sub>*k*</sub>, and a difference between models based on the Ornstein-Uhlenbeck process and on the Wiener process.
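The sign change of the effective drift described in footnote 2 can be made explicit for an attribute with δ = 0.2 and γ = 0.03. The following short calculation considers the drift term in isolation (the noise is ignored), with *x* denoting the current preference state; it is a worked illustration, not part of the original derivation. The drift vanishes at

$$\delta - \gamma x = 0 \;\Longleftrightarrow\; x^{*} = \frac{\delta}{\gamma} = \frac{0.2}{0.03} \approx 6.67 < 10 ,$$

and at the decision criterion itself it is already negative:

$$\delta - \gamma \cdot 10 = 0.2 - 0.3 = -0.1 < 0 .$$

The preference state is therefore pulled back toward *x*\* ≈ 6.67 before it reaches the criterion at 10, so the boundary can only be crossed with the help of the noise term. This is why the naive estimate 10/0.2 = 50 underestimates the mean response time.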

To complete the picture, we show a three-attribute example (*K* = 3) in **Figure 8**. The chosen attribute parameters are now δ<sub>1</sub> = 0.04, γ<sub>1</sub> = 0.003, δ<sub>2</sub> = −0.1, γ<sub>2</sub> = 0, δ<sub>3</sub> = 0.2, γ<sub>3</sub> = 0.03, i.e., a sequence of attributes that is weakly in favor of *A*, then in favor of *B*, and finally strongly in favor of *A*. Attention times for the first two attributes are chosen independently of each other but from the same distribution with fixed mean value; the last attribute is considered indefinitely.

#### **4.2. DEPENDENCE ON ATTRIBUTE ORDER**

The proposed sequential decision model is sensitive to the order in which the attributes are considered. If, in the aforementioned second two-attribute example, we consider the attribute in favor of *A* first and then the attribute in favor of *B*, we get very different patterns, as shown in **Figure 9** compared to **Figure 6**. A similar effect holds for the above *K* = 3 example. In **Figure 10**, the attribute in favor of *B* is now the last one; the graphs need to be compared with **Figure 8**. One interesting pattern can be observed. If the evidence for choosing one alternative decreases in the sequence of attribute consideration, then the model predicts faster choice response times for the more frequently chosen alternative, a typical pattern observed in response time analysis. However, if the evidence increases in the sequence of attribute consideration, then the model predicts faster choice response times for the less frequently chosen alternative, which has been called *fast error*, as shown in **Figure 11** compared to **Figure 4**. Simply by changing the order of attribute processing, the model predicts a complex pattern of choice response times and choice probabilities.

So far, all examples shown are with a fixed, deterministic attribute order with no repetitions (semi-random schedule, *L* = *K*). The evaluation of fully random time and order schedules requires larger *L*, and will be presented elsewhere.

# **5. CONCLUDING REMARKS**

The proposed *multiattribute attention switching* (MAAS) model can predict a very complex choice probability/(mean) choice response time pattern. It may appear too flexible to be testable. However, this is not the case. If two attributes both favor the same alternative, say *A*, and the first attribute that is considered provides more evidence for choosing *A* than the second (δ<sub>1</sub> > δ<sub>2</sub>), then the model always predicts shorter response times for the more frequently chosen alternative, here *A*, regardless of the assumed underlying attention time distribution. If the order of processing these attributes is reversed, i.e., the attribute that favors alternative *A* less is considered first (δ<sub>2</sub> > δ<sub>1</sub>), then the model always predicts faster responses for the less frequently chosen alternative, here *B*, again regardless of the assumed underlying attention time distribution. A single stage process can only account for this pattern by assuming variability in starting positions and variability in drift rates, i.e., a statistical device in which the drift rate itself is a random variable. It is difficult experimentally to disentangle the variability stemming from the stochastic process itself from the variability due to the distribution of different drift rates. As Jones and Dzhafarov (2013) pointed out, the predictions of various sequential sampling models rest upon the assumptions made about the probability distributions. This is not the case here. The model is falsifiable without assuming specific distributions. Rather than relying on statistical mechanisms to produce an observed response pattern, we rely on assumptions about cognitive processes such as attention switching and salience. The specific attention time distribution used for an application may be related to the experimental paradigm. For instance, when tracking eye movements, the sequence of attribute consideration and the switching times are directly observable, and a deterministic or a uniform distribution with a small variance is advisable. When all attributes are shown simultaneously, as in complex objects, and attention may shift at any moment in time, a geometric distribution or a uniform distribution with a large variance may describe the situation better. Testing the model rigorously will be pursued in the future.

**FIGURE 6 | Choice probabilities (A,C) and mean response times (B,D) for a decision situation where an attribute favoring alternative** *B* **is considered first for a random time** *t*<sub>1</sub>**, followed by a second attribute strongly favoring** *A* **but considered indefinitely.** We show graphs of choice probabilities and mean response times as functions of the expected attention time *E*(*t*<sub>1</sub>) = 10 ... 500 paid to the first attribute for different distribution types. Again, graphs for distribution types with small variance are indistinguishable from each other.

**FIGURE 8 | Choice probabilities (A,C) and mean response times (B,D) for a decision model with three attributes.** An attribute weakly favoring alternative *A* is considered first for a random time *t*<sub>1</sub>, followed by a second attribute favoring *B* considered for a random time *t*<sub>2</sub> − *t*<sub>1</sub>, while the last attribute (strongly favoring *A*) is considered indefinitely. The random attention times *t*<sub>1</sub> and *t*<sub>2</sub> − *t*<sub>1</sub> for the first two attributes are independently chosen from the same distribution. We show graphs of choice probabilities and mean response times as functions of the expected attention time *E*(*t*<sub>1</sub>) = *E*(*t*<sub>2</sub> − *t*<sub>1</sub>) = 10 ... 500 for different distribution types. Again, small variance distributions yield almost identical results.

*t***1, then the attribute favoring** *B* **is considered indefinitely long. (A)** and

respectively. **(B)** and **(D)** show the mean response times for choosing alternatives A and B, respectively.

**FIGURE 10 | Same as in Figure 8 but with a different attribute order: First the two attributes in favor of** *A* **(strong followed by weak) are considered for finite random periods of time, then the attribute favoring** *B* **is considered indefinitely long. (A)** and **(C)** show the choice probabilities for choosing alternatives A and B, respectively. **(B)** and **(D)** show the mean response times for choosing alternatives A and B, respectively.

**considered indefinitely. (A)** and **(C)** show the choice probabilities for choosing alternatives A and B respectively. **(B)** and **(D)** show the mean response times for choosing alternatives A and B, respectively.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 April 2014; accepted: 19 August 2014; published online: 09 September 2014.*

*Citation: Diederich A and Oswald P (2014) Sequential sampling model for multiattribute choice alternatives with random attention time and processing order. Front. Hum. Neurosci. 8:697. doi: 10.3389/fnhum.2014.00697*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Diederich and Oswald. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Manual choice reaction times in the rate-domain

*Christopher M. Harris<sup>1,2</sup>\*, Jonathan Waddington<sup>3,4</sup>, Valerio Biscione<sup>1,2</sup> and Sean Manzi<sup>2</sup>*

*<sup>1</sup> Centre for Robotics and Neural Systems and Cognition Institute, Plymouth University, Plymouth, UK*

*<sup>2</sup> School of Psychology, Plymouth University, Plymouth, UK*

*<sup>3</sup> The WESC Foundation, Exeter, UK*

*<sup>4</sup> School of Psychology, University of Lincoln, Lincoln, UK*

#### *Edited by:*

*José M. Medina, Universidad de Granada, Spain*

*Reviewed by:*

*José M. Medina, Universidad de Granada, Spain*

*Fuat Balci, Koc University, Turkey*

#### *\*Correspondence:*

*Christopher M. Harris, School of Psychology, Plymouth University, Drake's Circus, Plymouth, Devon PL4 8AA, UK; e-mail: cmharris@plymouth.ac.uk*

Over the last 150 years, human manual reaction times (RTs) have been recorded countless times. Yet, our understanding of them remains remarkably poor. RTs are highly variable with positively skewed frequency distributions, often modeled as an inverse Gaussian distribution reflecting a stochastic rise to threshold (diffusion process). However, latency distributions of saccades are very close to the reciprocal Normal, suggesting that "rate" (reciprocal RT) may be the more fundamental variable. We explored whether this phenomenon extends to choice manual RTs. We recorded two-alternative choice RTs from 24 subjects, each with 4 blocks of 200 trials with two task difficulties (easy vs. difficult discrimination) and two instruction sets (urgent vs. accurate). We found that rate distributions were, indeed, very close to Normal, shifting to lower rates with increasing difficulty and accuracy, and for some blocks they appeared to become left-truncated, but still close to Normal. Using autoregressive techniques, we found temporal sequential dependencies for lags of at least 3. We identified a transient and steady-state component in each block. Because rates were Normal, we were able to estimate autoregressive weights using the Box-Jenkins technique, and convert to a moving average model using z-transforms to show explicit dependence on stimulus input. We also found a spatial sequential dependence for the previous 3 lags depending on whether the laterality of previous trials was repeated or alternated. This was partially dissociated from temporal dependency as it only occurred in the easy tasks. We conclude that 2-alternative choice manual RT distributions are close to reciprocal Normal and not the inverse Gaussian. This is not consistent with stochastic rise to threshold models, and we propose a simple optimality model in which reward is maximized to yield an optimal rate, and hence an optimal time to respond. We discuss how it might be implemented.

**Keywords: reaction times, latency, reciprocal Normal, autoregressive integrated moving average (ARIMA), speed-accuracy trade-off, Piéron's law, optimality**

# **INTRODUCTION**

Reaction times (response times, latency) (RTs) have been measured and discussed innumerable times since their first measurements in the mid-19th century by von Helmholtz (1850) and Donders (1969). RT experiments are so commonplace that they have become a standard paradigm for measuring behavioral responses, often with scant regard to any underlying process. However, the mechanisms behind RTs are complex and poorly understood. A common view is that RTs reflect processing in the time-domain, where RTs are the sum of independent sequential processes including conduction delays, decision-making processes, and motor responses. We question this very fundamental assumption and consider responses in the rate-domain, where rate is defined as the reciprocal of RT.

One of the most perplexing aspects of RTs is their extreme variability from one trial to the next with some very long RTs, even when the same stimulus is repeated and subjects are instructed to respond as quickly as possible. As exemplified by the saccadic system, why does it take hundreds of milliseconds to decide to make a saccade, when the saccade itself only takes a few tens of milliseconds to execute (Carpenter, 1981)? Moreover, if we accept that point-to-point movements, such as saccades and arm reaching are time-optimal (Harris and Wolpert, 1998), should we not expect the RT also to be optimized? One is then led to wonder how such long response times could be optimal.

#### **DRIFT DIFFUSION MODELS (DDM)**

The most popular explanation for the variability of RTs has revolved around the putative mechanism of an accumulator or "rise to threshold" model. A signal, ρ(*t*), increases (accumulates) in time until it crosses a boundary ("trigger level" or "decision threshold"), θ(*t*), whereupon the response is initiated (first-passage time; **Figure 1A**). Typically, ρ(*t*) is assumed to be a stochastic signal reflecting the accumulation of "information" for or against an alternative until a predetermined level of confidence is reached, represented by a constant θ(*t*) (Ratcliff, 1978) (**Figure 1B**). A simple reaction time is modeled by a single boundary, and a two-alternative choice task is modeled by two boundaries. An RT is then the first-passage time for one of the alternatives plus any other "non-decision" time such as sensorimotor delays (e.g., Ratcliff and Rouder, 1998; Ratcliff et al., 1999).

model, ρ(*t*) increases linearly and deterministically until the threshold is reached. It is assumed that the slope of rise is a between-trial Normal random variable and gives rise to a reciprocal Normal distribution. In the rate domain, rate is distributed with a truncated Normal distribution.

Typically, ρ(*t*) is assumed to drift with a constant mean rate but is instantaneously perturbed by a stationary Normal white noise process (Wiener process), so that within a given trial and with one boundary, the time of crossing the threshold is a random variable with an inverse Gaussian distribution (Schrodinger, 1915; Wald, 1945). With two boundaries, the first passage time for one boundary indicates the decision time for a correct response, and an error response for the other boundary; their probability density functions (pdf's) are computed numerically (Ratcliff, 1978; Ratcliff and Tuerlinckx, 2002) (see **Table 1** for pdf's). For an easy choice task (i.e., high drift rate toward the "correct" boundary), the pdf will approach the inverse Gaussian distribution as the error rate becomes negligible. Although there are numerous variations on this theme (e.g., Ratcliff and Rouder, 1998, 2000; Smith and Ratcliff, 2004; Bogacz et al., 2006; Ratcliff and Starns, 2013), they share the same basic stochastic rise to threshold decision-making process in the time-domain. It has recently been shown that the pure diffusion process (without variability across trials) has an exact equivalent in terms of Bayesian inference (Bitzer et al., 2014). As shown by Bogacz et al. (2006), the DDM is optimal in the sense that for a given boundary (decision accuracy) the decision is made in minimal time.
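The single-boundary case can be illustrated with a small simulation. This is a minimal sketch, assuming unit noise (σ = 1), a simple Euler discretization, and hypothetical drift and threshold values that are not taken from any of the cited studies; the function name `first_passage_time` is introduced here for illustration. For a Wiener process with drift ξ and threshold *a*, the inverse Gaussian first-passage distribution has mean *a*/ξ, which the simulated mean should approach up to sampling and discretization error.

```python
import math
import random
import statistics

def first_passage_time(drift, threshold, sigma=1.0, dt=0.001, rng=random):
    """Euler simulation of one trial: time at which rho(t) first reaches `threshold`."""
    rho, t = 0.0, 0.0
    noise_sd = sigma * math.sqrt(dt)
    while rho < threshold:
        rho += drift * dt + rng.gauss(0.0, noise_sd)
        t += dt
    return t

rng = random.Random(1)
drift, threshold = 2.0, 1.0          # hypothetical values, arbitrary units
times = [first_passage_time(drift, threshold, rng=rng) for _ in range(2000)]

mean_fpt = statistics.fmean(times)
median_fpt = statistics.median(times)
print(f"simulated mean   = {mean_fpt:.3f} (inverse Gaussian mean = {threshold / drift:.3f})")
print(f"simulated median = {median_fpt:.3f}")
```

The median falling below the mean reflects the positive skew of the first-passage distribution. Non-decision time and the second boundary are deliberately left out of this sketch.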

Ratcliff (1978) also allowed the mean drift rate to fluctuate between trials with a Normal distribution to reflect "stimulus encoding" variability. This version has often been called the extended DDM, which also includes variability in the starting point of drift, and variability in the non-decision component (Ratcliff and Tuerlinckx, 2002). The extended DDM has been used to describe simple RT experiments (Ratcliff and van Dongen, 2011) and choice RT (Ratcliff, 1978; Hanes and Schall, 1996; Ratcliff and Rouder, 1998; Schall, 2001; Shadlen and Newsome, 2001; Ratcliff et al., 2003, 2004; Smith and Ratcliff, 2004; Wagenmakers et al., 2004; Ratcliff and McKoon, 2008; Roxin and Ledberg, 2008).

Although the multi-parameter extended DDM is claimed to fit observations, a serious problem has emerged from the eye movement literature, when we consider the distribution of the reciprocal of RTs, which we call "rate."

#### **THE RECIPROCAL NORMAL DISTRIBUTION**

Investigations into the timing of saccades for supra-threshold stimuli have shown that the frequency distribution of simple RTs (latency) is close to the reciprocal Normal distribution; that is, rate has a near-Normal distribution. Small deviations from true Normal are observed in the tails, but probit plots are typically linear between at least the 5th and 95th centiles (Carpenter, 1981). The reciprocal Normal is not known to be a first-passage distribution for a constant threshold, and is easily distinguished from the inverse Gaussian or the two-boundary pdf. Carpenter has proposed the LATER model in which the rise to threshold is linear and deterministic, but the slope of rise varies from trial to trial with a Normal distribution (Carpenter, 1981; Carpenter and Williams, 1995; Reddi and Carpenter, 2000) (**Figure 1D**). If Carpenter's findings can be generalized beyond saccades, they are equivalent to the extended DDM without fluctuation in the rise of ρ(*t*) (i.e., no diffusion) and with only one threshold. There is an obvious difficulty in how to explain a deterministic rise to threshold based on a Bayesian update rule, which is inherently stochastic. Moreover, if the rise is deterministic then the time to reach threshold is known at the outset, and any competition among alternatives can be resolved very quickly, so why wait?

#### **Table 1 | Left column: mathematical expressions of the probability density functions (pdf's) for RTs for a single boundary diffusion model, two boundary diffusion model, and the reciprocal Normal.**

Φ = *Normal cdf;* ξ = *drift rate; a* = *upper boundary; b* = *lower boundary; z* = *starting point. Right column: equivalent pdf's in the rate (reciprocal RT) domain. See Harris and Waddington (2012) for the mathematical relationship between the two domains.*

The reciprocal Normal is a bimodal distribution with positive and negative modes. In the time-domain this would imply very large negative RTs, which would require the response to occur long before the stimulus onset and violate causality. Therefore, we need to consider the reciprocal truncated Normal distribution (**rectrN**), in which the Normal rate distribution is left-truncated at or near zero (see Harris and Waddington, 2012). The question is what happens at or near zero rate. For easy tasks where RTs are short, the probability of rate reaching zero (i.e., RT approaching infinity) is negligible and the problem might be dismissed as a mathematical nuance. However, for difficult tasks, the probability becomes significant, as we have shown (Harris and Waddington, 2012). A departure from the reciprocal Normal has been reported for saccade latency to very dim targets, but this has been modeled instead as an inverse Gaussian based on a diffusion process (Carpenter et al., 2009). Clarification is needed on what happens when rates are low.
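A LATER-style rise to threshold, together with the truncation at zero, can be sketched in a few lines. The threshold and the slope distribution below are hypothetical values chosen for illustration, and negative slopes are simply redrawn, which amounts to left-truncating the Normal at zero. The point of the sketch is only that RT is positively skewed while its reciprocal is close to symmetric.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / sd) ** 3 for x in xs])

rng = random.Random(7)
threshold = 1.0                       # hypothetical threshold, arbitrary units
mu_slope, sd_slope = 5.0, 1.0         # hypothetical slope distribution (units per second)

# One slope per trial. Non-positive slopes are redrawn so that every RT is
# positive, i.e., the Normal slope distribution is left-truncated at zero.
slopes = []
while len(slopes) < 5000:
    slope = rng.gauss(mu_slope, sd_slope)
    if slope > 0:
        slopes.append(slope)

rts = [threshold / slope for slope in slopes]   # time-domain: reciprocal (truncated) Normal
rates = [1.0 / rt for rt in rts]                # rate-domain: (truncated) Normal

print(f"skewness of RT   = {skewness(rts):+.2f}")
print(f"skewness of rate = {skewness(rates):+.2f}")
```

With the mean slope five standard deviations above zero, truncation is negligible, which corresponds to the "easy task" situation in the text. Lowering `mu_slope` toward zero makes the truncation matter.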

It has long been known that sequential effects occur in manual choice RTs (Hyman, 1953). In sequences of 2-alternative choice RT experiments, RTs may be correlated with the previous trial (first-order) and also earlier trials (high-order). Moreover, this sequential dependency seems to be a function of whether a stimulus is repeated or alternated (Kirby, 1976; Jentzsch and Sommer, 2002). Sequential dependencies cannot be explained by within-trial noise processes, such as the DDM, unless there are between-trial parameter changes (changes in drift rate or threshold values). If we assume a linear dependence on history (autoregressive model) in the rate-domain, then it could in principle lead to convergence onto the Normal distribution via the central limit theorem.

### **THE RATE-DOMAIN**

It is important, therefore, to identify RT distributions, but this is a non-trivial problem. It is difficult to distinguish among highly skewed distributions in the time-domain. The method of moments is infeasible due to poor convergence (the reciprocal Normal has no finite moments; Harris and Waddington, 2012). Maximum likelihood estimation of parameters requires vast amounts of data to distinguish between models (Waddington and Harris, 2012). There is also the problem of under-sampling at extreme values (Harris and Waddington, 2012) which is further exacerbated by the tendency of many investigators to discard "outliers." It is easier in the rate-domain, although large data sets are still needed. Distributions that are less skewed than the reciprocal Normal (such as the inverse Gaussian) remain positively skewed in the rate-domain, whereas the reciprocal Normal does not. Surprisingly, there have only been a few published examples of manual reaction times in the rate-domain (Carpenter, 1999; Harris and Waddington, 2012), and it is conceivable that saccades are somehow "special." For example, express saccades do not appear to have an equivalent in manual tasks. Another important issue is lack of stationarity, where the mean and variance (and higher moments for non-Normal distributions) change over time. Non-stationarity of the mean is particularly troublesome because it smears out the observed distribution making the RT distribution more platykurtic and heavy-tailed. Non-stationarity is more likely in long recording sessions, as subjects become fatigued and bored by the repetitive nature of RT experiments. Using large sample sizes from prolonged recording sessions may be counterproductive.

When a probability density function (pdf) is known in one domain, the pdf in the reciprocal domain can easily be found. However, it is important to recognize this is not true for moments. For example, the mean of the rate distribution is not the reciprocal of the mean of the RT distribution (Harris and Waddington, 2012). Thus, it is not possible to infer parametric statistics of rate from RT statistics. Raw data are needed. Therefore, our goal in this study was to explore rate-domain analysis in a typical two-choice manual RT experiment. We imposed two tasks (instruction set) and two levels of stimulus difficulty (brightness difference) in order to explore the effects of truncation, and we used autoregression analysis and z-transforms to examine sequential dependency. To minimize problems of nonstationarity, we recorded only modest block sizes (200) from many subjects (24) and collapsed after standardization. We show that rate is indeed near-Normal and not the reciprocal of the inverse Gaussian. Sequential dependency is evident, but not the cause of the near-Normality. In the discussion we propose a rate model as an alternative to first-passage time models.

# **METHODS**

#### **REACTION TIME RECORDING**

Subjects were 24 adults aged between 18 and 45 years old selected through the Plymouth University paid participant pool as an opportunity sample. Subjects were naïve to the experimental procedure. Based on self-report, all participants were required to have normal or corrected-to-normal vision with no known neurological conditions. This study received ethical approval from the local ethics committee.

Stimuli consisted of two solid colored rectangles of different luminances arranged horizontally and displayed on a computer monitor (Hanns-G HA191, 1280 × 1024, at 60 Hz). Both rectangles were displayed in the same green color in Red-Green-Blue (RGB) coordinates against a gray background of luminance 37.1 cd/m<sup>2</sup>. Each rectangle subtended a visual angle of 5.5° horizontally and 6.6° vertically, and the inner edges were separated horizontally by 9.6°. Viewing distance was 0.5 m. Subjects were instructed to respond to the side with the brighter stimulus by pressing the "z" or "2" key. In the easy task (E), rectangle luminances were 37.6 and 131.6 cd/m<sup>2</sup>, and in the difficult task (D), they were 37.6 and 37.8 cd/m<sup>2</sup>. Calibration was made with a Konica Minolta LS-100 luminance meter. All luminances and ambient room lighting were held constant for all subjects. The luminances in the (E) and (D) tasks were chosen to yield low and high error rates of 1% and 24%, respectively, based on a pilot study. Two task instructions were used and displayed at the beginning of a block. In the "Urgent" (U) task, the instruction was to "respond as fast as possible," and in the "Accurate" (A) task, to "respond as accurately as possible." Each subject was presented with 4 blocks of 200 trials each. Within a block each trial consisted of the same combination of stimulus and task, either AE, AD, UE, or UD. There were 24 different permutations of blocks, and the order was balanced such that each of the 24 subjects had a unique order. We refer to the "easy" tasks as AE and UE, and the "difficult" tasks as AD and UD.

On each trial the subject was prompted to press the space key to commence the trial and a cross appeared in the center of the screen for 500 ms. Subsequently, the two rectangles appeared after a constant foreperiod of 500 ms. For choice reaction time experiments (unlike simple reaction time experiments), constant and variable foreperiods have similar effects (Bertelson and Tisseyre, 1968). We chose constant to avoid introducing additional variability into the decision process (see Discussion). Stimulus onset was also highly salient, even in the difficult tasks, due to the highly visible colored rectangles. The stimuli remained on screen until a response was made or until a time-out of 60 s occurred (see Harris and Waddington, 2012 for a discussion on the importance of a long time-out). For incorrect responses, feedback was provided in the form of a black cross, which remained on screen for 500 ms. A rest break occurred between blocks.

Reaction times (RTs) were measured from the onset of the stimulus presentation and recorded to the nearest millisecond. Rates were computed by taking the reciprocal RT. Taking reciprocals of integer RTs magnifies the effect of the quantization and can lead to artifactual "clumping" and "gaps" in the rate frequency histograms at high values of rate. We eliminated this by using a dithering technique, where we added a uniform floating point random number between −0.5 and +0.5 ms to each RT before taking the reciprocal (see Schuchman, 1964). This has no statistical effect in the time-domain. RTs less than 0.15 s (i.e., rate >6.67 s<sup>−1</sup>) were considered anticipatory and not analyzed.
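The dithering step can be sketched as follows. The example RTs are made up, the function name `dithered_rates` is introduced here for illustration, and the sketch assumes that the 0.15 s exclusion is applied to the recorded (undithered) RT, which the text does not specify.

```python
import random

def dithered_rates(rts_ms, rng, min_rt_s=0.15):
    """Convert integer-millisecond RTs to rates (1/s) after adding +/-0.5 ms uniform dither."""
    rates = []
    for rt_ms in rts_ms:
        if rt_ms / 1000.0 < min_rt_s:      # anticipatory response, not analyzed
            continue
        dithered_s = (rt_ms + rng.uniform(-0.5, 0.5)) / 1000.0
        rates.append(1.0 / dithered_s)
    return rates

rng = random.Random(0)
rts_ms = [120, 180, 180, 181, 250, 400]    # made-up RTs in milliseconds
rates = dithered_rates(rts_ms, rng)
print([round(r, 3) for r in rates])
```

The two trials recorded as 180 ms map to slightly different rates after dithering, which is what removes the clumping in the rate histogram. Each dithered rate still lies within the interval implied by the original 1 ms quantization bin.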

#### **MOMENTS**

Sample central moments (mean, standard deviation, skewness, and excess kurtosis) and medians were estimated for each block for RT and rate. Note that moments of RT and rate are not reciprocally related, but depend on the underlying parent distribution. However, median rate is the reciprocal of median RT (see Harris and Waddington, 2012).
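The difference between means and medians under the reciprocal transform can be seen with a handful of made-up RTs. This is only an illustration with invented numbers, not data from the study.

```python
import statistics

rts = [0.25, 0.30, 0.35, 0.50, 1.20]     # made-up RTs in seconds (odd count)
rates = [1.0 / rt for rt in rts]

mean_rt, mean_rate = statistics.fmean(rts), statistics.fmean(rates)
median_rt, median_rate = statistics.median(rts), statistics.median(rates)

print(f"1 / mean RT   = {1 / mean_rt:.3f}   mean rate   = {mean_rate:.3f}")
print(f"1 / median RT = {1 / median_rt:.3f}   median rate = {median_rate:.3f}")
```

Here the reciprocal of the mean RT (about 1.92 s<sup>−1</sup>) differs clearly from the mean rate (about 2.60 s<sup>−1</sup>), whereas the median rate equals the reciprocal of the median RT (about 2.86 s<sup>−1</sup>). With an even number of samples the sample median interpolates between two values, so the relation then holds only approximately.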

We also estimated the mean and standard deviation in the rate-domain assuming the underlying distribution was Normal. The underlying mean and standard deviation of the Normal distribution will differ from the sample mean and standard deviation depending on how much of the underlying Normal distribution is truncated. We therefore obtained maximum likelihood estimates (MLEs) of the underlying Normal parameters from each dataset using the *mle.m* function. This function applied a simplex search algorithm to find the parameters that maximized the log likelihood of the probability density function:

$$f(x; \mu, \sigma, a) = \begin{cases} \dfrac{\varphi\left[(x - \mu)/\sigma\right]}{1 - \Phi\left[(a - \mu)/\sigma\right]}, & a \le x < \infty \\ 0, & x < a \end{cases}$$

where *x* is the observed rate, μ is the mean of the underlying (untruncated) Normal distribution, σ is the standard deviation of the underlying distribution, *a* = 1/60 = 0.0167 s<sup>−1</sup> is the left truncation point set by the 60 s time-out, ϕ is the standard Normal probability density function (pdf), and Φ is the standard Normal cumulative distribution function (cdf).
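As a rough illustration of this estimation step, the following Python sketch fits a left-truncated Normal to simulated rates. It is not the *mle.m* procedure: a coarse grid search stands in for the simplex search, the data are simulated with assumed parameters (μ = 0.3, σ = 0.4), and all function names are hypothetical. The log likelihood includes the 1/σ scaling of the Normal density, which is needed for likelihoods to be comparable across values of σ.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard Normal; STD.cdf plays the role of Phi


def truncated_loglik(n, sx, sxx, mu, sigma, a):
    """Log likelihood of n rates (sum sx, sum of squares sxx), all >= a,
    under a Normal(mu, sigma) left-truncated at a."""
    tail = 1.0 - STD.cdf((a - mu) / sigma)  # mass remaining above a
    if tail <= 0.0:
        return -math.inf
    sq = sxx - 2.0 * mu * sx + n * mu * mu  # sum of (x - mu)^2
    log_phi = (-0.5 * sq / sigma ** 2 - n * math.log(sigma)
               - 0.5 * n * math.log(2.0 * math.pi))
    return log_phi - n * math.log(tail)


def fit_truncated_normal(rates, a, mu_grid, sigma_grid):
    """Grid-search MLE (an assumed stand-in for a simplex search)."""
    n, sx, sxx = len(rates), sum(rates), sum(x * x for x in rates)
    _, mu, sigma = max((truncated_loglik(n, sx, sxx, m, s, a), m, s)
                       for m in mu_grid for s in sigma_grid)
    return mu, sigma


def simulate_truncated(mu, sigma, a, n, rng):
    """Draw n values from Normal(mu, sigma), rejecting values below a."""
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if x >= a:
            out.append(x)
    return out


A = 1.0 / 60.0  # truncation point for a 60 s time-out
rates = simulate_truncated(0.3, 0.4, A, 20000, random.Random(1))
mu_grid = [i / 100 for i in range(-50, 101)]
sigma_grid = [i / 100 for i in range(5, 101)]
mu_hat, sigma_hat = fit_truncated_normal(rates, A, mu_grid, sigma_grid)
sample_mean = sum(rates) / len(rates)
print(mu_hat, sigma_hat, sample_mean)
```

With substantial truncation the sample mean lies above the ML estimate of μ, which is the kind of shift between sample moments and MLE moments described in the Results.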

#### **SEQUENTIAL ANALYSIS**

The partial autocorrelation function (PACF) was computed using the *parcorr.m* Matlab function. The first 10 trials on each block were omitted to avoid contamination from initial transients. The coefficients for the first *m* = 20 lags were computed for each block and averaged across blocks. An autoregressive model (AR) was assumed to be of the form:

$$r\_n = a\_1 r\_{n-1} + a\_2 r\_{n-2} + \dots + a\_m r\_{n-m} + u\_n \tag{1.1}$$

where *r<sub>i</sub>* is the response on the *i*th trial, *a<sub>j</sub>* (1 ≤ *j* ≤ *m*) are constant weights, and *u<sub>i</sub>* is a stochastic input on the *i*th trial (negative indices were assumed to have zero weights). The autoregressive weights *a<sub>j</sub>* and the input *u<sub>i</sub>* are unknown and were estimated using the Box-Jenkins maximum likelihood procedure. We used the *estimate.m* function and an autoregressive integrated moving average (ARIMA) model with only an autoregressive polynomial (i.e., no non-seasonal differencing or moving average polynomials). We assumed the distributional form of *u<sub>i</sub>* to be Normal with constant mean and variance.

An AR model can be converted to the equivalent moving average (MA) series using the standard z-transform method. The z-transform *Z*(·) of (1.1) is

$$R(z) = a\_1 z^{-1} R(z) + a\_2 z^{-2} R(z) + \dots + a\_m z^{-m} R(z) + U(z)$$

where *R*(*z*) = *Z*(*r*) and *U*(*z*) = *Z*(*u*). This can be viewed as a discrete-time MA system with *R*(*z*) = *B*(*z*)*U*(*z*), where the system response of order *m* is

$$B(z) = \frac{1}{1 - a\_1 z^{-1} - a\_2 z^{-2} - \dots - a\_m z^{-m}}$$

To find *B*(*z*) we took a partial fraction expansion:

$$B(z) = \sum\_{k=1}^{m} \frac{\rho\_k}{1 - \lambda\_k z^{-1}}$$

where λ<sub>*k*</sub> are the roots and ρ<sub>*k*</sub> the residues. Taking the inverse z-transform, we then have:

$$r\_n = b\_0 u\_n + b\_1 u\_{n-1} + b\_2 u\_{n-2} + \dotsb \tag{1.2}$$

where *u<sub>k</sub>* is the stochastic input on trial *k* and is independent of other trial inputs, *b*<sub>0</sub> = 1, and *b<sub>i</sub>* = Σ<sub>*k*=1</sub><sup>*m*</sup> ρ<sub>*k*</sub>λ<sub>*k*</sub><sup>*i*</sup> for 1 ≤ *i* < ∞. The weights were computed in Matlab using the *roots.m* and *residue.m* functions. Note that (1.1) and (1.2) describe the same system, but (1.1) is a feedback description and (1.2) is the feed-forward description. We chose 6 roots, as this encompassed the obviously larger PACF coefficients. The roots were all within the unit circle, indicating stability and the existence of a steady-state.
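The conversion from (1.1) to (1.2) can also be sketched without partial fractions. Multiplying out (1 − *a*<sub>1</sub>*z*<sup>−1</sup> − ··· − *a<sub>m</sub>z*<sup>−*m*</sup>)*B*(*z*) = 1 gives the recursion *b<sub>n</sub>* = Σ<sub>*j*</sub> *a<sub>j</sub>b*<sub>*n*−*j*</sub> with *b*<sub>0</sub> = 1, which should yield the same weights as the roots-and-residues route. The Python below is a minimal sketch under that assumption; the function name is hypothetical, and the three AR weights are those reported in the Results, used purely for illustration.

```python
import math


def ma_weights(ar_weights, n_terms):
    """Impulse response b_0..b_{n_terms-1} of r_n = sum_j a_j r_{n-j} + u_n."""
    b = [1.0]  # b_0 = 1: the current input enters with unit weight
    for n in range(1, n_terms):
        b.append(sum(a * b[n - j]
                     for j, a in enumerate(ar_weights, start=1) if n - j >= 0))
    return b


AR_WEIGHTS = (0.222, 0.104, 0.076)  # illustrative values taken from the Results
b = ma_weights(AR_WEIGHTS, 200)

# For independent inputs, the output mean scales by sum(b) and the output
# standard deviation by sqrt(sum(b^2)).
mean_gain = sum(b)
sd_gain = math.sqrt(sum(w * w for w in b))

# Step response: cumulative sum of the impulse response
step_response = [sum(b[:k + 1]) for k in range(10)]
print(b[:5], mean_gain, sd_gain)
```

The sum of the weights equals 1/(1 − Σ*a<sub>j</sub>*), so the two summary gains can be checked directly against the AR weights.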

#### **STEADY-STATE TRANSFER**

From (1.2) we can relate the pdf of rate (output), *p<sub>r</sub>*(*r*), to the pdf of the input, where the *u<sub>i</sub>* are independent, identically distributed random variables with pdf *p<sub>u</sub>*(*u*), *u* ≥ 0. From basic probability theory (Papoulis and Pillai, 2002), the steady-state output pdf is given by the convolution sequence:

$$p\_r(r) = \left[\frac{1}{|b\_0|} p\_u\left(\frac{u}{b\_0}\right)\right] \otimes \left[\frac{1}{|b\_1|} p\_u\left(\frac{u}{b\_1}\right)\right] \otimes \left[\frac{1}{|b\_2|} p\_u\left(\frac{u}{b\_2}\right)\right] \otimes \cdots \tag{1.3}$$

where ⊗ is the convolution operator. If *pu*(*u*) is Normally distributed then so is *pr*(*r*). If *pu*(*u*) is not Normal then *pr*(*r*) may or may not converge to Normal depending on *pu*(*u*) and the coefficients *bi*. We computed (1.3) numerically for the truncated Normal (see Results).

Consider the case where *p<sub>u</sub>*(0) = *c* with *c* > 0, which corresponds to truncation and to an RT distribution with no finite moments (see Harris and Waddington, 2012). For one term, we have *p*<sub>*r*,1</sub>(0) = *c*/|*b*<sub>0</sub>|. However, with two terms (one convolution) we have

$$p\_{r,2}(r) = \frac{1}{|b\_0||b\_1|} \int\_0^\infty p\_u\left(\frac{r - x}{|b\_0|}\right) p\_u\left(\frac{x}{|b\_1|}\right) dx$$

For *r* = 0 and *c* < ∞, the integrand is non-zero only at *x* = 0, so *p*<sub>*r*,2</sub>(0) = 0. Similarly, for all terms we must have *p<sub>r</sub>*(0) = 0, so that truncation is lost and the RT distribution will have a finite mean (but not necessarily higher moments).
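The loss of truncation can be illustrated by simulation rather than by numerical convolution. The Python sketch below is an assumption-laden illustration, not the computation used for the Results: it feeds an input drawn from a Normal with mean 0 truncated at zero (standard deviation 0.3 s<sup>−1</sup>) through the feedback form (1.1), using the three AR weights reported in the Results, and compares how often input and output fall close to zero. All names are hypothetical.

```python
import random

AR = (0.222, 0.104, 0.076)  # illustrative AR weights taken from the Results


def half_normal(rng, sigma):
    """Draw from a Normal(0, sigma) truncated on the left at zero."""
    return abs(rng.gauss(0.0, sigma))


def simulate_rates(n_trials, sigma, rng, burn_in=50):
    """Return (inputs, outputs) of the AR filter after discarding a transient."""
    history = [0.0] * len(AR)  # r_{n-1}, r_{n-2}, r_{n-3}
    inputs, outputs = [], []
    for n in range(n_trials + burn_in):
        u = half_normal(rng, sigma)
        r = u + sum(a * h for a, h in zip(AR, history))
        history = [r] + history[:-1]
        if n >= burn_in:
            inputs.append(u)
            outputs.append(r)
    return inputs, outputs


def fraction_below(values, cutoff):
    """Proportion of values smaller than cutoff."""
    return sum(v < cutoff for v in values) / len(values)


inputs, outputs = simulate_rates(20000, 0.3, random.Random(0))
print(fraction_below(inputs, 0.05), fraction_below(outputs, 0.05))
print(sum(outputs) / sum(inputs))  # ratio of output mean to input mean
```

The input has appreciable probability mass immediately above zero, whereas the output rarely falls there, consistent with the argument that the sharp truncation is smoothed away.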

#### **RESULTS**

Subjects' RTs were clearly sensitive to the task and stimulus manipulations, as shown by the example in **Figure 2A** (left column). When stimulus discriminability was easy, RT distributions were brief with low dispersion (AE and UE), but when difficult, they became longer and much more dispersive (AD and UD). In the rate-domain (reciprocal RT) difficulty resulted in a shift toward zero, but the dispersion remained similar (**Figure 2A** right column). For the difficult tasks, the rate distributions appear to approach zero and possibly became truncated. The difficulty was also evident by the number of errors (∼25% in this example).

Similar patterns were seen in all subjects, as can be seen from the plot of medians of RT for all subjects in **Figure 2B.** Again there was much more inter-subject variability for the difficult tasks, but in the rate-domain the variability was more even (**Figure 2C**). Non-parametric testing (Wilcoxon test) showed that the medians differed significantly between the difficult and easy discriminability (AD∪UD vs. AE∪UE: *p* < 0.001), and between task instructions (AD∪AE vs. UD∪UE: *p* < 0.001).

We computed the sample central moments (mean, standard deviation, skewness, excess kurtosis) in the time- and rate-domains (**Figure 3**) for each task for each subject. In the time-domain (left column), the moments were strongly interdependent, as expected from skewed distributions. Standard deviation increased and skewness and excess kurtosis decreased with the mean (note that skewness and kurtosis are normalized with respect to standard deviation). In the rate-domain (right column), however, the interdependence was much weaker (note the difference in ordinate scales).

Because of possible left truncation, we estimated the mean and standard deviation of the putative underlying Normal rate distribution using MLE (see Methods). We set the left truncation to 0.0167 s<sup>−1</sup>, corresponding to a time-out of 60 s (**Figure 4**). When the sample coefficient of variation (CV) was less than 0.4 (z-score = 2.5; line in **Figure 4**) the MLE estimates (circles) were seen to agree closely with sample moments (crosses). For higher CVs the MLE moment estimates were shifted from the conventional estimates (shown by up-left lines). These shifts in MLE moments are expected from left truncation, and are consistent with, but not definitive of, an underlying truncated Normal distribution. Therefore, we next grouped blocks according to whether their truncation was severe, "truncated" blocks (CV > 0.4), or negligible, "untruncated" blocks (CV < 0.4).

#### **GROUP DISTRIBUTION**

In the untruncated blocks, we standardized the rate for each trial into a z-score based on the ML mean and standard deviation of its block, and then collapsed all trials into one group. The distribution of the untruncated group was very close to Normal between the 5th and 95th percentile, as seen from the probit plot (**Figure 5A**). There was a slight deviation in the tails. As a check on this method, we created simulated data sets using the true reciprocal Normal distribution with the same ML moments and sample sizes as the empirical data. Carrying out exactly the same analysis, the rate distribution was a perfect Normal, as expected (**Figure 5B**). As a further check, we also simulated the inverse Gaussian. Here there is no truncation issue, so we used sample moments and sample sizes to generate the simulated data. As seen in **Figure 5C**, the reciprocal distribution of the inverse Gaussian is skewed and does not fit the Normal, as expected (Harris and Waddington, 2012). Thus, we are confident that near Normality is not an artifact, but reflects the underlying distribution of the empirical rate distributions.

distributions of RT (left column) and rate (right column) for the 4 different blocks (AD, accurate and difficult; AE, accurate and easy; UD, urgent and difficult; UE, urgent and easy; see Methods). In the easy tasks, RTs are brief with few errors (block size was 200 trials). For the difficult tasks RTs are much more variable with about 25% error rate. In the rate-domain, dispersion is similar for all blocks with a shift to lower rates for the difficult tasks. Note that the shift approaches zero (arrows) suggesting possible truncation. **(B)** Median RTs for all subjects showing longer RTs for difficult blocks and more inter-subject variability. **(C)** Same as **(B)** but for median rates showing similar inter-subject variability for all blocks.

For the truncated blocks, we standardized as above using the ML mean and standard deviation and collapsed into one group. However, we only considered positive z-scores because any putative truncation would lead to under-representation of negative z-scores (we included the one block that had a slightly negative ML mean, see **Figure 3**, but excluding it had no discernible effect on the plots). As shown in **Figure 6A**, the collapsed distribution was close to Normal with a slight deviation above the 95th percentile. Simulation with a true reciprocal Normal showed half a Normal distribution, as expected (**Figure 6B**), and the inverse Gaussian was not close to the truncated Normal (**Figure 6C**). Thus, we conclude that at least the right half of the truncated group is close to Normal, and not to the inverse Gaussian. However, this does not necessarily address what happens near zero rate for each block (vide infra).

#### **SEQUENTIAL DEPENDENCY**

#### *Temporal effects*

The sequence of RTs during a block was clearly not statistically stationary as RTs were typically longer in the first few trials than later. This transient lasted less than 10 trials, after which a steady-state seemed to prevail, best seen by averaging across blocks in the time- or rate-domain (**Figure 7**). The transient was clearly more pronounced for the easy than difficult tasks.

We excluded the first 10 trials of each block in order to examine the steady-state component. The Pearson correlation coefficient between consecutive RTs was 0.20 with 63% of these being significant at *p* < 0.05. In the rate-domain this increased to 0.25 with 76% being significant.

A 1-lag correlation would be expected to lead to autocorrelations with a geometric fall-off at higher lags. Therefore, we examined the partial autocorrelation function (PACF) to explore explicit dependencies up to lags of 20 (see Methods). The PACF of rate was positive and a smoothly decreasing function of lag with no obvious cut-off (**Figure 8A** filled circles). As a check, we shuffled trials randomly within each block and found no significant dependencies (**Figure 8A** open circles). When plotted against reciprocal lag, the PACF coefficients plot was approximately linear (**Figure 8B**; solid circles).

We next considered a stationary autoregressive (AR) relationship of the form *r<sub>n</sub>* = *a*<sub>1</sub>*r*<sub>*n*−1</sub> + *a*<sub>2</sub>*r*<sub>*n*−2</sub> + ··· + *a<sub>m</sub>r*<sub>*n*−*m*</sub> + *u<sub>n</sub>* (see Equation 1.1 in Methods), where *a<sub>i</sub>* (1 ≤ *i* ≤ *m*) are constant coefficients, *u<sub>n</sub>* is a stochastic input on trial *n*, which we assumed stationary and Normal, and *m* is the order of the process (see Methods). We used the Box-Jenkins maximum likelihood estimation procedure (see Methods) to estimate the *a<sub>i</sub>* for the first 6 lags. We only included "untruncated" blocks (CV < 0.4). Combining all such blocks revealed that only the first 3 weights were significantly different from zero and decreased roughly linearly with reciprocal lag: *a*<sub>1,2,3</sub> = {0.222, 0.104, 0.076}. The 4th weight, *a*<sub>4</sub> = 0.016, was borderline (**Figure 8C**). We also examined the difficult and easy tasks separately, but found negligible difference [AD∪UD: *a*<sub>1,2,3,4</sub> = {0.212, 0.100, 0.078, 0.016}; AE∪UE: *a*<sub>1,2,3,4</sub> = {0.227, 0.105, 0.076, 0.037}]. Henceforth, we used the first 3 weights of the combined tasks.

It is possible to invert the AR process to find the input, since from (1.1) we have *u<sub>n</sub>* = *r<sub>n</sub>* − *a*<sub>1</sub>*r*<sub>*n*−1</sub> − *a*<sub>2</sub>*r*<sub>*n*−2</sub> − ··· − *a<sub>m</sub>r*<sub>*n*−*m*</sub>, and the resulting *u<sub>n</sub>* should have no sequential dependency. To test this, we estimated the *u<sub>n</sub>* sequence from each block and re-computed the mean PACF (**Figure 8B** open symbols). Clearly, sequential dependency was eliminated *on average* with a mean lag 1 correlation of 0.032. Moreover, the number of blocks that had a significant lag 1 correlation also dropped from 61 to 10%, which is close to that expected by chance. This implies that most blocks were driven by a similar AR process.
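The logic of this de-correlation check can be sketched on simulated data. The Python below is only an illustration under stated assumptions: the series is generated from (1.1) with the three reported AR weights and a Normal input whose mean (1.0) and standard deviation (0.3) are arbitrary choices, and the function names are hypothetical. Subtracting the weighted past outputs from each output recovers the input sequence, whose lag 1 correlation should be near zero.

```python
import random

AR = (0.222, 0.104, 0.076)  # illustrative AR weights taken from the Results


def lag1_correlation(x):
    """Sample lag 1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def simulate_ar(ar, n, rng, mean_u=1.0, sd_u=0.3, burn_in=100):
    """Simulate r_n = sum_j a_j r_{n-j} + u_n and drop the initial transient."""
    m = len(ar)
    r = [0.0] * m  # zero history before the first trial
    for _ in range(n + burn_in):
        u = rng.gauss(mean_u, sd_u)
        r.append(u + sum(ar[j] * r[-1 - j] for j in range(m)))
    return r[m + burn_in:]


def recover_input(r, ar):
    """Invert the AR process: u_n = r_n - a_1 r_{n-1} - ... - a_m r_{n-m}."""
    m = len(ar)
    return [r[n] - sum(ar[j] * r[n - 1 - j] for j in range(m))
            for n in range(m, len(r))]


rates = simulate_ar(AR, 20000, random.Random(0))
residual = recover_input(rates, AR)
print(lag1_correlation(rates), lag1_correlation(residual))
```

Note that every AR term is subtracted in `recover_input`; a sign slip on the higher-order terms would leave residual dependency in the recovered input.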

The AR model in (1.1) has a step response which reflects the underlying dynamics behind the steady-state response. It is easily computed (curve in **Figure 8D**) and clearly similar to the empirical average transient response at the beginning of each block (grand average from **Figure 7B**). Thus, the transient response is consistent with the steady-state dynamics.

Using the single-sided z-transform, we converted (1.1) to a moving average (MA) formulation in terms of a discrete series of independent stochastic inputs *u<sub>j</sub>*, 1 ≤ *j* ≤ *n* (see Equation 1.2 in Methods): *r<sub>n</sub>* = *b*<sub>0</sub>*u<sub>n</sub>* + *b*<sub>1</sub>*u*<sub>*n*−1</sub> + *b*<sub>2</sub>*u*<sub>*n*−2</sub> + ···. The weights are the feed-forward impulse response function and are plotted against lag in **Figure 9A**. As can be seen, there is modest but prolonged dependence on input value history implying considerable "memory."

Assuming stationarity, one effect of the sequential dependency is to scale the moments of the input (see Methods). Based on the AR weights, the mean of rate was *r̄* = 1.67*ū*. The effect on standard deviation was small, σ<sub>*r*</sub> = 1.05σ<sub>*u*</sub>, and on higher moments it was negligible. For an untruncated rate distribution, the effect of sequential dependency was to shift the rate distribution to the right with minimal changes to the shape of the distribution. Thus, we conclude the observed near-Normality of untruncated rate distributions is not a manifestation of the central limit theorem arising from the sequential dependency, but must reflect the near-Normality of the input distribution itself. Therefore, assuming the pdf of the input *p<sub>u</sub>*(*u*) to be Normal, the output pdf *p<sub>r</sub>*(*r*) can be computed numerically from the convolution sequence in Equation (1.3) (see Methods). For an "untruncated" Normal input there is a shift to higher rate with negligible change in variance, as illustrated in **Figure 9B**. For an input truncated at zero, there is not only a shift in the mean, but the sharp truncation at zero is smoothed and eliminated (which can also be demonstrated analytically; see Methods). Remarkably, this smooth shape can also be fit very well by a reciprocal inverse Gaussian (dotted curve) when the tail is excluded (see Discussion).

#### *Spatial effects*

Previous studies have shown that mean RT can depend on the sequence of the laterality of previous trials (see Introduction), in particular whether laterality was repeated (**R**) or alternated (**A**). Thus, the sequence **RRRR** indicates that the stimulus and the previous four stimuli were all on the same side (i.e., all left *LLLLL* or all right *RRRRR*), whereas the sequence **AAAA** means that each stimulus alternated sides from the previous (*RLRLR* or *LRLRL*) (note the last symbol is the current trial). Jentzsch and Sommer (2002) examined sequences with 4 lags and showed a significant dependence of RT on a binary weighting of the AR sequence, where **R** was binary "0" and **A** binary "1" (e.g., **RRRR** = 0, **RRAR** = 2, **AARA** = 13, **AAAA** = 15). We used the same scheme for comparison.
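The coding scheme can be made concrete with a short Python sketch. The function names are hypothetical; the sketch assumes, as the examples above imply, that the leftmost symbol is the most significant binary digit, so that four symbols give indices from 0 to 15.

```python
def transitions(sides):
    """Map stimulus sides, e.g. 'RLRLR', to repetition (R) / alternation (A)
    labels. Five sides give four labels; the last side is the current trial."""
    return "".join("R" if prev == cur else "A"
                   for prev, cur in zip(sides, sides[1:]))


def binary_index(labels):
    """Read a label string as a binary number with R = 0 and A = 1."""
    value = 0
    for label in labels:
        value = 2 * value + (1 if label == "A" else 0)
    return value


for sides in ("LLLLL", "RLRLR", "LLLRR"):
    labels = transitions(sides)
    print(sides, labels, binary_index(labels))
```

Here *L* and *R* in the input string denote left and right stimuli, whereas **R** and **A** in the output denote repetition and alternation.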

For the easy tasks (AE and UE), averaging across all blocks showed a significant dependence on the AR sequence [*F*(15, 645) = 4.58; *p* < 0.001] when all trials in a block were considered. In particular, the sequences AAAR, RRRA, and ARRA were associated with high RTs (arrow in **Figure 10**), remarkably similar to Jentzsch and Sommer's results. The inverse pattern was more clearly seen in the rate-domain, with smaller and more even standard errors. For the difficult tasks (AD and UD), there was no significant pattern in the time- or rate-domain.

#### **DISCUSSION**

These data clearly show that when the task is easy (AE and UE blocks), RT distributions are close to reciprocal Normal, and not close to the inverse Gaussian distribution. Moreover, we have demonstrated this using practical block sizes (*n* = 200) collapsed across 24 subjects after standardization, unlike previous studies which used very large data sets recorded from only a few subjects. We emphasize that this near-Normality of rate was not an artifact from collapsing across subjects, as this does not invoke the central limit theorem, but simply combines the underlying distributions, as confirmed by Monte Carlo simulations (**Figure 5B**). We conclude that 2-alternative choice manual RT distributions are very close to the rectrN distribution, similar to the simple reaction experiments with saccades (Carpenter, 1981; Carpenter and Williams, 1995; Reddi and Carpenter, 2000) and the few studies of *simple* manual reaction times (Carpenter, 1999; Harris and Waddington, 2012). In simple RT studies it is necessary to introduce a variable foreperiod to prevent anticipation of the stimulus onset. In a choice RT study, a foreperiod may increase "preparedness," but randomization is not essential, as a choice cannot be made with confidence until the discriminative stimulus appears, and Bertelson and Tisseyre (1968) have shown similar effects for constant or random foreperiods in choice experiments. We chose a constant foreperiod to reduce the amount of extrinsic variability introduced into the decision process (see Methods). We can conclude that near-Normality in the rate-domain is not a consequence of foreperiod randomization in choice RT experiments and, by implication, presumably not in simple RT experiments either. However, this does not eliminate a possible role of a subject's intrinsic variability in judging foreperiod durations (i.e., Weber's law), and whether or how this affects the rate distribution remains to be explored.

It is difficult to reconcile the rectrN with a pure Wiener diffusion process, where within-trial drift noise is Normal (**Figure 1B**), as this would yield an inverse Gaussian distribution in the time-domain, or a reciprocal inverse Gaussian in the rate-domain. Monte Carlo simulation using the reciprocal inverse Gaussian with moments from our subjects did not yield near Normal rates (**Figure 5C**). Ratcliff (1978) considered the compound inverse Gaussian where drift rate fluctuated between trials with another Normal distribution. This would fit the reciprocal Normal if there were no drift noise, which is consistent with Carpenter's LATER model. This strongly suggests that the underlying RT process operates in the rate-domain, rather than in the more intuitive time-domain. It also explains why RTs are so variable: modest symmetric fluctuations in rate can lead to asymmetric and very high changes in RT, especially when rate becomes small as occurs in difficult tasks.

Temporal sequential dependency among trials has frequently been observed in choice reaction experiments (Laming, 1979). Clearly, any inter-trial correlations affect between-trial fluctuations, but they have been ignored in recent models of RT distributions. Using autoregressive techniques, we have shown explicit dependency of rate output for at least the 3 previous trials, very similar to Laming's original finding in the time-domain. Converting to a MA representation, this "memory" extends even further in terms of stimulus inputs (**Figure 9A**). We also found a transient response at the beginning of each block lasting less than 10 trials, which was similar to the predicted step response of the steady-state dynamics (**Figure 8D**). The simplest explanation is that the rest time between blocks allowed the memory "trace" to decay. However, this needs further exploration since we did not manipulate block intervals, and it was not possible to distinguish between sequential dependencies that are based on absolute time or based on trial number.

Based on moments, the main effect of this temporal dependency was to scale the mean response rate to higher values (i.e., shorten RTs) with little change in variance or higher moments (**Figure 9B**). One could view this as improving signal-noise ratio, or that previous trials/stimuli provide some information about the upcoming stimulus (prediction), hence allowing a faster response. Because higher moments are negligibly affected by the MA process, we can also conclude that the temporal sequential dependency does not cause rate to be Normal via the central limit theorem, and we deduce that the input must already be near-Normal.

We also found a sequential dependency that was related to the sequence of stimulus laterality for the easy tasks. Using Jentzsch and Sommer's binary weighting system, we found a remarkably similar result to theirs for the easy tasks with **RRRR** and **AAAA** having the highest rates (shortest RTs) and **AAAR**, **RRRA**, **ARRA** having the lowest rates (longest RTs) (**Figure 10**). The weighting scheme of Jentzsch and Sommer extends backward for 4 lags and assumes binary (power function) weighting. From the temporal viewpoint, our results suggest that the 4th lag is questionable and that weightings should follow an approximately hyperbolic decrease. Using this scheme, the dependency becomes even more pronounced (not shown). It is tempting to argue that the temporal and spatial dependencies are manifestations of the same process. Jentzsch and Sommer have assumed the dependency reflects a decaying memory trace, as this would explain why higher-order dependencies tend to be weaker when the trials are longer in absolute time. Indeed, we found that the spatial dependency was absent in the difficult tasks (**Figure 10**). Surprisingly, the temporal dependency was still present and virtually identical to the easy task AR process. The reason for this is unclear at present, but suggests that temporal and spatial dependencies can be dissociated.

We emphasize that we have examined sequential dependency in the rate-domain. In the rate-domain, a sequence of responses is a well-behaved stochastic process because of its near-Normality, and this permits the wide range of standard analysis techniques (moments, autocorrelations, spectral analyses, etc). In the time-domain this is not necessarily the case because taking the reciprocal of rate is a non-linear operation. Trials with low rates become disproportionately magnified in the time-domain, which can lead to "spikes" with very long RTs. In particular, there is the possibility that artefacts may arise in power spectra as these spikes have high spectral energy, and we advocate caution in interpreting power spectra based only on time-domain analyses (e.g., 1/f noise: Thornton and Gilden, 2005), pending further exploration.

#### **TRUNCATION**

Strictly, the Normal distribution has infinite extent and includes zero and negative rates, but this is not possible in RT experiments, so we need to consider the left-truncated Normal and the corresponding reciprocal truncated Normal (Harris and Waddington, 2012). We observed that when the task became more difficult (AD and UD), there was a leftward shift of the rate distribution (i.e., longer RTs) (**Figure 2A**) suggesting that left-truncation may have occurred. Because moments are sensitive to truncation, we used MLE to find the underlying Normal that fitted each block the best, and this showed that truncation was occurring (**Figure 4**). Collapsing across these subjects showed that the untruncated right half of the distribution was also very close to Normal (**Figure 6A**). This is a novel finding, and is evidence that task difficulty can lead to truncated Normal rate distributions. This has not been considered in previous models but has some far-reaching implications.

**FIGURE 8 | Sequential dependency based on blocks without transients (first 10 trials omitted). (A)** Mean partial autocorrelation function (PACF) of all blocks (filled symbols) showing smooth decay. Lines are ± 1 standard error. Open symbols show PACF for the same data after random shuffling leaving no sequential dependency. **(B)** PACF is plotted against reciprocal of lag showing a roughly linear increase (filled symbols). After de-correlation (see text) PACF coefficients become negligible (open symbols). **(C)** Maximum likelihood estimation of autoregressive coefficients (Equation 1.1) using the Box-Jenkins methods (see Methods) showing linear increase with reciprocal lag. **(D)** Comparison of step response function of autoregressive model (solid curve) with observed initial transient from grand mean in **Figure 7B**.

truncated at 0. Note truncation is eliminated by smoothing. Resulting pdf could be mistaken for a reciprocal inverse Gaussian distribution (dotted curve).

Truncation leads to very long RTs, which could theoretically approach infinity. Such responses would not usually be observed because either the experimenter imposes a maximum trial duration (time-out), or because the experiment is of finite duration in time or in number of trials. Thus, practically, rate will appear bounded at some non-zero minimum, depending on the experimental design (see Harris and Waddington, 2012 for further discussion). For easy tasks, this will have minimal effect since long RTs are rare, but as the task becomes more difficult, the effect of truncation becomes increasingly important.

Interestingly, it has been proposed that the latency distribution of saccades departs from reciprocal Normal for low stimulus contrasts, and that the inverse Gaussian is a better model (Carpenter et al., 2009). However, could this instead be due to truncation of the reciprocal Normal? Consider the theoretical example in **Figure 9C**, where we have set the rate standard deviation to 0.3 s<sup>−1</sup> with left truncation set by a mean of 0. The effect of temporal sequential dependency is to smooth out the truncation, which reduces the probability of very long RTs. The resulting pdf could easily be mistaken for the reciprocal inverse Gaussian (**Figure 9C** dotted curve). Thus, in the time-domain, it is plausible that studies using the inverse Gaussian may have overlooked the reciprocal truncated Normal with sequential dependency as a more parsimonious and unifying explanation.

#### **NON-HOMOGENEITY**

In this experiment we have used homogeneous and stationary blocks, where the same stimuli were used in each trial of a block, and the laterality was random. However, many RT experiments are not homogeneous, and the stimulus value changes on trials within a block. Generally, we expect that rate would no longer be reciprocal Normal. We distinguish between discrete and continuous non-homogeneity.

In the discrete case, a block contains a small number of different but known stimuli that are typically randomized or counterbalanced within the block. Assuming independent trials, the observed rate on each trial would then be a single sample from the Normal distribution associated with that stimulus. The overall rate distribution would then be a mixture of Normal distributions depending on the value and relative frequency of each stimulus. Since the stimulus is known on each trial, responses could be segregated and the rate distributions computed. Clearly, any sequential dependency should be reduced before segregation.

The continuous case is more problematic. It typically occurs when task difficulty and/or stimulus value vary on every trial in an unknown way. The rate on each trial can still be considered as a single sample from a Normal distribution, but the mean of the rate distribution (and possibly the standard deviation) are continuously variable leading overall to a compound Normal distribution, which can take on a wide range of positively or negatively skewed shapes. Whether de-convolving a putative Normal distribution is useful remains to be explored on real data.

#### **RATE AND OPTIMALITY**

As posed in the introduction, why RTs are so variable and whether, or under what circumstances, they could be optimal are longstanding questions that have been asked or assumed to be answerable by time-domain analysis (e.g., Luce, 1986; Bogacz et al., 2006). However, our and Carpenter's data are highly suggestive that there exists a preferred rate, *r*<sup>∗</sup>, for a given set of experimental conditions, and that rate fluctuates according to a Normal random process from trial to trial around *r*<sup>∗</sup>. Clearly, modest symmetrical variations in rate can lead to very large and highly asymmetric fluctuations in the time-domain, especially when *r*<sup>∗</sup> is small, as occurs in difficult discriminative tasks. Also *r*<sup>∗</sup> is easily recognizable as the modal rate, but there is no obvious landmark in the time-domain: *t*<sup>∗</sup> = 1/*r*<sup>∗</sup> does not correspond to the mode in the time-domain. Moreover, the rectrN is a strange distribution without finite moments (Harris and Waddington, 2012), whereas the Normal distribution is a common basic distribution. This strongly suggests that we should be considering rate as the more fundamental variable than RT, even if it seems counter-intuitive.

It seems that if we accept a rise to threshold model, then we require a deterministic drift rate that fluctuates *between* trials with a truncated Normal distribution, as originally proposed by Carpenter (1981). It is conceivable that there is still a stochastic rise to threshold, but it would need to be almost completely masked by the inter-trial variability (this needs future modeling), and rate is still the dominant variable. However, it is important not to conflate proximal with ultimate explanations. At the proximal level, there must be some physiological mechanism for triggering an all-or-none response, and an accumulator process seems physiologically plausible. However, even if true, it only explains how rate could be represented mechanistically, and there is a myriad of ways in which an accumulator could be constructed/evolved as a trigger (e.g., linear vs. curvilinear signal rise, deterministic vs. stochastic signal, fixed vs. variable trigger level; **Figure 1A**). It does not explain why rate is important.

Rate of response may be fundamental for an organism. For example, in the study of natural foraging, it is widely assumed that animals seek to maximize the rate of nutrient intake, rather than quantity *per se*. This has led to the marginal value theorem (Charnov, 1976) which predicts the time spent by animals on patches of food. In the study of animal learning, Skinner introduced his famous cumulative plots as a way of visualizing the stationarity of an animal's rate of response (Skinner, 1938; Ferster and Skinner, 1957). There is an obvious parallel between RT and operant behavior. When a subject presses a button ("operant"), she presumably derives a reward if the button press is a "correct" response, and a loss if "incorrect." The onset of lights acts as a "discriminant" or "conditioned" stimulus that provides information about the probability of reward (Skinner, 1938). It is well known that response times decrease with increasing reward but also increasing intensity of the conditioned stimulus (Mackintosh, 1974). Similarly, numerous studies have shown RTs decrease with increasing reward (Takikawa et al., 2002; Lauwereyns and Wisnewski, 2006; Spreckelmeyer et al., 2009; Milstein and Dorris, 2011; Delmonte et al., 2012; van Hell et al., 2012; Gopin et al., 2013) or increasing stimulus intensity (Cattell, 1886; Piéron, 1914). This leads us to consider the possibility of maximizing expected *rate* of reward or utility as an explanation for our observations (also considered by Gold and Shadlen, 2002).

For each trial, we define the gain in subjective utility for a correct response by *U*<sup>+</sup> > 0, and the loss by *U*<sup>−</sup> > 0. Objectively, utility would be maximized by responding to the correct stimulus any time after the stimulus onset. The stimulus value depends on the temporal response of the visual system, and will also increase in time due to any temporal integration and/or Bayesian update of priors. We therefore denote *p*(*t*) as the subjective probability of making a correct response given that a response occurs at *t* (measured relative to some origin; see below). We assume that *p*(*t*) is a concave function (**Figure 11A**), where for two alternatives with no prior information, *p*(0) = 0.5.

The expected gain in utility $\hat{G}(t)$ for a response at time *t* is (curve in **Figure 11B**):

$$\hat{G}(t) = U^+ p(t) - U^- \left(1 - p(t)\right) = \left(U^+ + U^-\right) p(t) - U^- \tag{1.4}$$

It can be seen that expected gain will be negative when *t* < *t*<sub>min</sub>, where *p*(*t*<sub>min</sub>) = 1/(*U*<sup>+</sup>/*U*<sup>−</sup> + 1). In this case, it does not pay to respond at all. Beyond *t*<sub>min</sub> the expected gain is positive, since *p*(*t*) → 1, and it is maximized by responding as late as possible. Expected rate of gain is $\hat{R}(t) = \hat{G}(t)/t$. When rate of gain is positive, there may be an optimal time to respond given by $t^{*} = \arg\max_t \hat{R}(t)$, which is the solution to:

$$t^{*} = \frac{\hat{G}(t^{*})}{\hat{G}'(t^{*})}\tag{1.5}$$

where the prime denotes the derivative with respect to *t* (**Figure 11C**). The conditions for a positive maximum are complicated, but it occurs under quite broad conditions and is easily visualized geometrically in **Figure 11B**, since from (1.5) the optimum is given by the tangent to $\hat{G}(t)$ that passes through the origin. Thus, depending on the utility payoff ratio *U*<sup>+</sup>/*U*<sup>−</sup>, and *p*(*t*), there is an optimal time to respond. *Responding as quickly as possible is generally suboptimal*; it pays to wait for a specific time to respond.
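For readers who want the intermediate step, the condition in (1.5) follows from setting the derivative of the expected rate of gain to zero. The short derivation below uses only the definitions above and assumes that $\hat{G}(t)$ is differentiable and that *t* > 0:

$$
\frac{d\hat{R}}{dt} = \frac{d}{dt}\left[\frac{\hat{G}(t)}{t}\right] = \frac{t\,\hat{G}'(t) - \hat{G}(t)}{t^{2}} = 0 \quad\Longrightarrow\quad \hat{G}'(t^{*}) = \frac{\hat{G}(t^{*})}{t^{*}}
$$

The right-hand side is the slope of the straight line joining the origin to the point $(t^{*}, \hat{G}(t^{*}))$, so at the optimum the tangent to the gain curve passes through the origin.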

We can make some general deductions. First, any increase/decrease in the utility payoff ratio, *U*<sup>+</sup>/*U*<sup>−</sup>, will reduce/increase *t*<sup>∗</sup> for a concave *p*(*t*). Thus, increasing reward will reduce *t*<sup>∗</sup>, as empirically observed (*vide supra*). In our experiment, asking subjects to respond accurately as opposed to quickly required "caution" by reducing the ratio and increasing *t*<sup>∗</sup> (**Figure 2**).

Faster/slower rise in *p*(*t*) will also reduce/increase *t* <sup>∗</sup> similar to, but not in precisely the same manner as manipulating payoff. For example, increasing the number of alternatives, *n*, will reduce *p*(*t*) since *p*(0) = 1/*n* (given no other prior information) and hence increase *t* <sup>∗</sup>. Whether there is a logarithmic relationship between *n* and E[*t* <sup>∗</sup>] (Hick's law) depends on the precise form of *p*(*t*) and remains to be explored. On the other hand, any prior information will decrease the rise-time of *p*(*t*) and reduce *t* <sup>∗</sup>, as has been reported in some experiments with random foreperiods (see Niemi and Näätänen, 1981).

Stimulus intensity has a strong inverse relationship with *t*<sup>∗</sup>, but this depends on *p*(*t*). The simplest way to parameterize *p*(*t*) is to assume that it depends on a single parameter, ε, that accelerates time so that *p*<sub>ε</sub>(*t*) = *p*(ε*t*). We assume that $\hat{\varepsilon}$ is an unbiased estimate of ε and distributed Normally across trials. It follows that $\hat{G}_{\hat{\varepsilon}}(t) = \hat{G}(\hat{\varepsilon}t)$ and $\hat{G}'_{\hat{\varepsilon}}(t) = \hat{\varepsilon}\hat{G}'(\hat{\varepsilon}t)$. Then (1.5) becomes

$$
\hat{G}'(\hat{\varepsilon}t^{*}) = \frac{\hat{G}(\hat{\varepsilon}t^{*})}{\hat{\varepsilon}t^{*}} \tag{1.6}
$$

so it follows that the optimal solution *t* <sup>∗</sup> is given by:

$$t^{*} = \frac{t_1}{\hat{\varepsilon}}\tag{1.7}$$

where *t*<sub>1</sub> is the solution to (1.6) evaluated at $\hat{\varepsilon} = 1$. Thus, if each trial is optimized based on the estimate $\hat{\varepsilon}$, then the optimal time to respond is distributed as the reciprocal of $\hat{\varepsilon}$ and hence has a reciprocal Normal distribution, as observed.

**FIGURE 11 | Rate model. (A)** *p*(*t*) is the subjective probability of being correct given a response is made at time *t*, and is assumed to be concave. The initial value of *p*(*t*) assumes guessing with no prior information, and the final value assumes that the response will be correct given infinite time. **(B)** $\hat{G}(t)$ is the expected gain in utility (Equation 1.4) for a response made at time *t*. Note that gain may be negative (i.e., loss) for *t* < *t*<sub>min</sub> (dashed curve) and no response is optimal. **(C)** $\hat{R}(t)$ is the expected rate of gain in utility (Equation 1.5), which has a maximum at *t*<sup>∗</sup> and can be visualized geometrically as the point where the tangent touches $\hat{G}(t)$ in **(B)**. **(D)** Shifting the time origin back by γ increases *t*<sup>∗</sup> by γ (see text).
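The reciprocal relationship in (1.7) can be illustrated with a small simulation. The sketch below is not the authors' analysis: the values of `t1`, `mu` and `sigma` are hypothetical, and discarding non-positive draws is only a crude stand-in for truncation. It draws a Normally distributed estimate on each trial, converts it to an optimal time, and confirms that the rate (reciprocal time) recovers the Normal parameters while the times themselves are right-skewed.

```python
import random
import statistics


def simulate_optimal_times(t1, mu, sigma, n_trials, seed=0):
    """Return t* = t1 / eps_hat (Eq. 1.7) for eps_hat ~ Normal(mu, sigma).

    Non-positive draws are discarded (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    times = []
    while len(times) < n_trials:
        eps_hat = rng.gauss(mu, sigma)
        if eps_hat > 0.0:
            times.append(t1 / eps_hat)
    return times


# Hypothetical parameter values, chosen only for illustration.
times = simulate_optimal_times(t1=1.0, mu=4.0, sigma=0.5, n_trials=20000)
rates = [1.0 / t for t in times]  # with t1 = 1, the rate equals eps_hat

mean_rate = statistics.mean(rates)
sd_rate = statistics.stdev(rates)
print("rate mean and SD:", mean_rate, sd_rate)
print("time median and mean:", statistics.median(times), statistics.mean(times))
```

In the time domain the mean exceeds the median, which is the long right tail typical of RT distributions; in the rate domain the sample is symmetric about `mu`.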

Since only one reward can occur per trial, we would expect trial duration to be the more relevant epoch for response rate, rather than decision time *per se*. Including an additional non-decision time *T*<sub>ND</sub> (foreperiod, sensorimotor delays, etc.) in the computation of estimated rate, $\hat{R}(t) = \hat{G}(t)/(t + T_{\rm ND})$, yields the more general equation for *t*<sup>∗</sup>:

$$t^{*} + T_{\rm ND} = \frac{\hat{G}(t^{*})}{\hat{G}'(t^{*})}\tag{1.8}$$

As shown in **Figure 11D**, including *T*<sub>ND</sub> increases optimal response time (relative to stimulus onset). In other words, decision time depends on the amount of non-decision time.

**FIGURE 12 | (A)** Effect of scaling factor $\hat{\varepsilon}$ on optimal decision time *t*<sup>∗</sup> for different non-decision times *T*<sub>ND</sub> = {0, 10, 100, 1000} (see text). Note that *t*<sup>∗</sup> and hence RT increases with *T*<sub>ND</sub>, although asymptote is zero (not shown); **(B)** same as **(A)** but on log-log axes (base 10) showing near power function $t^{*} \approx a\varepsilon^{-k}$ with *k* = {0, 0.82, 0.83, 0.87} and *a* = {25.1, 25.1, 39.8, 63.1} from linear regressions; **(C)** linear plot of optimal rate *r*<sup>∗</sup> vs. $\hat{\varepsilon}$. Although strictly a power function, relationship is locally quasi-linear.

Returning to the parametric model, $\hat{G}_{\hat{\varepsilon}}(t) = \hat{G}(\hat{\varepsilon}t)$, we note that

$$
\hat{\varepsilon}\left(t^{*} + T_{\text{ND}}\right) = \frac{\hat{G}(\hat{\varepsilon}t^{*})}{\hat{G}'(\hat{\varepsilon}t^{*})}\tag{1.9}
$$

The solution is not the same as for (1.6), and requires an explicit form for *p*(*t*). For the purposes of illustration, we assumed a simple exponential form, $p(t) = 1/2 + \left(1 - \exp(-\hat{\varepsilon}t)\right)/2$, and plotted *t*<sup>∗</sup> against $\hat{\varepsilon}$ with *U*<sup>+</sup> = 1, *U*<sup>−</sup> = 5 and parametric in *T*<sub>ND</sub> (**Figure 12A**). As can be seen, *t*<sup>∗</sup> decays with increasing $\hat{\varepsilon}$ but also increases with *T*<sub>ND</sub>. Although we did not manipulate "nondecision" time here, others have shown that increasing foreperiod increases RT in both simple (Niemi and Näätänen, 1981) and choice RT (Green et al., 1983).
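A minimal numerical sketch of this illustration is given below. It assumes the exponential form of *p*(*t*) and the payoffs quoted above, and solves the first-order condition (*t* + *T*<sub>ND</sub>)·*G*′(*t*) = *G*(*t*) by bisection. The function names, the bisection scheme and the time units are choices made for this example; it is not intended to reproduce the fitted values reported for **Figure 12**.

```python
import math

U_PLUS, U_MINUS = 1.0, 5.0  # payoff values quoted in the text


def gain(t, eps):
    """Expected gain, Eq. (1.4), with p(t) = 1/2 + (1 - exp(-eps*t))/2."""
    p = 0.5 + (1.0 - math.exp(-eps * t)) / 2.0
    return (U_PLUS + U_MINUS) * p - U_MINUS


def gain_slope(t, eps):
    """Derivative of gain() with respect to t."""
    return (U_PLUS + U_MINUS) * eps * math.exp(-eps * t) / 2.0


def optimal_time(eps, t_nd=0.0, iterations=200):
    """Solve (t + t_nd) * G'(t) = G(t) for t by bisection (hypothetical helper)."""
    def excess(t):
        return (t + t_nd) * gain_slope(t, eps) - gain(t, eps)

    # Start at t_min, where the gain crosses zero, so that excess(lo) > 0.
    lo = math.log((U_PLUS + U_MINUS) / (2.0 * U_PLUS)) / eps
    hi = lo + 1.0 / eps
    while excess(hi) > 0.0:  # expand the bracket until the sign changes
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t1 = optimal_time(1.0)  # solution at eps = 1 with no non-decision time
for t_nd in (0.0, 10.0, 100.0, 1000.0):
    print(t_nd, [round(optimal_time(eps, t_nd), 3) for eps in (0.5, 1.0, 2.0)])
```

Because the gain is concave, the bracketed function decreases monotonically in *t*, so the root found is unique. With `t_nd = 0` the output follows (1.7) exactly, and larger `t_nd` lengthens the optimal decision time at every `eps`.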

For *T*<sub>ND</sub> > 0, the relationship is still very close to a power law with $t^{*} \approx a\varepsilon^{-k}$ where *k* ≈ 0.8 (**Figure 12B**). In terms of rate, we can see that as *T*<sub>ND</sub> increases, *r*<sup>∗</sup> decreases but the relationship to $\hat{\varepsilon}$ is still locally close to linear even for very large *T*<sub>ND</sub> (**Figure 12C**). Thus, if $\hat{\varepsilon}$ is Normally distributed, *r*<sup>∗</sup> will also be very near Normal.

If we add sensorimotor delays γ to decision time, then we have $RT = a\hat{\varepsilon}^{-k} + \gamma$, which is clearly similar to Piéron's law: $E[RT] = \alpha I^{-\beta} + \gamma$, where α, β and γ are constants for a given experiment and *I* is objective stimulus intensity. Piéron's law was originally found for simple RT experiments, but also holds for choice RTs (van Maanen et al., 2012). If we assume that $\hat{\varepsilon}$ is subjective estimated stimulus intensity, then we require $\hat{\varepsilon} \propto I^{\beta/k}$, which is plausible from Stevens' power law (Chater and Brown, 1999).

#### **MECHANISM**

How optimal rate could be controlled is open to speculation. We can see that the mechanism in **Figure 1A** could act as an equation solver since the time of crossing is the solution of ρ(*t*) = θ(*t*) [more formally: the lowest real positive root of ρ(*t*) − θ(*t*)], and when equality is reached, the behavior is triggered in real-time. This can be mapped onto (1.5) in an infinity of ways. A simple possibility is that a deterministic linear rise to threshold behaves as a rate-to-time converter (**Figure 1C**). The input $\hat{R}(t)$ is integrated in time to yield a rising deterministic ρ(*t*), which triggers the response when a threshold is reached. Gold and Shadlen (2002) proposed that an optimal decision time could be found by an adaptive process (trial-and-error) that varies the threshold. In this case, the distribution of decision times would be given by the distribution of thresholds (for a fixed ρ(*t*)), but this hardly explains why RTs have a near-rectrN distribution. A more parsimonious model would be that the optimal ρ(*t*) is found for a fixed threshold (i.e., Carpenter's original model). Normally distributed estimates of ρ(*t*) would then yield RTs with the observed rectrN distribution. It is possible that both threshold and ρ(*t*) are variable, leading to a ratio of distributions for decision time (Waddington and Harris, 2013), although we have no evidence for this in this experiment.

Taking a different perspective, we can draw a correspondence between rate (responses per second) and frequency (cycles per second), and consider control by underlying banks of oscillators in the Fourier domain. It is conceivable that the repetitive nature of RT experiments entrains oscillator frequencies, possibly with phase resets from the stimulus onset to allow some degree of prediction. Our observed temporal and spatial sequential dependencies could reflect this entrainment (phase-locking), and the Normal distribution of rate could reflect sampling of subpopulations of oscillators. This is speculative, but not discordant with the known correlation between RTs and alpha brain waves (Drewes and van Rullen, 2011; Diederich et al., 2012; Hamm et al., 2012).

#### **SUMMARY**

For 2-alternative manual choice RTs, distributions are close to the reciprocal Normal but not close to the inverse Gaussian distribution. This is not consistent with stochastic rise to threshold models, and implies that between-trial rate (reciprocal RT) is a fundamental variable. There are significant between-trial temporal and spatial sequential dependencies extending back about 3 lags. When tasks become difficult, the rate distributions shift to the left and become truncated near zero. We deduced that true truncation could not occur due to the sequential dependency, but rate distributions are still close to the truncated Normal. Responding to back-to-back sequences of hundreds of almost identical RT trials is not a natural behavior. Nevertheless, it does reflect decision-making when there is time pressure. We propose that when gain in utility is an increasing concave function of time (speed-accuracy trade-off) there emerges an optimal time of response when time is a penalty. We propose that response rate reflects such a process and argue against the longstanding assumption of rise-to-threshold.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 March 2014; accepted: 21 May 2014; published online: 10 June 2014. Citation: Harris CM, Waddington J, Biscione V and Manzi S (2014) Manual choice reaction times in the rate-domain. Front. Hum. Neurosci. 8:418. doi: 10.3389/fnhum.2014.00418*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Harris, Waddington, Biscione and Manzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A new perspective on binaural integration using response time methodology: super capacity revealed in conditions of binaural masking release

#### *Jennifer J. Lentz<sup>1</sup>\*, Yuan He<sup>1</sup> and James T. Townsend<sup>2</sup>*

*<sup>1</sup> Department of Speech and Hearing Sciences, Indiana University, Bloomington, IN, USA*

*<sup>2</sup> Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA*

#### *Edited by:*

*José M. Medina, Universidad de Granada, Spain*

#### *Reviewed by:*

*Martin Lages, University of Glasgow, UK*

*Josef Schlittenlacher, Technische Universität Darmstadt, Germany*

#### *\*Correspondence:*

*Jennifer J. Lentz, Department of Speech and Hearing Sciences, Indiana University, 200 S. Jordan Ave., Bloomington, IN 47405, USA e-mail: jjlentz@indiana.edu*

This study applied reaction-time based methods to assess the workload capacity of binaural integration by comparing reaction time (RT) distributions for monaural and binaural tone-in-noise detection tasks. In the diotic contexts, an identical tone + noise stimulus was presented to each ear. In the dichotic contexts, an identical noise was presented to each ear, but the tone was presented to one of the ears 180° out of phase with respect to the other ear. Accuracy-based measurements have demonstrated a much lower signal detection threshold for the dichotic vs. the diotic conditions, but accuracy-based techniques do not allow for assessment of system dynamics or resource allocation across time. Further, RTs allow comparisons between these conditions at the same signal-to-noise ratio. Here, we apply a reaction-time based capacity coefficient, which provides an index of workload efficiency and quantifies the resource allocations for single ear vs. two ear presentations. We demonstrate that the release from masking generated by the addition of an identical stimulus to one ear is limited-to-unlimited capacity (efficiency typically less than 1), consistent with less gain than would be expected by probability summation. However, the dichotic presentation leads to a significant increase in workload capacity (increased efficiency), most specifically at lower signal-to-noise ratios. These experimental results provide further evidence that configural processing plays a critical role in binaural masking release, and that these mechanisms may operate more strongly when the signal stimulus is difficult to detect, albeit still with nearly 100% accuracy.

**Keywords: reaction time, binaural hearing, masking release, systems factorial technology, workload capacity**

# **INTRODUCTION**

An integral question in psychoacoustics is that of binaural integration: how information presented to the two ears is combined in order to form a unified percept. In natural environments, the sounds received by the two ears are typically different from one another, but experiments using headphones allow identical stimuli to be presented to both ears. It is well-known that identical auditory stimuli presented to each ear are perceived as a single sound (e.g., Leakey et al., 1958), but there are also many instances in which unified percepts are elicited when different signals are presented to the two ears (e.g., if a sound source is presented to one side of a listener). In his seminal work on the "cocktail party effect," Cherry (1953) demonstrated that the auditory system generates fused percepts of auditory sources in sophisticated listening situations. Although multiple cues are used by the auditory system to accomplish this goal, the binaural system is a critical component of this process (see Bregman, 1994 for a review).

One notable aspect of many studies is that they evaluate the mechanisms responsible for detection using threshold- and accuracy-based techniques. Accuracy-based methods can answer many important questions pertaining to various aspects of perception and cognition. Yet, they are inherently limited when issues pertaining to dynamic mechanisms are raised, since by definition they ignore temporal features of the system and correlate data (e.g., see Van Zandt and Townsend, 2013).

We can apply a separate strain of research in perceptual and cognitive psychology which focuses on multiple signals vs. a single signal (or more specifically, two ears vs. one ear) and primarily uses reaction time (RT) for its dependent variable. We will refer to that approach as the "redundant signals approach" (cf. Bernstein, 1970; Grice et al., 1984). Its terminology is, of course, rather different than that typically employed in the hearing domain but we will strive to provide sufficient bridges across the divide.

Within that general domain, strong tools have been developed that can assist the investigator in unveiling the dynamics of the underlying perceptual system. We suggest that the two basic measures, accuracy and RT, can together go a long way in answering fundamental questions within binaural hearing. In fact, statistics derived within a theoretical, information processing framework have led to theory-driven methodologies within which various aspects of cognitive sensory processing can be evaluated.

The fundamental goal of this study is to apply the redundant signals techniques to further our understanding of the mechanisms responsible for integrating information across the ears. However, we need to first review some of the germane, basic findings in the binaural literature. Almost all of these were accuracy based but a few measured RTs.

Several psychophysical approaches have been taken to address the fundamental question of binaural integration with a substantial proportion of experiments using a basic task: detecting a tone added to a band of noise. In these experiments, the detection threshold level of the tone is typically measured (cf. Fletcher, 1940). The tone + noise stimulus can be presented to a single ear, commonly referred to as *monaural* presentation, denoted *N*<sub>m</sub>*S*<sub>m</sub>, where *N* refers to the noise, *S* refers to the tonal signal, and *m* denotes the monaural presentation. The tone + noise stimulus can also be presented to both ears. If both ears receive identical signals, we refer to this as a *diotic*, homophasic presentation, *N*<sub>0</sub>*S*<sub>0</sub>, where 0 represents identical noise (*N*<sub>0</sub>) and identical tone (*S*<sub>0</sub>) presented to each ear. A number of psychophysical studies have demonstrated that presenting a tone-in-noise diotically yields, at most, a marginal improvement in the detection threshold of the pure tone compared to a monaural presentation (e.g., Hirsh and Burgeat, 1958; Egan et al., 1969; Davidson et al., 2006).

In fact, to date, thresholds for *N*<sub>0</sub>*S*<sub>0</sub> and *N*<sub>m</sub>*S*<sub>m</sub> are generally treated as being the same (cf. Durlach and Colburn, 1978). For threshold-based tests, then, there appears to be little or no benefit to having the redundant tone-in-noise presented to a second ear, although a small benefit has been reported for detecting pure tones in quiet (cf. Moore, 2013). Consequently, performance in the diotic conditions (for tones alone or tones in noise) is worse than a probability summation model would predict with accuracy being, at best, slightly better for two ears compared to one.

Of course, natural conditions typically allow the two ears to receive different signals. Such a situation would occur when a sound source is not directly in front of the listener. Any instance in which the ears receive different signals is referred to as *dichotic* listening. In a very special case, when presenting sounds over headphones, one can present a noise source identical (correlated) between ears (*N*<sub>0</sub>) with a signal source uncorrelated between the ears. If the signal stimulus is presented π radians out of phase across the ears, we refer to this as an antiphasic presentation, *N*<sub>0</sub>*S*<sub>π</sub>. Here, the signal level at threshold is much lower than in the *N*<sub>0</sub>*S*<sub>0</sub> condition, with the difference in threshold commonly referred to as the binaural masking level difference (BMLD; e.g., Hirsh, 1948; Jeffress et al., 1952; Egan, 1965; Henning, 1965; Henning et al., 2005; Davidson et al., 2009). The dichotic stimulation thus leads to superior accuracy over either monaural or diotic performance. Models of these types of psychophysical data include processes of interaural cross-correlation, equalization and cancelation, and across-ear inhibition (e.g., Bernstein et al., 1999; Breebaart et al., 2001; Davidson et al., 2009).

To summarize, first, the performance in the diotic conditions is worse than a probability summation model would predict but with a slightly better relative accuracy in the binaural vs. monaural conditions. Secondly, dichotic stimulation with inverted tones leads to superior performance. An ideal detector which could cancel the noise would allow for this superior result, but would predict signal detection thresholds in *N*<sub>0</sub>*S*<sub>π</sub> to be the same as in quiet (Durlach and Colburn, 1978). Because masking still does occur (that is, thresholds in *N*<sub>0</sub>*S*<sub>π</sub> are not equivalent to unmasked thresholds), the noise cancelation process, though robust, is imperfect.

Both these findings indicate the absence of independent detection with each detector being the same (i.e., just as good but no better) with both ears functioning as with only one. The substandard performance in the diotic conditions could presumably be due to limitations in capacity (i.e., caused by inadequate resources available to both ears simultaneously or perhaps to mutual channel inhibition). However, the superior performance found with the dichotic conditions suggests, as noted, some type of either energy or activation summation or, contrarily, a type of information interaction as intimated by the cross-correlation interpretation.

Moving on to consider what has been accomplished in the binaural detection domain with RT as the dependent variable, in 1944, Chocholle was the first to measure RTs for binaural vs. monaural stimulation, demonstrating that binaural detection of pure tones (in quiet) was faster than monaural detection. Simon (1967) showed that the difference in mean RT between binaural and monaural stimulation was very small (about 4 ms for an average 200 ms RT) but statistically significant. More recently, Schlittenlacher et al. (2014) also demonstrated a 5–10 ms binaural advantage in RT. These studies reported only mean RTs and without a deeper quantitative analysis, one is challenged to establish how activation of the two ears relates to resource allocation.

A seminal RT-based study within the domain of redundant signals literature was undertaken by Schröter et al. (2007), who reported RT distributions for detection of a 300-ms, 60 dB SPL pure tone presented to the left ear, the right ear, or both ears. Whether the two tones had identical or different frequencies, there was little evidence for a redundant-signal benefit. That is, although RTs were slightly faster for detecting two tones vs. one tone, the decrease in RT was less than would be expected under probability summation. However, in a second experiment, one of the tones was replaced by a noise, and here they found faster RTs than would be predicted by a probability summation model. We will discuss the Schröter et al. (2007) results alongside our own.

Our approach here will be to implement a suite of tools from the theory-driven RT methodology, "systems factorial technology" (subsequently SFT) originated by Townsend and colleagues (e.g., Townsend and Nozawa, 1995; Townsend and Wenger, 2004a). This methodology permits the simultaneous assessment of a number of critical information processing mechanisms within the same experimental paradigm. These tools will allow an analysis of resource allocation and interaction between the two ears and also provide for psychophysical assessment under very different conditions than accuracy- or threshold-based measures.

First, RTs can be measured under conditions of very high accuracy, tapping into different locations on the psychometric function. With respect to BMLD studies, the psychometric functions for detecting a tone added to noise in the *N*<sub>0</sub>*S*<sub>0</sub> and *N*<sub>0</sub>*S*<sub>π</sub> contexts are parallel but they do not overlap when the masking release is large (Egan et al., 1969). Because the psychometric functions do not overlap, auditory mechanisms are evaluated for these two contexts at largely different SNRs. Given the nonlinear nature of the ear, it is indeed possible that different auditory mechanisms may be invoked at the two different SNRs estimated at threshold. Second, accuracy-based techniques do not allow easy assessment of the dynamics of the system without clever stimulus manipulations that can be difficult to implement acoustically. Finally, RT measures can provide a complement to accuracy-based measures in our attempt at converging on a unified understanding of the mechanisms responsible for perception. Since the broad suite of tools available in SFT has not heretofore been implemented in binaural perception and not at all to the release from masking phenomenon, the following section provides a brief tutorial.

### **ARCHITECTURE: THE SERIAL vs. PARALLEL ISSUE**

One of the first issues to address is the form, or the architecture, used by a system. We define *serial processing* as processing things one at a time or sequentially, with no overlap among the successive processing times. Processing might mean search for a target among a set of distractors in memory or in a display, solving facets of a problem, deciding among a set of objects, and so on. *Parallel processing* means processing all things simultaneously, although it is allowed that each process may finish at different times (Townsend et al., 2011).

Although the term *architecture* might seem to imply rigid structure, we may also employ it to refer to more flexible arrangements. Thus, it might be asserted that certain neural systems are, at least by adulthood, fairly wired in and that they act in parallel (or in some cases, in serial). On the other hand, a person might scan the newspaper for, say, two terms, one at a time, that is, serially, or, by dint of will, might try to scan for them in parallel. Although parallel vs. serial processing is in some sense the most elemental pair of architectures, much more complexity can be imagined and, indeed, investigated theoretically and empirically (e.g., Schweickert, 1978; Schweickert and Townsend, 1989). **Figure 1** illustrates the architecture associated with serial and parallel processing.

If we are dealing with only one or two channels or items, we shall often just refer to these as *a* or *b*, but if we must consider the general case of arbitrary *n* items or channels, we list them as 1, 2, ..., *n* − 1, *n*. In a serial system, then, if *n* = 2, and channels *a* and *b* are stochastically independent (see subsequent material for more on this issue), then the density of the sum of the two serial times is the convolution of the separate densities (Townsend and Ashby, 1983, p. 30).

This new density is designated as $f_a(t) * f_b(t)$, where the asterisk denotes convolution and *a* and *b* are processed serially. The mean or expectation of the sum, $E[T_a + T_b] = E[T_a] + E[T_b]$, indicates that the overall completion time for serial processes is the sum of all the individual means. The standard serial model requires that $f_a(t) = f_b(t)$, which in turn implies that $E[T_a] = E[T_b] = E[T]$ and $E[T_a + T_b] = 2E[T]$.

In parallel processing, assuming again stochastic independence across the items or channels, the overall completion time for both items has to be the last, or maximum finishing time for either item. Thus, the density that measures the last finishing time is $f_{\max}(t) = f_a(t)F_b(t) + f_b(t)F_a(t)$. While *f*(*t*) represents the density function, *F*(*t*) represents the cumulative distribution function. The interpretation of this formula is that either *a* finishes last at time *t* (*b* is already done by then), or *b* finishes last at time *t* and *a* is already done by then. In this case, we can write the mean in terms of the survivor function: $E[T] = \int_0^{\infty} S(t)\,dt$. The survivor function in the present situation is $S(t) = 1 - F_a(t)F_b(t)$ and the mean can be calculated using the already given integral.
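As a concrete check on the survivor-function route to the mean, the sketch below integrates *S*(*t*) numerically for two independent channels. The exponential channel distributions and the rate values are assumptions made purely for illustration, chosen because the exhaustive parallel mean then has a simple closed form against which the integral can be compared.

```python
import math


def exp_cdf(t, rate):
    """Cumulative distribution function of an exponential finishing time."""
    return 1.0 - math.exp(-rate * t)


def mean_from_survivor(survivor, upper, steps=200_000):
    """Trapezoidal approximation of E[T], the integral of S(t) from 0 to upper."""
    dt = upper / steps
    total = 0.5 * (survivor(0.0) + survivor(upper))
    for i in range(1, steps):
        total += survivor(i * dt)
    return total * dt


RATE_A, RATE_B = 5.0, 8.0  # hypothetical channel rates (per second)


def survivor_max(t):
    """S(t) = 1 - F_a(t) F_b(t) for the last of two independent channels."""
    return 1.0 - exp_cdf(t, RATE_A) * exp_cdf(t, RATE_B)


mean_parallel_exhaustive = mean_from_survivor(survivor_max, upper=20.0)
# For exponentials, E[max] = E[T_a] + E[T_b] - E[min], and the minimum of two
# independent exponentials is exponential with rate RATE_A + RATE_B.
closed_form = 1.0 / RATE_A + 1.0 / RATE_B - 1.0 / (RATE_A + RATE_B)
mean_serial_exhaustive = 1.0 / RATE_A + 1.0 / RATE_B

print("parallel exhaustive:", mean_parallel_exhaustive, "closed form:", closed_form)
print("serial exhaustive:", mean_serial_exhaustive)
```

With the same channel distributions, the exhaustive parallel mean is shorter than the exhaustive serial mean by exactly the mean of the minimum finishing time.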

### **STANDARD SERIAL MODELS**

This type of model is what most people mean when they say "serial," unadorned. Thus, it is the model advocated by Sternberg in many of his early papers (e.g., Sternberg, 1966). To reach it in the case that *n* = 2, let $f_a(t) = f_b(t) = f(t)$. That is, the probability densities are the same across items or positions and even *n*. The latter indicates that *f*(*t*) defines the length of time taken on an item or channel no matter the size of the set of operating items or channels. Furthermore, it is assumed in the standard serial model that each successive processing time is independent of all others. So, if *a* is second, say, its time does not depend on how long the preceding item (e.g., *b*) took to complete its processing.

Note, however, that we allow that different paths through the items might be followed from trial to trial. We also do not confine the stopping rule to a single variety. Now, Sternberg's preferred model assumed that exhaustive processing (all items were required to finish to stop) was used even in target-present trials. But we allow the standard model to follow other, sometimes more optimal, rules of cessation. Because all the *n* densities are now the same we can simply write the *n*th order convolution for exhaustive processing in symbolic form as $f_{\max}(t) = f^{*(n)}(t)$. The exhaustive mean processing time is then $E_{\max}[T_1 + T_2 + \dots + T_n] = nE[T]$.

Next consider the situation where exactly one target is present among *n* − 1 distractors and the system is self-terminating (ST; only one item is required to stop the process). Again, it is assumed that the target is placed with probability 1/*n* in any of the *n* locations. Then it follows that $f_{st}(t) = \frac{1}{n}\sum_{i=1}^{n} f^{*(i)}(t)$. The mean processing time in this case is the well-known $E_{st}[T] = (n + 1)E[T]/2$. This formula can be interpreted to mean that, on average, the searcher must process approximately one-half of the set of items to find the target and cease processing. Finally, when processing stops as soon as the first item is finished, then we have the result $f_{\min}(t) = f(t)$ and $E_{\min}[T] = E[T]$.
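The self-terminating mean can be verified in one line. Under the standard serial assumptions stated above (identical, independent item times with mean *E*[*T*], and a target equally likely to occupy each of the *n* serial positions), a target in position *i* is found after *i* items have been processed, so that:

$$
E_{st}[T] = \frac{1}{n}\sum_{i=1}^{n} i\,E[T] = \frac{E[T]}{n}\cdot\frac{n(n+1)}{2} = \frac{(n+1)\,E[T]}{2}
$$

For example, with *n* = 4 the expected number of items processed before the target is found is 2.5, compared with 4 under exhaustive processing.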

#### **STANDARD PARALLEL MODELS**

The standard parallel model also assumes independence among the processing items, but this time in a simultaneous sense. Thus, the processing time on any individual channel is stochastically independent of that of any other channel. The standard parallel model further assumes unlimited capacity. The notion of capacity will be developed in detail below but suffice it to mention for the moment that it means that, overall, the speed of each channel does not vary as the number of other channels in operation is varied. However, we do not assume that the various channel distributions are identical, unlike the standard serial model. Here, mean exhaustive processing time is just $E[\max(T_1, T_2, \dots, T_{n-1}, T_n)]$ and the mean time in the event of single target self-termination, when the target is in channel *i*, is simply $E[T_i]$. That for the minimum time (i.e., race) is $E[\min(T_1, T_2, \dots, T_{n-1}, T_n)]$.

# **SELECTIVE INFLUENCE**

For decades, a popular way to attempt to test serial vs. parallel processing has been to vary the processing load (i.e., number of items, *n*), and then to plot the mean response times as a function of load. If the slope of such a graph differs significantly from 0, then processing is declared to be serial. If it does not differ significantly from 0, parallel processing is inferred. This reasoning is fallacious on several grounds but the major infirmity is that such "tests" are primarily assessing capacity as workload changes, not architecture. Thus, what is commonly determined to be evidence for serial processing can be perfectly and mathematically mimicked by a limited capacity parallel model (Townsend, 1990; Townsend et al., 2011).

Sternberg's celebrated additive factors method (Sternberg, 1969) offered a technique that avoided the fragile capacity logic and could affirm or deny serial processing. The method was based on the notion of "selective influence" of mean processing times, which stipulated that each experimental factor affects one and only one psychological subprocess at the level of means. The challenge there was that the method did not directly test other important architectures such as parallel systems. Also, there was no mathematical proof for the association of "factors that are additive" even with serial processing if the successive times were not stochastically independent and, again, no clear way to include other architectures.

Townsend and Schweickert (1989) proved that if selective influence acted at a stronger level, then many architectures, including parallel and serial ones, could be discriminated at the level of mean response times. Subsequent work, and that which we attempted to implement here, extended such theorems to the more powerful level of entire response time distributions (Townsend and Nozawa, 1995; Townsend and Wenger, 2004b).

We have discovered many tasks where stern tests of selective influence are passed. When the tests are not passed, the failure itself can often help to determine certain aspects of a processing system (see, e.g., Eidels et al., 2011). However, in that case the strict use of the methodology to assess architecture cannot be applied. As we will learn below, the tests were not passed in the present experiments, and this feature does play an important role in our discussion.

#### **INDEPENDENCE vs. DEPENDENCE OF CHANNEL OR ITEM PROCESSING TIMES**

We also must discuss *independence* vs. *dependence* of channels, stages, or subsystems (these terms can be used interchangeably although the term stages is sometimes restricted to serial systems and channels to parallel systems). In this introduction, we have been explicitly assuming stochastic independence of processing times, whether the architecture is serial or parallel.

In serial processing, if the successive items are dependent, then what happens on *a*, say, can affect the processing time for *b*. Although it is still true that the overall mean exhaustive time will be the sum of the two means, the second, say *b*, will depend on *a*'s processing time. Speeding up *a* could either speed up or slow down *b* because they are being processed simultaneously; ongoing inhibition or facilitation (or both) can take place during a single trial and while processing is ongoing. Townsend and Wenger (2004b) discuss this topic in detail.

It is interesting to note that the earlier prediction of independent parallel processing in self-terminating situations will no longer strictly hold. However, it will still be true even if processing is dependent that the predicted ST density will be the average or expected value of the density in the channel where the sought-for target is located, *E*[*Ta*]. Only in the non-independent situation, this expectation has to be taken over all the potential influences from the surrounding channels.

#### **STOPPING OR DECISION RULE: WHEN DOES PROCESSING CEASE?**

No predictions can be made about processing times until the model designer has a rule for when processing stops. In some high-accuracy situations, such as search tasks, it is usually possible to define a set of events, any one of which will allow the processor to stop without error. In search for a set of targets then, the detection of any one of them can serve as a signal to cease processing. A special case ensues when exactly one sought-for target is present. In any task where a subset of the display or memory items is sufficient to stop without error, and the system processor is capable of stopping (not all may be), the processor is said to be capable of *self-termination*. Because many earlier (e.g., Sternberg, 1966) investigations studied exhaustive vs. single-target search, self-termination was often employed to refer to the latter, although it can also have generic meaning and convey, say, *first-termination* when the completion of any of the present items suffices to stop processing. The latter case is often called an OR design because completion of any of a set of presented items is sufficient to stop processing and ensure a correct response (e.g., Egeth, 1966; Townsend and Nozawa, 1995).

If all items or channels must be processed to ensure a correct response then exhaustive processing is entailed. For instance, on no-target (i.e., nothing present but distractors or noise) trials, every item must be examined to guarantee no targets are present. In an experiment where, say, all *n* items in the search set must be a certain kind of target, called an AND design, exhaustive processing is forced on the observer (e.g., Sternberg, 1966; Townsend and Nozawa, 1995). Nevertheless, as intimated earlier, some systems may by their very design have to process everything in the search set, so the question is of interest even when, in principle, self-termination is a possibility.

Hence, in summary, there are three cases of especial interest: (a) minimum time, OR, or first-self-termination, where all items are targets and processing can cease as soon as any one of them is completed; (b) single-target self-termination, where there is one target among *n* − 1 other items and processing can cease when it is found; and (c) exhaustive or AND processing, where all items or channels are processed. **Figure 2** depicts AND (exhaustive) and OR (first-terminating) processing in a serial system, whereas **Figure 3** does the same for a parallel system.

Suppose again there are just two items or channels to process, *a* and *b*, and serial processing is being deployed. Assume that *a* is processed first. Then the minimum time processing density is simply *f*min(*t*) = *fa*(*t*), naturally just the density of *a* itself. Assume now there is a single target present in channel *a* and one distractor is in channel *b*, and self-terminating serial processing is in force. Then the predicted density is *fst*(*t*) = *pfa*(*t*) + (1 − *p*)[*fb*(*t*) ∗ *fa*(*t*)], where ∗ denotes convolution. That is, if *a* happens to be checked first, which occurs with probability *p*, then the processing stops. On the other hand, if *b* is processed first and a distractor is found, then *a* has to be processed also, so the second term is the convolution of the *a* and *b* densities. In the event that both items must be processed, then the prediction is just that given earlier: *f*max(*t*) = *fa*(*t*) ∗ *fb*(*t*).

When processing is independent parallel, the minimum time rule delivers a horse race to the finish, with the winning channel determining the processing time (**Figure 3B**). The density is just *f*min(*t*) = *fa*(*t*)*Sb*(*t*) + *fb*(*t*)*Sa*(*t*). This formula possesses the nice interpretation that *a* can finish at time *t* but *b* is not yet done (indicated by *b*'s survivor function), or the reverse can happen. If processing is single-target self-terminating with the target in channel *a*, parallel independence predicts that the density is the simple *fst*(*t*) = *fa*(*t*). Finally, if processing is exhaustive (maximum time) and independent, then processing is the same as shown before: *f*max(*t*) = *fa*(*t*)*Fb*(*t*) + *fb*(*t*)*Fa*(*t*) (**Figure 3A**).
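As a numerical illustration of the independent parallel densities just given, the sketch below evaluates *f*min and *f*max for two exponential channels and integrates them. The exponential form, the two rates, and the helper names (`f_min`, `f_max`, `integrate`) are assumptions chosen for illustration only.

```python
import math

def exp_pdf(rate, t):
    return rate * math.exp(-rate * t)

def exp_survivor(rate, t):
    return math.exp(-rate * t)

def f_min(t, rate_a, rate_b):
    """Race density: a finishes at t while b is unfinished, or the reverse."""
    return (exp_pdf(rate_a, t) * exp_survivor(rate_b, t)
            + exp_pdf(rate_b, t) * exp_survivor(rate_a, t))

def f_max(t, rate_a, rate_b):
    """Exhaustive density: a finishes at t after b is already done, or the reverse."""
    return (exp_pdf(rate_a, t) * (1.0 - exp_survivor(rate_b, t))
            + exp_pdf(rate_b, t) * (1.0 - exp_survivor(rate_a, t)))

def integrate(g, upper, steps=100000):
    """Midpoint-rule integral of g over [0, upper]."""
    dt = upper / steps
    return sum(g((k + 0.5) * dt) for k in range(steps)) * dt

# ASSUMPTION: channel means of 200 ms and 300 ms (rates in 1/ms).
RATE_A, RATE_B = 1.0 / 200.0, 1.0 / 300.0
area_min = integrate(lambda t: f_min(t, RATE_A, RATE_B), 10000.0)
mean_min = integrate(lambda t: t * f_min(t, RATE_A, RATE_B), 10000.0)
mean_max = integrate(lambda t: t * f_max(t, RATE_A, RATE_B), 10000.0)
```

For exponential channels the race mean is 1/(rate_a + rate_b) = 120 ms, and the exhaustive mean is 200 + 300 − 120 = 380 ms, which the numerical integrals should reproduce closely; the single-target self-terminating prediction, *fst*(*t*) = *fa*(*t*), simply has mean 200 ms.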

The stopping rule in our experiments is always OR, that is, the observers were required to respond with the "yes" button if a signal tone appeared either in the left ear, the right ear, or both ears. Otherwise, they were instructed to respond with the "no" button.

#### **CAPACITY AND WORKLOAD CAPACITY: VARIOUS SPEEDS ON A SPEED CONTINUUM**

*Capacity* generally refers to the relationships between the speeds of processing in response-time tasks. Workload capacity will refer to the effects on efficiency as the workload is increased. For greater mathematical detail and in-depth discussion, see Townsend and Ashby (1978), Townsend and Nozawa (1995), and Townsend and Wenger (2004b). Wenger and Townsend (2000) offer an explicit tutorial and instructions on how to carry out a capacity analysis.

Informally, the notion of *unlimited capacity* refers to the situation when the finishing time of a subsystem (item, channel, etc.) is identical to that of a standard parallel system (described in more detail later); that is, the finishing times of the distinct subsystems are parallel, and the average finishing times of each do not depend on how many others are engaged [e.g., in a search task the finishing time marginal density function for an item, channel etc., *f*(*t*) is invariant over the total number of items being searched]. *Limited capacity* refers to the situation when item or channel processing is slower (i.e., finishing times are longer) than what would be expected in a standard parallel system. *Super capacity* indicates that individual channels are processing at a rate even faster than standard parallel processing. **Figure 4** illustrates the general intuitions accorded these concepts, again in an informal manner. The size of the cylinders provides a description of the amount of resources available.

The stopping rule obviously affects overall processing times (see **Figure 5** for a depiction of how RTs change with increasing workload for the different models). **Figure 5** indicates mean response times as a function of workload. *Workload* refers to the quantity of labor required in a task. Most often, workload is given by the number of items that must be operated on. For instance, workload could refer to the number of items in a visual display that must be compared with a target or memory item.

However, we assess capacity (i.e., efficiency of processing speed) in comparison with standard parallel processing with specification of a particular stopping rule. Thus, although the minimum time (first-terminating or OR processing) decreases as a function of the number of items undergoing processing (because all items are targets), the system is merely unlimited, not super, because the actual predictions are from a standard parallel model (i.e., unlimited capacity with independent channels). But observe that each of the serial predictions would be measured as limited capacity because for each stopping rule, they are slower than the predictions from standard parallel processing.

Although **Figure 5** indicates speed of processing through the mean response times, there are various ways of measuring this speed. The mean (*E*[*T*]) is a rather coarse level of capacity measurement. A stronger gauge is found in the cumulative distribution function *F*(*t*), and the hazard function [*h*(*t*), to be discussed momentarily] is an even more powerful and fine grained measure. This kind of ordering is a special case of a hierarchy on the strengths of a vital set of statistics (Townsend and Ashby, 1978; Townsend, 1990).

The ordering establishes a hierarchy of power because, say, if *Fa*(*t*) *> Fb*(*t*) then the mean of *a* is less than the mean of *b*. However, the reverse implication does not hold (the means being ordered do not imply an order of the cumulative distribution functions). Similarly if *ha*(*t*) *> hb*(*t*) then *Fa*(*t*) *> Fb*(*t*), but not vice versa, and so on. Obviously, if the cumulative distribution functions are ordered then so are the survivor functions. That is, *Fa*(*t*) *> Fb*(*t*) implies *Sa*(*t*) *< Sb*(*t*).
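The one-way nature of this hierarchy can be made concrete with a small example. The sketch below uses distributions chosen purely for illustration (two exponentials, a uniform, and a fixed time); the helper `cdf_dominates` is a hypothetical name.

```python
import math

def exp_cdf(rate, t):
    return 1.0 - math.exp(-rate * t)

def uniform_cdf(upper, t):
    return min(max(t / upper, 0.0), 1.0)

def step_cdf(point, t):
    return 1.0 if t >= point else 0.0

def cdf_dominates(F_a, F_b, grid):
    """True if F_a(t) >= F_b(t) at every grid time (a is at least as fast everywhere)."""
    return all(F_a(t) >= F_b(t) for t in grid)

grid = [float(t) for t in range(1, 1001)]  # times in ms

# Hazard ordering: constant hazards 1/200 > 1/300 for all t.
# The CDFs are then ordered, and so are the means (200 ms < 300 ms).
hazard_case = cdf_dominates(lambda t: exp_cdf(1 / 200, t),
                            lambda t: exp_cdf(1 / 300, t), grid)

# Mean ordering alone: c is uniform on [0, 200] (mean 100 ms) and
# d is a fixed time of 110 ms (mean 110 ms), yet their CDFs cross.
c_over_d = cdf_dominates(lambda t: uniform_cdf(200.0, t),
                         lambda t: step_cdf(110.0, t), grid)
d_over_c = cdf_dominates(lambda t: step_cdf(110.0, t),
                         lambda t: uniform_cdf(200.0, t), grid)
```

Here `hazard_case` is true, whereas both `c_over_d` and `d_over_c` are false: the means of *c* and *d* are ordered, but neither cumulative distribution function lies above the other at all times.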

There is a useful measure that is at the same strength level as *F* or *S*. This measure is defined as −ln *S*(*t*). Wenger and Townsend (2000) illustrate that this is actually the integral of the hazard function *h*(*t*) from 0 to *t* (see also Neufeld et al., 2007). We thus write the integrated hazard function as *H*(*t*) = −ln[*S*(*t*)]. Although *H*(*t*) is of the same level of strength as *S*(*t*), it has some very helpful properties not directly shared by *S*(*t*).

Now it has been demonstrated that when processing is unlimited-capacity independent parallel with a minimum-time (OR) stopping rule, the sum of the integrated hazard functions for each item presented alone is precisely the value, for all times *t*, of the integrated hazard function when both items are presented together (Townsend and Nozawa, 1995). That is, *Ha*(*t*) + *Hb*(*t*) = *Hab*(*t*). This intriguing fact suggests the formulation of a new capacity measure, which Townsend and Nozawa called the *workload capacity coefficient C*(*t*) = *Hab*(*t*)/[*Ha*(*t*) + *Hb*(*t*)], that is, the ratio of the double item condition over the sum of the single item conditions. If this ratio is identical to 1 for all *t*, then the processing is considered *unlimited*, as it is identical to that of an unlimited capacity independent parallel model. If *C*(*t*) is less than 1 for some value of *t*, then we call processing *limited*. For instance, either serial processing of the ordinary kind or a fixed-capacity parallel model that spreads the capacity equally across *a* and *b* predicts *C*(*t*) = 1/2 for all times *t* > 0. If *C*(*t*) > 1 at any time (or range of times) *t*, then we call the system *super capacity* for those times. A tutorial on capacity and how to assess it in experimental data is offered in Wenger and Townsend (2000). In a recent extension of these notions, we have shown that if configural parallel processing is interpreted as positively interactive parallel channels (thus being dependent or positively correlated rather than independent), then configural processing can produce striking super capacity (Townsend and Wenger, 2004b).
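The capacity coefficient can be estimated from response-time samples by replacing each *H*(*t*) with −ln of the empirical survivor function. The following is a rough sketch on simulated data, not the estimator or statistical test used in this paper: the exponential processing times, the absence of any residual (non-decision) time, and all function names are simplifying assumptions.

```python
import math
import random

def integrated_hazard(sample, t):
    """H(t) = -ln S(t), with S(t) estimated as the proportion of RTs exceeding t."""
    survivor = sum(1 for rt in sample if rt > t) / len(sample)
    return -math.log(survivor)  # requires at least one RT longer than t

def capacity_or(rt_ab, rt_a, rt_b, t):
    """Workload capacity coefficient C(t) = H_ab(t) / [H_a(t) + H_b(t)]."""
    return integrated_hazard(rt_ab, t) / (
        integrated_hazard(rt_a, t) + integrated_hazard(rt_b, t))

def simulate(n=20000, mean_a=300.0, mean_b=300.0, seed=7):
    """ASSUMPTION: exponential channel times (ms) with no base time."""
    rng = random.Random(seed)
    draw = lambda mean: rng.expovariate(1.0 / mean)
    rt_a = [draw(mean_a) for _ in range(n)]          # a presented alone
    rt_b = [draw(mean_b) for _ in range(n)]          # b presented alone
    # Double target, UCIP: independent race between the two channels.
    rt_parallel = [min(draw(mean_a), draw(mean_b)) for _ in range(n)]
    # Double target, standard serial OR: whichever item is processed first
    # ends the trial, so the RT is a single item's time.
    rt_serial = [draw(mean_a) if rng.random() < 0.5 else draw(mean_b)
                 for _ in range(n)]
    return rt_a, rt_b, rt_parallel, rt_serial

rt_a, rt_b, rt_parallel, rt_serial = simulate()
c_parallel = capacity_or(rt_parallel, rt_a, rt_b, 200.0)
c_serial = capacity_or(rt_serial, rt_a, rt_b, 200.0)
```

With these simulated data, `c_parallel` should lie close to 1 and `c_serial` close to 1/2, in line with the predictions stated above. With real data the estimate becomes unstable at very short and very long *t*, where few observations remain.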

Subsequently, a general theory of capacity was formulated that permitted the measurement of processing efficiency for all times during a trial (Townsend and Nozawa, 1995). Employing standard parallel processing as a cornerstone, the theory defined unlimited capacity as efficiency identical to that of standard parallel processing in which case the measure is *C*(*t*) = 1. It defined limited capacity as efficiency slower than standard parallel processing. For instance, standard serial processing produces a measure of capacity of *C*(*t*) = 1*/*2. And finally, the theory defined super capacity as processing with greater efficiency than standard parallel models could produce, that is, *C*(*t*) *>* 1.

In sum, our measuring instrument is that of the set of predictions by unlimited-capacity independent parallel processing (UCIP). As mentioned above, *unlimited capacity* means here that each parallel channel processes its input (item, etc.) just as fast when there are other surrounding channels working (i.e., with greater n) as when it is the only channel being forced to process information. The purpose of this paper is to apply these techniques, with a focus on comparing binaural detection capacity measures in diotic and dichotic contexts.

# **METHODS**

#### **STIMULI**

Stimuli were 440-Hz pure tones added to wide bands of noise. The target signal was a 250-ms pure tone with 25-ms cosine-squared onset and offset ramps. For each trial, the signal was generated with a random phase, selected according to a uniform distribution. The 500-ms noise was generated using a Gaussian distribution in the time domain at a sampling rate of 48828 Hz. A new random sample of noise was generated for each trial. The noise was always presented at a sound pressure level of 57 dB SPL and also had 25-ms rise/fall times. The target tone was presented at a signal-to-noise ratio (SNR) of either +6 dB (the High SNR) or −6 dB (the Low SNR). These SNRs would be expected to yield accuracy measures near 100% for all detection conditions. Accuracy was indeed very high for all conditions and subjects, ranging from 97.5 to 99% correct.
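For readers who wish to synthesize comparable stimuli, a minimal sketch follows. It is not the authors' generation code. It assumes that SNR is defined as the ratio of steady-state tone power to overall noise power, it leaves out calibration to 57 dB SPL, and it draws the tone onset from the 50 to 250 ms range given in the Procedures; `make_trial` and `ramp_gain` are hypothetical names.

```python
import math
import random

FS = 48828        # Hz, noise-generation rate stated in the text
TONE_HZ = 440.0

def ramp_gain(i, n_total, n_ramp):
    """Cosine-squared onset/offset gain (0 to 1) for sample i of n_total."""
    if i < n_ramp:
        return math.sin(0.5 * math.pi * i / n_ramp) ** 2
    if i >= n_total - n_ramp:
        return math.sin(0.5 * math.pi * (n_total - 1 - i) / n_ramp) ** 2
    return 1.0

def make_trial(snr_db, rng, noise_ms=500, tone_ms=250, ramp_ms=25):
    """Return (noise, tone, onset_sample) for one signal-present trial."""
    n_noise = round(FS * noise_ms / 1000)
    n_tone = round(FS * tone_ms / 1000)
    n_ramp = round(FS * ramp_ms / 1000)
    # Unit-variance Gaussian noise with 25-ms rise/fall.
    noise = [rng.gauss(0.0, 1.0) * ramp_gain(i, n_noise, n_ramp)
             for i in range(n_noise)]
    # ASSUMPTION: SNR = 10*log10(tone power / noise power); a sinusoid of
    # amplitude A has power A^2 / 2 and the noise has power 1.
    amp = math.sqrt(2.0 * 10 ** (snr_db / 10.0))
    phase = rng.uniform(0.0, 2.0 * math.pi)            # random starting phase
    onset = round(FS * rng.uniform(50.0, 250.0) / 1000)
    tone = [0.0] * n_noise
    for i in range(n_tone):
        tone[onset + i] = (amp * ramp_gain(i, n_tone, n_ramp)
                           * math.sin(2.0 * math.pi * TONE_HZ * i / FS + phase))
    return noise, tone, onset

rng = random.Random(0)
noise, tone, onset = make_trial(6.0, rng)
stimulus = [n + s for n, s in zip(noise, tone)]  # tone + noise for one ear
```

In the diotic context the same `stimulus` would be sent to both ears; in the dichotic context the tone would be inverted (a phase shift of π) in one ear before being added to the same noise.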

#### **PROCEDURES**

On each trial, there were four possible events: a tone + noise presented to both ears (binaural trials), a tone + noise presented to the left ear, a tone + noise presented to the right ear, or noise alone. These four events were equally probable and are described below and are also illustrated in **Table 1**.

In the tone + noise trials ("Yes" trials), the SNR was manipulated such that the low and the high SNRs were presented equally often. The binaural trials (referred to as dual-target trials) yield four possible events (see **Table 1**, top four rows): Left ear-High + Right ear-High (denoted HH throughout), Left ear-High + Right ear-Low (HL), Left ear-Low + Right ear-High (LH), and Left ear-Low + Right ear-Low (LL). The monaural trials (referred to as single-target trials) yielded two SNRs (High and Low) for each ear. These are depicted in the middle eight rows of **Table 1**.

**Table 1 | Illustration of stimulus conditions.**


*Each row represents an occurrence with frequency of 1/16th. S* + *N refers to signal* + *noise, N refers to noise, and a blank space indicates no stimulus presented. H and L refer to High and Low signal-to-noise ratios, respectively. Seventy-five percent of the trials are "Yes" (signal-present trials) whereas 25% of the trials are "No" (signal-absent trials).*

Of the noise (or "No") trials, 1/2 of the trials presented the noise in both ears, 1/4 of trials had noise in the left ear, and 1/4 of trials had noise in the right ear<sup>1</sup>. Trials were presented in random order throughout the experiment in blocks of 128 trials. Ten blocks were collected for each context, yielding a total of 80 trials in each dual-target condition (HH, LL, LH, HL) and 160 trials in each single-target condition (Left-High, Left-Low, Right-High, Right-Low).
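The trial counts follow directly from the design, as the short calculation below shows. The allocation of Table 1 rows to conditions (two rows per single-target condition; two, one, and one rows for the noise-alone events) is inferred from the proportions stated in the text, and the dictionary keys are labels introduced here for illustration.

```python
from fractions import Fraction

TRIALS_PER_BLOCK = 128
BLOCKS_PER_CONTEXT = 10
ROW_PROBABILITY = Fraction(1, 16)  # each Table 1 row occurs on 1/16 of trials

# Number of Table 1 rows assumed for each condition (16 rows in total).
rows_per_condition = {
    "dual HH": 1, "dual HL": 1, "dual LH": 1, "dual LL": 1,
    "single Left-High": 2, "single Left-Low": 2,
    "single Right-High": 2, "single Right-Low": 2,
    "noise both ears": 2, "noise left ear": 1, "noise right ear": 1,
}

def trials_per_context(n_rows):
    """Trials collected for a condition across all blocks of one context."""
    return int(n_rows * ROW_PROBABILITY * TRIALS_PER_BLOCK * BLOCKS_PER_CONTEXT)

counts = {name: trials_per_context(k) for name, k in rows_per_condition.items()}
# counts["dual HH"] is 80 and counts["single Left-High"] is 160
```

Each row contributes 128/16 = 8 trials per block, so ten blocks give 80 trials per row: 80 for each dual-target condition and 160 for each single-target condition, out of 1280 trials per context.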

Trials were run in two separate contexts, defined by the characteristics of the dual-target trials: *N*<sub>0</sub>*S*<sub>0</sub> and *N*<sub>0</sub>*S*<sub>π</sub>. In the *N*<sub>0</sub>*S*<sub>0</sub> context (diotic), identical noises and signals were presented to the two ears. In the *N*<sub>0</sub>*S*<sub>π</sub> context (dichotic), the noises were identical across the ears but the signal was phase shifted by π radians in one of the ears. Note that the single-target stimuli were the same regardless of whether they were presented in the *N*<sub>0</sub>*S*<sub>0</sub> or *N*<sub>0</sub>*S*<sub>π</sub> context. In this way, a single block in either context consisted of 50% single-target trials (½ to left ear and ½ to right ear), 25% dual-target trials, and 25% noise-alone trials.

Observers participated in experimental sessions lasting 1 h. A single session consisted of 6–8 blocks of 128 trials. Each trial began with a visual warning of "listen" appearing on a computer monitor for 500 ms. A silent period of 500 ms followed removal of the warning, after which the noise stimulus began. When the 250-ms target tone was present, it began at a random time between 50 and 250 ms after the onset of the 500-ms noise.

<sup>1</sup>Note that 1/2 of the no trials were binaural trials whereas only 1/3 of the yes trials were binaural. In this case, then there could be a bias toward a "no" response when a binaural noise is heard. Additional data collection suggests that this bias did not lead to a difference in the results presented here.

Stimuli were presented to the observers at a 24.414-kHz sampling rate using a 24-bit Tucker Davis Technologies (TDT) RP2.1 real-time processor. Target and masker were summed digitally prior to being played through a single channel of the RP2.1 (for the monaural stimuli) or both channels of the RP2.1 (for the binaural stimuli). Each channel was calibrated via a PA5 programmable attenuator, passed through an HB6 headphone buffer, and presented to observers through a Sennheiser HD280 Pro headphone set. Reaction times were measured using a button box interfaced to the computer through the TDT hardware.

#### **OBSERVERS**

Four listeners, ranging in age from 20 to 43, participated in the experiment. All subjects had hearing thresholds of 15 dB HL or better in both ears at all audiometric frequencies. Obs. 4 is the first author. Obs. 1–3 completed trials in the *N*<sub>0</sub>*S*<sub>0</sub> context first whereas Obs. 4 completed trials in the *N*<sub>0</sub>*S*<sub>π</sub> context first. Subjects provided written informed consent prior to participation and Obs. 1–3 were paid per session. Testing procedures were overseen by Indiana University's Institutional Review Board.

Observers were instructed to respond to the signal tone as quickly as possible while attempting to provide correct responses. Using an "OR" design, observers were required to respond with the "yes" button if a tone was present. Otherwise, they were instructed to respond with the "no" button. The RT was measured from the onset of the tone stimulus within the noise. Percent correct was recorded in order to ensure that subjects achieved high levels of performance for both SNRs.

# **RESULTS**

#### **MEAN REACTION TIMES**

**Table 2** shows mean RTs in milliseconds for single targets for the two contexts (*N*<sub>0</sub>*S*<sub>0</sub> and *N*<sub>0</sub>*S*<sub>π</sub>). Reaction times below 100 ms or greater than 3 standard deviations from the mean were excluded from the data set. A repeated-measures ANOVA revealed a significant effect of SNR [*F*(1, 3) = 586.6, *p* < 0.0001] in which faster RTs were associated with the higher SNR (254 vs. 309 ms). No other significant main effects or interactions were revealed by the ANOVA, although the main effect of context approached significance [*F*(1, 3) = 10.0; *p* = 0.051]. The slightly faster RTs in *N*<sub>0</sub>*S*<sub>π</sub> (270 ms vs. 293 ms in *N*<sub>0</sub>*S*<sub>0</sub>) may be due to three of the observers completing *N*<sub>0</sub>*S*<sub>π</sub> after *N*<sub>0</sub>*S*<sub>0</sub> and consequently could be attributable to practice effects. However, even Obs. 4 was faster in *N*<sub>0</sub>*S*<sub>π</sub> and she completed these conditions first. Recall that for these contexts, the same stimuli were used for the single-target conditions, and so no difference in context was expected.

**Table 2 | Mean reaction times in ms for the single-target conditions for each subject in the two contexts.**


*RTs for both ears and both SNRs are shown. Standard errors of the mean are indicated for the averages.*
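The exclusion rule applied to the reaction times (RTs below 100 ms or more than 3 standard deviations from the mean) can be sketched as follows. The text does not say whether the mean and standard deviation were computed per condition or before or after applying the 100-ms floor; the sketch assumes they are computed once on the full set of RTs passed in, and `trim_rts` is a hypothetical helper name.

```python
import statistics

def trim_rts(rts_ms, floor_ms=100.0, n_sd=3.0):
    """Drop RTs below the floor or more than n_sd standard deviations from the mean.

    ASSUMPTION: mean and SD are taken over all RTs supplied, before any exclusion.
    """
    mean = statistics.mean(rts_ms)
    sd = statistics.stdev(rts_ms)
    return [rt for rt in rts_ms
            if rt >= floor_ms and abs(rt - mean) <= n_sd * sd]

# Illustrative data only: 100 plausible RTs plus one anticipation and one lapse.
raw = [250.0] * 50 + [260.0] * 50 + [90.0, 2000.0]
kept = trim_rts(raw)
```

In this toy example both the 90-ms anticipation and the 2000-ms lapse are removed, leaving the 100 remaining RTs.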

These results are consistent with previous studies demonstrating a robust negative relationship between the RT and the intensity of the stimulus being detected in quiet (e.g., Chocholle, 1944; Kohfeld, 1971; Grice et al., 1974; Santee and Kohfeld, 1977; Schlittenlacher et al., 2014) as well as the signal-to-noise ratio (and signal levels) for a signal detected in noise (e.g., Green and Luce, 1971; Kemp, 1984). Accuracy was very high, with the miss rate averaging 0.5% for the high SNR and 2.6% for the low, also implicating a small difference in accuracy for the two SNRs. Consequently, we, like others, have observed strong selective influence effects for single-target stimuli.

**Table 3** shows the mean RTs in milliseconds for the dual target conditions for *N*0*S*<sup>0</sup> and *N*0*S*<sup>π</sup> contexts. A repeated-measures ANOVA revealed a significant effect of SNR [*F*(3*,* 9) = 95*.*8, *p <* 0*.*0001] and an interaction between context and SNR [*F*(3*,* 9) = 18*.*7; *p <* 0*.*001]. *Post-hoc t*-tests with a Bonferroni correction indicated that RTs in LL were slower than all other conditions, but only for *N*0*S*0.

For the *N*0*S*<sup>0</sup> context, a general failure of selective influence is evident, as only LL was associated with RTs slower than the other conditions. Recall that for accuracy data, *N*0*S*<sup>0</sup> detection thresholds are similar to monaural (*NmSm*) detection thresholds. Thus, these RT results essentially mirror the threshold data: HH, LH, and HL RTs are effectively determined by the faster of the two detections. For LH and HL, this is the stimulus with the higher SNR. Note, however, there is a slight (albeit not statistically significant) trend for the HH trials to have faster RTs than the HL and LH trials. On average, the HH trials are about 5 ms faster than the HL and LH trials. If we consider that HL and LH trials are similar to monaural presentation, we see that this result is similar to the size of the effect observed for monaural vs. binaural stimulation for pure tones (e.g., Chocholle, 1944; Simon, 1967; Schröter et al., 2007 Exp. 1; Schlittenlacher et al., 2014). Although the effect size, as measured by Cohen's d, is less than 0.2 we believe that with more samples we would see a consistent advantage of two ears over one in mean RT.

Further, there is some evidence that RTs are faster for the dual targets than for the single targets. In the *N*<sub>0</sub>*S*<sub>0</sub> context, RTs for the high SNR were 257 ms for the HH dual targets and 265 ms for the High single targets. For the low SNR, RTs were 307 ms for the LL dual targets and 320 ms for the Low single targets. These results again imply a small but consistent binaural advantage for detecting tones embedded in noise. Miss rates also followed this trend, averaging 0.5% for dual targets and 1.6% for single targets.

*Standard errors of the mean are indicated in parentheses for the averages.*

In the *N*<sub>0</sub>*S*<sub>π</sub> context, we see failure of selective influence, with no statistically significant difference between any of the dual-target conditions. These results do not simply suggest that the RT is primarily driven by the stimulus yielding the faster RT because RTs in LL are similar to those in HH. Here, mean RTs for the LL conditions are significantly faster for the dual-target than for the single-target conditions. RTs for LL were 260 ms but were 298 ms for the low-SNR single targets. The implications of these results will be discussed subsequently, as we address the RT distributions and in the section describing capacity. Miss rates were 0% for all subjects and conditions within *N*<sub>0</sub>*S*<sub>π</sub>.

#### **SURVIVOR FUNCTIONS**

Although of primary interest to this paper are the RT data for the dual-target conditions, it is worth presenting the RT distributions for the single-target data, to familiarize the reader with the data format and to present the robust reaction-time distributional data. **Figure 6** plots derived survivor functions for the high and low SNRs presented to the left and right ears in the two contexts: *N*<sub>0</sub>*S*<sub>0</sub> (left panels) and *N*<sub>0</sub>*S*<sub>π</sub> (right panels). Recall that the survivor function, *S*(*t*), is simply 1 − *F*(*t*), where *F*(*t*) represents the cumulative distribution function of RTs. Data from a representative single subject (Obs. 2) are presented because of overwhelming similarity in the pattern of results across the subjects.

Because of the powerful ordering of faster RTs associated with the high SNR, the same symbols are used to display data from the left ear (unfilled circles) and data from the right ear (solid lines). All subjects demonstrated significantly faster RTs for the high SNR vs. the low SNR. For all statistical tests, non-parametric Kolmogorov-Smirnov (KS) tests of survivor function orderings at the *p* < 0.0001 level were taken to establish statistical significance. This lower-than-typical criterion is used due to the presence of multiple comparisons. The only parameter associated with survivor function ordering was SNR. **Table 4** presents the *p*-values to illustrate the pattern of results across subjects. There also was no difference in RTs measured for the single targets dependent on context. That is, the RT distributions for single targets were not statistically different whether RTs were measured in the *N*<sub>0</sub>*S*<sub>0</sub> or the *N*<sub>0</sub>*S*<sub>π</sub> context.

**Figure 6 | [...] (left panels) and** *N*<sub>0</sub>*S*<sub>π</sub> **(right panels) for a single representative subject.**
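A test of survivor function ordering of the kind described here can be sketched with a one-sided two-sample Kolmogorov-Smirnov statistic. The text does not specify the exact KS variant or software used, so the one-sided form, the large-sample *p*-value approximation exp[−2*D*²*nm*/(*n* + *m*)], the simulated Gaussian RTs, and the function name `ks_one_sided` are all assumptions made for illustration.

```python
import math
import random
from bisect import bisect_right

def ks_one_sided(fast, slow):
    """One-sided two-sample KS test that `fast` has the larger CDF.

    Returns (D_plus, p), where D_plus = max_t [F_fast(t) - F_slow(t)], i.e. the
    largest amount by which the survivor function of `fast` lies below that of
    `slow`, and p uses the asymptotic approximation exp(-2 D^2 n m / (n + m)).
    """
    a, b = sorted(fast), sorted(slow)
    n, m = len(a), len(b)
    d_plus = max(bisect_right(a, t) / n - bisect_right(b, t) / m for t in a + b)
    d_plus = max(d_plus, 0.0)
    p_value = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_value

# Simulated single-target RTs (ms); 160 trials per condition as in the design.
rng = random.Random(3)
high_snr = [rng.gauss(254.0, 30.0) for _ in range(160)]
low_snr = [rng.gauss(309.0, 30.0) for _ in range(160)]

d_forward, p_forward = ks_one_sided(high_snr, low_snr)  # is High faster than Low?
d_reverse, p_reverse = ks_one_sided(low_snr, high_snr)  # is Low faster than High?
```

An ordering *S*High(*t*) < *S*Low(*t*) would be declared when the forward test is significant at the chosen criterion and the reverse test is not.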

The data present a compelling case that selective influence is present for tone-in-noise detection and that increases in SNR facilitate a faster RT. Further, the context in which the RTs were measured (in the presence of *N*<sub>0</sub>*S*<sub>0</sub> or *N*<sub>0</sub>*S*<sub>π</sub> stimuli) has little effect on the distribution of RTs. We also see no evidence that the right ear is faster than the left ear for tone-in-noise detection, at least in a task where listeners must divide their attention across ears (see also Schlittenlacher et al., 2014).

*\*\*Indicates statistical significance at the p < 0.0001 level.*

**Figure 7** plots the derived survivor functions for the dual target data in the *N*0*S*<sup>0</sup> contexts (left panels) and the *N*0*S*<sup>π</sup> contexts (right panels). For all observers, a failure of selective influence is obvious, with HH, HL, LH being not statistically different from each other. This overlap is present for both the *N*0*S*<sup>0</sup> contexts and the *N*0*S*<sup>π</sup> contexts.

The *N*<sub>0</sub>*S*<sub>π</sub> contexts reveal a slightly different pattern although the failure of selective influence is still obvious. The only consistent pattern across all subjects is LL < HH. Obs. 1, 3, and 4 show a pattern similar to *N*<sub>0</sub>*S*<sub>0</sub> with LL < LH = HL. Obs. 4 also demonstrates HH < LH.

Although the *N*<sub>0</sub>*S*<sub>π</sub> context indicates survivor function orderings that are a little more diverse across observers than the *N*<sub>0</sub>*S*<sub>0</sub> context, the glaring failure in both immediately renders untenable any analysis of architecture. We shall discuss potential reasons for this failure in the General Discussion. In any case, the statistical function, *C*(*t*) = workload capacity, turns out to be highly informative all by itself.

#### **CAPACITY**

Capacity functions for the two contexts are plotted in **Figures 8**, **9** for the four subjects and summarized in **Table 5** using Houpt and Townsend's (2012) statistical analysis. Because the HH and LL conditions showed the starkest contrast from one another, those are shown in **Figure 8**. Capacity functions for the LH and HL conditions are then shown in **Figure 9**.

Miller (1982) suggested an inequality, or upper bound on RTs for channels involved in a race within a redundant-target paradigm. Consider the OR paradigm, where any target item can lead to a correct response, and suppose that the stimulus presentation initiates a race in a parallel system. The logic behind the Miller inequality states that if the marginal finishing time distributions from the single-target conditions stay unchanged in the redundant-target condition (implying unlimited capacity), then the cumulative distribution function for the double-target display cannot exceed the sum of the single-target cumulative distribution functions (see, e.g., Townsend and Wenger, 2004b).

In our current language, violation of the Miller bound (i.e., the inequality), would imply super capacity. Next, it is possible, using a formula introduced by Townsend and Eidels (2011), to allow the investigator to plot this upper bound (referred to as the "Miller bound") in the capacity space of **Figures 8**, **9**. This tactic permits us to provide a direct comparison between the race model prediction and our data all within the same graph.

Grice and colleagues proposed a lower bound on performance in parallel systems (e.g., Grice et al., 1984) that plays a role analogous to the Miller bound, but for limited as opposed to super capacity. If the Grice inequality is violated, the system is limited capacity in a very strong sense (Townsend and Wenger, 2004b). In this case, performance on double-target trials is slower than on those single-target trials that contain the faster of the two targets. When performance on the two channels is equal, the Grice bound indicates efficiency at the level of *fixed capacity* in a parallel system. A fixed capacity system can be viewed as sharing a fixed amount of capacity between the two channels. Alternatively, a serial system can make exactly this prediction as well (Townsend and Wenger, 2004b). This Grice boundary is also plotted on **Figures 8**, **9**.
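Stated in terms of cumulative distribution functions, the two bounds are *F*ab(*t*) ≤ *Fa*(*t*) + *Fb*(*t*) (Miller) and *F*ab(*t*) ≥ max[*Fa*(*t*), *Fb*(*t*)] (Grice). The sketch below checks them pointwise; it works in CDF space rather than in the capacity space of the figures, and the exponential channels and the function names are illustrative assumptions.

```python
import math

def race_bounds(F_a, F_b):
    """Return (Grice lower bound, Miller upper bound) on F_ab at a single time."""
    return max(F_a, F_b), F_a + F_b

def classify(F_ab, F_a, F_b):
    """Compare a double-target CDF value with the two bounds."""
    lower, upper = race_bounds(F_a, F_b)
    if F_ab > upper:
        return "violates Miller bound"   # faster than any race model allows
    if F_ab < lower:
        return "violates Grice bound"    # slower than the faster single target
    return "within bounds"

def ucip_point(t, rate_a=1.0 / 200.0, rate_b=1.0 / 300.0):
    """(F_ab, F_a, F_b) at time t for an independent race of exponential channels."""
    S_a, S_b = math.exp(-rate_a * t), math.exp(-rate_b * t)
    return 1.0 - S_a * S_b, 1.0 - S_a, 1.0 - S_b

ucip_labels = {classify(*ucip_point(float(t))) for t in range(1, 1001)}
super_example = classify(0.9, 0.3, 0.4)     # F_ab above F_a + F_b
limited_example = classify(0.2, 0.3, 0.4)   # F_ab below max(F_a, F_b)
```

An unlimited-capacity independent race stays within both bounds at every time point, so `ucip_labels` contains only "within bounds", whereas the two hand-picked examples fall outside the Miller and Grice bounds, respectively.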



*Cases where the null hypothesis (the Unlimited Capacity Independent Parallel model) can be rejected using the Houpt and Townsend (2012) statistical tests are displayed in the table with asterisks. Other cases trending toward limited (C consistently less than 0.8) and trending toward super capacity (C consistently greater than 1.25) are also indicated but without the asterisks indicating statistical significance.*

*\*P < 0.01; \*\*P < 0.001.*

Across both figures and panels, the results for *N*0*S*<sup>0</sup> consistently demonstrate *C*(*t*) ≤ 1, and the Miller bound is rarely exceeded by any of the capacity functions in the *N*0*S*<sup>0</sup> context. Further, capacity tends to be at or slightly better than the Grice bound. **Table 5** also shows that for all *N*0*S*<sup>0</sup> conditions, at least two observers show statistically significant limited capacity [i.e., *C*(*t*) is significantly below 1].

Conversely, *N*0*S*<sup>π</sup> data illustrate *C*(*t*) ≥ 1 over most of the RT range, and many *C*(*t*) values exceed the Miller bound in the *N*0*S*<sup>π</sup> context, for LL particularly, implicating super capacity at the level where *C*(*t*) is much larger than 1 for longer RTs (see Townsend and Wenger, 2004b). Only the HH condition demonstrates significant limited capacity consistent across subjects. In the LL conditions, all observers reveal higher workload capacity in the *N*0*S*<sup>π</sup> condition than in the *N*0*S*<sup>0</sup> condition and in fact, the *N*0*S*<sup>π</sup> *C*(*t*)s are higher than any of the other *C*(*t*) data, disclosing super capacity in all cases. Super capacity is statistically significant for two subjects in the *N*0*S*<sup>π</sup> conditions, but only for LL. We believe that the other two subjects (Obs. 2 and 4) demonstrate evidence leaning toward super capacity but that there are limitations due to the sample size. Here, approximately 80 trials were used in each double-target condition. An examination of Houpt and Townsend (2012)'s **Figure 4** suggests that more trials may be needed to establish significance of capacity in the 2.0 range. At a minimum, visual inspection indicates a difference among capacity functions, with the LL functions being above 1 and two of the four subjects demonstrating statistically significant super capacity. These two subjects also had data exceeding the Miller bound for many RTs, implicating capacity values that exceed race-model predictions.

#### **THE HIGH-LOW AND LOW-HIGH CONDITIONS**

We lump these two conditions together since their results are very similar, though not identical. Interestingly, several observers appear to exhibit some super capacity, especially in the *N*0*S*<sup>π</sup> conditions. By and large, *N*0*S*<sup>0</sup> *C*(*t*) functions fall in the moderately limited capacity range, although there are spots of extremely limited capacity, for instance, Obs. 1 in both conditions, Obs. 2 in HL for slower times, Obs. 3 and 4 in LH early on. Although these tend to be concentrated in *N*0*S*<sup>0</sup> trials, some pop up in *N*0*S*<sup>π</sup> data.

In sum, all our statistics confirm that performance in *N*0*S*<sup>0</sup> is very poor in comparison to *N*0*S*<sup>π</sup> and in fact is close to being as poor as ordinary serial processing would predict. *N*0*S*<sup>π</sup>, on the other hand, regularly produces super capacity with the strongest and most consistent power in the slowest combination of factors (i.e., LL).

# **GENERAL DISCUSSION**

Up to this point, only para-threshold, accuracy experiments have investigated the binaural release from masking using pure tone detection in anti-phase. In fact, as mentioned in the introduction, only a handful of experiments have even employed RT at all when comparing binaural to monaural performance. This study presents RT analogs to the traditional accuracy statistics for binaural auditory perception and, for the first time, for the masking release effect in particular.

Traditionally, detection thresholds have been the psychophysical tool in this domain. More generally, the psychometric functions can be analyzed from the point of view of probability summation (with appropriate corrections for guessing). We suggest that the appropriate RT analog to probability summation is what is termed the standard parallel model. This model, like probability summation, assumes that each channel acts the same way with one signal as it does with other channels operating at the same time (this is the unlimited capacity assumption). The standard parallel model also stipulates stochastic independence among the channels. It makes the probability summation prediction when only accuracy is measured.
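For concreteness, the probability summation prediction for two independent channels can be sketched as follows. The detection probabilities are hypothetical, and the correction for guessing mentioned above is deliberately left out.

```python
def probability_summation(p_left, p_right):
    """Detection probability when either of two independent channels suffices."""
    return 1.0 - (1.0 - p_left) * (1.0 - p_right)

# Hypothetical monaural detection probabilities, for illustration only.
print(probability_summation(0.6, 0.6))
```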

First, although our experimental factor, SNR, was effectual in properly ordering the single-target survivor functions, it failed massively on the double signal trials: While HL, LH, and HH were all stochastically faster than LL (their survivor functions were all lower than that for LL for all times *t*), the former three were very similar to one another for almost all of our data and observers. The consequence is that we may not legitimately attempt to uncover the operational architecture in this experiment. However, the way in which selective influence fails plays a strategic role in our conclusions about the binaural processing system. From here on out, we will concentrate on other issues and especially that of capacity.

Next, recall that the single signal RT data are employed to assess the binaural data relative to predictions from the standard parallel model. If *C*(*t*) = 1, then performance is identical to that from the parallel model for that particular *t*, or range of *t*. If *C*(*t*) < 1, then limited capacity is concluded. If *C*(*t*) > 1, performance is super capacity relative to the standard parallel expectations. A somewhat more demanding upper bound is found in the Miller inequality, which nevertheless must be violated if *C*(*t*) exceeds 1 for intervals of the faster time responses (see Townsend and Nozawa, 1995). If the lower bound put forth by Grice and colleagues is violated, then capacity is very limited indeed. When performance on the two ears is equal, then the Grice bound is equivalent to *C*(*t*) = ½. On the other hand, if *C*(*t*) is even a little larger than the Grice bound, performance is said to show a redundancy gain. Finally, limited capacity could be associated with inadequate processing (e.g., attentional) resources or interfering channel crosstalk in a parallel system. If capacity is severely limited [e.g., *C*(*t*) < ½] it might be caused by serial processing, extreme resource deficits or even across-channel inhibition.
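In its standard OR-task form (Townsend and Nozawa, 1995), the capacity coefficient is the ratio of the double-target integrated hazard to the sum of the single-target integrated hazards, with the integrated hazard given by the negative log of the survivor function. The sketch below is a simple plug-in estimate under that definition; it is not necessarily the estimator behind the analyses reported here, and the function names and toy RTs are hypothetical.

```python
import math

def survivor(sample, t):
    """Empirical survivor function S(t) = P(RT > t)."""
    return sum(1 for rt in sample if rt > t) / len(sample)

def cumulative_hazard(sample, t):
    """Integrated hazard H(t) = -ln S(t); undefined once S(t) reaches 0."""
    s = survivor(sample, t)
    if s <= 0.0:
        raise ValueError("S(t) = 0: choose an earlier t")
    return -math.log(s)

def capacity_or(t, rt_single_a, rt_single_b, rt_double):
    """C(t) = H_double(t) / (H_a(t) + H_b(t)) for an OR (first-terminating) task."""
    denom = cumulative_hazard(rt_single_a, t) + cumulative_hazard(rt_single_b, t)
    if denom == 0.0:
        raise ValueError("no single-target responses by t: choose a later t")
    return cumulative_hazard(rt_double, t) / denom

# Hypothetical RTs in ms, for illustration only.
left = [300, 320, 340, 360]
right = [300, 320, 340, 360]
both = [290, 300, 310, 340]
print(capacity_or(330, left, right, both))   # standard parallel benchmark
print(capacity_or(330, left, right, left))   # double target no faster than a single target
```

With equal single-target channels, a double-target distribution identical to the single-target one yields a value of one half, which is the equal-performance Grice case described above.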

### **INTERPRETATION OF** *N***0***S***<sup>0</sup> RESULTS**

The results indicated that capacity typically was unlimited to severely limited in *N*0*S*<sup>0</sup> conditions. At least two observers demonstrated limited capacity for each of the SNR combinations with all observers demonstrating limited capacity for HH. Potentially, there is more evidence for limited capacity in the HH conditions relative to the other conditions, though there is considerable variability across individuals in the value of the *C*(*t*) function and with respect to the *C*(*t*) function's proximity to the Grice bound.

The only other research of which we are aware that has applied concepts from the redundant signals RT approach to binaural perception is a seminal study by Schröter et al. (2007), extended in Schröter et al. (2009) and Fiedler et al. (2011). Schröter et al. (2007) employed the Miller (1982) inequality to assess binaural vs. monaural performance but did not assess performance in terms of the standard parallel model or the Grice bound for extreme limited capacity. They also did not address the antiphasic release-from-masking effect. Thus, we will be able to compare our *N*0*S*<sup>0</sup> results to some extent with their results but not our *N*0*S*<sup>π</sup> findings.

First, although we observed considerable individual differences in the capacity functions across listeners, a common trend was that in the *N*0*S*<sup>0</sup> conditions, *C*(*t*) never exceeded 1. In many cases, *C*(*t*) was found to be significantly less than 1. In no instances was the Miller bound surmounted. Many of the capacity functions are also very similar to the Grice bound and display capacity values around 0.5, or fixed capacity. These results suggest that *a negligible gain* is provided by the addition of a second ear. These capacity values are also consistent with previous work demonstrating a very small two-ear advantage in mean RT (Chocholle, 1944; Simon, 1967; Schlittenlacher et al., 2014). Schröter et al. (2007) also demonstrated an almost complete lack of redundancy gain when identical pure tones were presented to each ear. Our data take their results a step further and report capacity values at two different SNRs. Although this conclusion is a tempered one, it is possible that the easiest to detect stimuli (High SNRs) yield the greatest degree of limited capacity.

This interpretation is closely associated with the trends present in the *N*0*S*<sup>0</sup> survivor functions: the dual-target HH, HL, and LH survivor functions were virtually identical, even though SNR ordered the RT distributions for the single-target conditions (faster RTs for the High conditions). Thus, capacity should be more limited for HH than for HL or LH. It seems likely that the auditory system cannot take advantage of the addition of redundant well-defined signals, and may respond most prevalently to the "loud" or better-defined stimulus in these cases. These results very closely mirror those found in the threshold data, where only a negligible advantage is provided when a second ear is added to tone-in-noise detection tasks.

At this point, we cannot establish whether the lack of redundancy gain is due to interactions between the ears or true limitations in resource capacity. The presence of binaural interactions at every level of the auditory pathway central to the cochlear nucleus indicates that interactions between the ears are prevalent. These interactions include both excitatory and inhibitory pathways, and are responsible for a complex and highly successful noise-reduction system. It appears, from detection and now RT data, that the noise-cancelation properties of the auditory system are not activated when the ears receive the same signal and noise.

### **INTERPRETATION OF** *N***0***S***<sup>π</sup> RESULTS**

The *N*0*S*<sup>π</sup> data reflect a different pattern of results than observed in the *N*0*S*<sup>0</sup> contexts. First, two of the four subjects showed statistically significant levels of super capacity, with all four subjects leaning in that direction. This result occurred only in the LL conditions, but capacity was still higher for *N*0*S*<sup>π</sup> than *N*0*S*<sup>0</sup> for LH and HL. The intermediate conditions (HL and LH) tended toward unlimited capacity. Although one interpretation might be to treat the unlimited capacity functions as support for an independent, parallel model, it seems unlikely that such a model can also account for the limited capacity data observed for HH and the super capacity data observed for LL. Further it is commonly accepted that the BMLD occurs due to interactions between the two ears, and cross-correlation and equalization-cancelation are commonly employed tools implemented into binaural models (e.g., Bernstein et al., 1999; Davidson et al., 2009).

Our data reveal something that would not have been observed by using data obtained at threshold levels: an SNR-dependent effect at high accuracies. Traditionally, psychometric functions for *N*0*S*<sup>0</sup> and *N*0*S*<sup>π</sup> are treated as being parallel (e.g., Egan et al., 1969; Yasin and Henning, 2012). That is, the size of the BMLD does not depend on the accuracy. The implication, then, is that because the psychometric functions have the same shape and only shifted means, there are no SNR-dependent processes at play, although a few studies have demonstrated that the MLD decreases at very high signal sensation levels (e.g., Townsend and Goldstein, 1972; Verhey and Heise, 2012). By testing the binaural system at SNRs occurring well into the tip of the psychometric function (>95% accuracy), we find super capacity in LL but not HH, which supports the idea that the auditory noise reduction process cancels the noise more effectively at the lower (but high-accuracy) SNRs than at the higher SNRs.

Because it seems highly likely that our antiphasic effects will appear at other SNRs than those used here (i.e., ours are not "privileged" in any way), these "ceiling-like" SNR effects may be considered as evidence for some type of gain control. That is, it appears that the auditory system uses the differences in signal temporal characteristics to facilitate detection in an SNR-dependent manner. These advantageous interactive mechanisms are not deployed at high SNRs but are only implemented for low SNRs. Although the RTs presented here are on the order of those measured previously (e.g., Kemp, 1984), we must eventually rule out the possibility that the ceiling effects in the HH conditions are due to a lower limit on the RT.

Future studies will need to be conducted to establish whether the parallel psychometric functions would also be observed in the RT data when using stimuli that do not yield 100% accuracy. Townsend and Altieri (2012) have developed a new capacity metric A(t) which takes into account correct and incorrect trials. This capacity measure will be extremely valuable to determine if these results generalize to SNRs more commonly used in the binaural masking literature, where psychometric functions are measured between chance detection and near-perfect accuracy (Egan et al., 1969; Yasin and Henning, 2012).

Finally, Schröter et al. (2007) argued that super capacity results imply that the two ears are not integrated into a single percept (see also Schröter et al., 2009) and that the redundant signal effect would only occur when the stimuli presented to the two ears do not fuse into a single percept. The results in the *N*0*S*<sup>0</sup> conditions would support this interpretation as we found severely limited capacity when identical stimuli were presented to the two ears. However, the SNR-dependent results in the *N*0*S*<sup>π</sup> conditions do not support such an interpretation in a straightforward way. It seems unlikely that the two ears would be fused into a single percept for the HH, HL, and LH trials but not the LL trials. If anything, one might expect the opposite, as the pure tone would be perceived to "pop out" against the noise background more in the HH conditions (due to the high SNR) than in the LL conditions. However, if the SNR-dependent mechanisms elicit a larger perceptual distinction between the tone and noise at the lower SNRs, it remains possible that tone and noise are perceptually segregated in an SNR-dependent manner. One might speculate that these advantageous mechanisms are employed only when listening is more difficult—there may be no need to implement them in high-SNR situations where detection is essentially trivial.

We conclude by advocating an approach that synthesizes accuracy psychophysics together with response-time-based information processing methodology. We have demonstrated that RT can be a useful tool for assessment of the binaural system. These results support the idea that a combination of both accuracy and RT methods could enhance our understanding of perceptual mechanisms in many different modalities.

#### **ACKNOWLEDGMENTS**

We would like to thank Amanda Hornbach for assistance with data collection and analysis and Joseph Houpt for making available the software package used for statistical analysis.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 May 2014; accepted: 01 August 2014; published online: 22 August 2014. Citation: Lentz JJ, He Y and Townsend JT (2014) A new perspective on binaural integration using response time methodology: super capacity revealed in conditions of binaural masking release. Front. Hum. Neurosci. 8:641. doi: 10.3389/fnhum.2014.00641*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Lentz, He and Townsend. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Modeling violations of the race model inequality in bimodal paradigms: co-activation from decision and non-decision components

#### Michael Zehetleitner <sup>1</sup> \*, Emil Ratko-Dehnert <sup>1</sup> and Hermann J. Müller <sup>1,2</sup>

<sup>1</sup> Department Psychologie, Institut für Allgemeine und Experimentelle Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany, <sup>2</sup> Department of Psychological Sciences, Birkbeck College, University of London, London, UK

The redundant-signals paradigm (RSP) is designed to investigate response behavior in perceptual tasks in which response-relevant targets are defined by either one or two features, or modalities. The common finding is that responses are speeded for redundantly compared to singly defined targets. This redundant-signals effect (RSE) can be accounted for by race models if the response times do not violate the race model inequality (RMI). When there are violations of the RMI, race models are effectively excluded as a viable account of the RSE. The common alternative is provided by co-activation accounts, which assume that redundant target signals are integrated at some processing stage. However, "co-activation" has mostly been only indirectly inferred and the accounts have only rarely been explicitly modeled; if they were modeled, the RSE has typically been assumed to have a decisional locus. Yet, there are also indications in the literature that the RSE might originate, at least in part, at a non-decisional or motor stage. In the present study, using a distribution analysis of sequential-sampling models (ex-Wald and Ratcliff Diffusion model), the locus of the RSE was investigated for two bimodal (audio-visual) detection tasks that strongly violated the RMI, indicative of substantial co-activation. Three model variants assuming different loci of the RSE (a decision, a non-decision, and a combined variant) were fitted to the quantile reaction time proportions, for both vincentized group data and individual data. The results suggest that for the two bimodal detection tasks, co-activation has a shared decisional and non-decisional locus. These findings point to the possibility that the mechanisms underlying the RSE depend on the specifics (task, stimulus, conditions, etc.) of the experimental paradigm.

Keywords: redundant signals effect, locus, co-activation, modeling, sequential sampling models, SRT, two-choice RT

#### Edited by:

Hans Colonius, Carl von Ossietzky Universität Oldenburg, Germany

#### Reviewed by:

J. Toby Mordkoff, University of Iowa, USA; Jeff Miller, University of Otago, New Zealand

#### \*Correspondence:

Michael Zehetleitner, Department Psychologie, Ludwig-Maximilians-Universität München, Leopoldstraße 13, D-80802 Munich, Germany; mzehetleitner@psy.lmu.de

Received: 04 April 2014; Accepted: 17 February 2015; Published: 09 March 2015

#### Citation:

Zehetleitner M, Ratko-Dehnert E and Müller HJ (2015) Modeling violations of the race model inequality in bimodal paradigms: co-activation from decision and non-decision components. Front. Hum. Neurosci. 9:119. doi: 10.3389/fnhum.2015.00119

# Introduction

The human perceptual system consists of highly specialized sensory subsystems (for vision, audition, olfaction, etc.) which themselves are organized in a modular fashion. In order to adequately respond to the demands of a dynamically changing environment, the organism has to make countless decisions, which typically require the integration of signals from different modules, be it across modalities (multi-modal), within modalities (multi-feature), across different spatial locations (multi-location), or across different points in time.

Signal integration is frequently investigated using the so-called "redundant-signals paradigm" (RSP). For this paradigm, several statistical tools have been developed, which allow inferences to be drawn about the cognitive architecture and decisional mechanisms responsible for signal integration. In the RSP, participants are presented either with one of two possible single targets (e.g., a single auditory tone or a single visual flash) or with both targets redundantly (a tone and a flash). In general, the response times are, on average, faster for redundant-signal trials (RSTs) compared to single-signal trials (SSTs). This speedup of response times, first reported by Todd (1912), is termed "redundant-signals effect" (RSE). It has since been replicated for a great variety of sensory modalities, tasks, and response categories as well as populations (see e.g., Grice et al., 1984; Diederich and Colonius, 1987; Mordkoff and Yantis, 1991; Krummenacher et al., 2002; Iacoboni and Zaidel, 2003; Miller and Reynolds, 2003; Gondan et al., 2004; Koene and Zhaoping, 2007; Schröter et al., 2007; Zehetleitner et al., 2009; Töllner et al., 2011; Krummenacher and Müller, 2014).

What type of processing architecture is underlying the RSE? The first architecture introduced to explain the RSE was the separate-activations or race model. Race models assume that the two stimulus properties of redundant targets are processed in parallel, in separate channels. According to this model, the shortening of response times for redundant relative to single targets derives from the fact that either target channel alone can trigger a response. As one of the two racers is stochastically faster than the other, the minimum time of both is, on average, shorter than that required by any racer alone. More formally, if one conceives of the triggering times of each channel as random variables, X<sub>1</sub> and X<sub>2</sub>, on RSTs, the race can be expressed as the minimum of both variables. The expected value of this minimum is smaller than (or equal to) the expected values of each element: E[min(X<sub>1</sub>, X<sub>2</sub>)] ≤ min[E(X<sub>1</sub>), E(X<sub>2</sub>)]; see Jensen's inequality (as e.g., described in Rudin, 2006). Owing to this statistical fact, race models are also referred to as "statistical-facilitation" accounts (Raab, 1962). Importantly, on RSTs, no integration or cross-talk is assumed to take place across the two target channels.
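Statistical facilitation is easy to reproduce in simulation. The following minimal sketch draws finishing times for two independent channels and compares the mean of the winner with the means of the individual channels; the gamma distributions and their parameters are hypothetical choices for illustration and are not fitted to any data.

```python
import random

def simulate_race(n_trials=10000, seed=1):
    """Simulate two independent channels and return mean finishing times (ms)."""
    rng = random.Random(seed)
    x1 = [rng.gammavariate(5.0, 60.0) for _ in range(n_trials)]  # theoretical mean 300 ms
    x2 = [rng.gammavariate(5.0, 64.0) for _ in range(n_trials)]  # theoretical mean 320 ms
    winner = [min(a, b) for a, b in zip(x1, x2)]                 # first finisher triggers the response
    mean = lambda xs: sum(xs) / len(xs)
    return mean(x1), mean(x2), mean(winner)

m1, m2, m_min = simulate_race()
print(f"E[X1]={m1:.0f}  E[X2]={m2:.0f}  E[min]={m_min:.0f}")
```

Because the minimum is never larger than either racer on any single trial, the mean of the winner cannot exceed the smaller of the two channel means in any sample, whatever distributions are assumed.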

Do race models provide a universal account of all RSEs observed empirically? To answer this question, Miller (1982) introduced a bound that formalizes the maximum amount of RSE that a race model can explain: the so-called "race model inequality" (RMI). The RMI relates the distribution function of the redundant-signal reaction times F<sub>12</sub> to the distribution functions of the single-signal reaction times F<sub>1</sub>, F<sub>2</sub> (where the indices 1, 2, and 12, denote, e.g., single auditory, single visual, and redundant audio-visual reaction times) given a race model:

$$F_{12}(t) \le F_{1}(t) + F_{2}(t), \text{ for all } t \tag{1}$$

Thus, the fastest response times for RSTs can, at the most, be equal to the fastest response time for SSTs. If there are redundant-signal response times that are even shorter, the architecture of race models is not fit to explain the RSE. Thus, the RMI marks a critical test for all race models: any data violating this inequality (at any time point t) by definition falsify the whole class of race models. Ever since its conception, the RMI has been found to be violated in many empirical situations (e.g., Miller, 1982; Grice et al., 1984; Egeth and Mordkoff, 1991; Diederich, 1992; Mordkoff et al., 1996; Krummenacher et al., 2001, 2002; Feintuch and Cohen, 2002; Mordkoff and Danek, 2011; Krummenacher and Müller, 2014).
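For completeness, the RMI follows in a few lines from elementary probability. The sketch below assumes only that, on RSTs, the marginal distributions of the channel finishing times X<sub>1</sub> and X<sub>2</sub> equal the corresponding single-signal distributions; independence of the channels is not required.

$$
\begin{aligned}
F_{12}(t) &= P\left(\min(X_1, X_2) \le t\right) \\
&= P\left(X_1 \le t \ \text{or}\ X_2 \le t\right) \\
&= P(X_1 \le t) + P(X_2 \le t) - P(X_1 \le t,\ X_2 \le t) \\
&\le F_{1}(t) + F_{2}(t).
\end{aligned}
$$

The bound is reached only when the joint probability in the third line is zero, that is, when the two channels never both finish by time t.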

If the RMI is found to be violated, what architecture then would be responsible for the RSE? Several cognitive architectures have been proposed that can in principle produce RSEs and violations of the RMI: interactive-race models (Mordkoff and Yantis, 1991), serial exhaustive models (Townsend and Nozawa, 1995), correlated-noise models (Otto and Mamassian, 2012), and co-activation models. Of these, co-activation models have mostly been defended successfully against potential alternatives (e.g., Mordkoff and Miller, 1993; Patching and Quinlan, 2002; Zehetleitner et al., 2009).

One possibility, which has only rarely been discussed as a potential cause of RMI violations, is a speed-up of the non-decision components of task performance, rather than of the decision component, as standardly assumed by the accounts mentioned above.

Observed response times may be conceived of as consisting of two components: a decision and a non-decision component (Sanders, 1980; Luce, 1991); in terms of processing stages: perceptual latency, then decision latency, then motor latency, where both the perceptual and motor latencies are combined into a single non-decision component. Consequently, processes responsible for RMI violations can logically stem from either or both of these components. The decision stage is defined as the time needed for a decision variable (e.g., sensory evidence) to trigger a decision required by the experimental paradigm, such as whether a target is present or absent, whether a target is located on the left or the right side of perceptual space, etc. The non-decision time is the sum of sub-processes including stimulus encoding, response selection, and response execution. That is, the non-decision component actually comprises two processing stages: one pre- and one post-decisional. For the sake of brevity, we henceforth use the term non-decision processing stage to summarize both pre- and post-decisional processing. Thus, RMI violations could also be produced by a shortening of the non-decision component on RSTs, compared to SSTs. Such a shortening would result in a shift of the reaction time distribution to the left on the time axis (if the variance of the motor component were left unchanged), thus producing RMI violations. There would be, in principle, other ways of generating RMI violations by the non-decision time alone (though explicated models are lacking in the literature). And, in fact, several scientists have advocated a non-decision locus of RMI violations (see, e.g., Corballis, 1998; Feintuch and Cohen, 2002; Iacoboni and Zaidel, 2003; Miller, 2007; Miller et al., 2009; for a review, see Reynolds and Miller, 2009).
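To see why a pure leftward shift can violate the RMI, consider a minimal sketch in which all three conditions share the same decision-time distribution and only the non-decision time differs. The shifted-exponential form and all parameter values are hypothetical choices made for illustration; they are not the models fitted in this study.

```python
import math

def shifted_exp_cdf(t, shift, mean_decision):
    """CDF of RT = shift + Exponential(mean_decision); zero up to `shift`."""
    if t <= shift:
        return 0.0
    return 1.0 - math.exp(-(t - shift) / mean_decision)

def rmi_violation(t, shift_single, shift_redundant, mean_decision):
    """F12(t) - [F1(t) + F2(t)]; positive values violate the race model inequality."""
    f1 = shifted_exp_cdf(t, shift_single, mean_decision)
    f2 = shifted_exp_cdf(t, shift_single, mean_decision)
    f12 = shifted_exp_cdf(t, shift_redundant, mean_decision)
    return f12 - (f1 + f2)

# Hypothetical values in seconds: non-decision time is 50 ms shorter on RSTs.
print(rmi_violation(t=0.28, shift_single=0.30, shift_redundant=0.25, mean_decision=0.10))
```

In the window between the two shifts, both single-signal distribution functions are still zero, so any redundant-signal response falling there exceeds the bound.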

Can one distinguish decisional from non-decisional origins of RMI violations? In order to do so, we used sequential-sampling decision models to account for reaction time distributions in two bimodal RSPs. Sequential-sampling models are based on the assumption that the neuronal states engendered by external stimuli are intrinsically noisy. Such noisy states are sequentially sampled and integrated into sensory evidence until a decision criterion is reached. In the models used here, sensory evidence consists of the accumulated information from sequential samples. The higher the quality of the presented stimulus, the faster this accumulation process reaches the decision criterion (i.e., its drift rate is higher), thus producing faster and more narrowly distributed reaction times, coupled with lower error rates. Additionally, the decision criterion can be low (corresponding to a liberal response criterion), which would give rise to faster and less accurate responses compared to those based on a high criterion. Finally, perceptual and motor latencies are combined into a non-decision time, which has its own distribution (see Section Validity of Model Parameters for empirical evidence that a cognitive interpretation of the model parameters is justified). The observed reaction time distribution is then the convolution of the decision and non-decision time distributions.
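The ingredients just described can be sketched in a small simulation: noisy evidence is accumulated toward a single criterion, and an independently drawn non-decision time is added to the resulting decision time. This is a toy single-boundary accumulator with an exponential non-decision time, loosely in the spirit of an ex-Wald model; it is not the fitting code used in this study, and all names and parameter values are hypothetical.

```python
import random

def simulate_rt(drift, criterion, mean_nondecision, rng, dt=0.001, max_time=10.0):
    """One trial: accumulate noisy evidence to a criterion, then add non-decision time.

    Euler approximation of a unit-variance diffusion with a single absorbing
    boundary; accumulation is cut off at max_time as a safety guard.
    """
    evidence, t = 0.0, 0.0
    sd_step = dt ** 0.5                      # noise standard deviation per time step
    while evidence < criterion and t < max_time:
        evidence += drift * dt + rng.gauss(0.0, sd_step)
        t += dt
    return t + rng.expovariate(1.0 / mean_nondecision)

def mean_rt(drift, criterion=1.0, mean_nondecision=0.2, n_trials=300, seed=7):
    """Average simulated RT (in seconds) over n_trials."""
    rng = random.Random(seed)
    rts = [simulate_rt(drift, criterion, mean_nondecision, rng) for _ in range(n_trials)]
    return sum(rts) / len(rts)

# Hypothetical parameter values: a higher drift rate stands for better stimulus quality.
print(mean_rt(drift=2.0), mean_rt(drift=4.0))
```

Because each simulated RT is the sum of a decision time and an independent non-decision time, the simulated RT distribution is the convolution of the two component distributions.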

In this framework, co-activation models assume that the drift rate on RSTs is higher than the highest drift rate on SSTs. By contrast, a non-decisional origin of the RSE would be reflected in a faster non-decision time parameter for RSTs compared to SSTs. Alternative architectures are considered in the General Discussion.

To date, to our knowledge, only decisional variants of co-activation accounts have been implemented in the form of sequential-sampling models, with the models of Diederich (1995) and Schwarz (2001) both assuming a summation in the rate of evidence accumulation for RSTs over SSTs (see also Blurton et al., 2014). However, there are no studies that attempted to fit non-decisional or combined co-activation accounts (where both decision and non-decision parameters may vary) in a comparative fashion. It is, thus, unclear whether a combined (decision and non-decision) model could outperform a purely decision-based model and how substantial the contribution of a non-decision time shortening might be.

Accordingly, the present study was meant to contribute to the debate on the source of RMI violations, both conceptually and methodologically. In detail, a sequential-sampling model analysis was performed to fit quantile proportions of the response time distributions observed in two bimodal (audio-visual) RSP experiments to three model variants that assume different sources of co-activation: (a) a decisional model (where drift rates may vary), (b) a non-decisional model (where non-decision times may vary), and (c) a combined model (where both drift rates and non-decision times may vary). This way, the question of the origin(s) of RMI violations (and of the RSE in consequence) can be addressed: does co-activation occur at a decisional stage, at a non-decisional stage, or at both stages and, if the latter, to what comparative degree?
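One schematic way to express which parameters are free in each of the three variants is sketched below. The parameterization in terms of additive gains relative to the best single-signal condition is a hypothetical simplification for illustration; the names and values are not those of the fitted models.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Params:
    drift: float         # rate of evidence accumulation (decision component)
    nondecision: float   # perceptual plus motor latency, in seconds

def redundant_params(best_single, variant, drift_gain=0.0, nondecision_gain=0.0):
    """Parameters for redundant-signal trials under one of three co-activation variants.

    `best_single` holds the single-signal parameters with the highest drift rate.
    A variant ignores the gain belonging to a parameter it does not free.
    """
    if variant == "decision":
        return Params(best_single.drift + drift_gain, best_single.nondecision)
    if variant == "non-decision":
        return Params(best_single.drift, best_single.nondecision - nondecision_gain)
    if variant == "combined":
        return Params(best_single.drift + drift_gain,
                      best_single.nondecision - nondecision_gain)
    raise ValueError(f"unknown variant: {variant}")

# Hypothetical single-signal estimates, for illustration only.
single = Params(drift=3.0, nondecision=0.25)
for variant in ("decision", "non-decision", "combined"):
    print(variant, redundant_params(single, variant, drift_gain=1.0, nondecision_gain=0.03))
```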

On a methodological level, the present study was intended to highlight the applicability of sequential-sampling models to account for reaction time distributions (rather than solely for mean reaction times and their variance) in the RSP, to reveal latent psychological variables and so shed light on the nature of the RSE.

The General Discussion will address aspects of the generalizability of both the general modeling approach and the specific modeling results of the present study, alternative architectures, as well as the notion of the RSE as a theoretical "umbrella term."

# Materials and Methods<sup>1</sup>

In Experiment 1, participants performed a simple reaction time (SRT) task, in which they had to make the same response (simultaneously pressing the two buttons of a standard Microsoft mouse) to the onset of a visual target alone (SST 1), an auditory target alone (SST 2), or an audio-visual target pair (RST). A variable inter-trial interval (ITI) was used to prevent anticipatory or rhythmic responses. In Experiment 2, a two-choice reaction time task was introduced, in which participants were presented with the same stimuli as in Experiment 1, which could, however, appear on the left or the right of perceptual space (i.e., to the left or the right of the fixation cross). Participants' task was to make a speeded two-alternative choice response (by pressing one or the other mouse button) to the side of the target (pair) on a given trial. In all other respects, Experiment 2 was identical to Experiment 1.

# Participants

In Experiment 1, 15 participants (11 of them female) performed a single, 45-min session in return for €6.00 or a course credit. Their average age was 25.7 (range: 20–34) years, and they were all right-handed and had normal or corrected-to-normal vision. In Experiment 2, 21 new participants (14 of them female) completed a single, 60-min session in return for €8.00 or a course credit. Their average age was 27.2 (range: 18–46) years; one participant was left-handed, and all had normal or corrected-to-normal vision.

# Apparatus and Stimuli

The experiments were conducted in a sound-insulated booth, and were controlled by programs using MATLAB (R2009bSP1, Natick, Massachusetts: The MathWorks Inc., 2010) and the Psych-Toolbox (Brainard, 1997; Pelli, 1997), running on an Apple Mac mini (Cupertino, California: Apple Inc.) computer (with Mac OS X).

The visual stimuli, gray discs (CIE Yxy 10.9, 0.286, 0.333) of 1° of visual angle in diameter, were presented on a 20″ Mitsubishi Diamond Pro 2070SB monitor set at a resolution of 1280 × 1024 pixels and a refresh rate of 100 Hz, with a viewing distance of approximately 75 cm. The auditory stimuli were 400-Hz beeps (of a duration of 150 ms) delivered via headphones, and redundant stimuli were the combined visual and auditory stimuli, presented simultaneously (i.e., with an onset asynchrony of 0 ms). In Experiment 1, the visual stimuli were presented centrally and the auditory stimuli binaurally, and participants responded to the onset of the respective target stimulus, or pair of stimuli, by simultaneously pressing both (i.e., the left and the right) mouse buttons using their left- and right-hand index fingers (simple reaction time task). In Experiment 2, the stimuli were presented lateralized, and participants responded with the right button to any stimulus, or pair of stimuli, on the right, and with the left button to any stimulus, or pair of stimuli, on the left (left-right forced-choice discrimination task). On RSTs in Experiment 2, the visual and auditory stimuli were always presented on the same side (i.e., either both on the left or both on the right), so that there was never any spatial conflict between the redundant-target signals.

<sup>1</sup>The raw data, the analysis code, all model code, and the reported results are publicly available at the Open Science Framework (osf.io/7hbj6), to facilitate reproduction of the present study and replication of its results (for the open-data and open-code idea, see, e.g., Ince et al., 2012; Morin et al., 2012; Wicherts and Bakker, 2012; Simonsohn, 2013; Wicherts, 2013).

All analyses and the numerical parameter fitting were carried out using GNU R (version 2.14.0). For the fitting procedures, the "optim" function was used.

# Procedure

Each trial was structured in the following way: First, a white fixation cross (0.5° × 0.5° of visual angle) was presented centrally on a black screen for 800 ms. Then, after an inter-trial interval (ITI) that varied uniformly between 500 and 1500 ms, the target stimulus or pair of stimuli appeared. The auditory stimulus was terminated after 150 ms, while the visual stimulus remained on the screen until the observer initiated a response. The response was followed by a 750-ms waiting period, after which the next trial started with the fixation cross (see **Figure 1** for the sequence of displays on a trial).

Experiment 1 was divided into 17 blocks of 45 trials, with unimodal trials (SSTs) and bimodal trials (RSTs) interchanging randomly. Overall, this amounted to 765 trials (255 trials for each condition, i.e., SST visual, SST auditory, and RST audio-visual). Experiment 2 was divided into 20 blocks of 45 trials, yielding 900 trials in total (150 trials for each condition and screen side, i.e., SST visual left, SST visual right, SST auditory left, SST auditory right, RST audio-visual left, and RST audio-visual right).

FIGURE 1 | Example display sequence on a trial in the simple RT Experiment 1. A trial started with a fixation cross presented centrally for 800 ms. Following a variable inter-trial interval, the response-relevant target (a single auditory [SST auditory], a single visual [SST visual], or a redundant audio-visual stimulus [RST audio-visual]) appeared. The auditory stimulus was terminated after 150 ms, while the visual stimulus remained on the screen until the observer responded bimanually. A blank screen followed for 750 ms before the next trial began.

Participants could take a break in between blocks, and they were provided with feedback about their block mean reaction time and error rate. They were instructed to respond as fast as possible while keeping their error rate below 5%.

As pointed out by Mordkoff and Yantis (1991), violations of the RMI are difficult to attribute to a co-activation model if the experimental design involves contingencies that could favor redundant-signals trials over SSTs. Specifically, there are two types of contingencies: inter-stimulus and non-target response benefits. The inter-stimulus response benefit is calculated as Pr(TA|TV) − Pr(TA|NV), that is, it indicates by how much the conditional probability of an auditory target given that the visual channel detected a visual target exceeds the conditional probability of an auditory target given that the visual channel determined the absence of a visual target. The non-target response benefit for redundant targets is calculated as Pr(+) − Pr(+|NA/V), that is, it indicates by how much the probability of a target (denoted as "+") exceeds the conditional probability of a target given that no target has been detected in one (the auditory or the visual) channel.

In Experiment 1, the inter-stimulus response benefit was Pr(TA|TV) − Pr(TA|NV) = 0.5 − 1 = −0.5, and thus, although present, it worked against redundant-target trials and in favor of single-target trials. Further, the non-target response benefit was Pr(+) − Pr(+|NA/V) = 1 − 1 = 0. However, given that an SRT paradigm was used in Experiment 1, the target could appear in the time interval between 1300 and 2300 ms after the onset of the fixation cross (at the start of the trial). If one divides this 1000-ms interval into two time windows of 500 ms each, both types of contingencies would benefit redundant-signals trials, to the numerical value of 0.25 each. In Experiment 2, the two types of contingency benefit were Pr(leftA|leftV) − Pr(leftA|rightV) = 0.5 and Pr("left") − Pr("left"|rightA/V) = 0.5, respectively.
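The inter-stimulus contingency values reported above can be checked by enumerating the trial types. The following is only a minimal sketch: it assumes that all trial types within an experiment are equally likely (as the equal trial counts per condition suggest), and the tuple encoding and function names are hypothetical, chosen for illustration.

```python
from fractions import Fraction

# Experiment 1 trial types, encoded as (visual target present, auditory target present).
TRIALS_EXP1 = [(True, False), (False, True), (True, True)]

# Experiment 2 trial types, encoded as (visual side, auditory side); None = no target.
TRIALS_EXP2 = [
    ("left", None), ("right", None),        # SST visual
    (None, "left"), (None, "right"),        # SST auditory
    ("left", "left"), ("right", "right"),   # RST audio-visual
]


def cond_prob(trials, event, given):
    """Pr(event | given) over equally likely trial types (assumption)."""
    conditioned = [t for t in trials if given(t)]
    hits = [t for t in conditioned if event(t)]
    return Fraction(len(hits), len(conditioned))


def interstimulus_benefit_exp1(trials):
    """Pr(TA | TV) - Pr(TA | NV) for the simple RT design."""
    auditory = lambda t: t[1]
    return (cond_prob(trials, auditory, lambda t: t[0])
            - cond_prob(trials, auditory, lambda t: not t[0]))


def interstimulus_benefit_exp2(trials):
    """Pr(leftA | leftV) - Pr(leftA | rightV) for the two-choice design."""
    left_auditory = lambda t: t[1] == "left"
    return (cond_prob(trials, left_auditory, lambda t: t[0] == "left")
            - cond_prob(trials, left_auditory, lambda t: t[0] == "right"))
```

Under these assumptions, the two functions reproduce the values of −0.5 (Experiment 1) and 0.5 (Experiment 2) given in the text.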

# Models and Fitting

#### Single-Boundary Accumulation and Ratcliff Diffusion Models

The three co-activation models (the decisional, the nondecisional, and the combined model) were each implemented assuming a noisy accumulation of evidence against one boundary for the SRT experiment, and the Ratcliff Diffusion Model (RDM; Ratcliff, 1978) for the two-choice RT experiment. The accumulation of a stochastic source of evidence against one boundary produces a distribution of response times captured by the ex-Wald distribution (Schwarz, 2001). Here, the Wald component is responsible for the distribution of decision times, and an exponential distribution accounts for the non-decision times, which summarize all processes following (and possibly preceding) the decision stage. The parameters of the ex-Wald model are the mean drift rate of accumulation v, the decision criterion a, and the exponential rate parameter γ = 1/t. While single-boundary accumulation models can account for SRT performance, two-alternative choice performance is more appropriately captured by a diffusion process against two decision boundaries reflecting the two response alternatives, such as the RDM. The RDM involves seven parameters, the four most important being the drift rate v, the criterion a, the starting point z, and the non-decision time Ter. The RDM parameter z, controlling the starting point of the evidence accumulation process, was set here to a/2 for each model (for purposes of simplification), resulting in unbiased evidence accumulation. The parameter s<sub>t</sub>, the variability of the non-decision time Ter, controls the amount of variance of the non-decision component. The parameters η and s<sub>z</sub>, the variability of the drift rate and starting point, respectively, were both set to zero (the EZ-diffusion model of Wagenmakers et al., 2007, makes the same simplifying assumptions).
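To make the roles of the ex-Wald parameters concrete, the sketch below computes the mean and variance implied by a Wald decision time plus an independent exponential non-decision time. It assumes a unit diffusion coefficient, under which the Wald component has mean a/v and variance a/v³; the function names and the parameter values in the usage note are illustrative and are not the fitted values of the study.

```python
def ex_wald_mean(v, a, t):
    """Mean RT: Wald decision time (a / v) plus exponential non-decision time (mean t)."""
    return a / v + t


def ex_wald_variance(v, a, t):
    """Variance of the sum of independent Wald and exponential components.

    Assumes a unit diffusion coefficient, so the Wald variance is a / v**3;
    the exponential component with mean t (rate 1 / t) has variance t**2.
    """
    return a / v ** 3 + t ** 2
```

For example, with the hypothetical values v = 10, a = 2, and t = 0.1 (time in seconds), raising v shortens the decision component of the mean, whereas lowering t shortens the non-decision component; these are the two routes to faster redundant-target responses that the decisional and non-decisional models distinguish.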

For the decisional model, the respective drift rate parameter v was free to vary between the two SSTs and RSTs, because the drift rate controls the rate of evidence accumulation over time and thus represents the clarity or "ease of processing" of the signals. For the nondecisional model, the parameters (t and Ter) were free to vary, as they quantify the mean non-decision time for each accumulation process. The combined model allowed both the drift rate and the non-decision time to vary across conditions. Additionally, a free model was implemented that allowed every ex-Wald and RDM parameter to vary for each condition. This completely unconstrained model, albeit theoretically implausible, was used to assess the general ability of each model to fit the conditions. **Table 1** gives an overview of the free and constrained parameters for each co-activation model variant.

#### Quantile Distribution Functions

In order to find the model (and the respective parameters) that can best explain the data, fitting of quantile proportions was performed. These were computed by use of quantile probability functions. Quantile probability functions plot response probabilities against quantile response times. The probability of a response for a particular stimulus type determines the position of a point on the X-axis, and the quantile RTs for that stimulus type determine the position on the Y-axis (Ratcliff et al., 2004). Quantile functions give a fuller description of the reaction time data than mean and standard deviation values alone, as the proportion in each quantile bin is visible as well as the spread of the entire distribution. **Figure 3** displays the empirical quantile proportions of Experiments 1 and 2. Vincentizing was used to combine the data of all participants for each condition (Ratcliff, 1979). For estimating the quantiles, definition 7 of Hyndman's sample quantiles was used (Hyndman and Fan, 1996). Consistent with the mean-variance relation, the fastest condition (here, the bimodal, redundant-target trials) also displayed the narrowest response time range (Wagenmakers et al., 2005).

TABLE 1 | Co-activation models with free and constrained parameters, and degrees of freedom.
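The quantile estimation and Vincentizing steps described in this section can be sketched as follows. This is a minimal illustration rather than the study's R code: `quantile_type7` implements the linear-interpolation rule of Hyndman and Fan's definition 7, and `vincentize` is a hypothetical helper that averages each quantile across participants.

```python
import math


def quantile_type7(sample, p):
    """Sample quantile, Hyndman and Fan (1996) definition 7 (linear interpolation)."""
    xs = sorted(sample)
    h = (len(xs) - 1) * p            # fractional 0-based position in the sorted sample
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)    # guard the upper end for p = 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def vincentize(samples_by_participant, probs):
    """Average each quantile RT across participants (group quantile function)."""
    n = len(samples_by_participant)
    return [
        sum(quantile_type7(s, p) for s in samples_by_participant) / n
        for p in probs
    ]
```

Calling `vincentize(rts, [0.1, 0.3, 0.5, 0.7, 0.9])` on a list of per-participant RT lists for one condition would yield the group quantile RTs for that condition.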

#### Fitting Procedure

The generic fitting procedure for each model involved four computational steps. First, a vector of starting parameters was generated randomly. By design, it consisted of the parameters for each of the three target types (i.e., auditory, visual, and audiovisual). The exact composition of this vector varied depending on the model that was being tested. For example, the decisional model only allowed the drift rates to vary; all other parameters were fixed across the three target types.

Second, for that parameter vector, the model cumulative distribution function was calculated, using an R implementation of the ex-Wald density (Heathcote, 2004) and the "fastdm" code for the density of Ratcliff's diffusion model (Voss and Voss, 2007, 2008) to extract the model quantiles, 0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0.

Third, the quantile response times of the experimental and model data were used to generate the predicted cumulative probability of a response by that quantile response time. Subtracting the cumulative probabilities for each successive quantile from the next higher quantile gives the proportion of responses between each quantile (ideally this yields 0.1, 0.2, 0.2, 0.2, 0.2, 0.1). The observed and expected proportions were multiplied by the number of observations to produce the observed and expected frequencies (see Quantile Maximal Probability, Heathcote et al., 2002).
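The differencing in this third step can be illustrated with a short sketch. It assumes that the model's cumulative probabilities at the empirical quantile RTs are already available as a list; the function names are hypothetical.

```python
def bin_proportions(cdf_at_quantiles):
    """Proportion of responses between successive quantile RTs.

    cdf_at_quantiles holds the cumulative probabilities at the 0.1, 0.3, 0.5,
    0.7, and 0.9 quantile RTs; 0 and 1 are added as the outer edges.
    """
    edges = [0.0] + list(cdf_at_quantiles) + [1.0]
    return [upper - lower for lower, upper in zip(edges, edges[1:])]


def frequencies(proportions, n_observations):
    """Scale bin proportions to frequencies."""
    return [p * n_observations for p in proportions]
```

If the model matched the data perfectly, the cumulative probabilities at the empirical quantile RTs would be 0.1, 0.3, 0.5, 0.7, and 0.9, and `bin_proportions` would return the ideal pattern of 0.1, 0.2, 0.2, 0.2, 0.2, 0.1 (up to floating-point rounding).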

Fourth, the model fit quality was quantified and minimized, using a general SIMPLEX minimization routine (Nelder and Mead, 1965, implemented in the "optim" function of R), which adjusts the parameters to find those that yield the minimum score for each model (i.e., iterating through steps 2 and 3). As a cost function, the BIC statistic was used (Schwarz, 1978; Raftery, 1986), which penalizes for the complexity (i.e., the degrees of freedom) of the models:

$$BIC = -2\left[\sum_i N_i p_i \ln\left(\pi_i\right)\right] + M \ln\left(N\right) \tag{2}$$

Here, p<sub>i</sub> and π<sub>i</sub> are the proportion of observations in the i-th bin for the empirical data and the model prediction, respectively, and M ln(N) is the penalizing term related to the number of free parameters M and the sample size N, that is, the number of observations (see Gomez et al., 2007). N<sub>i</sub> denotes the number of observations per bin, with N = ΣN<sub>i</sub>, which was calculated by averaging the number of observations over all participants and conditions. The last bin contains the proportion of errors. Bins 1–6 are the inter-quantile proportions for correct responses (i.e., 0.1, 0.2, 0.2, 0.2, 0.2, 0.1 for the quantiles 0.1, 0.3, 0.5, 0.7, 0.9) multiplied by the proportion of correct responses. Thus, the sum of all bin proportions is 1.
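A compact sketch of this cost function is given below. It makes one interpretive assumption that should be checked against the published code: the weight of each log term is taken to be the observed frequency in bin i, which is one reading of the product term in Equation 2. The function and argument names are illustrative.

```python
import math


def bic(observed_counts, model_props, n_free_params):
    """BIC for binned RT data.

    observed_counts: observed frequency per bin (assumed reading of the
        weighting term in Equation 2).
    model_props: model-predicted proportion per bin (pi_i), all > 0.
    n_free_params: number of free model parameters (M).
    """
    n_total = sum(observed_counts)
    log_likelihood = sum(
        count * math.log(pi) for count, pi in zip(observed_counts, model_props)
    )
    return -2.0 * log_likelihood + n_free_params * math.log(n_total)
```

With identical data, each additional free parameter raises the score by ln(N), so a more flexible model is preferred only if its gain in log-likelihood outweighs that penalty.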

The model with the lowest BIC can be considered that which concurrently maximizes descriptive accuracy (goodness of fit) and parsimony (smallest complexity of description, i.e., fewest necessary parameters). The BIC rests on the assumption that the correct model is among the candidate models tested. For advantages and disadvantages of the BIC and its alternatives (such as the Akaike Information Criterion, AIC; Akaike, 1978), see, for instance, Burnham and Anderson (2002) and Kass and Raftery (2012) (cf. Wagenmakers and Farrell, 2004). In order to identify the best of the set of tested models, the raw BIC values were transformed to BIC weights (Wagenmakers and Farrell, 2004; Jepma et al., 2009). The transformation of BIC values involved three steps: First, for each model i, the difference in BIC with respect to the model with the lowest BIC value was computed [i.e., Δ<sub>i</sub>(BIC)]. Second, the relative likelihood L of each model i was estimated by means of the following transformation:

$$L\left(M_i \mid data\right) \propto \exp\left[-0.5 \cdot \Delta_i(BIC)\right] \tag{3}$$

where ∝ stands for "is proportional to." Third, the model probabilities were computed by normalizing the relative model likelihoods, that is, by dividing each model likelihood by the sum of the likelihoods of all models. The values thus derived are referred to as BIC weights, w<sub>i</sub>(BIC), for each model M<sub>i</sub>; w<sub>i</sub>(BIC) can be interpreted as the probability that model M<sub>i</sub> is correct, given the data, the set of models, and equal priors on the models (Wagenmakers and Farrell, 2004).
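The three transformation steps can be written out directly. The sketch below is illustrative only; the function name is hypothetical and the BIC values in the usage note are made up rather than taken from Table 4.

```python
import math


def bic_weights(bic_values):
    """Convert raw BIC values to BIC weights (Wagenmakers and Farrell, 2004)."""
    best = min(bic_values)
    # Steps 1 and 2: BIC differences and relative likelihoods (Equation 3).
    relative = [math.exp(-0.5 * (value - best)) for value in bic_values]
    # Step 3: normalize so that the weights sum to 1.
    total = sum(relative)
    return [r / total for r in relative]
```

For instance, `bic_weights([100.0, 102.0, 110.0])` assigns the largest weight to the first model, and the ratio of the first to the second weight equals exp(1), reflecting their BIC difference of 2.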

# Model Selection

The fitting procedure was performed by randomly sampling initial parameter values (1000 times) and performing the four computational steps described above. This procedure was followed to reduce the risk that the optimization algorithm would terminate in a local minimum. The minimum cost value for each condition was used to assess which model was in best agreement with the data and with which specific parameter vector.

### RMI Analysis

For the analysis of violations of the RMI, we used the algorithm of Ulrich et al. (2007) for calculating the empirical cumulative distribution functions. First, for each participant, we calculated the magnitude of RMI violations as

$$d(t) = G_{AV}(t) - \min[G_A(t) + G_V(t), 1], \tag{4}$$

where G<sub>AV</sub>, G<sub>A</sub>, and G<sub>V</sub> stand for the estimates of the empirical cumulative distribution functions for the redundant, single audio, and single visual trials, respectively (using the algorithm of Ulrich et al., 2007; Equation 3). Then, d(t) was evaluated at the 0.05, 0.1, ..., 0.95 quantile RTs of the redundant trials. For each probability point, d(t) was tested against zero, d(t) > 0, using a two-tailed t-test, with the alpha level Bonferroni-corrected to 0.0026 (= 0.05/19 probability points).
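The test function in Equation 4 can be illustrated with a deliberately simplified sketch. Note the substitution: instead of the estimator of Ulrich et al. (2007), it uses a plain step-function empirical CDF, so it shows the logic of the test rather than the exact procedure of the study. All names are hypothetical.

```python
def ecdf(sample, t):
    """Plain empirical CDF: proportion of RTs in the sample that are <= t."""
    return sum(1 for rt in sample if rt <= t) / len(sample)


def rmi_violation(rt_av, rt_a, rt_v, t):
    """d(t) = G_AV(t) - min[G_A(t) + G_V(t), 1]; positive values violate the RMI."""
    bound = min(ecdf(rt_a, t) + ecdf(rt_v, t), 1.0)
    return ecdf(rt_av, t) - bound


# Bonferroni-corrected alpha for the 19 probability points 0.05, 0.10, ..., 0.95.
ALPHA_CORRECTED = 0.05 / 19
```

Evaluating `rmi_violation` at the quantile RTs of a participant's redundant trials gives that participant's d(t) curve; the per-participant values at each probability point would then enter the t-tests described above.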

# Results

#### Errors

Errors were defined as anticipatory responses (RT ≤ 150 ms) or misses (RT > 1600 ms). Participants committed 2.00% errors (1.34% anticipations and 0.66% misses) in Experiment 1 and 3.4% in Experiment 2. For each experiment, the data of one participant had to be discarded due to error rates greater than 10 and 20%, respectively.
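The error criteria stated above translate into a simple classification rule. The sketch below only restates those cut-offs in code; the function names are hypothetical, and how responses were coded in the study's own scripts is not known from the text.

```python
def classify_rt(rt_ms, lower=150.0, upper=1600.0):
    """Label a response by the stated criteria: RT <= 150 ms or RT > 1600 ms is an error."""
    if rt_ms <= lower:
        return "anticipation"
    if rt_ms > upper:
        return "miss"
    return "valid"


def error_rate(rts_ms):
    """Proportion of trials classified as anticipations or misses."""
    errors = sum(1 for rt in rts_ms if classify_rt(rt) != "valid")
    return errors / len(rts_ms)
```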

### Mean Reaction Times and RSEs

The mean RTs for both experiments are listed in **Table 2**. Although numerically different, the two unimodal conditions in Experiment 1 did not differ statistically. There were pronounced RSEs of 55 and 50 ms for Experiments 1 and 2, respectively. The mean RSEs and their standard deviations were computed by calculating, for each participant, the difference between the mean RT of the faster of the two SST conditions and the mean RT of the RST condition.

TABLE 2 | Mean Response Times and RSEs (standard deviations in parentheses) for unimodal (auditory, visual) and bimodal (audio-visual) stimulus conditions in the simple RT Experiment 1 and the two-choice RT Experiment 2.
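The per-participant RSE computation described above amounts to a few lines of code. The sketch below is illustrative; the function names are hypothetical, and the input is assumed to be lists of valid RTs (in ms) per condition.

```python
from statistics import mean, stdev


def redundant_signals_effect(rt_a, rt_v, rt_av):
    """RSE for one participant: faster single-target mean minus redundant-target mean."""
    return min(mean(rt_a), mean(rt_v)) - mean(rt_av)


def group_rse(per_participant_rse):
    """Group mean and standard deviation of the individual RSEs."""
    return mean(per_participant_rse), stdev(per_participant_rse)
```

Taking the faster of the two unimodal conditions separately for each participant gives a more conservative RSE than comparing the redundant condition against the average of both unimodal conditions.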

#### RMI Violations

Significant violations (p < 0.0026) were found at 10 and 9 probability points (0.05 to 0.50 and 0.05 to 0.45) for Experiments 1 and 2, respectively. **Figure 2** presents the individual and mean RMI test function d(t) curves for Experiments 1 and 2 (Colonius and Diederich, 2006). The RMI test function plots the difference between the redundant-signals distribution and the race-model bound given by the sum of the two single-signal distributions (Equation 4). Any area above the X-axis signifies violations of the RMI; areas below are in accordance with the RMI bound.

#### Fitting Results

On the level of mean RTs, all implemented model variants (except the simple-RT decision model) were able to reproduce the reaction time patterns for both experiments. None of the models could generate the standard deviation for every experimental condition; rather, they tended to overestimate the standard deviations. In the simple-RT fitting, the decision model proved unable to produce the empirical RSE; and in the two-alternative choice RT fitting, the non-decision model was unable to fit the RSE. See **Table 3** for a list of mean reaction times, standard deviations, and RSEs.

The outcome of the fitting procedure for Experiments 1 and 2, however, produced a clear separation among the models. **Table 4** lists the minimum BIC values for all models, separately for Experiments 1 and 2. For both experiments, the combined model turned out to be the best-fitting model. The combined model of the two-choice RT data exhibited an even better fit than the fully unconstrained model, though only because the latter suffered a larger BIC penalty for its extra free parameters. Interestingly, the composition of the RSEs differed between the best-fitting simple-RT and two-alternative choice RT models. In the combined model for the simple-RT data, the non-decision component contributed 78% of the RSE; in the combined model of the two-alternative choice RT data, by contrast, it contributed 58%.

**Figure 3** presents the quantile function plots of the combined model for Experiments 1 and 2, respectively.

FIGURE 2 | Violations of the RMI. The race model test function d(t) (see Equation 4) aggregated across individual observers (blue line) and for each individual observer (gray lines) for Experiments 1 (left) and 2 (right). Values that are significantly above zero constitute violations of the RMI. Violations were obtained for the probability points 0.05–0.50 using multiple t-tests with a Bonferroni-corrected significance level of 0.0026. This region is highlighted in light green.

TABLE 3 | Mean response times (and standard deviations in parentheses) of the empirical and model data for Experiment 1 (simple RT) and Experiment 2 (two-choice RT).


TABLE 4 | Minimum BIC values (and degrees of freedom in parentheses) and BIC weights for each model, separately for the simple RT data (Experiment 1) and the two-alternative choice RT data (Experiment 2).


# Parameter Analysis

From a qualitative view, arguably, the free, motor, and combined models agree well with regard to the range of the drift rates, criteria, and non-decision times for the three conditions. All models yielded the highest drift rate parameter and the lowest non-decision time for the redundant condition (where these parameters are allowed to vary). **Table 5** gives an overview of the best fitting parameters per model.

# Discussion

# Observers' Performance

The low error rates across the two experiments indicate the general simplicity of the tasks and attest to our observers' ability to follow the instructions. At the level of mean RTs, the experiments demonstrated pronounced RSEs of 55 and 50 ms in Experiments 1 and 2, respectively. Comparing the two single-target conditions in Experiment 1, auditory-signal trials were processed faster than visual-signal trials. Albeit not statistically significant, this is in accordance with basic findings (Todd, 1912) of faster response times to auditory than to visual stimuli (for medium intensity levels). In Experiment 2, the two unimodal conditions differed neither numerically nor statistically.

FIGURE 3 | Quantile reaction times. Quantile reaction times for the combined model and empirical data from Experiment 1 (left panel) and Experiment 2 (right panel). Continuous lines and filled pyramids denote the empirical data, dashed lines and empty pyramids the model data.

The many RMI violations, obtained for ten quantiles in Experiment 1 and nine in Experiment 2, effectively rule out the class of race models as explanatory accounts for the simple RT and the two-alternative choice RT data. This conclusion is underscored by the fact that a conservative α-correction was used and that response contingencies were avoided (Mordkoff and Miller, 1993). The RMI violations occurred in the lower range of probability points, which is of course plausible given the "make-up" of the RMI. Overall, these results indicate that the empirical RT data cannot be accounted for by a race model architecture.

#### Validity of Model Parameters

In general, the model parameters used here are mathematical constructs that, by mathematical transformations, yield distributions which can be compared to empirical reaction time distributions. The conclusions of the present study are based on the assumption that the different parameters of the decision models indeed map onto cognitive processes—specifically that the drift parameter v maps to stimulus quality and the parameters Ter and t to non-decision times; and that parameter a maps to response caution. Here, we review four studies which argue that this mapping is indeed justified.

In all of these studies, experimental manipulations were used to target those cognitive aspects of processing that the decision models' parameters are supposed to map onto. Specifically, manipulations comprised stimulus difficulty (Schwarz, 2001; Voss et al., 2004; Philiastides et al., 2014; van Vugt et al., 2014), response caution (Schwarz, 2001; Voss et al., 2004), and duration of response execution (Voss et al., 2004).

Voss et al. (2004) investigated four experimental conditions in a two-alternative color discrimination task, a baseline condition, and three variations. In the first variation, stimulus discriminability was manipulated by making the two possible colors more similar to each other. In the second variation, observers were instructed to perform the task carefully and avoid making mistakes. In the third variation, the response scheme was manipulated: instead of using two different fingers for the two responses, participants were allowed to use only one, single finger to submit one of the two responses. In accordance with the psychological interpretation of model parameters, drift rates were lower for the manipulation of stimulus discriminability, the two response boundaries were separated more widely when observers followed a conservative (error-avoiding) strategy, and the non-decision parameter increased substantially when the motor response required a more time-consuming movement.

For the ex-Wald model, in a "go/no-go" task, Schwarz (2001) used a digit comparison paradigm: observers, on each trial, were presented with one digit; they had to press a button if the number was greater than five, but withhold a response if the digit was less than five. Schwarz manipulated decision difficulty of discrimination and proportion of "go" responses in a crossed design. Supporting the usual psychological interpretation of decision model parameters, difficulty affected the drift parameter and proportion of "go" responses the threshold parameter. Importantly, neither of the two manipulations affected the non-decision parameter.

Recently, diffusion model parameters have been related to electrophysiological markers of the lateralized readiness potential (LRP), a difference wave between centrally located scalp potentials that usually are evoked by manual responses. van Vugt et al. (2014) found a consistent relation between diffusion model parameters and the temporal dynamics and shape of averaged LRPs. Taken together, they found that the ramping up of activity in the LRP is related to the accumulation of evidence toward a threshold. Importantly for the present context, van Vugt et al. used the LRP wave to estimate perceptual and motor latency. They calculated, for each observer, perceptual latency as the time at which the stimulus-locked LRP deviated from baseline activity, and motor latency as the time from the peak of the response-locked LRP to the manual response. The sum of these two latencies thus provided an estimator of non-decision time based on EEG data. This electrophysiologically derived estimator was significantly correlated with the non-decision time parameters individually recovered from a diffusion model fit to the behavioral data.

Finally, Philiastides et al. (2014) also investigated the relation between the parameters of a diffusion model fit to two-alternative choice behavioral data and single-trial EEG traces. First, they found that the model that best captured the behavioral changes induced by a manipulation of stimulus quality only had drift rate as a free parameter. Additionally freeing non-decision time to vary between stimulus conditions did not improve the fit any further. Moreover, of importance in the present context, they extracted, from single-trial EEG, a signal that best differentiated between the low- and high-quality stimulus conditions. The onset time of this extracted signal, that is, the time from stimulus onset until stimulus quality has differential effects on the EEG signal, can be considered as a marker of non-decision processing time. This onset time was found to correlate strongly with individually fitted non-decision time parameters of the diffusion model.

In sum, these studies strongly indicate that the parameters of decision models, especially non-decision time parameters, are indeed related to the corresponding cognitive processes. Thus, arguably, it is justified to interpret our finding that redundant signals affect non-decision time parameters as reflecting cognitive non-decision processing.

# Decision and Non-Decision Processes Contribute to RMI Violations

The fitting results indicate that the best-fitting account for both the simple RT and the two-alternative choice RT data is provided by the combined model, in which the drift rates and non-decision times are allowed to vary across all conditions. This model is clearly set apart from the next best-fitting model, as the cost function is defined on a logarithmic scale. Inspection of the parameters (of the combined models) revealed that all models yielded a comparable parameter value range, which points to the reliability of the fits. Also, all models shared a pattern across both experiments: for all models, redundant-signals trials exhibited the highest drift rates and the shortest non-decision times. Together with the BIC scores, this can be taken as evidence for a combined drift rate and non-decision component account for the data of the present, bimodal RSP experiments. However, the models were fitted to the average (vincentized) distribution of the whole sample of participants. Thus, it remains possible that some participants actually exhibited purely decisional and others purely non-decisional origins of the RSE and that their mixture is responsible for the best-fitting model being the combined one. To examine this, we also fitted the models to each individual participant's data. In Experiment 1, the decision model, the non-decision model, the combined model, and the full model provided the best fit for 2, 0, 10, and 3 participants, respectively. In Experiment 2, the corresponding counts were 1, 3, 16, and 1 participants. That is, even for model fits on the level of single participants, the combined model provided the best fit for the large majority of the participants (see **Figures 4**, **5** for individual results).

Assuming that the fitting results do indeed reveal the generating mechanisms for the data obtained in the two experiments, the decisional and non-decisional components would appear to contribute differentially to the total observed RSEs. In Experiment 1, of a total RSE of 56 ms, 43 ms are attributable to the non-decision time difference between the faster of the two unimodal conditions and the bimodal condition alone. In contrast, in Experiment 2, just half the RSE (26 ms of a total of 51 ms) can be attributed to this non-decision time difference. This outcome would be consistent with Miller (1982), who hinted at the possibility of the RSE being a mixture of both decisional and non-decisional processes.

Studies that have tried to fit data to explicit co-activation models are rare. One of the explicit models, which assumes co-activation at the decisional stage, is Schwarz's (2001) superposition model. The basic assumption of Schwarz's model is that, on redundant-target trials, the separate activations of the two stimulated channels superpose to form the overall diffusion process, where sensory evidence on RSTs is the sum of sensory evidence from the two single channels: X<sub>12</sub>(t) = X<sub>1</sub>(t) + X<sub>2</sub>(t). Activity in both channels can be adequately described by diffusion processes of the Wiener type, which may be independent or exhibit a variable degree of channel dependency. Applied to data from Miller (1986), Schwarz's superposition model achieved a good prediction on the level of mean reaction times.
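The superposition idea can be illustrated with the standard result that a Wiener process with positive drift μ reaches a single absorbing barrier a after a mean time of a/μ. The sketch below is not Schwarz's full model: it only shows how summing two channels changes the drift, the variance, and hence the mean decision time. The function names, the correlation parameter `rho`, and the numerical values in the usage note are illustrative assumptions.

```python
def superposed_drift(mu_1, mu_2):
    """Drift of X12(t) = X1(t) + X2(t): the channel drifts add."""
    return mu_1 + mu_2


def superposed_variance(sigma_1, sigma_2, rho=0.0):
    """Variance rate of the summed process; rho is the channel correlation (assumed)."""
    return sigma_1 ** 2 + sigma_2 ** 2 + 2.0 * rho * sigma_1 * sigma_2


def mean_first_passage(barrier, drift):
    """Mean time for a Wiener process with positive drift to reach the barrier."""
    if drift <= 0:
        raise ValueError("drift must be positive for a finite mean first-passage time")
    return barrier / drift
```

With hypothetical drifts of 10 and 8 and a barrier of 2, the superposed process reaches the barrier after a mean time of 2/18, faster than either channel alone (2/10 and 2/8), which is the decisional route to a redundant-signals benefit.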

Diederich (1995) conducted a trimodal simple-RT study with visual, auditory, and tactile stimuli, with varying inter-stimulus intervals, and fitted a race model and two co-activation models to empirically observed RT means and variances. Although both coactivation models outperformed the separate-activations model and yielded excellent fits of the mean reaction time, Diederich notes that they failed to adequately capture the spread of the response times.

In line with the present diffusion model analysis, Diederich and Colonius's (1987) study also yielded positive evidence for co-activation occurring at the non-decision stage: examining the distributions of RT differences between left- and right-hand responses revealed a U-shaped dependence of the amount of facilitation in the motor component on the inter-stimulus interval. Note though that this analysis based on RT differences rests upon the (disputable) assumption that the motor delay constitutes an additive component of the entire observable RT (see, e.g., McClelland, 1979).

However, a comparison with the studies of Diederich (1995) and Diederich and Colonius (1987) remains problematic. Both studies examined the goodness-of-fit only for decisional models and only at the level of reaction time means and variances rather than the complete reaction time distribution (see also Blurton et al., 2014). In the present study, relying on the fit to the means alone would not have helped distinguish between the decisional and combined models in Experiment 1. And for Experiment 2, such an analysis would not have allowed us to rule out any of the models. As the decisional model involved the lowest number of parameters (namely, six) compared to the other models, the principle of parsimony would imply a preference for decisional models—though even for Experiment 2, the decisional model exhibited the poorest fit. On a methodological level, these differential outcomes provide a strong argument in favor of the use of distributional analyses of sequential-sampling models and against fitting decision models only to reaction time means and variances.

However, it must be acknowledged that the data from these two tasks were analyzed using different models (ex-Wald vs. RDM), so that the difference in RSE sources observed might be attributable, at least in part, to the difference in the models, rather than the tasks, employed. Specifically, in the RDM, the non-decision component has a uniform distribution, whereas in the ex-Wald model, the non-decision time has an exponential distribution. Perhaps the exponential rather than uniform non-decision component is the reason why non-decision time exhibits a larger contribution to the RSE than the decision component<sup>2</sup>. In order to examine this possibility, we fit an RDM to the SRT data from Experiment 1, and an ex-Wald model to the 2AFC data from Experiment 2. To do so, in the RDM, we set the separation from the starting point to the negative response boundary at a very high value, so as not to produce decision errors. Apart from that, the fitting routines and data were the same as above. Both for the data of Experiment 1 and for the data from Experiment 2, the best fitting model with the lowest BIC was the combined model, where target redundancy affected both the drift rate and the non-decision time (as compared to the pure drift and the pure non-decision time component), thus replicating the model ordering of the original fitting. Furthermore, for the data of Experiment 1 fit with an RDM, and for the data from Experiment 2 fit with the ex-Wald model, 4% (Experiment 1) and 57% (Experiment 2) of the RSE were attributable to non-decision time, compared with 78% (Experiment 1) and 58% (Experiment 2) in our original fit. It has to be noted, though, that the RDM fit to the data of Experiment 1 yielded near-zero variance (ca. 4 ms) of the non-decision component, which is likely indicative of an overestimation of the variance of the decision component. Given that the ex-Wald model explicitly describes the decision mechanism of a go/no-go task and the RDM that of a 2AFC task, these "cross-task fitting" results must be viewed with caution. For the data of Experiment 2, the proportion of the RSE attributable to the non-decision component was equivalent whether it was fit with an RDM or an ex-Wald model; by contrast, this proportion changed for the data of Experiment 1. Nevertheless, for the cross-fitting too, the best model out of the set of candidates was the combined model. Whether and to what degree the contributions of the decision and non-decision components to the RSE differ between tasks cannot be decided on the basis of the present results.
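To illustrate what a "proportion of the RSE attributable to non-decision time" can mean, the sketch below decomposes a mean RSE under an ex-Wald model, whose mean RT is the Wald mean (threshold divided by drift) plus the mean of the exponential non-decision time. This is one plausible decomposition based on mean differences; the exact computation behind the percentages reported above is not restated here, and all parameter values are hypothetical.

```python
def ex_wald_mean(threshold, drift, nondecision_mean):
    """Mean RT of an ex-Wald process: Wald mean (threshold / drift)
    plus the mean of the exponential non-decision component."""
    return threshold / drift + nondecision_mean

# Hypothetical parameter values (time in seconds), for illustration only.
sst = {"threshold": 1.0, "drift": 4.0, "nondecision_mean": 0.20}  # single-signal trials
rst = {"threshold": 1.0, "drift": 5.0, "nondecision_mean": 0.17}  # redundant-signal trials

# Mean RSE and its two additive parts.
rse = ex_wald_mean(**sst) - ex_wald_mean(**rst)
decision_part = sst["threshold"] / sst["drift"] - rst["threshold"] / rst["drift"]
nondecision_part = sst["nondecision_mean"] - rst["nondecision_mean"]
nondecision_share = nondecision_part / rse

print(f"RSE = {rse * 1000:.0f} ms, non-decision share = {nondecision_share:.1%}")
```

Because the two components add up to the mean RT in this model, the decision and non-decision parts sum exactly to the mean RSE.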

# Generalizability

We showed that both our experiments yielded RSEs that cannot be accounted for by race model architectures. There are, however, other accounts that can, in principle, produce the critical RMI violations. However, the question of whether these alternatives would involve a non-decision component is fundamental and pertinent to all of these models. Interactive-race models (Mordkoff and Yantis, 1991) are similar to race models but allow for cross-talk between the two single-signal channels: when one channel registers activity, this can lead to a reduction of the drift rate in the other channel. Another model that could account for RMI violations is the serial exhaustive model (Townsend and Nozawa, 1997), according to which, as the name implies, both feature channels (e.g., visual, auditory) are processed in series and exhaustively. This model can generate RMI violations provided that the non-target channel accumulates evidence at a slower rate than the target channel. Another, conceptually different cause of RMI violations would be the presence of response contingencies (Mordkoff and Yantis, 1991; Mordkoff and Miller, 1993). As our study design included such contingencies (see Section Procedure above), we cannot firmly rule out response contingencies as an additional source of the RMI violations. However, as there are currently no explicit generative formulations of these alternative accounts, they cannot, at present, be assessed against the empirical data. Note, though, that the framework of our fitting procedure allows for extensions and adaptations that would make such a model comparison feasible in principle.
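For readers unfamiliar with the race model inequality (RMI), it states that under a race architecture the cumulative RT distribution for redundant-signal trials can never exceed the sum of the two single-signal distributions, F_RST(t) ≤ F_A(t) + F_B(t). The following minimal sketch checks this bound descriptively on empirical distribution functions. The RT values are made up, the function names are hypothetical, and a real analysis would test violations statistically across participants rather than by simple comparison.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function t -> proportion of observations <= t."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n

def rmi_violations(rt_redundant, rt_single_a, rt_single_b, time_points):
    """Time points at which F_RST(t) exceeds the race-model bound F_A(t) + F_B(t)."""
    f_r, f_a, f_b = ecdf(rt_redundant), ecdf(rt_single_a), ecdf(rt_single_b)
    return [t for t in time_points if f_r(t) > min(1.0, f_a(t) + f_b(t))]

# Made-up RTs in ms, for illustration only.
rt_a = [310, 330, 350, 370, 390, 410]    # single-signal trials, channel A
rt_b = [320, 340, 360, 380, 400, 420]    # single-signal trials, channel B
rt_rs = [250, 260, 270, 300, 320, 340]   # redundant-signal trials

violations = rmi_violations(rt_rs, rt_a, rt_b, range(240, 441, 20))
print(violations)
```

In this toy data set, the redundant-signal responses are faster than any race between the two single-signal distributions could produce, so the bound is exceeded over the fast part of the distribution.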

In order to corroborate our fitting results and validate the identification of decision and non-decision components in the reaction time data, we additionally performed a validation fitting with synthetically produced RSEs. To this end, we generated three sets of reaction time data (using the ex-Wald and RDM models). One set featured a purely decision-based RSE, generated by models in which only the decision parameter differed between SSTs and RSTs. Another set of data featured a purely non-decision based RSE, generated by analogously changing only the non-decision time parameter across SSTs and RSTs. Lastly, a combined decision/non-decision-based RSE was built into a data set. These three sets of data were then subjected to fitting to all three model types examined (see Section Single-boundary Accumulation and Ratcliff Diffusion Models and **Table 1** above). The fitting results showed that all built-in RSEs could be recovered and correctly identified by the fitting procedure, that is: the decision based RSE was best fitted by the decision-only model, and so on. Although the parameter values were not recovered numerically, the qualitative pattern was the same, in terms of the order of the fits and parameter relations. This validation fitting strengthens the results of the fitting of the empirical reaction time data and serves as a proof of concept: it is possible and meaningful to investigate the decision and non-decision components of the RSE employing (generative) reaction time models and fittings on the distributional level. Note that this validation procedure was based on the assumption that the data were indeed generated by the exact model that was used to fit the data. To our knowledge, it is an open question what the implications would be with regard to the validity of a model fit if the empirical data were generated by a mechanism that is different to that assumed by the model used to fit the data.

<sup>2</sup>We are grateful to one of the reviewers for pointing out this possibility.
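The data-generation step of such a validation can be sketched as follows for the ex-Wald case. This is a minimal illustration, not the authors' code: it assumes unit diffusion variance, takes the drift rate as the decision parameter that differs between SSTs and RSTs, uses a standard transformation method for inverse-Gaussian variates, and all parameter values are hypothetical. The subsequent model fitting is not shown.

```python
import math
import random
import statistics

def sample_wald(mean, shape, rng):
    """Draw one inverse-Gaussian (Wald) variate via the
    Michael-Schucany-Haas transformation method."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mean + (mean * mean * y) / (2.0 * shape)
         - (mean / (2.0 * shape)) * math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2))
    return x if rng.random() <= mean / (mean + x) else mean * mean / x

def sample_ex_wald(threshold, drift, nondecision_mean, n, rng):
    """Ex-Wald RTs: first-passage time to `threshold` with drift `drift`
    (unit diffusion variance assumed) plus an exponential non-decision time."""
    wald_mean, wald_shape = threshold / drift, threshold ** 2
    return [
        sample_wald(wald_mean, wald_shape, rng) + rng.expovariate(1.0 / nondecision_mean)
        for _ in range(n)
    ]

rng = random.Random(1)
n_trials = 20000
# Hypothetical generating parameters (time in seconds); not estimates from the study.
sst = sample_ex_wald(1.0, 4.0, 0.20, n_trials, rng)
rst_sets = {
    "decision": sample_ex_wald(1.0, 5.0, 0.20, n_trials, rng),      # only drift differs
    "non_decision": sample_ex_wald(1.0, 4.0, 0.15, n_trials, rng),  # only non-decision time differs
    "combined": sample_ex_wald(1.0, 5.0, 0.15, n_trials, rng),      # both differ
}
rse = {name: statistics.mean(sst) - statistics.mean(rst) for name, rst in rst_sets.items()}
print(rse)
```

The first two synthetic data sets have the same expected mean RSE (50 ms with these values) but differ in how the whole RT distribution shifts, which is why recovery of the built-in source requires fitting at the distributional level.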

# The Redundant Signals Effect—an Umbrella Term?

Other studies that used different experimental paradigms (stimuli, tasks, modalities) have focused solely on a decisional origin of the RSE. The present results, however, raise the fundamental question of whether "RSE" is, in fact, an umbrella term for different phenomena which share the general property of "multiple evidence sources" for performing a perceptual-motor task. Similar notions have already been put forward by Reynolds and Miller (2009) as well as Schulte et al. (2006). It is, thus, likely that for specific stimulus properties (luminance, spatial frequency, orientation, etc.), tasks of differential complexity (detection, go/no-go, discrimination, etc.), and uni- vs. multi-modal paradigms, the RSE is in fact generated by a combination of different mechanisms, and is thus to be appropriately accounted for by different types of models. Similarly, Corballis (2002) showed that the RSE is subject to a substantial amount of inter-individual variability. Accordingly, inferences and generalizations across the many variations of the RSE paradigm, and perhaps even across participants, would appear problematic if the data basis is heterogeneous, gathered under very different experimental conditions. In this situation, a sequential-sampling model analysis can help systematize potential sources of the RSE across different paradigm variations and settings.

In summary, the present study examined the locus or loci of the RSE by applying a sequential-sampling model analysis to two bimodal, target detection and left-right localization, tasks. The fitting results challenge the view that co-activation in the RSP is a purely decisional effect. This pattern was even more pronounced in the data of Experiment 2, where the decisional model fared worst and the purely non-decisional model turned out second best in goodness-of-fit terms. Although two experiments are clearly insufficient to definitely rule out a decision-only model, their results emphasize the role of the non-decision stage as a potential source of co-activation effects. Moreover, the results illustrate the usefulness of a systematic sequential-sampling model analysis for situations where the RMI is violated.

Thus, in conclusion, in order to achieve a realistic picture of what the sources of the RSE actually are and how the RSE is composed, a comprehensive series of experiments would be required that elaborate exactly what roles, in the RSP, are played by the stimuli, sensory modalities, response effectors, and experimental tasks in producing co-activation effects and exactly what the generating mechanisms are.

# Acknowledgments

This research was supported by grants from DFG Excellence Cluster EC 142 "CoTeSys" (HM and MZ), the DFG research group FOR480 (HM), DFG grant ZE 887/3-1 (MZ and HM), and German-Israeli Foundation for Scientific Research and Development grant 1130-158.4 (MZ and HM). The R interface for the "fastdm" C code was written by Scott Brown.

# References


Corballis, M. C. (2002). Hemispheric interactions in simple reaction time. Neuropsychologia 40, 423–434. doi: 10.1016/S0028-3932(01)00097-5


detection. J. Exp. Psychol. Hum. Percept. Perform. 22:25. doi: 10.1037/0096-1523.22.1.25


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Zehetleitner, Ratko-Dehnert and Müller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Attentional spreading to task-irrelevant object features: experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation

# **Detlef Wegener \*, Fingal Orlando Galashan, Maike Kathrin Aurich† and Andreas Kurt Kreiter**

Center for Cognitive Science, Brain Research Institute, University of Bremen, Bremen, Germany

#### **Edited by:**

Hans Colonius, Carl von Ossietzky Universität Oldenburg, Germany

#### **Reviewed by:**

Zhe Chen, University of Canterbury, New Zealand Søren K. Andersen, University of Aberdeen, UK

#### **\*Correspondence:**

Detlef Wegener, Center for Cognitive Science, Brain Research Institute, University of Bremen, P.O. Box 33 04 40, 28334 Bremen, Germany e-mail: wegener@brain.uni-bremen.de

#### **†Present address:**

Maike Kathrin Aurich, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belval, Luxembourg

Directing attention to a specific feature of an object has been linked to different forms of attentional modulation. Object-based attention theory is founded on the finding that even task-irrelevant features at the selected object are subject to attentional modulation, while feature-based attention theory proposes a global processing benefit for the selected feature even at other objects. Most studies investigated either the one or the other form of attention, leaving open the possibility that both object- and feature-specific attentional effects do occur at the same time and may just represent two sides of a single attention system. We here investigate this issue by testing attentional spreading within and across objects, using reaction time (RT) measurements to changes of attended and unattended features on both attended and unattended objects. We asked subjects to report color and speed changes occurring on one of two overlapping random dot patterns (RDPs), presented at the center of gaze. The key property of the stimulation was that only one of the features (e.g., motion direction) was unique for each object, whereas the other feature (e.g., color) was shared by both. The results of two experiments show that co-selection of unattended features even occurs when those features have no means for selecting the object. At the same time, they demonstrate that this processing benefit is not restricted to the selected object but spreads to the task-irrelevant one. We conceptualize these findings by a 3-step model of attention that assumes a task-dependent top-down gain, object-specific feature selection based on task- and binding characteristics, and a global feature-specific processing enhancement. The model allows for the unification of a vast amount of experimental results into a single model, and makes various experimentally testable predictions for the interaction of object- and feature-specific processes.

**Keywords: reaction times, object-based attention, feature-based attention, attention model, task difficulty**

#### **INTRODUCTION**

The term attention is widely used to paraphrase specific modulations in the representation of task-relevant sensory information. While it suggests the assumption of a homogenous process, attention research has revealed many different aspects of attentional modulation, both in terms of neuronal mechanisms and behavior, and not all of these results turned out to be easily compatible.

Most confidence has been obtained for processing the attended information. Studies investigating the influence of attention on neuronal responses revealed a multitude of effects. For instance, directing attention to the motion of a stimulus, in terms of direction and speed, locally increases the firing rate (Treue and Maunsell, 1996) and the gamma power of the local field potential (Khayat et al., 2010) of neurons in motion-sensitive mediotemporal (MT) area, causes shrinkage of receptive fields around the attended stimulus (Womelsdorf et al., 2006a), and increases stimulus selectivity of single neurons (Wegener et al., 2004). As a consequence of attentional modulation, task-relevant motion changes are represented with shorter latency, and reaction times (RTs) become faster (Galashan et al., 2013). Corresponding findings have been obtained in other visual areas for features like color and form (McAdams and Maunsell, 1999; Reynolds et al., 1999; Fries et al., 2001; Taylor et al., 2005; Sundberg et al., 2012).

Less clear than the enhanced processing of the selected information is the processing fate of currently task-irrelevant, unattended information. For instance, if attention is directed to the motion of a colored object, what about processing of the target object's color, or motion information at other objects? In the framework of object-based attention theory, objects are considered the natural "units of attention", and attending a certain object feature has been shown to cause spreading of attention to other features of that object, thus promoting selection of the entire object (Duncan, 1984; O'Craven et al., 1999; Blaser et al., 2000; Scholl, 2001; Rodríguez et al., 2002; Schoenfeld et al., 2003; Wannig et al., 2007; Ernst et al., 2013). If taken literally, object-based attention requires restriction of any response modulation to features at the attended object by definition, without spreading to features at other objects. However, psychophysical, imaging, and electrophysiological studies showed that attending towards a certain object feature is associated with enhanced processing of that feature throughout the visual field (Rossi and Paradiso, 1995; Treue and Martínez Trujillo, 1999; McAdams and Maunsell, 2000; Saenz et al., 2002; Arman et al., 2006; Müller et al., 2006; Serences and Boynton, 2007). In addition, various recent studies indicated that selection of a single target-object feature may result in suppression of other, task-irrelevant features of that object (Fanini et al., 2006; Nobre et al., 2006; Cant et al., 2008; Polk et al., 2008; Wegener et al., 2008; Serences et al., 2009; Taya et al., 2009; Xu, 2010; Freeman et al., 2014).

These results might be perceived as conceptually contradictory in some cases, and apparently conflicting in others. Understanding the underlying attentional mechanisms will critically depend on investigating the interaction of different forms of attention. This issue has been addressed by a surprisingly small number of studies (Boehler et al., 2011; Kravitz and Behrmann, 2011; Lustig and Beck, 2012). Since many of the above cited studies used basically the same behavioral requirement of object-feature directed attention, we performed two psychophysical experiments to further investigate the interaction and potential co-existence of feature- and object-based attention. To this end, we used stimulus and task conditions similar to those previously utilized for demonstrating object- and feature-based attention (Schoenfeld et al., 2003, 2014; Müller et al., 2006). We used RT as a measure for attention-dependent processing, and studied attentional spreading along both the object and feature domain in parallel. We presented two superimposed random dot arrays at fixation, having either motion in opposite direction but the same color, or having different colors but the same motion direction. Subjects were asked to make speeded responses to changes of either speed or color at one of the two objects, and attention was directed using cues indicating the object and the feature for which the change was to occur, with 75% validity. **Figure 1** shows two hypothetical patterns of cumulative RT distributions. In **Figure 1A**, fastest responses occur if both the feature and the object-cue dimension are correct, and slowest responses occur if both are incorrect. RTs are in-between if only one of the two cue dimensions is correct. Since responses to the unattended feature are faster if they occur at the attended object (solid blue line shifted to the left as compared to the dashed blue line), this result pattern suggests an object-based benefit for the unattended feature. In **Figure 1B**, only the feature dimension of the cue is effective, but the object dimension has no impact. Such an RT pattern is in favor of a pure feature-based attentional modulation, since RTs solely depend on attention directed to the feature, with no differences between attended and unattended object.

Our findings suggest that both feature- and object-specific attentional effects are evident at the same time. The results confirm that attending a single target-object feature is accompanied by co-selection of other, task-irrelevant features of the same object. However, they also show that this modulation is not restricted to the selected object but instead, spreads towards the unattended object. We suggest a simple, physiologically plausible 3-step model of attention to unify findings from object-based and feature-based attention theory in a single framework. Preliminary results have previously been published in abstract form (Wegener et al., 2009, 2010).

**FIGURE 1 | Hypothetical RT distributions. (A)** Cumulative RT distribution compatible with an object-based attention approach. Task-irrelevant features are associated with faster RTs if they belong to the attended object, as indicated by a leftward shift of the RT distributions for invalidly cued features. **(B)** Cumulative RT distribution incompatible with a strict object-based attention approach. Here, object cueing is ineffective (i.e., a valid object cue has no influence or has an equal influence at both objects) and RTs are influenced solely by the feature cue, as indicated by overlaying RT distributions for corresponding feature cue conditions.

# **MATERIAL AND METHODS**

# **SUBJECTS**

The study was conducted with eight naïve female participants (mean age: 25.8 years). All subjects had normal or corrected-to-normal vision, as verified with the Freiburg Visual Acuity Test (Bach, 1996), and gave their written informed consent. The study conformed to the Code of Ethics of the World Medical Association (Declaration of Helsinki) and was approved by the University's ethics committee.

# **VISUAL STIMULATION AND TASK**

The behavioral task consisted of a feature-change detection paradigm, as outlined in **Figure 2**. Stimuli consisted of two superimposed, doughnut-shaped random dot patterns (RDPs) presented at the center of the screen with the fixation point and the cue being located in the inner notch of the stimulus. Stimuli had a diameter of 6.34° with the notch being 1.9° in diameter. Each RDP consisted of 50 dots with a maximal lifetime of 200 ms. Dot positions within the RDP were calculated so as to never overlap each other, thus resulting in an individual dot's lifetime of mostly less than 200 ms. In Experiment 1 (**Figure 2A**), RDPs possessed coherent motion in opposite directions along the vertical meridian, at a constant speed of 2.54°/s. Color was the same for both RDPs. In case of a speed change, speed increased by 50%; in case of a color change, color switched from white to pale yellow. Speed and color change trials were cued by an arrow that was either gray (in case of a presumed speed change) or pale yellow (in case of a presumed color change). The orientation of the arrow indicated the direction of motion of the RDP on which the change was to occur. Cues had a validity of 75%. In case of an incorrect cue, the cue either (i) indicated the correct object, but the wrong feature to be changed; or (ii) the correct feature, but the wrong object; or (iii) was wrong in both respects (**Figure 2B**). In Experiment 2, RDPs consisted of isoluminant yellow and green dots but had coherent motion in the same direction (**Figure 2C**). In case of a speed change, the target object's speed again increased by 50%; in case of a color change, dot color became slightly more intense. Color changes were matched to be as difficult to detect as those in Experiment 1, as confirmed in independent test trials with other subjects. Cues of Experiment 2 consisted of arrows (in case of a presumed speed change) or bars (in case of a presumed color change). Cue color (yellow or green) indicated the object on which the change was to occur (**Figure 2D**).

Subjects sat 45 cm in front of a 22 inch monitor (NEC Multisync FB2111 SB, NEC Display Solutions, Munich, Germany), with their head stabilized by a head-chin rest. Stimuli were generated on a Pentium computer with an NVidia Quadro NVS graphics card and were displayed on a dark background with a resolution of 1280 × 1024 pixels, at 100 Hz horizontal refresh rate. Eye position was monitored using a CCIR Monochrome Camera (DMK 83 Micro/C, The Imaging Source, Bremen, Germany) and a custom-made remote video-oculography system.
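As a worked example of the stimulus geometry, the visual angles given above can be converted into physical sizes on the screen at the 45 cm viewing distance. The following minimal sketch uses the standard relation size = 2 d tan(θ/2); the function name is hypothetical, and the conversion to pixels is omitted because it would require the exact visible screen dimensions, which are not given here.

```python
import math

def visual_angle_to_size(angle_deg, distance_cm):
    """Physical extent (cm) of a centrally viewed stimulus that subtends
    `angle_deg` degrees of visual angle at a viewing distance of `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

viewing_distance_cm = 45.0
stimulus_cm = visual_angle_to_size(6.34, viewing_distance_cm)  # outer RDP diameter
notch_cm = visual_angle_to_size(1.9, viewing_distance_cm)      # inner notch diameter

print(f"RDP diameter: {stimulus_cm:.2f} cm, notch diameter: {notch_cm:.2f} cm")
```

Under these assumptions, the RDPs were roughly 5 cm across on the screen, with a central notch of about 1.5 cm for the fixation point and cue.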

Each trial started with the appearance of a red fixation point in the center of the screen. Subjects initiated the trial by pressing a handheld button and keeping it pressed until a response was required. Following trial initiation, the cue appeared in the center of the screen with the fixation point superimposed on it, and remained visible throughout the trial. After a delay period of 1300 ms the two RDPs were displayed. Following RDP onset, one of the patterns changed either speed or color at one of nine possible points in time, separated by 320 ms between 640 ms and 3200 ms. Subjects were required to respond to any change as quickly as possible, but in any case within a response interval of maximally 1000 ms, by releasing the button. Note that they were only required to detect but not to discriminate the change. Subjects were given immediate auditory feedback about their RTs by using sine tones of different pitch. Very fast RTs were indicated by a different, especially pleasant tone. Divergence of the eye position by more than 1° from the fixation point, release of the button prior to any change (false alarm), or absence of a response 1 s after the change (miss) caused immediate termination of the trial.

… coherence was 100% in both of the patterns. In Experiment 1, motion was the object-defining feature and color was the shared feature (i.e., motion was in opposite direction and color was the same), whereas the opposite was true for Experiment 2 (i.e., motion direction was the same but color differed). **(B, D)** Cue assignments in Experiments 1 and 2. Long, straight lines indicate increased speed, and arrow color indicates RDP color. These arrows are shown for illustration purposes only and were not part of the display. Note that stimulus colors were chosen for illustration purposes only. Actual colors used in the experiment were slightly different and color changes were less obvious as compared to the figure.

Per experiment, data were obtained within nine consecutive blocks, with no more than two blocks per day. Each block consisted of 96 trials, i.e., 48 trials per feature change condition. Speed and color change as well as correctly and incorrectly cued trials were fully interleaved with the order randomly chosen by the stimulation program. Prior to both experiments, subjects were given one block to familiarize them with the task and stimuli.
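The trial structure described above can be summarized in a short sketch that builds one block of 96 trials. Two aspects are assumptions made for illustration and are not stated in the text: that the 75% cue validity applies separately to speed- and color-change trials, and that invalid trials are split evenly across the three invalid cue types. Change times are also assumed to be drawn with equal probability; all names are hypothetical.

```python
import random

change_times_ms = list(range(640, 3201, 320))  # nine possible change onsets after RDP onset
trials_per_block, n_blocks, cue_validity = 96, 9, 0.75

def make_block(rng):
    """One block: 48 speed-change and 48 color-change trials in random order.
    ASSUMPTION: validity holds per change type, and invalid trials are split
    evenly over the three invalid cue types."""
    trials = []
    for feature in ("speed", "color"):
        n_feature = trials_per_block // 2
        n_valid = int(n_feature * cue_validity)
        n_each_invalid = (n_feature - n_valid) // 3
        cue_types = ["valid"] * n_valid
        for invalid in ("wrong_feature", "wrong_object", "wrong_both"):
            cue_types += [invalid] * n_each_invalid
        for cue in cue_types:
            trials.append({
                "change": feature,
                "cue": cue,
                "change_time_ms": rng.choice(change_times_ms),  # ASSUMPTION: uniform
            })
    rng.shuffle(trials)  # fully interleaved trial order
    return trials

block = make_block(random.Random(0))
print(len(block), n_blocks * trials_per_block)
```

Under the even-split assumption, each block would contain only four trials per invalid cue type and change type, so the invalid conditions accumulate far fewer trials than the valid condition over the nine blocks.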

# **DATA ANALYSIS**

Data were analyzed with custom-written scripts and the Statistics Toolbox in Matlab 7.13 (The MathWorks, Natick, MA). Trials in which the button was released at or before 200 ms after a feature change were counted as false alarms. Performance was calculated as the percentage of correct responses from the sum of correct responses, false alarms, and misses. RT analysis was performed separately for speed and color change trials. To avoid influences of day-by-day variations in RT and to allow for comparing RTs across experiments, each speed and color change RT was normalized by dividing it by the mean RT of all speed and color change trials, respectively, of the corresponding block. Group RTs were calculated as the average of the median normalized RTs per subject and cue condition. Feature- and object-cue effects were analyzed by 2-way ANOVA using the factors feature cue (valid, invalid) and object cue (valid, invalid). *Post-hoc* tests were conducted with two-tailed paired *t*-tests. All tests were performed at a significance level of 5% (α = 0.05).
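The performance measure and the RT normalization described above can be sketched in a few lines. This is an illustrative Python rendering with made-up numbers and hypothetical function names, not the Matlab scripts used in the study.

```python
from statistics import mean, median

def performance(n_correct, n_false_alarms, n_misses):
    """Percentage of correct responses among correct responses, false alarms, and misses."""
    return 100.0 * n_correct / (n_correct + n_false_alarms + n_misses)

def normalize_block(rts):
    """Divide each RT of one block (one change type) by that block's mean RT."""
    block_mean = mean(rts)
    return [rt / block_mean for rt in rts]

def group_rt(normalized_by_subject):
    """Average over subjects of each subject's median normalized RT (one cue condition)."""
    return mean(median(rts) for rts in normalized_by_subject.values())

# Made-up example values, for illustration only.
normalized = normalize_block([400.0, 500.0, 600.0])  # RTs in ms from one block
condition = {"s1": [0.9, 1.0, 1.1], "s2": [0.8, 0.9, 1.3]}
print(performance(90, 6, 4), normalized, group_rt(condition))
```

A normalized RT of 1.0 thus corresponds to the block mean for that change type, so values below 1.0 indicate faster-than-average responses within a block.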

# **RESULTS**

#### **BEHAVIORAL DATA**

Subjects performed a total of 864 trials in each of the experiments, registered within nine consecutive blocks distributed over usually 5 days. Eye movements exceeding 1° from central fixation or eye blinks resulted in termination of 2.5% of all trials. Excluding these fixation errors, mean performance was 92.8 ± 1.5% in Experiment 1 and 92.9 ± 1.3% in Experiment 2, and was very similar between speed and color change trials (range: 92.1–93.6%). Regarding practice effects over blocks, mean performance in Experiment 1 increased slightly during the course of the experiment but did not show significant variations, whereas in Experiment 2, performance in the first session was worse than in some subsequent sessions, as revealed by blockwise comparison of the percentage of successful trials by means of 1-way ANOVA and Bonferroni's Multiple Comparison Test (Experiment 1: *F*(8,63) = 1.8, *p* = 0.094; Experiment 2: *F*(8,63) = 4.2, *p* = 0.0005). Considering the relevant behavioral measure of this study, we found very similar RTs across blocks for both correctly cued speed and color changes in both experiments, with no significant difference between blocks (Experiment 1: speed: *F*(8,63) = 0.24, *p* = 0.981, color: *F*(8,63) = 0.85, *p* = 0.563; Experiment 2: speed: *F*(8,63) = 1.1, *p* = 0.378, color: *F*(8,63) = 0.8, *p* = 0.609). For optimal comparability between speed-change and color-change trials and between experiments, we normalized all speed-change RTs of a subject to the mean speed-change RT of the respective experimental block, and proceeded accordingly for color-change trials. All results reported in this paper also hold true for absolute RTs.

# **EXPERIMENT 1—OBJECTS DEFINED BY MOTION DIRECTION**

**Figure 3** shows the RT results for Experiment 1, when objects were defined by motion direction and color was the shared feature. For speed changes, mean normalized RTs were fastest when both the feature and the object cue were correct (0.932 ± 0.022), and slowest when both were incorrect (1.281 ± 0.088). When either the feature or the object cue dimension was correct and the other cue dimension was incorrect, RTs were in-between (1.112 ± 0.087 and 1.083 ± 0.09, respectively; **Figure 3A**). For comparison with the literature, **Table 1** lists absolute RTs. A 2-way ANOVA with the factors feature cue (valid, invalid) and object cue (valid, invalid) revealed highly significant effects of both factors (feature cue: *F*(1,7) = 34.3, *p* < 0.0001; object cue: *F*(1,7) = 47.8, *p* < 0.0001), and no interaction (*F*(1,7) = 0.121, *p* = 0.73; **Figure 3C**, left). *Post-hoc* two-tailed *t*-tests confirmed these results by showing highly significant effects of the feature cue at both the correctly (*p* = 0.0023) and incorrectly cued object (*p* = 0.0036), as well as highly significant effects of the object cue for both correctly (*p* = 0.0015) and incorrectly cued features (*p* = 0.0025). Both cue dimensions were about equally effective as revealed by no differences between the two conditions if only one of the two cue dimensions was correct and the other incorrect (*p* = 0.6367). The corresponding cumulative distributions of RTs are shown in **Figure 3D**, revealing a close similarity with the hypothetical pattern of distributions to illustrate an object-cue benefit, as shown in **Figure 1A**. The critical comparison here is the distribution of RTs for the two conditions using invalid feature cues, showing a clear leftward shift of the RT distribution if the unattended speed change occurred at the attended object as compared to the unattended object.

We next investigated whether this pattern of results also holds true for the detection of color changes. As for speed changes, we found fastest RTs for fully correctly cued trials (0.967 ± 0.016), and slowest RTs for fully incorrectly cued trials (1.124 ± 0.05). Yet, for the two conditions having one incorrect cue dimension, RTs were almost exclusively determined by the validity of the feature cue: if the feature cue was correct, RTs at the uncued object (0.979 ± 0.039) were close to those at the cued object, and if the feature cue was incorrect, RTs at the cued object were close to those at the uncued object (1.099 ± 0.041; **Figure 3B**). A 2-way ANOVA revealed a highly significant effect of the factor feature cue (*F*(1,7) = 102.9, *p* < 0.0001), but no effect of the factor object cue (*F*(1,7) = 1.8, *p* = 0.185), and no interaction (*F*(1,7) = 0.195, *p* = 0.662; **Figure 3C**, right). *Post-hoc* two-tailed *t*-tests showed a significant difference between the two conditions having only one correctly cued dimension (*p* = 0.0002), but no differences between the conditions having a correct or an incorrect feature cue at either the cued (*p* = 0.473) or the uncued (*p* = 0.111) object. Thus, for color changes in Experiment 1 the results were different from those of speed changes, as reflected by a pattern of cumulative RT distributions (**Figure 3E**) similar to those shown in **Figure 1B**, illustrating a strict feature-based modulation of RTs. Moreover, comparing RTs in response to color changes with those in response to speed changes revealed very similar RTs if the object cue was correct, but also significantly shorter color-change RTs if it was incorrect (correctly cued features: *p* = 0.0045; incorrectly cued features: *p* = 0.0026; **Figure 3F**). Hence, while results for speed changes confirmed predictions of object-based attention theory regarding a same-object benefit, those for color changes were more in line with feature-based modulation.

**(B)** color-change detection as a function of validity of the two cue dimensions. Per cueing condition, bars represent the mean over the median normalized RTs of all subjects. **(C)** Mean normalized RTs for the factors feature cue (black circles) and object cue (open circles) as a function of cue validity. For the

and for the object cue they represent the column mean. **(D, E)** Cumulative distributions of normalized RTs. Line colors correspond to cue conditions as in **(A, B)**. **(F)** Speed- and color-change difference of mean normalized RTs for corresponding cue conditions. Error bars indicate SD throughout the figure.

**Table 1 | Absolute mean RTs** ± **SD [ms] for the four different cueing conditions of Exp. 1 and Exp. 2**.

# **EXPERIMENT 2—OBJECTS DEFINED BY COLOR**

The failure to find a same-object benefit for color-change detection in Experiment 1 could be due to either absent attentional co-selection of the task-irrelevant feature at the cued object, or alternatively, to a spreading of feature-dependent attention towards the uncued object. Both possibilities potentially result in RTs being not different at the cued or uncued object. As a third alternative, the dichotomy in speed- and color-change detection may represent a general difference in attention-dependent processing of the two features. We tested between these alternatives by performing another experiment using objects differing in color but not motion direction. We hypothesized that a general difference in speed- and color-change detection should preserve the pattern of RT distributions found in Experiment 1, whereas these should be inverted (i.e., a same-object benefit now for color but not motion) if one of the former alternatives was true.

**Figure 4** illustrates that the results of Experiment 2 were exactly opposite to those of Experiment 1. For speed changes, we now obtained a pattern of RT distributions similar to those for color changes in Experiment 1, with no same-object benefit: RTs were similarly fast at both the correctly and incorrectly cued object (0.951 ± 0.016 and 0.975 ± 0.022, respectively) when the feature cue was correct, and similarly slow when the feature cue was incorrect (1.14 ± 0.087 and 1.172 ± 0.072, respectively) (**Figures 4A,D**). A 2-way ANOVA revealed a significant influence of the factor feature cue (*F*(1,7) = 88.8, *p* < 0.0001), but not of the factor object cue (*F*(1,7) = 2.0, *p* = 0.168), and no interaction (*F*(1,7) = 0.051, *p* = 0.823; **Figure 4C**, left). *Post-hoc* analysis confirmed the feature cue effect at both the cued and the uncued object (*p* = 0.001 and *p* < 0.001, respectively).

In contrast, for color-change detection we now found a clear same-object benefit, thus resembling the results for speed changes in Experiment 1: RTs were again fastest when both cue dimensions were correct (0.94 ± 0.017), and slowest when both were incorrect (1.23 ± 0.087). If only one cue dimension was correct and the other incorrect, RTs were in-between (correct feature cue: 1.061 ± 0.034; correct object cue: 1.1 ± 0.04), indicating an influence of both cue dimensions (**Figures 4B,E**, cf. **Table 1** for absolute RT values). Accordingly, a 2-way ANOVA revealed a significant influence of both factors (feature cue: *F*(1,7) = 80.6, *p* < 0.0001; object cue: *F*(1,7) = 46.8, *p* < 0.0001), and no interaction (*F*(1,7) = 0.017, *p* = 0.897; **Figure 4C**, right). *Post-hoc* *t*-tests confirmed this by showing significantly shorter RTs for correctly than for incorrectly cued features at both the correctly and incorrectly cued object (*p* < 0.0001 and *p* = 0.0031, respectively),

and significantly shorter RTs depending on the validity of the object cue for both correctly and incorrectly cued features (both *p* < 0.0001). In contrast to Experiment 1, however, comparing the two conditions having only one correctly cued dimension revealed a slightly but significantly stronger influence of the feature cue (*p* = 0.021). Comparing the cue effects for speed and color changes again revealed very similar RTs at the cued object, but slightly faster RTs for speed changes at the uncued object, a difference that was significant for correctly cued features (*p* = 0.0004) (**Figure 4F**).

# **COMPARISON OF SPEED- AND COLOR-CHANGE DETECTION ACROSS EXPERIMENTS**

Experiment 2 showed that the existence or absence of a same-object benefit is not due to a general difference between speed- and color-change detection. Thus, we next investigated whether it is caused by either absent attentional co-selection of the task-irrelevant feature at the attended object, or alternatively by attentional spreading of feature-dependent attention towards the unattended object. To this end, we analyzed speed- and color-change detection across experiments, i.e., we compared RTs in response to a feature change when it was the unique, object-defining feature vs. when it was the shared one. We found that speed and color changes provided an essentially identical pattern of results (**Figures 5A,B**). At the cued object, RT distributions were basically indistinguishable between Experiments 1 and 2, i.e., they were about the same independent of whether the feature was object-defining or shared. In contrast, at the uncued object we observed a prominent leftward shift of the RT distribution when subjects responded to a change of the shared feature, regardless of whether this was speed or color, or whether the feature was correctly or incorrectly cued. These findings allow for two important conclusions regarding attentional spreading: First, since RTs at the cued object were equal for shared and object-defining features, the task-irrelevant shared feature received the same attentional modulation as the object-defining feature (for which a same-object benefit was evident for both speed and color changes), thus indicating attentional co-selection of the task-irrelevant target-object feature independent of its relevance for defining or selecting the object. 
Second, since RT distributions for shared features were consistently shifted to the left at the uncued object, attentional modulation of shared features was not restricted to the target but spread towards the task-irrelevant object, resulting in a failure to find a same-object benefit for shared features in the previous analyses. Hence, attending towards a single feature of a target object resulted in co-selection of another, task-irrelevant feature of that object. Yet, the underlying attentional process was not restricted to the selected object, but included enhanced processing of that irrelevant feature at another, irrelevant object.
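The notion of a leftward shift of a cumulative RT distribution can be made concrete with an empirical cumulative distribution function. The following is a minimal sketch; the helper names, the use of the median as a summary of the shift, and the sample values are assumptions for illustration and do not reproduce the analysis of the study.

```python
from bisect import bisect_right
from statistics import median


def ecdf(sample):
    """Return a function giving the proportion of RTs <= x in the sample."""
    xs = sorted(sample)
    n = len(xs)

    def cdf(x):
        return bisect_right(xs, x) / n

    return cdf


def median_shift(reference, test):
    """Median of test minus median of reference.

    A negative value means the test distribution lies to the left of the
    reference distribution, i.e., responses were faster.
    """
    return median(test) - median(reference)


# Hypothetical normalized RTs at the uncued object (illustration only).
object_defining = [1.20, 1.25, 1.31, 1.18, 1.27]
shared = [1.02, 1.10, 1.07, 0.99, 1.12]

shift = median_shift(object_defining, shared)
cdf_shared = ecdf(shared)
cdf_defining = ecdf(object_defining)
```

For these made-up values, the cumulative distribution of the shared feature reaches any given proportion at a shorter RT than that of the object-defining feature, which is what a leftward shift denotes.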

This conclusion is supported by a balanced one-way ANOVA using data for shared and object-defining features at both the cued and the uncued object, pooled over speed- and color-change trials from both experiments (**Figure 5C**). For both validly and invalidly cued features, ANOVAs indicated significant differences between the four cue conditions (uncued features: *F*(3,60) = 15, *p* < 0.0001; cued features: *F*(3,60) = 44.67, *p* < 0.0001). For testing individual conditions, we applied a Bonferroni correction for multiple comparisons and regarded conditions as being significantly different if the confidence interval did not include 0 for alpha errors of 0.05. Mean differences and corresponding lower and upper bounds of confidence intervals are summarized in **Table 2**. For uncued changes of the object-defining feature, we found significantly faster RTs at the cued object, confirming the same-object benefit as described previously by analyzing speed- and color-change trials individually. However, changes of the shared feature were statistically not different from those of the object-defining feature at the cued object, independent of the object on which they occurred. Moreover, they were consistently

was object-defining (straight line) or shared (dashed line). Scaling of axes is identical for all subplots, as indicated in the right bottom panel of **(A)**. **(C)** Mean normalized RTs for shared and object-defining features, separately for

the shared feature at the cued object, as a reference. Numbers on top of upper x-axis indicate stimulus and cue condition for reference to statistical comparisons summarized in **Table 2**.

**Table 2 | Statistical results for comparing change detection at the cued and uncued object, separately for object-defining and shared features, and depending on the feature cue being either validly (upper half) or invalidly (lower half) cued**.


Stimulus condition numbers correspond to those introduced in Figure 5C.

faster than RTs to changes of the object-defining feature at the uncued object. Thus, a feature that was fully irrelevant to select the object received the same attentional modulation as another one that was obligatorily required for object selection, and this attentional modulation spread towards the task-irrelevant object. A similar pattern of results was found for correctly cued feature changes. Again, changes of the object-defining feature were not only significantly slower at the uncued object as compared to the cued one, but also as compared to changes of the shared feature, regardless of whether these occurred at the cued or the uncued object. The only difference from the former analysis for invalidly cued feature changes was that changes of the shared feature at the uncued object were slightly but significantly slower than those of the object-defining feature at the cued object. Thus, statistical testing confirmed our previous conclusion that the absence of a same-object benefit for shared features was not due to absent attentional modulation of that feature but caused by spreading of attention from the co-selected irrelevant object feature to the same feature at the unattended object.
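The decision rule used for the pairwise comparisons (a Bonferroni-adjusted confidence interval that must exclude 0) can be sketched as follows. This is an approximation under stated assumptions: the interval is computed for the difference of two independent group means using a normal quantile instead of the exact procedure applied in the study, six pairwise comparisons among four conditions are assumed, and all names and values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def bonferroni_ci(group_a, group_b, n_comparisons=6, alpha=0.05):
    """Approximate Bonferroni-adjusted CI for mean(group_a) - mean(group_b).

    Uses a normal quantile (an approximation) and unpooled sample variances.
    """
    adjusted_alpha = alpha / n_comparisons
    z = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff - z * se, diff + z * se


def excludes_zero(ci):
    """True if the interval lies entirely above or entirely below 0."""
    lower, upper = ci
    return lower > 0 or upper < 0


# Hypothetical normalized RTs (illustration only).
cued_object = [1.00, 1.02, 0.98, 1.01, 0.99]
uncued_object = [1.20, 1.22, 1.18, 1.21, 1.19]

ci_between = bonferroni_ci(cued_object, uncued_object)
ci_within = bonferroni_ci(cued_object, cued_object)
```

Dividing the alpha level by the number of comparisons widens each interval, so a difference counts as significant only if it survives the stricter criterion.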

# **DISCUSSION**

Object feature-directed attention (OFDA) has been associated with co-selection as well as suppression of task-irrelevant target-object features, and with a global spreading of attention towards distant objects sharing the attended feature (for review: Olson, 2001; Scholl, 2001; Maunsell and Treue, 2006; Carrasco, 2011; Chen, 2012; Lee and Choo, 2013). Several factors influencing whether features are processed independently or integrated over objects were postulated, including stimulus characteristics (Vecera and Farah, 1994), the spatial extent of attention (Lavie and Driver, 1996), the need for attentional shifts (Lamy and Egeth, 2002), and task demands (Mayer and Vuong, 2012). Co-selection of task-irrelevant object features has been taken as evidence for object-based attention (Duncan, 1984; O'Craven et al., 1999; Blaser et al., 2000; Rodríguez et al., 2002; Schoenfeld et al., 2003; Wannig et al., 2007), while suppression of task-irrelevant features and global enhancement of the attended feature has been attributed to feature-based attention (Rossi and Paradiso, 1995; Treue and Martínez Trujillo, 1999; Saenz et al., 2002, 2003; Martínez Trujillo and Treue, 2004; Fanini et al., 2006; Nobre et al., 2006; Polk et al., 2008; Wegener et al., 2008; Gál et al., 2009; Serences et al., 2009). Yet, it is an open question whether the different effects observed with OFDA represent different attention mechanisms of which one dominates the other depending on task and stimulus constraints, or whether they represent distinct, potentially co-existing states of a single attention mechanism. The current study provides evidence for the latter possibility by demonstrating that object- and feature-specific effects of attention are not mutually exclusive but co-exist, as expressed by effective attentional modulation of task-irrelevant, co-selected target-object features at non-target objects. 
To conceptualize our findings, we propose a 3-step model of attention consisting of object-specific selection of features due to binding and grouping dynamics and a subsequent global, object- and space-independent modulation of those selected features. Upstream to this, a task-dependent, weighted gain to each of the feature channels potentially constrains the level of object-specific feature binding. The following sub-chapters first describe the basic architecture of the model and then discuss characteristics and predictions of the model based on recent literature and the experimental findings of the current study.

# **3-STEP MODEL OF ATTENTION**

Several computational models of attention have been suggested previously, including Guided Search, Neural Theory of Visual Attention (NTVA), Selective Attention Model (SLAM), and others (for review: Itti and Koch, 2001; Wolfe and Horowitz, 2004; Bundesen and Habekost, 2005; Rothenstein and Tsotsos, 2008). The 3-step model of attention presented in the following is conceptual rather than computational, and represents a unified framework for feature- and object-specific effects of attention and their dependency on task requirements. The model consists of distinct feature channels (A to C in **Figure 6A**), each being represented by multiple modules to account for different locations in space (1 to 4 in **Figure 6A**), and a channel- and location-specific top-down input to these modules, specified by task requirements. The model assumes two forms of interaction, horizontal and vertical. Horizontal interactions take place between modules of the same feature channel and support enhanced processing of a selected feature at unattended locations. Vertical interactions take place between modules of different channels and essentially represent binding dynamics through which different features of an object are integrated. We propose that the actual strength of these vertical interactions determines the degree to which task-irrelevant features are subject to co-selection, depending on both the task-dependent, weighted top-down input to each of the feature channels and stimulus-specific characteristics. Attenuation or even suppression of these binding dynamics takes place if the top-down gain provides sufficient suppressive drive to those feature channels that process task-irrelevant information. Following feature selection (selected either directly as a consequence of task requirements or indirectly by object-specific binding processes), the strength of horizontal interactions then determines the degree to which these selected features are processed globally. 
In a nutshell, the model assumes task- and object-specific feature selection and object-independent global processing modulation of the selected features.

Consider an object at location 1, consisting of features A and B (blue square in **Figure 6A**), and a relatively undemanding task requiring attention to feature A. Step 1 of the model sets the top-down signal, which consists of a spatial, feature-unspecific selection of the task-relevant object location and a feature-specific selection of the task-relevant object feature A1. With low task demands, non-relevant feature channels will not be particularly suppressed, symbolized by open circles as inputs to channels B and C. Step 2 of the model consists of vertical interactions between different feature modules at the attended location, depending on two factors: (1) the top-down input to each of the feature channels as set in step 1; and (2) stimulus-dependent binding or grouping characteristics. In the example, binding is assumed to support object-specific feature integration, resulting in co-selection and enhanced processing of task-irrelevant object-feature B1 (blue arrow), but omitting feature C1. In the third step, the selected features receive a globally enhanced processing benefit, implemented by the horizontal interactions between modules of the same feature channels (red arrows). Hence, due to task requirements, feature A receives a global, strong processing enhancement and feature B receives a somewhat weaker (due to the absent top-down boost, cf. also Lu and Itti, 2005) but globally effective processing enhancement, too. Taken together, the model proposes a task- and object-specific selection of features (supported by binding dynamics and potentially constrained by top-down mediated suppression of task-irrelevant feature channels), and a global processing modulation of the selected features, i.e., not restricted to the initially attended object or location.
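Although the model is conceptual rather than computational, the three steps of the example above can be illustrated with a toy calculation. Everything numeric in the following sketch is an assumption made for illustration: the gain values, the binding strength of 0.6, the multiplicative attenuation of binding by a negative channel gain, and the additive, feature-unspecific spatial boost are not specified by the model, and the function name is hypothetical.

```python
def three_step_model(features, locations, attended_feature, attended_location,
                     channel_gain, binding, spatial_boost=0.2):
    """Toy version of the conceptual 3-step model (all numbers illustrative).

    channel_gain: top-down gain per feature channel; 0.0 = neutral,
                  negative = suppressed (for task-irrelevant channels).
    binding:      maps (attended_feature, other_feature) to a binding
                  strength in [0, 1] at the attended location.
    Returns enhancement[feature][location].
    """
    # Step 1: the top-down signal selects the task-relevant feature channel.
    selected = {f: 0.0 for f in features}
    selected[attended_feature] = max(channel_gain[attended_feature], 0.0)

    # Step 2: vertical interactions at the attended location co-select bound
    # features; a negative channel gain attenuates or abolishes the binding.
    for f in features:
        if f == attended_feature:
            continue
        strength = binding.get((attended_feature, f), 0.0)
        attenuation = max(0.0, 1.0 + channel_gain[f])
        selected[f] = strength * attenuation * selected[attended_feature]

    # Step 3: horizontal interactions spread each selected feature to all
    # locations; the attended location adds a feature-unspecific boost.
    return {
        f: {loc: selected[f] + (spatial_boost if loc == attended_location else 0.0)
            for loc in locations}
        for f in features
    }


# Attend feature A of the object at location 1; B is bound to A, C is not.
example = three_step_model(
    features=["A", "B", "C"],
    locations=[1, 2, 3, 4],
    attended_feature="A",
    attended_location=1,
    channel_gain={"A": 1.0, "B": 0.0, "C": 0.0},
    binding={("A", "B"): 0.6},
)
```

In this toy setting, feature A is enhanced most strongly at every location, the co-selected feature B receives a weaker but equally global enhancement, and feature C remains unenhanced outside the attended location.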

The model fully accounts for the experimental findings obtained in the current study and it makes numerous experimentally testable predictions, two of which are illustrated in **Figures 6C,D** and will be discussed below. First, for the results reported in this paper, **Figure 6B** illustrates our experimental situation by considering two objects, each consisting of one unique feature and another feature that is shared among both objects (illustrated by the partially overlapping blue and orange rectangles). The prediction from the model is that fastest RTs are to be expected for the unique, task-relevant feature A of the target object (blue), which receives a direct attention-dependent top-down boost, and slowest RTs for the unattended, unique feature C of the distractor object (orange). Yet, under low task demands (simple change detection under conditions of overt attention) the model predicts that the shared feature B will be subject to co-selection due to object-specific binding dynamics during step 2, but will be processed in a global, object-independent manner due to step 3, resulting in RTs that are to be the same regardless of whether this feature is tested at the blue or the orange object. Likewise, if attention is directed to the shared feature, RTs should be fastest for this feature, again independent of the object on

which it is tested, and slowest for the other two features. The results of Experiments 1 and 2 for object-defining and shared features exactly confirm these predictions.

The basic characteristics of the model also predict results of previous studies that have been attributed to support either object- or feature-based attention. For example, O'Craven et al. (1999) reported that attending the motion of a face stimulus elicited higher activity not only in human motion-sensitive region MT+ but also in the fusiform face area (FFA), whereas activity in the parahippocampal place area in response to a spatially overlapping house stimulus was not affected. Considering low or moderate task demands and strong binding dynamics between motion and the high-level feature "face", the 3-step model of attention predicts co-selection of the task-irrelevant feature "face" and enhanced processing in FFA, but no such effect for the feature "house". Importantly, the model also predicts enhanced FFA activity in response to distant, task-irrelevant face stimuli, and to motion bound to the house stimulus. However, these conditions have not been tested in the study of O'Craven et al. (1999).

The results of O'Craven et al. (1999) were taken as evidence for object-based attention. Using essentially the same type of OFDA-paradigm, Treue and Martínez-Trujillo found evidence for feature-based attention by demonstrating that attending a specific feature of an object at a target location causes enhanced processing of that feature also at distant objects (Treue and Martínez Trujillo, 1999; Martínez Trujillo and Treue, 2004). This result is explained by the horizontal interactions of the model within feature channels. Importantly, as noted before, the model also predicts that under appropriate task and stimulus conditions another feature of the attended object may become subject to co-selection and enhanced processing. This condition was tested in a follow-up study by requiring attention to either the color or the motion of a moving object (Katzner et al., 2009). The authors found that attention-dependent effects of MT neurons were independent from the task at hand, supporting the assumption of co-selection of the task-irrelevant feature under experimental conditions for which results were otherwise consistent with feature-based attention.

Another line of evidence suggests that attention can also be directed away from known non-target features (Woodman and Luck, 2007; Arita et al., 2012). In the model, this can be achieved by setting a low weight or even a negative gain for task-irrelevant features, resulting in an advantage of other, not explicitly suppressed features. This effect would be in accordance with the finding that negative cues are effective, although not as powerful as positive ones (Arita et al., 2012).

#### **INFLUENCE OF TASK DEMANDS AND STIMULUS CHARACTERISTICS**

A key assumption of the model is that the strength of vertical interactions varies as a function of task difficulty, resulting from the weighted top-down input to the various feature channels. Thus, with higher task demands, feature channels processing task-irrelevant information may become subject to active suppression (**Figure 6C**, symbolized by minus symbols as input for channels B and C), resulting in attenuated binding of these features with the selected feature, and thereby reducing or preventing their co-selection. For the example of RTs as a measure of attention-dependent processing modulation, higher task demands (e.g., detecting a hardly visible feature change) will result in stronger suppression of task-irrelevant information and thus in a rightward shift of the RT distributions for the task-irrelevant target-object feature B. In the most extreme case, RTs may be as slow as those for the unattended feature C of the unattended object. In any case, RTs in response to the unattended feature B are predicted to still be independent from the object on which they are tested.
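The predicted rightward shift with increasing task demands can be illustrated numerically. The sketch below rests on assumptions that are not part of the model: suppression is expressed as a value between 0 and 1 that scales down the binding strength, and enhancement is mapped linearly onto a normalized RT between two placeholder bounds. The bounds merely echo the order of magnitude of the normalized RTs reported above, and all names are hypothetical.

```python
def coselection(binding, suppression):
    """Enhancement of a task-irrelevant feature after top-down suppression.

    binding and suppression are assumed to lie in [0, 1]; a suppression of
    1.0 abolishes co-selection entirely.
    """
    return binding * max(0.0, 1.0 - suppression)


def predicted_rt(enhancement, slowest=1.28, fastest=0.93):
    """Map enhancement in [0, 1] linearly onto a normalized RT (placeholder)."""
    e = min(max(enhancement, 0.0), 1.0)
    return slowest - e * (slowest - fastest)


# Increasing task demands modeled as increasing suppression of channel B.
rts_b = [predicted_rt(coselection(0.6, s)) for s in (0.0, 0.5, 1.0)]
```

Under these assumptions, the RT for feature B grows monotonically with suppression and, at full suppression, equals the RT assigned to an entirely unselected feature. Because no term depends on the object at which B is tested, the prediction of object-independent RTs is preserved.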

Experimental data from neurophysiological and neuroimaging studies support the assumption of a close relation between task demands and the specific form of attentional modulation. In monkey area V4, Spitzer et al. (1988) reported that neurons were more strongly modulated if monkeys had to detect an orientation difference of only 22.5° between sample and test stimuli as compared to a difference of 90°, and also found a corresponding behavioral improvement in discriminative abilities. Likewise, color-selective neurons in inferotemporal cortex were shown to be strongly modulated depending upon whether the task implied simple color categorization or a more demanding color discrimination (Koida and Komatsu, 2007). Notably, such task-related modulation of neuronal activity may be found as early as V1 (Chen et al., 2008), and has been reported for many areas throughout visual cortex in humans, including MT+ (Huk and Heeger, 2000).

Task demands may even cause a complete perceptual suppression of otherwise highly salient stimuli, as demonstrated by studies on inattentional blindness. A well-known example is the finding of overlooking the "gorilla-in-the-midst" (Simons and Chabris, 1999), but other studies showed that this complete recognition failure may also occur for less complex scenes and artificial stimuli, even if these were presented for prolonged times and moved through the center of gaze (Most et al., 2001). Active attention-dependent inhibition was demonstrated by Slotnick et al. (2003), reporting significant suppression of activity at locations distant to the attended object, in both striate and extrastriate visual areas. Other studies investigated the processing fate of different object features and found evidence for both, co-selection and suppression, suggesting that feature-directed attention may act through a combination of facilitatory and inhibitory mechanisms (Fanini et al., 2006; Xu, 2010). Importantly, whether an irrelevant object feature was selected or blocked depended upon task requirements or attentional load. Active inhibition was evident only if the task induced a strong response conflict, whereas it was absent otherwise (Fanini et al., 2006), or as a function of the target-feature encoding load (Xu, 2010). In addition, effective filtering of a task-irrelevant feature has been shown to increase with learning (Gál et al., 2009), underlining the dynamic nature of feature selection and feature suppression.

Task demands may also vary with stimulus characteristics. Mayer and Vuong (2012) recently showed that changes to unattended motion or color of a stimulus did not affect a subject's performance, but changes to unattended shape did. These results provide direct evidence for stimulus-inherent properties influencing the degree to which irrelevant object features of the attended object can be effectively suppressed. In turn, they also suggest that stimulus properties influence the degree to which irrelevant information is bound to the relevant information. Such spreading of attention was shown by previous behavioral (Egly et al., 1994; Richard et al., 2008) and single-cell studies (Roelfsema et al., 1998, 2004), demonstrating that unattended locations receive a processing enhancement when these were located on the same coherent object as the attended location, as compared to equally distant but unbound locations. Gestalt cues like collinearity, color similarity, and common fate similarly influence attentional spreading towards irrelevant locations (Wannig et al., 2011). These authors demonstrated increased V1 firing rates in response to a spatially unattended stimulus depending on its Gestalt similarity to a stimulus at the attended location. The results were taken as support for the concept of incremental grouping, which builds on labelling of feature-selective neurons, e.g., by enhanced activity (Roelfsema et al., 2000; Roelfsema, 2006). Accordingly, if applied to the framework presented here, task- and binding-mediated enhancement or suppression of feature-selective neurons would determine the degree to which these are labelled and thus directly influence potential co-selection of task-irrelevant object features.

# **PARALLEL, FEATURE-SPECIFIC PROCESSING ENHANCEMENT**

Due to step 3 of the model, another key-assumption is that all selected features gain a global, i.e., spatially independent processing enhancement, no matter whether they were selected by task instructions or as a result of object-specific binding dynamics. Thus, when tested on a distant object, the model not only predicts a processing benefit for the attentionally selected object-feature, but for co-selected features as well. In **Figure 6D**, each of the features A to C is tested on a different object at the unattended location 2. The model predicts that basic relations between RT distributions as observed at the attended object (cf. **Figure 6B**) should be preserved in the periphery, even though shifted to the right due to absent spatial attention (indicated by an open circle for location 2). Thus, the attended feature A receives fastest RTs and the co-selected feature B receives somewhat slower RTs, but still faster than those of the unselected feature C.

By predicting global processing enhancement of all selected features, the model necessarily implies that attention can be divided among multiple features at the same time, as also suggested by the results of the current study showing reduced RTs not only for the cued feature but also in response to the uncued, co-selected feature. This finding is in accordance with previous research indicating that parallel processing of two attended features may occur without costs in accuracy as compared to processing only one (Bonnel and Prinzmetal, 1998; Tsujimoto and Tayama, 2004). Interestingly, dual-task performance involving feature values defined in the same dimension (form, color, motion) was reported to be indistinguishable from dual-task performance involving features from different dimensions (Lee et al., 1999). The most direct proof of divided feature-directed attention has been provided by recent EEG studies using frequency-tagged, steady-state visual evoked potentials (Andersen et al., 2008, 2013). It was not only shown that attention can indeed be directed to two different features at the same time, but furthermore that facilitation of these features can be observed throughout the visual field even if task demands would favor a spatially restricted processing enhancement (Andersen et al., 2013).

Consistent with this and our own findings, another EEG study recently showed that the neuronal representation of task-irrelevant features may also be globally enhanced (Boehler et al., 2011). The authors investigated the ERP response to a distractor object, located in the hemi-field opposite to the target. Even though the object was irrelevant to the task and located outside the spatial focus of attention, its neuronal representation was modulated depending on the similarity of distinct features between distractor and target. Specifically, if the distractor contained a color that was also present in the target, the ERP response showed a characteristic modulation as compared to the situation when both objects were made of different colors. Interestingly, this irrelevant-feature effect arose about 80 ms later in time than the attentional effect at the target object, a modulation of the N2pc component (being associated with the allocation of attention and also linked to feature selection (Luck and Hillyard, 1994; Eimer, 1996; Hopf et al., 2004)). The authors interpreted this result as indicating spreading of attention towards other objects outside the spatial focus of attention, as has also been suggested previously by studies showing object-based response compatibility effects at distractor items (Chen and Cave, 2006), and an influence of categorical similarity (Kravitz and Behrmann, 2011). In the context of our model, this feature-dependent modulation of the distractor is in accordance with step 3 of the model, which represents a global, feature-based enhancement of those features that were selected during steps 1 and 2. Further experimental results in accordance with this notion come from a recent fMRI study demonstrating global enhancement of co-selected, task-irrelevant features bound to the target feature of the attended object (Lustig and Beck, 2012). 
Notably, this spreading of attention from the target to the distractor object not only occurs under conditions of covert attention, as shown by Boehler et al. (2011) and Lustig and Beck (2012), but even when objects are presented at the spatial focus of attention, as indicated by the results of the current study.

# **TOP-DOWN ADJUSTMENT**

The primary purpose of the model is to suggest a simple, unique framework to account for (1) the experimental results obtained in our experiments; and (2) experimental findings from previous studies that have been attributed to either feature-based or object-based attention. In its current version, the model does not distinguish between features of the same (red, green) or different dimensions (color, motion). There is good evidence that attending a specific feature dimension may affect processing of all features in that dimension (Found and Müller, 1996; Weidner et al., 2002; Gramann et al., 2007; Schubö and Müller, 2009; Gramann et al., 2010), thus posing the constraint that processing of task-irrelevant features may be different depending on whether these are defined in the same or a different dimension. However, such a distinction can easily be incorporated into the model by splitting the top-down gain into two factors, one concerning the feature dimension and the other concerning the specific feature attribute. If all other characteristics of task and stimuli are kept constant, the model allows for predicting the relative size of attentional effects as a function of the dimension to which the task-irrelevant object features belong.
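The proposed split of the top-down gain into a dimension factor and an attribute factor can be written down in a few lines. The multiplicative combination, the weights, and all names in the following sketch are assumptions chosen for illustration; the text above only states that the gain could be split into two factors.

```python
def channel_gain(feature, dimension_of, dimension_weight, attribute_weight):
    """Top-down gain of a feature channel as the product of two factors.

    dimension_weight applies to all features of a dimension (e.g., color);
    attribute_weight applies to the specific feature value (e.g., red).
    """
    return dimension_weight[dimension_of[feature]] * attribute_weight[feature]


# Hypothetical task: attend to the red surface of a moving object.
dimension_of = {"red": "color", "green": "color", "upward": "motion"}
dimension_weight = {"color": 1.0, "motion": 0.4}
attribute_weight = {"red": 1.0, "green": 0.5, "upward": 0.5}

gains = {
    f: channel_gain(f, dimension_of, dimension_weight, attribute_weight)
    for f in dimension_of
}
```

With these placeholder weights, a task-irrelevant feature of the attended dimension (green) receives a higher gain than a task-irrelevant feature of another dimension (upward motion), which is the kind of dimension-dependent difference in attentional effects the extended model would allow to predict.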

A possible candidate structure as the source of this task-dependent top-down signal is the prefrontal cortex (PFC), a region involved in the executive control of behavior and the current task set (Sakai, 2008). Many neurons in PFC exhibit a strong rule-dependency regarding spatial and featural decisions, and the acquisition and implementation of the current task context has been suggested to constitute a main function of PFC (Sakagami and Niki, 1994; White and Wise, 1999; Assad et al., 2000). Interestingly, in the context of the current study, the activity of a significant fraction of neurons in PFC has been demonstrated to exhibit task-dependent selectivity for both the behaviorally relevant features motion and color (Lauwereyns et al., 2001).

# **CONCLUDING REMARKS AND SUMMARY**

The current study utilized 2-dimensional cues indicating the prospective target object and target feature, and RT as a measure of behavioral performance. The significant dependencies between the information provided by the cue and the respective RT distributions were interpreted as representing feature- and object-based attentional selection, and were integrated into a 3-step model of attention acting on the early processing of visual stimuli. Yet, contrary to this assumption, differences in RT do not necessarily indicate an influence on visual processing, but might also be due to other factors, such as a task-specific response set (for discussion, cf. Taya et al., 2009). However, the strong evidence provided by several neurophysiological studies revealing the influence of both object- and feature-based attention on neuronal responses in early visual cortex (Roelfsema et al., 1998; Treue and Martínez Trujillo, 1999; McAdams and Maunsell, 2000; Wannig et al., 2007; David et al., 2008; Katzner et al., 2009; Zhou and Desimone, 2009; Wannig et al., 2011; Chen et al., 2012), and the correlation between attention-dependent modulation of neuronal activity in early visual cortex and behavioral RT of the animal (Cook and Maunsell, 2002; Womelsdorf et al., 2006b; Herrington and Assad, 2009; Galashan et al., 2013), provide strong support for relating the RT effects observed in our psychophysical experiments to modulations during early visual processing. The RT measurements and their strong dependence on the feature- and object-specific cueing condition suggest the co-existence of attention-dependent effects commonly attributed to different frameworks of attention.
Our model provides a new conceptual framework into which existing theories of neuronal implementations of attention may be incorporated, such as the feature-similarity gain model (Treue and Martínez Trujillo, 1999; Martínez Trujillo and Treue, 2004) or the incremental grouping hypothesis (Roelfsema et al., 2000; Roelfsema, 2006). By assuming a top-down gain adjustment, task- and object-specific binding dynamics, and a global feature-specific response modulation, the model not only explains our own experimental results within a single, coherent framework, but also allows for the unification of a vast amount of experimental data that were usually taken as support for either object- or feature-based attention. Future research testing predictions of the model regarding the influence of task demands and object-specific binding dynamics on the proposed global nature of processing modulation will reveal benefits and limits of the model, and new insights into the complex interdependencies of various attention- and task-dependent mechanisms.

# **AUTHOR CONTRIBUTIONS**

Detlef Wegener, Maike Kathrin Aurich, Fingal Orlando Galashan, and Andreas Kurt Kreiter designed research, Fingal Orlando Galashan contributed stimulation programs, Detlef Wegener and Maike Kathrin Aurich acquired data, Detlef Wegener analyzed data, and Detlef Wegener wrote the paper.

# **REFERENCES**


attention? *FENS Forum Abstr.* 5, 139.27. http://fens2010.neurosciences.asso.fr/ abstracts/r5/a139\_27.html


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 March 2014; accepted: 23 May 2014; published online: 10 June 2014*.

*Citation: Wegener D, Galashan FO, Aurich MK and Kreiter AK (2014) Attentional spreading to task-irrelevant object features: experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation. Front. Hum. Neurosci. 8:414. doi: 10.3389/fnhum.2014.00414*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Wegener, Galashan, Aurich and Kreiter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Visual evoked potentials to change in coloration of a moving bar

# *Carolina Murd<sup>1,2,3</sup>\*, Kairi Kreegipuu<sup>1</sup>, Nele Kuldkepp<sup>1,2</sup>, Aire Raidvee<sup>1</sup>, Maria Tamm<sup>1,2</sup> and Jüri Allik<sup>1,4</sup>*

<sup>1</sup> Institute of Psychology, University of Tartu, Tartu, Estonia

<sup>2</sup> Doctoral School of Behavioural, Social and Health Sciences, University of Tartu, Tartu, Estonia

<sup>3</sup> Institute of Public Law, University of Tartu, Tallinn, Estonia

<sup>4</sup> Estonian Academy of Sciences, Estonia

#### *Edited by:*

José Antonio Díaz, Universidad de Granada, Spain

#### *Reviewed by:*

Dirk Kerzel, Université de Genève, Switzerland; Michael Anthony Crognale, University of Nevada, Reno, USA

#### *\*Correspondence:*

Carolina Murd, Institute of Psychology, University of Tartu, Näituse 2, Tartu 50409, Estonia. Email: carolina.murd@ut.ee

In our previous study we found that it takes less time to detect a coloration change in a moving object than in a stationary one (Kreegipuu et al., 2006). Here, we replicated the experiment, but in addition to reaction times (RTs) we measured visual evoked potentials (VEPs), to see whether this effect of motion is revealed at the cortical level of information processing. We asked our subjects to detect changes in coloration of stationary (0°/s) and moving bars (4.4 and 17.6°/s). The psychophysical results replicate the findings of the previous study, showing that RTs to coloration changes decrease as the velocity of the color-changing stimulus increases. The effect of velocity on VEPs was opposite to the one found on RTs. Except for component N1, the amplitudes of VEPs elicited by the coloration change of faster moving objects were smaller than those elicited by the coloration change of slower moving or stationary objects. The only significant effect of velocity on peak latency was found for P2 in the frontal region. The results are discussed in the light of the change-to-change interval and of the two methods reflecting different processing mechanisms.

**Keywords: motion, velocity, color change, reaction time, visual evoked potentials**

# **INTRODUCTION**

The perception of motion is one of the evolutionarily oldest abilities of the visual system. As it enables us to cope with a dynamic environment, it seems reasonable to assume that the presence of motion information is not easily ignored even when attending to another quality of an object, like its form or color.

Researchers have identified at least two distinct functional subsystems, one of which processes color (parvocellular pathway) and the other motion (magnocellular pathway). The subpopulations of these pathways are evident in the retina, projecting through the LGN to V1 (Hubel and Wiesel, 1972; Livingstone and Hubel, 1988). From V1 the information is transmitted through ventral and dorsal streams (Goodale and Milner, 1992). The dorsal stream (also referred to as the "where"/"how" pathway) gets its input mostly from the magnocellular pathway and projects to the posterior parietal lobe. The dorsal stream has been most commonly associated with awareness of object location and guidance of action. The ventral stream (the "what" pathway) gets both magno- and parvocellular input and projects to the temporal lobe. This stream has been associated with attention, object recognition and identification. The dorsal stream has been considered to be relatively faster than the ventral stream (Norman, 2002), but it has also been suggested that these two streams are highly interactive (Dobkins and Albright, 1993; Cicerone et al., 1995). These two distinct subsystems are additional evidence of the evolutionary pressure for development of a system specialized for early detection of motion.

The aforementioned visual streams involve specialized areas in the cortex that are activated when processing color ("globs" in V4 and adjacent areas, see Conway et al., 2007) and motion (MT/V5, Zeki, 1974). V5 has been shown to react to luminance changes of an object, but it is not activated by isoluminant, heterochromatic stimuli (Conway et al., 2007). Although sensitive to luminance contrast, the magnocellular layers in the LGN have not been demonstrated to be color selective. The processing of motion information has been believed to be rather unaffected by color (at some stages of processing); however, it has been suggested that some magnocellular neurons respond to chromatic contrast, but without concrete information about its sign (Dobkins and Albright, 1993). The color processing mechanisms at different stages get their input from both magno- and parvocellular pathways (e.g., double-opponent cells and thin stripes in V2; Gegenfurtner, 2003; Shapley and Hawken, 2011). Taken together, it is clear that parvo- and magnocellular subsystems interact with each other (Dobkins and Albright, 1993; Cicerone et al., 1995; for a review see Skottun, 2013), and therefore the characteristics of one quality can influence the perception of the other (Moller and Hurlbert, 1997; Kreegipuu et al., 2006; Werner, 2007).

According to the different-latencies theory, stimulus qualities (like color, luminance, shape, motion) have different processing latencies, and color is processed 70–80 ms earlier than motion (Moutoussis and Zeki, 1997). However, by now many studies have indicated that the visual delays for different visual attributes are neither fixed nor identical, but rather depend on different stimulus characteristics, as well as on the experimental setup (Allik and Kreegipuu, 1998; Gauch and Kerzel, 2008).

Kreegipuu et al. (2006) conducted a simple reaction time (RT) study where subjects were asked to detect the color or luminance change of moving or stationary stimuli. The results showed shorter RTs to color or luminance change for faster moving stimuli compared to more slowly moving or stationary stimuli. However, this unexpected discovery, that it takes more time to notice a change in the color of a stationary object than of the same object put in motion, was not generalizable to all types of motion. We observed shorter detection times only with a single moving object, not with moving gratings covering an extended portion of the visual field (Murd et al., 2009). It seems that an identifiable object traveling along a solitary trajectory is critical for improved ability to detect change in coloration.

"fnhum-08-00019" — 2014/1/23 — 9:56 — page 1 — #1

There is agreement among researchers that Reichardt-type motion energy detectors are the main building blocks of many motion analysing mechanisms (Reichardt, 1961; Poggio and Reichardt, 1973; Van Santen and Sperling, 1985). However, besides motion energy, motion can be recovered based on some higher-order perceptual attributes. For example, according to one conceptualization it is possible to distinguish at least three motion detection systems: a first-order system that uses a primitive motion energy computation to extract motion from moving luminance modulations; a second-order system that uses motion energy to extract motion from moving texture-contrast modulations; and a third-order system that tracks features (Van Santen and Sperling, 1985). It seems that the observed pattern, with the effect of velocity appearing only with single moving objects (Kreegipuu et al., 2006) but not with large moving gratings (Murd et al., 2009), fits nicely into this theoretical scheme. The question remains whether this advantage of a single moving stimulus, when compared to a stationary coloration-changing stimulus, appears already at the cortical level of information processing. One approach to address this question is to measure the brain's electrical activity by electroencephalography (EEG) and compare the transient visual evoked potentials (VEPs) of the coloration change between different stimulus conditions (stationary, slow, and fast moving stimuli). This would enable us to see whether the stimulus condition affects the evoked potentials of coloration change, causing amplitude and/or latency differences in some components, such as N1, P2, N2, and P3.

Based on the literature on event-related potentials (ERPs; Fonaryova Key et al., 2005; Luck, 2005; McKeefry, 2001), there are some results indicating we might find a difference in VEPs between color-change events in stationary versus moving stimuli. For example, McKeefry (2001) found that amplitudes of positive components P1 and P2 and negative component N2 for the motion onset of chromatic stimuli were smaller for slow-moving than for fast-moving stimuli. Since this tendency was not present when motion onset of luminance stimuli for two velocities was compared, it was concluded that this effect of velocity found for the onset of chromatic stimuli might indicate shifting between two separate mechanisms, parvocellular and magnocellular. According to this theory, the parvocellular mechanism is active with slow moving chromatic stimuli and the magnocellular mechanism with fast moving chromatic stimuli. Therefore, when comparing VEPs of color change in fast and slow moving stimuli, we might find reduced amplitudes for the slower moving stimulus.

It has also been suggested that the visual N1 reflects the discrimination process within the focus of attention (Vogel and Luck, 2000). Some studies of selective attention and cueing have shown that N1 amplitude to attended (and validly cued) stimuli is larger (more negative; Luck et al., 1994). Beer and Röder (2004) have suggested that attention to motion enhances processing of visual stimuli, since N1 amplitudes for stimuli moving in the attended direction were more negative compared to stimuli moving in the unattended direction.

As the task in our previous study (Kreegipuu et al., 2006) required a quick response, it presumed directing attention to the stimulus. Since the characteristics of a moving stimulus enable both spatial and temporal predictions about the event, there might be somewhat different expectations about the coloration change of a moving stimulus compared to the stationary stimulus. Taking into account the previous findings, there seems to be enough reason to expect that this advantage of a moving stimulus will be seen at the cortical level of information processing.

# **MATERIALS AND METHODS**

# **PARTICIPANTS**

Seven participants (six females and one male, aged 20–25) took part in this experiment. One of the subjects was well-trained; the other six were naïve concerning the specific purposes of this study. Participants were informed about the general purpose of the experiment (comparison of the data gathered by using psychophysical and electrophysiological methods) and given an overview of the equipment used in the experiment. Participants were also informed about their right to quit the experiment any time they wished, and gave their informed consent. All participants self-reported normal or corrected-to-normal vision and no deficits in color perception.

# **STIMULI**

A rectangular bar with a luminance profile corresponding to the positive half-cycle of a sine wave (1.2 × 2.3° at 90 cm viewing distance) was presented as a stimulus on the screen of a Mitsubishi Diamond Pro 2070SB monitor (frame rate 140 Hz; 752 × 564 pixels; 27.6 × 20.5° at 90 cm viewing distance). The bar was either red (CIE chromaticity coordinates: 0.636; 0.335) or green (CIE chromaticity coordinates: 0.289; 0.607) with a luminance of 13.85 cd/m<sup>2</sup>, measured at the peak of the positive phase of the sinusoidal luminance profile. The neutral uniform background of the screen had a luminance of 0.3 cd/m<sup>2</sup>. A white fixation point (8 × 8 min of arc) was present on the screen for the entire trial. The stimulus was rendered with a Cambridge *ViSaGe* visual stimulus generator (Cambridge Research Systems Ltd., Rochester, UK). Because the red and green colors were photometrically isoluminant but subjective isoluminance was not measured (and the colors were therefore not corrected on this basis), the color change might have been subjectively accompanied by small luminance artifacts. To be more precise, we therefore use the term "coloration change", understood as an arrangement of color and tones.

# **PROCEDURE**

Each trial started with the appearance of a moving or stationary test stimulus. The moving stimulus appeared at the left or right edge of the screen and started to move horizontally across the screen with a velocity of 4.4 or 17.6°/s.

**Figure 1** demonstrates the experimental setup. In each trial, the coloration change (from red to green or vice versa) took place at one of ten possible switch points in the middle third of the screen (equally spaced positions: 9.2°; 10.22°; 11.24°; 12.26°; 13.28°; 14.3°; 15.32°; 16.34°; 17.36°; 18.38° from the starting edge). The stationary stimulus (from here on also referred to as velocity 0°/s) appeared randomly in one of these ten positions and changed its coloration unpredictably in a time window of 476–3547 ms after its appearance (which on average corresponds to the coloration change of a stimulus moving with a velocity of 10°/s). The time windows for the coloration change of moving stimuli were 480–885 ms after appearance for the faster moving stimulus and 1929–3547 ms for the slower moving stimulus.

Subjects were instructed to press a response button as quickly as possible after detecting a change in coloration. RTs were saved for offline analyses. Each observer performed two blocks of 150 trials, 300 trials in total, with 100 trials per velocity condition (0, 4.4, and 17.6°/s). The order of trials with different velocities was pseudorandomized within the experimental block, and there was a pause of 3 s (inter-stimulus interval) before the beginning of each trial. When a response was not given, the missed trial was repeated at a random position in the experimental block.
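As a quick consistency check on the design described above, the switch points and trial counts can be reproduced in a few lines of Python. This is only an illustrative sketch: all variable names are hypothetical and are not taken from the original stimulation program, and the observation that the first switch point coincides with one third of the screen width is simply read off the reported numbers.

```python
# Illustrative sketch only; names are hypothetical, values are taken from the text.
SCREEN_WIDTH_DEG = 27.6   # horizontal extent of the screen at 90 cm
N_SWITCH_POINTS = 10
STEP_DEG = 1.02           # spacing between neighbouring switch points

# The middle third of the screen starts one third of the way across.
first_point = SCREEN_WIDTH_DEG / 3
switch_points = [round(first_point + k * STEP_DEG, 2) for k in range(N_SWITCH_POINTS)]

VELOCITIES_DEG_PER_S = (0.0, 4.4, 17.6)
BLOCKS = 2
TRIALS_PER_BLOCK = 150
N_PARTICIPANTS = 7

trials_per_observer = BLOCKS * TRIALS_PER_BLOCK
trials_per_velocity = trials_per_observer // len(VELOCITIES_DEG_PER_S)
total_responses = trials_per_observer * N_PARTICIPANTS

print(switch_points)
print(trials_per_velocity, total_responses)
```

The list reproduces the ten positions from 9.2° to 18.38°, and the totals agree with the 100 trials per velocity condition stated above.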

#### **ELECTROENCEPHALOGRAPHY**

The electroencephalogram (EEG) was registered with BioSemi's system Active One (*BioSemi*, Amsterdam, The Netherlands), and Vision Analyzer 1.05 (Brain Products, GmbH, Munich, Germany) was used for offline data analysis. Fourteen active electrodes (Fz, Fpz, F3, F4, P3, P4, C3, C4, Cz, Pz, T5, T6, O1, O2) were placed according to the international 10/20 system (Jasper, 1958) and referenced offline to the ears. Additionally, the Common Mode Sense (CMS) active electrode was placed between Fz and Cz and the Driven Right Leg (DRL) passive electrode on the observer's neck. Vertical and horizontal eye movements were each registered with two bipolar electrodes. DC mode and a sampling rate of 1024 Hz were used for online recording. Data were offline filtered (0.3 Hz low cut-off and 35 Hz high cut-off filters, both 24 dB/oct) and epoched around the coloration change event (−100 to +500 ms). Ocular artefacts were removed with the built-in Gratton and Coles algorithm (Gratton et al., 1983) used by Vision Analyzer, which corrects ocular artefacts by subtracting the voltages of the eye channels, multiplied by a channel-dependent correction factor, from the respective EEG channels.
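The subtraction step described above can be illustrated with a deliberately simplified sketch. It estimates one correction factor per channel as a least-squares slope and subtracts the scaled eye channel. This is not the Vision Analyzer implementation, and the published procedure contains further steps that are omitted here; the function names and the toy data are hypothetical.

```python
# Simplified sketch of regression-based ocular correction; NOT the
# Vision Analyzer implementation. Function names are hypothetical.

def propagation_factor(eeg, eog):
    """Least-squares slope of EEG on EOG: cov(eeg, eog) / var(eog)."""
    n = len(eeg)
    mean_eeg = sum(eeg) / n
    mean_eog = sum(eog) / n
    cov = sum((x - mean_eeg) * (y - mean_eog) for x, y in zip(eeg, eog))
    var = sum((y - mean_eog) ** 2 for y in eog)
    return cov / var

def correct_channel(eeg, eog):
    """Subtract the eye-channel voltage, scaled by the channel's factor."""
    b = propagation_factor(eeg, eog)
    return [x - b * y for x, y in zip(eeg, eog)]

# Toy data (microvolts): a "brain" signal contaminated by 20% of the EOG voltage.
eog = [-100.0, -50.0, 50.0, 100.0]
brain = [1.0, -1.0, -1.0, 1.0]          # uncorrelated with the EOG by construction
eeg = [s + 0.2 * e for s, e in zip(brain, eog)]

cleaned = correct_channel(eeg, eog)
print(propagation_factor(eeg, eog))
print(cleaned)
```

In this toy case the estimated factor recovers the 20% contamination and the cleaned channel matches the underlying signal, because the toy signal was constructed to be uncorrelated with the eye channel; real data would not separate this cleanly.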

A 100 ms interval before the coloration change was selected for baseline correction, and segments were tested for several known artefacts (maximal allowed voltage step of 50 μV per sampling point, maximal allowed difference within the segment of 100 μV, maximal absolute amplitude of ±70 μV, and lowest activity criterion of 0.5 μV per 100 ms). Segments were averaged for different velocities and observers. Automatic peak detection (separate search for every channel) for local maximum/minimum was used to find ERP component peaks for N1 (50–130 ms), P2 (130–170 ms), N2 (150–270 ms) and P3 (230–500 ms). Time intervals for peak detection were set based on the grand average data and visually inspected to be suitable for all subjects. Since the visual inspection did not reveal any overlapping contrapolar peaks, the electrodes were pooled as follows: frontal (Fz, Fpz, F3, F4), parietal (P3, P4, Pz), central (C3, C4, Cz), temporal (T5, T6), occipital (O1, O2).
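The segment-level steps above (baseline correction, artefact criteria, and windowed peak search) can be sketched as follows. The thresholds, sampling rate, and latency windows are those quoted in the text; everything else is an assumption made for illustration. In particular, the helper names are hypothetical, the low-activity criterion is read here as a minimum voltage range within consecutive stretches of about 100 ms, and the peak search simply takes the extreme value inside each window instead of a true local-extremum search.

```python
# Hypothetical helper names; thresholds and windows are those quoted in the text.
SFREQ = 1024.0            # sampling rate in Hz
EPOCH_START_MS = -100.0   # epoch onset relative to the coloration change

def ms_to_index(t_ms):
    """Sample index of a latency given in ms relative to the coloration change."""
    return int(round((t_ms - EPOCH_START_MS) * SFREQ / 1000.0))

def baseline_correct(segment):
    """Subtract the mean of the 100 ms pre-change interval."""
    n_base = ms_to_index(0.0)
    base = sum(segment[:n_base]) / n_base
    return [v - base for v in segment]

def is_clean(segment):
    """Apply the amplitude criteria (in microvolts) to one segment."""
    max_step = max(abs(b - a) for a, b in zip(segment, segment[1:]))
    if max_step > 50.0:                        # voltage step per sampling point
        return False
    if max(segment) - min(segment) > 100.0:    # difference within the segment
        return False
    if max(abs(v) for v in segment) > 70.0:    # absolute amplitude
        return False
    # Assumed reading of the low-activity criterion: every stretch of about
    # 100 ms must span a voltage range of at least 0.5 microvolts.
    win = ms_to_index(0.0)
    for start in range(0, len(segment) - win + 1, win):
        chunk = segment[start:start + win]
        if max(chunk) - min(chunk) < 0.5:
            return False
    return True

# Component -> (window start in ms, window end in ms, polarity selector)
PEAK_WINDOWS_MS = {"N1": (50, 130, min), "P2": (130, 170, max),
                   "N2": (150, 270, min), "P3": (230, 500, max)}

def find_peak(average, component):
    """Return (amplitude, latency in ms) of the extreme value in the window."""
    lo_ms, hi_ms, pick = PEAK_WINDOWS_MS[component]
    lo = ms_to_index(lo_ms)
    hi = min(ms_to_index(hi_ms), len(average) - 1)
    idx = pick(range(lo, hi + 1), key=lambda i: average[i])
    latency_ms = idx * 1000.0 / SFREQ + EPOCH_START_MS
    return average[idx], latency_ms
```

A typical use would be to keep only the segments for which `is_clean(baseline_correct(segment))` holds, average them per channel and velocity, and then call `find_peak` on each average.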

Repeated measures analysis of variance (ANOVA; Statistica 10.0, StatSoft Inc., Tulsa, OK, USA) was used for analysis of both RTs and VEPs.

# **RESULTS**

#### **REACTION TIMES**

**Figure 2** shows the averaged RTs at each of the 10 possible coloration-switch points for the three velocities of the moving bar: 0 (stationary), 4.4, and 17.6°/s. RTs over 1000 ms and below 100 ms were excluded from the analysis. Over all subjects, there were 16 misses (RT > 1000 ms) and 146 anticipated responses (RT < 100 ms) out of 2100 responses.
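The exclusion rule and the resulting proportion of discarded responses can be written out explicitly. The sketch below uses only the cut-offs and counts reported above; the function and variable names are hypothetical.

```python
# Cut-offs and counts are those reported in the text; names are hypothetical.
RT_MIN_MS = 100.0    # faster responses are treated as anticipations
RT_MAX_MS = 1000.0   # slower responses are treated as misses

def classify_rt(rt_ms):
    """Label a single reaction time according to the exclusion rule."""
    if rt_ms < RT_MIN_MS:
        return "anticipation"
    if rt_ms > RT_MAX_MS:
        return "miss"
    return "valid"

n_total, n_miss, n_anticipated = 2100, 16, 146
n_valid = n_total - n_miss - n_anticipated
excluded_pct = 100.0 * (n_miss + n_anticipated) / n_total

print(n_valid, round(excluded_pct, 1))  # 1938 7.7
```

In other words, about 7.7% of all responses were excluded, most of them as anticipations.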

Since no effect of direction (stimulus moving from right to left or vice versa) was detected on the RTs [*F*(1,3) = 3.141, *p* < 0.1745], we omitted this parameter from further analysis.

**FIGURE 2 | Mean RTs as a function of spatial position of the color change along the movement trajectory.** Vertical bars denote ± standard error.

**Figure 2** reveals two conspicuous properties. First, it seems to take less time to notice a coloration change that happens during the later portion of the movement trajectory [*F*(9,54) = 3.39, *p* < 0.002]. As can be seen from **Figure 2**, mean RTs were shorter for coloration changes occurring in the last positions (correlation between RT and switch-point: *r* = −0.056, *p* < 0.01). Second, it took considerably less time to notice the coloration change of a fast moving (17.6°/s) bar than the coloration change of the same bar moving slowly (4.4°/s) or standing in the same position [*F*(2,12) = 71.52, *p* < 0.00001]. Thus, mean RTs to the coloration change of the faster moving stimulus were shorter than those for the slower moving or stationary stimulus. There was also a marginal interaction between velocity and switch-point position [*F*(18,108) = 1.7, *p* < 0.051], which suggests that the order of RTs at different positions is not identical across velocities.

#### **VISUAL EVOKED POTENTIALS**

**Figure 3** shows the grand average potentials in the parietal region, where the components were most pronounced. The figure presents data pooled over the seven participants for the three velocities. Like manual RTs, VEPs elicited by the coloration change of the fast moving stimulus (17.6°/s) differ in both amplitude and delay from those elicited by the coloration change of the slow moving and stationary stimuli. Repeated measures ANOVA was conducted on mean peak amplitudes of pooled regions of interest (listed at the end of the Method section). A significant effect of velocity on N1 amplitude was found in the frontal [*F*(2,12) = 4.464, *p* < 0.036] and central regions [*F*(2,12) = 4.501, *p* < 0.035]. This effect demonstrates a difference between N1 amplitudes for the coloration change of slower and faster moving stimuli, with larger amplitudes for faster moving stimuli. Although five out of seven participants also showed a similar tendency in the parietal region, the overall effect remained non-significant [*F*(2,12) = 2.382, *p* < 0.135]. A significant effect of velocity on P2 amplitude was found in the frontal [*F*(2,12) = 8.41, *p* < 0.0053], central [*F*(2,12) = 12.92, *p* < 0.0011] and parietal regions [*F*(2,12) = 19.775, *p* < 0.0002], showing less pronounced amplitudes for faster than for slowly moving stimuli.

A significant effect of velocity on N2 amplitude was found in the frontal [*F*(2,12) = 8.41, *p* < 0.0052] and central regions [*F*(2,12) = 12.92, *p* < 0.0011], showing larger N2 with slower moving stimuli. A significant effect of velocity was also found on P3 amplitude in the central [*F*(2,12) = 5.068, *p* < 0.0254] and parietal regions [*F*(2,12) = 10.814, *p* < 0.0021], showing stronger P3 amplitudes for the coloration change of slower moving and stationary stimuli.

The only significant effect of velocity on latency of peaks was found for P2 in frontal region [*F*(2,12) = 6.359, *p* < 0.014], so that the peak was earliest for the coloration change of the stationary stimulus.

Surprisingly, as is shown in **Figure 3** and by the statistics presented, the amplitudes of the P2, N2, and P3 components were reduced for the coloration change of the faster moving stimulus. In frontal and central regions, we did find the amplitude of component N1 to be significantly larger (i.e., more negative) for the coloration change of the faster moving stimulus, but the N1 amplitudes for the slower moving and stationary stimuli did not differ significantly.

However, the amplitudes of P2 and P3 seem to be lined up according to the average of the time windows of coloration change. As described in the Method section, the stationary stimulus changed its coloration 476–3547 ms after the beginning of the trial (corresponding on average to the coloration change of a bar moving with a velocity of 10°/s), the faster moving stimulus 480–885 ms, and the slower moving stimulus 1929–3547 ms.

We also analyzed the VEPs by the switch-points of coloration change (see **Figure 4**), and noticed that with the faster moving stimulus the amplitude of P3 increased with later switch-points, but this trend was not present with slower moving stimuli. In **Figure 5**, P3 amplitudes for the merged coloration-change switch-points (two earliest versus two latest on the motion trajectory) are presented.

#### **CHANGE-TO-CHANGE INTERVAL ANALYSIS**

Some previous studies (Gonsalvez et al., 2007; Gonsalvez and Polich, 2002) have found that the previous-target-to-next-target interval (TTI) has an effect on P3 amplitude: the amplitude is larger when the TTI is longer. In our experiment, conditions were presented in random order (not in blocks of velocity) and the time between the coloration change in one trial and the next trial varied. Therefore, it was interesting to test whether our P3 amplitudes at parietal electrodes (where P3 was most pronounced) show a TTI effect, which in our case is a coloration-change-to-coloration-change effect. This interval is the sum of (a) the time from one coloration change until the end of the present trial, (b) the time between trials (which was 3 s in our experiment) and (c) the time from the beginning of the next trial until the coloration change of this trial. For analysis we divided change-to-change intervals into two groups: intervals longer than the median and intervals shorter than the median. The individual medians of the change-to-change interval varied between 6.7 and 7.1 seconds (as a result of the randomly varied time window of the coloration change of the stationary stimulus). The comparison was made between these two groups for P3 amplitude in the pooled parietal region. The results were as follows: dependent samples *t*-test *t* = 3.63 (df = 6; *p* = 0.011), Cohen's *d* = 1.37, showing that trials with a longer than median change-to-change interval had considerably larger P3 amplitude than trials with a shorter than median interval (see **Figure 6**). It appears that the next VEP elicited by the change of coloration was of higher amplitude when more time had passed since the coloration change in the previous trial. These results are consistent with the observation of Gonsalvez and Polich (2002) that TTI is a critical variable in the P3 response.
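The interval definition, the median split, and the paired comparison described above can be sketched as follows. The sketch is a simplification under stated assumptions: it averages hypothetical single-trial amplitudes per participant, whereas the analysis in the text was based on averaged segments; trials falling exactly on the median are dropped; and all function names are hypothetical. The last line checks that the reported effect size agrees with the reported test statistic under the common convention d = t / √n for paired samples.

```python
import math
import statistics

INTER_TRIAL_S = 3.0  # pause between trials, as stated in the text

def change_to_change_interval(after_change_s, before_change_s):
    """(a) time left in the previous trial + (b) pause + (c) time to the next change."""
    return after_change_s + INTER_TRIAL_S + before_change_s

def median_split_means(intervals_s, amplitudes_uv):
    """Mean amplitude for trials above vs. below one participant's median interval."""
    med = statistics.median(intervals_s)
    longer = [a for t, a in zip(intervals_s, amplitudes_uv) if t > med]
    shorter = [a for t, a in zip(intervals_s, amplitudes_uv) if t < med]
    return statistics.mean(longer), statistics.mean(shorter)

def paired_t_and_d(long_means, short_means):
    """Dependent-samples t, degrees of freedom, and Cohen's d on paired differences."""
    diffs = [l - s for l, s in zip(long_means, short_means)]
    n = len(diffs)
    d = statistics.mean(diffs) / statistics.stdev(diffs)
    return d * math.sqrt(n), n - 1, d

# Consistency check on the reported values: t = 3.63 with n = 7 participants.
print(round(3.63 / math.sqrt(7), 2))  # 1.37
```

The check reproduces the reported Cohen's *d* of 1.37, which suggests that the effect size was computed on the paired differences.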

Mean RTs, divided into two groups by the same principle as for VEPs, did not show a statistically significant effect of TTI: dependent samples *t*-test *t* = 2.405 (df = 6; *p* = 0.053).

RT and TTI were correlated separately for each velocity condition (0°/s, 4.4°/s, 17.6°/s). The correlations were non-significant for the stationary stimulus (0°/s; *r* = −0.04, *p* = 0.344) and the faster moving stimulus (17.6°/s; *r* = −0.075, *p* = 0.061), but significant for the slower moving stimulus (4.4°/s; *r* = −0.13, *p* = 0.001). Again, the response was attenuated for the faster moving stimulus.

When analysing only the trials with a change-to-change interval covered by all velocities (5488 to 7617 ms), the effect of velocity on mean RTs was still significant [*F*(2,12) = 58.68, *p* < 0.00001], which means that the main effect of velocity on RTs is independent of the change-to-change interval.

#### **DISCUSSION**

The behavioral results of our experiment were in good agreement with our previous study (see Figure 2 in Kreegipuu et al., 2006), showing that the faster the moving stimulus is, the shorter is the time required to detect an instant change in its coloration. For some reason, it takes less time to notice the change in coloration of a relatively fast moving object than the coloration change that happens to the same object if it moves more slowly or stays at the same place. Like RTs, VEPs elicited by coloration change seem to be able to distinguish between objects that remain stationary or move with different velocities. However, on average, evoked potentials to the coloration change of the fast moving object were smaller and their maximal amplitude was reached with a longer delay when compared to evoked potentials to the coloration change of slow moving or stationary objects. Thus, faster responses were accompanied by smaller, not larger, VEP amplitudes. For example, VEPs elicited by the coloration change of the fast moving (17.6°/s) bar had smaller amplitude of P2 and N2 peaks and longer latency of the P2 peak than the peaks elicited by the coloration change of slowly moving (4.4°/s) or stationary (0°/s) bars.

There are many studies showing reasonable agreement between psychophysical and electrophysiological results (Wolf et al., 1988; Donchin and Lindsley, 1966; Kreegipuu and Allik, 2007). For example, there was a considerable homology between the temporal structure of RTs and VEP intervals when the task was to detect onset or offset of motion (Kreegipuu and Allik, 2007). Both manual reactions and VEPs increase in latency as the velocity of the onset or offset motion decreases and are well approximated by the same negative power function with the exponent close to −2/3 (Dzhafarov et al., 1993; Kreegipuu and Allik, 2007). It is important to remember that in our current study velocity was not a critical attribute to attend. Participants were instructed to ignore motion and react, as fast as possible, to the first noticeable change in coloration of a uniformly moving or stationary bar. In principle, it was expected that the velocity of the test object has only a minor effect on the ability to notice a sudden change in coloration. Nevertheless, we observed that the velocity of the test object exerted a considerable effect on both RTs and VEPs. According to manual RTs, it took less time to notice the coloration change of a fast moving object, but according to VEPs, this change elicited smaller deflections from the base level, which were also delayed in time.

One mechanism that could cause the reduction of VEP amplitude at relatively high velocities is lateral or temporal masking (Sperling, 1965). When an object moves rapidly, the place where the coloration change happened will be flanked by a nearby place that the moving object reaches a few moments later. The VEP signal generated by the stimulus activity in this new place may interfere with the signal elicited by the stimulus in the previous position. Since these two similar signals are out of phase, their summed activity is expected to be reduced in amplitude compared to their amplitudes in isolation. Unfortunately, our data are too fragmentary to tell exactly from which velocity this potential mechanism could become effective. At present we can only guess that this critical velocity must be somewhere between 5 and 17°/s.

Whatever the cause of the VEP amplitude suppression at higher velocities, the discrepancy between manual RTs and evoked potentials is puzzling. There is nothing new in the finding that RT data sometimes disagree with VEP results. Although many studies have shown good agreement between evoked potentials and psychophysical data, quite a few studies have shown a discrepancy between these two measures (Crognale et al., 1997; McKerral et al., 2001; Chakor et al., 2005). Some of these disagreements could be caused by the specialized input of the magno- and parvocellular pathways to the ventral and dorsal streams, in particular by the fact that the dorsal stream, which is presumably specialized for action, receives mostly magnocellular input.

One of the reviewers drew our attention to the fact that the subjective isoluminance of colors may not agree with photometric isoluminance and may vary with retinal eccentricity. It is possible that the chromatic change was accompanied by small luminance artifacts (as mentioned in the Method section). We have also shown in our previous study (Kreegipuu et al., 2006) that the effect of velocity on RTs that we have repeatedly found for color changes is also found for luminance changes. However, in this achromatic change condition the luminance changed from 5.09 to 20.2 cd/m<sup>2</sup> (or vice versa). This is a considerable luminance change, and it is unlikely that the possible luminance artifacts accompanying the chromatic change would alone be responsible for identical results. It has also been shown that even in the presence of low values of luminance contrast, chromatic information is highly relevant for detecting a stimulus (O'Donell et al., 2010).

Several studies have demonstrated that the color aberration and isoluminance value related to retinal eccentricity vary depending on the target extent and spatial frequency (Bilodeau and Faubert, 1997; Barboni et al., 2013). However, Bilodeau and Faubert (1997) showed that when they manipulated the spatial frequency and size of the target, the isoluminance values within the central 20 degrees did not change. Psychophysical data [which have been considered more sensitive to luminance changes than electrophysiological measurements (e.g., Rabin et al., 1994)] from our previous study (Murd et al., 2009) indicate that the chromatic aberration and/or luminance modulations related to retinal eccentricity do not explain the effect of velocity found on RTs when changes in coloration were detected. We found no difference in the effect of velocity on response times whether subjects were asked to keep central fixation or to follow the stimulus with their gaze (i.e., the location of the target on the retina did not change). Both conditions showed a similar significant effect of velocity on response times, and this effect was present for all subjects (Murd et al., 2009).

It has been suggested that some magnocellular neurons signal temporal alternation between lights of equal luminance without signaling the sign of the chromatic contrast (Dobkins and Albright, 1993; Baker et al., 1998). In our display, motion was defined both chromatically and achromatically (as there was a luminance difference between the background and the stimulus), and as the colors (red and green) were not presented simultaneously, it is hard to tell whether the transient color change could have been mediated by this mechanism for detecting unsigned chromatic contrast. But if we consider this a possibility and take into account the finding that the sensitivity of VEPs to parvo- and magnocellular input differs (Tobimatsu et al., 1995; Foxe et al., 2008), so that VEPs are more pronounced for parvocellular input and might not always adequately reflect magnocellular input (see Foxe et al., 2008), this would explain why simple RTs to the color change are more influenced by the object's velocity than VEPs are.

Also, Di Russo and Spinelli (1999) showed, in their study on the effect of spatial attention on chromatic and luminance stimuli, that VEPs did not reveal any latency differences between attended and unattended conditions when chromatic stimuli were used. They suggested that spatial attention is mainly controlled by visual areas considered to be part of the dorsal stream. Therefore, in the light of the abovementioned studies, the discrepancy between RT and VEP results might be explained by findings that these two measures reflect information processing in different streams (for similar results see also Highsmith and Crognale, 2010).

However, there is a considerable amount of criticism regarding the extent of the independence of the dorsal (action) and ventral (perception) systems and regarding whether the specialization is relative rather than absolute (see the discussion paper by Schenk and McIntosh, 2010; also Himmelbach et al., 2012). Sperandio et al. (2010) demonstrated in visual illusion experiments that simple RTs, unlike other types of motor behavior (grasping), are affected by the illusion, although it has been presumed that the dorsal stream is not sensitive to illusions. Their results showed that RT varied as a function of perceived (rather than physical) stimulus properties. Therefore, simple RT is likely to be an outcome of interconnection with the ventral stream.

In general, this may mean that recorded VEP signatures are reflecting some neurophysiological mechanisms that are not identical to mechanisms which form the basis for manual RTs. Thus, manual reaction is elicited in this particular case by an internal representation which is not explicitly manifested in the recorded VEP signatures.

It is very unlikely that the change-to-change interval has anything to do with the suppression of the VEP amplitude at higher velocities. However, the influence of the target-to-target interval (TTI) on the amplitude of P3 has been demonstrated in some previous studies with both auditory and visual stimuli (Gonsalvez et al., 1999, 2007; Gonsalvez and Polich, 2002). Gonsalvez and Polich (2002) tested TTIs up to 16 seconds and found that when the TTI was relatively long, the P3 amplitudes remained constant, indicating that the increase of P3 amplitude with shorter TTIs might be explained by resource limitation or limitations on memory-updating operations. Since we conducted a simple single-task experiment (requiring no comparisons between targets and nontargets), the more probable explanation is that our results refer to the capacity of the visual system to "recover" from one event and to be ready for processing the next one. Therefore, it seems that for simple tasks that require a quick response, it is not crucial to have the total amount of resources available for cortical processing.

To conclude, our results fall in line with the view that although the human visual system may have functionally distinct information processing streams that receive their input from brain areas and pathways specialized for different stimulus characteristics, these streams are highly interactive at several levels. The question of where the results of psychophysical and EEG measurements meet, and to what extent they can explain each other, still needs further investigation.

#### **AUTHOR CONTRIBUTIONS**

Carolina Murd, Kairi Kreegipuu, and Jüri Allik formulated the research question. Aire Raidvee programmed the experimental setup. Carolina Murd, Nele Kuldkepp, and Maria Tamm collected the data. Carolina Murd and Kairi Kreegipuu analyzed the data. Carolina Murd drafted the manuscript and in cooperation with Kairi Kreegipuu, Jüri Allik, Nele Kuldkepp, Aire Raidvee, and Maria Tamm revised it to its final form.

#### **ACKNOWLEDGMENTS**

This research was supported by the Estonian Science Foundation (grant #8332) and the Estonian Ministry of Education and Research (Institutional Research Grant IUT02-13 and SF0180029s08).

#### **REFERENCES**

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 November 2013; accepted: 09 January 2014; published online: 24 January 2014.*

*Citation: Murd C, Kreegipuu K, Kuldkepp N, Raidvee A, Tamm M and Allik J (2014) Visual evoked potentials to change in coloration of a moving bar. Front. Hum. Neurosci. 8:19. doi: 10.3389/fnhum.2014.00019*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Murd, Kreegipuu, Kuldkepp, Raidvee, Tamm and Allik J. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Overestimation of the second time interval replaces time-shrinking when the difference between two adjacent time intervals increases

# *Yoshitaka Nakajima<sup>1</sup>\*, Emi Hasuo<sup>2</sup>, Miki Yamashita<sup>3</sup> and Yuki Haraguchi<sup>4</sup>*

<sup>1</sup> Department of Human Science, Research Center for Applied Perceptual Science, Kyushu University, Fukuoka, Japan

<sup>2</sup> Japan Society for the Promotion of Science/Neurological Institute, Kyushu University, Fukuoka, Japan

<sup>3</sup> Kyushu Institute of Design, Fukuoka, Japan

<sup>4</sup> Department of Acoustic Design, Kyushu University, Fukuoka, Japan

#### *Edited by:*

Willy Wong, University of Toronto, Canada

#### *Reviewed by:*

Willy Wong, University of Toronto, Canada

Simon Grondin, Université Laval, Canada

#### *\*Correspondence:*

Yoshitaka Nakajima, Department of Human Science, Research Center for Applied Perceptual Science, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan e-mail: nakajima@design.kyushu-u.ac.jp

When the onsets of three successive sound bursts mark two adjacent time intervals, the second time interval can be underestimated when it is physically longer than the first time interval by up to 100 ms. This illusion, time-shrinking, is very stable when the first time interval is 200 ms or shorter (Nakajima et al., 2004, Perception, 33). Time-shrinking had been considered a kind of perceptual assimilation to make the first and the second time interval more similar to each other. Here we investigated whether the underestimation of the second time interval was replaced by an overestimation if the physical difference between the neighboring time intervals was too large for the assimilation to take place; this was a typical situation in which a perceptual contrast could be expected. Three experiments to measure the overestimation/underestimation of the second time interval by the method of adjustment were conducted. The first time interval was varied from 40 to 280 ms, and such overestimations indeed took place when the first time interval was 80–280 ms. The overestimations were robust when the second time interval was longer than the first time interval by 240 ms or more, and the magnitude of the overestimation was larger than 100 ms in some conditions. Thus, a perceptual contrast to replace time-shrinking was established. An additional experiment indicated that this contrast did not affect the perception of the first time interval substantially: The contrast in the present conditions seemed unilateral.

**Keywords: time perception, assimilation, contrast, audition, time-shrinking, empty interval**

# **INTRODUCTION**

When the onsets of three successive sound bursts mark two neighboring time intervals, the second time interval can be underestimated when it is longer than the first time interval by up to 100 ms. This underestimation, i.e., time-shrinking, is very stable when the first time interval is 200 ms or shorter (Nakajima et al., 1991, 2004), and has been considered a kind of perceptual assimilation. Assimilation and contrast in perceptual paradigms often replace each other when the relationship and configuration of stimuli are changed systematically (e.g., Helson, 1963; Morinaga and Noguchi, 1966).

Assimilation and contrast may not necessarily be governed by a single perceptual mechanism, but they are likely to work under one perceptual principle for humans and animals to process information from the environment efficiently and quickly. For example, a figure in which luminance is sufficiently higher than in the background can be distinguished clearly from the background in the visual modality. This process is enhanced by contrast, which enlarges the perceptual difference in terms of lightness or color between the figure and the background, as well as by assimilation, which homogenizes the lightness or color within the figure and within the background (Koffka, 1935; Shapley and Reid, 1985). It is also argued that, when two potential objects are separated enough spatially from each other (but within a distance to keep a mutual interaction), they are likely to be organized as two separate wholes which are then contrasted (King, 1988). It is widely observed that perceptual assimilation between objects gives way to contrast when the difference between these objects is increased, and that assimilation can be blocked if the area or the group to be assimilated is broken by a boundary (or boundaries; e.g., Koffka, 1935; Hamburger, 2005), or by a temporal distance (Ikeda and Obonai, 1955). In Ikeda and Obonai's (1955) experiment, concentric circles with different diameters *I* and *T* were presented simultaneously for 500 ms using a tachistoscope. The diameter of *T*, whose size was to be judged, was fixed at 30 mm. When the physical size of *I* was similar to that of *T*, assimilation took place, but contrast took over when the physical size difference was larger (**Table 1**). The fact that assimilation and contrast can both take place in the same experimental context is described systematically by Helson (1964). One should note that temporal configurations of stimuli can also lead to an assimilation or contrast of the stimuli (Shigeno, 1991; see also McKenna, 1984). In our study, assimilation and contrast were manipulated through modifying the temporal configuration of the sound bursts.

When the difference between close but distinguishable objects or events is small, the objects will be seen as part of a homogeneous group. If the difference cannot be neglected, the objects or events will instead be perceived in different categories. This is the case particularly for the human auditory modality, which is responsible for quick and complicated communication sometimes in noisy environments without favorable acoustics.

**Table 1 | Underestimation and overestimation of the size of a circle,** *T* **= 30 mm, caused by another concentric circle,** *I***, as observed by Ikeda and Obonai (1955).**

The values are in millimeters. A, assimilation; C, contrast.

Linguistic communication depends on the human capacity to process strings of categorized elements in time. This requires that any pair of sounds or sound patterns should be clearly either the same or different (de Saussure, 1966); assimilation and contrast must work for the listener to decode speech signals properly (e.g., Shigeno, 1991). Temporal aspects of auditory perception are also very likely to work in the same manner. Relative lengths of syllables are categorized in many languages; it is often important for the listener to judge, without hesitation, whether or not one of two neighboring syllables is longer or shorter than the other. When time intervals are presented in concatenation, listeners often simplify the patterns by reducing small differences and exaggerating larger differences (e.g., Fraisse, 1978, 1982; Povel, 1981). A ratio of 1:2 or 2:1 seems perceptually stable, which means that the second time interval is likely to be overestimated if the neighboring time intervals would otherwise be perceived as in a ratio of 1:1.7 or 1:1.8. We were interested in whether the extremely stable illusion of time-shrinking, a unilateral assimilation of a time interval to a preceding time interval or preceding time intervals, could be grasped in relation to such opposite perceptual processes. We thus examined whether a time interval was contrasted, instead of assimilated, to a preceding time interval at a certain point when the difference between these adjacent time intervals was increased step by step. When two adjacent empty time intervals *t*<sup>P</sup> and *t*<sup>S</sup> were presented in this order in our previous research, the same *t*<sup>P</sup> may have caused both underestimation and overestimation of *t*<sup>S</sup> depending on the physical difference between *t*<sup>P</sup> and *t*<sup>S</sup>. Nakajima et al.'s (2004) experiments suggested that this possibility is systematic. **Table 2** indicates the cases in which both underestimation and overestimation reached 20 ms for a fixed *t*<sup>P</sup> value.

The present paradigm thus became clear. Time-shrinking typically takes place when two time intervals, *t*<sup>P</sup> and *t*<sup>S</sup> in this order, marked by the onsets of three successive sound bursts meet the following conditions: 0 < *t*<sup>S</sup> − *t*<sup>P</sup> ≤ 80 ms, and *t*<sup>P</sup> ≤ 200 ms. It had been indicated already that overestimation of *t*<sup>S</sup> to exaggerate the difference between *t*<sup>P</sup> and *t*<sup>S</sup> could take place when the physical difference between the neighboring time intervals, *t*<sup>S</sup> − *t*<sup>P</sup>, exceeded the above range (Nakajima et al., 2004). This problem had never been taken up systematically. In order to reveal the mechanism of rhythmic organization, however, it seemed of crucial importance to examine whether a systematic overestimation of *t*<sup>S</sup> would replace the underestimation, which we call time-shrinking, if we increased the difference *t*<sup>S</sup> − *t*<sup>P</sup>.

**Table 2 | Temporal patterns in which time-shrinking was replaced by overestimation in Nakajima et al. (2004).**

Both the underestimation of a standard time interval, i.e., time-shrinking, and the overestimation of a longer standard appeared for the same preceding time interval in the stimulus conditions indicated in each line. The physical durations of two adjacent time intervals *t*<sup>P</sup> and *t*<sup>S</sup> are indicated as |*t*<sup>P</sup>|*t*<sup>S</sup>| in milliseconds. The conditions in which the underestimation/overestimation was equal to or above 20 ms were taken up to specify these temporal patterns. UE, underestimation; OE, overestimation.
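The typical time-shrinking conditions stated above (0 < *t*<sup>S</sup> − *t*<sup>P</sup> ≤ 80 ms and *t*<sup>P</sup> ≤ 200 ms) can be written as a simple predicate. This is only a sketch: the function name and parameter names are hypothetical, and the limits describe a typical range rather than sharp perceptual thresholds.

```python
def in_time_shrinking_range(t_p_ms, t_s_ms, max_diff_ms=80, max_t_p_ms=200):
    """True if (t_P, t_S) meets the typical time-shrinking conditions
    quoted in the text: 0 < t_S - t_P <= 80 ms and t_P <= 200 ms."""
    diff = t_s_ms - t_p_ms
    return 0 < diff <= max_diff_ms and t_p_ms <= max_t_p_ms


if __name__ == "__main__":
    # Example patterns |t_P|t_S| in milliseconds (illustrative values).
    for t_p, t_s in [(160, 200), (160, 240), (160, 320), (240, 280)]:
        print(t_p, t_s, in_time_shrinking_range(t_p, t_s))
```

Patterns that fall outside this range, for example because *t*<sup>S</sup> − *t*<sup>P</sup> is much larger than 80 ms, are the ones in which the present study looks for overestimation instead.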

#### **GENERAL METHODS**

The general framework common to the present experiments is described in **Figure 1**. In the first three experiments, we basically followed the paradigm employed in previous studies on time-shrinking (e.g., Nakajima et al., 2004), except that we increased the range of the standard duration to be judged. In the control condition, a time interval, *t*<sup>S</sup>, marked by the onsets of two successive tone bursts was the standard to be judged. An additional tone burst preceded *t*<sup>S</sup> in the experimental condition; the effect of the preceding time interval, *t*<sup>P</sup>, marked by the onsets of this additional tone burst and the first marker of *t*<sup>S</sup> was studied. The difference in subjective duration of *t*<sup>S</sup> between the control and the experimental condition was measured.

In the last experiment, Experiment 4, a tone burst did not precede but succeeded *t*<sup>S</sup>, and the effect of the succeeding time interval, *t*<sup>SUC</sup>, marked by the onsets of the second marker of *t*<sup>S</sup> and this additional tone burst, was examined in order to interpret the results of the first three experiments. This was the experimental condition, and no control condition was employed because the data of the control condition in Experiment 3 could be reused.

The method of adjustment was employed. The participant initiated each presentation by clicking a pane on the computer screen. A few seconds after the click (the interval was chosen randomly within a range), the first tone burst of the standard pattern *t*<sup>S</sup>, *t*<sup>P</sup>|*t*<sup>S</sup>, or *t*<sup>S</sup>|*t*<sup>SUC</sup> was presented. After that, there was a period of a few seconds (the interval was again chosen randomly), and then another time interval, the comparison, *t*<sup>C</sup>, was presented, marked by the onsets of two successive tone bursts. The task of the participant was to adjust *t*<sup>C</sup> to make it equal to *t*<sup>S</sup> in subjective duration. The participant could change *t*<sup>C</sup> by operating a screen interface, designed in a way not to give a visual hint about the present duration, and the minimum step of the adjustment was 1 ms. The participant was allowed to listen to the whole sequence as many times as he/she needed until *t*<sup>S</sup> and *t*<sup>C</sup> were perceived as equal, and finished the trial when satisfied. The last *t*<sup>C</sup> value was recorded as the point of subjective equality, PSE.

#### **EXPERIMENT 1**

This experiment was conducted in 1996. Because we did not have an institutional ethical committee for psychological experiments at that time, an internal ethical review was impossible, but the experiment was a part of a research project reviewed by a governmental committee to select projects to be funded (as in the acknowledgments). This experiment is included in the present report because this was the first case in which the perceptual phenomenon we are going to describe appeared systematically. Our original purpose had been to determine the stimulus conditions to investigate the effect of sound marker duration on the occurrence of time-shrinking (underestimation), for there was a possibility that the amount of time-shrinking may be reduced, or the time condition for maximum time-shrinking could be shifted, by lengthening the markers (see Hasuo et al., 2011). From the present viewpoint, however, the experimental data gave us insight into the possibility of systematic overestimation of the second of two adjacent time intervals. The same *t*<sup>S</sup> values were employed with a *t*<sup>P</sup> in the experimental condition and in isolation in the control condition. The PSEs in these conditions were compared to see the amount of perceptual overestimation or underestimation of *t*<sup>S</sup> caused by *t*<sup>P</sup>.
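The comparison between conditions described above amounts to subtracting the control PSE from the experimental PSE for the same *t*<sup>S</sup>. The sketch below illustrates this with made-up numbers; the function name and the PSE values are hypothetical and are not data from this study.

```python
from statistics import mean


def estimation_shift(pse_experimental_ms, pse_control_ms):
    """Mean experimental PSE minus mean control PSE, in ms.

    Positive values indicate overestimation of t_S caused by t_P,
    negative values indicate underestimation (time-shrinking).
    """
    return mean(pse_experimental_ms) - mean(pse_control_ms)


if __name__ == "__main__":
    # Hypothetical PSEs (ms) from repeated adjustment trials for one t_S value.
    experimental = [230, 226]
    control = [240, 244]
    print(estimation_shift(experimental, control))  # negative: underestimation
```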

# **METHODS**

#### *Participants*

The participants were five students, i.e., three males and two females, of the Kyushu Institute of Design (the predecessor of the Faculty of Design, Kyushu University). They had received education for acoustic design, including basic training in music performance. They were 20–24 years old, and had normal hearing.

#### *Materials*

Duration markers were pure tone bursts of 1000 Hz and 12, 63, or 123 ms with a rise and a fall time of ∼2 ms each. These values were inexact due to our use of an analog filter to shape the waveform; the inexactness was sufficiently small relative to the effect we were measuring. The tone bursts of different durations were approximately equal in loudness when presented separately. This was realized by conducting preliminary measurements in which the participant could listen to any of the three sounds by clicking corresponding buttons on the computer screen. The stimulus sound was presented always 200 ms after the button was clicked. The level of the 12-ms burst, which was very short, was fixed at 97 dBA as defined as the level of a continuous tone of the same amplitude measured with an artificial ear (Brüel and Kjær 4153), a microphone (Brüel and Kjær 4134), and a sound level meter (Brüel and Kjær 2209). The levels of the other sounds were adjustable, and the participant was instructed to equalize the three sounds in terms of loudness. In each trial, the adjusted levels of the 63 and 123 ms bursts were recorded. The participant performed eight trials, and the median value for each sound was employed as the presentation level in the main part of the experiment. The presentation levels were 87–94 dBA for the 63-ms burst, and 85–93 dBA for the 123-ms burst.

The pure tones were first generated as rectangular pulse series before being band-pass filtered between 850 and 1250 Hz (NF DV-6BW). This resulted in tone bursts with rise and fall times of ∼2 ms. The tone bursts were presented to the left ear of the participant through an amplifier (JVC AX-Z511) and headphones (AKG K141) in a soundproof room. The experimental procedure including stimulus generation was controlled by a quiet computer without a hard disk drive or a fan (Commodore Amiga 500).

In the main part of the experiment, the marker duration was fixed in each standard pattern, which was marked by two or three successive tone bursts, and the comparison time interval was always marked by two 12-ms tone bursts. In the standard patterns of the experimental condition, *t*<sup>P</sup>|*t*<sup>S</sup>, the preceding time interval, *t*<sup>P</sup>, was fixed at 160 ms. Both in the control and in the experimental condition, the standard time interval, *t*<sup>S</sup>, was varied from 120 to 480 ms in steps of 40 ms. The *t*<sup>S</sup> duration of 120 ms was not possible when the marker duration was longer, i.e., 123 ms; this condition was omitted. Thus, there were 58 stimulus patterns: 2 (control/experimental) × [2 (marker durations ≤ 63 ms) × 10 (*t*<sup>S</sup> durations) + 1 (marker duration = 123 ms) × 9 (*t*<sup>S</sup> durations)]. The standard pattern was presented 2300–2500 ms after the participant clicked a button on the screen. There was a silence of 2700–3300 ms between the offset of the last sound marker of *t*<sup>S</sup> and the onset of the first sound marker of *t*<sup>C</sup>.

#### *Procedure*

The participant performed four adjustment trials, two in ascending series and two in descending series, for each stimulus pattern: two replications for both series were performed. One replication comprised the first half, and the other the second half of the whole measurement. Each replication (= half) consisted of 116 trials, 58 (stimulus patterns) × 2 (series) in random order, and was divided into 9 blocks of 12 or 13 measurement trials, which were preceded by two warm-up trials. Preceding the measurement, the participant performed 58 training trials, divided into four blocks; each stimulus pattern appeared once. Thus, the whole experiment consisted of 22 blocks: 4 (training blocks) + 2 (replications) × 9 (measurement blocks). Each block took around 15–20 min, and the whole experiment was carried out over a period of 8 days for each participant.
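The pattern, trial, and block counts given for Experiment 1 can be checked with a short enumeration. This is only a sketch: it assumes ten *t*<sup>S</sup> values from 120 to 480 ms in 40-ms steps (the range implied by the count of 58 patterns and by the ANOVA range of 160–480 ms), and the names `T_S_VALUES_MS` and `stimulus_patterns` are hypothetical.

```python
from itertools import product

T_S_VALUES_MS = list(range(120, 481, 40))   # assumed: 10 values, 120..480 ms
MARKER_DURATIONS_MS = (12, 63, 123)
CONDITIONS = ("control", "experimental")


def stimulus_patterns():
    """All (condition, marker duration, t_S) combinations, omitting
    t_S = 120 ms for the 123-ms marker as described in the text."""
    return [
        (cond, marker, t_s)
        for cond, marker, t_s in product(CONDITIONS, MARKER_DURATIONS_MS, T_S_VALUES_MS)
        if not (marker == 123 and t_s == 120)
    ]


patterns = stimulus_patterns()
trials_per_replication = len(patterns) * 2   # ascending + descending series
total_blocks = 4 + 2 * 9                     # training blocks + 2 replications x 9 blocks
print(len(patterns), trials_per_replication, total_blocks)  # prints "58 116 22"
```

The 116 trials of one replication split into nine blocks of 12 or 13 trials as eight blocks of 13 and one block of 12.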

#### **RESULTS AND DISCUSSION**

We performed a three-way [marker duration × condition (experimental/control) × *t*<sup>S</sup> duration] ANOVA on the PSEs for *t*<sup>S</sup> = 160–480 ms. Since it is commonplace that PSEs change as *t*<sup>S</sup> changes, we will not detail the main effect of this factor either here or in the following experiments; its main effect was always significant (*p* < 0.001). The main effect of marker duration was significant, *F*(2,8) = 21.902, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.846. Ryan's *post hoc* test showed that the differences between all pairs of marker durations, i.e., 12 and 123 ms, 63 and 123 ms, and 12 and 63 ms, were significant (*p* < 0.05). The interaction between condition (experimental/control) and *t*<sup>S</sup> duration was also significant, *F*(8,32) = 4.614, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.536. This interaction should be related to the assimilation and contrast of *t*<sup>S</sup> to *t*<sup>P</sup>. The main effect of condition (experimental/control) and the other interactions were not significant (*p* > 0.05).

The PSEs in the control condition were very close to the physical values of *t*<sup>S</sup> (**Figure 2**). Slight deviations appeared systematically, however: PSEs of shorter durations tended to be longer than the physical values of *t*<sup>S</sup>. This kind of time error sometimes appears in the literature on time perception (Woodrow, 1951; Eisler et al., 2008). The PSEs tended to be slightly longer when the marker duration was longer, but the present data do not offer much information on this issue. This issue should be investigated intensively in the future in order to understand rhythm perception in speech or music. Hasuo et al. (2011, 2012) reported that inter-onset time intervals up to 360 ms tended to be perceived as longer when the sound markers terminating the time intervals were longer. This was the case whether the time interval to be judged was isolated or neighboring another time interval. The duration of the sound markers initiating the time intervals showed similar effects, but in a more unstable manner.

The PSEs in the control and in the experimental condition differed systematically. The experimental PSEs were smaller than the corresponding control PSEs when *t*<sup>S</sup> = 200 or 240 ms, i.e., when *t*<sup>S</sup> − *t*<sup>P</sup> = 40 or 80 ms: *t*<sup>S</sup> was underestimated, showing time-shrinking in a typical manner. However, the difference between the control and the experimental condition was reversed when *t*<sup>S</sup> was longer: the experimental PSEs were systematically greater than the control PSEs when *t*<sup>S</sup> ≥ 320 ms. Thus, time-shrinking as assimilation of *t*<sup>S</sup> to *t*<sup>P</sup> appeared when the difference between these neighboring time intervals was small, and gave way to contrast of *t*<sup>S</sup> to *t*<sup>P</sup> when the difference was large.

The above tendency appeared in similar ways in all the marker conditions between the control and the experimental PSEs despite the fact that the control PSEs increased slightly, but clearly, if the sound marker duration was increased. The contrast appeared as overestimation of *t*<sup>S</sup> in the experimental condition against the control condition. The PSEs were already lengthened in the control condition if the sound markers were longer, and they became even longer (were overestimated further) in the experimental condition. Furthermore, the amount of overestimation was larger when the duration markers were longer. This is in contrast with the fact that the magnitude of time-shrinking (underestimation) is often smaller when longer markers are used (Yamashita and Nakajima, 1999; Hasuo et al., 2011), as was the case also in the present experiment.

The overestimation, as represented by the difference in the PSEs between the control and the experimental condition, seemed to have a local peak when *t*<sup>S</sup> = 320 ms for all the marker durations. This tendency was peculiar and robust, but we leave this issue for future research.

To test whether the common tendency in the overestimation pattern (i.e., the difference between the control and the experimental PSEs over the *t*<sup>S</sup> duration range) across different marker durations was statistically significant, we conducted a Friedman test (e.g., Siegel and Castellan, 1988) on the mean overestimation values for each marker duration. There was a statistically significant tendency in overestimation, χ<sup>2</sup>(8) = 23.644, *p* = 0.003. To examine whether the overestimation patterns had a common tendency even when the influence of time-shrinking (the negative overestimation at *t*<sup>S</sup> − *t*<sup>P</sup> = 40 or 80 ms) was cancelled, we also performed the same Friedman test without the conditions in which *t*<sup>S</sup> − *t*<sup>P</sup> = 40 or 80 ms. The tendency in the overestimation pattern was significant again, χ<sup>2</sup>(6) = 17.714, *p* = 0.007. The statistical significance in this additional Friedman test confirmed that the overestimation patterns had a common tendency even without the influence of time-shrinking.
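For readers unfamiliar with the Friedman test, the statistic can be computed from within-block ranks as sketched below. The arrangement used here, with the three marker durations as blocks and the nine *t*<sup>S</sup> levels as treatments, is an assumption inferred from the reported degrees of freedom (9 − 1 = 8); the data values are hypothetical, and the formula omits the correction for ties.

```python
def rank_with_ties(values):
    """Ranks (1 = smallest), with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_chi_square(blocks):
    """Friedman statistic (no tie correction) for a list of blocks,
    each holding one value per treatment."""
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for j, r in enumerate(rank_with_ties(block)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical overestimation values (ms): 3 marker durations x 9 t_S levels.
    example = [
        [-20, -12, 4, 10, 38, 30, 33, 41, 47],
        [-15, -8, 9, 18, 52, 40, 45, 49, 60],
        [-10, -5, 12, 25, 70, 55, 58, 66, 75],
    ]
    print(friedman_chi_square(example))
```

With *n* blocks and *k* treatments and no ties, the statistic cannot exceed *n*(*k* − 1), which is 24 for three blocks and nine treatments and 18 for three blocks and seven treatments. Under the assumed arrangement, the reported values are therefore close to the upper bound, meaning that the rank order of the *t*<sup>S</sup> levels was nearly identical across marker durations.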

#### **EXPERIMENT 2**

Experiments 2–4 were part of a research project approved by the research ethics committee of the Faculty of Design, Kyushu University, in 2010. Experiment 1 and our previous data on time-shrinking (e.g., Nakajima et al., 2004) revealed that the underestimation of a time interval that appeared as assimilation of *t*<sup>S</sup> to *t*<sup>P</sup> often gave way to contrast when *t*<sup>S</sup> − *t*<sup>P</sup> > 120 ms. Because we did not have systematic data indicating this effect except in Experiment 1, we decided to conduct an experiment in which *t*<sup>S</sup> was varied in a larger range (up to 640 ms). For *t*<sup>P</sup>, we chose three values: 80, 120, and 160 ms. Time-shrinking appears most stably in this range of *t*<sup>P</sup> (Nakajima et al., 2004; Miyauchi and Nakajima, 2005), and we first needed experimental data under such conditions. One of the things we were interested in was whether any overestimation would appear for *t*<sup>P</sup> = 120 ms; there had been occasional cases in previous data in which *t*<sup>S</sup> had been overestimated for *t*<sup>P</sup> = 80 or 160 ms, but no such cases for *t*<sup>P</sup> = 120 ms. Most importantly, we wanted to see whether the typical time-shrinking, which was expected reliably if *t*<sup>S</sup> − *t*<sup>P</sup> = 40 or 80 ms, would give way to contrast, i.e., overestimation of *t*<sup>S</sup>.

#### **METHODS**

#### *Participants*

Five students of Kyushu University, three males and two females, participated. One of them had been educated to become a high-school music teacher, and three of them had received education in acoustic design, including basic training in music performance. The fifth one was an amateur musician who had been playing percussion for 8 years. They were 21–46 years old.

#### *Materials*

Duration markers were pure tone bursts of 1000 Hz and 10 ms with cosine-shaped rise and fall times of 5 ms each, with no steady-state part. Their level was 80 dBA, defined as the level of a continuous tone of the same amplitude measured with an artificial ear (Brüel and Kjær 4153) and a sound level meter (Node 2072 or 2075). The tone bursts were presented diotically to the participant through an amplifier (Stax SRM-323A) and headphones (Stax SR-303) in a soundproof room. The experimental procedure, including stimulus generation, was controlled by a computer (Frontier KZFM71/N) with an audio processor (Onkyo Wavio SE-U55GX). Stimulus patterns were generated digitally (16 bits; a sampling frequency of 44 100 Hz), and went through a 16-kHz low-pass filter (NF DV-8FL) to avoid aliasing.
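As an illustration of the marker described above, the sketch below synthesizes one 10-ms, 1000-Hz tone burst at 44 100 Hz. It assumes that "cosine-shaped" means a raised-cosine (Hann-type) ramp and that the carrier starts at zero phase; both are assumptions, as are the function names. The amplitude is normalized to 1, so the calibration to 80 dBA and the 16-kHz low-pass filtering are not modelled.

```python
import math


def raised_cosine_envelope(t_ms, dur_ms=10.0, ramp_ms=5.0):
    """Envelope value (0..1) at time t_ms for a burst with cosine ramps."""
    if t_ms < ramp_ms:                      # rise
        return 0.5 * (1.0 - math.cos(math.pi * t_ms / ramp_ms))
    if t_ms > dur_ms - ramp_ms:             # fall
        return 0.5 * (1.0 - math.cos(math.pi * (dur_ms - t_ms) / ramp_ms))
    return 1.0                              # steady state (a single instant here)


def tone_burst(freq_hz=1000.0, dur_ms=10.0, ramp_ms=5.0, fs=44100):
    """Return the burst as a list of float samples in [-1, 1]."""
    n_samples = round(fs * dur_ms / 1000.0)
    samples = []
    for i in range(n_samples):
        t_ms = 1000.0 * i / fs
        carrier = math.sin(2.0 * math.pi * freq_hz * t_ms / 1000.0)
        samples.append(raised_cosine_envelope(t_ms, dur_ms, ramp_ms) * carrier)
    return samples


burst = tone_burst()
print(len(burst))  # 441 samples for 10 ms at 44 100 Hz
```

Because the rise and the fall each occupy 5 ms of a 10-ms burst, the envelope reaches its maximum only at the midpoint, which corresponds to the "no steady-state part" in the description.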

In the standard patterns of the experimental condition, *t*<sup>P</sup>|*t*<sup>S</sup>, the preceding time interval, *t*<sup>P</sup>, was 80, 120, or 160 ms, for which time-shrinking had occurred typically in previous studies (e.g., Nakajima et al., 2004). Overestimation of *t*<sup>S</sup> had been recorded for *t*<sup>P</sup> = 80 and 160 ms, but only in a few stimulus patterns for each *t*<sup>P</sup> value, and only up to 30 ms, except for Experiment 1 of the present article. For *t*<sup>P</sup> = 120 ms, no related measurements had been done before. The standard time interval, *t*<sup>S</sup>, was varied from 40 to 640 ms in steps of 40 ms both in the experimental and in the control condition. There were 64 stimulus patterns: 4 (1 control + 3 *t*<sup>P</sup> durations) × 16 (*t*<sup>S</sup> durations). The standard pattern was presented 1500–2500 ms after the participant initiated a presentation. There was an interval of 3000–4000 ms between the onsets of *t*<sup>S</sup> and *t*<sup>C</sup>.

#### *Procedure*

The participant performed two adjustment trials, one in ascending series and one in descending series, for each stimulus pattern, and thus 128 trials in total: 64 (stimulus patterns) × 2 (series), which were arranged in random order and divided into 11 blocks of 11 or 12 measurement trials preceded by two warm-up trials. Before the measurement, the participant performed one training session of 16 trials, for which representative stimulus patterns were employed. Thus, the whole experiment consisted of 12 blocks: 1 (training block) + 11 (measurement blocks). Each block took around 15–30 min, and the whole experiment was carried out over a period of 2–3 days for each participant.
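The design arithmetic above (64 patterns, 128 trials, 11 blocks of 11 or 12 trials) can be checked with a short sketch. The function names are hypothetical, and the rule that the earlier blocks receive the extra trial is an assumption made only for illustration; warm-up and training trials are not included.

```python
import itertools
import random


def build_trials(t_p_values, t_s_values, seed=None):
    """Cross conditions with t_S values and with the two adjustment series."""
    conditions = ["control"] + list(t_p_values)
    patterns = list(itertools.product(conditions, t_s_values))
    trials = [(cond, t_s, series)
              for cond, t_s in patterns
              for series in ("ascending", "descending")]
    random.Random(seed).shuffle(trials)   # random order of measurement trials
    return patterns, trials


def split_blocks(trials, n_blocks):
    """Split trials into n_blocks whose sizes differ by at most one."""
    base, extra = divmod(len(trials), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(trials[start:start + size])
        start += size
    return blocks


patterns, trials = build_trials([80, 120, 160], range(40, 641, 40), seed=0)
blocks = split_blocks(trials, 11)
print(len(patterns), len(trials), sorted({len(b) for b in blocks}))
# 64 128 [11, 12]
```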

#### **RESULTS AND DISCUSSION**

We performed a two-way [condition (1 control + 3 *t*<sup>P</sup> durations) × *t*<sup>S</sup> duration] ANOVA utilizing the PSE values. The main effect of condition (1 control + 3 *t*<sup>P</sup> durations) was significant, *F*(3,12) = 8.624, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.683, and so was the interaction between condition (1 control + 3 *t*<sup>P</sup> durations) and *t*<sup>S</sup> duration, *F*(45,180) = 3.344, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.455.
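The reported effect sizes can be cross-checked against the *F* values with the standard identity η<sub>p</sub><sup>2</sup> = *F* · df<sub>effect</sub> / (*F* · df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies it to the two tests above; the function name is hypothetical, and small discrepancies in the third decimal place are possible because the published *F* values are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Main effect of condition, F(3,12) = 8.624
print(round(partial_eta_squared(8.624, 3, 12), 3))    # 0.683
# Condition x t_S interaction, F(45,180) = 3.344
print(round(partial_eta_squared(3.344, 45, 180), 3))  # 0.455
```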

The PSEs in the control condition were close to the physical values of *t*S, but slight deviations appeared systematically (**Figure 3**). PSEs of longer duration tended to be longer than the physical values of *t*S, and this was not consistent with the tendency observed in Experiment 1. In both cases, however, the observed deviations were extremely small, and can be neglected for our present purpose.

The PSEs in the control and in the experimental condition differed systematically. The experimental PSEs were smaller when *t*<sup>S</sup> = *t*<sup>P</sup> + 40 or *t*<sup>P</sup> + 80 ms, indicating a robust occurrence of time-shrinking. This underestimation of *t*<sup>S</sup>, however, was replaced by overestimation, whose highest magnitude reached above 50 ms, when *t*<sup>S</sup> ≥ *t*<sup>P</sup> + 240 ms for all the *t*<sup>P</sup> values. Thus, as in Experiment 1, time-shrinking appeared when the difference between *t*<sup>S</sup> and *t*<sup>P</sup> was 40 or 80 ms, and contrast of *t*<sup>S</sup> to *t*<sup>P</sup> took over when *t*<sup>S</sup> was lengthened.

When *t*<sup>P</sup> = 160 ms as in Experiment 1, the overestimation again seemed to have a local peak when *t*<sup>S</sup> = 320 ms. This tendency indeed seems interesting, but is an issue to be investigated in the future.


To test whether the common tendency in the overestimation pattern across different *t*<sup>P</sup> values (i.e., the underestimation of *t*<sup>S</sup> when *t*<sup>S</sup> = *t*<sup>P</sup> + 40 or *t*<sup>P</sup> + 80 ms and the overestimation when *t*<sup>S</sup> ≥ *t*<sup>P</sup> + 240 ms, observed for all *t*<sup>P</sup> values) was statistically significant, we conducted a Friedman test utilizing the mean overestimation values for each *t*<sup>P</sup> duration (= 80, 120, or 160 ms). There was a statistically significant tendency in overestimation depending on the difference between the two neighboring intervals (*t*<sup>S</sup> − *t*<sup>P</sup> = −40 to 480 ms), χ<sup>2</sup>(13) = 34.505, *p* = 0.001. As in Experiment 1, we also performed the same Friedman test without the conditions in which *t*<sup>S</sup> − *t*<sup>P</sup> = 40 or 80 ms, where time-shrinking should have taken place. The tendency in the overestimation pattern was significant again, χ<sup>2</sup>(11) = 27.410, *p* = 0.004.

#### **EXPERIMENT 3**

Time-shrinking almost disappeared, although not completely, when *t*<sup>P</sup> was above 300 ms (Nakajima et al., 2004, Figure 11). Our next step was to examine whether the tendency for *t*<sup>S</sup> to be underestimated when *t*<sup>S</sup> = *t*<sup>P</sup> + 40 or *t*<sup>P</sup> + 80 ms and overestimated when *t*<sup>S</sup> was further lengthened, as observed in Experiments 1 and 2, would appear entirely in the *t*<sup>P</sup> range in which we could expect time-shrinking. Because the overestimation of *t*<sup>S</sup> appeared in a very wide range of *t*<sup>S</sup> in Experiment 2, we made the range of *t*<sup>S</sup> in the present experiment even wider.

#### **METHODS**

#### *Participants*

Six students of Kyushu University, three males and three females, participated. Four of them had taken part in Experiment 2, but there had been an interval of at least 3 months. One of the participants had been educated to become a high-school music teacher, and four of them had received education in acoustic design, including basic training in music performance. The sixth one was an amateur musician who had been playing percussion for 8 years. They were 20–46 years old.

#### *Materials*

Duration markers and the way of presentation were the same as in Experiment 2. In the standard patterns of the experimental condition, *t*<sup>P</sup>|*t*<sup>S</sup>, *t*<sup>P</sup> was 40, 120, 200, or 280 ms, for which time-shrinking had occurred clearly (Nakajima et al., 2004). Overestimation of *t*<sup>S</sup> had been recorded for these *t*<sup>P</sup> values, but only in a handful of stimulus patterns, and only up to 30 ms, except for Experiment 2 of the present article. The standard time interval, *t*<sup>S</sup>, was varied from 40 to 1000 ms in steps of 80 ms both in the control and in the experimental condition. There were 65 stimulus patterns: 5 (1 control + 4 *t*<sup>P</sup> durations) × 13 (*t*<sup>S</sup> durations). The standard pattern was presented 1500–2500 ms after the participant initiated a presentation. There was an interval of 4000–5000 ms between the onsets of *t*<sup>S</sup> and *t*<sup>C</sup>.

#### *Procedure*

The participant performed two adjustment trials, one in ascending series and one in descending series, for each stimulus pattern, and thus 130 trials in total: 65 (stimulus patterns) × 2 (series), which were arranged in random order and divided into 10 blocks of 13 measurement trials preceded by two warm-up trials. Before the measurement, the participant performed 15 training trials, for which representative stimulus patterns were employed. Thus, the whole experiment consisted of 14 blocks: 1 (training block) + 13 (measurement blocks). Each block took around 15–30 min, and the whole experiment was carried out over a period of 2–3 days for each participant.

#### **RESULTS AND DISCUSSION**

We performed a two-way [condition (1 control + 4 *t*<sup>P</sup> durations) × *t*<sup>S</sup> duration] ANOVA utilizing the PSE values. The main effect of condition (1 control + 4 *t*<sup>P</sup> durations) was significant, *F*(4,20) = 6.450, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.563, and so was the interaction between condition (1 control + 4 *t*<sup>P</sup> durations) and *t*<sup>S</sup> duration, *F*(48,240) = 2.539, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.337.

The PSEs in the control condition were very close to the physical values of *t*<sup>S</sup> (**Figure 4**). Although slight deviations appeared again systematically, they were almost unrecognizable in the graphs except for the longest *t*<sup>S</sup> values, for which PSEs tended to be slightly shorter than the corresponding points of objective equality.

The PSEs in the control and in the experimental condition differed systematically. The experimental PSEs were conspicuously smaller when *t*<sup>S</sup> = *t*<sup>P</sup> + 80 ms, again showing the robustness of time-shrinking. For *t*<sup>P</sup> = 120, 200, and 280 ms, the underestimation of *t*<sup>S</sup> was replaced by overestimation when *t*<sup>S</sup> was longer. When *t*<sup>S</sup> > *t*<sup>P</sup> + 240 ms, the PSEs in the experimental condition were never smaller than those in the control condition. For *t*<sup>P</sup> = 200 and 280 ms, the overestimation reached above 100 ms, which is comparable to the temporal illusions Israeli (1930) reported in the visual modality. For *t*<sup>P</sup> = 40 ms, no clear overestimation appeared. When the same preceding interval duration was employed in Nakajima et al.'s (2004) Experiment 1, however, some overestimation appeared stably, although the amount was only about 10 ms, and it would be safer to reserve any clear conclusion for this *t*<sup>P</sup> value. In the present experiment, time-shrinking appeared when the difference between *t*<sup>S</sup> and *t*<sup>P</sup> was 80 ms, and contrast of *t*<sup>S</sup> to *t*<sup>P</sup> took over when *t*<sup>S</sup> was lengthened except when *t*<sup>P</sup> = 40 ms.

**FIGURE 4 | Mean PSEs obtained from six participants in Experiment 3.** PSE corresponds to the duration of t<sup>C</sup> that was perceived to be equal to the duration of t<sup>S</sup>. The results for t<sup>P</sup> = 120, 200, and 280 ms were raised by 300, 600, and 900 ms, respectively, in this graph for clarity. The physical values of t<sup>S</sup> (the points of objective equality) are indicated by dotted lines, on which t<sup>P</sup> values are indicated by arrows. Error bars represent standard deviations between participants.

As in Experiment 2, we conducted a Friedman test utilizing the mean overestimation values for each *t*<sup>P</sup> duration to examine whether the common tendency in the overestimation pattern across the different *t*<sup>P</sup> values (40, 120, 200, and 280 ms) was statistically significant. There was a statistically significant tendency in overestimation depending on the difference between the two neighboring intervals (*t*<sup>S</sup> − *t*<sup>P</sup> = 0–720 ms), χ<sup>2</sup>(9) = 25.855, *p* = 0.002. We also performed the same Friedman test, but without the (negative) overestimations in conditions in which *t*<sup>S</sup> − *t*<sup>P</sup> = 80 ms, where time-shrinking should have taken place. The tendency in the overestimation pattern was significant again, χ<sup>2</sup>(8) = 19.600, *p* = 0.012, confirming that the overestimation patterns had a common tendency even when the influence of time-shrinking (the dip at *t*<sup>S</sup> − *t*<sup>P</sup> = 80 ms) was cancelled.
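The degrees of freedom of the Friedman tests in Experiments 2 and 3 follow from the number of *t*<sup>S</sup> − *t*<sup>P</sup> differences available for every *t*<sup>P</sup> value. The sketch below makes this explicit, under the assumption (inferred from the reported ranges, not stated as the selection rule) that only differences shared by all *t*<sup>P</sup> values entered the test; the function name is hypothetical.

```python
def common_differences(t_s_values, t_p_values):
    """Sorted t_S - t_P differences (ms) that exist for every t_P value."""
    per_t_p = [{t_s - t_p for t_s in t_s_values} for t_p in t_p_values]
    return sorted(set.intersection(*per_t_p))


# Experiment 2: t_S = 40..640 ms in 40-ms steps; t_P = 80, 120, 160 ms
exp2 = common_differences(range(40, 641, 40), [80, 120, 160])
print(exp2[0], exp2[-1], len(exp2) - 1)                      # -40 480 13
print(len([d for d in exp2 if d not in (40, 80)]) - 1)       # 11

# Experiment 3: t_S = 40..1000 ms in 80-ms steps; t_P = 40, 120, 200, 280 ms
exp3 = common_differences(range(40, 1001, 80), [40, 120, 200, 280])
print(exp3[0], exp3[-1], len(exp3) - 1)                      # 0 720 9
print(len([d for d in exp3 if d != 80]) - 1)                 # 8
```

The printed degrees of freedom (13 and 11 for Experiment 2, 9 and 8 for Experiment 3) agree with the reported χ<sup>2</sup> tests.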

#### **EXPERIMENT 4**

The overestimation of *t*<sup>S</sup> took place to a remarkable degree in Experiments 1–3. It seemed necessary to have some idea of whether this strong contrast, which was observed between the two neighboring time intervals, *t*<sup>1</sup> and *t*<sup>2</sup> in this order, for the perception of *t*<sup>2</sup>, also affected the perception of *t*<sup>1</sup>. Because time-shrinking was a unilateral illusion affecting mainly the perception of *t*<sup>2</sup>, we first examined whether, and if so how, the underestimation of *t*<sup>2</sup> gave way to overestimation, and this indeed happened to a remarkable degree. Now it seemed important to check whether this contrast was unilateral or bilateral. In the present study, we conducted only one experiment to be appended to Experiment 3, but this would help us to interpret the present results. We selected six temporal patterns of two neighboring time intervals in which contrast between them had caused overestimation of *t*<sup>2</sup> (*t*<sup>S</sup> in Experiment 3). Then PSEs of *t*<sup>1</sup> were measured for these patterns. For example, we used a pattern of *t*<sup>1</sup> = 200 ms and *t*<sup>2</sup> = 680 ms, in which *t*<sup>2</sup> had been overestimated by more than 100 ms in Experiment 3. In the present experiment, we were interested in whether or not the same mechanism of contrast (bilaterally) led to the underestimation of *t*<sup>1</sup>, making its PSE shorter than the control value. Because *t*<sup>1</sup> was the standard time interval, it is called *t*<sup>S</sup>, and the succeeding time interval *t*<sup>2</sup> is called *t*<sup>SUC</sup> in the present report. In other words, we used the same temporal patterns of two neighboring time intervals marked by three successive sounds as in Experiment 3, and the key difference was that *t*<sup>C</sup> was adjusted to match the perceived duration of the first interval instead of that of the second interval.

Due to the unavailability of a certain potential participant, we decided to employ five of the six participants from Experiment 3, making it still possible to reuse the data in the control condition of Experiment 3.

#### **METHODS**

#### *Participants*

Five students, three males and two females, participated in this experiment after participating in Experiment 3. There had been an interval of at least 1 month between these experiments. They were 21–25 years old. Four of them had taken part in Experiment 2, but there had been an interval of at least 3 months. Four of them had received education in acoustic design, including basic training in music performance. The fifth one was an amateur musician who had been playing percussion for 8 years.

#### *Materials*

Six stimulus patterns were chosen from the stimulus patterns in Experiment 3. In the standard patterns of the experimental condition, *t*<sup>S</sup>|*t*<sup>SUC</sup>, the standard time interval, *t*<sup>S</sup>, was 120, 200, or 280 ms; these values had been chosen for *t*<sup>P</sup> in Experiment 3. The control patterns of these *t*<sup>S</sup> values in Experiment 3 were regarded as the virtual control patterns of the present experiment, and thus the control data of the present participants were reused. The succeeding time interval, *t*<sup>SUC</sup>, was 440 or 680 ms; *t*<sup>SUC</sup> in any stimulus pattern would have been overestimated stably if it had been the standard time interval. There were six stimulus patterns not including the virtual control patterns. The standard pattern was presented 1500–2500 ms after the participant initiated a presentation. There was a silence of 4000–5000 ms between the onsets of *t*<sup>S</sup> and *t*<sup>C</sup>.

#### *Procedure*

The participant performed two adjustment trials, one in ascending series and one in descending series, for each stimulus pattern, and thus 12 trials in total arranged in random order. Four trials were conducted first for training and a warm-up, and the measurement trials followed without a break. The experiment took around 20 min.

#### **RESULTS AND DISCUSSION**

We performed a two-way [condition (1 control + 2 *t*<sup>SUC</sup> durations) × *t*<sup>S</sup> duration] ANOVA utilizing the PSE values. Neither the main effect of condition (1 control + 2 *t*<sup>SUC</sup> durations) nor the interaction between condition (1 control + 2 *t*<sup>SUC</sup> durations) and *t*<sup>S</sup> duration was significant, *F*(2,8) = 0.222, *p* > 0.05, η<sub>p</sub><sup>2</sup> = 0.052; *F*(4,16) = 2.740, *p* > 0.05, η<sub>p</sub><sup>2</sup> = 0.407, respectively.

The PSEs in the control condition were almost equal to the physical values of *t*<sup>S</sup> (**Figure 5**). The PSEs in the control and in the experimental condition were very close to each other. Underestimation of *t*<sup>S</sup> that should have occurred if the systematic contrast in Experiment 3 were bilateral did not take place to any observable degree. Although we do not have sufficient data to conclude that the systematic contrast observed in Experiments 1, 2, and 3 was unilateral, the underestimation of *t*<sup>S</sup> was almost negligible even in conditions in which the mechanism of contrast must have worked clearly. The observed contrast was at least very close to unilateral.

#### **GENERAL DISCUSSION**

The purpose of the present study was to observe the overestimation of an empty time interval caused by a preceding time interval. The conditions in the present study were comparable to the conditions in which time-shrinking had been reported to take place. We had assumed that time-shrinking was a unilateral perceptual assimilation of an empty time interval to a shorter preceding time interval. One may wonder whether the potential rhythmic regularity of presented patterns may be playing a crucial role, but this idea is not supported by the fact that time-shrinking took place even when the preceding time interval and the time interval to be judged were separated in time (Sasaki et al., 2002). The assumption of "assimilation" itself is not related to any particular perceptual mechanism directly, but it can give us a wider view of the observed facts. Because perceptual assimilation and contrast often appear in the same context, we examined whether the unilateral assimilation, time-shrinking, could give way to contrast when the difference between the neighboring time intervals was increased. The range of the first time interval that can cause time-shrinking has been determined systematically in previous studies, and it has been established that the illusion takes place only when the difference between the neighboring time intervals is smaller than ∼100 ms. This knowledge made it possible for us to focus on the stimulus conditions in which contrast was likely to take place. As a result, overestimation of the second of the neighboring time intervals appeared systematically.

When *t*<sup>P</sup> preceded and neighbored *t*<sup>S</sup> under conditions that cause time-shrinking (i.e., the systematic underestimation of *t*<sup>S</sup>), an overestimation of *t*<sup>S</sup> was observed when *t*<sup>S</sup> was lengthened. The only exception was when *t*<sup>P</sup> was set to be extremely short, i.e., *t*<sup>P</sup> = 40 ms. The overestimation of *t*<sup>S</sup> never disappeared when *t*<sup>S</sup> − *t*<sup>P</sup> > 240 ms for the other *t*<sup>P</sup> values. The overestimation as a function of *t*<sup>S</sup> − *t*<sup>P</sup> showed a common tendency across the different *t*<sup>P</sup> values (**Figure 6**), which was confirmed by the Friedman tests.

What we had not expected was that the contrast appeared in such a wide range and to such a large degree. As for the range of the second time interval, we have already reached 1 s as the longest duration. It will be very important in the future to determine the upper limit of the range in which the overestimation takes place, but this would require a new experimental paradigm because we can easily reach the perceptual limit; when a time interval is equal to or above 1.5–2 s, it is often difficult to grasp the whole interval perceptually, or to perceive it as a part of a single rhythm pattern (Fraisse, 1978; Nakajima et al., 1980; Warren, 2008; see also Grondin, 2012 for a perceptual limit at around 1.5 s).

The amount of the overestimation sometimes surpassed 100 ms. Although similar overestimation had appeared occasionally in previous studies on unilateral or bilateral assimilation between neighboring time intervals, the positive overestimation had never reached 40 ms except in the present Experiment 1. It turned out now that the overestimation can be larger than time-shrinking in terms of deviation from the control PSEs in milliseconds. Although we had (re)started this study as something to be added to the studies of perceptual assimilation between time intervals, the overestimation of the second time interval now appeared as a phenomenon worth investigating more systematically in different series of studies. It is particularly necessary to examine whether the present results can be related to the fact that a successive presentation of two objects (as would be inevitable for time intervals) could facilitate the perceptual contrast between them (Ikeda and Obonai, 1955).

Fraisse (1978, 1982) argued that rhythm patterns were often based on two dominant duration values, and that they were mostly in a ratio 1:2, and occasionally in 1:3; in Western music, the shorter durations were typically 150–290 ms, and the longer durations 300–900 ms. This could explain the overestimation in the present study in some cases. Perceptual contrast can often take place as, or as a result of, categorical perception, although it is often difficult to relate results in different paradigms (Repp and Liberman, 1987). If a shorter duration and a longer duration neighboring each other are to be perceived as in different perceptual categories, i.e., in the short-duration category and in the long-duration category, this can be an aspect, or a cause, of perceptual contrast. In the present experiments, the first time interval was always below 290 ms, and the second time interval was mostly above 300 ms when it was overestimated. Most cases in which *t*<sup>P</sup> caused the overestimation of *t*<sup>S</sup> can be interpreted by the fact that *t*<sup>P</sup> < 300 ms < *t*<sup>S</sup>, which should have caused the time intervals to be placed in different perceptual categories, which in turn should have led to the overestimation of *t*<sup>S</sup>. This interpretation describes the general tendency of the present data rather well, and is worth investigating further. However, the categorical boundary at about 300 ms is hardly a part of common knowledge, and a systematic investigation of this issue should be the first step in pursuing this path.

Another possible explanation related to a categorical aspect of temporal perception comes from the studies of Miyauchi and Nakajima (2005) and ten Hoopen et al. (2006; see also Sasaki et al., 1998; and Miyauchi and Nakajima, 2007). They presented auditory temporal patterns as used in the present experiments to participants, and established a *1:1 category*, i.e., a perceptual category in which the neighboring time intervals are perceived as equal to each other even when the physical difference between them is greater than the differential limen. One of the boundaries of this category was very close to the point at which time-shrinking reaches its maximum, i.e., the point at which *t*<sup>S</sup> − *t*<sup>P</sup> ≈ 80 ms; the overestimation of *t*<sup>S</sup> typically appeared when the difference between *t*<sup>P</sup> and *t*<sup>S</sup> doubled this value. This is an idea to be kept for future research, but some difficulty arises if we are to explain why the contrast appeared not immediately when the 1:1 category gave way but when the difference between *t*<sup>P</sup> and *t*<sup>S</sup> increased further.

Although human listeners are able to discriminate temporal patterns more precisely than specified by musical notations, they tend to establish perceptual categories represented by simple ratios between neighboring durations as in musical notations (Honing, 2013; see also Povel, 1981). It is understandable that humans have to categorize temporal patterns in order to memorize, imitate, or respond quickly to them. This might lead to the human listeners' tendency to make the subjective ratios between neighboring durations closer to those in the prototypical patterns, which are made of simple ratios. As Fraisse (1978, 1982) indicated, the perceptual system tends to make the perceived ratio closer to a simple integral ratio as 1:1 or 1:2 (see also Honing, 2013). Supporting this observation, Nakajima (1979) reported that a pattern of two neighboring time intervals of 80 and 160 ms was perceived in ratios close to 1:1 or 1:2 avoiding intermediate cases, and Povel (1981) systematically showed the stability of the ratio 1:2 in a task to reproduce repeated temporal patterns. It is very likely that a temporal pattern to be perceived as in a ratio 1:1.7, for example, is perceptually distorted to be closer to 1:2, causing the overestimation of the second time interval. However, this alone cannot account for the overestimation observed in the present study. Suppose that *t*<sup>P</sup> = 200 ms in the paradigm of Experiments 1, 2, and 3. Nakajima et al. (1988, Table 1) showed that the temporal pattern 200|400 ms was perceived in a ratio 1:1.78, i.e., closer to 1:1 than the physical ratio 1:2, and this tendency was in line with their psychophysical hypothesis. If the perceptual system tries to shift toward a simpler ratio 1:2, then the second time interval may be overestimated. Although this hypothesis seemed attractive, a further examination of our own data was not very promising. 
For example, in the pattern 200|520 ms in Experiment 3, which would correspond to a subjective ratio 1:2.14 according to Nakajima et al.'s (1988) psychophysical hypothesis, the second time interval should be underestimated to make the subjective ratio closer to 1:2. In reality, this pattern still caused the overestimation of *t*S. As in this example, the overestimation took place more widely than was predicted from the perceptual system's tendency toward simpler ratios. No literature or experimental data are within the present authors' knowledge about the mechanism to show such perceptual tendencies, and the present experimental paradigm will be useful to solve this problem in the future. It should also be interesting for future research to examine the assimilation and contrast in a more complex context (e.g., Jones and McAuley, 2005).
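The subjective ratio of 1:2.14 quoted for the 200|520 ms pattern can be reproduced with a small worked example. The sketch assumes that the psychophysical hypothesis amounts to adding a constant of about 80 ms to each physical duration before the ratio is taken; this constant is not stated in the present text and is used here only as an assumption, as is the function name.

```python
def subjective_ratio(t1_ms, t2_ms, supplement_ms=80.0):
    """Predicted subjective ratio t2:t1 when a constant is added to each duration.

    supplement_ms = 80 is an assumed value, not taken from the present article.
    """
    return (t2_ms + supplement_ms) / (t1_ms + supplement_ms)


# Pattern 200|520 ms: physical ratio 1:2.6, predicted subjective ratio about 1:2.14
print(round(520 / 200, 2), round(subjective_ratio(200, 520), 2))  # 2.6 2.14
```

Under the same assumption, the 200|400 ms pattern gives about 1:1.71, which is close to, but not identical with, the ratio of 1:1.78 cited from Table 1 of Nakajima et al. (1988).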

One may wonder whether the overestimation of *t*<sup>S</sup> in the present results can be explained by time-order error (TOE), which is a phenomenon observed in psychophysics in general. Previous studies reported that TOE is expected to be positive for short durations of a few hundred milliseconds, such as the durations utilized as *t*<sup>P</sup> in the present experiments (although it should be noted that in TOE studies two successive and distinct intervals are used instead of two intervals sharing a common marker; Woodrow, 1951; Eisler et al., 2008). This means that the duration of *t*<sup>P</sup> should be overestimated relative to *t*<sup>S</sup>. In the present experiments, *t*<sup>S</sup> was overestimated (Experiments 1–3) but *t*<sup>P</sup> was not (Experiment 4). It seems difficult to explain the tendencies of the present results with TOEs as reported in classical literature (e.g., Hellström, 1985).

We began the present study in order to observe what would happen if the temporal patterns causing time-shrinking were modified by lengthening the second of the two neighboring time intervals. This tactic worked well to find clear cases in which assimilation gave way to contrast. As the overestimation was so systematic, however, it will be necessary in the future to investigate this issue in a broader paradigm apart from time-shrinking. First, it is of some interest whether the first of the neighboring time intervals is also affected when the second time interval is overestimated. The results of Experiment 4 were negative, suggesting that the contrast was unilateral, but we need further studies on this point. It attracts our interest as well whether any perceptual contrast would take place if the temporal order between the longer and the shorter time interval is reversed. Although there are some previous data for some speculation, we basically need a new set of experiments.

Arao et al. (2000) showed that time-shrinking occurred also in the visual modality, and it took place when the neighboring time intervals *t*<sup>P</sup> and *t*<sup>S</sup>, in this order, had the relationship *t*<sup>P</sup> < *t*<sup>S</sup> < ∼1.8 × *t*<sup>P</sup>. If we see their data from the present viewpoint, it is suggested that overestimation of *t*<sup>S</sup> is likely to replace time-shrinking if *t*<sup>S</sup> is far above this range, and this is worth investigating immediately. The same argument may hold also for the tactile modality (Hasuo et al., 2014).

One big problem for our future research is that the experimental data are not always very stable in the present paradigm, and this can be the case in other related paradigms. The individual differences were sometimes as big as the effects to be investigated. Fortunately, our present purpose was simple, i.e., to examine whether systematic overestimation of the second time interval would or would not appear, and we were able to reach tentative conclusions. If the many issues suggested here are to be investigated in the future, however, we will need more sophisticated methods. One possible solution is to design experiments that enable us to perform some multivariate analyses. Another possibility is to obtain a lot of data from a few participants, and to compare results in different conditions for each individual participant.

We investigated the perception of empty time intervals marked by tone bursts, and employed temporal patterns of two neighboring time intervals. Our research question was whether the overestimation of the second time interval would replace the underestimation (time-shrinking) if the difference between the neighboring time intervals was increased. The overestimation took place very systematically when the first time interval was 80–280 ms, and its amount sometimes exceeded 100 ms, indicating that this was an important phenomenon related to rhythm perception. It is very likely that similar temporal patterns appear often in music. Assimilation and contrast, which Fraisse (1978) considered to be two important principles to construct rhythm, were manifested in an *in vitro* situation.

#### **ACKNOWLEDGMENTS**

This work was supported by the Japan Society for the Promotion of Science [Grants-in-Aid for Scientific Research S (19103003) and A (25242002) to Yoshitaka Nakajima, and a Grant-in-Aid for JSPS Fellows (25-6091) to Emi Hasuo], and its final stage was a part of Kyushu University Interdisciplinary Programs in Education and Projects in Research Development (The Kyushu University Project for Interdisciplinary Research of Perception and Cognition).

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 February 2014; accepted: 16 April 2014; published online: 14 May 2014. Citation: Nakajima Y, Hasuo E, Yamashita M and Haraguchi Y (2014) Overestimation of the second time interval replaces time-shrinking when the difference between two adjacent time intervals increases. Front. Hum. Neurosci. 8:281. doi: 10.3389/fnhum.2014.00281*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Nakajima, Hasuo, Yamashita and Haraguchi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Cortical activity associated with the detection of temporal gaps in tones: a magnetoencephalography study

# *Takako Mitsudo<sup>1</sup>\*, Naruhito Hironaga<sup>2</sup> and Shuji Mori<sup>1</sup>*

<sup>1</sup> Department of Informatics, Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan

<sup>2</sup> Department of Clinical Neurophysiology, Neurological Institute, Faculty of Medicine, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan

#### *Edited by:*

Willy Wong, University of Toronto, Canada

#### *Reviewed by:*

Koji Inui, National Institute for Physiological Sciences, Japan Bernhard Ross, University of Toronto, Canada

#### *\*Correspondence:*

Takako Mitsudo, Department of Informatics, Faculty of Information Science and Electrical Engineering, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka, 819-0395, Japan e-mail: mitsudo@cog.inf.kyushu-u.ac.jp

We used magnetoencephalography (MEG) in two experiments to investigate spatio-temporal profiles of brain responses to gaps in tones. Stimuli consisted of leading and trailing markers with gaps between the two markers of 0, 30, or 80 ms. Leading and trailing markers were 300 ms pure tones at 800 or 3200 Hz. Two conditions were examined: the within-frequency (WF) condition in which the leading and trailing markers had identical frequencies, and the between-frequency (BF) condition in which they had different frequencies. Using minimum norm estimates (MNE), we localized the source activations at the time of the peak response to the trailing markers. Results showed that MEG signals in response to 800 and 3200 Hz tones were localized in different regions within the auditory cortex, indicating that the frequency pathways activated by the two markers were spatially represented. The time course of regional activity (RA) was extracted from each localized region for each condition. In Experiment 1, which used a continuous tone for the WF 0-ms stimulus, the N1m amplitude for the trailing marker in the WF condition differed depending on gap duration but not tonal frequency. In contrast, N1m amplitude in BF conditions differed depending on the frequency of the trailing marker. In Experiment 2, in which the 0-ms gap stimulus in the WF condition was made from two markers and included an amplitude reduction in the middle, the amplitude in WF and BF conditions changed depending on frequency, but not gap duration. The difference in temporal characteristics between WF and BF conditions could be observed in the RA.

**Keywords: gap detection, within-frequency (WF), between-frequency (BF), regional activity (RA), cortical tonotopy**

# **INTRODUCTION**

The human auditory system is sensitive to temporal changes in sounds. Gap detection is a frequently used task that measures auditory temporal resolution by requiring a listener to judge whether a stimulus contains a brief silent interval (gap). When leading and trailing markers share the same frequency, this task is referred to as a within-frequency (WF) detection task (Formby and Forrest, 1991; Formby et al., 1998; Phillips, 1999), and the gap-detection threshold (i.e., the minimally detectable gap duration) is usually found to be around 2–3 ms (Plomp, 1964; Penner, 1977). When the leading and trailing markers differ in frequency, the task is referred to as a between-frequency (BF) detection task. Psychophysical evidence has shown that gap detection becomes more difficult as the frequency difference between the leading and trailing markers increases; the gap-detection threshold can be as high as 50 ms when the frequencies are separated by two octaves (Formby and Forrest, 1991; Phillips et al., 1997; Formby et al., 1998; Phillips, 1999).

In contrast to the many psychophysical studies concerning WF gap detection and differences between WF and BF gap-detection thresholds (Moore et al., 1989; Phillips et al., 1997; Phillips, 1999; Heinrich and Schneider, 2006), physiological studies regarding BF conditions are relatively few and the underlying neural mechanisms are not yet well understood. Electrophysiological studies that have investigated cortical responses to BF- and WF-gap detection have highlighted the importance of trailing-marker onset in relation to leading-marker offset (Eggermont, 2000; Lister et al., 2007; Ross et al., 2010). Lister et al. (2007) recorded electroencephalograms (EEG) containing P1-N1-P2 auditory evoked responses to leading and trailing markers in WF and BF conditions. In the BF condition, trailing-marker onset elicited P1-N1-P2 responses for all gap durations, while in the WF condition they did so only when gaps were at least as long as the gap-detection threshold. Heinrich et al. (2004) focused on central processing in BF-gap detection by recording mismatch negativity (MMN) waves in an odd-ball paradigm. The results showed no significant effect of gap duration on MMN amplitude and suggested that primary auditory cortex plays a central role in the computation required for WF- and BF-gap detection.

To further investigate activity in the auditory cortex in response to silent gaps under BF conditions, we recorded magnetoencephalograms (MEG), a technique not yet used in studies of BF-gap detection. Specifically, we measured auditory evoked fields (AEFs) to reveal the spatio-temporal characteristics of cortical activity that may underlie psychophysical performance in WF- and BF-gap detection. MEG source analysis was conducted with minimum norm estimates (MNE), a visualization method that uses distributed source modeling with additional *a priori* constraints and can represent a number of local or distributed sources (Hamalainen and Ilmoniemi, 1994). Owing to its high temporal and spatial resolution, MEG-source analysis can extract fine temporal information from localized regions. As in EEG studies that showed clear differences between WF and BF conditions in response to the trailing marker (Lister et al., 2007), here we observed the response to trailing-marker onsets in concentrated regions and looked in the auditory cortex for activity related to the gaps.

We examined spatial characteristics of cortical activity in terms of the frequency pathways for leading and trailing markers that were represented by tonotopic organization of auditory cortex. Neurons responding best to tones at specific frequencies are known to form tonotopic maps in auditory cortex (Woolsey, 1960). Studies using functional magnetic resonance imaging (fMRI) and MEG have shown that tonotopic organization exists not only in nonhuman primates but also in the human auditory cortex (e.g., Pantev et al., 1988, 1995; Formisano et al., 2003). In the present study, we used MNE and the marked inspection region of interest (iROI) to localize source activations at the time of the peak response to the trailing markers. We then analyzed the regional activities (RAs) in the iROI to compare the time courses across conditions. By visualizing activity in the auditory cortex during both BF and WF conditions, we were able to observe how the leading and trailing markers of different frequencies activated distinct areas in the auditory cortex.

The present study consisted of two experiments which differed primarily in the construction of the 0-ms-gap stimulus in the WF condition. In Experiment 1, it was a pure tone lasting 600-ms, which matched the total length of leading and trailing markers used in other conditions. In Experiment 2, it was constructed from two pure tones, each lasting 300 ms. While amplitude was not reduced in the middle of the 0-ms-gap stimulus in Experiment 1, it was reduced between the two markers in Experiment 2. Thus, in Experiment 2, the 0-ms-gap stimulus was qualitatively similar to the other stimuli, while in Experiment 1 it was slightly different.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Ten (five females, aged 23–53 years) and six (four females, aged 23–37 years) healthy volunteers participated in Experiments 1 and 2, respectively. No participants reported a hearing deficit or had difficulty hearing any of the stimuli used in the experiment. Informed consent was obtained from each participant after the purpose and procedures of the experiment had been explained. The study was approved by the Kyushu University Ethics Committee of the Faculty of Information Science and Electrical Engineering.

#### **STIMULI AND PROCEDURE**

Stimuli were synthesized on a personal computer (Dimension 4500C, DELL Inc., Round Rock, TX, USA) with a sampling frequency of 44.1 kHz. Stimuli were presented by a personal computer using STIM2 software (Neuroscan Co. Ltd., Charlotte, NC, USA), were amplified (PS3001, DMglobal Co. Ltd., Mahwah, NJ, USA), and presented monaurally to the participants' right ears via a pair of inserted earphones (ER-3A, Etymotic Research Inc., Elk Grove Village, IL, USA). All stimuli were presented at 82 dB SPL measured by a sound-level meter with a 1/2-inch condenser microphone (Brüel and Kjær, models 2250 and 4192). Participants were instructed to listen passively to the stimuli, stay alert, and keep their eyes open throughout each experimental block. Each participant's behavior during MEG measurement was monitored using a TV-monitor system, and auditory responses were checked using online averaging.

#### *Experiment 1*

Except for the WF 0-ms-gap stimulus, all stimuli consisted of leading and trailing markers, which were pure tones lasting 300 ms each. The 300-ms leading marker included 20-ms rise and 3-ms fall times, and the trailing marker contained 3-ms rise and 3-ms fall times (**Figure 1A**). For the WF condition, the frequencies of the two markers were identical to each other, being either 800/800 or 3200/3200 Hz. For the BF condition, the frequencies of the two markers were different, being either 800/3200 or 3200/800 Hz. The gap duration was either 0 (no gap), 30, or 80 ms. The 30- and 80-ms-gap durations were used to match those found in the gap-detection literature (Phillips et al., 1997; Elangovan and Stuart, 2008), which show that while both durations are clearly detectable in WF conditions, in BF conditions, the 30-ms gap is close to gap-detection threshold while the 80-ms gap is well beyond threshold. In the WF condition, the 0-ms-gap stimulus was a pure tone lasting 600 ms, with no amplitude reduction in the middle (**Figure 1A**, left). In the BF condition, it was a concatenation of leading and trailing markers (both 300 ms). This resulted in amplitude reduction in the middle owing to their 3-ms rise and fall times (**Figure 1A**, right). For each frequency combination (FC), each gap-duration stimulus was presented 80 times in pseudo-random order. These 960 trials (4 FCs × 3 gap durations × 80 trials) were divided into four blocks of 240 trials. Inter-trial intervals randomly varied from 1.5 to 1.8 s. Condition order was counterbalanced across participants.
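The stimulus construction described above can be illustrated with a minimal Python sketch. It is not the authors' synthesis code: the linear ramp shape, the unit amplitude, and all function and constant names (`ramped_tone`, `gap_stimulus`, `experiment1_stimulus`) are assumptions made for illustration. Only the sampling frequency, marker durations, rise/fall times, gap durations, and trial counts are taken from the text.

```python
import math

FS = 44_100  # sampling frequency in Hz, as stated in the text


def ramped_tone(freq_hz, dur_ms, rise_ms, fall_ms, fs=FS):
    """Pure tone with linear onset/offset ramps (the ramp shape is an assumption)."""
    n = round(fs * dur_ms / 1000)
    n_rise = round(fs * rise_ms / 1000)
    n_fall = round(fs * fall_ms / 1000)
    samples = []
    for i in range(n):
        env = 1.0
        if i < n_rise:
            env = i / n_rise                # rising ramp from 0 towards 1
        elif i >= n - n_fall:
            env = (n - 1 - i) / n_fall      # falling ramp, reaching 0 on the last sample
        samples.append(env * math.sin(2 * math.pi * freq_hz * i / fs))
    return samples


def gap_stimulus(lead_hz, trail_hz, gap_ms, fs=FS):
    """Leading marker, silent gap, trailing marker (Experiment 1 ramp times)."""
    lead = ramped_tone(lead_hz, 300, rise_ms=20, fall_ms=3, fs=fs)
    trail = ramped_tone(trail_hz, 300, rise_ms=3, fall_ms=3, fs=fs)
    silence = [0.0] * round(fs * gap_ms / 1000)
    return lead + silence + trail


def experiment1_stimulus(lead_hz, trail_hz, gap_ms, fs=FS):
    """The WF 0-ms-gap case is a single continuous 600-ms tone; all others are marker pairs."""
    if lead_hz == trail_hz and gap_ms == 0:
        return ramped_tone(lead_hz, 600, rise_ms=20, fall_ms=3, fs=fs)
    return gap_stimulus(lead_hz, trail_hz, gap_ms, fs=fs)


# Trial bookkeeping: 4 frequency combinations x 3 gap durations x 80 repetitions.
FREQ_COMBOS = [(800, 800), (3200, 3200), (800, 3200), (3200, 800)]
GAPS_MS = [0, 30, 80]
REPETITIONS = 80
N_TRIALS = len(FREQ_COMBOS) * len(GAPS_MS) * REPETITIONS
TRIALS_PER_BLOCK = N_TRIALS // 4
```

Under these assumptions, `gap_stimulus(800, 3200, 30)` yields two 300-ms markers separated by 30 ms of silence, and `N_TRIALS` reproduces the 960 trials split into four blocks of 240.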

# *Experiment 2*

The stimuli were identical to those of Experiment 1, except in two respects. First, both the leading and the trailing markers contained 3-ms rise/fall times. Second, the 0-ms-gap stimuli for both conditions consisted of leading and trailing markers, with the fall time of the leading markers and the rise time of the trailing markers overlapping each other (**Figure 1B**). Thus the 0-ms-gap stimuli contained small amplitude reductions in both the WF and BF conditions. For each FC, the stimulus presentation and other parameters were the same as in Experiment 1.

#### **DATA ACQUISITION**

MEG measurement was conducted in the Brain Center in Kyushu University Hospital. AEFs were measured using a whole-head 306-channel biomagnetometer system (Elekta, Neuromag, Helsinki, Finland) in a quiet, magnetically shielded room. The detector array comprised 102 identical triple-sensor elements, with each sensor element comprising two orthogonally oriented planar-type gradiometers and one magnetometer. Before recording, four head-position indicator (HPI) coils were attached to the scalp, and a 3D digitizer was used to measure head shapes with respect to the HPI coils. Magnetic responses were digitally sampled at 1000 Hz and filtered online with a bandpass of 0.1–330 Hz. MRI data were acquired using a 3.0-T high-resolution MRI scanner (Achieva, Philips N.V., Eindhoven, The Netherlands) for analysis (TE, 60 ms; TR, 100 ms; voxel size, 1.5 mm × 1.5 mm × 1.5 mm) and interpretation of MEG data.

#### **SIGNAL PROCESSING AND SOURCE RECONSTRUCTION**

After recording, Maxfilter (Taulu et al., 2005) was used to reduce artifact signals arising from outside the sensor array. A 1–100 Hz off-line bandpass filter and a 60 Hz notch filter were applied to highlight the AEFs. AEFs measured from ∼80 responses for each FC were averaged for each gap duration. Using the averaged data, we focused on the contralateral hemisphere because AEFs are usually larger there than they are ipsilaterally (Pantev et al., 1986). The peak latencies and amplitudes of the AEFs were picked up from the gradiometer that showed the most salient activation in the AEFs for each FC.

Following off-line signal processing, we performed an MEG source reconstruction. A distributed source model of the MEG signals (recorded from the entire head surface) was estimated using MNE to obtain the current strength of cortical sources. This method offers high spatial resolution for detecting simultaneous magnetic sources distributed across the entire cortical surface. The precise procedure for performing MNE has been described elsewhere (Hamalainen and Ilmoniemi, 1994; Molins et al., 2008). Each participant's cortical surface was reconstructed from high-resolution T1-weighted MR images using FreeSurfer software (Fischl et al., 1999). An anatomical MRI image was co-registered with the MEG head coordinate system using head-shape points obtained by Polhemus measurement.

An inverse solution was calculated based on the forward solution that models the signal pattern generated by a unit dipole at each location on the cortical surface using a single homogeneous realistic head model and a boundary element method (BEM). The activation at each cortical location was estimated at each time point using a noise-normalized linear estimation approach [dynamic statistical parametric maps (dSPM); Dale et al., 2000]. A noise covariance matrix was created using the pre-trigger period from −100 to 0 ms relative to trigger onset. The activation patterns derived from the analysis were mapped onto the cortical surface images of each participant to make visualization clear. Each participant's data were transformed into a standard brain (MNI305; Collins et al., 1994) to estimate the source activations across subjects on the same scale (Fischl et al., 1999).

#### **GROUP ANALYSIS**

To confirm the primary activated areas in each of the four frequency conditions (800/800, 3200/3200, 3200/800, and 800/3200), activation maps at the peak latencies (N1m) of the trailing markers were estimated using dSPM and averaged with standardization (divided by max value) after transforming them into the standard brain. We estimated the target areas in each of the four frequency conditions in two steps. First, we averaged the activation map using a set ROI that covered the transverse temporal gyrus and its immediate vicinity (i.e., the auditory cortex; Pantev et al., 1988) to obtain a common activated area across all participants (**Figure 2**). Second, referencing the common activated area marked by the first step and the strongest activation in the auditory cortex from each individual, we marked the iROIs on the auditory cortex in the left hemisphere of each participant's cortex for all four conditions. Then, activity of each marked iROI was re-transformed into the standard brain and averaged again (**Figures 3** and **4**). After obtaining the iROIs corresponding to the 800- and 3200-Hz trailing markers, each activity pattern and tendency was examined individually. To statistically evaluate the accuracy of source localization, the center locations of N1m responses to the trailing markers were estimated for all four FCs in each participant. The center locations for the marked 800- and 3200-Hz iROIs were calculated and transformed into the standard brain so that location estimates would be on the same scale. Finally, a center location on the standard brain was estimated for each FC using weighted averaging that followed our established methods (Hironaga et al., 2014). The coordinate system used to express the location is based on the MNI Talairach. The x-axis indicates the medial/lateral direction, y-axis indicates the anterior/posterior direction, and z-axis indicates the inferior/superior direction.

The RA in each iROI for each stimulus was extracted from each individual, and the activities were also averaged with standardization. The N1m peak latencies for both the leading and trailing markers were extracted from RAs for all conditions and corresponding amplitudes were evaluated. We defined the peak latencies of the 0-ms gap in the WF condition in Experiment 2 as a peak that occurred within the 100–200 ms time window after the onset of the trailing marker (i.e., gap offset).
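Two of the steps above, averaging "with standardization (divided by max value)" and picking a peak inside a fixed time window, can be sketched in a few lines of Python. This is a hedged illustration only. It assumes that standardization means dividing each participant's values by that participant's largest absolute value, and that the peak is the largest value inside the window. The function names are hypothetical and do not come from the analysis software used in the study.

```python
def normalize_by_max(values):
    """Scale one participant's values so the largest absolute value becomes 1."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def group_average(per_participant):
    """Normalize each participant separately, then average sample by sample."""
    normalized = [normalize_by_max(p) for p in per_participant]
    n = len(normalized)
    return [sum(column) / n for column in zip(*normalized)]


def peak_in_window(times_ms, activity, start_ms, end_ms):
    """Return (latency, amplitude) of the largest value with start_ms <= t <= end_ms."""
    candidates = [(a, t) for t, a in zip(times_ms, activity)
                  if start_ms <= t <= end_ms]
    if not candidates:
        raise ValueError("no samples fall inside the requested window")
    amplitude, latency = max(candidates)
    return latency, amplitude
```

For example, `peak_in_window(times, ra, 100, 200)` with times expressed relative to trailing-marker onset would correspond to the 100–200 ms window used for the WF 0-ms gap in Experiment 2, under the assumptions stated above.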

# **RESULTS**

#### **SOURCE ACTIVATION GROUP ANALYSIS**

#### *Experiment 1*

**Figure 3** shows the averaged AEF responses of 10 participants to the trailing marker after converting the activity in marked individual iROIs to the standard brain. The areas showing responses to the 800-Hz tone were located in anterior Heschl's gyrus (HG), while those to the 3200-Hz tone were located in posterior HG. Responses to the 800-Hz tone appeared concentrated in a single area regardless of condition (**Figures 3A,B**), while those to the 3200-Hz tone were dispersed across the auditory cortex (**Figures 3C,D**). This was especially true for the 800/3200 condition (**Figure 3D**). **Table 1** gives the mean estimated centers of N1m responses to the leading marker for both frequencies, and **Table 2** shows those to the trailing marker for all four conditions.

**FIGURE 2 | Dynamic statistical parametric maps (dSPM) results of mean activations to the trailing marker on the standard brain for 800-Hz (A) and 3200-Hz (B) within-frequency (WF) trailing markers.** Source activation of AEF responses in auditory cortex ROI of the left hemisphere **(C)** to the trailing marker both for 30- and 80-ms gap durations were transformed from individual brains into the standard brain (MNI305) and averaged across 10 participants with standardization. The coloring threshold levels were set at fthres (low threshold) = 10, fmid (middle) = 12.5, and fmax (maximum) = 15 for all figures. **(C)** A lateral view of the left hemisphere of the standard brain showing the region of interest (left Heschl's gyrus). **(A)** and **(B)** represent enlargements of the area surrounded by the yellow square. dSPM, dynamic statistical parametric maps; MNI, Montreal Neurological Institute.

**FIGURE 3 | Mean activations in response to the trailing marker depicted on the standard brain for each condition: (A) 800/800, (B) 3200/800, (C) 3200/3200, and (D) 800/3200 in Experiment 1.** Source activation results of the averaged AEF responses to the trailing marker were obtained after transferring data from marked individual ROIs to the standard brain (MNI305). The threshold levels were set at fthres = 5, fmid = 9, and fmax = 15 for all figures. MNI, Montreal Neurological Institute.

**FIGURE 4 | Mean activations in response to the trailing marker depicted on the standard brain for each condition: (A) 800/800, (B) 3200/800, (C) 3200/3200, and (D) 800/3200 in Experiment 2.** Source activation results of the averaged AEF responses to the trailing marker were obtained after transferring data from marked individual ROIs to the standard brain (MNI305). The threshold levels were set at fthres = 1.2, fmid = 1.6, and fmax = 2.0 for all figures. MNI, Montreal Neurological Institute.

An ANOVA was performed using IBM SPSS statistics 21 (IBM Co. Ltd., Armonk, NY, USA) to assess the center locations of N1m iROI, as well as the amplitudes and the latencies of RA patterns for each condition. To check whether the N1m sources for the leading and trailing markers were localized, we chose "frequency" as a factor for both leading and trailing marker. For the frequency factor of the leading marker, we averaged the coordinates of the 800/800 and 800/3200 conditions and those of the 3200/3200 and 3200/800 conditions. In contrast, for the frequency factor of the trailing marker, we averaged the coordinates of the 800/800 and 3200/800 conditions and those of the 3200/3200 and 800/3200 conditions. One-way (Frequency: 800, 3200) ANOVAs were performed separately on the center coordinate values of the three axes (x, y, and z) obtained for the leading and the trailing markers. The Greenhouse–Geisser correction was applied when the assumption of sphericity was violated in the dependent measures. *Post hoc* Bonferroni corrections for multiple comparisons were applied when required. The η<sub>p</sub><sup>2</sup> (partial eta-squared) values were calculated for the quantitative comparison of effect sizes. For the leading marker, the main effect of frequency was significant in the y-axis [*F*(1,9) = 7.02, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.48], but not in the x-axis [*F*(1,9) = 3.16, *p* = n.s., η<sub>p</sub><sup>2</sup> = 0.26] or the z-axis [*F*(1,9) = 0.12, *p* = n.s., η<sub>p</sub><sup>2</sup> = 0.01]. The center location of the 800-Hz N1m (*y* = −24.99) was more anterior than that of the 3200-Hz N1m (*y* = −30.62). For the trailing marker, the main effect of trailing frequency was significant in the y-axis [*F*(1,19) = 8.89, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.32], but not in the x-axis [*F*(1,19) = 1.46, *p* = n.s., η<sub>p</sub><sup>2</sup> = 0.07] or the z-axis [*F*(1,19) = 0.17, *p* = n.s., η<sub>p</sub><sup>2</sup> = 0.01]. The center of the 800-Hz N1m (*y* = −23.96) was more anterior than that of the 3200-Hz N1m (*y* = −28.63).
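The partial eta-squared values reported alongside each *F* statistic can be recovered from the *F* value and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The short Python sketch below applies it to the three trailing-marker tests reported above. The function name is illustrative and is not part of SPSS or of the authors' analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Trailing-marker tests from Experiment 1, each with F(1,19):
y_axis = round(partial_eta_squared(8.89, 1, 19), 2)   # reported as 0.32
x_axis = round(partial_eta_squared(1.46, 1, 19), 2)   # reported as 0.07
z_axis = round(partial_eta_squared(0.17, 1, 19), 2)   # reported as 0.01
```

Because the identity depends only on the ratio of the two sums of squares, it is unaffected by a Greenhouse–Geisser correction, which rescales both degrees of freedom by the same factor.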

#### *Experiment 2*

**Figure 4** shows the averaged AEF responses of 6 participants to the trailing marker in Experiment 2. Core activations appeared in almost identical locations to those obtained in Experiment 1 (**Figure 3**), but dispersion of the individual iROIs was much less in the 3200-Hz condition (**Figures 4C,D**). The mean estimated centers of N1m responses to the leading and trailing markers are given in **Tables 1** and **2**. As in Experiment 1, one-way (Frequency: 800, 3200) ANOVAs were performed separately for the leading and trailing markers on each coordinate axis. For the leading marker, the main effect of frequency was significant in all axes [x-axis: *F*(1,5) = 9.38, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.65; y-axis: *F*(1,5) = 16.31, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.77; z-axis: *F*(1,5) = 8.87, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.64]. The center of the 800-Hz N1m was more lateral (*x* = −46.42), anterior (*y* = −25.64), and superior (*z* = 8.01) than that of the 3200-Hz N1m (*x* = −47.00, *y* = −30.80, *z* = 5.95). For the trailing marker, the main effect of frequency was also significant for all axes [x-axis: *F*(1,11) = 6.11, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.36; y-axis: *F*(1,11) = 24.20, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.69; z-axis: *F*(1,11) = 17.91, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.62]. The center of the 800-Hz N1m for the trailing marker was more lateral (*x* = −43.68), anterior (*y* = −25.46), and superior (*z* = 7.64) than that of the 3200-Hz N1m (*x* = −47.00, *y* = −29.60, *z* = 4.94).

Coordinates are given as mean (±SD).

\*Montreal Neurological Institute (MNI) coordinates [Right Anterior Superior (RAS) coordinate in the standard brain].

#### **ANALYSIS OF REGIONAL ACTIVITY**

#### *Experiment 1*

**Figure 5** presents the averaged RAs for the trailing markers from 10 participants that were extracted from individually marked iROIs. While onset responses for the trailing marker were not observed for the 0-ms gap in the WF condition (**Figure 5A**, *green line*), they were clearly evident in the BF condition (**Figure 5B**, *green line*). We also compared the RAs from **Figure 5** with sensor-level average waveforms (data not shown) and confirmed that the N1m in our study was equivalent to a P1-N1-P2 response pattern (e.g., Ross et al., 2010). For RA amplitudes, we used the relative amplitudes (peak value of the trailing marker divided by that of the leading marker) as the dependent variable of interest (**Table 3**). The values for individual participants were subjected to a 2 [Frequency (Fr): 800 vs. 3200 Hz] × 2 [Gap duration (GD): 30 vs. 80 ms] ANOVA for the WF condition and a 2 (Fr: 800 vs. 3200 Hz) × 3 (GD: 0, 30, 80 ms) ANOVA for the BF condition. For the WF condition, we observed a significant main effect of GD [*F*(1,9) = 5.82, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.39], but no significant main effect of Fr [*F*(1,9) = 0.02, n.s., η<sub>p</sub><sup>2</sup> = 0.002]. The peak amplitudes for the trailing marker were larger after 30-ms gaps than after 80-ms gaps. In the BF condition, ANOVA revealed a significant main effect of Fr [*F*(1,9) = 27.02, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.75]. The peak amplitudes for the 800-Hz trailing marker were larger than those for the 3200-Hz trailing marker. There was no significant main effect of GD [*F*(2,18) = 2.17, n.s., η<sub>p</sub><sup>2</sup> = 0.19]. For both WF and BF conditions, the interaction between Fr and GD was not significant [WF: *F*(1,9) = 3.39, n.s., BF: *F*(2,8) = 2.56, n.s.].

#### **Table 2 | The center location of trailing markers' N1m on the standard brain for each frequency combination in Experiments 1 and 2.**

Coordinates are given as mean (±SD).

\*Montreal Neurological Institute (MNI) coordinates [Right Anterior Superior (RAS) coordinate in the standard brain].

Amplitudes are given as mean (±SD).

**Table 4** shows the peak latencies of the RAs from the iROI obtained in Experiment 1. For the leading marker, we performed a 4 (FC: 800/800, 800/3200, 3200/800, 3200/3200) × 3 (GD: 0, 30, 80 ms) ANOVA. For the trailing marker, we performed a 2 (Fr: 800 vs. 3200 Hz) × 2 (GD: 30 vs. 80 ms) ANOVA for the WF condition and a 2 (Fr: 800 vs. 3200 Hz) × 3 (GD: 0, 30, 80 ms) ANOVA for the BF condition. For the leading marker, ANOVA showed no significant main effect of FC [*F*(3,27) = 0.99, n.s., η<sub>p</sub><sup>2</sup> = 0.10] or GD [*F*(2,18) = 0.54, n.s., η<sub>p</sub><sup>2</sup> = 0.06]. Peak RA latencies in response to the leading marker appeared to be around 110 ms after stimulus onset for all FCs, which corresponded to the N1m in the sensor-level AEF. For the trailing marker, there was a significant main effect of GD in both conditions [WF: *F*(1,9) = 406.44, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.98; BF: *F*(2,18) = 344.55, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.98], but no significant main effect of Fr in either condition [WF: *F*(1,9) = 0.00, n.s., η<sub>p</sub><sup>2</sup> = 0.00; BF: *F*(1,9) = 1.25, n.s., η<sub>p</sub><sup>2</sup> = 0.12]. Similar to the leading marker, peak RA latencies in response to the trailing marker appeared to be around 110 ms after stimulus onset. For example, in the 800/3200-BF case, the average onset latencies for trailing markers were 108, 138, and 194 ms for the 0-, 30-, and 80-ms gaps, respectively. The differences of these latencies (30 ms between 0- and 30-ms gaps and 55 ms between 30- and 80-ms gaps) corresponded to the differences in gap durations.
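The relation between these latencies and the gap durations can be checked with a few lines of arithmetic. The Python sketch below uses only the rounded group means quoted for the 800/3200 BF case. The variable and function names are illustrative, and the interpretation that the quoted latencies are measured from the offset of the leading marker is an assumption.

```python
GAPS_MS = [0, 30, 80]
ONSET_LATENCIES_MS = [108, 138, 194]  # rounded 800/3200 BF means quoted in the text


def successive_differences(values):
    """Differences between neighbouring entries of a list."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def latency_after_trailing_onset(latencies_ms, gaps_ms):
    """Subtract the gap so each latency is relative to trailing-marker onset."""
    return [latency - gap for latency, gap in zip(latencies_ms, gaps_ms)]


latency_steps = successive_differences(ONSET_LATENCIES_MS)   # [30, 56]
gap_steps = successive_differences(GAPS_MS)                  # [30, 50]
residual_latencies = latency_after_trailing_onset(ONSET_LATENCIES_MS, GAPS_MS)  # [108, 108, 114]
```

With the rounded means the second step comes out at 56 ms, within 1 ms of the 55 ms stated above, which presumably reflects rounding of the underlying values. Once the gap is subtracted, the remaining latencies of 108 to 114 ms sit close to the roughly 110-ms N1m latency reported for marker onsets.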

#### *Experiment 2*

We calculated the averaged RAs for the trailing markers (**Figure 6**) and the relative amplitudes of the RAs in the iROI (**Table 3**). In the WF condition, onset responses for the trailing marker were observed for the 0-ms gap condition (**Figure 6A**, *green line*), but amplitudes were smaller than those of the 30-ms and 80-ms gap conditions (**Figure 6A**, *red* and *blue* lines). In the BF condition, the onset responses for the trailing marker were observed for the 0-ms gap condition (**Figure 6B**, *green line*). This tendency is consistent with the results in Experiment 1. We performed a 4 (FC: 800/800, 800/3200, 3200/800, 3200/3200) × 3 (GD: 0, 30, 80 ms) ANOVA on the peak amplitude for the trailing marker. The results showed a significant main effect of FC [*F*(3,15) = 9.93, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.67] but not of GD [*F*(2,10) = 0.63, n.s.]. The peak amplitudes for the 3200/800 trailing marker were larger than those for the 800/3200 trailing marker (*t* = 0.57, *p* < 0.05). The interaction between FC and GD was not significant [*F*(1.32,6.60) = 0.20, n.s.].

**Table 4** shows the peak RA latencies in the iROI from Experiment 2. A 4 (FC: 800/800, 800/3200, 3200/800, 3200/3200) × 3 (GD: 0, 30, 80 ms)ANOVA was performed on the peak RA latencies of the leading marker as well as the trailing marker to observe the

**FIGURE 5 | Averaged regional activities (RAs) in the left auditory inspection region of interest from 10 participants in Experiment 1. (A)** RAs for the WF conditions. **(B)** RAs for the BF conditions. Thick lines under the

horizontal axis of each RA represent the time range of stimulus presentation for the 0-ms (green), 30-ms (red), and 80-ms (blue) gaps. Filled lines denote the 800-Hz markers, while open lines denote the 3200-Hz markers.



Latencies are given as mean (±SD).

timing of the onset responses for both markers. For the leading marker, no significant main effect of FC [*F*(3,15) = 0.356, n.s., η2 <sup>p</sup> = 0.67] or GD [*F*(2,10) = 1.57, n.s., η<sup>2</sup> <sup>p</sup> = 0.24] were found. Peak RA latencies in response to the leading marker appeared to be around 100 ms after stimulus onset for all FCs. For the trailing marker, ANOVA revealed a significant main effect of GD [*F*(2,10) = 204.50, *p* < 0.001, η<sup>2</sup> <sup>p</sup> = 0.98], but not for FC

[*F*(3,15) = 2.17, n.s., η<sup>2</sup> <sup>p</sup> = 0.35]. The interaction between FC and GD was significant [*F*(2.74,13.68) = 16.86, *p* < 0.01, η<sup>2</sup> <sup>p</sup> = 0.76]. In WF conditions, there was no difference in latency between 0 and 30-ms gaps (800/800: 0 vs. 30 ms: *t* = 3.83; 3200/3200: 0 vs. 30 ms: *t* = 12.83). The N1m peak latency for the 0-ms gap in WF conditions appeared to be around 30 ms after the onset of the trailing marker.

**FIGURE 6 | Averaged regional activities (RAs) in the left auditory inspection region of interest from 6 participants in Experiment 2. (A)** RAs for the WF conditions. **(B)** RAs for the BF conditions. Thick lines under the horizontal axis of each RA represent the time range of stimulus presentation for the 0-ms (green), 30-ms (red), and 80-ms (blue) gaps. Filled lines denote the 800-Hz markers, while open lines denote the 3200-Hz markers.

# **DISCUSSION**

This study investigated spatio-temporal characteristics of cortical responses corresponding to WF and BF gap detection in human auditory cortex using MEG. In terms of their temporal characteristics, in Experiment 1 we found that N1m amplitude for the trailing marker in the WF condition was larger for 30-ms gaps than for 80-ms ones, while in the BF condition, it was larger when the trailing marker was 800 Hz than when it was 3200 Hz. In Experiment 2, N1m amplitude was larger for 800-Hz markers than for 3200-Hz markers, regardless of the type of condition. Spatially, Experiment 1 showed that 800 and 3200 Hz markers generated activation that differed in the anterior-posterior direction, while in Experiment 2 activity differed in all directions. These results indicate different activation patterns for WF and BF conditions in spatial and temporal dimensions.

#### **AEF SOURCE LOCALIZATION DURING WF AND BF CONDITIONS**

The MNE results from the group analysis, which focused on an onset response to the trailing marker, were in line with previous MEG and fMRI studies. Our current results show that activations were estimated to be in the auditory cortex in both Experiments 1 and 2: 800-Hz responses are located more anteriorly than 3200-Hz ones (**Figures 3** and **4**). Other MEG studies have shown that when stimulus frequencies are increased, the N1m shifts in the lateral-to-medial direction along the surface of the auditory cortex (Romani et al., 1982; Pantev et al., 1988, 1995). Several fMRI studies have reported that areas most responsive to high frequency tones are located in the posterior and medial regions, while those selective for low frequency tones are located at the anterior and lateral regions (Talavage et al., 2000; Formisano et al., 2003).

The source locations activated by the 3200-Hz tone were less concentrated, while those activated by the 800-Hz tone were reproducible and stable (especially in Experiment 1), as indicated by the relatively larger standard deviations in the y-axis for 3200-Hz tones compared with 800-Hz tones (**Figure 3** and **Table 2**). Additionally, the statistical significance of differences along the x- and z-directions differed between the two experiments. Because the participants of Experiments 1 and 2 were not identical, differences in the estimated center locations between the two experiments might in part be owing to differences in auditory cortex anatomy across individuals. Indeed, inter-participant variability in the location of the recorded cortical activity has often been reported in MEG and fMRI studies (e.g., Formisano et al., 2003; Lütkenhöner et al., 2003; Zatorre and Schönwiesner, 2011).

#### **REGIONAL ACTIVITY FOR THE WF AND BF CONDITIONS**

In both Experiments 1 and 2, in the WF condition, stable activation patterns of the N1m-peak amplitude were observed for both FCs (800/800 and 3200/3200) across 30- and 80-ms gap durations. In contrast, in the BF condition, the RA pattern for the trailing marker was different depending on the trailing marker's frequency (3200/800 or 800/3200; **Figures 5B** and **6B**). In both Experiments 1 and 2, amplitudes were significantly higher for 800-Hz tones than for 3200-Hz ones.

The results of Experiment 2 showed onset responses to the trailing marker in all conditions including WF with a 0-ms gap. Two of the six participants exhibited onset responses to the trailing marker with a 0-ms gap. When re-analyzing the N1m response in the WF condition after excluding these two participants, the N1m amplitudes to the 0-ms gap condition (0.29 for 800 Hz and 0.31 for 3200 Hz) were as small as waveform baseline (about 0.2, as indicated in **Figures 5** and **6**), although there were no significant differences among the three gap durations [*F*(2,6) = 3.34, n.s., η<sub>p</sub><sup>2</sup> = 0.53]. In the WF condition, neurophysiological sensory sensitivity to the gap might be highly correlated with its psychophysical threshold. Indeed, amplitude of gap-evoked responses has been shown to increase as a function of gap duration, and be correlated with the psychological threshold of each participant (Witton et al., 2012). Therefore, we assume that differences in N1m amplitude for the 0-ms gap in the WF condition might be related to individual differences in the sensitivity to gaps. For the BF condition, onset responses clearly appeared for all trailing markers, even when the gap was lacking. The response to the 0-ms gap in the BF condition was close in amplitude to those in which the gap lasted 30 or 80 ms, making comparisons between the gap and no-gap responses difficult for the BF condition. These complexities of response patterns in the BF condition might be connected to the large individual differences that are seen in gap-detection thresholds during the BF condition (Formby and Forrest, 1991).

The difference between the WF and BF conditions in the onset response to the trailing marker when no gap was present might indicate a difference in the underlying neural processing for WF- and BF-gap detection. For the WF condition, responses to the onset of the leading and trailing markers occurred for a single frequency in temporally close timing. In this case, a neuronal population in a single area should activate to respond to the leading and the trailing marker. As there was no additional cue indicating frequency change after the gap, the response to the trailing marker was not robust, especially when the amplitude difference between the two markers was absent or very small (i.e., a 0-ms gap). For the BF condition, the different responses to the onset of the leading and trailing markers occurred for different frequencies. Because the response to the trailing marker occurred in neural populations in different areas than the leading marker, the onset response to the trailing markers should be salient even when a gap is absent (Phillips, 1999; Eggermont, 2000; Heinrich et al., 2004; Lister et al., 2007).

#### **FUNCTIONAL CHANNELS AND TONOTOPIC ORGANIZATION**

In WF-gap detection, both leading and trailing markers are considered to be processed in a frequency-selective auditory pathway (i.e., channel) in the auditory stream for detecting temporal discontinuity, and WF-gap detection can be achieved peripherally with relative ease, with a gap-detection threshold around 2–3 ms (Plomp, 1964; Penner, 1977). Such small gap-detection thresholds have been explained in terms of the properties of the auditory periphery (Shailer and Moore, 1983). Conversely, in BF-gap detection, the leading and trailing markers are processed through separate frequency pathways because both markers usually have different or non-overlapping spectral content. BF-gap detection is presumably performed centrally (Phillips et al., 1997; Phillips, 1999; Eggermont, 2000). Multi-unit recordings in cat primary auditory cortex showed that the firing patterns of neurons in auditory cortex reflect minimum detectable gap thresholds that are similar to thresholds measured psychophysically in humans (Phillips et al., 1997; Eggermont, 2000). Eggermont (2000) suggested that the secondary auditory cortex and anterior auditory field are also involved in gap detection. Because the N1m response to the sound marker was suggested to be related to the psychophysical threshold in humans (Witton et al., 2012), the N1m sources, such as the supra temporal plane, could be involved in gap detection as well. In humans, tonotopic organization in auditory cortex has been verified with MEG (Romani et al., 1982; Pantev et al., 1988), EEG (Bertrand et al., 1988), and fMRI (Talavage et al., 2000; Formisano et al., 2003). Tonotopic organization has been observed in the superior temporal plane, including HG, Heschl's sulcus, and the superior temporal gyrus (e.g., Talavage et al., 2000). Examining the frequency channel from the perspective of tonotopic alignment in human auditory cortex could yield new and interesting findings.

So far, studies have reported modulation of EEG components related to the processing of the leading and the trailing markers via a sensor-level approach (Heinrich et al., 2004; Lister et al., 2007). Compared with EEG, MEG measurement allows for more advanced analyses, especially with respect to spatial resolution. By employing MEG, we showed the spatial separation between the frequency channels corresponding to the leading and trailing markers in terms of tonotopic organization in the auditory cortex. We assumed that frequency channels can be represented by the iROI and the RAs in the iROI (**Figures 2–6**). The investigation of iROI and RA in the auditory cortex is the first step to delineate cortical activation related to the processing of gap detection. Our approach using iROI and RA will be useful for investigating the gap-detection mechanism.

#### **LIMITATIONS AND FUTURE RESEARCH**

Using MEG/EEG for source localization of auditory responses to high frequency ranges can be difficult because of their limited spatial resolution. Studies that record auditory evoked brain responses often adopt 500–2000 Hz tones because the sources for these frequency tones have been consistently estimated to be in the auditory cortex (e.g., Stapells et al., 1994). Because we used a higher frequency tone (i.e., 3200 Hz) than usually examined frequency ranges, the results of iROI did not exhibit concentrated locations. Therefore, we were unable to make systematic analyses across the participants, i.e., we were not able to mark ROI on the standard brain first and then project it onto the individual's brain. We need to accumulate more evidence regarding the tonotopic organization of wider frequency ranges to confirm the reliability of our results. In addition, the gap duration adopted in our current study was determined somewhat arbitrarily and we did not measure gap-detection thresholds to WF and BF stimuli individually for each participant. Therefore, whether the durations used in our experiments really reflect the gap thresholds of the participants is unclear. Moreover, we did not measure the hearing levels for each participant, and we are unable to say whether auditory sensitivity to the tones might contribute to the amplitude differences found in the current data. A more detailed analysis will require several patterns of FCs for BF stimuli and individual gap-detection thresholds for each participant under appropriate stimulus settings. Our RA analysis that was based on tonotopic organization has provided a clue that helps us understand how gap detection in the auditory cortex is accomplished.

#### **CONCLUSION**

Auditory gap detection is one of the most popular issues with respect to human mental chronometry. Here, we used MEG and focused on how the auditory cortex responds to gaps bounded by tones of either the same or different frequencies. The source-activation maps and regional time-course waveforms indicated distinct patterns between the WF and BF conditions at the cortical level. One clear difference in temporal patterns between the two conditions was in the sensitivity to trailing marker onsets when no gap was present: the onset responses to the trailing marker depended on the length of the gap in the WF condition, whereas they depended mainly on the differences in tonal frequency in the BF condition. Further, we showed frequency-sensitive brain activity in the human auditory cortex that was related to gap detection and based on tonotopic organization. Frequency channels can be represented by the iROI and the RAs in the iROI. Although future studies are required, our findings open a new door to better understanding of gap-detection processing.

#### **ACKNOWLEDGMENTS**

We would like to express our gratitude to Drs. Koji Inui and Bernhard Ross for their valuable suggestions that helped improve our manuscript. This research was supported by a Research Grant from the Kawai Foundation for Sound Technology and Music, Grant-in-Aid from the Japan Society for the Promotion of Science for Scientific Research (A) 25240023 to Shuji Mori, and Education/Research Program/Research Center Formation Project (P & P) for female researchers (category F) and Ono Acoustics Research Fund to Takako Mitsudo.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 April 2014; accepted: 10 September 2014; published online: 09 October 2014.*

*Citation: Mitsudo T, Hironaga N and Mori S (2014) Cortical activity associated with the detection of temporal gaps in tones: a magnetoencephalography study. Front. Hum. Neurosci. 8:763. doi: 10.3389/fnhum.2014.00763*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Mitsudo, Hironaga and Mori. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Temporal dysfunction in traumatic brain injury patients: primary or secondary impairment?

# *Giovanna Mioni<sup>1,2</sup>\*, Simon Grondin<sup>1</sup> and Franca Stablum<sup>2</sup>*

*<sup>1</sup> École de Psychologie, Université Laval, Québec, QC, Canada*

*<sup>2</sup> Department of General Psychology, University of Padova, Padova, Italy*

#### *Edited by:*

*José M. Medina, Universidad de Granada, Spain*

#### *Reviewed by:*

*Shuji Mori, Kyushu University, Japan*

*Deana Davalos, Colorado State University, USA*

#### *\*Correspondence:*

*Giovanna Mioni, École de Psychologie, Pavillon Félix-Antoine-Savard, 2325, rue des Bibliothèques, Université Laval, Québec, QC G1V 0A6, Canada e-mail: mioni.giovanna@gmail.com*

Adequate temporal abilities are required for most daily activities. Traumatic brain injury (TBI) patients often present with cognitive dysfunctions, but few studies have investigated temporal impairments associated with TBI. The aim of the present work is to review the existing literature on temporal abilities in TBI patients. Particular attention is given to the involvement of higher cognitive processes in temporal processing in order to determine if any temporal dysfunction observed in TBI patients is due to the disruption of an internal clock or to the dysfunction of general cognitive processes. The results showed that temporal dysfunctions in TBI patients are related to the deficits in cognitive functions involved in temporal processing rather than to a specific impairment of the internal clock. In fact, temporal dysfunctions are observed when the length of temporal intervals exceeds the working memory span or when the temporal tasks require high cognitive functions to be performed. The consistent higher temporal variability observed in TBI patients is a sign of impaired frontally mediated cognitive functions involved in time perception.

#### **Keywords: traumatic brain injury, time perception, time reproduction, time production, time discrimination, executive functions**

Adequate temporal abilities are important for performing most everyday activities, and understanding how humans perceive time remains an engaging question. Good temporal skills are essential for normal social functioning and for tasks such as crossing a busy street, preparing a meal or organizing daily activities. Indeed, humans have to process time across a wide range of intervals, from milliseconds up to the hour range (Fraisse, 1984; Pöppel, 2004; Buhusi and Meck, 2005; Grondin, 2010).

One of the most influential models of time processing, the Scalar Expectancy Theory (SET; Gibbon et al., 1984) assumes that temporal judgments are based on three processing stages: the clock, memory, and decision stages. According to the SET model, the first stage consists of a pacemaker emitting pulses; these pulses pass through a switch and are stored into an accumulator. The content of the accumulator provides the raw material for estimating time (clock stage). The outcome from the accumulator is stored in the working memory system for comparison with the content in the reference memory, which contains a long-term memory representation of the number of pulses accumulated on past trials (memory stage). Finally, a decision process compares the current duration values with those in working and reference memory to decide on the adequate temporal response (decision stage).
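The three stages described above can be illustrated with a toy simulation. The Python sketch below is only an illustration under stated assumptions, not an implementation taken from Gibbon et al. (1984): the Poisson pacemaker, the 50-Hz pulse rate, the relative-difference decision rule, its 0.15 threshold and all function names are hypothetical choices made for the example.

```python
import random


def accumulate_pulses(duration_s, rate_hz=50.0, rng=random):
    """Clock stage: count pulses emitted by a Poisson pacemaker during the interval."""
    count = 0
    t = rng.expovariate(rate_hz)  # time of the first pulse
    while t < duration_s:
        count += 1
        t += rng.expovariate(rate_hz)  # waiting time until the next pulse
    return count


def judge(duration_s, reference_pulses, threshold=0.15, rate_hz=50.0, rng=random):
    """Memory and decision stages: compare the current count with reference memory."""
    current = accumulate_pulses(duration_s, rate_hz, rng)  # held in working memory
    relative_diff = abs(current - reference_pulses) / reference_pulses
    return "same" if relative_diff < threshold else "different"


rng = random.Random(1)
# Reference memory: mean pulse count accumulated over past 2-s trials (assumed).
reference = sum(accumulate_pulses(2.0, rng=rng) for _ in range(200)) / 200
print(judge(2.0, reference, rng=rng), judge(4.0, reference, rng=rng))
```

In this sketch, trial-to-trial variability arises only from the pacemaker; memory and decision noise could be added as further random terms.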

Errors in temporal processing may depend on different factors and occur at each stage of the SET model. Variations in the rate of pulse emission by the pacemaker are often reported to be an important cause of temporal errors. These variations have several causes, such as changes in body temperature (Hancock, 1993; Aschoff, 1998), experiencing emotions (Angrilli et al., 1997; Droit-Volet et al., 2013; Grondin et al., in press) and using pharmacological substances (Meck, 1996; Rammsayer, 2008). The switch is the part of the clock process that is directly associated with the mechanisms of attention. When the switch is closed, the pulses that are emitted by the pacemaker are accumulated in the counter. Indeed, it is the amount of attention paid to time that determines the accumulation of pulses in the counter. The demonstration of the role of attention in temporal processing is often based on the dual-task paradigm, in which attention has to be divided between temporal and non-temporal tasks. Results showed that when more attention is dedicated to time, more pulses are accumulated in the counter and fewer temporal errors are produced (Zakay and Block, 1996, 2004; Block and Zakay, 2006). When subjects are asked to estimate time and execute other cognitive tasks, the accuracy of time estimation is reduced because time estimation shares attentional resources with the non-temporal tasks and the amount of the shared resources depends on the nature of the second task (Brown, 1997). Finally, a part of the variance in the processing of time depends on memory and decisional processes (Penney et al., 2000; Pouthas and Perbal, 2004; Wittmann and Paulus, 2008). In fact, the quality of the interval's representation in reference memory is a source of variability in temporal processing (Pouthas and Perbal, 2004; Grondin, 2005). When the content of the accumulator is transferred to working memory for the comparison with the content stored in reference memory, the temporal representation retrieved from the reference memory might have been modified according to the characteristics of the memory system (Harrington and Haaland, 1999; Penney et al., 2000; Ogden et al., 2008).

# **DIFFERENT TEMPORAL RANGES AND DIFFERENT METHODS FOR INVESTIGATING TIME PERCEPTION**

For investigating time perception, two factors are critical, namely the temporal range (Grondin, 2001, 2012) and the method employed (Zakay, 1990, 1993; Grondin, 2008; Tobin et al., 2010). Regarding the temporal range, very brief intervals have received special attention because they are directly involved in motor coordination and in the processing of speech and music (Pöppel, 2004; Grondin, 2010). There are reasons to believe that distinct temporal processes are involved with intervals above vs. below 1 s (Penney and Vaitilingam, 2008; Rammsayer, 2008). While the basal ganglia and the cerebellum are involved in the processing of both the short and the long intervals, the contribution of the prefrontal regions seems limited to the processing of long intervals (Meck, 2005; Rubia, 2006). Indeed, the cerebellum and basal ganglia would be related to the internal clock mechanism, whereas the cognitive functions necessary to complete a temporal task would be supported by the prefrontal areas.

Traditionally, authors distinguish four methods for investigating time perception: time production, verbal estimation, time reproduction and time discrimination (Allan, 1979; Block, 1989; Zakay, 1993; Mangels and Ivry, 2000; Gil and Droit-Volet, 2011a). There are many other methods described in the timing and time perception literature (Grondin, 2008, 2010), but for the sake of the present review, it is relevant to focus on the classical ones. Time production and verbal estimation tasks may be considered two sides of the same coin, reflecting the same underlying temporal processes and mechanisms (Allan, 1979; Block, 1990). In time production tasks, a participant has to produce an interval equal to a previously specified duration (e.g., "Produce 2 s"). In verbal estimation tasks, after experiencing a target duration, a participant has to translate this subjective duration into clock units. Time production and verbal estimation are appropriate ways of investigating individual differences related to the internal clock (its speed rate or the variables influencing it). Because humans have a tendency to round off time estimates to chronometric units, verbal estimation produces more variability and is less accurate than the time production method. In time reproduction tasks, after first experiencing a target duration, a participant is asked to delimit a time period, usually with finger taps, equivalent to the target duration (Mioni et al., 2014). Compared to time production or verbal estimation tasks, a time reproduction task is less used to investigate individual differences at the internal clock level. In fact, the speed rate of the internal clock is the same when experiencing the target duration and when reproducing it. Finally, in time discrimination tasks, a participant has to compare the relative duration of two successive intervals (standard and comparison) by indicating which one was longer or shorter. 
Note that a time-order error (TOE) is often observed when performing a time discrimination task with the presentation of two successive stimuli. The TOE is defined as positive if the first stimulus is over-estimated or as negative if the first stimulus is under-estimated relative to the second stimulus (Hellström, 1985; Eisler et al., 2008). Just like with the time reproduction method, any clock rate variation would not be detected with a time discrimination task because the processing of both the standard and the comparison intervals would be affected (Zakay, 1990; Rammsayer, 2001; Mioni et al., 2013a).
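The argument that clock-rate variations show up in time production but cancel out in time reproduction can be made concrete with a small numerical sketch. It assumes a deliberately simplified, noise-free linear clock; the function names and the nominal rate of 1 pulse per second are hypothetical and serve only the illustration.

```python
def produce(target_s, clock_rate, nominal_rate=1.0):
    """Time production: stop when the count that the verbal label (e.g., "2 s")
    corresponds to under the nominal rate has been reached with the current rate."""
    target_count = target_s * nominal_rate
    return target_count / clock_rate


def reproduce(target_s, clock_rate):
    """Time reproduction: the target is encoded and reproduced with the same rate."""
    encoded_count = target_s * clock_rate
    return encoded_count / clock_rate


for rate in (0.8, 1.0, 1.2):
    print(rate, round(produce(2.0, rate), 2), round(reproduce(2.0, rate), 2))
```

With a slower clock (rate below 1) the produced interval lengthens and with a faster clock it shortens, whereas the reproduced interval stays at the target value; the same cancellation applies when the standard and comparison intervals of a discrimination trial are timed with the same rate.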

Researchers are using the entire repertory of methods but in most cases they give no explanation for the selection of a specific one. It is obvious that each method activates different time-related processes and presents some specific perceptual errors. For example, participants tested with the verbal estimation method are prone to report the estimated duration in round numbers and produce a great amount of variability compared to the other methods (Zakay, 1990; Grondin, 2010). Time reproduction is considered to be more accurate and reliable than time production and verbal estimation; however, it is less useful for investigating variations in the pacemaker rate. Block (1989) noted that time production and verbal estimation show more intersubject variability than time reproduction or time discrimination, but can be successfully used in studies where the rate of the internal pacemaker is manipulated. Others have pointed out that time discrimination is the purest measure of time perception because briefer intervals can be used, limiting the involvement of additional cognitive processes caused by the processing of long temporal intervals (Rubia et al., 1999; Block and Zakay, 2006; Mioni et al., 2013b). However, the time discrimination task is prone to TOE (Eisler et al., 2008).

Taking into consideration that each method activates different time-related processes, one way to select the appropriate method is to take the temporal interval under investigation into account (Gil and Droit-Volet, 2011a). Time discrimination tasks are often chosen for very brief intervals (from 50 ms up to a few seconds) while verbal estimation, time production, and time reproduction tasks are often used with longer intervals (Grondin, 2008, 2010).

Data collected from time reproduction, time production and verbal estimation tasks may be scored in terms of absolute score, relative error and/or coefficient of variation. Briefly, the absolute score reflects the magnitude of the error, regardless of its direction (Brown, 1985; see also Glicksohn and Hadad, 2012). The relative error reflects the direction of the timing error. It is measured by dividing the estimated duration (*Ed*) of the participant by the target duration (*Td*) (RATIO = *Ed/Td*). A score of 1 means that the estimation is perfect; a score above 1 reflects an overestimation; and a score below 1 means that the interval was underestimated. Finally, the coefficient of variation (CV) is an index of timing variability over a series of trials. The CV is the variability (for instance, one standard deviation) divided by the mean judgment. In the case of time discrimination tasks, performance is analyzed in terms of sensitivity and perceived duration (Grondin, 2008, 2010). Depending on the exact method used for discriminating intervals, different dependent variables can be used. For instance, for sensitivity, it could be the proportion of correct responses, *d*′, the difference threshold or a coefficient of variation (difference threshold divided by the bisection point); and, for perceived duration, it could be the proportion of "long" responses, *c*, or a bisection point on a psychometric function.
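The scoring indices described above can be sketched in a few lines of Python. Several details are assumptions made for the example rather than definitions taken from the cited studies: the absolute score is computed here as the mean unsigned error divided by the target duration, the CV uses the sample standard deviation, and the signal-detection indices treat a "long" response to the longer interval as a hit. All function names and data values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def timing_scores(estimates_s, target_s):
    """Absolute score, RATIO and CV for repeated judgments of one target duration."""
    ratio = mean(estimates_s) / target_s  # RATIO = Ed / Td
    # Assumed definition: mean unsigned error, expressed relative to the target.
    absolute = mean(abs(e - target_s) for e in estimates_s) / target_s
    cv = stdev(estimates_s) / mean(estimates_s)  # SD divided by the mean judgment
    return {"absolute": absolute, "ratio": ratio, "cv": cv}


def sdt_indices(hit_rate, false_alarm_rate):
    """Sensitivity d' and criterion c from hit and false-alarm proportions."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)
    c = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, c


# Hypothetical reproductions (in seconds) of a 2-s target
print(timing_scores([1.8, 2.1, 1.9, 2.4, 1.8], target_s=2.0))
# Hypothetical discrimination data: 80% hits, 30% false alarms
print(sdt_indices(0.80, 0.30))
```

In the hypothetical data the mean reproduction equals the target, so RATIO is 1 even though the absolute score and the CV are non-zero, which illustrates why the indices are reported together.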

### **CEREBRAL BASES OF TEMPORAL PROCESSING**

Different brain areas have been identified as playing a critical role in temporal processing. By identifying the brain areas and networks responsible for governing temporal processing, researchers can now study the causes of temporal impairment. Studies have shown that patients with focal lesions to frontal brain regions (both right and left frontal areas) are impaired in their ability to estimate temporal intervals (Nichelli et al., 1995; Rubia et al., 1997; Harrington et al., 1998; Mangels et al., 1998; Casini and Ivry, 1999). In particular, the integrity of the right dorso-lateral prefrontal cortex and right inferior parietal lobe has been shown to be necessary for the discrimination and estimation of intervals of several seconds (Rubia et al., 1997; Harrington et al., 1998; Mangels et al., 1998; Kagerer et al., 2002). The importance of the cerebellum in timing processes is also well-established. Patients with cerebellar lesions showed poor performances on both motor tapping and time estimation tasks, both in the range of hundreds of milliseconds and of a few seconds (Ivry and Keele, 1989; Ivry and Diener, 1991; Harrington et al., 2004; Gooch et al., 2010). The role of the basal ganglia in time estimation and motor timing functions is confirmed by studies with Parkinson's disease patients showing deficits in motor timing and time perception that can be improved with dopaminergic treatments (Jones et al., 2008; Merchant et al., 2008). Finally, the parietal cortex is also emerging as an important locus of multimodal integration of time, space and numbers and the right inferior parietal cortex seems to be necessary for rapid discrimination of temporal intervals (Walsh, 2003a,b; Alexander et al., 2005; Bueti and Walsh, 2009; Hayashi et al., 2013).

However, most of the brain areas and networks involved in temporal processing are also involved in other cognitive functions (Kane and Engle, 2002; Busch et al., 2005; Aharon Peretz and Tomer, 2007). While frontally mediated cognitive processes (i.e., attention, working memory, executive functions, etc.) play an important role in temporal processing (Rao et al., 2001; Perbal et al., 2002; Baudouin et al., 2006a,b; Mioni et al., 2013a,b), frontally mediated cognitive deficits are well-documented in traumatic brain injury (TBI) patients (Azouvi, 2000; Leclercq et al., 2000; Boelen et al., 2009; Stuss, 2011).

# **TIME PERCEPTION IN TRAUMATIC BRAIN INJURY PATIENTS**

Temporal impairments in patients with TBI are expected considering the disruption of cognitive functions involved in temporal processing. However, what is less clear is whether TBI patients present a "pure" temporal impairment due to disruption of some brain areas and of the network specifically involved in temporal processing, or present a temporal dysfunction mainly because of an impairment of the cognitive functions involved in temporal processing.

#### **MAIN CHARACTERISTICS OF TBI PATIENTS**

TBI presents unique problems to its survivors, their relatives and others involved in their rehabilitation. It occurs predominantly in young adults, most commonly males. Neuropathological evidence suggests a marked heterogeneity of injuries across individuals, and delineating the precise nature and extent of an injury in an individual might be very difficult. However, it is apparent that diffuse axonal injury is common, and that damage occurs most frequently in the frontal and temporal lobes. TBI usually results in immediate loss or impairment of consciousness, followed by a period of confusion. Following the return of orientation, TBI patients exhibit sensorimotor, cognitive and behavioral sequelae, which vary widely in their severity. In the majority of cases, it is the cognitive changes which are most disruptive and disabling in the long term. These may include deficits of attention, speed of processing, memory, planning and problem solving, and lack of self-awareness (Ponsford et al., 1995; Lezak, 2004).

Although investigating time perception in TBI patients is of particular interest from both a clinical and an experimental point of view, there is not much empirical work on the temporal dysfunctions of these patients, even though TBI patients often report such dysfunctions. Considering that an impaired sense of time could affect the daily adaptive functioning of patients recovering from TBI, understanding fully the causes of the temporal impairments observed in TBI patients is crucial. In addition to contributing to the understanding of the brain areas and networks involved in temporal processing, studying temporal dysfunctions in TBI patients should lead to the development of appropriate rehabilitation programs.

#### **METHODOLOGICAL ISSUES**

A computer-based search involving PsycInfo, PubMed and Web of Science was conducted using the terms: TBI, closed head injury, temporal perception, time estimation, time reproduction, time production, time discrimination, duration reproduction and duration production. In addition, reference lists from published reviews, books, and chapters were checked to identify studies that may not have been found when searching the databases. The search was conducted independently by the first author and by the library assistance at Padova University, and covered a period from 1950 to February 2014. These search methods resulted in a combined total of 88 published articles. Only studies specifically involving TBI patients and matched controls who performed temporal tasks (i.e., time reproduction, time production, verbal estimation, and time discrimination tasks) were included in the present review. Of the 88 papers identified, 27 were found in more than one computer-based source. Of the 61 distinct articles, the following were excluded from the review: five articles reporting animal data, two dissertation abstracts, 18 papers reporting data from other patients (cerebellar patients, autistic patients, etc.), and 27 articles in which the task used was not a timing or time perception task but was related, for instance, to processing speed deficits, recovery time after TBI, or temporal context memory. Finally, two articles were also excluded because they did not report new data, but data that had been published earlier in other articles.
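The selection flow described above can be checked with simple arithmetic. The Python sketch below only restates the counts given in the text; the variable names and category labels are ours.

```python
identified = 88   # combined total returned by the searches
duplicates = 27   # articles found in more than one source
unique = identified - duplicates

excluded = {
    "animal data": 5,
    "dissertation abstracts": 2,
    "other patient groups": 18,
    "no timing or time perception task": 27,
    "previously published data": 2,
}
included = unique - sum(excluded.values())

print(unique, sum(excluded.values()), included)
```

The counts are internally consistent: 61 distinct articles minus 54 exclusions leaves the seven studies retained for the review.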

In the end, in spite of the importance of adequate temporal abilities in everyday activities, only seven studies investigating time perception following TBI were identified and included in the present work (Meyers and Levin, 1992; Perbal et al., 2003; Schmitter-Edgecombe and Rueda, 2008; Anderson and Schmitter-Edgecombe, 2011; Mioni et al., 2012, 2013a,b). **Table 1** provides a summary of the findings reported in these articles.

#### **APPROACHING THE LITERATURE FROM A METHOD PERSPECTIVE**

Among the studies selected, 4 included performance on a time reproduction task (Meyers and Levin, 1992; Perbal et al., 2003; Mioni et al., 2012, 2013b), 3 on a verbal estimation task (Meyers and Levin, 1992; Schmitter-Edgecombe and Rueda, 2008; Anderson and Schmitter-Edgecombe, 2011), 2 on a time production task (Perbal et al., 2003; Mioni et al., 2013b), and 2 on a time discrimination task (Mioni et al., 2013a,b).

*aloud for the stimulus duration; Concurrent reading, temporal task* + *non-temporal task.*

*<sup>a</sup>References to neuropsychological tasks are not reported because the authors referred to different versions of the tasks; please refer to the specific articles for the appropriate references.*

*<sup>b</sup>Analyses were conducted on RATIO.*

The studies conducted with the time reproduction task showed that TBI patients were as accurate as controls (RATIO) but showed higher variability (CV), indicating a dysfunction in maintaining a stable representation of the temporal intervals. In the study conducted by Perbal et al. (2003), participants were also asked to perform a secondary (non-temporal) task together with the time reproduction task to investigate the effect of reduced attentional resources on time perception. Similar RATIOs were observed in TBI patients and controls in both the simple (time reproduction only) and concurrent (time reproduction + non-temporal task) conditions. Both TBI patients and controls under-reproduced temporal intervals, in particular when the secondary non-temporal task was performed together with the time reproduction task. When the CVs were taken into consideration, TBI patients were more variable than controls when the secondary task was included.
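The indices named above (RATIO, absolute score, CV) are reported here without formulas. The Python sketch below assumes definitions commonly used in the timing literature: RATIO as mean reproduced duration divided by target duration, absolute score as mean absolute error divided by target duration, and CV as standard deviation divided by mean. These definitions, the function name `timing_indices`, and the sample data are assumptions for illustration and may differ from the computations in the reviewed studies.

```python
from statistics import mean, stdev

def timing_indices(reproductions, target):
    """Return (RATIO, absolute score, CV) for one participant and one target.

    Assumed definitions (not taken from the reviewed papers):
      RATIO          = mean reproduced duration / target duration
      absolute score = mean of |reproduced - target| / target
      CV             = standard deviation / mean of the reproduced durations
    """
    m = mean(reproductions)
    ratio = m / target
    abs_score = mean(abs(r - target) / target for r in reproductions)
    cv = stdev(reproductions) / m
    return ratio, abs_score, cv

# Hypothetical reproductions (in seconds) of a 4-s interval.
stable = [3.9, 4.1, 4.0, 3.8, 4.2]
variable = [3.0, 5.0, 4.0, 2.6, 5.4]

for label, data in (("stable", stable), ("variable", variable)):
    ratio, abs_score, cv = timing_indices(data, target=4.0)
    print(f"{label}: RATIO={ratio:.2f}, absolute score={abs_score:.2f}, CV={cv:.2f}")
```

The two hypothetical data sets share the same mean, so they yield the same RATIO while differing in CV, which is the pattern described for TBI patients and controls in the time reproduction task.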

The studies conducted with a time production task confirmed the results obtained with the time reproduction task. TBI patients were as accurate as controls (RATIO) but showed higher temporal variability (CV) (Perbal et al., 2003; Mioni et al., 2013b). Regarding the impact of a concurrent non-temporal task, no effect was found (time production only vs. time production + non-temporal task) and this finding applies to both groups. TBIs and controls showed the same performances (RATIO and CV) in both simple and concurrent conditions (Perbal et al., 2003).

Three studies were conducted with a verbal estimation task but performance was analyzed in only two of them. Indeed, in Meyers and Levin's (1992) study, performance on the verbal estimation task was not analyzed due to the extreme variability noted in the TBI sample. Schmitter-Edgecombe and Rueda (2008), as well as Anderson and Schmitter-Edgecombe (2011), reported lower accuracy (absolute score), greater under-estimation (RATIO) and more variability (CV) in TBI patients than in controls.

Finally, two studies were conducted with a time discrimination task. TBI patients were less accurate (proportion of correct responses) and more variable (CV) than controls (Mioni et al., 2013a,b). Moreover, Mioni et al. (2013a) examined the TOE in the time discrimination task. TBI patients showed a greater TOE than controls, indicating a bias toward responding "short" when the standard was 500 ms (positive TOE) and toward responding "long" when the standard was 1300 ms (negative TOE). It is worth mentioning that a TOE is always observed in a time discrimination task (Hellström, 1985), but that its magnitude is greater in TBI patients.

In brief, TBI patients and controls have similar performances (absolute score or RATIO) when time reproduction and time production tasks are employed. However, TBI patients performed less accurately than controls when verbal estimation and time discrimination tasks were used. Moreover, in all studies, variability is higher with TBI patients than with controls.

#### **APPROACHING THE LITERATURE FROM A TEMPORAL RANGE PERSPECTIVE**

A review as a function of the length of the intervals under investigation first reveals that most studies (5 out of 7) are concerned with long intervals (between 4 and 60 s). Lower performances are observed only when temporal intervals are longer than 45 s, probably because the temporal intervals exceed the working memory span (Mimura et al., 2000). In the range between 4 and 38 s, TBI patients seem to be as accurate as controls in terms of absolute score and RATIO. Only two studies have investigated temporal abilities in TBI patients with short durations (in the range of milliseconds to a few seconds), which might be particularly interesting considering that some everyday activities are executed within this time range (Block, 1990; Block et al., 1998; Pöppel, 2004). Moreover, by employing short durations, there is a reduced load on higher cognitive processes because the processing of temporal intervals below 1 s is expected to be more automatic (Lewis and Miall, 2003). Nevertheless, it cannot be excluded that higher cognitive functions are deployed when short intervals are processed. This involvement is expected to be task-related rather than time-related. In fact, the involvement of higher cognitive processes is expected in tasks that require more cognitive control (e.g., time reproduction and time discrimination). The two studies that used short temporal intervals (between 500 and 1500 ms) reported that TBI patients were less accurate (absolute score and proportion of correct responses) than controls, in particular when the standard duration was 500 ms; when relative errors were analyzed, both TBI patients and controls over-estimated the 500 ms duration and under-estimated longer durations (1000 and 1500 ms). Consistent with previous findings obtained with longer temporal intervals, TBI patients showed higher temporal variability (Mioni et al., 2013a,b).

*<sup>c</sup>Analyses were conducted between TBI patients (oriented and disoriented TBI together) and controls.*

*<sup>d</sup>Analyses were conducted on RATIO as reported in the original paper.*

*<sup>e</sup>Analyses were conducted on proportion of correct responses.*

#### **LINKING TIME PERCEPTION AND NEUROPSYCHOLOGICAL TASKS**

As we mentioned before, frontally mediated cognitive processes (i.e., attention, working memory, executive functions, etc.) play an important role in temporal processing (Rao et al., 2001; Perbal et al., 2002; Baudouin et al., 2006a,b). Moreover, considering that TBI patients often present frontally mediated cognitive dysfunctions, it is of interest to determine what the impact of frontally mediated cognitive impairment on time perception is. **Table 2** provides a summary of correlation analyses conducted between time perception and neuropsychological tasks.

Although different duration ranges are employed in different studies, and although studies have consistently shown that different systems are involved in the processing of short (hundreds of milliseconds) and long (a few seconds) temporal intervals, only three studies (Schmitter-Edgecombe and Rueda, 2008; Anderson and Schmitter-Edgecombe, 2011; Mioni et al., 2013a) reported correlation analyses between cognitive functions and different ranges of temporal intervals. In Mioni et al. (2013a), results showed that attention, working memory and speed of processing functions were involved when the temporal interval was 1300 ms (long standard interval) in both TBI patients and controls, but working memory and speed of processing were involved when the standard interval was 500 ms only in TBI patients. In the other two studies (Schmitter-Edgecombe and Rueda, 2008; Anderson and Schmitter-Edgecombe, 2011) the results showed significant correlations between longer temporal intervals (45 and 60 s) and spatial and verbal memory.

Overall, when correlation analyses were reported, a representative index for the temporal tasks was calculated and correlated with performance on the neuropsychological tests. Regarding the time reproduction task, significant correlations were found with the working memory index (Perbal et al., 2003; Mioni et al., 2012, 2013b)<sup>1</sup>. Moreover, in Mioni et al. (2013b), significant correlations were also found between the time reproduction index (absolute score) and attention and executive functions indices, suggesting a high involvement of cognitive resources in accurately executing the time reproduction task.

In Perbal et al. (2003), the time production index of temporal accuracy (RATIO) correlated significantly with indices of free tapping and 1-s finger tapping<sup>2</sup>. Moreover, the time production index of temporal variability (CV) correlated with speed of processing. In Mioni et al. (2013b), there was minimal involvement of higher order cognitive functions (attention, working memory and speed of processing) in the time production task. In both Schmitter-Edgecombe and Rueda (2008) and Anderson and Schmitter-Edgecombe (2011), significant correlations were found between the verbal estimation task and indices of visuo-spatial and verbal memory tests. Finally, regarding the time discrimination task, both Mioni et al. (2013a,b) reported significant correlations between the time discrimination index and all measures of higher cognitive functions included (attention, working memory, speed of processing, and executive functions), indicating a high involvement of cognitive resources in the time discrimination task.

#### **LINKING TIME PERCEPTION AND CLINICAL CHARACTERISTICS**

Overall, the studies reported the temporal performance of 151 TBI patients (male = 86) and 129 controls (male = 79) matched by age (TBI = 35.48 years; controls = 34.10 years) and level of education (TBI = 12.01 years; controls = 12.75 years). The Glasgow Coma Scale (GCS; Teasdale and Jennett, 1974) was often used to define the severity of trauma. A score of 8 or less defines a severe TBI, a score between 9 and 12 defines a moderate TBI and a score above 12 defines a mild TBI. The majority of TBI patients (115 out of 151) were scored as severe TBI, 25 as moderate TBI and 11 as mild TBI. The mean duration of post-traumatic amnesia (PTA) (when available) was 33.54 days. The time between the injury and the testing varied considerably across studies, from 37 days to 31.40 months. The majority of patients included were tested a long time after trauma. In Meyers and Levin (1992), patients were evaluated with the Galveston Orientation and Amnesia Test (GOAT; Levin et al., 1979) and were divided into two groups according to their orientation level. The disoriented TBI patients showed a greater under-reproduction (RATIO) of long temporal intervals (15 s) compared to controls and, in the combined TBI group, the GOAT score correlated with reproduction of the long interval (15 s). Schmitter-Edgecombe and Rueda (2008) and Anderson and Schmitter-Edgecombe (2011) reported the results of correlation analyses conducted between performance on the temporal tasks and injury characteristics. Surprisingly, no significant correlations were found between the verbal estimation score (RATIO) and GCS, PTA or time since injury.

<sup>1</sup>Meyers and Levin (1992) is the fourth study that used a time reproduction task but no correlations with neuropsychological tasks are included.

<sup>2</sup>In the finger-tapping task, participants were required to tap with their index finger, as regularly as possible at the pace they preferred (free tempo) or at a 1 s pace (1 s tempo) (Perbal et al., 2003).

#### **Table 2 | Summary table of studies that have investigated the correlation between time perception and neuropsychological tasks.**

*RATIO, relative error; CV, coefficient of variation; Simple, temporal task alone; Concurrent, temporal task* + *non-temporal task; NA, not available; ns, not significant.*

# **DISCUSSION**

The present work was conducted to review the literature on the temporal dysfunctions of TBI patients and to evaluate whether the temporal impairment observed is due to a disruption at the clock stage or to dysfunctions of the higher cognitive functions involved in temporal processing. Taken together, the studies reported poorer temporal performances for TBI patients than for controls. This finding applies when investigations involve durations exceeding working memory span (Schmitter-Edgecombe and Rueda, 2008; Anderson and Schmitter-Edgecombe, 2011) or when temporal tasks require a high involvement of cognitive functions, as is the case with time reproduction and time discrimination (Mioni et al., 2013a,b).

Verbal estimation and time production tasks are suitable methods to highlight variations in the internal clock rate (Block, 1990; Block et al., 1998). Lower temporal performances were observed in TBI patients when the verbal estimation task was used, but only when long temporal intervals were employed (above 45 s) (Schmitter-Edgecombe and Rueda, 2008). In the case of time production, TBI patients were as accurate as controls both with long (4, 14, and 38 s: Perbal et al., 2003) and with short (500, 1000, and 1500 ms: Mioni et al., 2013b) intervals. The results suggest that TBI patients' temporal impairment is not due to a dysfunction at the internal clock level but to a dysfunction of higher cognitive functions involved in temporal processing. This hypothesis is confirmed by the correlational analyses between time production and indices of spontaneous tempo. The positive correlation between duration production and spontaneous tempo indicated that the participants with accelerated time pacing (shorter inter-tap interval) were those who produced shorter durations, and the participants with slower time pacing (longer inter-tap interval) were those who produced longer durations (Perbal et al., 2003). These results are consistent with the accumulation process postulated by Church's model (1984), in which changes in the internal clock rate lead to differences in the production of the same objective target duration.
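The link between clock rate and produced duration can be illustrated with a deliberately simplified pacemaker-accumulator sketch in Python. It assumes that a production ends when the participant's own clock has emitted as many pulses as a reference clock would emit during the target duration. The function name, the reference rate, and the rates tried are hypothetical, and the sketch is not an implementation of Church's full model.

```python
def produced_duration(target_s, clock_rate_hz, reference_rate_hz=10.0):
    """Objective duration (in seconds) produced for a given target.

    Simplifying assumption: the criterion equals the pulse count a reference
    clock would emit during the target, and the production ends when the
    participant's own clock has emitted that many pulses.
    """
    criterion_pulses = target_s * reference_rate_hz
    return criterion_pulses / clock_rate_hz

for rate in (8.0, 10.0, 12.0):  # hypothetical pulses per second
    duration = produced_duration(4.0, rate)
    print(f"rate = {rate:4.1f} Hz -> produced {duration:.2f} s for a 4-s target")
```

Under this assumption a faster clock yields shorter productions and a slower clock yields longer ones, which matches the direction of the correlation with spontaneous tempo described above.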

In the case of time discrimination, short temporal intervals were used to reduce the cognitive load required to process long temporal intervals (Block et al., 2010). Significant differences were found between TBI patients and controls, indicating that TBI patients were less accurate (proportion of correct responses) and more variable (CV) than controls. However, the high correlations observed between the time discrimination index and higher cognitive functions (i.e., attention, working memory and executive functions) suggest that the lower performances observed in TBI patients are mainly due to reductions at the level of the cognitive functions involved in temporal processing rather than to a dysfunction of the internal clock rate (Mioni et al., 2013a,b).

More complicated are the results observed with the time reproduction task. In both Mioni et al. (2012) and Perbal et al. (2003), participants performed a time reproduction task together with a concurrent non-temporal task with durations ranging from 4 to 38 s. The authors employed a concurrent non-temporal task to prevent participants from using counting strategies (Grondin et al., 2004; Hemmes et al., 2004) and to investigate the effect of reduced attentional resources on time perception. The authors expected lower temporal performance in the concurrent (time reproduction + non-temporal task) compared to the simple (time reproduction only) condition and expected a greater effect of the non-temporal task on TBI patients due to the attentional dysfunction often observed in these patients (Busch et al., 2005; Boelen et al., 2009; Stuss, 2011). Both TBI patients and controls were less accurate in the concurrent-task condition compared to the single-task condition, confirming that time perception is influenced by attention. When attention is divided between the temporal task and the non-temporal task, less attention is dedicated to time, fewer pulses are accumulated and, consequently, temporal intervals are under-reproduced (Zakay and Block, 1996, 2004). However, the effect of the non-temporal task was similar in TBI patients and controls, and both groups under-reproduced temporal intervals. Different results were observed when short intervals were used (500, 1000, and 1500 ms; Mioni et al., 2013b). TBI patients were less accurate (absolute score) and more variable (CV) than controls but showed a similar pattern of under-reproduction (RATIO). It is important to note that using the time reproduction task with short intervals is highly problematic due to the motor component required to perform the task (Droit-Volet, 2010; Mioni et al., 2014).
In time reproduction tasks, participants need to integrate their motor action in order to produce a precise button press to reproduce the temporal interval. Preparing and executing a motor action requires planning and execution of motor movements that might result in additional variance (Bloxham et al., 1987; Stuss et al., 1989; Caldara et al., 2004). Therefore, it is possible that the lower performances (higher absolute score and higher variability) observed were mainly due to motor dysfunctions rather than temporal impairment. In fact, neuromotor impairment is a common symptom in TBI patients, and reaction time (RT) tests with this population have consistently revealed slowness of information processing and a deficit in divided attention (Stuss et al., 1989; Walker and Pickett, 2007). Overall, performance on time reproduction tasks is highly correlated with the working memory index and with other measures of cognitive functions (i.e., attention, executive functions).
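The attentional account invoked earlier in this paragraph (less attention to time, fewer accumulated pulses, shorter reproductions) can be made concrete with a toy Python sketch. It assumes a constant pulse rate, a fixed fraction of pulses reaching the accumulator while the interval is presented, and a reproduction phase that runs with full attention. The function name, the `attention_share` parameter, and the values used are hypothetical.

```python
def reproduced_duration(target_s, attention_share, clock_rate_hz=10.0):
    """Reproduced duration when only part of the pulses is accumulated.

    attention_share is the hypothetical fraction of pulses (0 to 1) that
    reach the accumulator while the interval is presented; the reproduction
    itself is assumed to run with full attention.
    """
    if not 0.0 <= attention_share <= 1.0:
        raise ValueError("attention_share must lie between 0 and 1")
    encoded_pulses = target_s * clock_rate_hz * attention_share
    return encoded_pulses / clock_rate_hz

simple = reproduced_duration(14.0, attention_share=1.0)      # time reproduction only
concurrent = reproduced_duration(14.0, attention_share=0.8)  # plus a non-temporal task
print(f"simple: {simple:.1f} s, concurrent: {concurrent:.1f} s")
```

In this sketch any attention share below 1 shortens the reproduction, so a similar dual-task cost in two groups would correspond to a similar reduction in attention share.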

A consistent result across all studies is the higher variability observed in TBI patients compared to controls. The difficulty of maintaining a stable representation of duration might be accentuated in patients with TBI because of problems in working memory, but also in other high cognitive functions such as sustained attention or speed of processing (Brouwer et al., 1989).

Surprisingly, no strong correlations were observed between temporal performance and clinical measures. The only significant correlation was observed between the GOAT and the time reproduction task at 15 s (Meyers and Levin, 1992). The GOAT includes questions about both past and present events and is used to help caregivers learn when the person no longer has PTA. The significant correlation observed might explain the higher temporal variability observed in TBI patients. It is important to note that the lack of significant correlations can also be caused by low statistical power due, in most studies, to small sample sizes.

In sum, the review of the existing literature investigating time perception in TBI patients showed that temporal dysfunctions in TBI patients were related to deficits in cognitive functions involved in temporal processing, such as working memory, attention and executive functions, rather than to an impairment in time estimation *per se*. In fact, temporal dysfunctions were observed when the temporal intervals exceeded the working memory span (Schmitter-Edgecombe and Rueda, 2008; Anderson and Schmitter-Edgecombe, 2011) or when the tasks employed required higher cognitive functions to be performed (Mioni et al., 2013a,b). The consistently higher temporal variability observed is a sign of impaired frontally mediated cognitive functions that affect temporal representation. The involvement of higher cognitive functions in temporal processing is confirmed by the correlations observed between temporal tasks and working memory, attention and speed of processing for both short and long temporal intervals (Perbal et al., 2003; Schmitter-Edgecombe and Rueda, 2008; Mioni et al., 2013a,b).

#### **FUTURE STUDIES AND DIRECTIONS**

The review of the literature investigating time perception in TBI patients showed that authors have used a wide range of temporal intervals (from 500 ms to 60 s) and the classical time perception methods (Grondin, 2008, 2010). Despite the limited number of studies, the results point in the same direction and show that temporal dysfunction in TBI patients is mainly a secondary impairment due to deficits in the cognitive functions involved in temporal processing rather than to an impairment in time estimation *per se*. However, more studies should be conducted to draw a more complete picture of the temporal dysfunctions in TBI patients and of the source of these dysfunctions.

Future studies should assess temporal performance in tasks where time is marked by stimuli delivered in different modalities. All the studies conducted used visual stimuli, and it is well-known that the nature of the stimuli (i.e., visual, auditory, tactile) influences temporal performance (Grondin, 2010). In particular, temporal sensitivity is higher when the stimuli are presented in the auditory modality rather than in the visual modality (Grondin, 1993; Grondin et al., 1998). By reducing the noise produced by the presentation of visual stimuli marking time, it should be easier to identify the sources of temporal variability in TBI performance and to disentangle the variability produced by clinical characteristics from the variability due to methodological characteristics.

Moreover, future studies should investigate the effects of emotion on time perception in TBI patients. The literature reveals that marking time with images of faces expressing different emotions can affect time perception. Facial expressions of anger, fear, happiness, and sadness generate an overestimation of time, but the facial expression of shame generates an underestimation of time (Gil and Droit-Volet, 2011a,b). Some studies also have shown that the ability to read emotion in other people's faces can be selectively impaired as a result of the head injury (Jackson and Moffat, 1987; Bornstein et al., 1989; Fleming et al., 1996; Green et al., 2004; Martins et al., 2011). Investigating the effect of emotion on time perception in TBI patients can provide important information regarding the degree of emotional impairment in TBI patients.

Finally, some studies have shown that time perception (as measured in time estimation and time production tasks) may be related to impulsiveness (Barratt and Patton, 1983; Stanford and Barratt, 1996). In particular, the internal clocks of impulsive individuals may run faster than those of non-impulsive individuals (Barratt and Patton, 1983); therefore, an impulsive individual would likely experience some temporal distortions (Van den-Broek et al., 1992). TBI patients often demonstrate impulsive behavior, in particular after damage to the orbitofrontal cortex (Berlin et al., 2004). Although there is no clear evidence of a specific contribution of the orbitofrontal cortex to time perception compared with other parts of the frontal cortex, it is of interest to further investigate the different contributions of frontal areas to time perception and to distinguish how impulsivity, personality, and cognitive dysfunctions are involved in the temporal dysfunctions.

### **REFERENCES**


*Prospective Memory,* eds J. Glickson and M. Myslobodsky (New York, NY: World Scientific Publishing), 25–49. doi: 10.1142/9789812707123\_0002


Fraisse, P. (1984). Perception and estimation of time. *Annu. Rev. Psychol.* 35, 1–36.

Gibbon, J., Church, R. M., and Meck, W. H. (1984). Scalar timing in memory. *Ann. N. Y. Acad. Sci.* 423, 52–77. doi: 10.1111/j.1749-6632.1984.tb23417


Zakay, D., and Block, R. A. (2004). Prospective and retrospective duration judgements: an executive controls perspective. *Acta Neurobiol. Exp.* 64, 319–328.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 February 2014; accepted: 10 April 2014; published online: 30 April 2014.*

*Citation: Mioni G, Grondin S and Stablum F (2014) Temporal dysfunction in traumatic brain injury patients: primary or secondary impairment? Front. Hum. Neurosci. 8:269. doi: 10.3389/fnhum.2014.00269*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Mioni, Grondin and Stablum. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Does time ever fly or slow down? The difficult interpretation of psychophysical data on time perception

# *Miguel A. García-Pérez\**

*Departamento de Metodología, Facultad de Psicología, Universidad Complutense, Madrid, Spain*

#### *Edited by:*

*José M. Medina, Universidad de Granada, Spain*

#### *Reviewed by:*

*Yoshitaka Nakajima, Kyushu University, Japan Hannes Eisler, Stockholm University, Sweden*

#### *\*Correspondence:*

*Miguel A. García-Pérez, Departamento de Metodología, Facultad de Psicología, Universidad Complutense, Campus de Somosaguas, Madrid 28223, Spain e-mail: miguel@psi.ucm.es*

Time perception is studied with subjective or semi-objective psychophysical methods. With subjective methods, observers provide quantitative estimates of duration and data depict the *psychophysical function* relating subjective duration to objective duration. With semi-objective methods, observers provide categorical or comparative judgments of duration and data depict the *psychometric function* relating the probability of a certain judgment to objective duration. Both approaches are used to study whether subjective and objective time run at the same pace or whether time flies or slows down under certain conditions. We analyze theoretical aspects affecting the interpretation of data gathered with the most widely used semi-objective methods, including single-presentation and paired-comparison methods. For this purpose, a formal model of psychophysical performance is used in which subjective duration is represented via a psychophysical function and the scalar property. This provides the timing component of the model, which is invariant across methods. A decisional component that varies across methods reflects how observers use subjective durations to make judgments and give the responses requested under each method. Application of the model shows that psychometric functions in single-presentation methods are uninterpretable because the various influences on observed performance are inextricably confounded in the data. In contrast, data gathered with paired-comparison methods permit separating out those influences. Prevalent approaches to fitting psychometric functions to data are also discussed and shown to be inconsistent with widely accepted principles of time perception, implicitly assuming instead that subjective time equals objective time and that observed differences across conditions do not reflect differences in perceived duration but criterion shifts. 
These analyses prompt evidence-based recommendations for best methodological practice in studies on time perception.

#### **Keywords: perception of duration, psychophysical methods, psychophysical function, psychometric function, probabilistic models**

"There is nothing so practical as a good theory" Lewin (1951, p. 169)

"There is nothing so theoretical as a good method" Greenwald (2012)

# **INTRODUCTION**

Time is crucial in our lives. We do not have a sense organ for time, but even infants 3–4 months old show some ability to discriminate short durations of different lengths (Provasi et al., 2011; Gava et al., 2012). During childhood and adolescence we develop a fine-grained perception of time surely based on our daily experience with objective time. Finely-tuned time perception seems to arise after higher-level cognitive processes are sufficiently developed (Block et al., 1999; Droit-Volet, 2013) and given explicit experience with objective time. It is nevertheless unclear whether our ability to represent and quantify time stems from a timing mechanism (an "internal clock") that keeps track of time and can be read like a watch or, rather, only reflects our learning to translate experienced intervals into magnitudes expressed in the physical units of time that we have become accustomed to. In the former case, empirical differences between subjective and objective time would be caused by an acceleration or deceleration of the internal clock, which thus gives an inexact reading (i.e., the internal clock is fast or slow); in the latter, they would reflect a subjective lengthening or shortening of duration, which nevertheless gets properly quantified afterwards. Figuring out which of these processes is taking place seems impossible because the process is unobservable and any observable outcome is compatible with these two and maybe also other accounts (Block, 1990; Grondin, 2010). Mechanistic accounts of timing processes have the status of metaphors (Wackermann, 2011), but the difficulty of unraveling those processes does not reduce our interest in investigating the phenomenon of time perception and the factors that affect it.

Time perception studies range from descriptions of the limits of our ability to judge and discriminate elapsed time or time differences, through the study of subject variables or stimulus conditions that affect such judgments, to assessments of distorted time perception in patients with psychiatric or neurological disorders. Detailed reviews have been published that describe the state and outcomes of this research in several areas (e.g., Grondin, 2010; Spence and Parise, 2010; Vroomen and Keetels, 2010; Allman and Meck, 2012; Chen and Vroomen, 2013; Allman et al., 2014) and this paper will not provide yet another review of this type. Our focus is instead on the methods used to gather data on the relation between subjective and objective time and on the conflicting or puzzling results that use of alternative (and presumably interchangeable) methods sometimes provides. Our main goal is to analyze the assumptions underlying these methods and to derive implications for the interpretation of data gathered with them. For this purpose, widely accepted principles of time perception will be built into a model of performance in psychophysical tasks in order to analyze the underpinnings, implications, and shortcomings of the various methods. To define our context, section Experimental Methods Used in Studies on Time Perception gives a brief overview of the classes of subjective and semi-objective methods used in studies on time perception. Section A Unified Model of Performance across Semi-Objective Psychophysical Tasks presents a model of performance in semi-objective psychophysical tasks that includes widely accepted components. Application of the model to different tasks in sections Single-Presentation Methods and Paired-Comparison Methods reveals how they can render conflicting results when time perception as implemented in the model is invariant across tasks.
These results and their implications for best research practices are discussed in section General Discussion and Evidence-Based Recommendations.

# **EXPERIMENTAL METHODS USED IN STUDIES ON TIME PERCEPTION**

Research on time perception comprises two major types of study (Grondin, 2010): retrospective and prospective. *Retrospective studies* assess remembered time by asking observers to quantify the time elapsed while they had been engaged in a task performed without knowledge that their time estimates would be eventually assessed (e.g., Kellaris and Kent, 1992; Friedman and Kemp, 1998; Campbell and Bryant, 2007; Arstila, 2012; Misuraca and Teuscher, 2013; Dong and Wyer, 2014). In contrast, *prospective studies* assess immediately experienced time through psychophysical tasks that can be categorized as subjective or semi-objective. In subjective methods, observers report the perceived duration of the stimulus presented in each trial, whether through the *verbal estimation task* (requesting a numerical estimate of presentation duration), the *temporal reproduction task* (asking observers to reproduce a duration of the same length), or the *temporal production task* (asking observers to produce a duration lasting the amount of time indicated verbally). The ultimate goal of subjective methods is to estimate the *psychophysical function* expressing the functional relation of subjective to objective time (see **Figure 1**), a historical endeavor of classical psychophysics (Eisler, 1976; Allan, 1983).
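The psychophysical function sketched in Figure 1 can be explored numerically. Equation (1) is not reproduced in this section, so the Python sketch below assumes a power-law form, μ(t) = α(t − τ)^β, chosen only because it is consistent with the parameter values and the crossing point near 654 ms quoted in the Figure 1 caption. The function names `mu` and `mu_inverse` are introduced here for illustration.

```python
def mu(t, alpha=1.0, beta=1.0, tau=0.0):
    """Subjective duration under the assumed form mu(t) = alpha * (t - tau) ** beta."""
    return alpha * (t - tau) ** beta

def mu_inverse(s, alpha=1.0, beta=1.0, tau=0.0):
    """Objective duration whose subjective value is s (inverse of mu)."""
    return (s / alpha) ** (1.0 / beta) + tau

params = dict(alpha=5.0, beta=0.75, tau=-10.0)  # values quoted for panel (B)

for t in (200, 654, 700, 1200):
    print(f"t = {t:4d} ms -> mu(t) = {mu(t, **params):7.1f}")

# Midpoint on the subjective axis, mapped back to objective time.
subjective_mid = (mu(200, **params) + mu(1200, **params)) / 2
objective_at_mid = mu_inverse(subjective_mid, **params)
print(f"objective duration at the subjective midpoint: {objective_at_mid:.0f} ms")
```

Under this assumed form, durations below the crossing point map to larger subjective values and longer ones to smaller values, and the subjective midpoint between 200 and 1200 ms corresponds to an objective duration below 700 ms, in line with the caption's description.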

Semi-objective methods also involve the display of stimuli whose presentation duration varies across trials but observers are not requested to produce quantitative estimates. Instead, they are asked for categorical or comparative judgments. Semi-objective tasks include single-presentation methods and paired-comparison methods. In the former, each trial presents a single stimulus and a categorical response is requested; in the latter, two stimuli are presented in each trial and a comparison is requested. (Multiple-comparison methods involving three or more stimuli per trial will not be discussed here.) Single-presentation methods include the *bisection task* (asking observers to report whether the currently displayed duration is closer to a short or to a long exemplar repeatedly displayed in a preceding training phase) and the *temporal generalization task* (asking observers to report whether or not the currently displayed duration is the same as an exemplar duration also repeatedly displayed in a preceding training phase). Paired-comparison methods include the *two-alternative forced-choice (2AFC)* or *comparative task* (asking observers to indicate which of the two stimuli in each trial had, say, a longer duration) and the *equality* or *same–different task* (asking observers to indicate whether or not the two stimuli had the same duration), although many other variants exist. Almost invariably, semi-objective methods are used to estimate the *psychometric function* describing how the probability of some response varies with duration (see **Figure 2**). Landmark points on the psychometric function are subsequently extracted to characterize aspects of time perception, including the bisection point (BP, the 50% point on the psychometric function in a bisection task), the point of subjective equality (PSE, the location of the peak of the psychometric function in the temporal generalization task or the 50% point on the psychometric function in the 2AFC task), or the difference limen (DL, a measure of the spread of a psychometric function). Also often computed is the Weber ratio (WR) defined as either DL/BP or DL/PSE. Superficially, psychometric functions estimated with semi-objective methods offer an account that differs from that provided by psychophysical functions estimated with subjective methods. Yet, the psychometric function embeds the psychophysical function, as will be seen later.

**FIGURE 1 | Sample psychophysical functions described by Equation (1).** The psychophysical function describes the mapping of objective time (in physical units, e.g., ms) onto subjective time (in arbitrary units). **(A)** With α = 1, β = 1, and τ = 0, μ is the identity function by which subjective duration equals objective duration. **(B)** With α = 5, β = 0.75, and τ = −10, μ is a concave function by which durations shorter than *t* = 654 ms are subjectively perceived longer than they are whereas increasingly longer durations are progressively compressed. Solid black lines illustrate the mapping for sample durations *t* = 200 ms, *t* = 1200 ms, and their midpoint at *t* = 700 ms. Dashed lines in the bottom panel illustrate that, due to the non-linear μ, the midpoint between μ(200) and μ(1200) on the vertical axis does not correspond to the midpoint between *t* = 200 ms and *t* = 1200 ms on the horizontal axis.

Prospective studies typically include several conditions to investigate differences in subjective time across experimental manipulations (Ulbrich et al., 2007; Wearden et al., 2010; Ogden, 2013) or subject variables (Carlson and Feinberg, 1970; Eisler and Eisler, 1994; Glicksohn and Hadad, 2012). Differences in time perception could naturally be expected to occur as a result of these factors. Our theoretical analyses will assess how the use of alternative semi-objective tasks and the way in which data are analyzed can speak about these differences.

cases γ = 0.15. The red curve in **(A)** is the same psychometric function plotted in **Figure 4A**; the red curve in **(B)** is the same psychometric function

# **A UNIFIED MODEL OF PERFORMANCE ACROSS SEMI-OBJECTIVE PSYCHOPHYSICAL TASKS**

This section presents a unified model of performance in all the semi-objective psychophysical tasks used to investigate time perception. Specific models have been proposed for individual tasks, but they are not always applicable to other tasks and, thus, they offer a fragmentary view of time perception. The model used for our purpose here extends the signal detection theory (SDT) model of Gibbon (1981), which was indeed the basis for most models of performance in semi-objective tasks. The model includes a timing component and a decisional component determining how observers use the outcome of the timing component to make a judgment and give a response.

For the timing component, the model assumes that objective time is internally represented as described by the psychophysical function μ, irrespective of the mechanism by which this representation is obtained. The psychophysical function μ reflects a mapping of objective onto subjective time that can be measured with subjective methods. This does not imply that the psychophysical function estimated with those methods for some particular stimulus should exactly govern the judgments expressed by observers in semi-objective tasks with the same stimulus. Psychophysical functions vary with the subjective method used to estimate them (Carlson and Feinberg, 1970; Angrilli et al., 1997; Gil and Droit-Volet, 2011), but also with instructions (Rattat and Droit-Volet, 2012) or with the interface used to collect responses (Mioni et al., 2014). Yet, judgments reported in semi-objective tasks must arise from a representation of time analogous to that subserving performance in subjective tasks. Extensive research has shown that the psychophysical function for duration is well-approximated by the three-parameter power function

$$
\mu(t) = \alpha (t - \tau)^{\beta}, \tag{1}
$$

with an exponent β close to unity and a shift τ close to zero. Parameter values vary across stimulus types and experimental conditions (Marks and Stevens, 1968; Fagot, 1975; Eisler, 1976; Dawson and Miller, 1978; Allan, 1983) and one must consider a family μ*i*, in which the subscript (also in the parameters) denotes condition. Within a condition, subjective duration exceeds objective duration within the range of *t* for which μ*i*(*t*) > *t* whereas subjective duration is shorter than objective duration wherever μ*i*(*t*) < *t*. **Figure 1** showed two psychophysical functions described by Equation (1). If μ(*t*) = *t* (**Figure 1A**), subjective and objective time run identically; if β ≠ 1 (**Figure 1B**), subjective time runs faster or slower than real time (see Gibbon, 1986, his Figure 1). Across conditions, μ*i*(*t*) ≠ μ*j*(*t*) implies that time runs (relatively) faster in one condition than in the other.
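As an illustration, Equation (1) can be evaluated numerically with the parameter values quoted in the caption of Figure 1B. The following is a minimal sketch, not part of the original analyses; the function names `mu` and `mu_inverse` are ours, and the inverse assumes *t* > τ and α, β > 0.

```python
def mu(t, alpha=1.0, beta=1.0, tau=0.0):
    """Equation (1): mean subjective duration for objective duration t (t > tau)."""
    return alpha * (t - tau) ** beta


def mu_inverse(s, alpha=1.0, beta=1.0, tau=0.0):
    """Objective duration whose mean subjective duration equals s."""
    return tau + (s / alpha) ** (1.0 / beta)


# Parameter values quoted in the caption of Figure 1B.
params = dict(alpha=5.0, beta=0.75, tau=-10.0)

# Subjective durations for the sample durations of Figure 1 and for the
# crossover duration (654 ms) mentioned in its caption.
for t in (200.0, 654.0, 700.0, 1200.0):
    print(t, round(mu(t, **params), 1))

# Objective duration that maps onto the subjective midpoint of 200 and 1200 ms.
s_mid = 0.5 * (mu(200.0, **params) + mu(1200.0, **params))
print(round(mu_inverse(s_mid, **params), 1))
```

Under these parameter values, the objective duration at the subjective midpoint falls short of the objective midpoint of 700 ms, which is the point made by the dashed lines in Figure 1B.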

plotted with continuous black trace in **Figure 5A**.

The parameters of μ are estimated from the durations reported by observers across repeated presentations of a set of objective durations. The fitted function thus reflects the average subjective duration of a stimulus of duration *t*. Scalar expectancy theory derived from studies with non-human animals (Church and Deluty, 1977; Gibbon, 1977; Church and Gibbon, 1982) posits that the standard deviation of subjective duration is proportional to the average subjective duration at *t*, namely,

$$
\sigma(t) = \gamma \mu(t). \tag{2}
$$

This is known as the *scalar variance* assumption or the *scalar property*. A family of functions σ<sub>*i*</sub> must also be considered across conditions. With scalar variance, the coefficient of variation σ*i*(*t*)/μ*i*(*t*) of the distribution of subjective durations equals γ<sub>*i*</sub> at all *t*. Scalar variance holds only approximately in human timing although the standard deviation certainly increases with *t* (Wearden, 1991b; Lewis and Miall, 2009). In any case, the subjective duration *S* of a stimulus of duration *t* under condition *i* can be regarded as a random variable with mean μ*i*(*t*) and standard deviation σ*i*(*t*) and, without loss of generality, *S* is assumed to be normally distributed (**Figure 3**). This provides the output of the timing component in the model.
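The timing component just described can be sketched as a sampler that draws subjective durations from a normal distribution with mean μ(*t*) and standard deviation γμ(*t*). This is only an illustrative simulation under the stated assumptions; the helper name `sample_subjective`, the seed, and the sample size are ours, and γ = 0.15 is borrowed from the figure captions.

```python
import random
import statistics


def mu(t, alpha=1.0, beta=1.0, tau=0.0):
    """Equation (1): mean subjective duration."""
    return alpha * (t - tau) ** beta


def sample_subjective(t, gamma, n, rng, **mu_params):
    """Draw n subjective durations for objective duration t under Equation (2)."""
    mean = mu(t, **mu_params)
    return [rng.gauss(mean, gamma * mean) for _ in range(n)]


rng = random.Random(1)
gamma = 0.15

# The empirical coefficient of variation should stay near gamma at every t.
for t in (200.0, 700.0, 1200.0):
    s = sample_subjective(t, gamma, 20000, rng, alpha=5.0, beta=0.75, tau=-10.0)
    cv = statistics.stdev(s) / statistics.mean(s)
    print(t, round(cv, 3))
```

The constancy of the coefficient of variation across durations is what distinguishes the scalar property from a model with fixed standard deviation.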

This characterization implies that the subjective duration elicited by presentation of a stimulus of duration *t* is a random value sampled from the applicable distribution, regardless of the psychophysical task or the occasion that motivated the presentation of such stimulus. In semi-objective psychophysical tasks, observers are assumed to make a decision and respond according to the values drawn for each of the stimuli presented in each trial. Modeling performance on these tasks thus calls for a decision rule specifying how observers use the current sample (or samples) of subjective duration to make a judgment and give a response. This decisional component must vary across tasks but its elements must be consistent in the sense that the decision rule for some task cannot imply aspects or processes that are explicitly regarded as inexistent or impossible under alternative tasks.

partly-occluded thick curve on the plane surface shows this psychophysical function). The distributions obey the scalar property in Equation (2) with γ = 0.15.

This is a reasonable demand considering that trials from different tasks can be interwoven in a session, with the response requested on each trial withheld until after stimulus presentation. In such conditions, duration(s) must be internally represented before observers know which decision rule must be used to give a response. Empirical evidence shows that the operation of the stimulus-dependent component precedes and is unaffected by the task-dependent decisional component (Schneider and Komlos, 2008; García-Pérez and Alcalá-Quintana, 2012; García-Pérez and Peli, 2014). On the same grounds, one must assume that μ<sub>*i*</sub> and σ<sub>*i*</sub> do not vary across tasks when stimuli and conditions are invariant.

Sections Single-Presentation Methods and Paired-Comparison Methods present the model (timing output and decision rule) for performance on the most common semi-objective psychophysical tasks and discuss other assumptions needed to interpret the data.

#### **SINGLE-PRESENTATION METHODS**

In single-presentation methods, observers are shown a single stimulus on each trial for them to report a categorical judgment. Making this judgment nevertheless requires that the internal representation of the current stimulus is judged relative to what has sometimes been called an *internal standard*. The two methods described next differ as to how the internal standard is instated and what type of categorical judgment is requested.

#### **THE TEMPORAL GENERALIZATION TASK**

The temporal generalization task consists of a training phase and a test phase. In the training phase, observers are repeatedly shown instances of an exemplar duration *t*st designated as the standard. The test phase comprises a series of trials each of which displays a duration *t* around *t*st and asks observers to indicate whether that duration was the same as *t*st. A plot of the proportion of "same" responses as a function of test duration describes the empirical psychometric function.

When both phases use the same stimulus and conditions are identical, subjective duration must be governed by the same μ and σ in both phases. Hence, the data are expected to reveal that the test duration at which "same" responses are maximally prevalent is *t* = *t*st. But stimuli or conditions may differ across phases: The training phase may use a neutral stimulus such as an oval while the test phase uses emotional stimuli such as angry faces or taboo words. In such case, μ and σ will differ across phases if subjective time runs differently in each condition. Let subscripts "s" and "t" respectively denote the functions that apply to the standard and to the test. Then, one would expect the data to disclose the test duration whose subjective duration equals the subjective duration of the standard. Formally, this is the value *t*PSE at which μt(*t*PSE) = μs(*t*st).
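The condition μt(*t*PSE) = μs(*t*st) can be solved in closed form when both functions follow Equation (1). The sketch below is illustrative only: the function name `t_pse` and the 600-ms standard are hypothetical choices of ours, and the parameter values are borrowed from the figure captions (α = 5, β = 0.75, τ = −10 for the standard; α = 5.4 for the test).

```python
def mu(t, alpha, beta, tau):
    """Equation (1): mean subjective duration."""
    return alpha * (t - tau) ** beta


def t_pse(t_st, standard_params, test_params):
    """Test duration whose mean subjective duration matches that of the standard."""
    s_st = mu(t_st, **standard_params)
    # Invert Equation (1) for the test condition.
    return test_params["tau"] + (s_st / test_params["alpha"]) ** (1.0 / test_params["beta"])


standard = dict(alpha=5.0, beta=0.75, tau=-10.0)
test_cond = dict(alpha=5.4, beta=0.75, tau=-10.0)  # subjective time runs faster

print(t_pse(600.0, standard, standard))   # identical conditions
print(t_pse(600.0, standard, test_cond))  # different conditions
```

When the two conditions share the same function, the solution is the standard duration itself; with a larger α in the test condition, a shorter test duration suffices to match the subjective duration of the standard.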

Fitting model-based psychometric functions to the data is useful for these purposes. Formal models from which theoretical psychometric functions for the temporal generalization task can be derived were first proposed by Church and Gibbon (1982) and Wearden (1992). The model described next differs from these in minor respects discussed in section Differences with Previous Models.

#### *Model and assumptions*

The training phase helps observers to set an anchor point (internal standard) from the sample of subjective durations elicited by repeated presentation of the standard. The anchor is presumably placed at *S*st = μs(*t*st) and kept invariant. In the test phase, observers compare the random subjective duration *S* elicited by the current test with *S*st and respond according to the magnitude of |*S* − *S*st|. Observers are assumed to have a limited resolution to tell small differences from zero, for otherwise they would always respond "different." Under these assumptions, the decision rule states that observers respond "same" when |*S* − *S*st| ≤ δ and "different" when |*S* − *S*st| > δ, where δ is the *resolution limit* and the interval from *S*st − δ to *S*st + δ is the *indifference region*.

The mathematical form of the psychometric function for "same" responses is easily derived from these assumptions. Given that *S*st = μs(*t*st) is assumed constant and that *S* in the test phase is normally distributed with mean μt(*t*) and standard deviation σt(*t*), the probability Ψ<sub>same</sub> of a "same" response varies with test duration *t* as

$$
\begin{split}
\Psi_{\text{same}}(t) &= \text{Prob}\left(|S - S_{\text{st}}| \le \delta\right) \\
&= \text{Prob}\left(S_{\text{st}} - \delta \le S \le S_{\text{st}} + \delta\right) \\
&= \Phi\left(\frac{S_{\text{st}} + \delta - \mu_{\text{t}}(t)}{\sigma_{\text{t}}(t)}\right) - \Phi\left(\frac{S_{\text{st}} - \delta - \mu_{\text{t}}(t)}{\sigma_{\text{t}}(t)}\right),
\end{split} \tag{3}
$$

where Φ is the unit-normal cumulative distribution function. The psychophysical function μ is thus embedded in the psychometric function, as is the scalar property. **Figure 4A** shows the psychometric function when μ<sub>s</sub> = μ<sub>t</sub> ≡ μ and σ<sub>s</sub> = σ<sub>t</sub> ≡ σ (see the legend for parameter values) so that Equation (3) becomes

$$
\Psi_{\text{same}}(t) = \Phi\left(\frac{S_{\text{st}} + \delta - \mu(t)}{\sigma(t)}\right) - \Phi\left(\frac{S_{\text{st}} - \delta - \mu(t)}{\sigma(t)}\right). \tag{4}
$$

governs perception of duration for the training stimulus in both columns and also for the test stimulus in **(A)**; the surface in **(B)** only differs in that α = 5.4

ordinate at each value of *t* equals the area under the cross-section at *t* of the surface in the top panel within the region that results in "same" responses.

Even in these conditions, Ψ<sub>same</sub> does not peak at the standard duration because *t* = *t*st maximizes Equation (4) only when σ(*t*) is a constant function independent of *t*. When σ(*t*) obeys the scalar property, Equation (4) peaks at *t* = [(√(1 + 4γ<sup>2</sup>) − 1)/(2γ<sup>2</sup>)] *S*st. The right-hand side of this expression evaluates to 0.99*S*st when γ = 0.1 and to 0.92*S*st when γ = 0.3, but the spacing used in empirical studies (typically 100–200 ms) is too coarse to reveal this shift. If differences in stimuli or conditions across training and test phases affect subjective time, Ψ<sub>same</sub> is further shifted because the peak of Equation (3) is further away from *t* = *t*st when μ<sub>s</sub> ≠ μ<sub>t</sub> (see **Figure 4B**). Note also that differences between σ<sub>s</sub> and σ<sub>t</sub> are inconsequential, as σ<sub>s</sub> plays no role in Ψ<sub>same</sub>. The gradients on either side of Ψ<sub>same</sub> are determined by how σ<sub>t</sub> varies with *t*. With scalar variance, Ψ<sub>same</sub> is positively skewed. The skew is further emphasized when μ<sub>t</sub> is a concave function (β<sub>t</sub> < 1 in Equation 1) and reduced or even reversed when μ<sub>t</sub> is convex (compare the two curves in **Figure 2A**).
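The location of the peak can be checked numerically. The sketch below evaluates Equation (4) under the simplifying assumptions μ(*t*) = *t* and σ(*t*) = γ*t*, locates the maximum on a fine grid, and compares it with the expression given in the text. The function names, the 600-ms standard, and the small resolution limit δ = 10 are illustrative choices of ours.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # unit-normal cumulative distribution function


def psi_same(t, s_st, delta, gamma):
    """Equation (4) with mu(t) = t and sigma(t) = gamma * t."""
    sd = gamma * t
    return PHI((s_st + delta - t) / sd) - PHI((s_st - delta - t) / sd)


def numeric_peak(s_st, delta, gamma, step=0.1):
    """Grid search for the maximum between 0.5 * s_st and 1.5 * s_st."""
    grid = [0.5 * s_st + k * step for k in range(int(s_st / step) + 1)]
    return max(grid, key=lambda t: psi_same(t, s_st, delta, gamma))


def closed_form_peak(s_st, gamma):
    """Peak location stated in the text for scalar variance."""
    return (sqrt(1.0 + 4.0 * gamma**2) - 1.0) / (2.0 * gamma**2) * s_st


for gamma in (0.1, 0.15, 0.3):
    print(gamma, numeric_peak(600.0, 10.0, gamma), closed_form_peak(600.0, gamma))
```

With a small δ the grid maximum should land close to the closed-form value and below the standard duration, consistent with the shift described above.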

Because Ψ<sub>same</sub> does not peak at *t* = *t*st even when μ<sub>s</sub> = μ<sub>t</sub>, observed shifts of Ψ<sub>same</sub> away from *t*st cannot be interpreted as evidence that μ<sub>s</sub> ≠ μ<sub>t</sub> and, thus, of differences in subjective time in the conditions of the test phase relative to the training phase. Other difficulties in the interpretation of data from the temporal generalization task will be discussed in section Summary and Discussion of Single-Presentation Methods.

#### *Differences with previous models*

Wearden (1992) proposed three variants of the model of Church and Gibbon (1982), which are discussed next in our notation. All variants share two characteristics: (1) subjective duration is assumed accurate on average so that the psychophysical function is μ(*t*) = *t* in all cases and (2) the internal standard is not regarded as a fixed value but as a random variable with mean *t*st. Also, the resolution parameter (called threshold by Wearden) is fixed at δ in some variants but regarded as random with mean δ in others. All three variants use a decision rule analogous to that in section Model and Assumptions and they differ as to the assumed variances of subjective duration, internal standard, and threshold (when random).

The *modified-Church-and-Gibbon* (MCG) model assumes σ(*t*) = 0. Thus, subjective duration is not a random variable and, given μ(*t*) = *t*, it is identical to objective duration. This model places the scalar property at the internal standard (drawn in each trial from the memory representation of the standard) whereas the threshold is regarded as a random variable with fixed variance. The MCG model is structurally equivalent to our model because the distribution of |*X* − *Y*| is the same regardless of which of *X* or *Y* is the random variable and which is the constant and also because the variability of δ can be formally transferred to the internal standard. But this model presents an empirical difficulty: If subjective duration equals objective duration (a consequence of assuming μ(*t*) = *t* and σ(*t*) = 0), observers would be perfectly accurate in paired-comparison tasks asking them to judge the relative durations of two stimuli displayed in each trial (see section Paired-Comparison Methods).

The *fixed-threshold* model removes the variability of δ while leaving other assumptions of the MCG model intact. This model is also formally equivalent to our model and to the MCG model, and results reported by Wearden (1992; see his Table 1) reveal that the estimated variability of the internal standard increases under this model to capture the variability attributed to threshold under the MCG model. Finally, the *timing-variability* model assumes scalar variance for subjective duration in place of σ(*t*) = 0, also assuming scalar variance with the same γ for the internal standard and a threshold randomly drawn in each trial from a distribution with fixed variance. Structurally, this model is not equivalent to the others because it involves a ratio of independent normal random variables, whose distribution is not normal (Simon, 2002, formula 7.7). The model nevertheless produces nearly identical psychometric functions and is also functionally equivalent to the previous two and to our model, although scalar variance affects two random variables here and must result in smaller estimates of γ to account for the same data (see Table 1 in Wearden, 1992).

Because all the models of Wearden (1992) use μ(*t*) = *t*, they explicitly assume that subjective and objective time run identically and, hence, the models are incompatible with the notion that subjective time may run at a different pace, or with an interest in assessing what that pace may be and how it varies across conditions. Fitting these models to empirical data enforces the assumption of veridical time perception and succeeding at that shows that temporal generalization data are compatible with the notion that subjective time is equivalent to objective time. This outcome is not to be taken as a proof that time perception is never distorted relative to objective time but as a manifestation of non-identifiability issues hampering the interpretation of data, whose discussion is deferred to section Summary and Discussion of Single-Presentation Methods.

#### **THE TEMPORAL BISECTION TASK**

The temporal bisection task also consists of a training phase and a test phase. In the training phase, observers are shown repeated instances of exemplar durations *t*short and *t*long designated short and long, respectively. The test phase comprises trials displaying a test duration *t* typically between *t*short and *t*long. Observers are asked to judge whether the current test duration is closer to the short or to the long exemplars. A plot of the proportion of "long" responses at each test duration describes the empirical psychometric function and the 50% point on this function is taken to be the BP.

Performance is governed by common μ and σ if stimuli and conditions do not differ across phases. The BP might then be expected to lie at the midpoint between *t*short and *t*long only if μ is linear (**Figure 1A** and the blue curve in **Figure 2B**). With non-linear μ, the objective midpoint does not map onto the subjective midpoint (**Figure 1B**) and the BP would be expected to lie at the test duration associated with the subjective midpoint (red curve in **Figure 2B**). Note that the two cases in **Figure 2B** reflect an exquisite ability to bisect the subjective continuum; the different BPs simply reflect the form of μ. Bisection tasks are also used with different stimuli or conditions in the training and test phases so that μ<sub>t</sub> may differ from μ<sub>s</sub> and σ<sub>t</sub> may differ from σ<sub>s</sub> (using the same notation as before). In such cases, one expects the BP to identify the test duration that is subjectively midway between the subjective durations of the short and long standards.

Quite often, cumulative Gaussian or logistic functions are fitted to data to estimate the BP and the DL from location and slope parameters. Formal models from which suitable psychometric functions can be derived were first proposed by Gibbon (1981) and Wearden (1991a; see also Wearden and Ferrara, 1995). The model described next differs from these in some respects discussed in section Differences with Previous Models.

#### *Model and assumptions*

The training phase helps observers to set anchor points from the sample of subjective durations elicited by repeated presentation of short and long exemplars. In principle, the anchor points are assumed to be placed at *S*<sub>s</sub> = μs(*t*short) and *S*<sub>l</sub> = μs(*t*long) and also to be invariant. In the test phase, observers compare the subjective duration *S* of the current stimulus with *S*<sub>s</sub> and *S*<sub>l</sub> and respond according to which of |*S* − *S*s| or |*S* − *S*l| is smaller. In principle, "long" responses are in order when |*S* − *S*l| < |*S* − *S*s|, which simplifies to *S* > (*S*<sub>s</sub> + *S*l)/2. For all purposes, this is as if observers set a single anchor at the subjective midpoint *S*mp = (*S*<sub>s</sub> + *S*l)/2. Assuming that observers use a point criterion and always classify *S* as closer to the short or the long exemplars is incompatible with assumptions in the model for temporal generalization: If observers can use a point criterion in the bisection task to decide whether *S* is above or below *S*mp, they should show the same capability in the temporal generalization task for deciding whether *S* is above or below *S*st, and thus they would always respond "different." Observers surely have limited resolution also in the bisection task so that they respond "short" when *S* < *S*mp − δ, respond "long" when *S* > *S*mp + δ, and cannot tell when *S*mp − δ ≤ *S* ≤ *S*mp + δ, also involving a resolution limit and an indifference region. But, because observers are forced to respond "short" or "long," they must use an extra criterion when they cannot tell. The model assumes that they respond "long" with probability ξ, reflecting their *response bias* and regardless of the criteria that render such outcome.

The psychometric function Ψ<sub>long</sub> for "long" responses is easily derived from these assumptions. Since *S*mp is assumed constant and *S* in the test phase is normally distributed with mean μt(*t*) and standard deviation σt(*t*), the probability of a "long" response varies with test duration *t* as

$$
\begin{split}
\Psi_{\text{long}}(t) &= \text{Prob}\left(S > S_{\text{mp}} + \delta\right) + \xi\, \text{Prob}\left(S_{\text{mp}} - \delta \le S \le S_{\text{mp}} + \delta\right) \\
&= \left[1 - \Phi\left(\frac{S_{\text{mp}} + \delta - \mu_{\text{t}}(t)}{\sigma_{\text{t}}(t)}\right)\right] + \xi \left[\Phi\left(\frac{S_{\text{mp}} + \delta - \mu_{\text{t}}(t)}{\sigma_{\text{t}}(t)}\right) - \Phi\left(\frac{S_{\text{mp}} - \delta - \mu_{\text{t}}(t)}{\sigma_{\text{t}}(t)}\right)\right] \\
&= 1 - \xi\, \Phi\left(\frac{S_{\text{mp}} - \delta - \mu_{\text{t}}(t)}{\sigma_{\text{t}}(t)}\right) - (1 - \xi)\, \Phi\left(\frac{S_{\text{mp}} + \delta - \mu_{\text{t}}(t)}{\sigma_{\text{t}}(t)}\right).
\end{split} \tag{5}
$$

shows a decision space analogously partitioned (now using δ = 70) but the regions now result in "short," "I can't tell," and "long" judgments although "I can't tell" judgments must still be reported as "short" or "long" responses. The bottom panel shows the psychometric functions (from Equation 5) that may result according to how observers respond when

Equation 5); the black curve arises when "I can't tell" judgments are reported as "short" or "long" responses with equiprobability (double-headed black arrow in the top panel; ξ = 0.5 in Equation 5); the dashed curve arises if δ = 0 so that observers use a point criterion at the subjective midpoint and are never undecided.

**Figure 5A** shows sample psychometric functions when μ<sub>s</sub> = μ<sub>t</sub> = μ and σ<sub>s</sub> = σ<sub>t</sub> = σ for several values of the response bias parameter ξ. It is noteworthy that the location of the 50% point on Ψ<sub>long</sub> varies greatly with ξ. In principle, only when μ<sub>s</sub> and μ<sub>t</sub> are linear does *t*mp map onto *S*mp (**Figure 1**). But, even when they are linear, Ψ<sub>long</sub> has its 50% point at *t*mp only when ξ = 0.5 (blue curve in **Figure 2B**). It is also noteworthy that the location and slope of Ψ<sub>long</sub> are greatly affected by the irrelevant response bias and resolution parameters ξ and δ (compare with the dashed black curve for δ = 0 in the bottom panel of **Figure 5A**), undermining interpretation of the BP and the DL.
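The dependence of the 50% point on response bias can be illustrated by evaluating Equation (5) directly. The sketch below assumes μt(*t*) = *t* and σt(*t*) = γ*t* for simplicity; the function names, the anchor *S*mp = 700, and the search bounds are hypothetical choices of ours, while δ = 70 and γ = 0.15 are borrowed from the figure captions.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # unit-normal cumulative distribution function


def psi_long(t, s_mp, delta, xi, gamma):
    """Equation (5) with mu_t(t) = t and sigma_t(t) = gamma * t."""
    sd = gamma * t
    lower = PHI((s_mp - delta - t) / sd)
    upper = PHI((s_mp + delta - t) / sd)
    return 1.0 - xi * lower - (1.0 - xi) * upper


def fifty_percent_point(s_mp, delta, xi, gamma, lo=1.0, hi=5000.0):
    """Bisection search; valid here because psi_long increases with t."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if psi_long(mid, s_mp, delta, xi, gamma) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for xi in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(xi, round(fifty_percent_point(700.0, 70.0, xi, 0.15), 1))
```

In this sketch the timing parameters are held fixed, so any movement of the 50% point across values of ξ is attributable to response bias alone.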

If differences in stimuli or conditions across training and test phases affect subjective time, Ψ<sub>long</sub> is further shifted (**Figure 5B**) although given the influence of response bias, the 50% point on Ψ<sub>long</sub> carries no information that can be readily interpreted in terms of the pace of subjective time. As in temporal generalization, differences between σ<sub>s</sub> and σ<sub>t</sub> are inconsequential as σ<sub>s</sub> plays no role in Ψ<sub>long</sub>. As resolution decreases (i.e., δ increases) with fixed ξ, Ψ<sub>long</sub> becomes shallower (compare the dashed and solid black curves in the bottom panels of **Figure 5**). Finally, Ψ<sub>long</sub> is not symmetric about its 50% point: Its left side rises more sharply than its right side levels off. This is due to the scalar property (for a discussion of this issue, see Killeen et al., 1997).

It is worth mentioning a mixed task in which a standard is used in the training phase (as in temporal generalization) and observers report whether the current test duration is longer or shorter than the standard (as in temporal bisection). In such case (Grondin and Rammsayer, 2003), observers do not need to build *S*mp from *S*<sub>s</sub> and *S*<sub>l</sub> in the training phase but build *S*st directly and use it in the test phase. The model psychometric function is still given by Equation (5) with *S*st in place of *S*mp. Another variant of the bisection task is the "partition bisection" of Wearden and Ferrara (1995), in which the training phase is omitted and observers are simply asked to classify test stimuli as "short" or "long" using whichever criterion they wish. The psychometric function is again given by Equation (5), except that *S*mp is a free parameter that captures the arbitrary criterion used by observers. In yet a further variant, observers receive feedback relative to the objective midpoint of the range of test durations (Grondin, 1998), which should help them to set a stable criterion.

#### *Differences with previous models*

The seminal model of Gibbon (1981) is analogous to the model just described except that he omitted the indifference region. His model thus arises by setting δ = 0 to revert to a point criterion. Gibbon analyzed versions of the model in which μ is non-linear and σ obeys the scalar property so that estimated model parameters speak of the pace at which subjective time runs. In addition, he considered the implications of decision rules involving point criteria other than *S*mp = (*S*<sub>s</sub> + *S*l)/2.

Wearden (1991a) adapted his fixed-threshold model of temporal generalization for application to bisection tasks, thus including the indifference region missing in Gibbon's (1981) model. In this model, observers draw random memories of the long and the short durations (both of which are accurate on average and have scalar variance) to compare them with the exactly perceived test duration (i.e., μ(*t*) = *t* and σ(*t*) = 0), responding "long" or "short" according to which distance is the smallest but provided that the difference of distances is beyond a fixed threshold (resolution limit). On trials in which the threshold is not exceeded, observers are undecided and always respond "long." This model is formally equivalent to our model in Equation (5) with ξ = 1, μt(*t*) = *t*, and σt(*t*) reflecting instead the variability of the memory representations. Wearden and Ferrara (1995) later made two amendments to this model: Undecided observers respond "short" or "long" with equiprobability (ξ = 0.5) and the anchor *S*mp is randomly drawn in each trial from a distribution whose mean equals the average of the set of test durations. This is the only random variable in the model but Wearden and Ferrara's (1995) writing is unclear about whether its standard deviation was fixed or increased with *t* so as to incorporate the scalar property.

By embedding the assumption that μ(*t*) = *t*, these models are unsuitable for assessing how subjective time runs compared to objective time. A 50% point found to be away from *t*mp is implicitly attributed to response bias or to a criterion *S*mp placed away from μs(*t*mp) = *t*mp (for an amendment of the model in this respect, see Wearden, 2004). Such decisional or response aspects are unrelated to time perception, which is regarded as accurate under these models. The same holds for the model of Killeen et al. (1997), which also assumes μ(*t*) = *t* and the scalar property but uses a logistic function as an approximation to Φ.

Kopec and Brody (2010) presented a model of an entirely different nature for the bisection task. This model is not considered here because it involves assumptions, processes, and decision rules that are specific to bisection tasks and cannot describe performance in any other task. For instance, applied to a temporal generalization task, the model posits that Ψ<sub>same</sub> should have a symmetric Gaussian shape peaking at *t* = *t*st and such that Ψ<sub>same</sub>(*t*st) = 1.

#### **SUMMARY AND DISCUSSION OF SINGLE-PRESENTATION METHODS**

Psychometric functions describing performance in single-presentation methods embed a representation of subjective duration (the functions μ and σ) and decisional aspects pertaining to how judgments are made and reported (the anchor points *S*st or *S*mp, the observers' resolution δ and, where applicable, their response bias ξ). All of these components affect the psychometric function, including its location and slope. With reasonable assumptions about these components, **Figures 4** and **5** showed that neither the empirical location of the peak of Ψsame and its gradient on either side nor the empirical location of the 50% point on Ψlong and its slope can be interpreted as pure indices of timing processes. But the interpretation of data is further complicated if three other implicit assumptions of single-presentation methods are violated.

The first assumption is that the indifference region is symmetric about the anchor point. In general, boundaries might be placed at *S*st − δ1 and *S*st + δ2 in the temporal generalization task (or at *S*mp − δ1 and *S*mp + δ2 in the bisection task), with symmetry occurring when δ1 = δ2 = δ. The effects of an asymmetric region are illustrated in **Figure 6**: Psychometric functions shift as a result of this *decisional bias*. Obtaining direct evidence of the symmetry of the indifference region is impossible with single-presentation data, but methods allowing this determination exist and their use has revealed that the indifference region is generally asymmetric (García-Pérez and Alcalá-Quintana, 2013; García-Pérez and Peli, 2014).

The second assumption is that the anchor points *S*st and *S*mp are respectively placed at μs(*t*st) and at (μs(*t*short) + μs(*t*long))/2 during the training phase, as if observers used the arithmetic mean of a large sample of subjective durations elicited by repeated presentation of the standard (or the short and long exemplars). If the anchor were placed elsewhere during the training phase, the decision criterion during the test phase would not be at its presumed location and Ψsame (or Ψlong) would shift accordingly. In these conditions, shifts of the psychometric function do not necessarily reflect differences in subjective time across training and test phases even if μs(*t*) ≠ μt(*t*) ≠ *t*. Obtaining evidence as to where the anchor point was placed seems impossible.

The third assumption is that anchors presumably placed at *S*st = μs(*t*st) or *S*mp = (μs(*t*short) + μs(*t*long))/2 are stable. If they drifted systematically during the test phase, aggregating data across the session would shift the psychometric function. Concerns that anchor drift may occur come from adaptation level theory (Helson, 1948), which posits that the set of durations used during the test phase defines a context that relocates the internal standard. Stimulus range effects on temporal generalization do not seem to have been studied in a way that allows determining observable consequences on the location of Ψsame, but these effects have been reported for the bisection task (Wearden and Ferrara, 1995, 1996; Penney et al., 2014). The model of Wearden and Ferrara (1995) assumes that, as a result of this, the anchor is placed at the arithmetic mean of the set of test durations (or at 95% of this value; see Wearden, 2004). But the dynamics of the underlying processes are unknown, which precludes devising ways to eliminate or compensate for their effects so that bisection data are not contaminated by criterion placement.

These difficulties undermine the interpretation of temporal generalization and bisection data even under identical conditions in the training and test phases. Consider the bisection results reported by Gil et al. (2009). The training phase used a picture of an oval with *t*short = 400 ms and *t*long = 1600 ms so that *t*mp = 1000 ms. Among conditions involving pictures of liked and disliked foods, the test phase also included a condition with the oval picture. Averaged across observers, results with the oval showed a remarkable shift: Ψlong had its 50% point at *t* ≈ 800 ms, with Ψlong(*t*mp) ≈ 0.8. Assuming μs = μt = μ, σs = σt = σ, incorporating the scalar property, and removing the indifference region (i.e., δ = 0), Equation (5) reduces to Ψlong(*t*) = Φ(γ − *S*mp/σ(*t*)). If *S*mp = μ(*t*mp) and τ in Equation (1) is removed, Ψlong(*t*) = Φ(γ − *t*mp<sup>β</sup>/γ*t*<sup>β</sup>) obtains. Reproducing the shape described by data from the oval condition in Gil et al.'s Figure 2 with this function requires β ≈ 2.39 and γ ≈ 1.46, unreasonable values compared to common estimates of β in μ and γ in the scalar property. The data are nonetheless unquestionable: a 50% point at *t* ≈ 800 ms and Ψlong(*t*mp) ≈ 0.8 are empirical facts. What is less clear is what the data say about the relation of subjective to objective time, or whether time is under- or over-estimated as opposed to veridically perceived. The same data could have arisen if β = 1 (i.e., μ(*t*) = *t*) and the assumption that *S*mp = μ(*t*mp) is removed, implying that observers perceive duration veridically but for some reason they do not set the anchor at μ(*t*mp) during the training phase (Raslear, 1985; Allan and Gerhardt, 2001; Allan, 2002a,b). And the same shift could also have been caused with μ(*t*) = *t* and by reinstating the assumption that *S*mp = μ(*t*mp) if observers had a non-null indifference region (i.e., δ ≠ 0) and responded with bias when undecided (**Figure 5**). Which scenario is responsible for the observed results is indiscernible because all account for the data equally well.

**Figure 6** (caption, continued): […] or "I can't tell" judgments is displaced upwards by 30 units relative to its location in **Figure 4** or **5**. Psychometric functions are accordingly shifted laterally to the right.

It is remarkable that virtually all analyses of bisection data have explicitly or implicitly assumed μt(*t*) = *t* and, hence, that duration is accurately perceived. Yet, what should have thus been regarded as criterion shifts or response bias has been inconsistently interpreted as evidence of differences in perceived duration. To see that the assumption of veridical time perception is implicit when two-parameter psychometric functions are fitted to bisection data, make δ = 0 (i.e., a point criterion), μt(*t*) = *t* (i.e., veridical time perception), and σt(*t*) = *k* (i.e., remove the scalar property). In these conditions, Equation (5) becomes Ψlong(*t*) = Φ((*t* − *S*mp)/*k*), which is the widespread cumulative Gaussian fitted to bisection data and sometimes replaced for convenience with a logistic function. On fitting this psychometric function to data, *S*mp is regarded as a free parameter to account for observed shifts with respect to *t*mp, but this is synonymous with observers using an arbitrary criterion that varies across conditions (i.e., they do not set *S*mp at *t*mp in all conditions) and perceived duration being veridical and invariant across conditions (since μt(*t*) = *t* in all cases).

Interpretation of bisection data is more difficult when the test phase does not include a condition with the training stimulus. Consider the results reported by Tipples (2010). Stimuli in the training phase were eight-consonant strings with *t*short = 400 ms and *t*long = 1600 ms so that *t*mp = 1000 ms. The test phase used words of six different types: high arousal negative or positive, low arousal negative or positive, neutral, and sexual taboo. Since the 50% point on Ψlong was 30–40 ms higher for taboo words than for the other types of word, Tipples concluded that time flies when one reads taboo words. Yet, and leaving other issues aside, without a reference provided by the 50% point on the psychometric function for eight-consonant strings, the 50% point on Ψlong for test words is uninterpretable: The conclusion would have differed if the 50% point on Ψlong for eight-consonant strings were above that for taboo words or below that for the other types of word. Tipples' conclusion is even more puzzling on consideration that, on average across observers, the 50% point lay between 955 and 970 ms for non-taboo words and nearly at 1000 ms for taboo words (see his Figure 2). Since *t*mp = 1000 ms, the conventional (though unwarranted) conclusion should have been that time is perceived accurately only with taboo words.

These considerations apply also to the temporal generalization task, although studies assessing if time flies or slows down under certain conditions have almost exclusively used the bisection task. Measuring the psychometric function (be it Ψlong or Ψsame) for training stimuli sets a reference for comparison with the psychometric function for other types of stimuli, but this does not solve the problems of single-presentation methods. The multiplicity of factors that can shift the psychometric function away from *t*mp (or *t*st) precludes the interpretation of observed shifts as evidence of differences in subjective time across conditions. Bisection tasks are more seriously affected by this problem because response bias further alters the slope of the psychometric function (**Figure 5**) and contaminates DL estimates.

One might think that these problems would be solved by fitting psychometric functions such as those in Equations (3) or (5) to the data. Replacing the assumption of symmetry built into them (i.e., using δ1 and δ2 as needed instead of the single δ in them) puts into the fitted function all the factors that contribute to observed performance. Estimated parameter values would thus provide all the information needed for a proper interpretation of the data. With a psychophysical function given by Equation (1), estimates of the exponent β would directly indicate how time runs in each experimental condition provided the condition used to set the anchors is also tested. Unfortunately, models for single-presentation tasks are non-identifiable: There are infinitely many sets of parameter values that produce the same psychometric function (Yarrow et al., 2011; García-Pérez and Alcalá-Quintana, 2013; García-Pérez and Peli, 2014). This is not a problem of the models, but an indication that the intervening factors are inextricably confounded in single-presentation data. Data gathered with single-presentation methods are simply uninterpretable. Luckily, paired-comparison methods offer a suitable and dependable alternative with which these influences can be separated out.
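The confound can be made concrete with a deliberately simplified case. The sketch below is not Equation (5): it assumes, purely for illustration, a point criterion (δ = 0), a linear psychophysical function μ(*t*) = *a*·*t*, and a constant standard deviation, so that the probability of a "long" response is Φ((*a*·*t* − *S*mp)/σ). The function name `psi_long` and all parameter values are hypothetical. Under these assumptions, a veridical observer with a displaced criterion and an observer whose subjective time runs 25% faster produce the same psychometric function.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def psi_long(t, a, criterion, sigma):
    """P("long") with a point criterion, mu(t) = a * t and constant sigma.

    Hypothetical simplification for illustration; not Equation (5) itself.
    """
    return phi((a * t - criterion) / sigma)

durations = [400, 600, 800, 1000, 1200, 1400, 1600]  # test durations in ms

# Scenario A: veridical timing (a = 1) with the criterion displaced to 800.
scenario_a = [psi_long(t, 1.0, 800.0, 200.0) for t in durations]

# Scenario B: subjective time runs 25% faster (a = 1.25), criterion at 1000.
scenario_b = [psi_long(t, 1.25, 1000.0, 250.0) for t in durations]

# Both parameter sets reduce to Phi((t - 800) / 200), so the gap is ~0.
max_gap = max(abs(p - q) for p, q in zip(scenario_a, scenario_b))
print(f"largest difference between scenarios: {max_gap:.2e}")
```

Because (*a*·*t* − *S*)/σ = (*t* − *S*/*a*)/(σ/*a*), any rescaling of *a*, *S*, and σ by a common factor leaves the function unchanged, so data of this kind cannot by themselves separate a criterion shift from a change in subjective time.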

# **PAIRED-COMPARISON METHODS**

Trials in paired-comparison methods display two stimuli (a standard and a test, both of which may vary across trials) for observers to make a comparative judgment. Single-presentation methods imply a comparison too, but with respect to an internal standard. In paired-comparison tasks, the standard is explicit and subject to the same type of processing as is the test. A training phase is not needed to instate an internal standard, nor are assumptions about its placement and stability. Some modifications of the temporal generalization and bisection tasks turn them into paired-comparison methods, and the models discussed here apply to them too. For instance, the *roving standard* task of Allan and Gerhardt (2001) or Rodríguez-Gironés and Kacelnik (2001) presents in each trial a short and a long exemplar (which vary across trials) so that observers compare the test duration with the current exemplars. Similarly, the *episodic temporal generalization* task of Wearden and Bray (2001) presents a variable standard in each trial which is the reference for the observers' current judgment.

In paired-comparison trials, standard and test elicit subjective durations from the applicable distributions and observers judge by comparing the values drawn in the current trial. Observers can be asked to report whether both stimuli have the same subjective duration (the *equality task*), whether the first or the second appeared to have a longer duration (the *comparative task*), or whether the first, the second, or neither was subjectively longer than the other (the *ternary task*, which blends the two other tasks). [Incidentally, the bisection task can also be administered in a ternary format (Droit-Volet and Izaute, 2009) and its application reveals an indifference region whose width and symmetry differ across observers (García-Pérez and Peli, 2014).] The outcome of the timing component of a psychophysical model of performance in paired-comparison tasks cannot vary with the question asked at the end of the trial, as discussed in section A Unified Model of Performance across Semi-Objective Psychophysical Tasks. The next section describes the model for paired-comparison tasks, including a common timing outcome and a decision rule that varies with the task.

#### **THE MODEL FOR PAIRED-COMPARISON JUDGMENTS**

The model is analogous to an indecision model derived from SDT for use in other psychophysical tasks (García-Pérez and Alcalá-Quintana, 2010a,b, 2011a,b, 2013; Alcalá-Quintana and García-Pérez, 2011; García-Pérez and Peli, 2014). Its relation to other models will be discussed in section Differences with Previous Models. In the general case when standard and test differ qualitatively (as might be when test and standard are, e.g., eight-consonant strings vs. taboo words, or pictures of an oval vs. pictures of liked foods), the subjective duration *S*st of a standard duration *t*st is normally distributed with mean μs(*t*st) and standard deviation σs(*t*st) whereas the subjective duration *S*t of a test duration *t* is normally distributed with mean μt(*t*) and standard deviation σt(*t*). Sample psychophysical functions that differ across test and standard stimuli are shown in the two right panels of **Figure 7A**. When standard and test stimuli are the same or when their differences do not affect subjective duration, μs = μt and σs = σt (two left panels of **Figure 7A**). Our goal here is to derive the psychometric function relative to a standard of fixed duration *t*st across trials, whether or not such trials are interwoven with trials for other standards (which will define separate psychometric functions).

The model assumes that *S*st and *S*t are independent from one another and that observers' judgments are based on the magnitude of a decision variable *D* = *S*2 − *S*1 computed from the subjective durations of the second and first stimuli in the current trial. (The direction in which the difference is computed is immaterial, as will become evident below.) Because test and standard can (and should) be presented in either order with equal frequency across trials, each presentation order must be considered separately. Thus, on trials in which the test is presented first, *D* = *S*st − *S*t is normally distributed with mean μs(*t*st) − μt(*t*); on trials in which the test is presented second, *D* = *S*t − *S*st is also normally distributed but with mean μt(*t*) − μs(*t*st). In both cases the variance of *D* is the sum of the variances of *S*st and *S*t. **Figures 7B,C** show the distributions of *D* for each presentation order on a trial with *t* = 600 ms when *t*st = 700 ms, given the psychophysical functions in **Figure 7A**. Limited resolution also prevents observers from using a point criterion and the decision space is partitioned into three regions separated by boundaries δ1 and δ2, which are symmetric about *D* = 0 when δ1 = −δ2 (first and third panels in **Figures 7B,C**) and otherwise reflect a decisional bias (second and fourth panels in **Figures 7B,C**). Judgments turn into responses in a way that varies with the task.

**Figure 7** (caption, partially recovered): […] two left columns but γt = 0.10 in the two right columns. Light and dark green lines show in each scenario the mapping of the standard at *t*st = 700 ms and a sample test at *t* = 600 ms. **(B)** Distribution of the decision variable for the standard-test pair just mentioned in a trial in which the test is presented first. The distribution is narrower in the two columns on the right due to the smaller variance. The decision space is partitioned into three regions by vertical lines at *D* = δ1 and *D* = δ2, with δ1 = −150 and δ2 = 150 in the first and third columns (i.e., no decisional bias) but δ1 = −70 and δ2 = 230 in the second and fourth […] test was presented first; pale blue denotes "interval 2" responses when the test was presented second). A thin vertical line indicates the true location of the PSE defined as *t*PSE = μt<sup>−1</sup>(μs(*t*st)). With parameters given above, *t*PSE = 700 ms in the two columns on the left whereas *t*PSE = 630.8 ms in the two columns on the right. **(E)** Psychometric functions for "test longer" responses in the comparative task for each presentation order and with response bias ranging from ξ = 0.5 (paler curves) through ξ = 0.75, to ξ = 1 (darker curves). The location of the true PSE is also indicated in each panel by a thin vertical line.

Consider first a ternary task in which observers report whether duration was subjectively longer in the first interval, in the second, or in neither. Observers respond "interval 2" when *D* > δ2, "interval 1" when *D* < δ1, and "I can't tell" when δ1 ≤ *D* ≤ δ2 (see labels in the top part of **Figures 7B,C**). Response probabilities vary with presentation order due to the different mean of *D* in each case. Specifically, the probability Ψ*i* of an "interval *i*" response when the test is presented in interval *i*, with *i* ∈ {1, 2}, varies with *t* as

$$\Psi_1(t) = \text{Prob}\left(S_{\text{st}} - S_{\text{t}} < \delta_1\right) = \Phi\left(\frac{\delta_1 - \mu_{\text{s}}(t_{\text{st}}) + \mu_{\text{t}}(t)}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right) \tag{6a}$$

$$\Psi_2(t) = \text{Prob}\left(S_{\text{t}} - S_{\text{st}} > \delta_2\right) = 1 - \Phi\left(\frac{\delta_2 - \mu_{\text{t}}(t) + \mu_{\text{s}}(t_{\text{st}})}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right), \tag{6b}$$

the probability ϒ*i* of an "I can't tell" response when the test is presented in interval *i* varies with *t* as

$$\begin{split} \Upsilon_1(t) &= \text{Prob}\left(\delta_1 \le S_{\text{st}} - S_{\text{t}} \le \delta_2\right) \\ &= \Phi\left(\frac{\delta_2 - \mu_{\text{s}}(t_{\text{st}}) + \mu_{\text{t}}(t)}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right) - \Phi\left(\frac{\delta_1 - \mu_{\text{s}}(t_{\text{st}}) + \mu_{\text{t}}(t)}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right) \end{split} \tag{7a}$$

$$\begin{split} \Upsilon_2(t) &= \text{Prob}\left(\delta_1 \le S_{\text{t}} - S_{\text{st}} \le \delta_2\right) \\ &= \Phi\left(\frac{\delta_2 - \mu_{\text{t}}(t) + \mu_{\text{s}}(t_{\text{st}})}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right) - \Phi\left(\frac{\delta_1 - \mu_{\text{t}}(t) + \mu_{\text{s}}(t_{\text{st}})}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right), \end{split} \tag{7b}$$

and the probability of responding as the interval in which the standard was presented is 1 − Ψ*i* − ϒ*i*. **Figure 7D** plots psychometric functions in each scenario and their features are discussed next.
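As a check on Equations (6) and (7), the following minimal sketch computes the three response probabilities analytically and compares them with a Monte Carlo simulation of the decision rule. The values *t*st = 700 ms, *t* = 600 ms, δ1 = −70, and δ2 = 230 echo one of the illustrated scenarios; the psychophysical functions μ(*t*) = *t* and σ(*t*) = 0.15 *t*, the function names, and the sample size are assumptions made only for this example.

```python
import math
import random

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ternary_probs(mu_s, sd_s, mu_t, sd_t, delta1, delta2, test_first):
    """Return (P test interval, P "I can't tell", P standard interval)."""
    sd = math.sqrt(sd_s ** 2 + sd_t ** 2)
    if test_first:  # D = S_st - S_t; test reported when D < delta1
        psi = phi((delta1 - mu_s + mu_t) / sd)                    # Eq. (6a)
        ups = phi((delta2 - mu_s + mu_t) / sd) - psi              # Eq. (7a)
    else:           # D = S_t - S_st; test reported when D > delta2
        psi = 1.0 - phi((delta2 - mu_t + mu_s) / sd)              # Eq. (6b)
        ups = (phi((delta2 - mu_t + mu_s) / sd)
               - phi((delta1 - mu_t + mu_s) / sd))                # Eq. (7b)
    return psi, ups, 1.0 - psi - ups

def simulate(mu_s, sd_s, mu_t, sd_t, delta1, delta2, test_first, n, rng):
    """Estimate the same three probabilities by sampling the decision variable."""
    counts = [0, 0, 0]  # test interval, can't tell, standard interval
    for _ in range(n):
        s_st = rng.gauss(mu_s, sd_s)
        s_t = rng.gauss(mu_t, sd_t)
        d = (s_st - s_t) if test_first else (s_t - s_st)
        if delta1 <= d <= delta2:
            counts[1] += 1
        elif (d < delta1) == test_first:
            counts[0] += 1  # "interval 1" with test first, or "interval 2" with test second
        else:
            counts[2] += 1
    return tuple(c / n for c in counts)

# Assumed psychophysical functions: mu(t) = t, sigma(t) = 0.15 * t.
t_st, t, gamma = 700.0, 600.0, 0.15
args = (t_st, gamma * t_st, t, gamma * t, -70.0, 230.0, True)

analytic = ternary_probs(*args)
simulated = simulate(*args, n=100_000, rng=random.Random(12345))
print("analytic :", [round(p, 3) for p in analytic])
print("simulated:", [round(p, 3) for p in simulated])
```

The simulated proportions should agree with the analytic values up to sampling error, which is roughly 0.002 for this sample size.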

The most conspicuous aspect is that psychometric functions do not differ across presentation orders in the absence of decisional bias (first and third columns in **Figure 7**) and they differ otherwise (second and fourth columns in **Figure 7**). Differences (or lack thereof) in the psychophysical functions for standard and test also have observable effects. Consider the PSE as a proxy to these differences. By definition, the PSE is the duration *t* that the test must have for its subjective duration to equal the subjective duration of the standard. Thus, the PSE is the duration *t*PSE = μt<sup>−1</sup>(μs(*t*st)) and its location is readily identifiable in the psychometric functions. Consider the left column of **Figure 7**, where δ1 = −δ2 and *t*PSE = *t*st because μs = μt. Here, Ψ1 (blue curve) crosses 1 − Ψ2 − ϒ2 (pale red curve) at *t* = *t*st and Ψ2 (red curve) also crosses 1 − Ψ1 − ϒ1 (pale blue curve) at that point. It can be easily seen from Equations (6) and (7) that these crossings occur under any conditions at the duration *t* satisfying μt(*t*) = μs(*t*st). In contrast, ϒ1 and ϒ2 (black and gray curves) peak below *t* = *t*st due to the scalar property. Hence, "I can't tell" responses are not maximally prevalent at the PSE and, thus, it is not the location of the peak of ϒ1 or ϒ2 that signals the PSE.

With decisional bias under the same conditions (second column in **Figure 7**), psychometric functions differ across presentation orders but the PSE is identically encoded because the crossing property holds always. In contrast to the preceding case, where ϒ1 and ϒ2 superimpose, their crossing here also occurs at the PSE. It can again be easily seen from Equations (7) that ϒ1(*t*) = ϒ2(*t*) at all *t* when δ1 = −δ2 (the conditions in the first column of **Figure 7**) and that they cross at the duration *t* satisfying μt(*t*) = μs(*t*st) when δ1 ≠ −δ2 (the conditions in the second column of **Figure 7**). The third and fourth columns in **Figure 7** show that identification of the PSE is also not hampered when μs ≠ μt and σs ≠ σt, as would occur when subjective time runs differently for test and standard. In the absence of decisional bias (third column), the crossing occurs at *t* = μt<sup>−1</sup>(μs(*t*st)) = 630.8 ms (thin vertical line); with decisional bias (fourth column), the crossings still occur at the same location. In sum, in the ternary paired-comparison task, the effects of decisional bias are not confounded with those of psychophysical functions that differ for standard and test. In this task, the "I can't tell" option also eliminates the contaminating influence of response bias because observers are not forced to give uninformative "interval 1" or "interval 2" responses when undecided.
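The crossing property can be verified directly from Equations (6) and (7). As a shorthand introduced here only for this derivation, let *s*(*t*) denote the common denominator and *m*(*t*) the standardized difference of means:

$$s(t) = \sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}, \qquad m(t) = \frac{\mu_{\text{t}}(t) - \mu_{\text{s}}(t_{\text{st}})}{s(t)}.$$

Substituting Equations (6b) and (7b), and then Equations (6a) and (7a), gives

$$1 - \Psi_2(t) - \Upsilon_2(t) = \Phi\left(\frac{\delta_1}{s(t)} - m(t)\right), \qquad \Psi_1(t) = \Phi\left(\frac{\delta_1}{s(t)} + m(t)\right),$$

$$1 - \Psi_1(t) - \Upsilon_1(t) = 1 - \Phi\left(\frac{\delta_2}{s(t)} + m(t)\right), \qquad \Psi_2(t) = 1 - \Phi\left(\frac{\delta_2}{s(t)} - m(t)\right).$$

Because Φ is strictly increasing, each pair is equal if and only if *m*(*t*) = 0, that is, at the duration satisfying μt(*t*) = μs(*t*st), whatever the values of δ1 and δ2. For the "I can't tell" functions,

$$\Upsilon_1(t) = \Phi\left(\frac{\delta_2}{s(t)} + m(t)\right) - \Phi\left(\frac{\delta_1}{s(t)} + m(t)\right), \qquad \Upsilon_2(t) = \Phi\left(\frac{\delta_2}{s(t)} - m(t)\right) - \Phi\left(\frac{\delta_1}{s(t)} - m(t)\right),$$

so that ϒ1 = ϒ2 whenever *m*(*t*) = 0; when δ1 = −δ2, the identity Φ(−*x*) = 1 − Φ(*x*) makes the two expressions coincide at every *t*.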

The equality task, where observers report whether or not the two durations are subjectively equal, renders analogous outcomes. Observers respond "same" when they would have responded "I can't tell" in the ternary task whereas they respond "different" when they would have responded "interval 1" or "interval 2." The psychometric functions in Equations (7) hold for the equality task, and the preceding discussion applies also to this task. It should be noted that the PSE is not identifiable by eye in ϒ1 and ϒ2 in the absence of decisional bias (i.e., when δ1 = −δ2 and the functions superimpose). This is not a problem, as will be discussed in section Summary and Discussion of Paired-Comparison Methods.

In contrast, the comparative task in which observers are forced to respond "interval 1" or "interval 2" calls again for a response bias parameter ξ describing how observers give arbitrary responses when they cannot tell which duration was longer. It should be clear by now that this can only bring complications. Assume that, as a result of response bias, observers respond "interval 2" with probability ξ when they cannot tell. In this task, "interval *i*" responses are translated as "test longer" when the test had been presented in interval *i*. The probability Ψ*i* of "test longer" responses when the test was presented in interval *i* varies with *t* as

$$\begin{split} \Psi_1(t) &= \text{Prob}\left(S_{\text{st}} - S_{\text{t}} < \delta_1\right) + (1 - \xi)\,\text{Prob}\left(\delta_1 \le S_{\text{st}} - S_{\text{t}} \le \delta_2\right) \\ &= \xi\,\Phi\left(\frac{\delta_1 - \mu_{\text{s}}(t_{\text{st}}) + \mu_{\text{t}}(t)}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right) + (1 - \xi)\,\Phi\left(\frac{\delta_2 - \mu_{\text{s}}(t_{\text{st}}) + \mu_{\text{t}}(t)}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right) \end{split} \tag{8a}$$

$$\begin{split} \Psi_2(t) &= \text{Prob}\left(S_{\text{t}} - S_{\text{st}} > \delta_2\right) + \xi\,\text{Prob}\left(\delta_1 \le S_{\text{t}} - S_{\text{st}} \le \delta_2\right) \\ &= 1 - \xi\,\Phi\left(\frac{\delta_1 - \mu_{\text{t}}(t) + \mu_{\text{s}}(t_{\text{st}})}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right) - (1 - \xi)\,\Phi\left(\frac{\delta_2 - \mu_{\text{t}}(t) + \mu_{\text{s}}(t_{\text{st}})}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right). \end{split} \tag{8b}$$

These psychometric functions are plotted in **Figure 7E** for sample values of ξ in each of the same four scenarios. The PSE is still defined with respect to the underlying psychophysical functions, but **Figure 7E** shows that the 50% point on the psychometric function does not relate to this definition. Consider again the first column of **Figure 7**, in which psychophysical functions are identical for test and standard and there is no decisional bias. The psychometric functions are identical for both presentation orders only when ξ = 0.5 (dashed curves) and their 50% point lies at the true PSE in such case; as ξ increasingly exceeds 0.5, Ψ1 (blue curves) shifts progressively to the right whereas Ψ2 (red curves) shifts progressively to the left, with their 50% points symmetrically placed with respect to the PSE. Both functions also turn progressively steeper in this transition and it is also clear that Ψ1 and Ψ2 have different shapes (i.e., they do not differ by translation only), which is another consequence of the scalar property. In the third column of **Figure 7**, still without decisional bias but when psychophysical functions differ for test and standard, the psychometric functions are displaced laterally toward the true PSE, maintaining the properties described above. Yet, with decisional bias (second and fourth columns), lack of response bias (ξ = 0.5) produces psychometric functions that also differ across presentation orders, although Ψ1 and Ψ2 still have their 50% points symmetrically placed around the true PSE.

Data from the comparative task are usually aggregated across presentation orders, although this practice is inadvisable (Ulrich and Vorberg, 2009). The resultant psychometric function is then Ψ2AFC(*t*) = (Ψ1(*t*) + Ψ2(*t*))/2 and it is easy to see from Equations (8) that the 50% point on Ψ2AFC occurs at *t* = μt<sup>−1</sup>(μs(*t*st)). Thus, the average of the psychometric functions for each presentation order has its 50% point at the PSE, a result derived by Ulrich and Vorberg for the case in which μs = μt and generalized by García-Pérez and Alcalá-Quintana (2010a, 2011b) for the case in which μs ≠ μt and *t*PSE ≠ *t*st. Although data from the comparative task are still useful for estimating the PSE, estimation of the true DL from percent points on the psychometric function is impossible due to the strong influence of response bias on its slope.
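The averaging result can be checked numerically. The sketch below implements Equations (8a) and (8b) under assumptions adopted only for illustration: power-function psychophysical functions μ(*t*) = *t*<sup>β</sup> without intercept, a scalar standard deviation σ(*t*) = γμ(*t*), and arbitrary parameter values. The names `psi_test_longer` and `t_pse` are hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def psi_test_longer(t, t_st, order, xi, delta1, delta2,
                    beta_s=1.0, beta_t=1.0, gamma=0.15):
    """P("test longer") in the comparative task, Equations (8a) and (8b).

    Assumes mu(t) = t ** beta and sigma(t) = gamma * mu(t) for illustration.
    """
    m_s, m_t = t_st ** beta_s, t ** beta_t
    sd = math.sqrt((gamma * m_s) ** 2 + (gamma * m_t) ** 2)
    if order == 1:  # test presented first, Eq. (8a)
        return (xi * phi((delta1 - m_s + m_t) / sd)
                + (1.0 - xi) * phi((delta2 - m_s + m_t) / sd))
    # test presented second, Eq. (8b)
    return (1.0 - xi * phi((delta1 - m_t + m_s) / sd)
            - (1.0 - xi) * phi((delta2 - m_t + m_s) / sd))

def t_pse(t_st, beta_s=1.0, beta_t=1.0):
    """PSE for power functions: the t at which t ** beta_t == t_st ** beta_s."""
    return t_st ** (beta_s / beta_t)

t_st = 700.0
results = []
for beta_t in (1.0, 1.05):          # same vs. different psychophysical functions
    pse = t_pse(t_st, 1.0, beta_t)
    for xi in (0.5, 0.75, 1.0):     # increasing response bias
        p1 = psi_test_longer(pse, t_st, 1, xi, -70.0, 230.0, 1.0, beta_t)
        p2 = psi_test_longer(pse, t_st, 2, xi, -70.0, 230.0, 1.0, beta_t)
        results.append((beta_t, xi, p1, p2, 0.5 * (p1 + p2)))

for beta_t, xi, p1, p2, avg in results:
    print(f"beta_t={beta_t:.2f} xi={xi:.2f} "
          f"Psi1={p1:.3f} Psi2={p2:.3f} average={avg:.3f}")
```

At the PSE the two functions are complementary, so their average equals 0.5 for every ξ and every pair of boundaries, even though neither function generally passes through 0.5 there.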

#### **DIFFERENCES WITH PREVIOUS MODELS**

Models for paired-comparison tasks are used in many areas of psychophysics. Almost all of them derive from SDT principles and share structural characteristics with our model, except that they do not include an indifference region (i.e., they assume δ1 = δ2), nor are they adapted to ternary tasks. With δ1 = δ2 = 0, Equations (8) become

$$\Psi_1(t) = \Psi_2(t) = \Phi\left(\frac{\mu_{\text{t}}(t) - \mu_{\text{s}}(t_{\text{st}})}{\sqrt{\sigma_{\text{s}}^2(t_{\text{st}}) + \sigma_{\text{t}}^2(t)}}\right). \tag{9}$$

Such a model does not seem to have been used in time perception. Conventional practice fits instead cumulative Gaussian or logistic functions to the data or, equivalently, fits straight lines to the *z*-scores of observed proportions. This entails a model analogous to (and with the same problems as) the model discussed in section Summary and Discussion of Single-Presentation Methods for bisection data. In the comparative task, the argument of the sigmoidal function is also of the form (*t* − *a*)/*b* and the consequences are identical: The free location parameter *a* replaces μs(*t*st) in Equation (9) and allows the 50% point to be placed as needed without connection to the subjective duration of the standard; the free spread parameter *b* replaces the entire denominator of the argument of Φ in Equation (9), thus removing the scalar property; and replacement of μt(*t*) with *t* amounts to assuming that subjective and objective time run identically. Succeeding in fitting such a sigmoidal function and observing differences in estimated location parameters across conditions can only be justifiably interpreted as criterion shifts.

In contrast, a model proposed by Rammsayer and Ulrich (2001) does justice to the assumptions and goals of studies on time perception. In their model, consideration of the statistics of counting processes yielded a non-identity psychophysical function and subjective durations whose standard deviation increases with *t*. The model was also developed for application to the ternary task used to gather their empirical data. With appropriate replacements for μ and σ, their model and the resultant psychometric functions for the ternary task are identical to those in Equations (6) and (7) above except that Rammsayer and Ulrich set δ1 = −δ2 (i.e., no decisional bias). For unknown reasons, this model was subsequently abandoned by its proponents, as was the ternary task.

Another model for the comparative task was proposed by Dyjas et al. (2012). Again in comparison with our model, they assumed no indifference region (i.e., δ1 = δ2 = 0), undistorted time perception (i.e., μt(*t*) = μs(*t*) = *t*), non-scalar timing (i.e., σt(*t*) = σs(*t*) = σ, a constant), and a history component that alters an "internal standard" in line with adaptation level theory. The internal standard is updated on every trial as the convex sum of its value on the previous trial and the *subjective duration of the first interval* in the current trial. Such internal standards can be described as normally distributed with mean μs(*t*st) = *t*st and a standard deviation that varies with presentation order (see expressions for their variances in Equations 12–13 and 15–16 of Dyjas et al.). For simplicity, let σ1 and σ2 represent the equivalent standard deviation of the internal standard when the test is presented first or second. All of this turns Equations (8) into

$$\Psi_i(t) = \Phi\left(\frac{t - t_{\text{st}}}{\sqrt{\sigma_i^2 + \sigma^2}}\right), \quad i \in \{1, 2\}. \tag{10}$$

We use the term "equivalent" because substituting the expression for σ1 coming from Dyjas et al.'s (2012) Equation (16) into our Equation (10) does not render the psychometric function in their Equation (26). This is due to an additional term in the numerator of the argument in the first line of their Equation (26), which they transferred to the denominator in the second line. Also, the standard deviation of the internal standard varies according to whether the two presentation orders are blocked or randomly interwoven (see also Dyjas and Ulrich, 2014). The use of "equivalent" standard deviations permits our Equation (10) to cover all applicable cases while facilitating verbal descriptions of their model.

Participation of such an internal standard was invoked to produce different slopes for Ψ<sub>1</sub> and Ψ<sub>2</sub>, something that is accomplished by the different σ<sub>*i*</sub> in Equation (10). The scalar property excluded from Dyjas et al.'s model would have produced the same effect (**Figure 7E**). Since Ψ<sub>*i*</sub> in Equation (10) has its 50% point at *t* = *t*<sub>st</sub> for all *i* while empirical data contradict this property, Dyjas et al. fitted their model using a logistic version of Equation (10) with *a<sub>i</sub>* (in place of *t*<sub>st</sub>) and *b<sub>i</sub>* (in place of √(σ<sub>*i*</sub><sup>2</sup> + σ<sup>2</sup>)) as free parameters subject to Ulrich and Vorberg's (2009) constraint. Since μ<sub>t</sub>(*t*) = μ<sub>s</sub>(*t*) = *t* is assumed, this implies that shifts of the psychometric function away from *t*<sub>st</sub> are caused by criterion setting, not by differences in perceived duration. In a variant of this model, Dyjas and Ulrich (2014) displaced the point criterion to some arbitrary δ (i.e., δ<sub>1</sub> = δ<sub>2</sub> = δ), which turns Equation (10) into

$$\Psi\_i(t) = \Phi\left(\frac{t - t\_{\rm st} - (-1)^i \delta}{\sqrt{\sigma\_i^2 + \sigma^2}}\right), i \in \{1, 2\}. \tag{11}$$

The success of Equation (11) at fitting the empirical data of Dyjas and Ulrich provides further support to the notion that shifts of Ψ<sub>1</sub> and Ψ<sub>2</sub> can be attributed to criteria, not necessarily reflecting differences in perceived duration (which are explicitly excluded by their assumptions). Dyjas and Ulrich also presented a version of their model for the equality task, for which they introduced a potentially asymmetric indifference region. This renders psychometric functions identical to our Equations (7) with the amendments discussed above to include the participation of an internal standard. Their model for the equality task is thus incompatible with their model for the comparative task, as the latter assumes that observers never judge stimuli to have the same subjective duration. Interestingly, Dyjas et al. (2012) had allowed observers to hit a separate response key when they judged the two presentations in a trial to have the same duration, but they did not describe how those responses were treated and they presented and analyzed data as if such responses had never been given. Dyjas and Ulrich did not include this extra response option.
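As a minimal sketch of Equations (10) and (11), the following Python function evaluates the psychometric function of the internal-standard model for either presentation order. The function name, argument names, and numerical values are illustrative assumptions and do not come from Dyjas et al. or Dyjas and Ulrich; `sigma_i` stands for the "equivalent" standard deviation of the internal standard, and `delta = 0` recovers Equation (10).

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def psi_internal_standard(t, t_st, sigma, sigma_i, i, delta=0.0):
    """Psychometric function of Equation (11) for presentation order i.

    t        test duration
    t_st     standard duration
    sigma    constant standard deviation of subjective duration
    sigma_i  equivalent standard deviation of the internal standard for order i
    i        presentation order index, 1 or 2
    delta    displaced point criterion; delta = 0 gives Equation (10)
    """
    if i not in (1, 2):
        raise ValueError("i must be 1 or 2")
    numerator = t - t_st - ((-1) ** i) * delta
    return phi(numerator / math.sqrt(sigma_i ** 2 + sigma ** 2))


if __name__ == "__main__":
    # Hypothetical values (in ms), chosen only to show the two effects.
    for i, sigma_i in ((1, 30.0), (2, 60.0)):
        row = [psi_internal_standard(t, 500.0, 40.0, sigma_i, i, delta=20.0)
               for t in (400.0, 500.0, 600.0)]
        print(i, [round(p, 3) for p in row])
```

With δ = 0 both functions pass through 0.5 at *t* = *t*st and differ only in slope (through `sigma_i`); a nonzero δ shifts the two functions in opposite directions, which is the criterion-based account of the shifts discussed above.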

Dyjas and Ulrich also described a model including sensation weighting as implemented in the model of García-Pérez and Alcalá-Quintana (2011a), but this model is not discussed here because it is empirically indistinguishable from the internal standard model.

#### **SUMMARY AND DISCUSSION OF PAIRED-COMPARISON METHODS**

The shape of psychometric functions for paired-comparison tasks is determined by an embedded representation of subjective duration (μ and σ) and by aspects of the decision process. In contrast to single-presentation methods, paired-comparison methods are free of complications arising from untestable assumptions regarding the placement and stability of anchors. An added value of paired-comparison methods is that they lend themselves to a separate analysis of data for each presentation order (**Figure 7**), by which the influence of criteria and decisional bias on observed performance is separated from that of true differences in subjective duration (different μ for test and standard) or in its variance (different σ for test and standard).

But these are only potential benefits. If data are analyzed by fitting psychometric functions implying μ(*t*) = *t* in all cases, the potential of paired-comparison methods is wasted: Differences in observed performance across conditions can only be justifiably attributed to different criterion settings. To harvest the benefits, fitted psychometric functions must include a non-identity μ whose parameters capture the relation of subjective to objective time that best accounts for the data in each condition. The universally accepted scalar property should also be included in place of the fixed-variance assumption of typical analyses. Using subscripts for the parameters of μ and σ in Equations (1)–(2) (and setting τ = 0 for simplicity), Equations (6) and (7) for the ternary task become on substitution

$$\Psi\_1(t) = \Phi\left(\frac{\mathbb{s}\_1 - \alpha\_s t\_{\rm st}^{\beta\_\mathbf{s}} + \alpha\_t t^{\beta\_\mathbf{t}}}{\sqrt{\chi\_s^2 \alpha\_\mathbf{s}^2 t\_{\rm st}^{2\beta\_\mathbf{s}} + \chi\_\mathbf{t}^2 \alpha\_\mathbf{t}^2 t^{2\beta\_\mathbf{t}}}}\right) \tag{12a}$$

$$\Psi\_2(t) = 1 - \Phi\left(\frac{\delta\_2 - \alpha\_t t^{\beta\_t} + \alpha\_s t\_{\rm st}^{\beta\_s}}{\sqrt{\gamma\_s^2 \alpha\_s^2 t\_{\rm st}^{2\beta\_s} + \gamma\_t^2 \alpha\_t^2 t^{2\beta\_t}}}\right) \tag{12b}$$

$$\Upsilon\_1(t) = \Phi\left(\frac{\delta\_2 - \alpha\_s t\_{\rm st}^{\beta\_s} + \alpha\_t t^{\beta\_t}}{\sqrt{\gamma\_s^2 \alpha\_s^2 t\_{\rm st}^{2\beta\_s} + \gamma\_t^2 \alpha\_t^2 t^{2\beta\_t}}}\right)$$

$$-\Phi\left(\frac{\delta\_1 - \alpha\_s t\_{\rm st}^{\beta\_\ast} + \alpha\_t t^{\beta\_\ast}}{\sqrt{\gamma\_s^2 \alpha\_s^2 t\_{\rm st}^{2\beta\_\ast} + \gamma\_t^2 \alpha\_t^2 t^{2\beta\_\ast}}}\right) \tag{12c}$$

$$\Upsilon\_2(t) = \Phi\left(\frac{\delta\_2 - \alpha\_t t^{\beta\_t} + \alpha\_s t\_{\rm st}^{\beta\_s}}{\sqrt{\gamma\_s^2 \alpha\_s^2 t\_{\rm st}^{2\beta\_s} + \gamma\_t^2 \alpha\_t^2 t^{2\beta\_t}}}\right)$$

$$-\Phi\left(\frac{\delta\_1 - \alpha\_t t^{\beta\_t} + \alpha\_s t\_{\rm st}^{\beta\_s}}{\sqrt{\gamma\_s^2 \alpha\_s^2 t\_{\rm st}^{2\beta\_s} + \gamma\_t^2 \alpha\_t^2 t^{2\beta\_t}}}\right). \tag{12d}$$

Equations (12c)–(12d) apply also to the equality task, and a similar substitution in Equations (8) renders explicit functions for the comparative task. It should be noted from **Figure 7E** that response bias combined with a non-identity μ and the scalar property act together to produce the Type A and Type B order effects discussed by Ulrich and Vorberg (2009), which can thus be accounted for without ad hoc assumptions involving internal standards or sensation weighting.
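A minimal Python sketch of Equations (12a)–(12d) may help make the substitution concrete. It assumes τ = 0, a power function μ(*t*) = α*t*<sup>β</sup> and a standard deviation equal to γμ(*t*), as the equations imply, and it reads Ψ<sub>*i*</sub> and Υ<sub>*i*</sub> as the probabilities of "test longer" and "equal" responses when the test is presented first (*i* = 1) or second (*i* = 2). The function name, the dictionary keys, and the parameter values are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ternary_probabilities(t, t_st, order, p):
    """Return (P_test_longer, P_equal, P_test_shorter) for one presentation order.

    order = 1: test presented first  -> (Psi_1, Upsilon_1, remainder)
    order = 2: test presented second -> (Psi_2, Upsilon_2, remainder)
    p is a dict with keys alpha_t, beta_t, gamma_t, alpha_s, beta_s, gamma_s,
    delta_1, delta_2 (with delta_1 <= delta_2).
    """
    m_t = p["alpha_t"] * t ** p["beta_t"]        # mean subjective duration of the test
    m_s = p["alpha_s"] * t_st ** p["beta_s"]     # mean subjective duration of the standard
    sd = math.sqrt((p["gamma_s"] * m_s) ** 2 + (p["gamma_t"] * m_t) ** 2)
    if order == 1:
        low = phi((p["delta_1"] - m_s + m_t) / sd)    # Equation (12a)
        high = phi((p["delta_2"] - m_s + m_t) / sd)
        return low, high - low, 1.0 - high            # middle term: Equation (12c)
    if order == 2:
        low = phi((p["delta_1"] - m_t + m_s) / sd)
        high = phi((p["delta_2"] - m_t + m_s) / sd)
        return 1.0 - high, high - low, low            # Equations (12b) and (12d)
    raise ValueError("order must be 1 or 2")


if __name__ == "__main__":
    # Hypothetical parameters: identity mu, 10% scalar variability, asymmetric criteria.
    params = dict(alpha_t=1.0, beta_t=1.0, gamma_t=0.1,
                  alpha_s=1.0, beta_s=1.0, gamma_s=0.1,
                  delta_1=-10.0, delta_2=30.0)
    for order in (1, 2):
        print(order, [round(x, 3) for x in ternary_probabilities(500.0, 500.0, order, params)])
```

The three probabilities for each order sum to one by construction, so the third entry is the probability of the remaining ("test shorter") response.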

Parameter estimates for these model-based psychometric functions can be easily obtained with maximum-likelihood methods. Technicalities are omitted here but empirical examples involving other classes of psychophysical functions are available (García-Pérez and Alcalá-Quintana, 2013; García-Pérez and Peli, 2014). Simulation studies have also shown that parameters can be recovered from data collected with the usual numbers of trials in empirical studies, but these results are too lengthy to be reported here.
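Although the technicalities are omitted, the objective function that a maximum-likelihood fit would minimize can be sketched as follows. This is only an illustration under stated assumptions: responses in the ternary task are treated as multinomial counts per test duration and presentation order, the probabilities are those of Equations (12) with τ = 0, and all names (`ternary_probs`, `neg_log_likelihood`, the dictionary keys, the data layout) are hypothetical rather than taken from the cited studies.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ternary_probs(t, t_st, order, th):
    """(P_test_longer, P_equal, P_test_shorter) from Equations (12), tau = 0."""
    m_t = th["alpha_t"] * t ** th["beta_t"]
    m_s = th["alpha_s"] * t_st ** th["beta_s"]
    sd = math.hypot(th["gamma_s"] * m_s, th["gamma_t"] * m_t)
    sign = 1.0 if order == 1 else -1.0           # order 1: test first; order 2: test second
    low = phi((th["delta_1"] + sign * (m_t - m_s)) / sd)
    high = phi((th["delta_2"] + sign * (m_t - m_s)) / sd)
    if order == 1:
        return low, high - low, 1.0 - high
    return 1.0 - high, high - low, low


def neg_log_likelihood(th, data, t_st, floor=1e-12):
    """Multinomial negative log-likelihood, dropping terms that do not depend on th.

    data is a list of (t, order, counts), where counts holds the numbers of
    "test longer", "equal", and "test shorter" responses, in that order.
    """
    total = 0.0
    for t, order, counts in data:
        probs = ternary_probs(t, t_st, order, th)
        total -= sum(n * math.log(max(p, floor)) for n, p in zip(counts, probs))
    return total
```

In practice this function would be handed to a general-purpose numerical minimizer (for instance `scipy.optimize.minimize`) with suitable bounds on the parameters. Because multiplying α<sub>s</sub>, α<sub>t</sub>, δ<sub>1</sub>, and δ<sub>2</sub> by a common positive factor leaves Equations (12) unchanged, a practical fit would presumably fix one of these parameters; this is an observation about the equations as written, not a procedure described in the text.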

It is also worth noting that performance measures such as PSEs or DLs can be computed from parameter estimates without reference to percent points on the psychometric functions. Indeed, since model parameters refer to underlying processes common to all tasks and not to aspects of the shape of the psychometric function for some task, PSEs and DLs can be computed according to their theoretical definition. As shown earlier, the PSE defined as *t*<sub>PSE</sub> = μ<sub>t</sub><sup>−1</sup>(μ<sub>s</sub>(*t*<sub>st</sub>)) can be obtained given the functional forms of μ<sub>s</sub> and μ<sub>t</sub> and estimates of their parameters. The DL, on the other hand, is usually computed as the distance between some percent points on the psychometric function for the comparative task. As seen in **Figure 7E**, the location of these points is greatly affected by the width and location of the indifference region and also by response bias. Ulrich and Vorberg (2009) proposed computing a separate DL from the psychometric function for each presentation order, but this practice also results in a description of time perception that is contaminated by all the non-timing processes that affect observed performance. Ultimately, computation of the DL seeks the durations satisfying, say, Prob(*S*<sub>t</sub> > *S*<sub>st</sub>) = 0.25 and Prob(*S*<sub>t</sub> > *S*<sub>st</sub>) = 0.75. Since parameter estimates give a full description of μ and σ for test and standard stimuli, uncontaminated estimates of the *latent* DL (García-Pérez and Alcalá-Quintana, 2012, 2013) can easily be obtained by noting that the latent point at which Prob(*S*<sub>t</sub> > *S*<sub>st</sub>) = *p* is the duration *t<sub>p</sub>* satisfying

$$\Phi\left(\frac{\alpha\_t t\_p^{\beta\_t} - \alpha\_s t\_{\rm st}^{\beta\_s}}{\sqrt{\gamma\_s^2 \alpha\_s^2 t\_{\rm st}^{2\beta\_s} + \gamma\_t^2 \alpha\_t^2 t\_p^{2\beta\_t}}}\right) = p, \tag{13}$$

an equation that can be directly solved from parameter estimates. DLs and WRs thus computed are free of the contaminants that affect the probability of observed responses, and they are also independent of the task with which the data were collected.
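The computation of the PSE and of latent points can be sketched in a few lines of Python. The sketch assumes τ = 0 and the power-function and scalar forms used in Equations (12), and it takes the numerator of Equation (13) to be μ<sub>t</sub>(*t<sub>p</sub>*) − μ<sub>s</sub>(*t*<sub>st</sub>), which is what Prob(*S*<sub>t</sub> > *S*<sub>st</sub>) = *p* requires for independent normal variables. The use of bisection, the bracketing interval, the definition of the DL as half the distance between the 25% and 75% latent points, and all names are assumptions made for illustration.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_test_longer(t, t_st, th):
    """Left-hand side of Equation (13): Prob(S_t > S_st) at test duration t."""
    m_t = th["alpha_t"] * t ** th["beta_t"]
    m_s = th["alpha_s"] * t_st ** th["beta_s"]
    return phi((m_t - m_s) / math.hypot(th["gamma_s"] * m_s, th["gamma_t"] * m_t))


def latent_point(p, t_st, th, lo=1e-6, hi=1e6, tol=1e-9):
    """Solve Equation (13) for t_p by bisection.

    For positive alpha, beta, and gamma the left-hand side increases with t,
    so a sign change inside [lo, hi] identifies a unique solution.
    """
    if not prob_test_longer(lo, t_st, th) < p < prob_test_longer(hi, t_st, th):
        raise ValueError("p is not bracketed by [lo, hi]")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if prob_test_longer(mid, t_st, th) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pse(t_st, th):
    """t_PSE = mu_t^{-1}(mu_s(t_st)) for power functions with tau = 0."""
    m_s = th["alpha_s"] * t_st ** th["beta_s"]
    return (m_s / th["alpha_t"]) ** (1.0 / th["beta_t"])


def latent_dl(t_st, th):
    """Half the distance between the latent 25% and 75% points (one common convention)."""
    return 0.5 * (latent_point(0.75, t_st, th) - latent_point(0.25, t_st, th))
```

Under these assumptions the latent 50% point coincides with the PSE, because Prob(*S*<sub>t</sub> > *S*<sub>st</sub>) = 0.5 exactly when the two means are equal; neither quantity involves δ<sub>1</sub> or δ<sub>2</sub>.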

# **GENERAL DISCUSSION AND EVIDENCE-BASED RECOMMENDATIONS**

The pace of subjective time surely differs from that of objective time, and different stimulus types or conditions surely alter the pace of subjective time further. This means that the psychophysical function μ describing the relation of subjective to objective time cannot be the identity function and that its parameters must vary across conditions. Yet, studies in which semi-objective tasks have been used to assess differences in time perception routinely fit psychometric functions implying μ(*t*) = *t* in all conditions, also including a location parameter allowed to vary across conditions. If someone wanted to make the case that time perception is always accurate and different conditions only make observers set different response criteria, fitting such psychometric functions would be the way to gather supporting evidence. The success with which empirical data are accounted for with that type of psychometric function has nevertheless been taken as evidence of differences in perceived duration across conditions. Although the theoretical underpinnings of the fitted psychometric functions do not permit such interpretation, the overwhelming success with which data have historically been accounted for as if only criterion differences were involved cannot be taken as ruling out differences in subjective time across conditions. For a proper assessment of the various determinants of observed performance, model-based psychometric functions should be fitted to data to interpret the parameters describing each of the influences that affect performance. But data should also be collected using psychophysical tasks that allow separating out those influences. The following sections discuss what the theoretical analyses presented in this paper say about these issues.

#### **PSYCHOPHYSICAL TASKS**

The model presented in this paper renders psychometric functions tailored to the characteristics of each semi-objective psychophysical task. The functions μ and σ describing subjective duration are always included in the psychometric functions and, in principle, the parameters of μ and σ could be estimated from data gathered with any task. But observed performance is also affected by decisional and response processes that lend additional parameters to the psychometric function, and not all psychophysical tasks provide informative data for an estimation of the parameters describing all of these influences. We showed that all determinants of performance are inextricably confounded in data gathered with single-presentation methods, which are thus unsuitable for assessing time perception (or any other perceptual process; see García-Pérez and Alcalá-Quintana, 2013). The use of single-presentation methods should be discontinued.

Paired-comparison methods, on the other hand, provide data from which these influences can be separated out, allowing a proper assessment of each of the determinants of performance. Of the various formats that paired-comparison methods may take, the ternary task is best suited for these purposes. It should be noted that any study conducted with a single-presentation method can also be conducted with the ternary paired-comparison task. Consider the studies of Gil et al. (2009) or Tipples (2010) discussed in section Summary and Discussion of Single-Presentation Methods, which used a bisection task to investigate whether subjective time runs differently for different types of stimuli. In a ternary paired-comparison task, each trial would present the standard (a picture of an oval or an eight-consonant string) with some fixed duration (say, *t*<sub>st</sub> = 1000 ms) along with the test stimulus for a duration that varies across trials. Trials with different types of test stimuli (or different standard durations) could be randomly interwoven in a session and the order of presentation of test and standard in each trial would also be randomly determined. Fitting the psychometric functions in Equations (12) to the resultant data would thus provide estimates of the parameters of μ and σ that describe observed performance, permitting a proper assessment of how subjective time varies across conditions besides providing parameters describing decisional determinants. García-Pérez and Peli (2014) illustrated this approach in a study of spatial bisection that used the conventional single-presentation format and its conversion into the ternary paired-comparison format.
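The trial structure just described can be illustrated with a small simulation of a model observer. This is a sketch under assumptions inferred from the form of Equations (12): subjective durations are independent normal variables with mean α*t*<sup>β</sup> and standard deviation γ times that mean, and the observer compares the difference *D* (second minus first interval) with the boundaries δ<sub>1</sub> and δ<sub>2</sub> of the indifference region. Function names, parameter values, and the data layout are hypothetical.

```python
import random
from collections import Counter


def simulate_trial(t, t_st, th, rng):
    """Simulate one ternary trial and return (order, response)."""
    m_t = th["alpha_t"] * t ** th["beta_t"]        # mean subjective duration, test
    m_s = th["alpha_s"] * t_st ** th["beta_s"]     # mean subjective duration, standard
    s_t = rng.gauss(m_t, th["gamma_t"] * m_t)      # scalar property: sd proportional to mean
    s_st = rng.gauss(m_s, th["gamma_s"] * m_s)
    order = rng.choice((1, 2))                     # 1: test first, 2: test second
    d = s_st - s_t if order == 1 else s_t - s_st   # second minus first interval
    if d < th["delta_1"]:
        longer = 1                                 # first interval judged longer
    elif d > th["delta_2"]:
        longer = 2                                 # second interval judged longer
    else:
        return order, "equal"
    response = "test longer" if longer == order else "test shorter"
    return order, response


def run_session(levels, n_per_level, t_st, th, seed=0):
    """Randomly interleave all test durations; tally responses by (t, order)."""
    rng = random.Random(seed)
    trials = [t for t in levels for _ in range(n_per_level)]
    rng.shuffle(trials)
    tallies = {}
    for t in trials:
        order, response = simulate_trial(t, t_st, th, rng)
        tallies.setdefault((t, order), Counter())[response] += 1
    return tallies


if __name__ == "__main__":
    observer = dict(alpha_t=1.0, beta_t=1.0, gamma_t=0.1,
                    alpha_s=1.0, beta_s=1.0, gamma_s=0.1,
                    delta_1=-50.0, delta_2=50.0)
    session = run_session([700.0, 850.0, 1000.0, 1150.0, 1300.0], 40, 1000.0, observer)
    for key in sorted(session):
        print(key, dict(session[key]))
```

Tallies of this kind, kept separately for each presentation order, are the data to which Equations (12) would be fitted.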

#### **FITTING PSYCHOMETRIC FUNCTIONS**

**FIGURE 8 |** [...] functions for test and standard may cross at or very near *t* = *t*<sub>st</sub>. In this illustration, μ<sub>s</sub> is the identity function whereas μ<sub>t</sub> is given by Equation (1) with α<sub>t</sub> = 3.01, β<sub>t</sub> = 0.83, and τ<sub>t</sub> = −10. As regards the [...] psychophysical functions for test and standard. **(C)** Psychometric functions for "test longer" responses in the comparative task, again showing the effects of decisional and response bias.

Current practice fits two-parameter (location and slope) psychometric functions separately to data from each of the conditions included in a study. Yet, when the same standard is used for all conditions, model parameters describing the perceived duration of the standard should not vary across them. Psychometric functions are thus expected to differ only in the parameters describing subjective duration for test conditions. This implies that psychometric functions ought to be fitted jointly across conditions with some of their parameters constrained to have common values across them. This strategy reduces the number of free parameters needed to describe the data but it also entails a coherent use of models and provides the means to test hypotheses concerning the effect of manipulations. There are several other situations in which some parameters must be regarded as common across conditions, but these are determined by the experimental design. For illustrative examples, see García-Pérez and Alcalá-Quintana (2007a; 2009a; 2012; under review), Magnotti et al. (2013), or García-Pérez and Peli (2014).
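One way to express the joint-fitting strategy of this section in code is to pack all parameters into a single vector in which the standard and decisional parameters appear once and the test parameters appear once per condition. The sketch below is an assumption-laden illustration for the ternary task with τ = 0: which parameters are shared (here α<sub>s</sub>, β<sub>s</sub>, γ<sub>s</sub>, δ<sub>1</sub>, δ<sub>2</sub>), the vector layout, and every name are hypothetical choices, not the procedure of the cited studies.

```python
import math

SHARED = ("alpha_s", "beta_s", "gamma_s", "delta_1", "delta_2")
PER_CONDITION = ("alpha_t", "beta_t", "gamma_t")


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ternary_probs(t, t_st, order, th):
    """(P_test_longer, P_equal, P_test_shorter) from Equations (12), tau = 0."""
    m_t = th["alpha_t"] * t ** th["beta_t"]
    m_s = th["alpha_s"] * t_st ** th["beta_s"]
    sd = math.hypot(th["gamma_s"] * m_s, th["gamma_t"] * m_t)
    sign = 1.0 if order == 1 else -1.0
    low = phi((th["delta_1"] + sign * (m_t - m_s)) / sd)
    high = phi((th["delta_2"] + sign * (m_t - m_s)) / sd)
    if order == 1:
        return low, high - low, 1.0 - high
    return 1.0 - high, high - low, low


def unpack(vector, conditions):
    """Split a flat vector into one parameter dict per condition.

    Layout: shared parameters first, then one block of test parameters
    for each condition, in the order given by `conditions`.
    """
    shared = dict(zip(SHARED, vector[:len(SHARED)]))
    thetas = {}
    for k, name in enumerate(conditions):
        start = len(SHARED) + k * len(PER_CONDITION)
        own = vector[start:start + len(PER_CONDITION)]
        thetas[name] = {**shared, **dict(zip(PER_CONDITION, own))}
    return thetas


def joint_nll(vector, data, t_st, floor=1e-12):
    """Negative log-likelihood summed over conditions.

    data maps each condition name to a list of (t, order, counts).
    """
    thetas = unpack(vector, list(data))
    total = 0.0
    for name, rows in data.items():
        for t, order, counts in rows:
            probs = ternary_probs(t, t_st, order, thetas[name])
            total -= sum(n * math.log(max(p, floor)) for n, p in zip(counts, probs))
    return total
```

With *K* conditions this layout has 5 + 3*K* free parameters instead of the 8*K* that separate fits of Equations (12) would use, and a hypothesis such as "condition A and condition B share β<sub>t</sub>" could be tested by comparing the fit of a further constrained layout.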

#### **ADAPTIVE METHODS**

Studies on time perception often use the comparative task to estimate PSEs or DLs via adaptive methods that bypass estimating the psychometric function, directly targeting specific percent points on it. This practice is inadvisable for several reasons. Firstly, and least importantly, μ<sub>t</sub> may differ from μ<sub>s</sub> in such a way that they cross near *t* = *t*<sub>st</sub>. Thus, finding the PSE at or near *t*<sub>st</sub> does not allow concluding that subjective duration is identical for test and standard stimuli (see **Figure 8**). Secondly, due to the effects of decisional and response bias on the slope and location of the psychometric function in comparative tasks, PSEs or DLs estimated from percent points are contaminated by these influences and do not portray time perception. Finally, and even in the absence of the previous two problems, the most widespread adaptive methods have been shown to provide percent-point estimates that are biased in magnitudes which cannot be assessed without knowledge of the shape of the psychometric function (García-Pérez, 1998, 2000, 2001, 2002, 2011; Alcalá-Quintana and García-Pérez, 2004, 2007; Faes et al., 2007; García-Pérez and Alcalá-Quintana, 2007b, 2009b; Hsu and Chin, 2014).

The foregoing discussion does not mean that adaptive methods should be entirely abandoned. On the contrary, some up–down methods provide dependable and efficient strategies for data collection and, thus, they gather maximally informative data for fitting psychometric functions. Adaptive methods tailored to the peculiarities of equality and ternary tasks have recently been developed (García-Pérez, 2014). For an illustration of their use, see García-Pérez and Peli (2014). What should be avoided by all means is the practice of estimating percent points by averaging reversal levels.

#### **PENDING ISSUES**

It is unclear at this point whether the mathematical forms of μ and σ in Equations (1)–(2) adequately describe the mean and standard deviation of subjective duration across the continuum from a few milliseconds to several seconds. Empirical studies suggest that a power function is adequate for μ within narrow time ranges but its parameters vary across ranges (Eisler, 1976), suggesting that a power function is only piecewise approximate. Although a yet unknown mathematical form might be more appropriate, the narrow range of durations used in any given study supports the use of Equation (1) when fitting psychometric functions. On the other hand, the scalar property in Equation (2) is known to be inaccurate in human timing but alternative mathematical forms have been proposed (Killeen et al., 1997; Rammsayer and Ulrich, 2001) that may prove more useful in practice. Also in this respect, it is unclear whether the referent for the scalar property is subjective time (as in Equation 2) or objective time.

Consideration of errors made by observers upon reporting judgments via the response interface has been intentionally excluded in this description. Extensions incorporating error parameters for more accurate parameter estimation have been discussed for analogous models elsewhere (García-Pérez and Alcalá-Quintana, 2012; García-Pérez and Peli, 2014) and their inclusion in the models presented here is straightforward.

#### **ACKNOWLEDGMENT**

Supported by grant PSI2012-32903 from Ministerio de Economía y Competitividad (Spain).

#### **REFERENCES**


A. Vatakis, A. Esposito, M. Giagkou, F. Cummins, and G. Papadelis (New York, NY: Springer), 246–257. doi: 10.1007/978-3-642-21478-3\_19


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 March 2014; accepted: 23 May 2014; published online: 10 June 2014. Citation: García-Pérez MA (2014) Does time ever fly or slow down? The difficult interpretation of psychophysical data on time perception. Front. Hum. Neurosci. 8:415. doi: 10.3389/fnhum.2014.00415*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 García-Pérez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
