# WHAT CAN SIMPLE BRAINS TEACH US ABOUT HOW VISION WORKS

EDITED BY: Davide Zoccolan, David D. Cox, Andrea Benucci and R. Clay Reid PUBLISHED IN: Frontiers in Neural Circuits

#### *Frontiers Copyright Statement*

*© Copyright 2007-2015 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-678-4 DOI 10.3389/978-2-88919-678-4


# **WHAT CAN SIMPLE BRAINS TEACH US ABOUT HOW VISION WORKS**

# Topic Editors:

**Davide Zoccolan,** Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy

**David D. Cox,** Department of Molecular and Cellular Biology and Center for Brain Science, Harvard University, Cambridge, USA

**Andrea Benucci,** Laboratory for Neural Circuits and Behavior, Brain Science Institute, RIKEN, Wako-shi, Japan

**R. Clay Reid,** Allen Institute for Brain Science, Seattle, Washington, USA

Cover image:

Close-up of a pigmented Long-Evans rat. This strain is the most widely used to investigate visual processing in rats at both the behavioral and neurophysiological levels. Artwork by Fabrizio Manzino, Marco Gigante and Davide Zoccolan.

#### 2nd page image:

Perspective view of the marmoset cerebral cortex, showing the location of visual areas and their accessibility. The colors indicate the different visual cortical areas according to the Paxinos et al. (2012) atlas. Artwork by Tristan Chaplin (Marcello Rosa's lab).

Vision is the process of extracting behaviorally-relevant information from patterns of light that fall on the retina as the eyes sample the outside world. Traditionally, nonhuman primates (macaque monkeys, in particular) have been viewed by many as the animal model-of-choice for investigating the neuronal substrates of visual processing, not only because their visual systems closely mirror our own, but also because it is often assumed that "simpler" brains lack advanced visual processing machinery. However, this narrow view of visual neuroscience ignores the fact that vision is widely distributed throughout the animal kingdom, enabling a wide repertoire of complex behaviors in species from insects to birds, fish, and mammals.

Recent years have seen a resurgence of interest in alternative animal models for vision research, especially rodents. This resurgence is partly due to the availability of increasingly powerful experimental approaches (e.g., optogenetics and two-photon imaging) that are challenging to apply to their full potential in primates. Meanwhile, even more phylogenetically distant species such as birds, fish, and insects have long been workhorse animal models for gaining insight into the core computations underlying visual processing. In many cases, these animal models are valuable precisely because their visual systems are simpler than the primate visual system. Simpler systems are often easier to understand, and studying a diversity of neuronal systems that achieve similar functions can focus attention on those computational principles that are universal and essential.

This Research Topic provides a survey of the state of the art in the use of animal models of visual function that are alternatives to macaques. It includes original research, methods articles, reviews, and opinions that exploit a variety of animal models (including rodents, birds, fishes and insects, as well as a small New World monkey, the marmoset) to investigate visual function. The experimental approaches covered by these studies range from psychophysics and electrophysiology to histology and genetics, testifying to the richness and depth of visual neuroscience in non-macaque species.

**Citation:** Zoccolan, D., Cox, D. D., Benucci, A., Reid, R. C., eds. (2015). What Can Simple Brains Teach Us about How Vision Works. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-678-4

# Table of Contents


*Spatiotemporal specificity of contrast adaptation in mouse primary visual cortex*

Emily E. LeDue, Jillian L. King, Kurt R. Stover and Nathan A. Crowder


Federica B. Rosselli, Alireza Alemi, Alessio Ansuini and Davide Zoccolan


Nicholas J. Priebe and Aaron W. McGee

*Treatment of amblyopia in the adult: insights from a new rodent model of visual perceptual learning*

Joyce Bonaccorsi, Nicoletta Berardi and Alessandro Sale

*Mapping arealisation of the visual cortex of non-primate species: lessons for development and evolution*

Jihane Homman-Ludiye and James A. Bourne

*Visual cortical areas of the mouse: comparison of parcellation and network structure with primates*

Marie-Eve Laramée and Denis Boire


*Illusory patterns are fishy for fish, too*

Christian Agrillo, Maria Elena Miletto Petrazzini and Marco Dadda

*The brain creates illusions not just for us: sharks* **(Chiloscyllium griseum)** *can "see the magic" as well*

Theodora Fuss, Horst Bleckmann and Vera Schluessel


Cait Newport, Guy Wallis and Ulrike E. Siebeck


Lei Xiao, Pu-Ming Zhang, Hai-Qing Gong and Pei-Ji Liang


Martin Egelhaaf, Roland Kern and Jens Peter Lindemann

# Editorial: What can simple brains teach us about how vision works

Davide Zoccolan<sup>1</sup>\*, David D. Cox<sup>2</sup> and Andrea Benucci<sup>3</sup>

*<sup>1</sup> Visual Neuroscience Lab, International School for Advanced Studies, Trieste, Italy, <sup>2</sup> Department of Molecular and Cellular Biology and Center for Brain Science, Harvard University, Cambridge, MA, USA, <sup>3</sup> Laboratory for Neural Circuit and Behavior, RIKEN Brain Science Institute, Wako City, Japan*

Keywords: rodent, development, motion processing, object recognition, illusory contours

Vision is the process of extracting behaviorally-relevant information from patterns of light that fall on the retina as the eyes sample the outside world. Traditionally, non-human primates have been viewed by many as the animal model-of-choice for investigating the neuronal substrates of visual processing, not only because their visual systems closely mirror our own (see, e.g., Orban, 2008; Nassi and Callaway, 2009 for reviews), but also because it is often assumed that "simpler" brains lack advanced visual processing machinery. However, this narrow view of visual neuroscience ignores the fact that vision is widely distributed throughout the animal kingdom, enabling a wide repertoire of complex behaviors in species from insects to birds, fish, and mammals.

Recent years have seen a resurgence of interest in alternative animal models for vision research, such as rodents (see Huberman and Niell, 2011; Zoccolan, 2015 for reviews). This resurgence is partly due to the availability of increasingly powerful experimental approaches (e.g., optogenetics and two-photon imaging) that are challenging to apply to their full potential in primates. Meanwhile, even more phylogenetically distant species such as birds, fish, and insects have long been workhorse animal models for gaining insight into the core computations underlying visual processing (see Baier, 2000; Bilotta and Saszik, 2001; Borst et al., 2010; Aptekar and Frye, 2013 for reviews). In many cases, these animal models are valuable precisely because their visual systems are simpler than the primate visual system. Simpler systems are often easier to understand, and studying a diversity of neuronal systems that achieve similar functions can focus attention on those computational principles that are universal and essential.

This Research Topic provides a survey of the state of the art in the use of non-primate models of visual functions. It includes original research, methods articles, reviews, and opinions that exploit a variety of animal models (including rodents, birds, fishes and insects) to investigate visual function. The experimental approaches covered by these studies range from psychophysics and electrophysiology to histology and genetics, testifying to the richness and depth of visual neuroscience in non-primate species. Below, we briefly summarize the contributions to this Research Topic.

## Edited and reviewed by:

*Claude Desplan, New York University, USA*

#### \*Correspondence: *Davide Zoccolan*

*zoccolan@sissa.it*

Received: *30 July 2015* Accepted: *14 September 2015* Published: *29 September 2015*

#### Citation:

*Zoccolan D, Cox DD and Benucci A (2015) Editorial: What can simple brains teach us about how vision works. Front. Neural Circuits 9:51. doi: 10.3389/fncir.2015.00051*

# Rodent Studies

Roughly half of the articles in this Research Topic (6 research studies and 4 reviews) focus on the visual system of the two rodent species most commonly used as laboratory animals: rats and mice. Following a trend that has been established over the past 6–7 years, the mouse studies investigate tuning properties of visual neurons in low-level visual centers through in-vivo electrophysiology and, in one case, genetic manipulation (LeDue et al., 2013; Liu et al., 2014), while the rat studies explore higher-level perceptual functions (such as pattern discrimination) through visual psychophysics and, in one case, in-vivo neurophysiology (Meier and Reinagel, 2013; Reinagel, 2013; Rosselli et al., 2015; Vermaercke et al., 2015). The reviews focus on the role of rats and mice as models of development and plasticity of the visual system (Bonaccorsi et al., 2014; Priebe and McGee, 2014), and on the comparison among the visual cortical organizations of rodents, primates and other species (Homman-Ludiye and Bourne, 2014; Laramée and Boire, 2015).

LeDue et al. (2013) investigate the stimulus dependence of contrast adaptation in mouse primary visual cortex (V1). When a high-contrast stimulus is shown even for a few seconds, the response amplitude of primate V1 neurons to subsequent stimuli is weakened, most strongly when the adapting and test stimuli share the same spatial and temporal frequency. LeDue et al. report the same stimulus specificity in mouse V1. This observation opens the possibility that network, synaptic, and intrinsic cellular mechanisms contributing to contrast adaptation operate in mouse V1 in a similar way as in higher mammals.

Liu et al. (2014) present a paper on mouse superior colliculus (SC) and take full advantage of transgenic technologies. In particular, the authors study the receptive fields (RFs) of SC neurons. Such RFs are shaped by converging retinal on- and off-pathways, whose wiring is directed by molecular guidance cues (e.g., EphAs and ephrin-As). In addition to these cues, retinal activity also plays a critical role. Knockout mice in which retinal activity is altered during development (nAChR-β2−/−) have SC neurons with severely disrupted direction and orientation selectivity. Liu et al. show that knocking out guidance cues (ephrin-A knockout) has very little impact on the RFs, making them just slightly larger. This is an elegant example of how transgenic technologies can help dissect the relative contribution of activity-dependent mechanisms and genetic programs.

In rats, Meier and Reinagel (2013) investigate whether the detection of a centrally-presented grating is similarly affected in rats and humans by the concomitant presentation of two flanking gratings. They report that, in both species, the flankers with the greatest impact on target detection are those that are collinear to the target (i.e., they are located and oriented to sit along a virtual line passing through the three stimuli). However, while collinear flankers maximally impair detection in rats, they maximally improve it in humans. This implies that rats, like humans, are sensitive to higher-order configurations of oriented elements, but the sign of this phenomenon is the opposite in the two species. This raises intriguing questions about differences between neuronal mechanisms that, in rodents and primates, underlie spatial integration of visual features, spatial attention and center-surround stimulus interactions.

In a second study, Reinagel (2013) investigates whether visual sensory decisions in rats are constrained by the speed-accuracy trade-off that is typical of primate vision. The author reports that rat accuracy in discriminating static images increases with reaction time. Additionally, accuracy and speed are both modulated by task difficulty and the penalty associated with an incorrect response. This represents an interesting basis for comparing the dynamics of perceptual decisions in rodents and primates, and provides useful insights for effectively training rats in visual discrimination tasks.

Rosselli et al. (2015) also investigate the impact of stimulus discriminability on rat pattern vision, but focus on the difference between the perceptual strategies underlying the recognition of structurally similar vs. dissimilar objects across view changes (i.e., variations in position, size and orientation). They report that the patterns of diagnostic features underlying the discrimination of highly similar objects are more scattered, more view-dependent, and more subject-dependent than those found in a previous study using more dissimilar discriminanda (Alemi-Neissi et al., 2013). These findings suggest that in rats, as in primates, transformation-tolerant recognition can flexibly rely on either view-invariant representations of distinctive object features or view-specific representations that are acquired through exposure to multiple object views.

Rat pattern vision is also the topic of the study of Vermaercke et al. (2015), who compare the discriminability of different pairs of visual shapes at a behavioral level with their discriminability at the neuronal level. The authors report that neuronal discriminability correlates well with behavioral discriminability only in the extrastriate visual cortical areas that are lateral to primary visual cortex (V1), but not in V1 itself (where, instead, they find a good correlation with shape discriminability at the pixel level). This suggests that rat lateral visual cortex represents behaviorally relevant shape features, in a way that could be homologous to the primate ventral stream.

Two reviews focus on the plasticity of the rodent visual system during development (Priebe and McGee, 2014) and in adulthood (Bonaccorsi et al., 2014). Priebe and McGee (2014) comment on some of the major distinctive features of the mouse early visual system: from the retina to the primary visual cortex. They then delve into the most studied form of experience-dependent plasticity in the visual cortex: ocular-dominance (OD) plasticity. Activity-dependent changes in OD patterns during the critical period have been observed in all mammals and mice are no exception. This review highlights the key genetic mechanisms involved, with special attention to the role of inhibition during the narrow critical period (P20-32) of plasticity.

Bonaccorsi et al. (2014) provide a comprehensive overview of amblyopia, with a focus on the role of perceptual learning as a possible treatment for this condition in both humans and animals. The authors discuss recent experiments in which adult amblyopic rats showed a full recovery of visual functions as a result of extensive training in a spatial frequency discrimination task. The associated decrease of the inhibition-excitation balance highlights the fundamental role that the reduction of GABAergic inhibition can play in restoring cortical plasticity and enhancing recovery of function in adulthood. This confirms the effectiveness of rodent models in the study of visual cortical plasticity and their role in the development of new therapeutic approaches.

Two other reviews compare the anatomy, connectivity, parcellation and hierarchical organization of the visual systems of different species, with a special focus on primates and rodents (Homman-Ludiye and Bourne, 2014; Laramée and Boire, 2015). Homman-Ludiye and Bourne (2014) provide a comparative review of the studies concerning the cellular, molecular and genetic mechanisms responsible for visual cortical arealisation in a variety of mammalian species. The authors draw evidence from a range of methodological approaches: the application of anterograde and retrograde tracers, histological mapping of activity-dependent cellular markers (e.g., immediate-early genes), determination of the regulatory events that roughly define area borders during development (e.g., the graded expression of transcription factors along brain axes), and study of the molecular guidance cues that refine these borders into the sharp boundaries of the mature visual cortex. Overall, the review makes the point that the analysis of multiple species is important to understand the evolution and development of the mammalian visual system, with an emphasis on the experimental advantages that genetically modified mice afford.

In a similar spirit, Laramée and Boire (2015) focus on the order Rodentia, which represents over 40% of all mammalian species. This order is incredibly diverse: more than 2000 species spanning a 1000-fold range in body size and a 200-fold range in brain size. Such diversity within the same order represents a great opportunity to identify general principles of anatomical and functional organization. Laramée and Boire (2015) look at what is preserved and what is lost across species and discuss such observations in the context of theories of optimality (wiring economy, small-world networks, etc.), which provide a convenient theoretical framework to reveal the underlying organizational principles.

# "Simpler" Primates Studies

While this Research Topic was mostly focused on non-primate systems, we included one exception: a review of what is known about the visual system of a New World monkey, the marmoset (Solomon and Rosa, 2014). In their review, the authors compare the "simpler" brain of the marmoset to that of the macaque monkey, which is still considered the benchmark model for primate vision. In this thorough and comprehensive review of the marmoset visual system, Solomon and Rosa (2014) start from the retina and end in frontal association areas, touching on subcortical structures as well. In this voyage through the marmoset brain, the authors discuss distinctive functional and anatomical features that make it a promising alternative to the larger, more complex macaque brain.

# Bird Studies

Object recognition is a topic that is also addressed by two bird studies, one research article (Wood and Wood, 2015) and one review (Soto and Wasserman, 2014). Wood and Wood (2015) follow up on previous work exploring the visual object recognition abilities of newborn chickens (Wood, 2013). The authors rely on the innate imprinting behaviors of this species, in which the chick approaches stimuli that it has previously seen in its early life. The study reports that the animals are capable of generalizing from extremely limited exposure to visual objects, in some cases just a handful of views. These results suggest that the chicken's visual system is able to learn robust visual representations of objects from extremely little training data.

Taking a broader view on avian vision, Soto and Wasserman (2014) review the large body of work focused on the object recognition abilities of pigeons. Pigeons have long been known to exhibit sophisticated visual recognition abilities. The authors argue that many core components of object recognition behavior are found across a wide range of vertebrate species, and that birds represent a fruitful model system for studying these abilities.

# Fish and Amphibian Studies

High-level visual functions, such as shape processing and object recognition, are also addressed by several behavioral studies on fish in this Research Topic.

The perception of illusory shapes and boundaries is the subject of two reviews/opinions (Agrillo et al., 2013; Rosa Salva et al., 2014) and one research article (Fuss et al., 2014). Based on the observation that different groups of teleost fish exhibit both modal and amodal completion (e.g., perception of illusory contours, as in the Kanizsa figures), Agrillo et al. (2013) argue that fish represent an excellent experimental model for studying the development of gestalt principles of visual perception in newborn animals. In particular, the authors stress the potential of investigating such principles in the zebrafish, one of the main model organisms for the study of neurodevelopmental genetics.

Along these lines, Fuss et al. (2014) present new results suggesting that bamboo sharks perceive at least some illusory contour stimuli in a manner similar to how they are perceived by non-fish species. Bamboo sharks generalize training with visual shapes to their equivalent Kanizsa figures, though results with some illusions, such as Müller-Lyer figures, are less clear. These results speak both to the universality of certain mechanisms of contour perception and to the ability to probe detailed behavior in a wide range of fish species.

In their review, Rosa Salva et al. (2014) stress the important advantages of working with fish for comparative studies of brain evolution. Fish diverged from other vertebrates about 450 million years ago, and diversified into a collection of taxa. This diversification makes fish an excellent model system to study how recognized homologies have evolved using diverse neural resources and substrates. The authors discuss a number of complex visual processing functions in relation to visual illusions, second-order motion, perceptual binding, attentional prioritization, etc. Together, these observations challenge the assumption that complex neural circuitry (e.g., an associative cortex) is needed for adaptive object perception.

Two other behavioral studies investigate object categorization (Newport et al., 2014) and spectral sensitivity (Siebeck et al., 2014) in fish models. Newport et al. (2014) assess the ability of the archer fish to categorize objects using a range of challenging psychophysical tasks. While it is difficult to train these fish to perform some more complicated tasks, such as match-to-sample and odd-one-out tasks, the fish are able to robustly learn two-alternative forced choice tasks, providing a powerful window into the visual abilities of this species.

Siebeck et al. (2014) examine luminance perception in reef fish, focusing in particular on spectral sensitivity of luminance vision. They find that, as in many terrestrial vertebrates, long and medium wavelength cones contribute to luminance perception, but short wavelength (blue) cones do not.

Finally, one research article uses the retina of an amphibian, the bullfrog, to study the effect of dopamine on the processing of visual information (Xiao et al., 2014). Dopamine is synthesized and released by interplexiform and amacrine cells in the bullfrog retina and is known to exert a number of important modulatory effects on retinal responses. By systematically changing the duration of visual stimuli and using an information-theoretical approach, the authors dissect the complex role of dopamine in the encoding of stimulus duration.

# Insect Studies

The Topic also includes two articles focusing on motion perception and visual tracking behavior in insect models (Aptekar et al., 2014; Egelhaaf et al., 2014). Aptekar et al. (2014) present methods and software for probing the motion processing system of Drosophila. This work represents just one example of the high degree of sophistication in stimulus generation and data analysis that exists for interrogating the visual system in insects.

Finally, in their review, Egelhaaf et al. (2014) take a broader perspective on motion processing in insects, noting that their motion detection system is sensitive to non-motion visual properties such as texture, and that these properties may reflect the adaptation of the visual system to its environment and the needs of the animal. The review represents an interesting perspective on how a simple visual system and the statistics of the natural environment can interact to enhance the readout of behaviorally-relevant cues, such as the location of nearby objects, while suppressing the representation of less-relevant distal cues.

# Concluding Remarks

As the depth and breadth of the contributions to this Research Topic attest, "simpler" brains have a great deal to teach us about vision. From insects to fish, birds, amphibians, and finally mammals, research aimed at understanding vision in simple animal models is flourishing.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Zoccolan, Cox and Benucci. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Spatiotemporal specificity of contrast adaptation in mouse primary visual cortex

*Emily E. LeDue, Jillian L. King, Kurt R. Stover and Nathan A. Crowder\**

*Department of Psychology and Neuroscience, Dalhousie University, Halifax, NS, Canada*

#### *Edited by:*

*Davide Zoccolan, International School for Advanced Studies, Italy*

#### *Reviewed by:*

*Inah Lee, Seoul National University, South Korea; Edward S. Ruthazer, McGill University, Canada*

#### *\*Correspondence:*

*Nathan A. Crowder, Department of Psychology and Neuroscience, Dalhousie University, 1355 Oxford Street, PO Box 15000, Halifax, NS B3H 4R2, Canada e-mail: nathan.crowder@dal.ca*

Prolonged viewing of high contrast gratings alters perceived stimulus contrast, and produces characteristic changes in the contrast response functions of neurons in the primary visual cortex (V1). This is referred to as contrast adaptation. Although contrast adaptation has been well-studied, its underlying neural mechanisms are not well-understood. Therefore, we investigated contrast adaptation in mouse V1 with the goal of establishing a quantitative description of this phenomenon in a genetically manipulable animal model. One interesting aspect of contrast adaptation that has been observed both perceptually and in single unit studies is its specificity for the spatial and temporal characteristics of the stimulus. Therefore, in the present work we determined if the magnitude of contrast adaptation in mouse V1 neurons was dependent on the spatial frequency and temporal frequency of the adapting grating. We used protocols that were readily comparable with previous studies in cats and primates, and also a novel contrast ramp stimulus that characterized the spatial and temporal specificity of contrast adaptation simultaneously. Similar to previous work in higher mammals, we found that contrast adaptation was strongest when the spatial frequency and temporal frequency of the adapting grating matched the test stimulus. This suggests similar mechanisms underlying contrast adaptation across animal models and indicates that the rapidly advancing genetic tools available in mice could be used to provide insights into this phenomenon.

**Keywords: adaptation, mouse vision, primary visual cortex, sinusoidal gratings, pattern-specificity, electrophysiology, context**

# **INTRODUCTION**

Our perception of the world around us, and the neural activity underlying this experience, is strongly dependent on the recent stimulus history. In the visual system, there is evidence for a number of self-calibration mechanisms that rapidly adapt visual processing according to the prevailing attributes of the stimulus being viewed (Carandini, 2000). Contrast adaptation has been used extensively to study this form of short-term plasticity. In psychophysical studies, prolonged viewing of a high-contrast pattern can produce a perceived fading of the adapting stimulus and reduce sensitivity to low contrasts, but it can also improve sensitivity and discrimination around the adapting contrast (Blakemore and Campbell, 1969; Greenlee and Heitger, 1988; Foley and Chen, 1997; Abbonizio et al., 2002). Primary visual cortex (V1) neurons have sigmoidal contrast response functions when spike rate is plotted as a function of stimulus contrast, and contrast adaptation has been shown to shift the most sensitive part of the curve toward the adapting contrast (Movshon and Lennie, 1979; Ohzawa et al., 1982, 1985; Sclar et al., 1989; Bonds, 1991; Ibbotson, 2005). A case has also been made that contrast adaptation (and similar processes) must be incorporated into models of V1 to better predict the responses of real neurons to natural stimuli (Carandini et al., 2005). Thus, there is converging evidence that contrast adaptation is a fundamental process that the visual system uses to make moment-to-moment adjustments in its sensitivity to incoming input.
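To make the shape of a contrast response function concrete, the sketch below uses the hyperbolic-ratio (Naka-Rushton) form, a common descriptive model for such sigmoidal curves. The text above does not name a specific model, so the choice of function is an assumption here, and the function name `contrast_response` and all parameter values are hypothetical. Adaptation is illustrated simply as an increase in the semi-saturation contrast `c50`, which moves the steepest part of the curve (on a log-contrast axis) toward higher contrasts.

```python
import numpy as np


def contrast_response(c, r_max=30.0, c50=0.2, n=2.0, baseline=1.0):
    """Hyperbolic-ratio (Naka-Rushton) contrast response function.

    c        : stimulus contrast, between 0 and 1
    r_max    : maximum stimulus-driven rate (spikes/s), hypothetical value
    c50      : semi-saturation contrast, where the driven rate is r_max / 2
    n        : exponent controlling the steepness of the curve
    baseline : spontaneous rate (spikes/s), hypothetical value
    """
    c = np.asarray(c, dtype=float)
    return baseline + r_max * c**n / (c**n + c50**n)


contrasts = np.array([0.0, 0.05, 0.1, 0.2, 0.4, 0.8, 1.0])

# Assumption for illustration only: adaptation to a high contrast is
# modeled as a rightward shift of c50, with all other parameters fixed.
unadapted = contrast_response(contrasts, c50=0.15)
adapted = contrast_response(contrasts, c50=0.35)

for c, u, a in zip(contrasts, unadapted, adapted):
    print(f"contrast={c:4.2f}  unadapted={u:6.2f}  adapted={a:6.2f}")
```

In this toy version, responses to low and intermediate contrasts fall after adaptation while the curve still approaches a similar maximum at full contrast. Real neurons can also change their maximum response or exponent, which this sketch deliberately leaves out.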

Both psychophysical observations and single unit recording studies in V1 indicate that contrast adaptation is pattern-specific such that its magnitude can depend on the spatial frequency (SF), temporal frequency (TF), or orientation of the adapting and test stimuli (Blakemore et al., 1973; Vautin and Berkley, 1977; Movshon and Lennie, 1979; Albrecht et al., 1984; Ohzawa et al., 1985; Saul and Cynader, 1989a,b; Snowden and Hammett, 1996; Müller et al., 1999). This pattern-specificity has been used to constrain possible mechanisms underlying contrast adaptation. For example, both psychophysical and V1 data indicate that contrast adaptation is strongest when the SF of the adapting stimulus matches the test stimulus (psychophysics: Blakemore and Campbell, 1969; Blakemore and Nachmias, 1971; Blakemore et al., 1973; Snowden and Hammett, 1996; neurophysiology: Movshon and Lennie, 1979; Ohzawa et al., 1985; Saul and Cynader, 1989a), but this SF specificity must develop in the cortex because contrast adaptation in the lateral geniculate nucleus (LGN) does not appear to be SF specific (Duong and Freeman, 2007).

Several cellular and circuit mechanisms have been proposed to play a role in contrast adaptation (for a review see Kohn, 2007), but understanding of the cause of contrast adaptation remains incomplete. Several useful genetic tools available in mice could provide another avenue to explore contrast adaptation, but baseline conditions must first be established in this species to make any genetic manipulation related to contrast coding interpretable. Several recent studies of mouse V1 have revealed similarities between mice and higher mammals, including tuning for spatial and temporal frequencies, selectivity for orientation and direction, and the presence of simple and complex cells (Niell and Stryker, 2008; Gao et al., 2010; Van den Bergh et al., 2010). However, contrast adaptation in mouse V1 has been reported in only two studies (Niell and Stryker, 2008; Stroud et al., 2012). Stroud et al. (2012) investigated the orientation specificity of contrast adaptation, but the other two aspects of the pattern-specificity of contrast adaptation that have been so important for linking electrophysiological studies in higher mammals to human psychophysical observations, namely specificity for SF and TF, remain unexplored. Therefore, we examined the spatiotemporal specificity of contrast adaptation in mouse V1 using a top-up adaptation protocol that was comparable with previous studies in cat and monkey (Movshon and Lennie, 1979; Duong and Freeman, 2007; Dhruv et al., 2011). We also used dynamic contrast ramp stimuli of varying SF and TF to obtain rapid measures of contrast adaptation with a wide variety of adaptors (Crowder et al., 2008; Stroud et al., 2012).

Mouse V1 neurons showed robust contrast adaptation when the adapting grating matched the neuron's preferred stimulus, which confirms earlier findings (Stroud et al., 2012). Furthermore, in the top-up protocol contrast adaptation was diminished or absent when the SF or TF of the adaptor did not match the neuron's preference, indicating that mouse V1 neurons show adaptation specificity similar to that observed in cats and primates. Adaptation observed in the contrast ramp experiments was also pattern-selective, but maximal adaptation often occurred at slightly higher-than-preferred SFs, indicating that the exact properties of the contrast adaptation observed depend on the nature of the testing protocol.

# **MATERIALS AND METHODS**

#### **ANESTHESIA AND SURGICAL PROCEDURES**

The experimental procedures reported herein conform to the guidelines established by the Canadian Council on Animal Care, and were approved by the University Committee on Laboratory Animals at Dalhousie University. Electrophysiological recordings were made from 25 adult male C57BL/6J mice weighing between 20 and 30 g, which were purchased from Jackson Laboratories (Bar Harbor, Maine). In early experiments, mice (*n* = 15) were sedated with chlorprothixene (5 mg/kg ip; Sigma, St. Louis, MO), and then anesthetized with urethane (0.5–1.2 g/kg ip; Sigma). If needed, a small dose of ketamine (20 mg/kg ip; Wyeth) was given to accelerate descent to the surgical plane of anesthesia, and allow a tracheotomy to be performed quickly (see Moldestad et al., 2009 for details). Mice were left free-breathing throughout the experiment and a tube located in front of the mouse delivered oxygen (0.1 L/min) to supplement room air. In later experiments, mice (*n* = 10) were sedated with chlorprothixene (5 mg/kg ip) and anesthetized with isoflurane delivered through a customized nose cone (2.5% during induction, 1.5% during surgery, and 0.4–1% during recording), which decreased preparation time by eliminating the need for a tracheotomy. Gas anesthesia did not appear to affect the frequency of encountering responsive units, and produced no significant differences in the tuning strength or selectivity of recorded units (assessed with discrimination indices, see *Initial data analysis* below; two-sample *t*-tests, *p* > 0.2 for all; cf. Kaneko et al., 2012). For all mice, body temperature was maintained at 37.5°C with a heating pad, and their corneas were protected by frequent application of a thin layer of optically neutral silicone oil (30000 cSt; Sigma). The skull was stabilized in a stereotax, and a craniotomy (∼1 mm²) was made over the monocular retinotopic representation in primary visual cortex (∼0.8 mm anterior and 2.3 mm lateral to lambda; Paxinos and Franklin, 2001).
Recordings were made using either glass micropipettes (2–5 µm tip diameter, filled with 2 M NaCl) or carbon-fiber in glass microelectrodes (0.6–1.5 MΩ impedance). Electrode depth was controlled using a micromanipulator (FHC, Bowdoin, ME). Extracellular signals from individual units were amplified (Xcell 3+, FHC) and filtered (bandpass: 50–2000 Hz) before being digitized (Cambridge Electronic Design Power1401 with Spike2, Cambridge, England). Acquired signals were sampled at 40 kHz, and online analysis was performed on triggered TTL pulses with Spike2, but subsequent analysis was done offline.

#### **VISUAL STIMULI**

Upon isolation of a visually responsive unit, the receptive field (RF) was mapped using hand-driven light bars and spots. Quantitative testing was then performed with custom computer generated visual stimuli programmed in MatLab (MathWorks, Natick, MA) using the Psychophysics Toolbox extension (Brainard, 1997; Pelli, 1997), and presented on a calibrated CRT monitor (LG Flatron 915FT plus 19" display, 100 Hz refresh, 1024 × 768 pixels, mean luminance = 30 cd/m²) at a viewing distance of 10–25 cm. All stimuli were presented in a circular aperture surrounded by a gray field of mean luminance. Orientation selectivity and surround suppression were characterized online using drifting square wave gratings. Spatiotemporal tuning was then assessed with full contrast drifting sine wave gratings with 36 combinations of SFs [0.01, 0.02, 0.04, 0.08, 0.16, and 0.32 cycles per degree (cpd)] and TFs (0.25, 0.5, 1, 2, 4, 8 Hz). All spatiotemporal and adapting stimuli were presented at the optimal orientation and size for each unit, and drifted in the direction that elicited maximal excitation. Presentations of each combination of SF and TF were randomized with 8–10 repeats for each stimulus. The presentation time of the stimulus was 1.5 s, and a gray screen of mean luminance was shown between stimuli for 0.5 s. Grating start-phase was staggered on each repetition to average out periodic firing of phase-sensitive neurons. The spatiotemporal tuning of each unit was then examined online and appropriate adaptors were selected for the subsequently presented contrast adaptation protocols. Two stimulus protocols that have previously been used to investigate contrast adaptation in mice, cats, and primates were modified to study the spatiotemporal specificity of contrast adaptation in mouse V1: top-up adaptation (Sclar et al., 1989; Duong and Freeman, 2007; Stroud et al., 2012), and contrast ramps (Crowder et al., 2008; Stroud et al., 2012).

#### *Top-up adaptation*

We chose the top-up contrast adaptation protocol because it has commonly been used to study the stimulus specificity of contrast adaptation in higher mammals (e.g., Movshon and Lennie, 1979; Duong and Freeman, 2007; Dhruv et al., 2011), which facilitates cross-species comparisons. Sine-wave contrast is defined as:

$$\text{Michelson contrast} = \frac{\text{Luminance}_{\text{max}} - \text{Luminance}_{\text{min}}}{\text{Luminance}_{\text{max}} + \text{Luminance}_{\text{min}}} \tag{1}$$

where Luminance<sub>max</sub> and Luminance<sub>min</sub> are the maximum and minimum luminances, respectively. Non-adapted contrast response functions were obtained by recording responses to ten contrasts (0.04, 0.08, 0.12, 0.16, 0.24, 0.32, 0.48, 0.64, 0.82, 1) presented in random order for 0.5 s tests (8–12 repetitions) interleaved with 4 s of mean luminance. Adapted contrast response functions were collected in blocks where: (1) the adapting grating matched the cell's spatiotemporal peak; (2) the SF of the adapting grating was 1–3 octaves higher or lower than the cell's preferred SF; and (3) the TF of the adapting grating was set to 8 Hz. Adaptation blocks consisted of 60 s of the adapting grating at a contrast of 0.32 followed by 0.5 s tests (aforementioned contrasts for 8–12 repetitions) interleaved with 4 s adaptation top-ups. An adapting contrast of 0.32 was chosen because our previous study of contrast adaptation in mouse V1 (Stroud et al., 2012) indicated that this contrast produced reliable adaptation while still allowing the data to be easily fit with sigmoid curves (see *Curve fitting* below).
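As a quick illustration of Equation 1, the following sketch converts between a target contrast and the luminance extremes of a grating modulated about the monitor's mean luminance (30 cd/m² in this study). The function names are hypothetical, and symmetric modulation about the mean is an assumption; this is not the authors' stimulus code.

```python
def michelson_contrast(lum_max, lum_min):
    """Equation 1: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (lum_max - lum_min) / (lum_max + lum_min)


def grating_extremes(mean_lum, contrast):
    """Peak and trough luminance of a sine grating, assuming the modulation
    is symmetric about mean_lum."""
    return mean_lum * (1 + contrast), mean_lum * (1 - contrast)


# Adapting contrast of 0.32 on a 30 cd/m^2 mean-luminance background
lum_max, lum_min = grating_extremes(30.0, 0.32)
recovered = michelson_contrast(lum_max, lum_min)
```

Under these assumptions the 0.32 adaptor swings between roughly 39.6 and 20.4 cd/m², and feeding those extremes back into Equation 1 recovers the contrast.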

#### *Contrast ramps*

One drawback of the top-up protocol described above is that it takes a long time to record even a single adapted contrast response function (Sclar et al., 1989; Crowder et al., 2006). Therefore, when exploring the SF or TF specificity of contrast adaptation, only a few conditions can be examined for any single cell. To more fully assess the nature of contrast adaptation in the spatiotemporal domain, we used contrast ramp stimuli. Contrast ramps are dynamic contrast stimuli where the contrast of the sine wave grating is changed linearly on each animation frame over the time-course of the presentation. Importantly, these ramps are able to measure several key markers of contrast adaptation with fairly short presentation times (Crowder et al., 2008; Stroud et al., 2012). Contrast ramp stimuli were first presented at a contrast of 0, and contrast was increased linearly over 2 s until it reached 1 (rising phase). The contrast of the grating was then ramped back down from 1 to 0 (falling phase) over the next 2 s. Thus, the neuron is presented with identical contrasts in the rising and falling phases, but the order of presentation (i.e., temporal context) is reversed. A full screen gray of mean luminance was shown between ramp stimuli for 2 s. In this protocol, the spatiotemporal specificity of contrast adaptation was tested by varying the SF and TF of the contrast ramps using the 36 combinations of SFs (0.01, 0.02, 0.04, 0.08, 0.16, and 0.32 cpd) and TFs (0.25, 0.5, 1, 2, 4, 8 Hz) that were directly comparable with the spatiotemporal profile obtained for each neuron. Contrast ramps with different spatiotemporal combinations were randomized and repeated 8–12 times for each combination.
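The contrast profile of a single ramp can be sketched as a per-frame list of contrasts. This is a minimal illustration assuming the 100 Hz refresh rate reported for the monitor and one contrast step per frame; the function name and the exact frame counting (a single frame at peak contrast) are assumptions.

```python
def contrast_ramp(refresh_hz=100, phase_s=2.0):
    """Per-frame contrast for one ramp: linear rise 0 -> 1, then fall 1 -> 0.

    Assumes one contrast update per video frame and a single frame at
    peak contrast shared by the rising and falling phases.
    """
    n = int(round(refresh_hz * phase_s))          # frames per phase
    rising = [i / n for i in range(n + 1)]        # 0, 1/n, ..., 1
    falling = rising[-2::-1]                      # 1 - 1/n, ..., 0
    return rising + falling


ramp = contrast_ramp()
step = ramp[1] - ramp[0]   # contrast change per frame
```

Because the falling phase is the mirror image of the rising phase, every contrast is shown twice with only the order of presentation reversed, which is what allows hysteresis to be read as an effect of temporal context.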

We were interested in determining whether the spatiotemporal combination that caused maximal firing also caused maximum hysteresis between the rising and falling portions of the contrast ramp. In order to test this, we used a *symmetrical* contrast ramp procedure, which maintained the same spatiotemporal parameters for both the rising and falling phase of the ramp. We also collected a second type of contrast ramp from a subset of neurons referred to as *peak-tested* contrast ramps that were more directly comparable to the top-up protocol. In the peak-tested protocol, the rising phase of the contrast ramp was one of the 36 combinations of SF and TF, but the falling phase was always shown at the neuron's preferred SF and TF (see Results), as chosen from the online tuning function.

#### **INITIAL DATA ANALYSIS**

Spike sorting was performed offline with Spike2 software, which first searched for and sorted spikes using a supervised template-matching algorithm, and then displayed candidate spikes with a principal components analysis for approval. Data were exported to MatLab and neuronal responses were represented as spike density functions (SDF) with 1 kHz resolution generated by convolving a delta function at each spike arrival time with a Gaussian window. For each unit, we calculated the magnitude of orientation, size, and spatiotemporal tuning using a discrimination index (DI) (DeAngelis and Uka, 2003):

$$\text{DI} = \frac{\text{Resp}_{\text{Max}} - \text{Resp}_{\text{Min}}}{\left(\text{Resp}_{\text{Max}} - \text{Resp}_{\text{Min}}\right) + 2\sqrt{\text{SSE}/(N - M)}} \tag{2}$$

Resp<sub>Max</sub> is the neuron's maximum response, while Resp<sub>Min</sub> is the neuron's minimum response. SSE is the sum of squared errors around the mean responses, *N* is the total number of presentations of the stimuli, and *M* is the number of different stimuli presented. In order to classify cells as simple or complex, we divided the first Fourier coefficient of a neuron's response to a grating near the spatiotemporal peak (*F*1) by the mean time-averaged response to this grating (*F*0) (Movshon et al., 1978a,b; Skottun et al., 1991). Despite some recent controversy (Mechler and Ringach, 2002; Crowder et al., 2007; Henry and Hawken, 2013; Hietanen et al., 2013), the *F*1/*F*0 ratio has been used to quantitatively classify simple and complex cells in numerous studies, and an *F*1/*F*0 ratio less than 1 indicates a cell is complex.
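Both indices can be sketched in a few lines. The example assumes that the square root in the DI spans SSE/(*N* − *M*), as in the usual form of this index, that *F*1 is taken as the amplitude of the Fourier component at the grating's TF over a whole number of cycles, and that no spontaneous rate is subtracted. Function and argument names are hypothetical.

```python
import cmath
import math


def discrimination_index(trials_by_stimulus):
    """DI from per-trial responses, one inner list per stimulus condition."""
    means = [sum(t) / len(t) for t in trials_by_stimulus]
    # Sum of squared errors of single trials around their condition mean
    sse = sum((x - mu) ** 2
              for t, mu in zip(trials_by_stimulus, means) for x in t)
    n_total = sum(len(t) for t in trials_by_stimulus)   # N
    n_stim = len(trials_by_stimulus)                    # M
    spread = max(means) - min(means)
    return spread / (spread + 2 * math.sqrt(sse / (n_total - n_stim)))


def f1_f0_ratio(rate, tf_hz, sample_hz=1000):
    """F1/F0 for a response sampled at sample_hz to a grating drifting at tf_hz.

    rate should cover a whole number of stimulus cycles and have a
    non-zero mean.
    """
    n = len(rate)
    f0 = sum(rate) / n
    fourier = sum(r * cmath.exp(-2j * math.pi * tf_hz * k / sample_hz)
                  for k, r in enumerate(rate)) / n
    f1 = 2 * abs(fourier)          # amplitude of the fundamental
    return f1 / f0


# A half-wave rectified sinusoid (idealized simple cell) gives F1/F0 near pi/2
simple_like = [max(0.0, math.sin(2 * math.pi * 2 * k / 1000))
               for k in range(1000)]
ratio = f1_f0_ratio(simple_like, tf_hz=2)
```

With this convention an unmodulated response gives a ratio near 0 and a fully modulated, rectified response gives a ratio above 1, which is the basis for the criterion of 1 used in the text.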

#### *Curve fitting*

We used the least squares method to fit contrast response functions. Sigmoid curves (Albrecht and Hamilton, 1982) were fit to the mean responses from top-up contrast response functions and SDFs produced by contrast ramps:

$$R(c_i) = \frac{R_{\text{max}} \times c_i^{n}}{c_i^{n} + c_{50}^{n}} + M \tag{3}$$

where R(*ci*) is the amplitude of the evoked response at contrast *ci*, M is the spontaneous rate, *n* is the exponent that determines the steepness of the curve, Rmax is the maximum elevation in response above the spontaneous rate, and c50 is the contrast that generates a response elevation of half Rmax. Response saturation was evident for almost all non-adapted top-up contrast response functions and rising ramp responses allowing for well constrained fits. When fitting adapted curves where the response to maximal contrast was similar to or less than the non-adapted response but saturation was not evident, we assigned an upper bound on the adapted Rmax of 15% above the non-adapted Rmax in order to obtain tractable fits.
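A least-squares fit of Equation 3 can be sketched without any fitting library. The example swaps in a plain grid search over c50 and *n*; for each pair, Rmax and M enter the model linearly and are solved in closed form. This is only an illustration: the optimizer the authors used is not stated, the grid ranges are arbitrary, and the 15% upper bound on the adapted Rmax is not applied here.

```python
def hyperbolic_ratio(c, r_max, c50, n, m):
    """Equation 3: sigmoidal contrast response function."""
    return r_max * c ** n / (c ** n + c50 ** n) + m


def fit_contrast_response(contrasts, responses):
    """Least-squares fit of Equation 3 by grid search over c50 and n.

    For fixed c50 and n the model is linear in r_max and m, so those two
    parameters come from ordinary linear regression on the sigmoid term.
    """
    k = len(contrasts)
    mean_r = sum(responses) / k
    best = None
    for i in range(2, 101):                # c50 from 0.02 to 1.00
        c50 = i / 100
        for j in range(5, 61):             # n from 0.5 to 6.0
            n = j / 10
            g = [c ** n / (c ** n + c50 ** n) for c in contrasts]
            mean_g = sum(g) / k
            var_g = sum((x - mean_g) ** 2 for x in g)
            if var_g == 0:
                continue
            r_max = sum((x - mean_g) * (y - mean_r)
                        for x, y in zip(g, responses)) / var_g
            m = mean_r - r_max * mean_g
            sse = sum((r_max * x + m - y) ** 2 for x, y in zip(g, responses))
            if best is None or sse < best[0]:
                best = (sse, r_max, c50, n, m)
    _, r_max, c50, n, m = best
    return {"r_max": r_max, "c50": c50, "n": n, "m": m}


# Noise-free synthetic data at the ten test contrasts used in the top-up protocol
contrasts = [0.04, 0.08, 0.12, 0.16, 0.24, 0.32, 0.48, 0.64, 0.82, 1.0]
responses = [hyperbolic_ratio(c, 30.0, 0.2, 2.0, 2.0) for c in contrasts]
fit = fit_contrast_response(contrasts, responses)
```

On real, noisy data a bounded nonlinear optimizer would be the more usual choice, and it would also make the upper bound on Rmax straightforward to impose.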

#### *Neuronal latency*

To examine the amount of hysteresis for each contrast ramp, responses were latency-corrected as previously described (Crowder et al., 2008; Stroud et al., 2012). Briefly, for each unit a response threshold was established based on the 99% cut-off from a Poisson distribution fitted to the spontaneous firing rate. Each unit's response latency was calculated as the first time the spiking rate in the response to gratings of optimal SF and TF (from the spatiotemporal tuning stimulus) exceeded the aforementioned Poisson threshold and stayed above the threshold for the subsequent 25 ms (Price et al., 2005). For each unit, responses to contrast ramps were shifted back in time by the neural latency then split into the rising and falling phases and re-plotted using units of contrast on the abscissa instead of time (which resulted in the falling phases of contrast ramps being flipped left-to-right).
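The latency correction can be sketched as three small steps: a Poisson cut-off, a threshold-crossing search, and the shift-and-split of the ramp response. The sketch assumes responses sampled at 1 kHz (one sample per millisecond) and leaves the conversion between a Poisson count and a firing-rate threshold to the caller, since the bin width is not given in the text. All function names are hypothetical.

```python
import math


def poisson_cutoff(mean_count, p=0.99):
    """Smallest count k whose cumulative Poisson probability reaches p."""
    k = 0
    term = math.exp(-mean_count)
    cdf = term
    while cdf < p:
        k += 1
        term *= mean_count / k
        cdf += term
    return k


def response_latency(rate, threshold, hold_ms=25):
    """First sample (ms) at which rate exceeds threshold and then stays
    above it for the following hold_ms samples; None if it never does."""
    run = 0
    for t, r in enumerate(rate):
        run = run + 1 if r > threshold else 0
        if run == hold_ms + 1:
            return t - hold_ms
    return None


def split_ramp(sdf, latency_ms, phase_ms=2000):
    """Shift a ramp response back by the latency, then return the rising
    phase and the falling phase flipped so both run from low to high contrast."""
    shifted = sdf[latency_ms:]
    rising = shifted[:phase_ms]
    falling = shifted[phase_ms:2 * phase_ms][::-1]
    return rising, falling


# A 10 ms burst is too brief to count; the sustained response starts at 65 ms
rate = [0] * 50 + [10] * 10 + [0] * 5 + [10] * 40
latency = response_latency(rate, threshold=5)
```

After `split_ramp`, index *i* of both lists refers to the same contrast, so the rising and falling responses can be compared point by point.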

# **RESULTS**

Recordings were collected from 188 visually responsive units in the primary visual cortex of 25 C57BL/6J mice. We obtained contrast adaptation data from 65 units using the top-up protocol, and 125 units using the ramp protocol (*n* = 90 for *symmetrical* contrast ramps; *n* = 35 for *peak-tested* contrast ramps). The stimulus preferences of units in our sample were generally consistent with previous reports. Discrimination indices for orientation selectivity (0.49 ± 0.1; mean ± s.d.), size tuning (0.62 ± 0.1), and spatiotemporal selectivity (0.64 ± 0.08) were similar to those reported by Gao et al. (2010). Peak SFs and TFs were broadly distributed, with preferred SFs ranging from 0.01 to 0.18 cpd (mean = 0.03 cpd) and preferred TFs ranging from 0.25 to 8 Hz (mean = 1.77 Hz). **Figure 1A** shows the grid-like array of responses used to measure the spatiotemporal tuning of a sample neuron, and **Figure 1B** shows how these responses can be summarized as a contour plot to indicate the combination of SF and TF that produced the maximal response. Our range of peak SFs and TFs was similar to those of recent electrophysiological studies of mouse visual cortex (Niell and Stryker, 2008; Gao et al., 2010; LeDue et al., 2012), and within the ranges shown by recent multi-photon calcium imaging studies (Andermann et al., 2011; Marshel et al., 2011). Finally, 157 units were classified as complex (*F*1/*F*0 ratio < 1) and 35 units were classified as simple (*F*1/*F*0 ratio > 1). Since simple and complex cells showed similar trends for all measures of contrast adaptation, they were pooled into a single group.

#### **TOP-UP CONTRAST ADAPTATION**

Robust contrast adaptation following prolonged exposure to an adaptor of the preferred SF and TF has been shown previously in mouse V1 (Stroud et al., 2012). However, to our knowledge the spatial and temporal frequency specificity of contrast adaptation in mouse V1 has not been explored. Therefore, we compared the magnitude of contrast adaptation induced by an adaptor with preferred SF and TF with that induced by an adaptor with non-preferred SF or TF. We chose a non-preferred adapting TF of 8 Hz because high TFs rarely elicited strong responses. This high TF adaptor also permitted comparisons with primate work, which has shown that high TFs can reliably induce contrast adaptation in V1 without strongly driving the recorded neurons (Dhruv et al., 2011), presumably by inducing adaptation in the LGN (Solomon et al., 2004). We selected non-preferred SFs 1–3 octaves higher or lower than the peak SF depending on the breadth and location of the recorded unit's spatiotemporal tuning. Care was taken to ensure non-preferred SFs elicited weak responses from the recorded unit but also were within the range of peak SFs of our sample population and below the mean SF cutoff reported in previous studies of LGN and V1 (Grubb and Thompson, 2003; Gao et al., 2010). Higher adapting SFs were selected more often than lower ones since SFs lower than 0.01 cpd can begin to appear as global changes in luminance within the stimulus aperture.

**Figures 1C–F** show the SF and TF specificity of contrast adaptation for four example neurons. Contrast response functions are shown for non-adapted (black squares), preferred adapted (red circles), non-preferred SF adapted (green triangles), and non-preferred TF adapted (blue stars) conditions. For the cell in **Figure 1F**, contrast response functions from two different non-preferred SFs (low SF = pink diamonds; high SF = green triangles) are shown. The spatiotemporal tuning of each unit is shown inset with the SF and TF values of the adapting stimuli indicated with matching symbols. In each case the preferred adaptor induced the most contrast adaptation. Non-preferred adaptors either induced virtually no adaptation (**Figures 1C,D**), or less adaptation than the preferred stimulus (**Figures 1E,F**).

Sigmoid fits to each contrast response function are shown as thin lines in **Figure 1**, and we used the c50 and Rmax parameters extracted from these fits to quantitatively analyze changes in contrast response functions following top-up adaptation. For each adaptation condition we measured the change from the non-adapted curve as a difference-over-sum calculation (parameter-shift = [adapted – non-adapted]/[adapted + non-adapted]), and plotted this metric as population histograms in **Figure 2**. For **Figures 2A,C,E** positive values of c50-shift indicate a rightward shift in the adapted contrast response function. Nearly all cells showed a rightward shift following preferred adaptation (**Figure 2A**, mean c50-shift = 0.26), but the population was centered closer to zero for both adaptors with non-preferred SF (**Figure 2C**, mean c50-shift = 0.09) and TF (**Figure 2E**, mean c50-shift = 0.05). A One-Way ANOVA followed by a Tukey-Kramer *post-hoc* test indicated that the preferred adaptation produced significantly larger values of c50-shift than the other two adaptation conditions [*F*(2, 169) = 27.54, *p* < 0.001], while non-preferred SF and TF c50-shift did not differ. For **Figures 2B,D,F** negative values of Rmax-shift indicate a decrease in firing to maximal contrast following adaptation. Most cells showed a modest decrease in Rmax following preferred adaptation (**Figure 2B**, mean Rmax-shift = −0.17), but the population was centered near zero for both adaptors with non-preferred SF (**Figure 2D**, mean Rmax-shift = 0.02) and TF (**Figure 2F**, mean Rmax-shift = 0.05). A One-Way ANOVA followed by a Tukey-Kramer *post-hoc* test showed similar results to the c50 data, with preferred adaptation producing significantly more negative values of Rmax-shift than the other two adaptation conditions [*F*(2, 169) = 7.72, *p* < 0.001], while non-preferred SF and TF Rmax-shift did not differ. Another way of quantifying the spatiotemporal specificity of contrast adaptation is simply to rank order the adapted curves for each cell. Preferred adaptation c50 values were larger than c50 values measured following non-preferred SF adaptation for 90% of cells, and non-preferred TF adaptation for 92% of cells. Preferred adaptation Rmax values were smaller than Rmax values measured following non-preferred SF adaptation for 75% of cells, and non-preferred TF adaptation for 70% of cells.
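The difference-over-sum metric is bounded between −1 and 1, and it can be converted back to an adapted/non-adapted ratio to make the reported shifts easier to interpret. The sketch below uses hypothetical function names; the conversion applies to a single cell whose shift equals the stated value, not to the population mean of ratios.

```python
def parameter_shift(adapted, non_adapted):
    """Difference-over-sum change of a fitted parameter (c50 or Rmax)."""
    return (adapted - non_adapted) / (adapted + non_adapted)


def ratio_from_shift(shift):
    """Adapted / non-adapted ratio implied by a given shift value."""
    return (1 + shift) / (1 - shift)


c50_ratio = ratio_from_shift(0.26)     # shift equal to the mean c50-shift
rmax_ratio = ratio_from_shift(-0.17)   # shift equal to the mean Rmax-shift
```

A cell with a c50-shift of 0.26 therefore has an adapted c50 about 1.7 times its non-adapted value, and a cell with an Rmax-shift of −0.17 retains about 71% of its non-adapted Rmax.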

#### **CONTRAST RAMP ADAPTATION**

The top-up adaptation data above clearly demonstrate that the magnitude of contrast adaptation in mouse V1 depends on the adapting SF and TF; however, as noted in the Methods section, only a few adaptation conditions can be studied for any single cell due to the time constraints imposed by this protocol. Therefore, we used *symmetrical* contrast ramp stimuli to more extensively map the spatiotemporal selectivity of contrast adaptation. **Figure 3A** shows the response of a representative neuron to a contrast ramp of optimal SF and TF. Even though the rising and falling phases of the ramp stimulus are symmetrical, the spiking response shows clear hysteresis. If this spiking response is latency-corrected and re-plotted with contrast on the abscissa (see Materials and Methods), the difference between the responses to the rising (red) and falling (blue) phases of the contrast ramp is accentuated further (**Figure 3B**). As in previous studies (Crowder et al., 2008; Stroud et al., 2012), the SDFs were fit to sigmoid curves (thin lines). The most useful parameter extracted from the sigmoid fits was c50, since it captured the rightward shift in the contrast response function by comparing semi-saturation contrasts of the rising (upward pointing arrowhead) and falling phases (downward pointing arrowhead) of the contrast ramp. For our sample (*n* = 90), c50 values from the rising phase were almost always smaller than c50s from the falling phase (**Figure 3C**), and this difference was significant (*p* < 0.01, paired *t*-test). This replicates earlier findings in cats (Crowder et al., 2008), and mice (Stroud et al., 2012).

**FIGURE 3 |** … from fits to the rising and falling phases, respectively. Population data comparing c50 values obtained from fits to the rising (abscissa) and falling phase responses (ordinate) are shown in **(C)**.

**FIGURE 4 |** … symmetrical contrast ramps of varying TFs (rows) and SFs (columns). Each SDF follows a similar format to **Figure 3B**, with responses to the rising phase of the contrast ramp shown in blue and responses to the falling phase shown in red. A scale bar depicting time vs. impulses per second (ips) is shown in the lower right (0.32 cpd and 0.5 Hz). The spatiotemporal pattern of hysteresis for the sample neuron is represented as a blue-tinted contour plot in **(B)**, with larger mean differences between the rising and falling phase responses shown as more desaturated hues (see Results). For comparison, the spatiotemporal tuning of the sample neuron is shown in **(C)** as a grayscale contour plot, and the correlation between the two contour plots is indicated (double-headed arrow). For both contour plots SF is on the abscissa and TF is on the ordinate.

To map the spatiotemporal specificity of contrast adaptation we measured the hysteresis of ramp responses when the SF and TF of the ramp grating were varied (for easy comparison to the spatiotemporal tuning also obtained for each neuron we used the same 36 combinations of SF and TF). This stimulus protocol examined the spatiotemporal specificity of contrast adaptation from a slightly different perspective than the top-up protocol. The top-up protocol measured whether the magnitude of contrast adaptation was affected if the adapting grating did not match the test grating, which emphasized the importance of the adapting stimulus. Symmetrical contrast ramps measured the combination of SF and TF that produced the most hysteresis, which emphasized the importance of the cell's own preferred stimulus in determining the strength and specificity of the adaptation effect (Saul and Cynader, 1989a). **Figure 4A** shows a grid of SDF ramp responses from a sample cell, each with the same format as **Figure 3B**. This neuron had strong ramp responses with substantial hysteresis around 0.02–0.04 cpd and 1–2 Hz. Responses to lower TFs (∼0.25 Hz) showed little hysteresis despite monotonic increases in firing with contrast, and the entire ramp response flattened out at the highest SFs and TFs. The former effect was observed in 81/90 neurons, indicating that diminished adaptation was not solely due to lack of responding. We wanted to summarize the pattern of hysteresis for each neuron as a contour plot, but c50 values taken from sigmoid fits to SDFs were unreliable for spatiotemporal combinations away from the peak, so we measured adaptation by calculating the mean difference between the responses to rising and falling phases of the contrast ramps. During adaptation, the semi-saturation contrast of the falling phase ramp response shifts to higher values, causing the falling ramp response to be lower than the rising phase response at most contrasts. This method of analysis has been shown by Stroud et al. (2012) to capture the major features of contrast adaptation without relying on fitting the ramp SDFs to sigmoid functions. **Figure 4B** shows the contour plot for this neuron summarizing the magnitude of hysteresis evoked by each combination of SF and TF. The first feature to note is the clear peak around 0.04 cpd and 1–2 Hz. Likewise, 85 out of 90 units produced contour plots with an easily identifiable single peak that was at least four times higher than the level of hysteresis produced by the least effective ramp. This supports our earlier finding of spatiotemporal specificity of contrast adaptation using a different method. The second feature of the hysteresis contour plot that we were interested in was whether the combination of SF and TF that produced maximum hysteresis for contrast ramps matched the neuron's peak in the spatiotemporal domain tested with regular grating blocks (**Figure 4C**). For this neuron, the two contour plots look similar (*R* = 0.78 from a 2D correlation analysis), but the gratings that produced maximum hysteresis had a slightly higher SF than the gratings that produced maximum firing. **Figure 5** shows two more example cells, one where the spatiotemporal locations of maximum firing and maximum hysteresis match quite closely (**Figures 5A,B**; *R* = 0.87), and another where the spatiotemporal location of maximum hysteresis is at a higher SF and lower TF (**Figures 5C,D**; *R* = 0.51). **Figure 5E** plots the difference in peak locations from the two types of contour plots in the spatiotemporal domain for each cell as "hatpins" (*n* = 85), with the empty dots indicating the location of maximum hysteresis. No pattern was apparent, indicating that there was not one specific combination of SF and TF that universally induced maximal hysteresis across cells. **Figure 5F** normalized the data from **Figure 5E** by calculating the octave difference in SF and TF between each pair of peaks to show the location of maximal contrast hysteresis (empty dots) relative to each cell's peak in spatiotemporal tuning (all normalized to 0). Although the differences were small (45% of cells had peaks within 1 octave of each other, and the median *R*-value from 2D correlations was 0.71), at the population level the gratings that produced maximum hysteresis tended to have slightly higher SFs (mean = 0.47 octaves) and lower TFs (mean = −0.22 octaves) than the gratings that produced maximum firing. A 2-way repeated measures ANOVA (with peaks from spatiotemporal tuning vs. hysteresis contour plots and SF vs. TF as factors) showed a significant main effect of peak type [*F*(1, 84) = 3.95, *p* < 0.05], indicating that the spatiotemporal location of peak hysteresis and peak firing tended to be different. Furthermore, a significant interaction between factors indicated that the difference between peaks was larger for SF than for TF [*F*(1, 84) = 15.75, *p* < 0.001].
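The two quantities used in this comparison, the mean rising-minus-falling difference and the octave offset between peaks, can be sketched as follows. The inputs are assumed to be latency-corrected responses already aligned on contrast (falling phase flipped), and the function names are hypothetical.

```python
import math


def mean_hysteresis(rising, falling):
    """Mean difference between rising- and falling-phase responses, with
    both lists indexed by the same increasing contrast values."""
    return sum(r - f for r, f in zip(rising, falling)) / len(rising)


def octave_difference(hysteresis_peak, tuning_peak):
    """Offset of the hysteresis peak from the tuning peak in octaves
    (positive means the hysteresis peak lies at a higher frequency)."""
    return math.log2(hysteresis_peak / tuning_peak)


# Peak hysteresis at 0.04 cpd for a cell tuned to 0.02 cpd: one octave higher
sf_offset = octave_difference(0.04, 0.02)

# Frequency ratios implied by the population mean offsets
sf_factor = 2 ** 0.47     # about 1.39
tf_factor = 2 ** -0.22    # about 0.86
```

Read this way, the mean offsets correspond to hysteresis peaking at an SF roughly 1.4 times, and a TF roughly 0.86 times, that of the grating producing maximum firing.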

Considering that we consistently observed the strongest adaptation when using the preferred SF and TF in the top-up protocol we were surprised by the results of the ramp protocol. However, as noted above there was one key difference between adapting procedures: in the top-up protocol the adapting grating varied but the test gratings were always set at the preferred SF and TF, whereas in the symmetrical ramp protocol the SF and TF of the grating remained constant throughout the rising and falling phases of the contrast ramp. To determine whether the different adaptation effects observed between the top-up and contrast ramp protocols were due to switching the SF/TF between adapting and test stimuli or the dynamic nature of the contrast ramp stimuli we presented a subset of cells with a modified ramp protocol referred to as *peak-tested* ramps. For these peak-tested ramps, the rising phase could have any one of the 36 combinations of spatiotemporal frequencies (a proxy for the adapting gratings in the top-up protocol), but the falling phase was always shown at the neuron's preferred spatiotemporal frequency (a proxy for the test gratings). For these stimuli, we compared the response to the falling phase at the spatiotemporal peak with the falling phase responses at every other spatiotemporal combination since these stimuli were identical with only the preceding rising phase differing (**Figure 6A**). We expected the difference to be large if no contrast adaptation occurred, or small if contrast adaptation did occur. Responses from a representative neuron are shown in **Figures 6A–C**. We again represented the spatiotemporal specificity of adaptation for each neuron as a contour plot (**Figure 6B**), compared the adaptation contour plot to each neuron's spatiotemporal profile (**Figure 6C**), and calculated the octave difference in SF and TF between peaks for the population (**Figure 6D**). 
For peak-tested ramps, there were clear peaks in the hysteresis contour plots of every cell (e.g., **Figure 6B**). Importantly, the contrast hysteresis and spatiotemporal profile contour plots were much more similar using this protocol. For the sample neuron shown in **Figures 6A–C** the 2D correlation between contour plots was 0.96, and the population median was 0.86. Furthermore, the 2D correlations between contour plots for the peak-tested protocol were significantly higher than for the symmetrical ramp protocol (*p* < 0.0001; *t*-test). Mean octave differences in peak location between contour plots were −0.08 and −0.06 for SF and TF, respectively (**Figure 6D**). A Two-Way repeated measures ANOVA (with peaks from spatiotemporal tuning vs. hysteresis contour plots and SF vs. TF as factors) indicated that neither the main effect of peak type [*F*(1, 34) = 0.31, *p* > 0.57], nor the interaction between factors was significant [*F*(1, 34) = 0.01, *p* > 0.92]. Thus, switching the SF/TF between adapting and test stimuli appears to be the important difference between the top-up and symmetrical ramp protocols because when the contrast ramp stimulus was altered to more closely resemble the top-up protocol the adaptation effects also matched the top-up results more closely. Overall, each of the three adaptation protocols along with their differing methods of analysis demonstrated the spatiotemporal specificity of contrast adaptation in mouse V1, even though differences between protocols produced some subtle variations in the nature of the adaptation.

**FIGURE 5 | Population data from symmetrical contrast ramp adaptation.** The spatiotemporal specificity of adaptation induced with contrast ramps (left column) and spatiotemporal tuning (right column) are compared for two additional sample neurons. As in **Figure 4**, contrast ramp hysteresis is shown as blue-tinted contour plots, and spatiotemporal tuning is shown as grayscale contour plots. The contour plots from the neuron in the top row **(A,B)** match quite closely, indicating that similar grating parameters produced maximum firing and maximum contrast ramp hysteresis. The contour plots from the second neuron (middle row; **C,D)** match less closely, with the spatiotemporal location of maximum hysteresis occurring at a higher SF and lower TF than the peak in spatiotemporal tuning. The correlations between the contrast ramp and spatiotemporal tuning contour plots are indicated for each neuron (double-headed arrows). **(E)** shows population data comparing the spatiotemporal location of maximal hysteresis (empty dots) with the locations of peak responding from spatiotemporal tuning (lines). For **(A–E)**, SF is on the abscissa and TF is on the ordinate. **(F)** normalized the data from **(E)** by calculating octave differences in SF (abscissa) and TF (ordinate) to show the location of maximal contrast hysteresis (empty dots) relative to each cell's spatiotemporal tuning (all normalized to 0). Population mean is shown as a solid red circle.

**FIGURE 6 | Peak-tested contrast ramp adaptation.** The grid of SDFs in **(A)** shows the responses of a sample neuron to peak-tested ramps where the TFs (rows) and SFs (columns) of the rising phase of the ramp were varied, but the falling phase was always shown at the neuron's peak SF and TF (red lines). The transition between non-preferred and preferred gratings is especially apparent at high SFs. A scale bar depicting time vs. impulses per second (ips) is shown in the top left (0.01 cpd and 8 Hz). The spatiotemporal pattern of adaptation for the sample neuron is represented as a red-tinted contour plot in **(B)**, with smaller mean differences between the preferred falling phase and other falling phase responses shown as more desaturated hues (see Results). For comparison, the spatiotemporal tuning of the sample neuron is shown in **(C)** as a grayscale contour plot, and the correlation between the two contour plots is indicated (double-headed arrow). For both contour plots SF is on the abscissa and TF is on the ordinate. **(D)** shows the octave differences in SF (abscissa) and TF (ordinate) between the locations of maximal contrast adaptation (empty dots) relative to each cell's spatiotemporal tuning (all normalized to 0). Population mean is shown as a solid pink circle.
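The two summary measures used in the ramp analyses, the octave difference between peak locations and the 2D correlation between contour plots, can be illustrated with a minimal Python sketch. The function names and example grids are hypothetical, and the sketch assumes that the 2D correlation is a Pearson correlation computed over the flattened SF × TF grids, which the text does not state explicitly.

```python
import math

def octave_difference(f_adapt, f_tuning):
    """Signed distance in octaves between two frequencies (log2 of their ratio)."""
    return math.log2(f_adapt / f_tuning)

def correlation_2d(a, b):
    """Pearson correlation between two equally sized 2D grids (assumed definition)."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    return cov / math.sqrt(vx * vy)

def peak_location(grid, sfs, tfs):
    """Return (SF, TF) at the maximum of a grid indexed as grid[tf_index][sf_index]."""
    _, i, j = max(
        (grid[i][j], i, j) for i in range(len(tfs)) for j in range(len(sfs))
    )
    return sfs[j], tfs[i]

# Hypothetical 3 x 3 example: rows are TFs (Hz), columns are SFs (cpd).
sfs = [0.01, 0.02, 0.04]
tfs = [1, 2, 4]
tuning = [[1, 2, 1], [2, 9, 3], [1, 3, 2]]      # spatiotemporal tuning
hysteresis = [[1, 2, 1], [2, 5, 8], [1, 3, 2]]  # adaptation strength

sf_t, tf_t = peak_location(tuning, sfs, tfs)
sf_h, tf_h = peak_location(hysteresis, sfs, tfs)
sf_shift = octave_difference(sf_h, sf_t)  # positive: adaptation peak at higher SF
tf_shift = octave_difference(tf_h, tf_t)
similarity = correlation_2d(tuning, hysteresis)
```

An octave difference of 0 means the adaptation peak coincides with the tuning peak, which is why the population plots normalize each cell's tuning peak to 0.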

#### **DISCUSSION**

This study demonstrated that contrast adaptation in mouse V1 is specific in the spatiotemporal domain. The magnitude of contrast adaptation observed in single units was found to depend on both the SF and TF of the adapting grating, and the nature of the adaptation effect could also be affected by the SF and TF of the test stimuli. The properties of contrast adaptation we observed were broadly similar to single unit studies in higher mammals (monkeys: Sclar et al., 1989; Dhruv et al., 2011; cats: Movshon and Lennie, 1979; Ohzawa et al., 1982, 1985; Saul and Cynader, 1989a,b; Bonds, 1991) and psychophysical data (e.g., Blakemore and Campbell, 1969). This suggests that contrast adaptation can be thought of as a general feature of the mammalian geniculostriate pathway along with other classical response properties (e.g., Niell and Stryker, 2008; Gao et al., 2010; Van den Bergh et al., 2010). Despite marked differences between animal models (frontal eyes vs. lateral eyes, nocturnal vs. diurnal, acuity that varies over several orders of magnitude), adaptation in mouse visual cortex appears to follow similar rules and is of similar complexity to higher mammals. We believe that these findings uphold the viability of the mouse model for studying vision, and support the validity of a multi-species approach for investigating cortical visual processing.

#### **COMPARISON WITH PREVIOUS STUDIES**

To our knowledge only two previous studies have investigated contrast adaptation in mouse V1 (Niell and Stryker, 2008; Stroud et al., 2012). Stroud et al. (2012) were able to make direct comparisons between adaptation in mouse and cat V1 neurons, and reported that most key features of contrast adaptation were similar between species. When adapted and tested with an optimal grating, adaptation shifted contrast response functions down and to the right. Moreover, contrast ramps produced relatively robust contrast adaptation given their brief presentation times. The current study is in agreement with these previous findings, so the Discussion will focus on the spatiotemporal specificity of contrast adaptation.

Several previous papers have used some version of the top-up protocol to investigate either the SF or TF dependence of contrast adaptation (e.g., Movshon and Lennie, 1979; Duong and Freeman, 2007; Dhruv et al., 2011), and are therefore readily comparable to our own top-up data. Within our top-up protocol, the test stimuli were always at the optimal SF and TF for each neuron, which is most similar to the stimuli used by Dhruv et al. (2011) in their study of TF and orientation specificity of contrast adaptation in macaque V1. For these two studies, one central question was whether an adapting stimulus that itself does not strongly drive the recorded neuron could induce adaptation. Another study examining the SF specificity of contrast adaptation in cat V1 used a complementary design where the test gratings were not at each neuron's optimal SF, but rather at an SF that evoked approximately the same firing rate as the adapting grating (Movshon and Lennie, 1979). Regardless of these design differences, the general findings of these studies indicate that contrast adaptation is most robust when the parameters of the adapting grating are similar to the test grating. The current study extends this finding into a genetically tractable animal model where there exists an expanded toolbox to investigate the mechanisms underlying this specificity.

Comparing orientation specificity and spatiotemporal specificity of contrast adaptation in mouse V1 is also worthwhile. Most neurons in V1 adapt to any orientation, even ones that elicit low firing rates (Stroud et al., 2012), which contrasts with our current results in the spatiotemporal domain. This pattern of results could be produced if cortical adaptation mechanisms pooled over orientation (or the sharpness of tuning was diluted by non-oriented cells), but were at least somewhat selective in the spatiotemporal domain (Andermann et al., 2011; LeDue et al., 2012).

It has been shown that adapting gratings with a high TF can induce modest but reliable contrast adaptation in macaque V1 without strongly driving the recorded neuron (Dhruv et al., 2011), presumably by inducing adaptation in magnocellular cells in the LGN (Solomon et al., 2004). Therefore, we were somewhat surprised to observe only occasional adaptation to higher TFs in our data set. In their study of macaque V1, Dhruv et al. (2011) used an adaptor with a TF of 30–50 Hz, which was 2–3 times higher than the peak TF of LGN neurons (10–16 Hz: Derrington and Lennie, 1984; Hawken et al., 1996). For our top-up protocol, the high TF adaptor (8 Hz) was also approximately double the peak TF of mouse LGN neurons (3.8 Hz: Grubb and Thompson, 2003), yet we observed little consistent adaptation to this stimulus. We had also predicted that the peaks in the contrast hysteresis contour plots obtained with our contrast ramp stimuli may be skewed toward higher TFs, but this was not the case. We are unsure what underlies this apparent species difference, but as outlined in a model of multiple sources of adaptation used by Dhruv et al. (2011), it suggests that less adaptation is occurring in (or being inherited from) the LGN in mice. This observation, in conjunction with the finding that contrast adaptation in cat LGN does not show SF specificity (Duong and Freeman, 2007), provides two good reasons for future work to investigate adaptation in mouse LGN.

Finally, the contrast ramp stimuli we used in this study were unusual and therefore less comparable to previous papers (although the staircase-like stimuli used by Bonds (1991) also measured hysteresis when contrasts were presented in an ordered manner). However, these data are relevant to a longstanding issue in the contrast adaptation literature. Vautin and Berkley (1977) were the first to discuss how the adaptation measured in a recorded neuron could arise from processes occurring within the cell itself (intrinsic) or be inherited from other neurons in the circuit/network (extrinsic). Saul and Cynader (1989a) suggested that intrinsic mechanisms may be more narrowly tuned since they depend on the cell's own tuning, while the aforementioned model by Dhruv et al. (2011) specified that some extrinsic sources of adaptation should be broadly tuned for certain stimulus attributes like orientation. The main strength of the contrast ramp stimulus is that it can measure features of contrast adaptation on a relatively short time-scale, and this allows for a large stimulus-space to be explored in a reasonable amount of time. This seems ideal for exploring the putative differences in tuning between intrinsic and extrinsic sources of adaptation. In the current study, both the symmetrical and peak-tested contrast ramp protocols support the spatiotemporal specificity of contrast adaptation initially described using the top-up protocol despite the fact that ramp stimuli measured adaptation on a different time scale and used different metrics. It would be interesting to obtain comparative data from higher mammals for these stimuli.

#### **USING MOUSE MODELS TO STUDY CONTRAST ADAPTATION**

Electrophysiological studies in various animal models collectively suggest that contrast adaptation is initiated at pre-cortical stages (retina: reviewed in Demb, 2008; LGN: Sanchez-Vives et al., 2000a; Solomon et al., 2004; Duong and Freeman, 2007), and then refined and strengthened in the cortex (Ohzawa et al., 1985; Carandini, 2000; Dhruv et al., 2011). Furthermore, hyperpolarization of the membrane potential has been associated with contrast adaptation (Carandini and Ferster, 1997; Sanchez-Vives et al., 2000a,b). However, questions about the specific mechanisms involved, their relative contributions, and the stage at which each one is implemented remain unanswered. It seems that some of the investigative tools currently most readily applied in the mouse could provide insights into these cellular and circuit mechanisms. The same biochemical and genetic flexibility that has allowed the use of optogenetic modulation to attribute particular functions to genetically defined inhibitory neurons within mouse V1 (e.g., Adesnik et al., 2012; Atallah et al., 2012), or allowed genetically encoded calcium-indicator proteins to explore the response properties of hundreds of visually responsive neurons simultaneously (e.g., Andermann et al., 2011), could also be used to explore the mechanisms underlying contrast adaptation. Moreover, if specific mechanisms are isolated they could be knocked out or modulated in real time to probe the perceptual relevance of contrast adaptation using psychophysical tasks developed for the mouse (e.g., Busse et al., 2011). The possibility of causally linking neural processing to contrast perception is especially intriguing considering psychophysical studies of the performance enhancement conferred by contrast adaptation have been somewhat equivocal (Barlow et al., 1976; Määttänen and Koenderink, 1991; Abbonizio et al., 2002; Kohn, 2007). We are hopeful that insights gleaned from the mouse model will be relevant to higher mammals because the properties of cortical contrast adaptation that have already been explored appear quite similar between species.

#### **AUTHOR CONTRIBUTIONS**

Nathan A. Crowder and Emily E. LeDue designed the study; Emily E. LeDue, Jillian L. King, Kurt R. Stover, and Nathan A. Crowder collected data; Emily E. LeDue, Jillian L. King, and Nathan A. Crowder analyzed the data; Nathan A. Crowder and Emily E. LeDue wrote the manuscript.

#### **ACKNOWLEDGMENTS**

This work was supported by the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 July 2013; accepted: 12 September 2013; published online: 03 October 2013.*

*Citation: LeDue EE, King JL, Stover KR and Crowder NA (2013) Spatiotemporal specificity of contrast adaptation in mouse primary visual cortex. Front. Neural Circuits 7:154. doi: 10.3389/fncir.2013.00154*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2013 LeDue, King, Stover and Crowder. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Different roles of axon guidance cues and patterned spontaneous activity in establishing receptive fields in the mouse superior colliculus

# *Mingna Liu<sup>1‡</sup>, Lupeng Wang<sup>1,2†‡</sup> and Jianhua Cang<sup>1</sup>\**

<sup>1</sup> Department of Neurobiology, Northwestern University, Evanston, IL, USA

<sup>2</sup> Interdepartmental Neuroscience Program, Northwestern University, Evanston, IL, USA

#### *Edited by:*

Andrea Benucci, RIKEN Brain Science Institute, Japan

#### *Reviewed by:*

Sarah L. Pallas, Georgia State University, USA

Edward S. Ruthazer, McGill University, Canada

#### *\*Correspondence:*

Jianhua Cang, Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA e-mail: cang@northwestern.edu

#### *†Present address:*

Lupeng Wang, Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD 20892, USA

‡Mingna Liu and Lupeng Wang have contributed equally to this work.

Visual neurons in the superior colliculus (SC) respond to both bright (On) and dark (Off) stimuli in their receptive fields. This receptive field property is due to proper convergence of On- and Off-centered retinal ganglion cells to their target cells in the SC. In this study, we have compared the receptive field structure of individual SC neurons in two lines of mutant mice that are deficient in retinotopic mapping: the ephrin-A knockouts that lack important retinocollicular axonal guidance cues and the nAChR-β2 knockouts that have altered activity-dependent refinement of retinocollicular projections. We find that even though the receptive fields are much larger in the ephrin-A knockouts, their On–Off overlap remains unchanged. These neurons also display normal level of selectivity for stimulus direction and orientation. In contrast, the On–Off overlap is disrupted in the β2 knockouts. Together with the previous finding of disrupted direction and orientation selectivity in the β2 knockout mice, our results indicate that molecular guidance cues and activity-dependent processes play different roles in the development of receptive field properties in the SC.

**Keywords: mouse visual system, superior colliculus, ephrins, retinal wave, on–off, direction selectivity, orientation selectivity**

**INTRODUCTION**

Neurons in the visual system respond to specific features of visual stimuli in their receptive fields (Kuffler, 1953; Hubel and Wiesel, 1962). The receptive field (RF) properties are determined by precise and selective connections in the brain and established by elaborative processes during development. For example, the RFs of neurons in many visual structures are organized into retinotopic maps, where neighboring neurons respond to neighboring locations in the visual space (Cang et al., 2005a,b; Wang and Burkhalter, 2007; Andermann et al., 2011; Marshel et al., 2011). The topographically precise projections from the retina to their targets, such as the superior colliculus (SC), are established by graded expression of molecular guidance cues such as EphAs and ephrin-As, and refined by activity-dependent processes driven by patterned spontaneous retinal activity (Cang and Feldheim, 2013). Disruption of either process could result in profound deficits in retinotopic mapping and subcortical visuomotor behaviors (Pfeiffenberger et al., 2006; Haustead et al., 2008; Wang et al., 2009). For the RFs of individual SC neurons, their structure and selectivity are disrupted when the patterns of retinal activity are altered during development (the nAChR-β2<sup>−/−</sup> mice, Chandrasekaran et al., 2005; Wang et al., 2009). In contrast, the consequences of deleting ephrin-As or EphAs on collicular RF properties have not been studied, and as a result, the roles of molecular guidance cues and activity-dependent processes in the development of collicular RFs have not been directly compared.

In addition to spatial location, visual RFs are also characterized by their On and Off properties. The parallel On and Off pathways first diverge in the retina, with On- and Off-centered ganglion cells (RGCs) responding, respectively, to light increment and decrement, and a small population of On–Off RGCs responding to both (Kuffler, 1953). The On and Off pathways converge in the SC such that the On/Off subregions in the RFs of individual collicular neurons overlap almost completely (McIlwain and Buser, 1968; Cynader and Berman, 1972; Wang et al., 2010b). This On–Off convergence in the SC is believed to be important for detecting object salience, irrespective of its contrast (Knudsen, 2011).

In this study, we have compared the functions of guidance cues and activity-dependent processes in establishing the On–Off convergence in the SC. Surprisingly, we find that even though the RFs of SC neurons are much larger in the ephrin-A knockout mice, their On–Off overlap remains unchanged. These neurons also display normal level of direction and orientation selectivity. In contrast, the On–Off overlap is disrupted in the nAChR-β2<sup>−/−</sup> mice. Together with the previous finding of disrupted direction and orientation selectivity in the β2<sup>−/−</sup> mice, our results indicate that these two developmental processes play different roles in the development of RF properties in the SC.

### **MATERIALS AND METHODS**

#### **ANIMALS**

Ephrin-A2/A5 double and A2/A3/A5 triple mutant mice were originally generated by the Feldheim Lab at University of California at Santa Cruz by crossing of each single line (Pfeiffenberger et al., 2006), and maintained in the animal facility at Northwestern University. Their genotypes were determined using the published protocols (Frisén et al., 1998; Feldheim et al., 2000; Cutforth et al., 2003). We previously studied the collicular RF properties in mice that lack the β2 subunit of nicotinic acetylcholine receptor (Wang et al., 2009) and in this study reanalyzed those data in the same way as for ephrin-A KOs (details below). Similarly, data from adult wild type C57BL/6 mice (Wang et al., 2010b) were reanalyzed for comparison. Both genders were used and all experiments were performed in accordance with protocols approved by Northwestern University Institutional Animal Care and Use Committee.

#### *IN VIVO* **ELECTROPHYSIOLOGY**

Following our published procedures (Wang et al., 2010b), adult mice were anesthetized with urethane (1.2–1.3 g/kg in 10% saline solution, i.p.) and supplemented with chlorprothixene (10 mg/kg in 4 mg/ml water solution, i.m.). Atropine (0.3 mg/kg) and dexamethasone (2.0 mg/kg) were injected subcutaneously. Additional urethane (0.2–0.3 g/kg) was administered as needed. A tracheotomy was performed in some experiments and electrocardiograph leads were attached across the skin to monitor the heart rate continuously throughout the experiment. The animal's temperature was monitored with a rectal thermal probe and maintained at 37°C through a feedback heater control module (FHC). Silicone oil was applied on the eyes to prevent them from drying. A craniotomy (4–8 mm<sup>2</sup>) was performed on the left hemisphere to expose the brain for recording with 5–10 MΩ tungsten microelectrodes (FHC). The electrode was inserted vertically into the overlying cortex at a distance of 0.7–1.5 mm lateral of the midline suture and 0.2–0.8 mm anterior to the lambda suture. The identification of the SC surface followed our published procedure (Wang et al., 2010b). Only neurons within 300 μm below the SC surface were included in our analysis, corresponding to the superficial retinal recipient layers of the SC. Electrical signals were acquired using a System 3 workstation (Tucker Davis Technologies). Only one unit at a time was recorded in most cases. OpenSorter was used offline to remove occasional large electrical artifacts, or to sort two very different waveforms in a few cases. The animals were killed at the end of recordings by an overdose of euthanasia solution (150 mg/kg pentobarbital, in Euthasol, Virbac).

#### **VISUAL STIMULI AND DATA ANALYSIS**

Visual stimuli were generated with customized Matlab programs (Niell and Stryker, 2008) using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). The stimuli were displayed on a flat panel CRT video monitor (40 cm × 30 cm, 60 Hz refresh rate, ∼35 cd/m<sup>2</sup> mean luminance) placed 25 cm from the animal, and delivered to the eye contralateral to the recorded hemisphere while the ipsilateral eye was occluded. Stimulus sets included a blank condition in which the screen was at mean luminance. Responses to all such blank presentations were averaged to obtain the spontaneous firing rate.

To determine RF structures of SC neurons, 5° light squares were flashed at different locations on either a 13 × 13 or 11 × 11 grid with 5° spacing. The flashes stayed on for 500 ms on a gray background and off for 500 ms between stimuli, and were repeated 4–6 times for each grid location in a pseudorandom sequence. Spontaneous firing was analyzed in the blank stimulus condition and the mean + 2 × SD of the spontaneous rate was calculated as threshold. The responses to flashing spots at each location were analyzed by counting spikes within a time window of 200 ms (starting from 50 ms after flash onset or offset) in each trial. The cell was considered responsive to On or Off at a given grid location if there were more spikes than the threshold in at least 40% of the trials (Sarnaik et al., 2013). An On–Off overlap ratio was then calculated as the number of grid locations that showed both On and Off responses divided by the total number of responsive locations regardless of On or Off polarity. Additionally, correlation coefficients were calculated between On and Off responses over the entire grid from raw spike rates without thresholding (Wang et al., 2010b).
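The thresholding and overlap-ratio steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the spontaneous threshold is assumed to be expressed in the same units as the per-trial spike counts, and the sample standard deviation is used because the text does not specify which estimator was applied.

```python
from statistics import mean, stdev

def response_threshold(blank_counts):
    """Threshold = mean + 2 x SD of spike counts from blank (spontaneous) trials.
    Uses the sample SD; the estimator is an assumption."""
    return mean(blank_counts) + 2 * stdev(blank_counts)

def is_responsive(trial_counts, threshold, min_fraction=0.4):
    """True if the spike count exceeds threshold in at least min_fraction of trials."""
    hits = sum(1 for c in trial_counts if c > threshold)
    return hits / len(trial_counts) >= min_fraction

def overlap_ratio(on_map, off_map):
    """Locations responsive to both On and Off, divided by locations
    responsive to either polarity. Maps are 2D grids of booleans."""
    both = 0
    either = 0
    for on_row, off_row in zip(on_map, off_map):
        for on, off in zip(on_row, off_row):
            both += int(bool(on) and bool(off))
            either += int(bool(on) or bool(off))
    return both / either if either else float("nan")

# Hypothetical 2 x 3 patch of the stimulus grid.
on_map = [[True, True, False], [False, True, False]]
off_map = [[False, True, True], [False, True, False]]
ratio = overlap_ratio(on_map, off_map)
```

A ratio of 1 means every responsive location responds to both polarities, and a cell with a single subfield yields 0.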

Full-field, full-contrast drifting sinusoidal gratings were presented to probe selectivity for stimulus direction/orientation (0–360°, 12 steps at 30° spacing) and spatial frequency (0.01–0.32 cpd at six logarithmic steps; Wang et al., 2010a; Zhao et al., 2013a). Temporal frequency was fixed at 2 cycles/s. Each stimulus of a given direction and spatial frequency (or a blank condition) was presented for 1.5 s in a pseudorandom order for 4–6 trials. The interval between stimuli was 0.5 s. The response to a particular stimulus condition, $R$, was obtained by averaging the number of spikes over the 1.5 s stimulus duration, across all trials and subtracting the spontaneous rate. The preferred direction was determined as the one that gave maximum response ($R_{pref}$), averaging across all spatial frequencies. The preferred spatial frequency was the one that gave peak response at this direction. Responses across all directions at the preferred spatial frequency, $R(\theta)$, were used for further analysis. The depth of modulation was described using two parameters: (1) Direction Selectivity Index $= R_{pref}/(R_{pref} + R_{opp})$, where $R_{pref}$ was the response at $\theta_{pref}$ and $R_{opp}$ at $\theta_{pref} + \pi$, and (2) Orientation Selectivity Index $= R'_{pref}/(R'_{pref} + R_{orth})$, where $R'_{pref}$ was the mean response of $R_{pref}$ and $R_{opp}$, and $R_{orth}$ was the mean response to the two directions orthogonal to $\theta_{pref}$. The tuning curves were fitted with a sum of two Gaussians centered at $\theta_{pref}$ and $\theta_{pref} + \pi$ using the *nlinfit* function in Matlab (Mathworks, Natick, MA, USA), and the tuning width was calculated as the half-width at half maximum of the fitted curve above the baseline. For mean tuning curves, each curve was normalized to the peak response and then aligned to the direction that elicited the maximum response.
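As a worked illustration of the two selectivity indices as they are defined in the text, the following Python sketch computes both from a single direction tuning curve. The function and variable names are hypothetical. The sketch assumes evenly spaced integer directions that include the opposite and orthogonal directions, and it simply takes the direction with the largest response as preferred rather than averaging across spatial frequencies first.

```python
def selectivity_indices(directions_deg, responses):
    """Return (DSI, OSI) for a tuning curve.

    directions_deg: evenly spaced directions in degrees, e.g. 0, 30, ..., 330.
    responses: spontaneous-subtracted responses, one per direction.
    """
    n = len(directions_deg)
    i_pref = max(range(n), key=lambda i: responses[i])

    def response_at(offset_deg):
        # Response at the direction offset_deg away from the preferred one.
        target = (directions_deg[i_pref] + offset_deg) % 360
        return responses[directions_deg.index(target)]

    r_pref = responses[i_pref]
    r_opp = response_at(180)
    r_orth = (response_at(90) + response_at(270)) / 2
    r_pref_axis = (r_pref + r_opp) / 2  # mean of preferred and opposite

    dsi = r_pref / (r_pref + r_opp)
    osi = r_pref_axis / (r_pref_axis + r_orth)
    return dsi, osi

# Hypothetical tuning curve: strong response at 0 deg, weaker at 180 deg.
directions = list(range(0, 360, 30))
responses = [8, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1]
dsi, osi = selectivity_indices(directions, responses)
```

With these definitions both indices equal 0.5 for a cell that responds equally in all directions and approach 1 as the response in the opposite or orthogonal directions falls to zero. Negative spontaneous-subtracted responses are not handled by this sketch.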

#### **STATISTICAL ANALYSIS**

All values were presented as mean ± SEM. Non-parametric tests that do not require any assumptions about the distribution of the data were used in all cases. Comparison of distributions was done using the two-sample Kolmogorov–Smirnov test (K–S test) and comparisons between means or medians of datasets were done using two-sample Mann–Whitney test. All statistical tests were evaluated at α = 5% probability of false positives. Two-sided statistical tests were performed. Statistical analyses and graphing were done in MATLAB and Prism (GraphPad Software Inc.).
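For readers unfamiliar with these summary statistics, a minimal Python sketch of the mean ± SEM and of the two-sample K–S statistic follows. It is an illustration only, not the MATLAB or Prism routines used in the study: the function names are hypothetical, the sample standard deviation is assumed for the SEM, and the p-value calculation is omitted.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, stdev

def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample SD."""
    return mean(values), stdev(values) / sqrt(len(values))

def ks_statistic(x, y):
    """Two-sample K-S statistic: the largest absolute difference between
    the empirical cumulative distribution functions of x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        d = max(d, abs(fx - fy))
    return d

# Hypothetical overlap ratios for two small groups of cells.
group_a = [0.40, 0.50, 0.55, 0.60]
group_b = [0.10, 0.20, 0.45, 0.50]
m, sem = mean_sem(group_a)
d = ks_statistic(group_a, group_b)
```

Because the K–S statistic compares whole distributions, it can detect differences in shape that a comparison of medians alone would miss.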

#### **RESULTS**

#### **DISRUPTED RECEPTIVE FIELDS IN SC NEURONS OF EPHRIN-A KO MICE**

Ephrin-A2, A3, and A5 are the three main ephrin-As expressed in the developing visual system in mice. In this study, we used single unit recording to characterize the RFs of SC neurons in mice lacking all three of the ephrins (triple knockouts, TKO), or two of them (A2 and A5, double knockouts, DKO). To determine RF structure, we flashed small spots (5° × 5°) at different positions in the visual field (Wang et al., 2010b). Compared to wild type (WT) SC neurons, which only responded to flashes within a small region in the visual space (**Figures 1A,B**), the RFs of many SC neurons in the ephrin-A KOs were much larger. By visual inspection, some neurons in the mutant mice had multiple patches within their RFs (e.g., **Figures 1C,D**; *n* = 36 out of 85 cells in DKO and 11/33 cells in TKO), while others had single patches that still appeared larger than in WT (**Figures 1E,F**; *n* = 16/85 in DKO and 8/33 in TKO). A small number of cells even had very diffuse RFs that expanded across almost the entire stimulus monitor (**Figures 1G,H**; *n* = 8/85 in DKO and 3/33 in TKO).

shows peri-stimulus timing histograms (PSTH) in response to spots flashed at different locations on a 13 × 13 grid in visual space. Scale bars are 50 spikes/s (y-axis, for firing rate in each 50 ms bin) and 1 s (x-axis). Both On and Off responses were evoked within the receptive field, as indicated by the two peaks in individual PSTHs. The receptive field structure determined by the PSTHs is shown in **B** in a color scale (right, in spikes/s, for mean firing rate in the 1 s stimulus duration). **(C–H)** Example receptive fields of SC neurons in ephrin-A double (DKO) and triple KO (TKO) mice.

Because the RFs of many neurons in the ephrin-A KOs had irregular shapes, they could not be fitted with 2-d Gaussians as we previously did in WTs to quantify RF size (Wang et al., 2010b). We thus simply counted the number of grid positions where visual responses were evoked by the flashing spots (see Materials and Methods for details). The RFs of SC neurons in ephrin-A KOs (DKOs: mean = 894.7 ± 69.8 degree<sup>2</sup>, median = 750.0 degree<sup>2</sup>, *n* = 85; TKOs: mean = 819.7 ± 84.5 degree<sup>2</sup>, median = 700.0 degree<sup>2</sup>, *n* = 33) indeed occupied a much larger area compared to those in the WT (mean = 516.3 ± 35.5 degree<sup>2</sup>, median = 400.0 degree<sup>2</sup>, *n* = 101; *p* < 0.0001 Mann–Whitney test; **Figure 2A**). The RFs were similarly enlarged in the DKOs and TKOs, consistent with the notion that ephrin-A2 and A5 are the most important cues in retinocollicular mapping (Feldheim et al., 2000; Pfeiffenberger et al., 2006). We also examined whether the disruption was restricted to the azimuth axis of the visual space since ephrin-As mediate the mapping of retinocollicular axons along the naso-temporal axis (Cang and Feldheim, 2013). We calculated the azimuth and elevation extent covered by individual RFs and found that they were enlarged along both axes in the ephrin-A KOs, though the disruption appeared more severe along the azimuth axis (**Figures 2D,G**. Azimuth: WT, mean = 32.8 ± 1.4°, median = 30.0°, *n* = 101; DKO, mean = 46.3 ± 1.8°, median = 50.0°, *n* = 85, *p* < 0.0001; TKO, mean = 49.6 ± 2.8°, median = 60.0°, *n* = 33, *p* < 0.0001; Elevation: WT, mean = 30.0 ± 1.4°, median = 25.0°, *n* = 101; DKO, mean = 37.7 ± 2.0°, median = 35.0°, *n* = 85, *p* < 0.01; TKO, mean = 43.9 ± 2.8°, median = 45.0°, *n* = 33, *p* < 0.0001; Mann–Whitney test). These results thus demonstrate that axonal guidance cues are needed, either directly or indirectly, for the development of spatially compact RFs of SC neurons.

#### **NORMAL ON–OFF OVERLAP IN EPHRIN-A KOs DESPITE DISRUPTED RECEPTIVE FIELDS**

Most visual neurons in WT SC respond to both bright (On) and dark (Off) stimuli and the On and Off regions overlap spatially within their RFs (Wang et al., 2010b). Such On–Off overlap is a conserved feature in the SC of all the species studied so far (McIlwain and Buser, 1968; Cynader and Berman, 1972; Rhoades and Chalupa, 1977; Prevost et al., 2007), and is the basis of the SC's ability to detect salient visual events irrespective of contrast. We thus investigated whether such a fine scale feature of RF organization is disrupted in the ephrin-A KOs.

We first divided On and Off responses and analyzed their subregion size separately. In the ephrin-A KOs, both On (DKO: mean = 650.0 ± 52.7 degree<sup>2</sup>, median = 500.0 degree<sup>2</sup>, *n* = 85, *p* < 0.0001; TKO: mean = 594.7 ± 75.6 degree<sup>2</sup>, median = 500.0 degree<sup>2</sup>, *n* = 33, *p* < 0.01; Mann–Whitney test) and Off subregions (DKO: mean = 678.8 ± 62.2 degree<sup>2</sup>, median = 525.0 degree<sup>2</sup>, *n* = 85, *p* < 0.0001; TKO: mean = 564.4 ± 59.7 degree<sup>2</sup>, median = 500.0 degree<sup>2</sup>, *n* = 33, *p* < 0.0001) were bigger than in WTs (On, mean = 400.5 ± 32.2 degree<sup>2</sup>, median = 300.0 degree<sup>2</sup>, *n* = 101; Off, mean = 336.9 ± 26.6 degree<sup>2</sup>, median = 300.0 degree<sup>2</sup>, *n* = 101). The subfield expansion in ephrin-A KOs was along both elevation and azimuth axes, consistent with their enlarged RF in general (**Figure 2**).

We next quantified On–Off overlap using an overlap ratio for each neuron, calculated as the ratio of the number of grid positions that showed both On and Off responses over the total number of responsive locations regardless of On or Off polarity. The overlap ratio ranges from 0 to 1, with a value of 1 indicating complete On–Off overlap, and 0 no overlap (or the cell only has one subfield). Surprisingly, despite the disruption of RF size, the On–Off overlap ratios in the ephrin-A KOs (e.g., **Figure 3B**; DKO: mean = 0.50 ± 0.03, median = 0.53, *n* = 85; TKO: mean = 0.42 ± 0.05, median = 0.46, *n* = 33) were similar to those in WT (e.g., **Figure 3A**; mean = 0.46 ± 0.03, median = 0.50, *n* = 101; *p* = 0.47 and *p* = 0.94, respectively, K–S test; **Figure 3D**). We also quantified the On–Off overlap by calculating the correlation coefficient, which takes into account response magnitude at each stimulus location. Again, the On–Off correlations did not show a significant difference between ephrin-A KOs (**Figure 3E**; DKO: mean = 0.68 ± 0.03, median = 0.74, *n* = 85; TKO: mean = 0.61 ± 0.05, median = 0.71, *n* = 33) and WT (mean = 0.71 ± 0.03, median = 0.81, *n* = 100; *p* = 0.12 and 0.06, respectively, K–S test). In other words, the On–Off overlap in collicular RFs is largely maintained in the absence of ephrin-A guidance cues.
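
The two overlap measures can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, a grid position is treated as responsive when its evoked response is greater than zero (the actual response criterion is described in Materials and Methods), and the correlation is computed as a Pearson coefficient over all grid positions.

```python
import math


def overlap_ratio(on_map, off_map):
    """Positions with both On and Off responses / positions with either."""
    both = either = 0
    for on_row, off_row in zip(on_map, off_map):
        for on, off in zip(on_row, off_row):
            if on > 0 and off > 0:
                both += 1
            if on > 0 or off > 0:
                either += 1
    return both / either if either else 0.0


def on_off_correlation(on_map, off_map):
    """Pearson correlation between On and Off response magnitudes."""
    x = [v for row in on_map for v in row]
    y = [v for row in off_map for v in row]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one map is constant
    return cov / math.sqrt(var_x * var_y)


# Toy response maps in spikes/s (made-up values, for illustration only)
on_map = [[0, 2, 4], [0, 0, 1]]
off_map = [[0, 1, 3], [2, 0, 0]]
ratio = overlap_ratio(on_map, off_map)
corr = on_off_correlation(on_map, off_map)
```

The overlap ratio treats every responsive position equally, whereas the correlation weights positions by response magnitude, so the two measures can diverge when weak responses lie at the RF fringe.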

#### **DISRUPTED ON–OFF OVERLAP IN nAChR-β2 KOs**

The above results prompted us to ask what factors, if not ephrin-As, might be required for the development of On–Off convergence in the SC. Previous studies showed that spontaneous retinal waves drive the refinement of the retinocollicular map (McLaughlin et al., 2003; Chandrasekaran et al., 2005; Xu et al., 2011). In mice that lack the β2 subunit of the nicotinic ACh receptor (β2 KOs), the patterns of retinal waves are disrupted (Bansal et al., 2000; McLaughlin et al., 2003; Sun et al., 2008; Stafford et al., 2009) and the RFs of SC neurons are enlarged (Chandrasekaran et al., 2005; Wang et al., 2009). We thus analyzed the On–Off overlap in these mice. The On and Off subregions in β2 KOs were as large as those in the ephrin-A KOs (**Figure 2**). Importantly, however, unlike in the ephrin-A KOs, the On–Off overlap in β2 KOs, both by overlap ratio (mean = 0.36 ± 0.03, median = 0.41, *n* = 59; *p* = 0.01, K–S test) and correlation coefficient (mean = 0.53 ± 0.03, median = 0.61, *n* = 59, *p* < 0.0001), was significantly reduced (**Figures 3C–E**).

**FIGURE 3 | On–Off overlap is disrupted in nAChR-β2 KOs, but not in ephrin-A KOs. (A)** On (red) and Off (green) responses of a WT SC neuron, showing substantial On–Off overlap. Color scales represent evoked responses in spikes/s during the 500 ms duration of stimulus presentation. The calculated values of overlap ratio (O.R.) and correlation coefficient (C.C.) are listed at the upper right corner of the "Off" plot. **(B)** On and Off responses of an example neuron in ephrin-A DKO mice. **(C)** Responses of an example neuron in an nAChR-β2 subunit knockout. **(D)** Comparison of On–Off overlap ratio between genotypes, with only β2 KOs showing a significant disruption compared with the WT. **(E)** Comparison of On–Off correlation coefficient between genotypes. Panels **D** and **E** are box plots with ends of each plot representing 5th and 95th percentiles. \*\*p < 0.01 and \*\*\*p < 0.001.

Together, these results indicate that ephrin-As are not required for establishing the overlapping On–Off subfields of mouse SC neurons; instead, the activity-dependent refinement process is necessary for their development.

#### **NORMAL RESPONSES TO DRIFTING GRATINGS IN EPHRIN-A KOs**

In addition to static contrast changes, SC neurons are also sensitive to moving stimuli (Wang et al., 2010b). We thus examined the tuning properties of SC neurons in the ephrin-A KO mice in response to drifting gratings. Recordings from the DKOs and TKOs were combined since no difference was seen between them. Remarkably, many SC neurons in the KOs were selective for stimulus direction or orientation, just as in WT. Across the population, the preferred directions did not show any bias towards certain angles (**Figure 4A**), similar to those in WT SC (Wang et al., 2010b). This result is clearly different from that of the β2 KOs, in which fewer SC neurons are tuned to horizontal motion (Wang et al., 2009). The degree of direction/orientation selectivity was also normal in the ephrin-A KO mice, both by averaged tuning curves (**Figure 4B**) and the distribution of direction and orientation selectivity index (**Figures 4C,D**). Consistently, the orientation tuning width in the ephrin-A KOs (mean = 39.8 ± 1.9°, median = 42.8°, *n* = 78) was also similar (**Figure 4E**, *p* = 0.41, K–S test) to that in the WT mice (mean = 40.8 ± 1.2°, median = 42.5°, *n* = 115). Furthermore, no change of response linearity as determined by F1/F0 ratio (Wang et al., 2010b) was found between the ephrin-A KOs (mean = 0.62 ± 0.04, median = 0.51, *n* = 137) and WT (mean = 0.64 ± 0.05, median = 0.41, *n* = 132; *p* = 0.36, K–S test). Finally, although the distribution of preferred spatial frequency was statistically different between the ephrin-A KOs and WTs (*p* < 0.001, χ<sup>2</sup> test), most neurons preferred 0.04, 0.08 and 0.16 cpd in both genotypes (**Figure 4F**).

These results thus indicate that the removal of ephrin-As has little effect on the orientation and direction selectivities of individual SC neurons, despite their altered RF structures. Together with our previous findings that the SC neurons in the β2 KOs display axis-specific disruption of direction and orientation selectivity (Wang et al., 2009), our results demonstrate that axonal guidance cues and activity-dependent processes play different roles in the development of visual response properties in SC neurons.

# **DISCUSSION**

In this study, we have examined the RF structure of SC neurons in two lines of mutant mice that are deficient in retinocollicular mapping, the ephrin-A KOs and the nAChR-β2 KOs that have altered retinal waves. Our results reveal that even though the collicular RFs are similarly enlarged in the two mutants, the On/Off overlap
within the RF is maintained in the ephrin-A KOs but disrupted in the β2 KOs. During development, retinal axons are guided to their target cells in the SC by graded guidance cues such as ephrin-As and the remaining aberrant projections are then eliminated through activity-dependent processes driven by spontaneous retinal waves (Eglen et al., 2003; Grimbert and Cang, 2012). As a result, only ganglion cells from a small patch of the retina, both On and Off, are left innervating individual collicular neurons, giving rise to spatially compact RFs with overlapping On and Off subregions. In the absence of ephrin-As, the nasal-temporal retinotopic information is lost and RGCs from distant regions of the retina can terminate onto the same SC neurons. Our results indicate that nearby On and Off neurons still co-terminate in the ephrin-A KOs, presumably driven by largely normal retinal waves in these mice, which display WT level of correlation within small distances (Pfeiffenberger et al., 2005). On the other hand, in the β2 KOs, this process is disrupted, leading to some nearby On and Off RGCs no longer innervating the same SC neurons, due to either compromised elimination or aberrant expansion of axonal terminals in these animals (Dhande et al., 2011).

The exact patterns of retinal waves in the β2 KOs have been controversial. Whereas earlier studies showed that there were no correlated activities in the RGCs of these mice (Bansal et al., 2000; McLaughlin et al., 2003), more recent studies revealed that they did display retinal waves (Sun et al., 2008; Stafford et al., 2009), which appeared to correlate RGCs over broader distances (about twice as far as in WT retinas) and with a weaker intensity (about half the WT peak amplitude; Stafford et al., 2009). Importantly, whether there are larger waves or no waves, the information for differentiating RGCs that are immediately next to each other and those that are further apart is compromised, which could then lead to disrupted retinotopic mapping and On/Off convergence.

Our explanation of the On/Off phenotypes in ephrin-A KOs and WT mice assumes that On and Off RGCs are similarly correlated in retinal waves during the time of retinocollicular development. At postnatal day 12, when retinocollicular mapping has reached the mature level (Dhande et al., 2011) and retinal waves are already mediated by glutamatergic transmission, On and Off RGCs fire with a temporal offset during the waves (Kerschensteiner and Wong, 2008). Such an asynchronous pattern was not seen earlier during development when the waves are cholinergic (Kerschensteiner and Wong, 2008). It is thus highly likely that neighboring On and Off RGCs fire synchronously when retinocollicular connections are established, which would lead to On/Off convergence and consequently On–Off overlap in the RF of SC neurons.

SC neurons' selectivity for stimulus orientation and direction is also different between ephrin-A KOs and β2 KOs. The mechanism of SC selectivity is still unclear. On the one hand, it could be inherited from the retina, given a substantial population of RGCs are direction/orientation selective in mice (Elstrott et al., 2008; Huberman et al., 2009; Zhao et al., 2013b). The direction selective RGCs (DSGCs), including On, Off, and On–Off subtypes, are tuned to motions of unique directions, such as the four cardinal directions for On–Off DSGCs (Elstrott et al., 2008; Kim et al., 2008; Huberman et al., 2009). These RGCs could converge onto SC neurons and give rise to a preference for certain directions or axes of motion. The results that the selectivity is largely normal in ephrin-A KOs but disrupted along the azimuthal axis in β2 KOs thus suggest that the activity-dependent refinement could be important for converging different subtypes of DSGCs, just as in converging On and Off-centered RGCs in creating overlapped RFs. On the other hand, SC direction/orientation selectivity could result from circuits within the colliculus, such as inhibition from local GABAergic interneurons. These interneurons are known to shape many aspects of SC responses (Binns and Salt, 1997), although their roles in SC selectivity have not been investigated. In such a scenario, our results would indicate that molecular guidance cues such as ephrin-As are not critical, while the activity-dependent processes are more important, in establishing these intracollicular connections.

# **AUTHOR CONTRIBUTIONS**

Mingna Liu and Lupeng Wang performed the experiments. Mingna Liu, Lupeng Wang and Jianhua Cang designed the study, analyzed data and wrote the article. The authors declare no competing financial interests.

# **ACKNOWLEDGMENTS**

We thank Dr. David Feldheim for providing ephrin-A KO mice and Dr. Rashmi Sarnaik for help with data analysis. This work was supported by US National Institutes of Health (NIH) grants (EY018621 and EY020950) and a Klingenstein Fellowship Award in Neurosciences to Jianhua Cang.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 December 2013; accepted: 03 March 2014; published online: 26 March 2014.*

*Citation: Liu M, Wang L and Cang J (2014) Different roles of axon guidance cues and patterned spontaneous activity in establishing receptive fields in the mouse superior colliculus. Front. Neural Circuits 8:23. doi: 10.3389/fncir.2014.00023*

*This article was submitted to the journal Frontiers in Neural Circuits. Copyright © 2014 Liu, Wang and Cang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Rats and humans differ in processing collinear visual features

# *Philip M. Meier <sup>1</sup> and Pamela Reinagel <sup>2</sup> \**

<sup>1</sup> Department of Neurosciences, Division of Medicine, University of California at San Diego, La Jolla, CA, USA <sup>2</sup> Section of Neurobiology, Division of Biology, University of California at San Diego, La Jolla, CA, USA

#### *Edited by:*

Davide Zoccolan, International School for Advanced Studies, Italy

#### *Reviewed by:*

Shreesh P. Mysore, Stanford University, USA Damian James Wallace, Max Planck Institute for Biological Cybernetics, Germany

#### *\*Correspondence:*

Pamela Reinagel, Section of Neurobiology, Division of Biology, University of California at San Diego, 9500 Gilman Drive #0357, La Jolla, CA 92093, USA e-mail: preinagel@ucsd.edu

Behavioral studies in humans and rats demonstrate that visual detection of a target stimulus is sensitive to surrounding spatial patterns. In both species, the detection of an oriented visual target is affected when the surrounding region contains flanking stimuli that are collinear to the target. In many studies, collinear flankers have been shown to improve performance in humans, both absolutely (compared to performance with no flankers) and relative to non-collinear flankers. More recently, collinear flankers have been shown to impair performance in rats both absolutely and relative to non-collinear flankers. However, these observations spanned different experimental paradigms. Past studies in humans have shown that the magnitude and even sign of flanker effects can depend critically on the details of stimulus and task design. Therefore either task differences or species differences could explain the opposite findings. Here we provide a direct comparison of behavioral data between species and show that these differences persist – collinear flankers improve performance in humans, and impair performance in rats – in spite of controls that match stimuli, experimental paradigm, and learning procedure. There is evidence that the contrasts of the target and the flankers could affect whether surround processing is suppressive or facilitatory. In a second experiment, we explored a range of contrast conditions in the rat, to determine if contrast could explain the lack of collinear facilitation. Using different pairs of target and flanker contrast, the rat's collinear impairment was confirmed to be robust across a range of contrast conditions. We conclude that processing of collinear features is indeed different between rats and humans. We speculate that the observed difference between rat and human is caused by the combined impact of differences in the statistics of natural retinal images, the representational capacity of neurons in visual cortex, and attention.

**Keywords: rodent, collinearity, flanker task, visual perception, contrast, attention, psychophysics, cortical computation**

# **INTRODUCTION**

Specialized interaction of nearby collinear features is thought to play an important role in contour integration and figure/ground segregation of scenes. In natural images, collinear features enjoy prominent statistical correlations across spatial regions and feature types. It has been suggested (Barlow, 1961; Olshausen and Field, 1996; Simoncelli and Olshausen, 2001; Coen-Cagli et al., 2012) that neurons learn to represent the world by exploiting the joint statistics between their inputs. Thus cortical computation may latch on to events induced by collinear stimuli, and appropriately enhance or suppress them. But what is appropriate? Are collinear features redundant, and should be suppressed in order to optimize the channel capacity of the neural code? Or are collinear features highly informative about scenes, and thus should be emphasized as salient features for subsequent processing? We argue that a good way to begin understanding the cortical code is to examine the neural and behavioral responses to stimuli with features that are correlated in the natural world, but are made independent in the course of the study. In this paper, we directly compare the behavioral responses of humans and rats detecting a visual target surrounded by collinear flankers.

Rodents are increasingly used as a model system for the study of cortex, including the visual system (Bussey et al., 2001; Niell and Stryker, 2008; Andermann et al., 2011; Bonin et al., 2011; Huberman and Niell, 2011; Meier et al., 2011; Reid, 2012; Alemi-Neissi et al., 2013; Haider et al., 2013). Many aspects of visual processing are conserved in the thalamo-cortical visual pathway of mammals, including center-surround antagonism, light adaptation, contrast adaptation, orientation tuning, spatial bandpass filtering, and phase selectivity. Yet, there are also differences between primates and rodents in the organization of early visual processing (van den Bergh et al., 2010). These include differences in connectivity across layers of V1 (Zarrinpar and Callaway, 2006), as well as differences in organizational principles like orientation tuning maps (Ohki et al., 2005). When we learn about the function of rodent visual cortex, will it generalize to human vision? Mammals likely share many common computational goals in early vision, and achieve similar algorithmic solutions with the same biological components. In other respects, surely divergence or specialization will result in differences between species. Using multiple species to elucidate mechanisms of mammalian vision, it will be important to determine both the similarities and differences at each level of description.

"fncir-07-00197" — 2013/12/13 — 11:49 — page 1 — #1

A recent study in rodent behavior demonstrated perceptually guided behaviors in rats that are specific to collinear stimuli (Meier et al., 2011). All patterns of flanking stimuli ("flankers") impair rats' ability to detect a target stimulus. Collinear flankers impair their detection even more. This finding stands in contrast to previous reports of human psychophysics in which collinear flankers improve a human subject's capacity to detect a central visual target (Polat and Sagi, 2007; Chen and Tyler, 2008). It was possible, however, that these differences between previous studies could be attributed to differences in stimuli, experimental paradigm, or learning procedure. For example, many of the experiments in the human literature do not vary the orientation of the target on each trial, such that feature-based attention could contribute to the observed effects. Before this study, there were no experiments on human visual detection with collinear flankers that controlled for the subject's expectation of the orientation of the target feature.

Here we present a new study of both human perception and rodent behavior in which the parameters and experimental conditions were matched. For rats, we extend our previous finding that collinear flankers impair detection to a broader range of contrast conditions. For humans, we replicate the previous finding that collinear flankers improve humans' ability to detect visual targets, extending this result to a new task variant that includes controls which were lacking in past human studies. Together these findings constitute the first direct comparison demonstrating that the perceptual mechanisms involved in processing collinear features differ between the species. Importantly, it is not simply that rats lack pattern-specific processing (sensitivity to higher order configurations or feature conjunctions). Both humans and rats demonstrate a perceptual sensitivity particular to collinear stimuli, but between the species, the sign of the effect is reversed.

# **MATERIALS AND METHODS**

In the first experiment, the spatial patterns of stimuli were varied, while the contrast was held constant (**Figures 1** and **2**). In the second experiment, spatial patterns were held constant and the contrast of stimuli was varied (**Figures 3** and **4**).

#### **EXPERIMENT 1**

Both humans and rats performed the same detection task. Subjects performed one of two symmetric actions to indicate either that the target grating was present in the center of the screen, or that it was absent. If the two flanking stimuli were present, they were located on opposite sides of the target, with a diagonal offset (**Figure 1**).

Compared to most experiments that explore the influence of collinear flankers in human perception, this experiment differs in three ways. Instead of receiving instructions, humans learned from trial and error that it was a detection task. Instead of viewing sinewave gratings, humans viewed square-wave gratings. And instead of performing the detection task on all of the collinear trials in a row, humans viewed collinear trials that were randomly interleaved with other non-collinear patterns. Within the controlled comparisons of this study, all three of these traits were consistent for both species.

First, both humans and rats learned to perform correct trials by trial and error. Rats licked one of three ports; humans pressed one of three buttons. The central port/button initiated a new trial. The ports/buttons on the left and right side indicated either "target present" or "target absent"; these meanings were randomly assigned for each subject. Rats were motivated to collect water rewards, and humans were instructed to seek the positive tones that were audible after completing a trial correctly. Second, both humans and rats viewed target stimuli that were oriented gratings with a square-wave pattern. Both viewed stimuli frontally, such that binocular vision could be used. Both viewed stimuli that were 32 pixels per cycle on the screen, but rats viewed from a distance of 10 cm, resulting in a target of 0.15 cpd in a Gaussian envelope with a STD of 10 degrees, and humans viewed from a distance of 2.15 m, resulting in a target of 3.3 cpd in a Gaussian envelope with a STD of 0.45 degrees. Note that in both species, the Gaussian envelope of the grating, in degrees, maintained a fixed proportion to the spatial frequency. Specifically, the only stimulus transformation across species was the distance from the monitor. This global scaling preserves the number of cycles present within the Gaussian mask. These distances were chosen in order to render the stimulus with a spatial frequency that is comparably sensitive for each species' behaviorally measured contrast sensitivity (Keller et al., 2000). Third, both humans and rats viewed stimuli in which the spatial context surrounding the target varied randomly on each trial. As a consequence, a subject could not rely on a flanking stimulus to appear at a particular position or to have a particular orientation. Nor would the subject know that the next stimulus was going to be a particular orientation. 
This experimental paradigm should prevent a subject from ignoring a particular orientation, which might have been a good strategy if there had been a block of trials in which the target orientation was constant and differed from the flanker orientation.
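
The scaling of spatial frequency with viewing distance can be checked with a short calculation. The sketch below is illustrative only: the pixel pitch is not given in the text, so `PIXEL_PITCH_CM` is a hypothetical value chosen so that 32 pixels per cycle at 10 cm yields about 0.15 cpd, and the function name is our own.

```python
import math

PIXEL_PITCH_CM = 0.0364  # hypothetical pixel size; not stated in the text


def cycles_per_degree(pixels_per_cycle, pixel_pitch_cm, distance_cm):
    """Spatial frequency of an on-screen grating in cycles per degree."""
    cycle_cm = pixels_per_cycle * pixel_pitch_cm
    # Visual angle subtended by one full cycle centered on the line of sight
    cycle_deg = 2.0 * math.degrees(math.atan(cycle_cm / (2.0 * distance_cm)))
    return 1.0 / cycle_deg


rat_cpd = cycles_per_degree(32, PIXEL_PITCH_CM, 10.0)     # rat at 10 cm
human_cpd = cycles_per_degree(32, PIXEL_PITCH_CM, 215.0)  # human at 2.15 m
scale = human_cpd / rat_cpd
```

Under this assumption the two values come out close to the reported 0.15 and 3.3 cpd, and their ratio is close to the ratio of viewing distances (21.5), which is the sense in which the stimulus is a global rescaling across species.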

#### *Training*

Rats were trained to perform the task by progressing through a sequence of five shaping steps, as previously described (Meier et al., 2011). To summarize the training steps, rats first learned to detect a large grating, which was then decreased in contrast, increased in spatial frequency, reduced in spatial extent, and was finally embedded in a spatial context with flankers of increasing contrast. This training process took rats multiple weeks to complete, with a 2-h session each day. Most of the training was spent on the last two stages. Humans began immediately on the final task. They learned to perform it over hundreds of trials, all in a single session. For humans, testing and training occurred on the same day, in a single 2-h session. Qualitatively, both rats and humans learned the task through trial and error. Quantitatively, rats observed many more trials before attaining adequate performance.

#### *Display*

Stimuli were presented on a CRT monitor (100 Hz, 1024×768 pixels). When humans performed the exact same task as the rats, they were close to 100% correct (preliminary study, data not shown). To increase the difficulty of the task for humans, the contrast of the target was reduced, *T*<sup>c</sup> = [0.0625]. The contrast of the flankers was kept the same as was used for the rats, *F*<sup>c</sup> = [1.0]. Additionally, the stimulus duration was reduced to 100 ms. During training, as well as the first experiment, rats were allowed to view the stimulus indefinitely.

Stimuli were presented on a monitor 10 cm from the rat's eyes. It is possible that rats' acuity or sensitivity is higher at other viewing distances. Optimal viewing distances for Long Evans (hooded) rats have been reported to be between 20 and 30 cm (Wiesenfeld and Branchek, 1976), and many behavioral studies present stimuli at depths within this range (Lashley, 1930; Birch and Jacobs, 1979; Dean, 1981; Alemi-Neissi et al., 2013). Yet other studies report that, compared to 30 cm, detection sensitivity did not consistently decrease at proximal depths like 12 cm (Dean, 1981) or 15 cm (Birch and Jacobs, 1979). Successful visual experiments have been performed on touch screens with display surfaces as close as 7 cm (Keller et al., 2000) or 2 cm (Bussey et al., 2008). We chose 10 cm as a viewing distance for experimental convenience; the compact arrangement of training chamber and monitor allowed a rack of nine simultaneously operating rigs to occupy a small footprint of floorspace. After selecting a distance, we chose a contrast and spatial frequency that yielded detection above perceptual threshold, favoring high contrast and a moderately high spatial frequency.

#### **EXPERIMENT 2**

A second experiment was performed for two of the rats. To adequately sample many combinations of target contrast and flanker contrast, all spatial parameters were held constant. Thus, if flankers were present, they were collinear (**Figure 3**). In this experiment, there were twenty possible stimulus conditions: four target contrasts (*T*<sup>c</sup> = [0.25, 0.5, 0.75, 1]), and five flanker contrasts (*F*<sup>c</sup> = [0, 0.25, 0.5, 0.75, 1]). Within each block, the flanker contrast was constant, and the target contrast, if the target was present, was also constant. On half of the trials, the target was not present (*T*<sup>c</sup> = 0). Conditions were randomly assigned to a block of 100 trials. One subject performed an average of 485 trials per day, resulting in 21 blocks per stimulus condition; the other subject performed an average of 585 trials per day, resulting in 27 blocks per stimulus condition. The stimulus was present for 200 ms on each trial. In all other respects the methods were the same as for Experiment 1. Data were collected in 96 sessions over 101 days.
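
The block design of Experiment 2 can be sketched as below. This is a hedged illustration, not the software used in the study: the function and variable names are hypothetical, and two details are assumptions made here, namely that exactly half of the trials in each block are target-absent and that conditions are assigned to blocks by shuffling the full list of twenty conditions.

```python
import itertools
import random

TARGET_CONTRASTS = [0.25, 0.5, 0.75, 1.0]
FLANKER_CONTRASTS = [0.0, 0.25, 0.5, 0.75, 1.0]
BLOCK_LENGTH = 100  # trials per block, as stated in the text


def make_block(target_c, flanker_c, rng, n_trials=BLOCK_LENGTH):
    """Build one block: fixed contrasts, target absent on half of the trials."""
    present = [True] * (n_trials // 2) + [False] * (n_trials - n_trials // 2)
    rng.shuffle(present)  # randomize which trials contain the target
    return [{"target": target_c if p else 0.0, "flanker": flanker_c}
            for p in present]


# All 4 x 5 = 20 combinations of target and flanker contrast
conditions = list(itertools.product(TARGET_CONTRASTS, FLANKER_CONTRASTS))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
order = conditions[:]
rng.shuffle(order)      # assumed form of random assignment to blocks
blocks = [make_block(t, f, rng) for t, f in order]
```

Holding both contrasts fixed within a block means that each block yields a hit rate and a false alarm rate for a single stimulus condition.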

#### **DATA COLLECTION**

Behavioral data were collected from seven male Long Evans rats (Harlan Laboratories) and four university student volunteers. Experiments were conducted under the supervision and with the approval of either the Human Research Protections Program or the Institutional Animal Care and Use Committee at the University of California San Diego.

The rodent data are from the same trained rats and the same experimental protocol as previously reported (Experiment 1: Meier et al., 2011; Experiment 2: Meier and Reinagel, 2011) but the data have been analyzed differently. Specifically, we report performance with flankers in relation to each subject's detection performance of the target alone. Additionally, we have grouped the performance estimates of the three types of non-collinear stimuli, because they were not significantly different from each other in our analysis. The human data were collected to approximate the same task as the one performed by the rats, and were analyzed the same way. One human subject was excluded from analysis because they never learned to perform the task above chance.

#### **ANALYSIS**

Behavioral performance is reported as both the fraction of correct trials and *d*'. The former provides an intuitive sense of the raw data; the latter is a metric of signal detection theory that aims to separate a subject's sensitivity to the target from errors due to their bias to choose a particular response.
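
For reference, the two performance measures can be computed from trial counts as in the sketch below. The standard signal detection definition *d*' = *z*(hit rate) − *z*(false alarm rate) is used; the log-linear correction for extreme rates (adding 0.5 to each count) and the function names are assumptions made here, since the text does not specify how rates of 0 or 1 were handled.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def fraction_correct(hits, misses, false_alarms, correct_rejections):
    """Fraction of trials answered correctly."""
    total = hits + misses + false_alarms + correct_rejections
    return (hits + correct_rejections) / total


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""
    # Log-linear correction keeps both rates strictly between 0 and 1
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return _z(hit_rate) - _z(fa_rate)


unbiased = d_prime(80, 20, 20, 80)
guessing = d_prime(50, 50, 50, 50)
```

A subject who responds "present" on every trial would be 50% correct but would have *d*' of zero, which is why *d*' is reported alongside the fraction of correct trials.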

**FIGURE 2 | Impact of flanking stimuli on the performance of humans and rats detecting a faint visual target.** Each arrow indicates the difference in fraction of correct trials across stimulus categories, for a single subject (cyan for humans, red for rats). The pale arrow denotes the impact of non-collinear flankers with respect to no flankers. The darker arrow indicates the additional impact of collinear flankers, above and beyond the effect of non-collinear flankers. The sum of the two arrows captures the difference in performance between detecting targets without any flankers, and detecting targets in the presence of collinear flankers. The absolute performance on each of the three stimulus conditions (collinear, non-collinear, no flank) is captured by the tip and the base of the arrows. **(A)** Effect of flankers on performance (% correct). Humans performed better on trials with collinear flanking stimuli (upwards arrows) and rats performed worse on the trials with collinear flanking stimuli (downwards arrows). **(B)** Effect of flankers on sensitivity (d'). **(C)** Each subject's sensitivity on the collinear trials minus their sensitivity on the non-collinear trials. The gray shaded region indicates chance differences within the range spanned by 95% of 10,000 random permutations of the subject's response with respect to the stimulus. If performance for a given subject is significant beyond this chance range, it is marked with an asterisk.

Confidence intervals in **Figure 2C** were generated using a permutation test that would reject the hypothesis that a subject's sensitivity to two stimulus categories was equal. Each trial has a stimulus identity (e.g., collinear or non-collinear) and the subject's response (e.g., reporting that the target was present or absent). The subject's response was randomly permuted within all trials with a target, and again within all trials without a target, destroying the relationship between the stimulus identity and the response. *d*' was computed for each of the two stimulus categories (collinear and non-collinear) and the difference between the two was computed. The permutation and the analysis were repeated 10,000 times, resulting in a distribution of differences that would be expected if the sensitivity was not different. The top and bottom 250 samples were removed, providing an estimate of the boundary that would contain the observed measure 95% of the time, if the null hypothesis were true.
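
The permutation procedure can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the function names and the per-trial data layout (parallel lists of category label, target presence, and response) are hypothetical, and *d*' is computed with a log-linear correction that the text does not specify.

```python
import random
from statistics import NormalDist

_z = NormalDist().inv_cdf


def d_prime(target, response):
    """d' from parallel lists of target presence and 'present' responses."""
    hits = sum(1 for t, r in zip(target, response) if t and r)
    fas = sum(1 for t, r in zip(target, response) if not t and r)
    n_signal = sum(1 for t in target if t)
    n_noise = len(target) - n_signal
    # Log-linear correction (an assumption) avoids infinite z-scores
    return _z((hits + 0.5) / (n_signal + 1.0)) - _z((fas + 0.5) / (n_noise + 1.0))


def dprime_difference(category, target, response):
    """d'(collinear) minus d'(non-collinear)."""
    def subset(name):
        idx = [i for i, c in enumerate(category) if c == name]
        return d_prime([target[i] for i in idx], [response[i] for i in idx])
    return subset("collinear") - subset("non-collinear")


def permutation_interval(category, target, response, n_perm=10000, seed=0):
    """95% range of d' differences expected under the null hypothesis."""
    rng = random.Random(seed)
    present = [i for i, t in enumerate(target) if t]
    absent = [i for i, t in enumerate(target) if not t]
    null = []
    for _ in range(n_perm):
        shuffled = list(response)
        # Permute responses separately within target-present and
        # target-absent trials, breaking the link to stimulus category
        for group in (present, absent):
            values = [response[i] for i in group]
            rng.shuffle(values)
            for i, v in zip(group, values):
                shuffled[i] = v
        null.append(dprime_difference(category, target, shuffled))
    null.sort()
    cut = int(round(0.025 * n_perm))  # 250 samples when n_perm is 10,000
    return null[cut], null[-cut - 1]
```

An observed difference, obtained by calling `dprime_difference` on the unshuffled responses, would be marked as significant if it falls outside the returned interval. Shuffling within target-present and target-absent trials separately preserves each subject's overall hit and false alarm rates while removing any dependence on flanker category.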

# **RESULTS**

Both humans and rats performed the same task to detect a faint target. During each trial of the task the configuration of the flankers was randomly varied (**Figure 1**). The many possible stimulus patterns were organized into three groups for analysis: trials without any flanking stimuli ("no flanker"), trials with two collinear flankers ("collinear"), and trials with two flankers present, neither of which was collinear to the target ("non-collinear"). These three non-overlapping categories fully contained all stimuli presented to the subjects. Both humans and rats learned to perform the task above chance levels. Humans learned the task and were tested in the course of a single session; the average performance of a single human ranged between 60% and 80% correct. Rats learned over the course of many weeks; the average performance for rats during the testing phase was between 60% and 70% correct. The absolute performance of each subject was not of particular interest, beyond confirming that it fell within a range that could potentially reveal improvements or impairments.

To isolate the impact of collinear flankers, we compared a subject's detection performance between the stimulus types (**Figure 2**). Each human subject performed better on trials with collinear stimuli than on trials with non-collinear stimuli (significant in 2 of 4). This is consistent with reports that humans can detect fainter contrasts when flanking stimuli are collinear to the target (Polat and Sagi, 2007). On the other hand, each rat performed worse on collinear than non-collinear stimuli (significant in 6 of 7), as previously reported in rats. Notably, the rats' behavior reveals that their visual system is specifically influenced by collinear flankers, above and beyond the influence of non-collinear flanking stimuli. However, the additional impact of collinear stimuli is to impair, rather than improve, their performance. The absolute effect of flankers also differed between species: each human subject performed better on trials with flankers (collinear or not) than on trials without flankers; each rat performed worse when flankers were present.

**flanker contrasts were varied.** The contrast of the target varied ([0.25, 0.5, 0.75, 1.0]). The contrast of the flankers was varied independently ([0, 0.25, 0.5, 0.75, 1.0]). **(A)** Example of a stimulus condition with high contrast flankers (1.0) and a low contrast target (0.25). The target is present in the left sub-panel and absent from the right sub-panel. **(B)** Example of a stimulus with a high contrast target (1.0) and low contrast flankers (0.25). Again, the left and right sub-panels differ by the presence vs. absence of the target. Only collinear flankers were used in this experiment.

Could the difference in contrast of the target alone explain the differences observed between rats and humans? Previous findings about human performance suggest that the impact of flankers on target detection at high contrasts may be different from that on threshold target detection at low contrasts (Williams and Hess, 1998). Moreover, studies in human psychophysics as well as mammalian neurophysiology (Seriès et al., 2003) suggest that in some circumstances, the relative contrast of flanker to target could switch the influence of flankers from facilitative to suppressive. Because the humans were better at performing the detection task in a pilot study, the contrast of the target had been set to a lower value for humans in the first experiment, to achieve detection performance near threshold. Might the observed collinear impairment disappear, or even invert (Polat et al., 1998), if rats view collinear flankers that are substantially higher contrast than the target? Importantly, it was not known if the contrast of the target alone is the parameter that matters, or if the relative contrast of the target to the flanker matters more.

To address these questions, in a second experiment, two rats were tested with many combinations of target contrast and flanker contrast. This experiment included a condition where the flanker contrast was four times as large as the target contrast (**Figure 3A**), a condition where the flanker contrast was one quarter of the target contrast (**Figure 3B**), and many steps in between. We wanted to know if there is a contrast regime where rats will perform better in the presence of collinear flankers compared to no flankers (**Figure 2B**, the combined length of both arrows). Specifically, will the sign of the effect ever invert for rats, such that collinear flankers improve detection performance, as they do for humans? We find the answer is no.

In none of the tested cases did collinear flankers improve rats' detection (**Figure 4**). More specifically, for each target contrast, performance in the "no flank" condition was always better than in the "collinear" condition with a matched target contrast. As the target contrast increased, the "no flank" condition improved more than the "collinear" condition, increasing the difference in performance between the conditions (**Figures 4A,D**). In other words, given an increment of target contrast, rats were less sensitive to the additional signal when the collinear flankers were present. As the flanker contrast increased, the impairment caused by flankers increased (**Figures 4B,E**). This rules out the hypothesis that higher contrast flankers might improve rats' detection, either by creating a sub-threshold pedestal for the low contrast target (Chen and Tyler, 2002) or by providing a consistent salient visual anchor for spatial attention (Petrov et al., 2006). To restate, all tested conditions (**Figures 4C,F**) produced a collinear impairment in both rats.
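The contrast grid of this second experiment can be enumerated in a few lines, which makes the count of comparisons and the extreme flanker-to-target ratios explicit. This is only an illustrative sketch: the contrast values are those listed in the Figure 3 caption, while the variable and function names are hypothetical.

```python
from itertools import product

# Contrast levels as listed in the Figure 3 caption.
TARGET_CONTRASTS = [0.25, 0.5, 0.75, 1.0]
FLANKER_CONTRASTS = [0.0, 0.25, 0.5, 0.75, 1.0]

# Every (target contrast, flanker contrast) combination.
conditions = list(product(TARGET_CONTRASTS, FLANKER_CONTRASTS))

# Flanker contrast 0 is the "no flank" reference for each target contrast.
references = [(t, f) for t, f in conditions if f == 0.0]
flanked = [(t, f) for t, f in conditions if f > 0.0]


def reference_for(condition):
    """Return the no-flanker condition with the same target contrast."""
    target, _ = condition
    return (target, 0.0)


# Flanker-to-target contrast ratio for each flanked condition.
ratios = {(t, f): f / t for t, f in flanked}

print(len(flanked), "flanked conditions,", len(references), "references")
print("largest ratio:", max(ratios.values()))
print("smallest ratio:", min(ratios.values()))
```

Each of the 16 flanked conditions is compared against the reference returned by `reference_for`, so that the change in d' is always measured at a matched target contrast.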

# **DISCUSSION**

The goal of this study was to directly compare collinear processing between humans and rats, and to synthesize findings in both species for a better understanding of canonical cortical computations.

We tested humans and rats on the same detection task. In both species, flankers with collinear spatial patterns had the strongest effect on performance. However, the nature of the collinear effect is strikingly different between the species: collinear features helped humans perform the task, but they impaired rats. This makes it unlikely that the previously reported difference was due to differences in task design or contrast regime. Instead, it appears that some aspects of visual processing, specifically regarding correlations of spatially adjacent features, differ between rodent and human vision.

**condition.** Performance of two rats in which the target contrast and the flanker contrasts were independently varied. In all cases, we measured d' for detection of a target with collinear flankers, and subtracted the d' we measured for the same target contrast with no flankers (reference condition). This difference (change) in d' is indicated by arrowheads in the bar graphs. The base of each arrow is zero by definition because the no-flanker condition is the reference condition. **(A)** The reduction in detection performance caused by full contrast flankers, at four different target contrasts. **(B)** The reduction in detection performance caused by four different flanker contrasts, for a full contrast target. **(C)** All possible combinations of target contrast and flanker contrast impaired the rat's performance. The reference condition for each comparison has no flanker present, and an equivalent target contrast. Eight of the 16 comparisons are identical to panels **(A,B)**. Panels **(A–C)** show data from one subject; **(D–F)** show equivalent data from the second subject.

This is the first report of human performance on a detection task with oriented flankers where the experimental design randomized target and flanker orientations on every trial. This design prevents subjects from using the spatial pattern of the previous trial to attend to features that would make the detection task easier. Our results provide evidence against the model that feature-based attention underlies collinear facilitation in humans.

#### **POSSIBLE EXPLANATIONS OF OBSERVED SPECIES DIFFERENCE**

Some of the behavioral consequences of flanking stimuli are likely mediated by lateral interactions between the cortical layers that represent both the target features and the flanking features. However, we cannot rule out that features may interact via the computations of higher order visual features, or even via the subject's decision process.

Our prior was that the detection of collinear contours would be a fundamental visual computation conserved across mammals. The different effect of collinear stimuli in rats and humans therefore came as a surprise; we do not have an explanation for it. Below we consider five hypotheses that could explain our observations: between humans and rats there may be (1) differences in task understanding, (2) differences in recent visual experience, (3) differences in the statistics of natural retinal images, (4) differences in neural resources and thus over-completeness of pattern representations, and (5) differences in attention.

#### *(1) A difference in how humans and rats understand the task*

In this study, both rats and humans inferred the task goal by trial and error, without explicit instructions. All human subjects were given an exit survey. Of the four included in this study, three subjects were able to articulate the stimulus properties they used to answer correctly, such as attending to the region between the two flankers. One subject was not able to articulate which visual properties influenced their judgments. Strikingly, this subject was above chance, yet did not seem to understand what the task was. Indeed, this subject was not even aware of performing the task better at the beginning or the end of the session. Given that this human subject did not understand the task, and hypothesizing that the rats did not understand the task, is the subject's performance consistent with that of the rats? The answer is no: all humans tested (who were above chance) had the same collinear facilitation, even the one who did not seem to understand the task. Based on this anecdotal evidence, we do not favor differences in task understanding as an explanation of the species difference. Of course, despite our efforts to match the learning procedures, there could still be differences between humans and rats in how they understand the task.

#### *(2) A difference in visual experience in the training phase*

Since human training was very short (about 15 min, and a few hundred stimuli) the preceding stimuli might have had a different effect than for the rats, who viewed more stimuli (over months, tens of thousands of trials). Additionally, humans only viewed the final task, whereas rats were shaped to perform the task through a series of shaping steps. One of these steps included a large target. Therefore it is possible that the rats learned to use information that was collected from the region that the flankers were going to occupy in later testing phases, and failed to unlearn that these regions were in fact foils during the testing phase. However, we note that the number of trials with large targets was small (hundreds of trials, months ago) compared with the majority of trials in which the target and the surround were independent. Therefore we think it is unlikely that this explains the full reversal of the collinear effect across species.

#### *(3) A difference in anatomy and visual experience across evolution*

Natural scenes are self-similar (Tolhurst et al., 1992; Ruderman and Bialek, 1994; van der Schaaf and van Hateren, 1996). Thus, the statistics of collinear line elements should be similar across scales spanned by rat vision and human vision. However, humans and rats see the world from a different point of view. Rats' eyes are closer to the floor, and they rarely shift their gaze vertically (Chelazzi et al., 1989). This latter fact may be particularly important because it seems that the rodent retina has adapted to different evolutionary constraints for the stimuli in different spatial locations. For example, the upper visual hemifield contains a different distribution of cones than the lower hemifield (Ortín-Martínez et al., 2010), possibly due to representing different features in the land and sky. In this study, rats viewed stimuli that were in the upper visual field. Rats have lower visual acuity than humans (Keller et al., 2000), and a smaller fraction of cones than humans (Kimble and Williams, 2000). Taken together, these differences in the early visual system suggest that at an evolutionary time scale, the statistics of visual input were different for rats, and that the visual system optimized differently to represent them. It could be argued that spatial vision in rats is different in their upper and lower hemifields. The lower hemifield may be used to detect and identify the spatial patterns of nearby visual objects, and the upper visual field may be more relevant for more distant cues such as landmarks or swooping predators. The limited optical range of the rodent eye may render the retinal image blurry for distant objects. In summary, humans may have more evolutionary experience bringing structured objects into focus, regardless of their depth, and thus more opportunity and selective pressure to evolve mechanisms exploiting the relative correlations of local image features. This could explain why rats might lack collinear facilitation, at least in the upper visual field. But it fails to account for collinear-specific impairment.

#### *(4) Flankers impair performance by crowding, but primates have mechanisms to combat crowding*

One possible explanation is that all flanking stimuli cause a universal impairment in target detection, but that some organisms have attentional and/or perceptual resources that capitalize on collinear edges and overpower the deficit of crowding. If there are fewer cortical neurons to represent each square degree of a visual scene (as in a rodent or the primate para-fovea), the impact of crowding may be stronger, and the deficit observed in the presence of flankers will be greater. Indeed, crowding from a distant flanker is stronger in the para-fovea than in the fovea of humans (Levi, 2008), and rats, compared to humans, display greater detection deficits in the presence of flankers. Crowding could account for the pattern of deficits observed in the range of contrast conditions tested in rats (**Figure 4**; Levi and Carney, 2011; Meier and Reinagel, 2011). Crowding could explain why any flanker might impair target detection, and why collinear features, or more proximal features, or higher contrast features would impair more. However, crowding will not explain the benefits in target detection that are conferred to humans when flanking stimuli are present. To explain the improvement, one would have to posit an additional resource, unique to primates and absent in rodents.

#### *(5) Collinear facilitation requires selective visual attention, which is more developed in primates*

Collinear facilitation could be a hallmark of deployed attention. Previous studies suggest that collinear facilitation in humans depends on the allocation of attention (Freeman et al., 2001). Many anatomical structures involved in the deployment of spatial visual attention overlap with the neural resources involved in the guidance of eye movements – notably the superior colliculus and frontal eye fields. Rats do not have a fovea, and lack the rich saccadic eye movements found in primates. Therefore rats may lack specializations of the spatial visual attention system that primates have evolved in association with saccadic foveation. A difference in attentional mechanisms could explain the cross-species differences in the detection task observed in this paper.

The five explanations considered above are speculative, and not mutually exclusive. We suspect the difference is due to a combination of the latter three: differences in the statistics of the natural retinal images, the representational capacity of neurons in visual cortex, and the attention mechanisms of an organism.

In closing, the neural mechanisms of collinear interactions remain unknown in either species. We presented strong evidence that processing of collinear features is different between rats and humans. Elucidating the circuit mechanisms in either species would be of great value, and the best model would be one that could account for the differences between the species.

# **ACKNOWLEDGMENTS**

This work was supported by an Innovative Research Grant from the Kavli Institute for Brain and Mind, a Scholar Award from the J. S. McDonnell Foundation, and NIH R01 Grant EY016856. Philip M. Meier was supported by NSF IGERT Grant DGE-0333451 to GW Cottrell/VR de Sa and by an NSF Graduate Research Fellowship.


# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 August 2013; accepted: 28 November 2013; published online: 13 December 2013.*

*Citation: Meier PM and Reinagel P (2013) Rats and humans differ in processing collinear visual features. Front. Neural Circuits 7:197. doi: 10.3389/fncir.2013.00197 This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2013 Meier and Reinagel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Speed and accuracy of visual image discrimination by rats

# *Pamela Reinagel\**

Section of Neurobiology, Division of Biological Sciences, University of California at San Diego, La Jolla, CA, USA

#### *Edited by:*

Davide Zoccolan, International School for Advanced Studies, Italy

#### *Reviewed by:*

Samuel Gavan Solomon, University College London, UK; Athena Akrami, Princeton University – Howard Hughes Medical Institute, USA

*\*Correspondence:*

Pamela Reinagel, Section of Neurobiology, Division of Biological Sciences, University of California at San Diego, 9500 Gilman Drive #0357, La Jolla, CA 92093, USA e-mail: preinagel@ucsd.edu

The trade-off between speed and accuracy of sensory discrimination has most often been studied using sensory stimuli that evolve over time, such as random dot motion discrimination tasks. We previously reported that when rats perform motion discrimination, correct trials have longer reaction times than errors, accuracy increases with reaction time, and reaction time increases with stimulus ambiguity. In such experiments, new sensory information is continually presented, which could partly explain interactions between reaction time and accuracy. The present study shows that a changing physical stimulus is not essential to those findings. Freely behaving rats were trained to discriminate between two static visual images in a self-paced, two-alternative forced-choice reaction time task. Each trial was initiated by the rat, and the two images were presented simultaneously and persisted until the rat responded, with no time limit. Reaction times were longer in correct trials than in error trials, and accuracy increased with reaction time, comparable to results previously reported for rats performing motion discrimination. In the motion task, coherence has been used to vary discrimination difficulty. Here morphs between the previously learned images were used to parametrically vary the image similarity. In randomly interleaved trials, rats took more time on average to respond in trials in which they had to discriminate more similar stimuli. For both the motion and image tasks, the dependence of reaction time on ambiguity is weak, as if rats prioritized speed over accuracy. Therefore we asked whether rats can change the priority of speed and accuracy adaptively in response to a change in reward contingencies. For two rats, the penalty delay was increased from 2 to 6 s. When the penalty was longer, reaction times increased, and accuracy improved. This demonstrates that rats can flexibly adjust their behavioral strategy in response to the cost of errors.

**Keywords: decision making, sequential decision, speed–accuracy trade-off, rodent vision, visual behavior, perceptual decision, choice**

# **INTRODUCTION**

The temporal dynamics of decision making have been most thoroughly studied using the random dot motion task, in which a number of randomly positioned dots move coherently in one of two directions (the signal), while a number of other randomly positioned dots move in random directions (the noise). Thus information about the direction of coherent motion is embedded in noise, and averaging over time improves the signal-to-noise ratio of the sensory information available in the physical stimulus. When human and primate subjects perform this task, subjects wait longer to respond when the stimuli are less coherent (more ambiguous), and there is a trade-off between accuracy and speed (Palmer et al., 2005). Speed–accuracy trade-off in primate vision has been the subject of a rich experimental and theoretical literature (Britten et al., 1996; Shadlen and Newsome, 1996, 2001; Leon and Shadlen, 1998; Kim and Shadlen, 1999; Gold and Shadlen, 2001, 2007; Hastie and Dawes, 2001; Roitman and Shadlen, 2002; Glimcher, 2003; Huk and Shadlen, 2005; Palmer et al., 2005; Bogacz et al., 2006, 2009; Churchland et al., 2008; Ratcliff and McKoon, 2008; Kahneman, 2011; Liu and Pleskac, 2011; Drugowitsch et al., 2012).

Compared with primates, little is known about the trade-off of speed and accuracy in sensory decisions by rodents. In the past decade, studies have begun to address this question in rodents using olfactory (Uchida and Mainen, 2003; Abraham et al., 2004; Kepecs et al., 2006, 2007; Rinberg et al., 2006; Uchida et al., 2006; Felsen and Mainen, 2008, 2012) and auditory (Jaramillo and Zador, 2011; Sanders and Kepecs, 2012; Brunton et al., 2013) tasks. For the case of rodent vision, it was recently shown that when rats perform the random dot visual motion task, accuracy improves with viewing time and viewing time increases with the discrimination difficulty (Reinagel, 2013). The improvement in accuracy with reaction time required the presence of the ongoing motion stimulus. This raised the question whether this improvement with viewing time required that the stimulus be dynamically updated with new independent evidence for the decision (as is the case with random dot motion), or whether the same would hold true when the stimulus was well above threshold and static. In motion discrimination, the increase in reaction time with difficulty was smaller than expected for integration to a bound, and more resembled the responses of humans and monkeys when given a deadline or instructed to prioritize speed over accuracy. Moreover, the increase in reaction time with difficulty was found even under conditions (after stimulus offset) when the delay impaired rather than improved reward outcome. Thus the dependence of reaction time on difficulty could reflect

"fncir-07-00200" — 2013/12/16 — 14:55 — page 1 — #1

confidence (Kepecs et al., 2008; Kiani and Shadlen, 2009) rather than sensory integration time. It remained unclear, then, whether rats have the capacity to prioritize accuracy any more highly in this task, and whether doing so would result in a change in speed.

To address these questions, this study describes the relationship between reaction time and accuracy in the responses of rats discriminating between high-contrast static visual images. The visual similarity of the image pair was varied parametrically by image morphing. Rats' ability to modulate reaction time in response to task demands was tested by changing the duration of the error penalty.

# **MATERIALS AND METHODS**

#### **ANIMALS**

Twelve female Long-Evans rats (Harlan) were water restricted and trained to perform visual tasks for water reward (Meier et al., 2011). Subjects began training at age p30 for 2 h/day 7 days a week. Subjects performed 500–1500 trials per day, and received water in 50% of trials when performing at chance. No supplemental water (outside of the task) was given at any time, but carrots were given after each training session. During training sessions subjects had free access to return to the home cage at any time; thus they had access to food during periods of water consumption. On this protocol, all subjects maintained normal growth curves (within 5% of published values for unrestricted food and water). Between training sessions, subjects were pair-housed with enrichment (chew toys, PVC tubes). Subjects were housed in a reverse 12 h light/dark cycle and were trained and tested in the housing environment during the dark cycle. All 12 rats that began the study learned the task and completed the study. The total training time in calendar days from naive animal to beginning the testing phase (shaping steps 1–5) ranged from 29 to 108 days (56.1 ± 26.3, mean ± SD), corresponding to ages between p59 and p138. The calendar days required to complete the testing period (step 6) ranged from 20 to 42 days (27.4 ± 5.9, mean ± SD). All procedures were performed with the approval and under the supervision of the UCSD IACUC, within an AAALAC-accredited animal facility. The image discrimination task was described previously (Clark et al., 2011). The reaction time data reported here were collected from the pre-lesion and un-lesioned subjects of that earlier study.

#### **APPARATUS**

The training apparatus and software are described in detail in Meier et al. (2011). Briefly, training occurred in a small, clear Lucite training chamber with a CRT monitor visible through one wall (**Figure 1A**). The CRT monitor (NEC FE992-19, 100 Hz, 1024 × 768 resolution) was linearized with a minimum, mean, and maximum luminance of 4, 42, and 80 cd/m², respectively (Colorvision, spyder2express). From the position of the center request port, the monitor was about 10 cm from the rat's eye and subtended 104° of visual angle (0.1 degrees/pixel). Images were displayed immediately above the two response ports and subtended about 35° of visual angle (shaping steps 3 and 4) or 20° (shaping steps 5 and 6) in their maximum dimension. A central "request" port was located near the bottom of the display wall; two "response" ports were located 90 mm left and right of this. Request and response ports were triggered by licking a water tube, which was detected when the rat's tongue broke an infrared beam. Lick times were the only recorded behavioral output; nose position was not separately monitored. The volume of water drop delivered for reward was determined by the duration of valve opening (50 ms) on a low-pressure water line. Due to pressure variations, the precise volume varied from day to day, but was matched across the ports.

**FIGURE 1 | Training and testing paradigm. (A)** Diagram of cage-attached operant conditioning chamber. **(B)** One of the subjects in this study in the operant chamber performing the statue-shuttle image discrimination (shaping step 4). **(C)** The exemplar image pair E and examples of the intermediate morph pairs for the flashlight–paintbrush image discrimination used in the testing phase (shaping step 6). **(D)** Example learning curve for one subject showing performance over the course of training from naïve to study completion. Training day indicates number of calendar days since initiating training. Chance performance is 0.5 (lower dotted line). In the first two shaping steps (acclimation to apparatus, shaping steps 1–2, days 1–6 in this case) all responses are valid, so performance is undefined (not plotted). For all subsequent shaping steps, each symbol shows the average performance on one task over one training day. Error bars show 95% binomial confidence intervals. Color indicates task: go to statue (shaping step 3, red), discriminate statue from shuttle (shaping step 4, green), discriminate flashlight from paintbrush exemplars (shaping step 5, blue), or discriminate flashlight from paintbrush including exemplars and morph probe trials (shaping step 6, black). Subjects were automatically graduated to the next task when performance exceeded 80% (upper dotted line) for at least 200 trials, and graduated from the final task when each morph level had been tested exactly 150 times.

In this apparatus, response required locomotion, which introduces a time and effort cost for the rat. This may increase the rats' prioritization of accuracy in our tasks overall. Although long and variable response times might have overwhelmed any systematic differences between stimulus and reward conditions, we found that such differences could still be resolved. Nevertheless, other response modalities can be executed and detected more quickly, and could be used to place tighter bounds on the time required for rats to make sensory decisions.

#### **SHAPING**

In preliminary shaping, subjects moved through four shaping steps (**Table 1**) to acquire a two-alternative forced-choice (2AFC) visual discrimination between static grayscale photographic images of two real world objects (a statue and a space shuttle; **Figure 1B**). In this and all subsequent steps, each trial was initiated by the subject by licking a central request port, which caused the two images to appear on the screen, one above each response port. The rewarded (S+) stimulus was randomly assigned to either the left (L) or right (R) side of the screen, and the unrewarded (S−) stimulus to the other side. The two images were large and high contrast, and were matched in luminance, size, contrast, and orientation. The images persisted until the subject licked a response port (L or R), with no time limit. Responses at the port co-localized with the S+ stimulus were rewarded with water delivered at the same location with <10 ms delay, after which the subject could immediately initiate a new trial. Responses at the port co-localized with the S− stimulus were penalized with a timeout of 2–8 s before a new trial could be initiated. After each correct trial, the S+ stimulus was assigned to L or R side with equal probability. After an error trial, however, there was a fixed probability (0.25–0.5) of entering a correction trial instead, in which case the S+ stimulus was deterministically placed at the port opposite the previous trial's response. This method was highly successful in helping rats overcome bias (overall preference for one response port over the other) as well as perseveration (preference to return to the most recently visited or recently rewarded port) over months of automated training and testing. However, it alters the statistics of the task in trials after errors. Therefore only trials after correct trials are analyzed here.
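The side-assignment rule with correction trials can be sketched as below. This is a hypothetical illustration rather than the training software itself: the function name and arguments are invented for the example, and it assumes that an error not followed by a correction trial reverts to the usual equal-probability assignment.

```python
import random


def next_target_side(prev_response, prev_correct, p_correction=0.5, rng=random):
    """Choose the side of the S+ stimulus for the next trial.

    prev_response: 'L' or 'R' (None on the first trial).
    prev_correct:  True/False outcome of the previous trial (None if no trial yet).
    p_correction:  probability of a correction trial after an error
                   (the text reports values between 0.25 and 0.5).
    Returns (side, is_correction_trial).
    """
    after_error = prev_response is not None and prev_correct is False
    if after_error and rng.random() < p_correction:
        # Correction trial: S+ is placed opposite the previous response.
        return ('R' if prev_response == 'L' else 'L'), True
    # Otherwise S+ goes left or right with equal probability.
    return rng.choice(['L', 'R']), False
```

Because the correction rule makes the rewarded side predictable from the previous response, trials that follow an error are statistically different from the rest, which is why they are excluded from the analysis.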

Reward magnitude was not varied in this study. Penalty time out duration was empirically adjusted for each rat to discourage guessing, while avoiding excessive subject frustration as judged by quitting. The penalty duration was always fixed for each rat within a training session. All rats began with a penalty duration set at 2 s. For seven of the subjects, this value was never changed over the course of training and testing. For five subjects, the penalty was increased by steps of 2 s, waiting on average 5000 trials between adjustments, up to a maximum of 8 s.

After mastering the first 2AFC visual discrimination (shaping step 4), subjects learned a second visual discrimination between two novel images (a paintbrush and a flashlight), one of which was assigned to be the S+ stimulus for each rat (shaping step 5). Subjects were trained on this "exemplar" discrimination until performance exceeded 80% accuracy for at least 200 trials (**Figure 1D**) before entering the test phase (shaping step 6). After completing shaping step 5, animals appeared to make stereotypical head and body movements toward one or the other response port as soon as they left the center port (see Video S1 in Supplementary Material), but head and eye movements were not tracked during training or testing.
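One way to read the graduation criterion is as a running window over the most recent trials. The sketch below makes that reading explicit. It is an assumption for illustration only (the text does not specify how the 200 trials were windowed), and the function name is hypothetical.

```python
from collections import deque


def trials_to_graduation(outcomes, threshold=0.80, window=200):
    """Return the 1-based trial number at which accuracy over the last
    `window` trials first exceeds `threshold`, or None if it never does.

    outcomes: iterable of booleans (True = correct trial).
    """
    recent = deque(maxlen=window)
    n_correct = 0
    for trial_number, correct in enumerate(outcomes, start=1):
        if len(recent) == window:
            n_correct -= recent[0]      # oldest outcome is about to drop out
        recent.append(int(correct))
        n_correct += int(correct)
        if len(recent) == window and n_correct / window > threshold:
            return trial_number
    return None
```

For example, a subject that made 100 errors followed by an unbroken run of correct responses would graduate once the window holds at least 161 correct trials.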

#### **TESTING**

In the test phase, subjects continued to be tested on the exemplar discrimination in 80% of trials; later analysis confirmed that performance on the exemplar pair was stationary for the duration of the test phase. In the remaining 20% of trials (interleaved), subjects were presented with a pair of images of parametrically varied similarity, obtained by morphing between the S+ and S− exemplar images (**Figure 1C**). In these probe trials, subjects were rewarded for responding at the port co-localized with the stimulus that was closer to S+ of the two images. A previous study had shown that rats were unlikely to be relying on any one local cue to discriminate the morphs, because results were qualitatively similar if any quadrant of the image was masked in both images of the pair (Clark et al., 2011). The order of probe trial types was pseudorandom with the constraint that each of the 14 non-exemplar difficulty levels had to be presented once before any one difficulty level could be repeated. This procedure ensured that data for probe trials accrued at the same rate for every difficulty level. Each rat continued the test phase until each probe type was tested exactly 150 times. During testing the penalty duration was fixed at 2 s for all rats.
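The probe-trial schedule described above amounts to shuffling the 14 probe levels in blocks, with exemplar trials interleaved at random. The following is a minimal sketch of such a schedule, not the authors' implementation: the level labels (1 for the exemplar pair, 2–15 for the morph probes) and the function names are assumptions made for the example.

```python
import random

EXEMPLAR_LEVEL = 1        # hypothetical label for the exemplar pair
N_PROBE_LEVELS = 14       # non-exemplar difficulty levels
REPEATS_PER_LEVEL = 150   # each probe type tested exactly 150 times
P_PROBE = 0.20            # fraction of trials that are probes


def probe_level_sequence(rng=None):
    """Yield probe levels in shuffled blocks, so that every level is shown
    once before any level is repeated."""
    rng = rng if rng is not None else random.Random()
    for _ in range(REPEATS_PER_LEVEL):
        block = list(range(2, N_PROBE_LEVELS + 2))
        rng.shuffle(block)
        yield from block


def test_phase_trials(rng=None):
    """Yield the difficulty level of each test-phase trial until every
    probe level has been presented REPEATS_PER_LEVEL times."""
    rng = rng if rng is not None else random.Random()
    probes = probe_level_sequence(rng)
    while True:
        if rng.random() < P_PROBE:
            level = next(probes, None)
            if level is None:      # all probe blocks exhausted
                return
            yield level
        else:
            yield EXEMPLAR_LEVEL
```

Drawing probes block by block is what guarantees that trial counts stay balanced across difficulty levels at every point in the test phase.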

## **ANALYSIS**

The data for each trial in the test phase consist of: which specific image pair was shown (selected independently each trial); on which side the rewarded target appeared (selected independently each trial); the time of subject-initiated stimulus request; the latency from stimulus onset to response; and the outcome of the trial (correct/reward or error/timeout). Data analysis was performed using custom programs written in Matlab (Mathworks, Natick, MA, USA).

Calculations are based on all valid trials (after excluding trials that followed errors) of the indicated type in the relevant testing block. In **Figures 2A,B**, reaction time distributions were computed from 4583 (correct) and 1483 (error) trials. The same data were used to compute **Figure 3A**. In **Figure 2C**, each point was computed from an average of 5815 correct trials (range 4583–7031) and 952 error trials (range 516–1483). The same data were used to compute the *N* = 12 curves that underlie the average curve in **Figure 3B**, and to compute the values per rat plotted in **Figure 3C**.
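As a rough illustration of the per-trial record and of the exclusion rule, the following Python sketch stores the listed fields in a small data class and drops trials that come right after an error. The class name `Trial`, its field names, and the reading of "trials after errors" as "the trial immediately following an error" are assumptions made for this example; the original analysis was written in Matlab.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    level: int           # image pair shown (assumed coding: 1 = exemplar)
    target_side: str     # "left" or "right"
    request_time: float  # s, time of subject-initiated stimulus request
    latency: float       # s, stimulus onset to response
    correct: bool        # True = reward, False = error/timeout

def valid_trials(trials):
    """Keep only trials that do not immediately follow an error trial."""
    kept = []
    previous_correct = True  # assumption: the first trial of a block is valid
    for trial in trials:
        if previous_correct:
            kept.append(trial)
        previous_correct = trial.correct
    return kept
```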

In **Figure 4A**, the analysis of level 1 (exemplar discrimination) was based on 6126 valid trials; other levels (morph probe trials) were based on an average of 110 valid trials each (range 101–119). **Figure 4B** represents results from *N* = 12 rats, with the number of trials per condition similar to the example in **Figure 4A**. The cumulative probability in **Figure 4C** is based on 6126 (easy) vs. 442 (hard) trials. Median decision times in **Figure 4D** are based on an average of 6894 trials for the easy condition (range 6126–7627) and an average of 490 trials for the hard condition (range 442–539). Results in **Figures 5** and **6** are based on an average of 4089 valid trials per condition (range 2320–4807).

# **RESULTS**

Twelve Long-Evans rats were trained to discriminate between grayscale photographs of two perceptually similar objects – a flashlight and a paintbrush – in a self-paced 2AFC operant conditioning paradigm (**Figures 1A–D**; Materials and Methods; **Table 1**; Video S1 in Supplementary Material). After performance was asymptotic on this "exemplar" discrimination, subjects began the testing phase. During testing, the exemplar discrimination was tested in 80% of trials; the remaining 20% of trials were probe trials in which the discriminated images were rendered more similar by morphing between the exemplar images (**Figure 1C**). In probe trials, subjects were rewarded for selecting the image that more closely resembled the learned target.

showed in any trial or task (0.403 s). **(B)** Cumulative distributions of reaction time, the integrals of curves shown in panel **(A)**. **(C)** Median decision time in error trials (x-axis) and in correct trials (y-axis) for each subject (N = 12), for exemplar discriminations in the test phase. The example subject used in panels **(A,B)** is highlighted (gray). Symbols are above the identity line (diagonal) if correct trials had longer median reaction time.

**FIGURE 3 | Accuracy improves with reaction time. (A)** Accuracy of exemplar discrimination as a function of reaction time for a single subject (same rat as **Figures 1D** and **2A,B**); error bars show the 95% binomial confidence intervals. **(B)** Accuracy of exemplar discrimination as a function of reaction time averaged over all N = 12 rats; error bars show SEM over the population. **(C)** Accuracy on exemplar discrimination in fast trials vs. in slow trials in the test phase. Each symbol represents data from a single rat, and error bars show 95% binomial confidence intervals. The example subject used in **(A,B)** is highlighted (gray). Symbols are above the identity line (diagonal) if slow trials had higher accuracy.

**FIGURE 4 | Reaction time increases with trial difficulty. (A)** Performance (% correct responses) as a function of stimulus ambiguity (morph level) for one rat (cf. **Figures 1D, 2A,B,** and **3A**). Error bars show 95% binomial confidence intervals. **(B)** Average performance of all 12 subjects as a function of the similarity of the two images discriminated. Error bars show SEM over the population of N = 12 subjects. (Data re-analyzed from Clark et al., 2011). **(C)** Cumulative distribution of reaction time for the subject analyzed in panel **(A)**, for the easiest (level 1, black curve) and hardest (levels 12–15, gray curve) trials. Arrows indicate the median latencies of the two distributions (0.793 vs. 0.873 s). This subject's minimum RT (estimated sensorimotor delay) was 0.403 s. **(D)** Median decision time (DT; reaction time minus sensorimotor delay) for easiest vs. hardest trial types for all N = 12 rats; data for the subject shown in panels **(A,C)** is highlighted in gray. Symbols above the diagonal identity line (N = 10/12) indicate a subject that takes more time to respond on harder discriminations.

#### **REACTION TIME IS LONGER IN CORRECT TRIALS**

For each trial, the "reaction time" is defined here as the time between voluntary initiation of the trial (lick at center, at which time images appear) and the time of the subject's response (lick at left or right response port, at which time the images disappear and reward or penalty occurs). The probability distribution of reaction times for exemplar discriminations is shown for both correct trials and for error trials for one rat (**Figures 2A,B**). For this subject, shorter reaction times (0.5–1.0 s) are more frequent among error trials, while long reaction times (1.0–1.5 s) are more frequent among correct trials. The median reaction time was longer in correct trials than error trials for this subject (arrows in **Figures 2A,B**), and this was the case for all 12 subjects (*P* < 10<sup>−3</sup> by Wilcoxon signed rank test).
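As a check on the order of magnitude of this *P* value, consider an exact two-sided signed rank test with no ties and no zero differences (an assumption; the text does not say which variant of the test was used). Under the null hypothesis each of the $2^{12}$ sign assignments of the 12 paired differences is equally likely. If all 12 differences have the same sign, the smaller rank sum is 0, which only one assignment per tail can produce:

$$
p = 2 \times \frac{1}{2^{12}} = \frac{2}{4096} \approx 4.9 \times 10^{-4} < 10^{-3}.
$$

This is the smallest two-sided *P* value an exact signed rank test can return for 12 subjects.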

The minimum reaction time of a given subject across all trials and all visual 2AFC tasks (dashed line, **Figures 2A,B**) places an upper bound on the time required for the center-port to response-port motor response for that subject. The minimum reaction time was stable over time and tasks for a given subject, probably representing occasional pure motor responses (fast guessing). It ranged from 0.323 to 0.413 s across subjects. During visual tasks, responses were rarely as fast as the rat's estimated motor delay.

The "decision time" in each trial is operationally defined here as the reaction time minus the subject's sensory/motor delay as defined above. The median decision time (DT) for correct trials was longer than in error trials for all 12 subjects (**Figure 2C**; *P* < 10<sup>−3</sup> by Wilcoxon signed rank test). Note that DT differs from reaction time only by the subtraction of the same constant from both values for any given point; using DT instead of reaction time therefore does not affect the sign or magnitude of the difference between compared conditions within a subject.
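A minimal Python sketch of this definition is given below. The function names and the example numbers are hypothetical and only illustrate the arithmetic: the same per-subject delay is subtracted from every reaction time before medians are taken, so the difference between conditions is unchanged.

```python
from statistics import median

def decision_times(reaction_times_s, sensorimotor_delay_s):
    """Subtract one subject's fixed sensorimotor delay from each reaction time."""
    return [rt - sensorimotor_delay_s for rt in reaction_times_s]

def median_dt_by_outcome(trials, sensorimotor_delay_s):
    """trials: (reaction_time_s, correct) pairs for one subject.

    Returns (median DT on correct trials, median DT on error trials).
    """
    correct_rts = [rt for rt, ok in trials if ok]
    error_rts = [rt for rt, ok in trials if not ok]
    return (
        median(decision_times(correct_rts, sensorimotor_delay_s)),
        median(decision_times(error_rts, sensorimotor_delay_s)),
    )

# Hypothetical trials, for illustration only (not data from the study).
example_trials = [(0.9, True), (1.1, True), (1.3, True), (0.6, False), (0.8, False)]
dt_correct, dt_error = median_dt_by_outcome(example_trials, sensorimotor_delay_s=0.4)
```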

#### **DEPENDENCE OF ACCURACY ON REACTION TIME**

The fact that reaction times tended to be longer in correct trials implies that accuracy (% correct) was higher in trials with longer reaction times. The relationship between reaction time and accuracy on exemplar trials is shown for an example subject in **Figure 3A**. For this rat, performance improved with reaction time over the range of 0.5–1.2 s, beyond which there was no improvement, despite the fact that performance remained below 100%.
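The kind of curve described here can be sketched in Python by binning trials on reaction time and computing the fraction correct in each bin, with a 95% binomial interval. The Wilson score interval is used below as one common choice; the text does not state which binomial interval was used, so this is an assumption, and the function names and bin edges are hypothetical.

```python
import math

def wilson_interval(n_correct, n_total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = n_correct / n_total
    denom = 1 + z**2 / n_total
    center = (p + z**2 / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z**2 / (4 * n_total**2)) / denom
    return center - half, center + half

def accuracy_by_rt(trials, bin_edges):
    """trials: (reaction_time_s, correct) pairs; bin_edges: ascending bin limits.

    Returns one (low_edge, high_edge, accuracy, (ci_low, ci_high)) tuple
    per non-empty bin; bins are half-open, [low_edge, high_edge).
    """
    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        outcomes = [ok for rt, ok in trials if lo <= rt < hi]
        if outcomes:
            k, n = sum(outcomes), len(outcomes)
            results.append((lo, hi, k / n, wilson_interval(k, n)))
    return results
```

A call such as `accuracy_by_rt(trials, [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2])` would cover the range over which performance improved for the example rat.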

analysis shown. Error bars show standard errors of the means (SEM).

The population average curve is shown in **Figure 3B**. Every subject showed a monotonic, saturating improvement in accuracy with reaction time, but the reaction time distributions and accuracy varied from subject to subject. For each rat, trials with reaction times in that rat's lowest quartile were defined as "fast," and trials with reaction time in the rat's highest quartile were defined as "slow." Every rat performed better in slow trials than fast ones (**Figure 3C**); this improvement with reaction time was significant for 10/12 rats individually (the 95% binomial confidence intervals do not overlap), and the effect was significant at the population level (*P* < 10<sup>−3</sup>, Wilcoxon signed rank test).
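The fast/slow split can be sketched as follows. The function name is hypothetical, and the quartile boundaries are computed with the default method of Python's `statistics.quantiles`, which is an assumption; the original analysis may have handled quartile boundaries differently.

```python
from statistics import quantiles

def fast_slow_accuracy(trials):
    """trials: (reaction_time_s, correct) pairs for one rat.

    Returns (accuracy in the fastest quartile, accuracy in the slowest quartile).
    """
    reaction_times = [rt for rt, _ in trials]
    q1, _, q3 = quantiles(reaction_times, n=4)  # quartile cut points
    fast = [ok for rt, ok in trials if rt <= q1]
    slow = [ok for rt, ok in trials if rt >= q3]
    return sum(fast) / len(fast), sum(slow) / len(slow)
```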

#### **RATS TAKE MORE TIME TO RESPOND WHEN IMAGES ARE MORE SIMILAR**

To test whether rats take longer to make a decision when the sensory stimuli are more ambiguous, the similarity of the two images was parametrically varied in probe trials with morphed images (see Materials and Methods; **Figure 1C**; Clark et al., 2011). Exemplar and morph trials were randomly interleaved in the experiment, but exemplar trials were far more numerous (see Materials and Methods).

Accuracy of discrimination decreased as the images became more similar, as shown for one rat in **Figure 4A** and summarized for all rats in **Figure 4B**. For the subject whose performance is shown in **Figure 4A**, the distribution of response latencies was shifted to longer latencies in the trials with more ambiguous stimuli (**Figure 4C**), indicating that this subject took more time on more difficult trials. For most subjects (*N* = 10/12 rats), the median reaction time on the easiest trials (exemplar, level 1) was lower than the median reaction time on the most difficult or ambiguous trials (morph levels 12–15; **Figure 4D**), and this trend was significant at the population level (*P* < 10<sup>−2</sup>, Wilcoxon signed rank test).

#### **RATS TAKE MORE TIME TO RESPOND WHEN THE ERROR PENALTY IS INCREASED**

For two rats, we also compared reaction times and accuracy in paired testing blocks differing only in penalty duration (2 vs. 6 s). For both rats, increasing the duration of the error penalty led to a significant increase in DT (**Figure 5A**). This was accompanied by a substantial improvement in accuracy (**Figure 5B**), and therefore a lower probability of incurring the penalty. One rat (black lines) was tested with the exemplar discrimination pair described above. The other (gray lines) was tested using a more difficult discrimination pair (box/car image pair), after having trained to asymptotic performance of 65% on that discrimination. Incidentally, this second subject did not have longer reaction times on harder trials when they were interleaved (symbol below diagonal in **Figure 4D**; median DT 0.375 s for easy, 0.360 s for hard, penalty duration 2 s). Nevertheless, in an extended testing block with only difficult trials, reaction time was longer (median DT 0.630 s at 2 s penalty duration) than in the easier discrimination block. Thus the subject did modulate reaction time with difficulty on the block timescale, even with penalty held constant.

Increasing the penalty duration led to a reduction in fast responses (0.5–1.0 s latency), and an increase in slow responses (1.0–1.5 s latency), for both rats (**Figures 6A–D**). Regardless of penalty condition, responses were rarely as fast as the rat's estimated motor delay (vertical lines in **Figures 6A–D**). For the subject that was tested with a more difficult discrimination pair (gray in **Figure 5**; **Figures 6B,D**), performance was only 65% with the short penalty. Thus penalty was incurred in 35% of trials, substantially limiting reward rate. This rat's reaction times shifted more dramatically in response to penalty increase.
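To see why errors limit reward rate, a back-of-the-envelope sketch can help. It assumes that every trial takes a fixed time and that each error adds the penalty duration; the function name and the 2 s trial time are hypothetical, and the model ignores inter-trial behavior, so it is only an illustration of the arithmetic.

```python
def reward_rate(accuracy, trial_time_s, penalty_s):
    """Expected rewards per second under a deliberately simplified model.

    Assumes each trial lasts trial_time_s and each error adds penalty_s.
    """
    error_rate = 1.0 - accuracy  # e.g. 65% correct leaves 35% errors
    expected_duration_s = trial_time_s + error_rate * penalty_s
    return accuracy / expected_duration_s

# Hypothetical 2 s trial time; 65% accuracy as for the harder image pair.
rate_short_penalty = reward_rate(0.65, trial_time_s=2.0, penalty_s=2.0)
rate_long_penalty = reward_rate(0.65, trial_time_s=2.0, penalty_s=6.0)
```

Under these assumptions the longer penalty lowers the reward rate at fixed accuracy, so any gain in accuracy from responding more slowly is worth more when the penalty is long.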

# **DISCUSSION**

These data demonstrate an interaction between reaction time and accuracy in the visual discrimination of images of natural objects by rats. Rats performed better when they responded later (**Figures 2** and **3**), despite the absence of any temporal information in the stimulus itself. Moreover, most rats responded more slowly when confronted with more difficult discriminations (**Figure 4**), or when the cost of an error was higher (**Figures 5** and **6**).

#### **ACCURACY INCREASES WITH REACTION TIME**

the exemplar discrimination (black lines in **Figure 5**), with short penalty (solid curve) or long penalty (dashed curve). Median reaction time increased from 0.724 to 0.787 s. The rat's lifetime minimum reaction time is indicated by the thin vertical line. **(B)** Cumulative probability distribution

from 0.993 to 1.102 s. The rat's lifetime minimum reaction time is indicated by the thin vertical line. **(C)** Raw reaction time distributions corresponding to data of panel **(A)**. **(D)** Raw reaction time distributions corresponding to data of panel **(B)**.

When rats discriminate static visual images without a deadline, their discrimination accuracy for a given discrimination difficulty improves with reaction time (**Figures 2** and **3**). The reaction times, accuracy, and dependence of accuracy on time, were all comparable to those reported for discrimination of random dot motion stimuli under similar conditions (Reinagel, 2013). In the random dot motion task, stimuli are rendered difficult both by reducing signal (fewer dots contributing to coherent motion) and adding noise (more dots moving randomly). In such stimuli, new sensory evidence is presented continuously over time, and temporal integration should improve signal-to-noise ratio. In our task, stimuli are rendered difficult by making them more similar (**Figure 1C**). The generalization to static images shows that the improvement in accuracy with time is not specific to temporally evolving visual stimuli, nor restricted to tasks with noise corruption in the physical stimulus. In our task, errors for very difficult morphs may be due to failure to perceive differences, but could also arise from a noisy category boundary.
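The benefit of temporal integration for stochastic stimuli can be made explicit under a textbook idealization, which is not a model fitted in this study: suppose each of $n$ samples is $x_i = s + \varepsilon_i$, with signal $s$ and independent noise of variance $\sigma^2$. Averaging the samples gives

$$
\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\operatorname{Var}(\bar{x}_n) = \frac{\sigma^2}{n}, \qquad
\mathrm{SNR}_n = \frac{s}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{s}{\sigma}.
$$

The signal-to-noise ratio therefore grows as $\sqrt{n}$ when new independent evidence arrives over time. With a static image the physical stimulus supplies no new samples, so any comparable gain would have to come from variability inside the animal.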

We hypothesize that accuracy is determined by the amount of sensory evidence accumulated at the time the rat decides, regardless of what determines the time of the decision. In the case of motion discrimination this hypothesis was tested by uncoupling reaction time from viewing time (Reinagel, 2013), but the equivalent experiment has not been done for the image task.

In a related image discrimination task performed by rats (Zoccolan et al., 2009), accuracy was higher in the trials with short reaction times (Tafazoli et al., 2012). This seemingly opposite result was explained by priming effects in their experiment, however. In trials with congruent primes, rats were both faster and more accurate. The results reported here are not in conflict with that finding.

Our findings are also consistent with results from mice in a 2AFC auditory discrimination task (Sanders and Kepecs, 2012). In that task, like the random dot motion task, the stimulus unfolded over time and the signal was stochastic, such that optimal performance requires evidence accumulation. Accuracy increased with reaction time for easy discriminations, and reaction time increased with discrimination difficulty, as we found for visual tasks in rats. In that study, monitoring behavior during the decision interval revealed that mice make choice reversals that improve accuracy. Choice reversal could explain a correlation between accuracy and long reaction times in their task and in ours. We have no data, however, on the location or locomotion of the rats during the decision interval.

When primates perform visual reaction time tasks with interleaved trials of varying sensory difficulty, accuracy is widely reported to decline as a function of reaction time – the opposite of our result (Roitman and Shadlen, 2002; Mazurek et al., 2003; Palmer et al., 2005; Churchland et al., 2008). In those data this result is attributed to a collapsing decision bound, which can be explained in terms of accumulation of evidence during the decision interval about the quality of the sensory evidence in that trial (Hanks et al., 2011; Huang et al., 2012). We still do not know if task differences, species differences, or both underlie these different experimental findings. The most obvious task difference is that we imposed no minimum response delay, no additional reward delays, and no minimum inter-trial interval in our task. Such enforced delays are typically used in the primate studies to discourage fast guessing, and have the consequence that DT is a small fraction of total trial time. Our task makes the cost of DT significant to the rate of reward harvesting, a regime that is not well explored in the speed–accuracy literature. Yet from the point of view of the animal, fast guessing is a valid reward harvesting strategy that may be optimal under some conditions. It will be interesting to develop quantitative models that include and account for this basic choice behavior.

#### **DETERMINANTS OF REACTION TIME**

Using morphing to vary image discrimination difficulty, we found that rats responded later on more difficult trials (**Figure 4**). A similar result was found for rats in a random dot motion task (Reinagel, 2013). In a transformation-invariant visual object recognition task (Zoccolan et al., 2009), it has also been noted that reaction times are longer on more extreme transformations (Tafazoli et al., 2012; Alemi-Neissi et al., 2013). Accuracy decreased with difficulty while reaction time increased, consistent with our findings. In that task as in ours, discrimination difficulty was varied but the stimulus did not unfold over time or contain stochastic noise.

Although reaction time increased with difficulty in our task, the increase was modest – only about 100 ms on the most difficult trials. The difference in reaction time may reflect the lower confidence of the animal in hard trials (Kepecs et al., 2008; Kiani and Shadlen, 2009) rather than an accumulation of evidence strategy. One explanation for the rats' failure to wait longer could be that rats lack the capacity to control impulsivity to optimize reward rate.

But here we report that rats can modulate their behavioral strategy in response to the cost of errors. When the duration of penalty was increased, rats waited longer before responding, and their accuracy improved (**Figures 5** and **6**). This is consistent with the idea that longer viewing time leads to more accurate discriminations. But it is equally possible that a third cause (such as increased attention) caused an increase in both reaction time and accuracy.

#### **SOURCE OF TIME-DEPENDENCE**

The results presented here provide evidence for a time-dependent improvement in image discrimination, despite the absence of dynamics or time-varying noise in the stimulus. Because the physical stimulus was unchanging, this implies some temporal process arising in the animal. Possibilities are numerous and include: variation in the animal's state (e.g., attention, motivation, or arousal) from trial to trial; active sampling of the visual stimulus (e.g., saccades, involuntary eye movements, head or body movements); sensory neural processing (e.g., temporal integration of noisy firing rates, spike time pattern codes); or cognitive processing involved in decision *per se*. The data presented here do not distinguish among these alternatives.

In particular, we do not know what the animal is doing, or when the decision occurs, within the interval between stimulus onset and detected response. If we had detected removal of the rat's nose from the center port, this would have provided additional information, but we still would not know whether or when the rat made a decision until a response was made. A task in which motor output is monitored continuously could provide more insight into the time of the decision, including decision reversals within this interval (Sanders and Kepecs, 2012).

#### **GENERALITY OF FINDINGS**

For the image discrimination task described here, we have shown that rats' accuracy increases with reaction time, and reaction time is longer on harder stimuli, consistent with results from rats and mice tested with other visual and auditory stimuli, as summarized above. Nevertheless, these results may not be true for all sensory discrimination tasks. Clearly changes to the reward, penalty, or delay schedule of a task are expected to manipulate the relative priority of accuracy vs. speed. The relationship between reaction time and accuracy may also depend on the difficulty of the sensory discrimination, the sensory modality, or the qualitative nature of the sensory decision being made. In olfaction, for example, rats' discrimination accuracy improves with reaction time in some tasks but not others (Uchida and Mainen, 2003; Abraham et al., 2004; Rinberg et al., 2006; Uchida et al., 2006). A complete theory of decision making will ideally encompass and account for such differences between tasks.

# **ACKNOWLEDGMENTS**

This work was supported by the Kavli Institute of Mind and Brain at UCSD, and the James S. McDonnell Foundation. I thank Sarah Petruno and Danielle Dickson for expert technical assistance. I thank Robert Clark for reading a draft of this manuscript, and for allowing me to use unpublished reaction time data collected during our behavior experiments for an unrelated study.

# **AUTHOR CONTRIBUTIONS**

The behavioral training protocol and visual task are from a previously published study (Clark et al., 2011), in which the performance of these same rats was already described without consideration of reaction time. Pamela Reinagel conceived of the present study, collected these additional reaction time data, analyzed the data, interpreted the results, and wrote this manuscript.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://www.frontiersin.org/Journal/10.3389/fncir.2013.00200/abstract

# **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 August 2013; accepted: 02 December 2013; published online: 18 December 2013.*

*Citation: Reinagel P (2013) Speed and accuracy of visual image discrimination by rats. Front. Neural Circuits 7:200. doi: 10.3389/fncir.2013.00200*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2013 Reinagel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Object similarity affects the perceptual strategy underlying invariant visual object recognition in rats

#### Federica B. Rosselli<sup>1†</sup>, Alireza Alemi<sup>1,2,3†</sup>, Alessio Ansuini<sup>1</sup> and Davide Zoccolan<sup>1</sup>\*

*<sup>1</sup> Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy, <sup>2</sup> Department of Applied Science and Technology, Center for Computational Sciences, Politecnico di Torino, Torino, Italy, <sup>3</sup> Human Genetics Foundation, Torino, Italy*

In recent years, a number of studies have explored the possible use of rats as models of high-level visual functions. One central question at the root of such an investigation is to understand whether rat object vision relies on the processing of visual shape features or, rather, on lower-order image properties (e.g., overall brightness). In a recent study, we have shown that rats are capable of extracting multiple features of an object that are diagnostic of its identity, at least when those features are, structure-wise, distinct enough to be parsed by the rat visual system. In the present study, we have assessed the impact of object structure on rat perceptual strategy. We trained rats to discriminate between two structurally similar objects, and compared their recognition strategies with those reported in our previous study. We found that, under conditions of lower stimulus discriminability, rat visual discrimination strategy becomes more view-dependent and subject-dependent. Rats were still able to recognize the target objects, in a way that was largely tolerant (i.e., invariant) to object transformation; however, the larger structural and pixel-wise similarity affected the way objects were processed. Compared to the findings of our previous study, the patterns of diagnostic features were: (i) smaller and more scattered; (ii) only partially preserved across object views; and (iii) only partially reproducible across rats. On the other hand, rats were still found to adopt a multi-featural processing strategy and to make use of part of the optimal discriminatory information afforded by the two objects. Our findings suggest that, as in humans, rat invariant recognition can flexibly rely on either view-invariant representations of distinctive object features or view-specific object representations, acquired through learning.

Keywords: object recognition, rodent vision, invariance, perceptual strategy, view-invariant, view-dependent

#### Edited by:

*Andrea Benucci, RIKEN Brain Science Institute, Japan*

#### Reviewed by:

*Edward A. Wasserman, University of Iowa, USA*

*Justin N. Wood, University of Southern California, USA*

#### \*Correspondence:

*Davide Zoccolan, Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy zoccolan@sissa.it*

> † *These authors have contributed equally to this work.*

Received: *04 November 2014* Accepted: *23 February 2015* Published: *12 March 2015*

#### Citation:

*Rosselli FB, Alemi A, Ansuini A and Zoccolan D (2015) Object similarity affects the perceptual strategy underlying invariant visual object recognition in rats. Front. Neural Circuits 9:10. doi: 10.3389/fncir.2015.00010*

# Introduction

Over the past few years, rat vision has become the subject of intensive investigation (Zoccolan et al., 2009, 2010; Meier et al., 2011; Tafazoli et al., 2012; Vermaercke and Op de Beeck, 2012; Alemi-Neissi et al., 2013; Brooks et al., 2013; Meier and Reinagel, 2013; Reinagel, 2013a,b; Wallace et al., 2013; Vermaercke et al., 2014; Vinken et al., 2014), because of the experimental advantages that rodent species might offer as models to study visual functions (see Zoccolan, 2015 for a review). Recent studies have found that rats are capable of invariant (a.k.a. transformation-tolerant) recognition, i.e., they can recognize visual objects in spite of substantial variation in their appearance (Zoccolan et al., 2009). This ability has been found to rely on the spontaneously perceived similarity between novel and previously learned views of an object, as well as on the gradual, explicit learning of each newly encountered view (Tafazoli et al., 2012). This suggests that rats achieve invariant object recognition by combining the automatic tolerance afforded by partially invariant representations of distinctive object features with the more complete invariance acquired by learning and storing multiple, view-specific object representations.

This account is in agreement with the large body of experimental and theoretical work on human visual object recognition. Following a decade of debate about whether human object vision is better accounted for by view-invariant (structural description) or view-based theories (Biederman and Gerhardstein, 1995; Tarr and Bülthoff, 1995; Hayward and Tarr, 1997; Hayward, 2003), most investigators now agree that view-invariant feature detectors and view-specific object representations can both be employed by the visual system (under different circumstances) to achieve invariant recognition (Tarr and Bülthoff, 1998; Lawson, 1999; Hayward, 2003). In fact, it has been shown that humans display view-invariant recognition of familiar objects, but show view-dependent performance in recognition tasks involving novel objects or unfamiliar object views (Edelman and Bülthoff, 1992; Spetch et al., 2001). Nonetheless, even novel objects or object views can be recognized in a view-invariant manner, if they contain distinctive features that remain "diagnostic" of object identity despite (e.g.) rotation in the image plane (Tarr et al., 1997; Lawson, 1999; Spetch et al., 2001; Wilson and Farah, 2003). More generally, it has been proposed that recognition ranges from view-invariant to view-dependent, depending on how demanding the object discrimination task is (Newell, 1998; Hayward and Williams, 2000; Vuong and Tarr, 2006). Several studies suggest that the same argument applies to the recognition strategies of other species, e.g., monkeys (Logothetis et al., 1994; Logothetis and Pauls, 1995; Wang et al., 2005; Nielsen et al., 2008; Yamashita et al., 2010) and pigeons (Wasserman et al., 1996; Spetch et al., 2001; Spetch and Friedman, 2003; Gibson et al., 2007), although a number of differences from human recognition (in addition to commonalities) have also been found (e.g., see Soto and Wasserman, 2014 for a review).

While performance-based studies (such as many of those mentioned above) can assess to what extent object recognition, in a given task, is transformation-tolerant, the question of what object features are selected to recognize an object, and whether the same features are relied upon, across different object views, as preferential markers of object identity can be more directly addressed by the use of classification image methods (Nielsen et al., 2006, 2008; Vermaercke and Op de Beeck, 2012; Alemi-Neissi et al., 2013). In a recent study, we used one such approach (the Bubbles method; Gosselin and Schyns, 2001) to show that the diagnostic visual features underlying rat discrimination of two multi-lobed visual objects (see **Figure 1A**, left panels) remained remarkably stable across a variety of transformations: translation, scaling, in-plane and in-depth rotation. This result, while consistent with a view-invariant representation of diagnostic object features, does not rule out the possibility that, under more challenging conditions (e.g., discrimination of very similar objects), rat recognition may become more view-dependent. The goal of the present study was to test this hypothesis and provide a quantitative comparison between the recognition strategies used by rats under two different levels of object discriminability.

We trained a group of rats to discriminate a new pair of multi-lobed objects (see **Figure 1B**, left panels), presented across a range of sizes, positions, in-depth rotations and in-plane rotations. Compared to the object pair used in our previous study (shown in **Figure 1A**, left panels), these new objects were more similar to one another at the pixel level and were made of less distinctive structural parts. The recognition strategies underlying discrimination of this new object pair were uncovered using the Bubbles method, and the results were compared with those reported in our previous study. New analyses of the previous set of data were also performed, so as to thoroughly quantify the influence of stimulus structure on object recognition strategy.

Our results show that, in contrast to what we observed under conditions of high stimulus discriminability, where rats relied on a largely view-invariant, multi-featural recognition strategy, discrimination of structurally similar objects led to a more view-dependent and subject-dependent, albeit still multi-featural, object processing strategy.

# Materials and Methods

With the exception of the visual stimuli and some of the data analyses, the materials and methods used in this study are the same as those used in Alemi-Neissi et al. (2013). As such, we provide here a short description only and we invite the reader to refer to our previous study for a complete account.

# Subjects

Six adult male Long Evans rats (Charles River Laboratories) were tested in a visual object discrimination task. Animals were 8 weeks old at their arrival and weighed approximately 250 g. They typically grew to over 600 g over the course of the study. Rats had free access to food but were water-deprived on the days they underwent behavioral training: they were given 1 h of access to water per day after each experimental session, and received 4–8 ml of pear juice as reward during training. Note that, out of these six rats, only three reached the criterion performance to be admitted to the main experimental phases (i.e., 70% correct discrimination of the default views of the target objects shown in **Figure 1B**). Therefore, only three out of six rats were included in the analyses shown throughout the article.

All animal procedures were conducted in accordance with the National Institutes of Health, International, and Institutional Standards for the Care and Use of Animals in Research and after consulting with a veterinarian.

# Experimental Rig

Each rat was trained in an operant box, equipped with: (1) a 21.5′′ LCD monitor for presentation of the visual stimuli; (2) an array of three feeding needles, connected to three touch sensors for initiation of behavioral trials and collection of responses; and (3) two computer-controlled syringe pumps for automatic liquid reward delivery on the left-side and right-side feeding needles (see Alemi-Neissi et al., 2013 for further details). Rats learned to insert their head through a 4-cm diameter opening in the front wall of each box, so as to face the stimulus display and interact with the sensors' array. Constraining the head within such a viewing hole allowed its position to be largely reproducible across behavioral trials and very stable during stimulus presentation (see Alemi-Neissi et al., 2013 for a quantification), thus guaranteeing a tight control over the retinal size of the stimuli.

FIGURE 1 | Visual objects, behavioral task and the Bubbles method. (A) Default views of the two objects that rats were trained to discriminate in Alemi-Neissi et al. (2013). In the present study, these objects are referred to as Object 1 and 2 and, collectively, as Stimulus Set 1. The panel on the right shows to what extent these views of the objects overlapped, when superimposed. (B) Default views of the two objects that rats were trained to discriminate during Phase I of the present study. These objects are referred to as Object 3 and 4 and, collectively, as Stimulus Set 2. The panel on the right shows to what extent these views of the objects overlapped, when superimposed. (C) Schematic of the object discrimination task. Rats were trained in an operant box that was equipped with an LCD monitor for stimulus presentation and an array of three sensors. The animals learned to trigger the presentation of a visual object by licking the central sensor, and to associate the identity of each object to a specific reward port/sensor (right port for Object 3 and left port for Object 4). (D) A sample of the transformed object views used during Phase II of the study. Transformations included: (1) size changes; (2) azimuth in-depth rotations; (3) horizontal position shifts; and (4) in-plane rotations. Azimuth rotated and horizontally shifted objects were also scaled down to a size of 30◦ of visual angle; in-plane rotated objects were scaled down to a size of 32.5◦ of visual angle. Note that each transformation axis was sampled more densely than shown in the figure: sizes were sampled in 2.5◦ steps; azimuth rotations in 5◦ steps; position shifts in 4.5◦ steps; and in-plane rotations in 9◦ steps. The red frames highlight the subsets of object views that were tested in bubbles trials. (E) Illustration of the Bubbles method, which consists in generating an opaque mask (fully black area) punctured by a number of randomly located windows (i.e., the bubbles; shown as semi-transparent, circular openings) and then overlaying the mask on the image of a visual object, so that only parts of the object are visible through the mask. (F) Examples of the different degrees of occlusion that can be achieved by varying the number of bubbles in the masks. (G) An example of a possible trial sequence at the end of experimental Phase I. The object default views were presented both unmasked and masked in randomly interleaved trials (named, respectively, regular and bubbles trials). (H) An example of a possible trial sequence during experimental Phase II. The animals were presented with interleaved regular and bubbles trials. The former included all possible unmasked object views to which the rats had been exposed up to that point (i.e., size changes and azimuth rotations in this example), whereas the latter included masked views of the most recently trained transformation (i.e., −40◦ azimuth rotated objects).

# Visual Stimuli

The rats were trained to discriminate a pair of four-lobed visual objects that were transformed along a variety of dimensions (see below). Since the results of this study are compared with those of our previous study (Alemi-Neissi et al., 2013), where a different pair of objects was used, we have adopted the following naming convention to label individual objects, object pairs, rats and groups of rats. We refer to the group of rats tested in our previous work as "group 1" (including rats numbered from 1 to 6), and to the pair of objects used in that study as "Stimulus Set 1," containing Objects 1 and 2 (shown in **Figure 1A**, left panels). Conversely, we refer to the group of rats tested in the present study as "group 2" (including rats numbered from 7 to 9, given that only three animals succeeded in the discrimination task; see Section Subjects), and to the pair of objects used in this study as "Stimulus Set 2," containing Objects 3 and 4 (shown in **Figure 1B**, left panels).

For both stimulus sets, the objects were renderings of three-dimensional models that were built using the ray tracer POV-Ray (http://www.povray.org/). Objects were rendered in a white, bright opaque hue against a black background. Each object's default size was 35◦ of visual angle (longest image dimension), and their default position was the center of the monitor.

Compared to Stimulus Set 1, the objects in Stimulus Set 2 were designed to be substantially more similar at the structural level. As such, the constituent parts of Objects 3 and 4 (i.e., three small ellipsoidal lobes attached to a large elliptical lobe; see **Figure 1B**, left panels) had a similar size, position, aspect ratio and overall layout. By contrast, the objects in Stimulus Set 1 were structurally quite dissimilar (see **Figure 1A**, left panels). Object 1 was made of a large, elliptical top lobe, attached to two smaller, overlapping bottom lobes, while Object 2 was composed of three elongated lobes that were approximately equally sized and equally spaced (radially). As a consequence, the overlap between Object 3 and 4 was larger than the overlap between Object 1 and 2 (see **Figures 1A,B**, rightmost panel), resulting in an overall larger pixel-wise similarity between the objects of Stimulus Set 2, as compared to Stimulus Set 1, across all tested views (see Results and **Table 1** for details).

#### Experimental Design

### Phase I: Diagnostic Features Underlying Recognition of the Default Object Views

Rats were initially trained to discriminate the two default views of Objects 3 and 4 (**Figure 1B**, left panels). The animals learned: (1) to lick the central sensor, so as to trigger the presentation of one of the objects on the stimulus display; and (2) to lick either the right or left sensor, so as to report the identity of the currently presented object (see **Figure 1C**). Successful discrimination led to delivery of reward through the corresponding reward port/sensor, while failure to discriminate resulted in a time out period. The stimulus presentation time ranged between 2.5 and 4 s (see Alemi-Neissi et al., 2013 for further details).

TABLE 1 | Normalized Euclidean distance between matching views of the objects within each Stimulus Set.


*The normalized, pixel-wise Euclidean distance between matching views of the two objects in each stimulus set was computed for all the conditions tested with the bubbles masks.*

Once a rat achieved ≥70% correct discrimination of the default object views (which typically required 3–12 weeks of training), a classification image method, known as the Bubbles method (Gosselin and Schyns, 2001), was applied to identify what visual features were critical for the accomplishment of the task. This method consists in superimposing on a visual stimulus an opaque mask, containing a number of circular, semi-transparent openings, or bubbles (**Figure 1E**). An observer will be able to identify the stimulus only if the visual features that are diagnostic of its identity remain visible through the bubbles. This allows one to infer what image regions produced a positive (or, conversely, a negative) behavioral outcome.

In our implementation of the Bubbles method (see Alemi-Neissi et al., 2013, for details), the bubbles' size was fixed to 2◦ of visual angle, while their number was randomly chosen, in each trial, between 10 and 90, in steps of 20 (see examples in **Figure 1F**). This typically reduced the performance from ∼65–75% correct obtained in unmasked trials to ∼55–60% (see **Figure 2A**). Trials in which the default object views were shown unmasked (referred to as "regular trials") were randomly interleaved with trials in which they were masked (referred to as "bubbles trials," see **Figure 1G**). The fraction of bubbles trials in a daily session varied between 0.4 and 0.75. To obtain enough statistical power to extract the diagnostic features underlying rat recognition, at least 3000 bubbles trials per object were collected.
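The mask construction described above can be illustrated with a short sketch. This is not the authors' MWorks plugin: the Gaussian transparency profile, the pixel width `sigma_px`, the image size and all function names are assumptions made for illustration only. Only the choice of the number of bubbles (10 to 90 in steps of 20) follows the text.

```python
import numpy as np

def make_bubbles_mask(shape, n_bubbles, sigma_px, rng):
    """Return a transparency mask in [0, 1] (0 = opaque, 1 = fully transparent).

    Each bubble is modeled as a Gaussian opening (an assumption; the text only
    says the openings are circular and semi-transparent).
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=float)
    centers_r = rng.integers(0, shape[0], size=n_bubbles)
    centers_c = rng.integers(0, shape[1], size=n_bubbles)
    for r0, c0 in zip(centers_r, centers_c):
        dist_sq = (rows - r0) ** 2 + (cols - c0) ** 2
        bubble = np.exp(-dist_sq / (2.0 * sigma_px ** 2))
        # Overlapping bubbles never exceed full transparency.
        mask = np.maximum(mask, bubble)
    return mask

def apply_mask(image, mask):
    """Show the image through the mask; opaque regions become black (0)."""
    return image * mask

rng = np.random.default_rng(0)
# Number of bubbles drawn per trial from {10, 30, 50, 70, 90}, as in the text.
n_bubbles = int(rng.choice(np.arange(10, 91, 20)))
image = np.ones((64, 64))  # hypothetical all-white stimulus
mask = make_bubbles_mask(image.shape, n_bubbles, sigma_px=3.0, rng=rng)
masked = apply_mask(image, mask)
```

Multiplying the image by the mask works here because the objects are bright shapes on a black background, so an opaque (black) mask region and a zeroed pixel are equivalent.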

### Phase II: Diagnostic Features Underlying Recognition of the Transformed Object Views

The animals were subsequently trained to tolerate variations in the appearance of the target objects along four different transformation axes (see **Figure 1D**), in the following order: (1) size variations, ranging from 35 to 15◦ visual angle; (2) azimuth rotations (i.e., in-depth rotations about the objects' vertical axis), ranging from −60 to 60◦; (3) horizontal position changes, ranging from −18 to +18◦ visual angle; and (4) in-plane rotations, ranging from −45 to +45◦. Each transformation was trained using an adaptive staircase procedure that is fully described in Alemi-Neissi et al. (2013) and Zoccolan et al. (2009). Once an animal had learnt to tolerate a wide range of variation along a given transformation axis (the extremes of each axis are shown in **Figure 1D**), one or more views along that axis were chosen, for each object, so that: (1) they were different enough from the default views of the two objects; and (2) most rats recognized them with a 60–70% correct performance (see **Figure 3**). These views (referred to as "bubbles views" in the following) were those selected for application of the Bubbles method and are highlighted by red frames in **Figure 1D**. Rats were then presented with randomly interleaved regular trials (in which unmasked objects could be shown across all the transformation axes trained up to that point) and bubbles trials (in which bubbles masks were superimposed on the bubbles views chosen from the most recently trained transformation; see an example of trial sequence in **Figure 1H**). As for the default object views, a minimum of 3000 bubbles trials was collected for each of the bubbles views. Note that, in general, for each rat, only some of the seven selected bubbles views could actually be tested, due to across-rat variation in life span and fluency in the invariant recognition task (see Alemi-Neissi et al., 2013, for details).

FIGURE 2 | Critical features underlying recognition of the default object views. (A) Rat group average performance at discriminating the default object views was significantly lower in bubbles trials (light gray bar) than in regular trials (dark gray bar; *p* < 0.01; one-tailed, paired *t*-test), although both performances were significantly larger than expected by chance (∗*p* < 0.05, ∗∗*p* < 0.01; one-tailed, unpaired *t*-test). Error bars: SEM. (B) For each rat, the saliency maps resulting from processing the bubbles trials collected for the default object views are shown as grayscale masks superimposed on the images of the objects. The brightness of each pixel indicates how likely an object view was to be correctly identified when that pixel was visible through the masks. Significantly salient and anti-salient object regions (i.e., regions that were, respectively, significantly positively or significantly negatively correlated with the correct identification of an object; *p* < 0.05; permutation test) are shown, respectively, in red and cyan.

All experimental protocols were implemented using the freeware, open-source software package MWorks (http://mworksproject.org/). An ad-hoc plugin was developed in C++ to allow MWorks to build bubbles masks and present them superimposed on the images of the visual objects.

# Data Analysis

# Computation of the Saliency Maps

A detailed description of the method for the extraction of the critical visual features underlying rat recognition of a given object view and the assessment of their statistical significance can be found in Alemi-Neissi et al. (2013). Briefly, this method consisted of two steps.

First, saliency maps were obtained that measured the correlation between the transparency values of each pixel in the bubbles masks and the behavioral responses. Throughout the article, these saliency maps are shown as grayscale masks superimposed on the images of the corresponding object views, with bright/dark pixels indicating regions that are salient/anti-salient, i.e., likely/unlikely to lead to correct identification of an object view, when visible through the bubbles masks (e.g., see **Figures 2**, **4**). For a clearer visualization, the saliency values in each map were normalized by subtracting their minimum value, and then dividing by their maximum value.
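As a rough illustration of this first step, the sketch below builds a saliency map from a stack of bubbles masks and the trial outcomes, then applies the display normalization described above. The estimator used here (summed transparency on correct trials divided by summed transparency on all trials, in the spirit of Gosselin and Schyns, 2001) is an assumption; the exact computation used in the study is given in Alemi-Neissi et al. (2013). Function names and the toy data are hypothetical.

```python
import numpy as np

def saliency_map(masks, correct):
    """Per-pixel ratio of transparency summed over correct trials to
    transparency summed over all trials (assumed estimator)."""
    masks = np.asarray(masks, dtype=float)     # shape: (n_trials, H, W)
    correct = np.asarray(correct, dtype=bool)  # shape: (n_trials,)
    total = masks.sum(axis=0)
    hits = masks[correct].sum(axis=0)
    return np.divide(hits, total, out=np.zeros_like(total), where=total > 0)

def normalize_for_display(smap):
    """Subtract the minimum, then divide by the resulting maximum."""
    shifted = smap - smap.min()
    peak = shifted.max()
    return shifted / peak if peak > 0 else shifted

# Toy data: a simulated observer is correct only when pixel (2, 2) is
# mostly visible, so that pixel should come out as the most salient one.
rng = np.random.default_rng(1)
masks = rng.random((400, 5, 5))
correct = masks[:, 2, 2] > 0.5
smap = saliency_map(masks, correct)
display = normalize_for_display(smap)
```

After normalization the map spans the full grayscale range, which is what makes bright and dark regions comparable across figures.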

As a second step, we computed which pixels, in a saliency map, had a statistically significant correlation with the behavior. To this aim, we performed a permutation test, in which the behavioral outcomes of bubbles trials were randomly shuffled (see Alemi-Neissi et al., 2013, for details). This yielded a null distribution of saliency values that was used to compute which values, in each saliency map, were significantly higher (or lower) than expected by chance (p < 0.05), and, therefore, which pixels, in the image, could be considered as significantly salient (or anti-salient). Throughout the article, significantly salient regions of an object view are shown in red, whereas anti-salient regions are shown in cyan (e.g., see **Figures 2**, **4**).
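The permutation logic can be sketched as follows. Several details are assumptions rather than the published procedure: the saliency estimator (a hit-to-total transparency ratio), the use of pixel-wise one-tailed thresholds, and the number of permutations. All names are hypothetical.

```python
import numpy as np

def hit_ratio_map(masks, correct):
    """Assumed saliency estimator: transparency on correct trials / all trials."""
    total = masks.sum(axis=0)
    hits = masks[correct].sum(axis=0)
    return np.divide(hits, total, out=np.zeros_like(total), where=total > 0)

def permutation_significance(masks, correct, n_perm=200, alpha=0.05, seed=0):
    """Flag pixels whose saliency is above/below the shuffled-outcome null."""
    rng = np.random.default_rng(seed)
    observed = hit_ratio_map(masks, correct)
    null = np.empty((n_perm,) + observed.shape)
    for i in range(n_perm):
        # Shuffling outcomes breaks any link between masks and behavior.
        null[i] = hit_ratio_map(masks, rng.permutation(correct))
    upper = np.quantile(null, 1.0 - alpha, axis=0)
    lower = np.quantile(null, alpha, axis=0)
    salient = observed > upper       # would be drawn in red
    anti_salient = observed < lower  # would be drawn in cyan
    return salient, anti_salient

# Toy data: correctness depends only on the visibility of pixel (2, 2).
rng = np.random.default_rng(1)
masks = rng.random((400, 5, 5))
correct = masks[:, 2, 2] > 0.5
salient, anti_salient = permutation_significance(masks, correct)
```

With pixel-wise thresholds and no correction for multiple comparisons, a few uninformative pixels may cross threshold by chance; this sketch does not attempt to address that.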

Group average saliency maps and significant salient and anti-salient regions were obtained using the same approach, but after pooling the bubbles trials obtained for a given object view across all available rats (see **Figure 12**).

#### Ideal Observer Analysis

Rats' average saliency maps, as well as the maps obtained for individual rats, were compared to the saliency maps obtained by simulating a linear ideal observer (Gosselin and Schyns, 2001; Gibson et al., 2005; Vermaercke and Op de Beeck, 2012). Since this method is fully described in Alemi-Neissi et al. (2013), we provide here only a short, qualitative description.

Given a bubble-masked input image, the simulated observer classified it as being either Object 1 or 2, based on which of the eight views of each object (the templates), to which the mask could have been applied (shown by the red frames in **Figure 1D**), matched the input image more closely. The template matching was linear, since it consisted in computing a normalized dot product between each input image and each template. To better match rat retinal resolution, each input image was low-pass filtered, so that its spatial frequency content did not exceed 1 cycle per degree (i.e., the maximal resolving power of Long-Evans rats; Keller et al., 2000; Prusky et al., 2002). Finally, to lower the performance of the ideal observer and bring it close to rat performance, Gaussian noise (std = 0.5 of the image grayscale) was independently added to each pixel of the input images. Saliency maps and significant salient and anti-salient regions for the ideal observer were obtained as described above for the rats (see previous section).
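A minimal sketch of such a template-matching observer is given below. It assumes images scaled to [0, 1] and unmasked templates, and it omits the low-pass filtering step; these simplifications, the function names and the toy images are illustrative assumptions, not the implementation used in the study.

```python
import numpy as np

def normalized_dot(a, b):
    """Dot product of two images divided by the product of their norms."""
    a = a.ravel()
    b = b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def classify(masked_image, templates_a, templates_b, noise_std, rng):
    """Return 0 if the best-matching template belongs to object A, else 1."""
    # Independent Gaussian noise on every pixel degrades performance.
    noisy = masked_image + rng.normal(0.0, noise_std, size=masked_image.shape)
    best_a = max(normalized_dot(noisy, t) for t in templates_a)
    best_b = max(normalized_dot(noisy, t) for t in templates_b)
    return 0 if best_a >= best_b else 1

rng = np.random.default_rng(0)
# Two hypothetical objects: one fills the left half, the other the right half.
obj_a = np.zeros((8, 8))
obj_a[:, :4] = 1.0
obj_b = np.zeros((8, 8))
obj_b[:, 4:] = 1.0
# A single square opening stands in for a bubbles mask.
mask = np.zeros((8, 8))
mask[2:5, 1:3] = 1.0
choice = classify(obj_a * mask, [obj_a], [obj_b], noise_std=0.0, rng=rng)
```

Setting `noise_std=0.5` instead of `0.0` would mirror the noise level quoted in the text; the noiseless call is used here only to keep the example deterministic.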


Each rat saliency map (either individual or group averaged) was compared to the corresponding map obtained for the ideal observer by computing their Pearson correlation coefficient. The significance of the correlation was assessed by running a permutation test, in which the behavioral outcomes of the bubbles trials were randomly shuffled for both the rat and the ideal observer, so as to obtain a null distribution of correlation values, against which the statistical test was carried out at p < 0.05 (see Alemi-Neissi et al., 2013 for details).

# Euclidean Distance between Matching Views of the Objects within Each Stimulus Set

To quantify how similar the objects belonging to a given stimulus set were, we proceeded as follows. First, low-pass filtered versions of all the object views were produced, so that the spatial frequency content did not exceed the maximal retinal resolution of Long-Evans rats (i.e., 1 cycle per degree of visual angle). Then, we computed, within each stimulus set, the superposition of all the transformed views of both objects, and a crop rectangle was defined for each stimulus set as the minimal rectangle containing the resulting superposition. Next, a cropped version of each image (i.e., view) of the objects belonging to a given stimulus set was produced using the corresponding crop rectangle. The cropping was required to minimize the effect of uninformative black pixels surrounding the object views on the distance computations. Finally, the pixel-wise Euclidean distance between the cropped images of matching views of the two objects within a stimulus set was computed. This distance was then normalized to the maximal possible distance in the image space, which is the square root of the number of pixels (see **Table 1**). This allowed a fair comparison of object similarity between the two stimulus sets.
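The cropping and normalization steps can be sketched as follows, assuming pixel intensities in [0, 1] (so that the maximal distance equals the square root of the number of pixels) and skipping the low-pass filtering. Function names and the toy images are hypothetical.

```python
import numpy as np

def crop_rectangle(views):
    """Minimal rectangle (r0, r1, c0, c1) containing the superposition of all views."""
    superposition = np.max(np.stack(views), axis=0) > 0
    rows = np.flatnonzero(superposition.any(axis=1))
    cols = np.flatnonzero(superposition.any(axis=0))
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1

def normalized_distance(view_a, view_b, rect):
    """Pixel-wise Euclidean distance of the cropped views, divided by sqrt(n_pixels)."""
    r0, r1, c0, c1 = rect
    a = view_a[r0:r1, c0:c1].astype(float)
    b = view_b[r0:r1, c0:c1].astype(float)
    return float(np.linalg.norm(a - b) / np.sqrt(a.size))

# Toy example: two non-overlapping bars on a black background.
view_a = np.zeros((8, 8))
view_a[2:4, 2:6] = 1.0
view_b = np.zeros((8, 8))
view_b[4:6, 2:6] = 1.0
rect = crop_rectangle([view_a, view_b])
distance = normalized_distance(view_a, view_b, rect)
```

In this toy case every pixel inside the crop rectangle differs between the two views, so the normalized distance reaches its maximum of 1; identical views give 0.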

All data analyses were performed in Matlab (http://www.mathworks.com).

# Results

The goal of this study was to assess the influence of the structural similarity of the discriminanda on the adoption, by rats, of a view-based or a view-invariant recognition strategy. A group of rats (referred to as "group 2" throughout the article) was trained in an object recognition task that required the animals to discriminate two structurally (and visually) similar objects (i.e., Object 3 and 4, belonging to Stimulus Set 2, shown in **Figure 1B**, left panels). The results obtained from this group of rats were compared to those reported in our previous study (Alemi-Neissi et al., 2013), where another group of rats (referred to as "group 1") underwent the same training, but with objects that were more dissimilar at the structural level (i.e., Object 1 and 2, belonging to Stimulus Set 1, shown in **Figure 1A**, left panels). As in Alemi-Neissi et al. (2013), a classification image method, known as the Bubbles method (Gosselin and Schyns, 2001), was applied to a subset of the trained object views to infer rat recognition strategy, and assess its reproducibility across views, as well as its consistency across subjects.

# Critical Features Underlying Recognition of the Default Object Views

During the initial experimental phase, the 6 rats of group 2 were trained to discriminate the default views of the objects belonging to Stimulus Set 2 (shown in **Figure 1B**, left panels). The training typically lasted 3–12 weeks before the animals achieved a criterion of ≥70% correct discrimination performance. Unlike the rats of group 1 (i.e., rats numbered from 1 to 6; see below and Alemi-Neissi et al., 2013 for details), only half of the animals (referred to as rat 7, 8, and 9 in the following) reached the criterion and were able to maintain it in the subsequent experimental phases. Once the criterion was reached, regular trials (i.e., trials in which the objects were shown unmasked) started to be randomly interleaved with bubbles trials (i.e., trials in which the objects were partially occluded by the bubbles masks; see Materials and Methods for details and **Figures 1E–H**). By occluding parts of the visual objects, the bubbles masks made it harder for the rats to succeed in the discrimination task. In our experiments, we adjusted the number of the semi-transparent openings (the bubbles) in each mask, so as to bring each rat's performance in bubbles trials to be ∼10% lower than in regular trials. For the rats tested in this study (i.e., group 2, tested with Stimulus Set 2), the average recognition performance of the default views dropped from ∼70% in regular trials to ∼55% correct in bubbles trials (**Figure 2A**). The comparison with the rats tested in our previous study (i.e., group 1, tested with Stimulus Set 1), where the average recognition performance dropped from ∼75% correct in regular trials to ∼65% correct in bubbles trials (see Figure 3A in Alemi-Neissi et al., 2013), indicates that, as expected because of our stimulus design, objects in Stimulus Set 2 were harder to discriminate, especially when occluded by the bubbles masks.

The visual features underlying rat recognition strategy were extracted by measuring the correlation between bubbles masks' transparency values and rat behavioral responses (see Alemi-Neissi et al., 2013 for details). This yielded saliency maps, where the brightness of each pixel indicated the likelihood, for an object, to be correctly identified when that pixel was visible. Throughout the article, such saliency maps are displayed as grayscale masks superimposed on the images of the corresponding object views (see **Figures 2B**, **4**, **12**). Saliency map values that were significantly higher or lower than expected by chance (p < 0.05, permutation test; see Materials and Methods) defined, respectively, significantly salient and anti-salient regions in the images of the object views (shown, respectively, as red and cyan patches in **Figures 2B**, **4**, **12**). These regions are those objects' parts that, when visible through the masks, likely led, respectively, to correct identification and misidentification of the object views.

In contrast to what was found for Stimulus Set 1 (see Figure 3B in Alemi-Neissi et al., 2013), a larger inter-subject variability was observed in the saliency patterns obtained for the default views of the objects in Stimulus Set 2 (**Figure 2B**). In the case of Object 3, one or both of the upper lobes were selected as salient features by all three rats of group 2. However, for rat 8, one of the features, the rightmost one, did not cover the upper-right lobe. Rather, it was located slightly below it, at the margin of the central, largest lobe. This lobe, in turn, was mostly significantly salient for rat 9, but it was anti-salient for rat 7 (while, for the other two rats, anti-salient regions were located along the lower/right margin). In the case of Object 4, the top part of the central lobe was salient for two rats, in the form of seven small, scattered patches for rat 7, and one single spot for rat 8. Interestingly, this spot, as well as one of the salient patches of rat 7, was located right at the curved-edge intersection between the central lobe and the top (smaller) lobes. On the other hand, for rat 9, this same part of the central lobe was anti-salient, along with the upper-right lobe. Similarly, the upper-right lobe was anti-salient for rat 7, while rat 8 showed spots of anti-saliency toward the right and the left margins of the object.

To summarize, although a few salient and anti-salient features were preserved across some of the rats (e.g., the top lobes of Object 3 and the small salient spot at the junction of Object 4's top and central lobes), a substantial inter-subject diversity was observed in terms of location, number, and size of the salient and anti-salient regions. This is indicative of the larger variety of perceptual strategies used by rats, when tested with structurally similar objects (such as the ones belonging to Stimulus Set 2), as compared to what we found using more dissimilar objects (such as those belonging to Stimulus Set 1, tested in Alemi-Neissi et al., 2013). These preliminary, qualitative observations will be quantified in the next sections.

# Critical Features Underlying Recognition of the Transformed Object Views

After being trained with the default views of Objects 3 and 4 and tested with bubble-masked versions of these views, the rats were further trained to recognize the objects in spite of transformations along four different variation axes: size, in-depth azimuth rotation, horizontal position and in-plane rotation. The tested ranges of variation are shown in **Figure 1D**, along with the views that, for each transformation axis, had been selected for application of the Bubbles method (referred to as "bubbles views" in the following; see red frames). The four transformation axes were trained sequentially, so that the amount of variation each rat had to tolerate increased gradually. In fact, the animals were confronted, at any given time during training/testing, with object views that were randomly sampled across all the variation axes tested up to that point (regular trials).

Similarly to what was found for Stimulus Set 1 (see Figure 4 in Alemi-Neissi et al., 2013), in the case of Stimulus Set 2 rat average recognition performance was also significantly larger than chance for most of the tested object transformations, typically ranging from ∼70 to ∼80% correct and dropping below 70% correct only at the extremes of transformation axes, especially in the case of size changes and azimuth rotations (**Figure 3**, gray lines; see legend for details). Thus, in spite of their structural similarity, Objects 3 and 4 remained discriminable for the rats across a broad spectrum of image variation. On the other hand, the application of the bubbles masks resulted in a decrement of the recognition performance (see black diamonds) that was larger than the one observed in the case of Stimulus Set 1 (compare to Figure 4 in Alemi-Neissi et al., 2013). The average performance on bubbles trials ranged between 55 and 60% correct and was significantly below the performance observed in regular trials in the case of the translated object views (p < 0.05, one-tailed, paired t-test; see rectangular frames in **Figure 3C**), although it was still significantly above chance for all those transformations in which all three rats were tested (p < 0.05, one-tailed, unpaired t-test, see filled black diamonds; for the conditions tested with only 2 rats, significance was not assessed, see open circles). This suggests that, for rats, it was challenging to discriminate structurally similar objects, especially when shape information was degraded by reducing the size of the objects (see **Figure 3A**) or rotating/shifting them by large amounts (see **Figures 3C,D**), and simultaneously adding the semi-transparent bubbles masks.

Bubbles trials were analyzed as described in the previous section (see also Materials and Methods) to obtain saliency maps with highlighted significantly salient and anti-salient regions for each of the selected bubbles views (see **Figure 4**). A qualitative comparison between these saliency patterns and those previously obtained for the objects of Stimulus Set 1 (see Figure 6 in Alemi-Neissi et al., 2013) illustrates how rat recognition strategy depends on the structural complexity and visual similarity of the discriminanda.

Both Object 3 and 4 in Stimulus Set 2, just like Object 1 and 2 in Stimulus Set 1, were made of ellipsoidal structural parts (or lobes; see **Figures 1A,B**, left panels). However, in the case of Stimulus Set 2, such parts protruded less and, more importantly, matching lobes in the two objects had a similar size, position and aspect ratio. Hence, they were less diagnostic of object identity, compared to the lobes of Objects 1 and 2, resulting in a larger similarity between the objects of Stimulus Set 2, as compared to Stimulus Set 1, across all tested views (see **Table 1** for details). Consistent with this observation, we found a general tendency, for the diagnostic features of Object 3 and 4, to be distributed (often in a quite scattered way) over a region of the objects (i.e., top or bottom half) encompassing multiple lobes, rather than being precisely (and reproducibly) located in specific lobes (or lobes' sub-regions), as previously found for Objects 1 and 2 (see Alemi-Neissi et al., 2013). Nonetheless, we could still find, albeit less systematically than for Stimulus Set 1, a tendency to select (and, to some extent, "track" throughout different transformations) discrete object features (see yellow arrows in **Figure 4** and the description below).

For rat 7, the salient features were located in the upper region of Object 3 for all tested conditions (**Figure 4A**, upper row), although, in the case of the default view, they were smaller, more scattered and mixed with anti-salient patches, which only remained as smaller spots in the azimuth-rotated views. The anti-salient regions covered preferentially the central and lower parts. A somewhat reversed pattern was observed for Object 4 (**Figure 4A**, lower row): the central/bottom region was largely salient across all tested views, starting with a combination of small patches in the default view, which reduced to a few small spots in the size-transformed condition, and finally merged into a big salient region for most of the remaining transformations. Interestingly, the salient spot located right at the intersection between the central lobe and the top lobes (see yellow arrows) was observed not only in the case of the default view (see previous section), but, systematically, across all tested conditions, either as a discrete feature or merging with the bigger salient patch.

Similarly to rat 7, rat 8 displayed a preference for the upper region of Object 3 in all tested conditions (**Figure 4B**, upper row). The anti-salient features generally covered the lower lobe, but extended to the central part of the object in three conditions (size transformed and horizontally shifted views) and to the upper-right lobe in one condition (horizontally shifted to the left). Again, the central part of Object 4 was its most salient region (**Figure 4B**, lower row), but the salient patches remained small, few and scattered, and always mixed with anti-salient spots. Notably, also for rat 8, the intersection between the central lobe and the top lobes contained a small, significantly salient spot in the case of the default, azimuth-rotated and size-transformed views (see yellow arrows). This spot was also salient for the horizontally shifted views, although it did not cross the threshold for significance.

Compared to the previous two rats, rat 9 displayed, at the beginning (i.e., for the default views), a strategy that was more consistent with the selection of the discrete, constituent elements of the objects, rather than wide regions encompassing multiple lobes. For instance, the salient patches obtained for the default view of Object 3 (**Figure 4C**, upper row) matched closely the central lobe and the two upper lobes of the object. Although these discrete features did not remain salient for all the tested transformations, they were preserved in several of the subsequently tested views. In the case of Object 4 (**Figure 4C**, lower row), a more varied combination of salient features (often mixed with anti-salient spots) was found across the tested views, covering both upper and lower regions of the object, although discrete lobes were still occasionally selected as salient features. One of these lobes was the bottom one (with a nose-like shape), which emerged as a salient feature in one condition (the azimuth rotated view to the left; see green arrow), i.e., when it protruded more than in all other views and was, therefore, more likely to be parsed by the rat visual system. This was observed also for rat 8 (**Figure 4B**, lower row, green arrow), although the salient spot was smaller.

To summarize, when facing objects that were hard to discriminate (as in the case of Stimulus Set 2), rats appeared to rely on a set of object features that was only partially preserved across transformations. While the overall object regions (i.e., either top or bottom half) containing either the salient or anti-salient patches tended to be preserved across different views, the size, number, and location of these patches varied substantially across conditions and rats. This result is in contrast with what was found in our previous study for Objects 1 and 2, where the salient features tended to be reproducibly located in specific positions of the objects' structural parts (e.g., the tips of the elongated lobes defining Object 2). In other words, rats tested with Stimulus Set 2, unlike those tested with Stimulus Set 1, did not show a strong, view-invariant preference for well-defined structural elements of the objects. These qualitative observations are quantified in the next sections, starting with the reproducibility of the patterns of salient features across object views.

# Is Rat Invariant Recognition more Consistent with a View-Invariant or a View-Based Processing Strategy?

To quantify to what extent rat recognition of Objects 3 and 4 was consistent with a view-invariant visual processing strategy, we measured the overlap between the patterns of salient features obtained for all possible pairs of object views produced by affine transformations (i.e., all tested object views with the exclusion of in-depth azimuth rotations). This overlap was computed after reversing (i.e., "undoing") the transformations that produced a pair of object views, so as to perfectly align one view on top of the other (e.g., in the case of the comparison between the default and the horizontally translated views shown in **Figure 5A**, the latter was shifted back to the center of the screen and scaled back to 35°, so as to perfectly overlap with the default view; see second row of **Figure 5A**, right panel). This procedure yielded aligned overlap values between pairs of salient features' patterns, which could be compared to those obtained for Objects 1 and 2 in Alemi-Neissi et al. (2013). Consistent with our previous study, we also computed, for each pair of views, raw overlap values, which quantified the amount of overlap between the salient features' patterns of two object views within the stimulus display (i.e., in absolute screen coordinates; see second row of **Figure 5A**, left panel). When plotted one against the other (**Figure 5B**), the aligned and the raw overlaps measured whether rat recognition was more consistent with a view-invariant strategy (in which the same set of object-centered features is relied upon and "tracked" across different views) or a screen-centered strategy (i.e., a low-level strategy, where one or more image patches exist, at specific locations within the stimulus display, that remain diagnostic of object identity in spite of view changes, thus affording a trivial solution to the invariant recognition task).

Following Nielsen et al. (2006), both the aligned and raw overlaps were computed as the ratio between overlapping area and overall area of the significantly salient regions of the two object views under comparison (e.g., as the ratio between the orange area and the sum of the red, yellow, and orange areas in **Figure 5A**, second row). As done in our previous study (Alemi-Neissi et al., 2013) and in Nielsen et al. (2006), the significance of each individual raw and aligned overlap was assessed at p = 0.05 through a permutation test (1000 permutation loops), in which the salient regions of each object view in a pair were randomly shifted within the minimum bounding box enclosing each view (see **Figure 5A**, bottom row and Alemi-Neissi et al., 2013, for details).
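
The overlap measure and its permutation test can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names are hypothetical, the salient regions are assumed to be available as boolean masks already cropped to the bounding box of each view, and the random shift is implemented as a circular shift, which is only a stand-in for the shifting procedure detailed in Alemi-Neissi et al. (2013).

```python
import numpy as np

def overlap_ratio(mask_a, mask_b):
    """Overlapping area divided by the overall (union) area of two boolean masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # no salient pixels in either view
    return float(np.logical_and(a, b).sum() / union)

def random_shift(mask, rng):
    """Assumption: circular shift within the (already cropped) bounding box."""
    dy = int(rng.integers(0, mask.shape[0]))
    dx = int(rng.integers(0, mask.shape[1]))
    return np.roll(mask, (dy, dx), axis=(0, 1))

def overlap_p_value(mask_a, mask_b, n_perm=1000, seed=0):
    """Fraction of shifted-mask overlaps that reach the observed overlap."""
    rng = np.random.default_rng(seed)
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    observed = overlap_ratio(a, b)
    null = np.array([
        overlap_ratio(random_shift(a, rng), random_shift(b, rng))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, float(p)
```

With this definition, the raw overlap is obtained by passing the two masks in screen coordinates, and the aligned overlap by passing them after one view has been transformed back onto the other.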

As shown by the scatter plot in **Figure 5B**, for about 62% of the tested view pairs (i.e., in 37 out of 60 cases), the aligned overlap was larger than the raw overlap. Although this proportion was much higher for the objects belonging to Stimulus Set 1, as assessed in our previous study (i.e., about 92% of view pairs had a larger aligned overlap; see Figure 8B in Alemi-Neissi et al., 2013), **Figure 5B** shows that, also for Stimulus Set 2, a trivial, screen-centered strategy could not explain rat recognition behavior. This conclusion was confirmed by the fact that, for both objects belonging to Stimulus Set 2, the average aligned overlap values were significantly higher than the raw values (Object 3: aligned 0.09 ± 0.02 vs. raw 0.04 ± 0.02, p < 0.05; Object 4: aligned 0.07 ± 0.02 vs. raw 0.03 ± 0.01, p < 0.01; significance was assessed through a paired permutation test, in which the sign of the difference between aligned and raw overlap for each pair of views was randomly assigned in 10,000 permutation loops). In addition, for both objects, the number of cases in which the aligned overlaps were larger than expected by chance was approximately twice as large as the number of significant raw overlaps—for Object 3, 10/30 aligned vs. 5/30 raw overlaps were significant, while, for Object 4, 11/30 aligned vs. 7/30 raw overlaps were significant (see **Figure 5B**, where significance is coded by the shade of gray filling the symbols).
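
A paired permutation test of this kind can be sketched as follows. The sketch is illustrative only: the function name is hypothetical, the test statistic is assumed to be the mean of the paired differences, and the test is written as one-sided (aligned larger than raw), which the text does not specify.

```python
import numpy as np

def paired_sign_flip_test(aligned, raw, n_perm=10_000, seed=0):
    """Randomly reassign the sign of each aligned-minus-raw difference."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(aligned, dtype=float) - np.asarray(raw, dtype=float)
    observed = diff.mean()
    # One row of random signs per permutation loop.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    # One-sided p-value (assumption), with add-one correction.
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return float(observed), float(p)
```

Under the null hypothesis that aligned and raw overlaps are exchangeable within each pair of views, the sign of each difference is arbitrary, which is what the random sign assignment emulates.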

To better understand the influence of object structure on the adoption of a view-invariant strategy, we reported side by side in **Figure 5C** the median aligned overlaps obtained for the objects tested in our previous study (i.e., Objects 1 and 2, Stimulus Set 1) and in the current one (i.e., Objects 3 and 4, Stimulus Set 2). The resulting bar chart shows that the aligned overlap was much larger for the objects belonging to Stimulus Set 1, as compared to the objects of Stimulus Set 2 (and this difference was significant at p < 0.0001, Mann–Whitney U-test). In addition, for Object 2, the large majority of aligned overlap values was significantly higher (79%) than expected by chance, while, for the other objects, the percentage of significant overlaps ranged from 24 to 37% only (see **Figure 5D**). This implies that the pattern of salient features was much more reproducible for the objects belonging to Stimulus Set 1, as compared to Stimulus Set 2, and, in particular, for Object 2, which was the object made of the more distinctive structural parts (as discussed at length in Alemi-Neissi et al., 2013). To summarize, **Figure 5** quantifies the qualitative observations of the previous section: the greater the discriminability of the visual objects (as in the case of Stimulus Set 1) and the more distinctive their structural elements (as in the case of Object 2), the more view-invariant the rats' recognition strategy was (i.e., the animals consistently used the same structural parts of the objects, across different views, as diagnostic features of object identity). For the less discriminable objects (i.e., Stimulus Set 2), rat recognition strategy was still more consistent with an object-based tracking of broadly defined saliency regions (e.g., the top or bottom parts of the stimuli) than with a low-level, screen-centered detection of transformation-preserved diagnostic image spots. However, the specific patterns of salient features were much more view-dependent than in the case of Stimulus Set 1 (and of Object 2 in particular).

views of an object. The default and the leftward horizontally shifted views of Object 3 are used as examples (first row). The raw features' overlap was computed by superimposing the images of the two object views (and the corresponding features' patterns) within the stimulus display (second row, left plot). The aligned features' overlap was computed by reversing the transformation that produced the leftward horizontally shifted view. That is, the object was shifted to the right by 18° and scaled back to 35°, so as to perfectly overlap with its default view (second row, right plot). In both cases, the overlap was computed as the ratio between the orange area and the sum of the red, yellow and orange areas. The significance of the overlap was

Object 4 (diamonds) resulting from affine transformations (i.e., position/size changes and in-plane rotations), the raw features' overlap is plotted against the aligned features' overlap. The shade of gray indicates whether the raw and/or the aligned overlap for a given view was significantly larger than expected by chance (*p* < 0.05; see caption). (C) Median aligned overlaps for the objects belonging to Stimulus Set 1 and 2. The error bars are standard errors of the medians (obtained by bootstrapping). The statistical significance of the difference between a given pair of medians was assessed by a Mann–Whitney *U*-test (∗∗∗∗*p* < 0.0001). (D) Percentage of significant aligned overlap values for the objects belonging to two stimulus sets.

To quantitatively assess whether the difference between the strategies used by the two groups of rats could be attributed to objects' similarity, we computed the normalized, pixel-wise Euclidean distance between matching views of the objects within each stimulus set (see Materials and Methods). Only the views on which the bubbles masks were applied (i.e., the bubbles views) were considered in this analysis. The result of this comparison is reported in **Table 1**. As expected, the distance between the views of the objects belonging to Stimulus Set 1 was systematically larger than the distance between the views of the objects belonging to Stimulus Set 2. This resulted in an average pixel-level discriminability that was significantly higher for Stimulus Set 1, as compared to Stimulus Set 2 (0.23 ± 0.02 vs. 0.17 ± 0.01, respectively; one-tailed, paired t-test, p < 0.001).

# Comparing the Compactness of the Salient Features' Patterns among Stimulus Sets and Individual Objects

Having quantified the different discriminability of the two object pairs, we further assessed how such a difference affected the recognition strategy of the two groups of rats by comparing the average number (**Figures 6**, **7**) and the average absolute and relative size (**Figures 8**–**10**) of the salient features found for each object (with the average taken across all tested bubbles views). Since the absolute size of the salient features ranged from a few pixels (in the case of spot-like features) to hundreds of pixels (in the case of features spanning over large fractions of the objects; see **Figure 4** and also Figure 6 in Alemi-Neissi et al., 2013), we measured how these quantities (e.g., the number of salient features) varied when only features having a size larger than a minimal threshold value (ranging from 1 to 100 pixels) were taken into account. We then assessed, at each threshold value, the statistical significance of the difference between (e.g.) the average number of features obtained, across all tested views, for the two stimulus sets (two-tailed, unpaired t-test at p < 0.05; see **Figure 6**, where the red traces in the inset show the comparisons yielding a significant difference). The same analysis was carried out for each of the six possible pairs of objects belonging to the two object sets (e.g., see **Figure 7** for the comparison regarding the number of salient features).
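
The thresholded feature count described above can be sketched in Python as follows. The sketch rests on assumptions that are not stated in the text: a salient feature is taken to be a 4-connected group of significantly salient pixels, the size criterion is implemented as "at least the minimal size" (so that a threshold of 1 pixel keeps every feature), and the function names are hypothetical.

```python
from collections import deque

import numpy as np

def feature_sizes(mask):
    """Sizes (in pixels) of the 4-connected components of a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros(mask.shape, dtype=bool)
    n_rows, n_cols = mask.shape
    sizes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Breadth-first flood fill starting from an unvisited salient pixel.
        seen[y0, x0] = True
        queue, size = deque([(int(y0), int(x0))]), 0
        while queue:
            y, x = queue.popleft()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < n_rows and 0 <= nx < n_cols
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        sizes.append(size)
    return sizes

def count_vs_min_size(mask, thresholds):
    """Number of features whose size is at least each minimal threshold."""
    sizes = np.array(feature_sizes(mask))
    return [int(np.sum(sizes >= t)) for t in thresholds]
```

Calling `count_vs_min_size(mask, range(1, 101))` for each object view, and then averaging across views, would give one curve of the kind plotted as a function of minimal feature size. Dividing each entry of `feature_sizes(mask)` by the area of the object view would give relative rather than absolute sizes.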

We found that the average number of salient features was larger for Stimulus Set 2 (**Figure 6**, pink line) than for Stimulus Set 1 (**Figure 6**, purple line) and this difference was significant over a large range of minimal feature sizes (from 1 to about 55 pixels; see red dots in the inset of **Figure 6**). Only asymptotically (for very large feature sizes) was the difference between the numbers of features found for the two stimulus sets no longer significant (see black dots in the inset of **Figure 6**). This is expected, given that, by construction, only a few large features covering big portions of the objects are left, regardless of the stimulus set, when the minimal feature size is very large. Focusing on individual objects (**Figure 7**), i.e., considering all possible pairs of the four objects (regardless of whether an object belonged to Stimulus Set 1 or 2), we found that the average number of salient features for Object 1 was significantly smaller than for Objects 3 and 4 (**Figures 7B,C**), as long as the minimal feature size did not cross the 45–50 pixel value (see insets), while it was never significantly different from the number of salient features of Object 2 (**Figure 7A**). Object 2 displayed a smaller difference, in terms of number of features, when compared to Object 4 (significant up to a minimal feature size of ∼20 pixels; see **Figure 7E**), and even smaller when compared to Object 3 (significant in the ranges of minimal feature size between 5–10 and 18–22 pixels; see **Figure 7D**). No significant difference was found between Objects 3 and 4 (**Figure 7F**).

FIGURE 6 | Average number of salient features obtained for the two stimulus sets. The average number of salient features obtained for the view of an object belonging to either Stimulus Set 1 or 2 (see the caption for the color code) is plotted as a function of the minimal size of the features that were taken into account for this analysis (the average was computed by pooling across all views of both objects within a stimulus set and all rats). The difference between the values obtained for two stimulus sets is plotted in the inset as a dotted line, where the color codes its significance: black, no significant difference; red, significant difference at *p* < 0.05 (two-tailed, unpaired *t*-test).

Next, we computed the size of the salient features obtained for the four objects across all the views that were tested with the bubbles masks. For each object view, we measured the absolute size (in pixels) of all the salient features obtained for that view. Then, the features' sizes obtained for all the views were pooled to obtain the average absolute feature sizes shown in **Figures 8B**, **10**. Using the same approach, we also computed the average relative feature sizes shown in **Figures 8A**, **9**. The only difference was that, in this case, the size in pixels of each salient feature was divided by the overall area (in pixels) of the corresponding object view, thus yielding the portion of the view that was covered by that feature.

coded in (A–F); see caption on the top of each panel). The shaded regions are SEM. The average was computed by pooling across all views of an

between the values obtained for each object pair (same color code as in Figure 6: black, no significant difference; red, significant difference at *p* < 0.05; two-tailed, unpaired *t*-test).

As shown in **Figure 8**, a comparison between the two stimulus sets revealed that the rats tested with the objects belonging to Stimulus Set 1 selected, on average, larger features, compared to the rats tested with Stimulus Set 2, in terms of both absolute and relative size. This difference was significant for every minimal feature size under consideration (two-tailed, unpaired t-test at p < 0.05; see red dots in the insets of **Figure 8**). However, when we considered the differences between individual object pairs, in terms of their features' relative size (**Figure 9**), we found that the only significant difference was between Object 1 and all the other objects (see **Figures 9A–C**). When the absolute size values were compared (**Figure 10**), a significant difference was also observed between Object 2 and Object 3 (**Figure 10D**).

Taken together, the analyses shown in **Figures 6**–**10** revealed a tendency for the salient features' patterns obtained for Objects 1–4 to closely match the distinctiveness and prominence of the objects' structural parts. For objects with large, clearly discriminable lobes (such as the top lobe of Object 1 and the three elongated lobes of Object 2), the diagnostic salient features were more compact (i.e., larger and less numerous). Objects with smaller and less distinctive lobes (such as Objects 3 and 4) displayed a more scattered pattern of salient features (i.e., smaller and more numerous salient patches). Not surprisingly, this difference in the compactness of the salient features was more prominent when Object 1 (the object with the largest and most distinctive lobe) was compared to the objects of Stimulus Set 2. Once again, this finding suggests that rat recognition strategy is strongly dependent on the structural properties of the target objects.

# Between-Subject Reproducibility of Rat Recognition Strategy

To quantify whether stimulus discriminability also affected the reproducibility of the object features that were preferentially chosen by one group of rats (tested with the same object conditions), we measured the across-rat consistency of the salient features' patterns obtained for our two stimulus sets. This was achieved by computing the overlap of the pattern of salient features obtained for one rat at a given object view (e.g., the default view) with the pattern of salient features obtained for another rat at the same object view (the overlap was computed in the same way as described in **Figure 5**). All possible views and all possible rat pairs were considered to obtain the resulting median overlap values shown in **Figure 11**.

The median overlap was much larger for Stimulus Set 1 than for Stimulus Set 2, and such a difference was highly significant (p < 0.0001, Mann–Whitney U-test; see **Figure 11A**). When the results of individual objects were compared, Object 1 displayed the largest between-rat consistency of the salient features selected to solve the task, followed by Object 2 and then by the objects belonging to Stimulus Set 2, with all the pairwise comparisons, except the one between Objects 3 and 4, yielding differences that were significantly larger than expected by chance (p < 0.001; Mann–Whitney U-test; see **Figure 11B**). This confirms the observation that the rats tested with Stimulus Set 2 used a recognition strategy that was much more consistent with a view-dependent selection of object features, with respect to the rats tested with Stimulus Set 1, as can be seen by comparing Figure 4 to Figure 6 in Alemi-Neissi et al. (2013). It also confirms that, within Stimulus Set 1, the object leading to the most consistent selection of the same diagnostic features was the one that, having the simplest structure (i.e., Object 1), afforded one single feature (the top, large lobe) for its identification. Object 2, with its equally sized and equally distinctive lobes, allowed a larger number of perceptual alternatives (i.e., possible feature combinations) for its recognition. Hence, the slightly (but significantly) lower between-rat consistency observed for Object 2, as compared to Object 1. However, since each individual feature was reproducibly confined to the tip of one of the lobes, and, in most cases, at least two lobes were used by rats as diagnostic features, Object 2 still displayed a pattern of features that was much more consistent, across rats, than what we found for the objects of Stimulus Set 2.

top of each panel]. The shaded regions are SEM. The average was computed by pooling across all features, all views of an object and all Figures 6–8: black, no significant difference; red, significant difference

at *p* < 0.05; two-tailed, unpaired *t*-test).

each panel]. The shaded regions are SEM. The average was computed by

difference; red, significant difference at *p* < 0.05; two-tailed, unpaired *t*-test).

objects belonging to Stimulus Set 1 and 2. For any given object view, the overlap between the pattern of salient features obtained for two different rats was computed. Overlap values obtained for all the views of the objects within a stimulus set and all possible pairs of rats were pooled, yielding the median overlaps per stimulus set shown by the colored bars. (B) Same analysis as in (A), but with the overlap values of individual objects considered independently. In both (A,B), a Mann–Whitney *U*-test was applied to check whether the resulting medians were significantly different from each other (∗∗∗*p* < 0.001, ∗∗∗∗*p* < 0.0001).

# Comparison between the Saliency Maps Obtained for the Rats and a Simulated Ideal Observer

The finding that rat recognition strategy is more or less view-invariant, depending on the level of stimulus discriminability, raises the question of how optimal such a strategy was, given the discriminatory information that each pair of visual objects afforded. To address this question, we compared it to the strategy of a simulated ideal observer that was tested using the same bubble-masked images that had been presented to the rats of both experimental groups. Given a stimulus set (i.e., either Stimulus Set 1 or 2), the simulated observer performed a template-matching operation between incoming bubble-masked input images and each of the possible bubbles views of the objects within the set (e.g., those marked by red frames in **Figure 1D**), to find out which object each input image corresponded to. The simulated observer was ideal, since it had stored in memory, as templates, all the views that each object within the stimulus set could take, and was linear, because the template-matching operation consisted in computing the dot product between each input image and each template view (see Materials and Methods and Alemi-Neissi et al., 2013, for details). The simulated observer could be incorrect or correct in identifying the object in a given bubble-masked input image, depending on whether the mask occluded parts of the object that were more or less diagnostic of its identity. Analyzing the responses of the ideal observer to the different bubble-masked images yielded saliency maps that were analogous (and, therefore, directly comparable) to the ones previously obtained for the rats. Specifically, the saliency maps obtained for the ideal observer were compared both with the maps obtained for the individual rats (see **Table 2**) and with the group average maps that were obtained by pooling the bubbles trials collected for a given object view across all available rats that had been tested with that view (see **Figure 12**, and, by comparison, Figure 10 in Alemi-Neissi et al., 2013).
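
The template-matching step of such a linear observer can be sketched in a few lines of Python. This is an illustrative reduction, not the simulation used in the study: the function name and array layout are hypothetical, the dot products are left unnormalized, and the decision rule (choosing the object whose template yields the largest dot product) is an assumption; the actual procedure is described in Materials and Methods and in Alemi-Neissi et al. (2013).

```python
import numpy as np

def classify_by_template(image, templates, labels):
    """Return the label of the stored template view with the largest dot product.

    image:     2D array, a bubble-masked input image
    templates: 3D array (n_templates, height, width), all stored object views
    labels:    sequence of object identities, one per template
    """
    image = np.asarray(image, dtype=float)
    templates = np.asarray(templates, dtype=float)
    # Flatten every template so that one matrix product gives all dot products.
    scores = templates.reshape(len(templates), -1) @ image.ravel()
    return labels[int(np.argmax(scores))]
```

Whether the returned label is correct for a given bubble-masked image depends on which pixels the mask leaves visible, which is the information a Bubbles analysis then turns into a saliency map.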

The motivation to compute group average saliency maps also for the animals tested with Stimulus Set 2 (in addition to the rats tested with Stimulus Set 1, as originally done in Alemi-Neissi et al., 2013), in spite of the large between-subject variability of the saliency patterns obtained with Object 3 and 4 (see **Figure 11**), was that, as previously discussed, the overall object regions (i.e., top or bottom part of the stimulus) containing mostly salient (or anti-salient) features were broadly preserved across rats (although the finer-grain features' patterns were only minimally preserved). Therefore, computing rat group average maps would still allow enhancing those features that were more consistently relied upon across subjects, by averaging out the idiosyncratic aspects of individual rat strategies. The resulting patterns of critical features extracted from the average saliency maps (see red and cyan patches in **Figures 12A,B**, top rows) indicate that, for most views, the salient features were generally located in the upper lobes of Object 3 and in the central lobe of Object 4 (or, occasionally, in Object 4's lower margins).



*Pearson correlation coefficients between the saliency maps obtained for Objects 1–4 and those obtained for a simulated ideal observer. (*∗*p* < *0.05, permutation test).*

Saliency patterns that were broadly consistent with the ones obtained for the "average rat" were found for the ideal observer too (compare the bottom rows of **Figures 12A,B** to the top rows). For instance, in the case of Object 3, the salient region obtained for the ideal observer also covered most of the upper lobes, although not the tip of the right lobe (as found, instead, for the average rat). This salient region extended to the central part of the stimulus for all tested views (**Figure 12A**, bottom row), while this was the case only of 2 out of 8 views for the average rat (i.e., the default and the position right views; see **Figure 12A**, top row). Object 4 had a large salient region in the bottom part of the central lobe, which extended to the stimulus lower margins, giving rise to a U-shaped salient feature (see **Figure 12B**, bottom row). While this pattern was quite consistent with the overall saliency pattern observed for the average rat, in the case of the ideal observer (but not of the average rat) the tip of the upper-right lobe was also salient for most views.

For every object view, the extent to which average and ideal saliency maps matched was quantified by computing the Pearson correlation coefficient (reported under each pair of saliency maps in **Figure 12**). This coefficient was significantly higher than expected by chance for all views of Object 3, and in 4 out of 8 cases for Object 4 (p < 0.05; permutation test; see Materials and Methods and Alemi-Neissi et al., 2013 for details). This implies that, similarly to what was found for Stimulus Set 1 (see Figure 10 in Alemi-Neissi et al., 2013), also for Stimulus Set 2 rat recognition strategy was, on average, consistent with an optimal strategy. That is, rats made, on average, close-to-optimal use of the discriminatory information afforded by Objects 3 and 4, in spite of their lower discriminability, as compared to Objects 1 and 2. This was definitely the case for Object 3 (for which the correlation was significant at all tested views). As for Object 4, the correlation with the ideal saliency map was either null, or failed to reach significance, in all those cases where the average map was highly scattered (i.e., see the default, size, azimuth right, and position right views shown in **Figure 12B**, top row).
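
The map-to-map comparison can be sketched as a Pearson correlation over pixel values. The sketch below is illustrative only: the function name is hypothetical, and the optional `support` argument (restricting the comparison to a subset of pixels, e.g., those covered by the object) is an assumption, since the exact set of pixels entering the correlation is specified in Materials and Methods rather than here.

```python
import numpy as np

def saliency_map_correlation(map_a, map_b, support=None):
    """Pearson correlation between the pixel values of two saliency maps."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if support is None:
        # Assumption: use every pixel when no region of interest is given.
        support = np.ones(a.shape, dtype=bool)
    return float(np.corrcoef(a[support], b[support])[0, 1])
```

Because the coefficient is computed on whole patterns of saliency values, two maps can be strongly correlated even when the few patches that cross the significance threshold do not coincide.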

As mentioned before, the saliency maps obtained for the ideal observer were also compared with the saliency maps obtained for individual rats. The results of these comparisons (i.e., Pearson correlation coefficients and their significance) are reported in **Table 2**, for all the rats belonging to the two experimental groups (rows) and all the views that have been tested for each rat (columns). The highest correlation values were observed for Object 2, which also yielded the largest fraction of significant correlations (∼85%; 29/34 instances) along with Object 3 (∼85%; 17/20 instances). By comparison, ∼38 and ∼35% of the correlations were significant, respectively, for Object 1 (13/34 instances) and Object 4 (7/20 instances). This indicates that, also at the level of individual rats, there was a good consistency with a strategy that makes close-to-optimal use of the objects' discriminatory information.

At first, having observed this agreement between rat (both average and individual) and ideal saliency maps, regardless of the similarity of the stimulus pair the animals had to discriminate (i.e., also for the objects belonging to Stimulus Set 2), could sound surprising. In fact, as previously shown in **Figures 5**, **11**, the patterns of salient features obtained for Objects 3 and 4 were poorly reproducible across views and rats, and one could wonder, given such variability, how they could be significantly correlated with the saliency patterns of the ideal observer. However, it should be considered that the Pearson correlation coefficients reported in **Figure 12** and **Table 2** measure the similarity between patterns of saliency map values, each taken as a whole (i.e., the patterns of gray shades shown in **Figures 4**, **12**), and not the precise overlap between those few individual saliency patches that crossed the threshold to be considered significantly salient (i.e., the red patches in **Figures 4**, **12**). Therefore, the consistency between the saliency maps obtained for the rats and the ideal observer should be interpreted as a tendency, for rats, to exploit those relatively large object regions that are generally more informative about object identity. However, within these regions, whether the precise pattern of individual salient features (i.e., their location, size, shape, etc.) was also preserved across views and rats strongly depended on the structure and discriminability of the target objects (as shown in the previous sections).

As previously reported for the objects of Stimulus Set 1 in Alemi-Neissi et al. (2013), also in the case of Stimulus Set 2 the saliency map found for the view of a given object roughly resembled the negative image of the saliency map found for the matching view of the other object (see **Figures 4**, **12**). Such a "phase opponency" (or "reversed polarity") is especially noticeable in the case of the ideal observer (i.e., compare the bottom rows of **Figures 12A,B**), but is clearly observable also for the saliency maps of the average rat (i.e., compare the top rows of **Figures 12A,B**). To quantify this phenomenon, we computed the Pearson correlation coefficient between saliency maps of matching object views, for both the average rat and the ideal observer (see **Table 3**). The correlation coefficients ranged between −0.75 and −0.93 in the case of the ideal observer and they were all significantly lower than expected by chance (p < 0.05; permutation test). This suggests that the optimal extraction of the discriminatory information afforded by two objects naturally leads to saliency maps with reversed polarity across matching views of the two objects. Also in the case of the average rat, most correlation coefficients were significantly lower than expected by chance (p < 0.05; permutation test). Although, on average, their magnitude was lower than for the ideal observer (−0.8 ± 0.02 ideal vs. −0.4 ± 0.09 average rat; p < 0.05, two-tailed paired permutation test), this finding further confirms that rat recognition strategy was broadly consistent with the optimal extraction of discriminatory object information.

# Discussion

# Summary

The goal of this study was to investigate the influence of objects' structural complexity and similarity on rat recognition strategy. As a follow-up to one of our recent studies (Alemi-Neissi et al., 2013), we exploited the same classification image method used there, known as the Bubbles, which has been previously applied to human (Gosselin and Schyns, 2001; Nielsen et al., 2008), monkey (Nielsen et al., 2008), pigeon (Gibson et al., 2005) and, recently, rat vision studies (Vermaercke and Op de Beeck, 2012; Alemi-Neissi et al., 2013). This approach allowed the identification of the visual features that are critical, for rats, in order to correctly discriminate two objects, in spite of both affine (i.e., size/position changes and in-plane rotations) and non-affine (i.e., azimuth in-depth rotations) transformations. The comparison between our previous findings (Alemi-Neissi et al., 2013), obtained with structurally dissimilar objects (i.e., Stimulus Set 1; see **Figure 1A**, left panels) and our present findings (i.e., Stimulus Set 2; see **Figure 1B**, left panels) uncovered several key aspects of rat recognition strategy.

and anti-salient (cyan) features (top rows), are compared to the saliency maps

*Pearson correlation coefficients between the saliency maps obtained for matching views of Object 3 and 4 (i.e., the same maps shown in* Figure 10*). For both the average rat (top row) and the ideal observer (bottom row), the significance of the correlation was assessed by a permutation test (*∗*p* < *0.05).*

First, when required to discriminate objects with prominent, easily distinguishable structural parts (as in the case of Stimulus Set 1), rats were able to effectively process these parts and use them as markers of object identity (see Alemi-Neissi et al., 2013, for details). This resulted in a perceptual strategy where the diagnostic (salient) features closely matched the structural elements of the target objects (e.g., the central region of Object 1's top lobe or the tip of the lobes of Object 2; see Figure 6 in Alemi-Neissi et al., 2013). On the other hand, rats that faced a harder discrimination task (Stimulus Set 2) relied on smaller, more numerous and more scattered object features, often failing to display a clear match with the objects' structural parts (see **Figures 4**, **6**–**10** for a quantitative comparison among the two stimulus sets).

Second, for the rats tested with Stimulus Set 1, the recognition strategy was remarkably stable (i.e., view-invariant) in the face of variation in object appearance (see Figure 6 in Alemi-Neissi et al., 2013). This was shown by the large overlap found (for both Object 1 and 2) between the patterns of salient features of different views, after aligning one view back onto the other (i.e., see the aligned overlap axis in Figure 8B of Alemi-Neissi et al., 2013). The recognition strategy of the objects belonging to Stimulus Set 1 was also highly reproducible across rats (see **Figure 11**). On the other hand, rats tested with Stimulus Set 2 displayed a more variable pattern of diagnostic features across object views (see **Figures 5B–D**), and a higher inter-subject variability (see **Figure 11**), which are suggestive of a more view-dependent recognition strategy. Importantly though, for both groups of rats, no trivial, screen-centered strategy could explain rat recognition behavior (i.e., pairs of raw and aligned overlap values lay mostly below the diagonal not only in Figure 8B of Alemi-Neissi et al., 2013, but also in Figure 5B of the present study).

Third, rat recognition performance was, for both groups of rats, typically higher than chance over large extents of the tested transformation axes, with a substantial drop that was observed only for extreme transformation values, especially in the case of Stimulus Set 2 (see **Figure 3**).

# Interpretation, Implications, and Limitations of our Findings

As mentioned in the Introduction, view-invariant theories (in their strongest version) posit that, across changes in object view, there should be no change in recognition performance: as long as the diagnostic features are accessible, the response of the system remains invariant. By comparison, view-dependent theories hypothesize that changes in the object appearance will generally result in variation of recognition performance, since objects are represented according to how they appeared when originally learned (for a review, see Tarr and Bülthoff, 1998; Lawson, 1999; Biederman, 2000). Since both groups of rats displayed a modulation of recognition performance, one could argue that rats, in general, rely on a recognition strategy that is mainly view-dependent, and becomes view-invariant as a result of training, as shown for monkeys and pigeons when tested with unfamiliar, hard-to-discriminate objects (Logothetis and Pauls, 1995; Wasserman et al., 1996; Spetch et al., 2001; Spetch and Friedman, 2003; Nielsen et al., 2006). However, even in "highly invariant" visual systems, like the human one, perfect invariance of the recognition performance is virtually never achieved (Biederman, 1987, 2000; Afraz and Cavanagh, 2008, 2009). More importantly, our classification image approach allowed going beyond what could simply be inferred based on performances, because it provided a direct assessment of rat perceptual strategy and its invariance. As reported in our previous study (Alemi-Neissi et al., 2013), the analysis of the patterns of diagnostic features showed, for Stimulus Set 1, a consistency in "tracking" the diagnostic features across all or most of the object views the animal faced. From this, we can infer that rats are able to actively detect and extract discrete object features, which are relied upon regardless of the transformations the objects may undergo.

The present study suggests that the crucial requirement for this ability to emerge is the distinctiveness of the objects, in terms of their structural similarity and the presence of "well affordable" object-specific features. Similarly to what has been reported for humans (Newell, 1998; Hayward and Williams, 2000; Spetch et al., 2001; Vuong and Tarr, 2006), rats can make use of a view-invariant strategy when confronting easily discriminable objects. Conversely, a view-dependent recognition strategy will emerge as the result of a discrimination involving visually (and structurally) similar objects. This appears to be the case of Stimulus Set 2, where the spread of salient features found for both Object 3 and 4 suggests that the rats recognized these stimuli using a novel set of features for each view.

Taking into account the larger stability of both the recognition performances and the patterns of diagnostic features observed for Stimulus Set 1, as compared to Stimulus Set 2, we can conclude that rat recognition strategy can be more or less view-invariant, depending on the structural similarity of the target objects. Objects that are structurally dissimilar are recognized by a lower number of diagnostic features, which map onto the objects' distinctive parts across a variety of transformation axes and magnitudes (view-invariant strategy). Objects that are structurally similar are recognized through a more variable, more scattered and more numerous set of features (implying that learning at each tested view is needed; viewpoint-dependent strategy). But view-invariant and view-dependent strategies are not mutually exclusive. As observed for humans, "it is likely that the visual system employs them all to some degree to achieve object constancy" (Lawson, 1999). As for rats, this is in agreement with a recent report (Tafazoli et al., 2012), demonstrating how these animals can spontaneously (i.e., without any training) generalize their recognition to novel object views (view-invariant strategy), although the accuracy of the discrimination improves when training is provided (view-dependent strategy).

It is worth mentioning that, according to modern theories of object recognition, be they based on hierarchical feedforward processing (see, for example, Riesenhuber and Poggio, 1999) or recurrent, error-driven computations (see, for example, O'Reilly et al., 2013), the view-invariant vs. view-dependent debate may appear outdated (Hayward, 2003). However, being concerned with the role of learning and memory in object recognition, and their impact on object representations at the neural level, such a distinction still provides a rather useful theoretical framework to understand the invariance problem. For instance, the object recognition model proposed by Riesenhuber and Poggio (1999) explicitly embodies both view-invariant and view-dependent computations in the same feedforward architecture. At the first stages of processing, iterated AND-like and OR-like computations implement general-purpose banks of local feature detectors, which respond to subportions of visual objects with increasingly complex shape tuning and tolerance to size and position changes. Instead, the upper stage of the model (corresponding to monkey inferotemporal cortex) is made of "view-tuned" units, i.e., simulated neurons that selectively respond to different views of the objects that the model has been trained to discriminate. Other experimental and computational studies (DiCarlo et al., 2012; Wyatte et al., 2012; O'Reilly et al., 2013; Tang et al., 2014) have recently highlighted the importance, in object recognition, of coupling feedforward computations (based on little or no re-entrant processing) with recurrent computations (based on within-area, error-driven learning). Such a coupling could play a key role at the latest stages of processing, as well as under particularly challenging viewing conditions (e.g., when object appearance is occluded, degraded, or dramatically shifted from its "canonical" view, as in the case of masking or in-depth rotation).

The combined findings of our current and previous studies (Zoccolan et al., 2009; Tafazoli et al., 2012; Alemi-Neissi et al., 2013; Zoccolan, 2015) fit within this theoretical and experimental framework, suggesting that rat invariant recognition is achieved by combining the automatic tolerance granted by local, partially invariant feature detectors with the fuller invariance provided by acquired, view-specific object representations.

Finally, our data show that, even in the case of structurally similar objects, the saliency maps underlying rat recognition strategy partially (but often significantly) overlap with those obtained for a simulated ideal observer engaged in the same invariant recognition task (see **Figure 12**). As discussed in the Results, this implies a tendency, for rats, to select the diagnostic object features within those relatively large object regions that are the most informative about object identity (although the across-view and across-rat reproducibility of the specific patterns of diagnostic features will strongly depend on the discriminability of the target objects).

It is important to point out that our current study rests on behavioral data collected from a rather small number of rats (3, i.e., half of the animals that were tested in our previous study, Alemi-Neissi et al., 2013), thus possibly limiting the generality of our conclusions. This would be the case, if our results were based on comparing group average performances (as in **Figures 2A**, **3**). On the contrary, the conclusions of our study mainly rest on comparing the reproducibility of rat recognition strategy across subjects and object views. Since many different object views were tested and, for each view, multiple salient features were obtained, the most crucial data analyses reported in the study (shown in **Figures 5**–**11**) are based on tens of data points, thus allowing an adequate statistical sample and a robust assessment of rat recognition strategy.

Taken together, the results presented in this study suggest that, similarly to what has been observed for humans, transformation-tolerant recognition in rats can flexibly rely on either view-invariant representations of distinctive object features or view-specific object representations. Given the extraordinary potential of the rat as a model to dissect neuronal functions at the molecular, synaptic, and circuitry levels (Margrie et al., 2002; Ohki et al., 2005; Lee et al., 2006; Greenberg et al., 2008; Deisseroth, 2011; Fenno et al., 2011; Egger et al., 2012; Tye and Deisseroth, 2012; Meyer et al., 2013), our findings suggest that rat studies could significantly advance our understanding of the formation and maintenance of transformation-tolerant object representations in the visual cortex.

# Acknowledgments

This work was supported by an Accademia Nazionale dei Lincei—Compagnia di San Paolo Grant, a Programma Neuroscienze Grant of the Compagnia di San Paolo, a Marie Curie International Reintegration Grant (IVOR) and a HFSP Program Grant (RGP0015/2013).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Rosselli, Alemi, Ansuini and Zoccolan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Neural discriminability in rat lateral extrastriate cortex and deep but not superficial primary visual cortex correlates with shape discriminability

Ben Vermaercke<sup>1,2</sup>\*, Gert Van den Bergh<sup>1</sup>, Florian Gerich<sup>1</sup> and Hans Op de Beeck<sup>1</sup>

*<sup>1</sup> Laboratory of Biological Psychology, Psychology and Educational Sciences, KU Leuven, Leuven, Belgium, <sup>2</sup> Department for Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA*

Recent studies have revealed a surprising degree of functional specialization in rodent visual cortex. It is unknown to what degree this functional organization is related to the well-known hierarchical organization of the visual system in primates. We designed a study in rats that targets one of the hallmarks of the hierarchical object vision pathway in primates: selectivity for behaviorally relevant dimensions. We compared behavioral performance in a visual water maze with neural discriminability in five visual cortical areas. We tested behavioral discrimination in two independent batches of six rats using six pairs of shapes used previously to probe shape selectivity in monkey cortex (Lehky and Sereno, 2007). The relative difficulty (error rate) of shape pairs was strongly correlated between the two batches, indicating that some shape pairs were more difficult to discriminate than others. Then, we recorded in naive rats from five visual areas from primary visual cortex (V1) over areas LM, LI, LL, up to lateral occipito-temporal cortex (TO). Shape selectivity in the upper layers of V1, where the information enters cortex, correlated mostly with physical stimulus dissimilarity and not with behavioral performance. In contrast, neural discriminability in lower layers of all areas was strongly correlated with behavioral performance. These findings, in combination with the results from Vermaercke et al. (2014b), suggest that the functional specialization in rodent lateral visual cortex reflects a processing hierarchy resulting in the emergence of complex selectivity that is related to behaviorally relevant stimulus differences.

Keywords: shape discrimination, rodent behavior, visual water maze, electrophysiological recording, population coding

#### Edited by:

*Davide Zoccolan, International School for Advanced Studies, Italy*

#### Reviewed by:

*Laura Busse, University of Tuebingen, Germany*

*James H. Marshel, Stanford University, USA*

#### \*Correspondence:

*Ben Vermaercke, Department for Molecular and Cellular Biology, Center for Brain Science, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA ben.vermaercke@ppw.kuleuven.be*

> Received: *30 June 2014* Accepted: *07 May 2015* Published: *20 May 2015*

#### Citation:

*Vermaercke B, Van Den Bergh G, Gerich F and Op de Beeck H (2015) Neural discriminability in rat lateral extrastriate cortex and deep but not superficial primary visual cortex correlates with shape discriminability. Front. Neural Circuits 9:24. doi: 10.3389/fncir.2015.00024*

# Introduction

Interest in the use of rodents for research into the neurobiological underpinnings of vision has grown in recent years. While most studies focus upon early stages of information processing up to primary visual cortex (V1), more and more studies have started to delineate the surprisingly large number of cortical visual areas beyond V1.

Significant advances have been made in describing the functional properties of many regions in rodent cortex that process visual information. In particular, reports in mice show that several of these areas are organized hierarchically (Marshel et al., 2011; Wang et al., 2012) and functionally specialized (Andermann et al., 2011; Glickfeld et al., 2013). Anatomical and electrophysiological studies in rats have revealed many extrastriate regions that receive direct input from V1 and show retinotopical organization using electrophysiology (Montero et al., 1973; Espinoza and Thomas, 1983; Thomas and Espinoza, 1987) or anatomical methods (Olavarria and Montero, 1984; Malach, 1989; Vaudano et al., 1991; Coogan and Burkhalter, 1993; Montero, 1993). Although naming schemes vary, areas found lateral to V1 are often referred to as lateromedial (LM), laterointermedial (LI), laterolateral (LL). Studies into other functional properties of rat extrastriate regions are rare but are much needed. The value of rodent models would increase tremendously if evidence shows that neural response patterns and functional differences between areas can be linked to behavioral performance of the animals. Up to now there is only very indirect evidence for such a relationship. For example, it was shown recently that rats are able to learn complex shape discrimination tasks in which they exhibit invariance to changes in pose, illumination, and/or position (Zoccolan et al., 2009; Tafazoli et al., 2012; Vermaercke and Op de Beeck, 2012). The behavioral capacity for position invariance might very well be based upon position invariance at the neural level, which was shown recently (Vermaercke et al., 2014b).

Here we provide a more direct test of the degree to which functional differences at the neural level are related to behavioral performance. As a working hypothesis, we would expect that the functional hierarchy and specialization in rodent visual cortex reflects how the representational format of information is changed into a format which is useful for making behavioral decisions, as is assumed by current models of vision in primates (Dicarlo and Cox, 2007; Pinto et al., 2008). If this hypothesis is true, then we expect that behavioral performance would be correlated with neural selectivity in non-primary cortical areas, more than with neural selectivity in primary visual cortex.

The experiments reported here attempt to make a first step toward answering these questions. We characterized the ability of rats to discriminate pairs of shapes in a behavioral two-alternative forced choice task. Subsequently, the same stimulus set was presented to naive, awake animals while neural responses were recorded in five cortical areas (V1, LM, LI, LL, and TO). Our results show that neural selectivity for shape differences in lower layers of extrastriate visual areas, but not in upper layers of primary visual cortex, is related to behavioral discrimination performance.

# Materials and Methods

This is the primary report of the behavioral experiment, for which we provide all experimental details. The neurophysiological data were first reported elsewhere (Vermaercke et al., 2014b). The current description of these data focuses upon the most relevant aspects and new analyses in order to relate these neural recordings to the outcome of the behavioral study.

# Animals

The behavioral experiment included 12 FBN F1 rats (F1 hybrids, first-generation offspring of crossing the Fischer and Brown-Norway strains). They were obtained from Harlan animal research laboratory (Hsd, Indianapolis, IN) at an age of 5 months and were housed in groups of six per cage, further referred to as two batches of six animals. For identification, we colored each rat's tail with 1 to 6 circles using a black marker. All procedures for animal housing and testing were approved by the KU Leuven Ethical Committee for animal experiments and were in accordance with the European Commission Directive of September 22nd 2010 (2010/63/EU).

# Behavioral Experiments

#### Behavioral Setup

For the behavioral task, we implemented the visual water-maze setup (V-Maze) described previously (Prusky et al., 2000; Wong and Brown, 2006; Vermaercke et al., 2014a). The setup consisted of a trapezoid pool, filled with transparent water at 26°C, and two screens (Dell 17″ LCD monitors, 1024 × 768 @ 60 Hz) placed at the long end of the pool (**Figure 1A**). The animal was released into the water at the short end of the pool. From there, it had to find a submerged platform located in front of one of the two screens. The reflection of the stimuli on the water obscured the platform. The rats had to learn which of two stimuli predicted the location of the platform. A 50 cm long divider was placed in between the two screens to force the animal to make a choice at that point. When the animal crossed this point, we scored the trial as correct or incorrect depending on the location of the platform. Scoring was automated using online analysis of video images (Logitech Webcam Pro 9000) implemented in Matlab, which allowed continuous tracking of the animal's position. All animals had to stay in the water until the platform was found. After a wrong decision, they were left on the platform 15 s longer. After being taken out of the water, a rat was placed under a heating lamp. Its turn for the next trial came after all other rats of the batch had completed a trial.

# Stimuli

For studying shape processing, we selected 6 of the 8 shapes from the study by Lehky and Sereno (2007): a square, a diamond in a square, a triangle, the letter lambda, the letter H and a plus sign (**Figure 1B**). The choice of stimuli was based upon the neuronal responses in inferior temporal cortex (IT) obtained by Lehky and Sereno and included those shapes that displayed the largest variability in neural discriminability according to their data. The luminance level of each shape (i.e., the number of white pixels) was equalized. The mean width of the bounding box (the minimal rectangle containing all white pixels) around each shape was 27.3°, ranging from 23 to 33°. Stimuli were presented on a black background filling the entire display. The length of the divider determined the maximal size of the stimulus; at this point the animals had to make a decision (see further). These shapes are able to drive populations of neurons in monkey anterior inferior temporal cortex, an area which is considered the final stage of processing in the ventral stream. At the same time they are simple black and white stimuli that contain most information in the lower spatial frequencies. This allows processing by the rat visual system with its limited visual acuity.

FIGURE 1 | [...] (D) Mean performance over the last four sessions of the experiment, ordered according to average performance per pair (i.e., over two animals). Blue bars indicate performances of rats from batch 1; red bars show data for rats from batch 2. The results show that the six pairs used in this study yield a wide range of performances. Error bars indicate binomial confidence intervals at the 0.05 level.

#### Shaping Phase

This phase was not part of the actual experiment, but was meant to familiarize the animals with the setup and the goal of the task: finding the location of the submerged platform. We used two very easy stimuli (black vs. white screen), of which one (the white screen) was consistently associated with the platform. In the first trials we released the animal right in front of the platform. In this phase the animals had to learn that a platform can be found somewhere and that this is the only way out of the water maze. Subsequently, we released them gradually further away from the screen, until they were placed beyond the divider. At this point, the animal had to decide in which arm to look first. The position of the white screen and associated platform (left or right) was pseudo-randomized by starting at a random position in the following scheme: LRLLRLRR (Prusky et al., 2000). All rats learned to solve this task after a week of two sessions of 10–12 trials per day.
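The pseudo-randomization of the platform side can be illustrated with a minimal Python sketch. It assumes that the LRLLRLRR scheme is read cyclically (wrapping around after its last letter), which the text does not state; the function name `side_sequence` is hypothetical.

```python
import random

SCHEME = "LRLLRLRR"  # left/right scheme given in the text (Prusky et al., 2000)

def side_sequence(n_trials, rng=None):
    """Return n_trials platform sides, reading SCHEME from a random start.

    Assumption (not stated in the text): the scheme wraps around cyclically.
    """
    rng = rng or random.Random()
    start = rng.randrange(len(SCHEME))
    return [SCHEME[(start + k) % len(SCHEME)] for k in range(n_trials)]

# Example: sides for one 12-trial session, with a fixed seed for repeatability
sides = side_sequence(12, random.Random(0))
print("".join(sides))
```

Any eight consecutive trials drawn this way contain four left and four right placements, so neither side is favored within a session.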

# Experimental Phase

Once the animals were used to being put in the water, searched for the platform readily, and consistently picked the correct side in the shaping phase, we made the transition to the actual experiment. In this experiment, each animal was presented with one pair of shapes. The behavioral experiment included six of the 15 possible pair-wise combinations of the six shapes. We selected three shapes as targets and combined each of them with either a dissimilar or a similar distractor, with (dis)similarity derived from the dissimilarity matrix obtained for area IT by Lehky and Sereno. As a result, we obtained three hypothetically easy and three hypothetically difficult pairs, which would maximize the variability in behavioral discrimination performance across these pairs if discriminability in monkeys were fully or partially related to discriminability in rats.

The experiment included two batches of six rats. We continued the experiment until the average performance across all six rats in a batch was above 70% correct for at least four successive 10- or 12-trial sessions. This criterion was chosen fairly low because we were looking for differences in difficulty between shape pairs, so we expected some pairs to be more difficult and not yet to show a learning curve. With a criterion of 70%, there is ample room for individual shape pairs to be associated with much lower or much higher performance than the criterion. The two batches needed 17 and 16 sessions, respectively, to reach criterion. We calculated the proportion of correct trials over the last four sessions for each pair (this proportion is further referred to as behavioral performance or BEH).
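The stopping criterion and the BEH measure can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the example numbers are hypothetical, and "above 70%" is read as a strict inequality.

```python
def sessions_to_criterion(batch_means, threshold=0.70, run_length=4):
    """Return the 1-based session at which the criterion is first met, else None.

    batch_means: per-session proportion correct, averaged over the rats of a batch.
    The criterion is a run of `run_length` successive sessions above `threshold`.
    """
    run = 0
    for session, mean in enumerate(batch_means, start=1):
        run = run + 1 if mean > threshold else 0
        if run >= run_length:
            return session
    return None

def proportion_correct(outcomes):
    """BEH for one shape pair: fraction of correct trials (1 = correct, 0 = error)."""
    return sum(outcomes) / len(outcomes)

# Hypothetical batch averages, for illustration only (not data from the study)
example_means = [0.55, 0.62, 0.71, 0.68, 0.72, 0.74, 0.73, 0.78]
print(sessions_to_criterion(example_means))
```

A run that is interrupted by a session at or below threshold starts again from zero, which matches the requirement of four successive sessions.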

# Physical Similarity Measures

We obtained measures of physical (dis)similarity for these shapes based on pixel-wise or Euclidean distances (PIX) between pairs of shapes, defined as the number of pixels with a different value (binary: black or white) in the two shapes using the formula:

$$\mathrm{PIX}_{nm} = \sum \sum \sqrt{\left(S_n - S_m\right)^2}, \quad m > n$$

where n and m indicate indices of different stimuli and the double sum operates over rows and columns of the resulting difference matrix. These values were then normalized by dividing by the maximum, and rescaled to fit between 0.5 and 1 by dividing by 2 and adding 0.5. We also determined the response of a population of simulated V1 neurons (V1Sim). For this we used a simplified version of the approach described in Pinto et al. (2008). We first smoothed the images (768 by 1280 px) using a Gaussian low-pass filter (FWHM = 20 px, ≅ 1.5 cpd, the approximate acuity of our rats; see Prusky et al., 2002) and normalized them to have zero mean and unit standard deviation. Next, the images were convolved with 80 filters (a combination of five frequencies: 0.04, 0.08, 0.15, 0.30, and 0.60 cpd (Girman et al., 1999), and 16 orientations encompassing the full circle), with the size of each filter adjusted to include two cycles. All filters were normalized to have zero mean and norm one. The resulting response matrix R was compared between the 15 possible pairs of shapes and we calculated discriminability D as:

$$D_{nm} = 1 - \mathrm{corr}\left(R_n(i, j, f), R_m(i, j, f)\right), \quad m > n$$

where indices n and m refer to one of the six images, and index f refers to one of the 80 filter response planes. Indices i and j refer to image pixels in each filter response plane. The 15-element vector of D-values was rescaled to fit between 0.5 and 1 as before and will be further referred to as V1Sim.

The pixel-based distance (PIX) and the simulated V1 distance (V1Sim) were highly correlated across all shape pairs (r = 0.899, p < 0.0001; N = 15 shape pairs), indicating that for this stimulus set the calculation of physical dissimilarity is not very sensitive to the particular method and parameters used.
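The two dissimilarity measures and their common rescaling can be illustrated with a minimal Python sketch. The 3 × 3 toy shapes and all function names are hypothetical; the Gaussian smoothing and the 80-filter bank of the V1Sim pipeline are not reproduced here, so `correlation_distance` is simply applied to whatever flattened response vectors are supplied.

```python
import math
from itertools import combinations

def pix_distance(a, b):
    """Number of pixels that differ between two equally sized binary images."""
    return sum(abs(pa - pb) for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def correlation_distance(x, y):
    """1 - Pearson correlation between two flattened response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return 1.0 - cov / (sx * sy)

def rescale(values):
    """Divide by the maximum, halve, and add 0.5 (largest value maps to 1)."""
    top = max(values)
    return [0.5 + 0.5 * v / top for v in values]

# Toy binary "shapes" (1 = white pixel), for illustration only
shapes = {
    "square": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "plus":   [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    "bar":    [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
# One distance per pair with m > n, in the order given by combinations()
raw = [pix_distance(shapes[n], shapes[m]) for n, m in combinations(shapes, 2)]
pix = rescale(raw)
print(raw, pix)
```

For binary images the term under the double sum reduces to an absolute difference, so the measure is a count of mismatching pixels.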

# Electrophysiological Experiment

The primary report of the neural data is provided by Vermaercke et al. (2014b). Here we focus upon one experiment ("Experiment 4: Selectivity for moving shapes") from that study which included the same stimuli as the behavioral study.

### Animal Preparation and Surgery

All experiments and procedures involving living animals were approved by the Ethical Committee of the university and were in accordance with the European Commission Directive of September 22nd 2010 (2010/63/EU). As also described by Vermaercke et al. (2014b), we performed microelectrode recordings in awake hybrid Fischer/Brown Norway F1 rats (n = 9 males), obtained from Harlan Laboratories, Inc. (Indianapolis, IN). Rats aged between 3 and 12 months, anesthetized using ketamine/xylazine, received a stereotaxically positioned 2 mm diameter circular craniotomy at −7.90 mm posterior and 3.45 mm lateral from bregma. In most animals (N = 6), a metal recording chamber with a base angle of 45° was placed on top of the craniotomy. A triangular head-post was fitted on top of bregma (see **Figure 2**). In three animals, V1 was entered orthogonally to the cortical surface, at the same location. This enabled us to record from all cortical layers in V1. A CT scan of the head confirmed the position of the recording chamber and craniotomy. Buprenorphine (50 µg/kg, i.p.) was administered postoperatively every 24 h as long as the rat showed signs of pain. When the animal was comfortable with being head restrained for at least 1 h and 30 min, we started with our recording sessions.

# Electrophysiological Recordings

As described by Vermaercke et al. (2014b), a Biela microdrive (385 µm per turn) containing a 5–10 MΩ impedance tungsten electrode (FHC) was placed on the recording chamber. For the diagonal recordings, the electrode was manually moved into the brain at an angle of 45° in steps of less than a quarter turn of the Biela drive, thereby entering five different visual areas: V1, LM, LI, LL, and TO. Action potentials were recorded extracellularly using a Cheetah system with headstage amplifier (Neuralynx, Bozeman, MT). The signal was filtered to retain the frequencies from 300 to 4000 Hz and digitized at 32556 Hz. Action potential spikes were recorded when they crossed a threshold set well above noise level. Recordings started from the brain surface and continued until we had penetrated through the five different areas and did not find visual responses anymore or the animal started to show signs of stress. During the first few penetrations, to obtain a basic idea of the retinotopy along the electrode track, we manually determined the unit's receptive field (RF) position every 200–400 µm using continually changing shapes or small drifting circular sinusoidal gratings that could be moved across the screen. Units were recorded in all five areas at different depths, with mainly upper layers sampled in V1 and lower layers in the other areas, with at least 200 µm between recording positions during a single session. Cortical depth within each area was reconstructed based on stained histological slices. For the experiment focusing on upper and lower layers in V1 using orthogonal penetrations, depth could simply be derived from the z-travel of the microdrive. Area boundaries were determined by the reflections of the retinotopic map, which were usually accompanied by obvious changes in elevation of the RF centers. A recording session generally lasted between 2 and 3 h. After removing the electrode, cleaning and capping the recording chamber, the animal was released from the head holder, and rewarded with water in its home cage (animals were water deprived prior to the recording session and only received small drops of water during recording). In each animal, we could generally perform between 10 and 15 penetrations over a period of several months. After the recording session, action potential waveforms were assigned to individual units using off-line clustering with KlustaKwik (for more details on waveform discrimination and signal quality, see Vermaercke et al., 2014b).

FIGURE 2 | Schematic drawing of the rat skull with locations of implanted headpost and recording chamber and layout of lateral visual areas. This figure shows how our implants were laid out on the rat skull. The headpost was placed over bregma to leave enough room for the recording chamber and ample skull surface to attach dental acrylic. We made the craniotomy at AP −7.90 and ML 3.45 and centered the recording chamber over these coordinates. The resulting electrode track (red arrow) would typically enter cortex in the binocular part of V1 and would subsequently traverse areas LM, LI, LL and TO.

### Visual Stimulation during Electrophysiology

Stimuli were presented to the right eye on a 24″ LCD monitor (Dell, Round Rock, Texas; 1280 × 768 pixels, frame rate = 60 Hz, mean luminance = 24 cd/m<sup>2</sup>, 102 × 68°) at a distance of 20.5 cm from the eye at an angle of 40° between the rostrocaudal axis and the normal of the screen. Visual stimuli were presented with custom-developed stimulation software using Matlab (The MathWorks, Natick, MA) and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). The setup was placed within a closed, dark cabinet.

The six shapes described above were presented at identical size and contrast around the optimal position within the RF. The mean width of the bounding box around each shape was 27°, ranging from 23 to 33° (stimulus size was matched to the behavioral experiment). Because other experiments (Montero and Jian, 1995) suggested that neural responses in head-restrained animals are more sustained and more selective when stimuli are moving, the shapes were translated around this optimal RF position along four differently oriented axes of movement, separated by 45° (horizontal, vertical, and the two diagonals). The moving stimulus was shown for 4 s and the movement along each axis took 1 s. The order of the four movement axes was randomized within each 4 s presentation. During the movement along one axis, the shape started at the center (optimal) position, moved 8° (77 pixels) away from this center position in 167 ms and then moved backwards to the opposite side of the center position in 333 ms. This movement was mirrored once to complete 1 s and then the movement seamlessly continued in a different orientation. These orientations were shuffled in each trial, resulting in 24 combinations of 6 shapes × 4 orders of orientations.

# Data Analysis

#### Behavior

We calculated the proportion of correct trials over the last four sessions for each pair and calculated 95% confidence intervals for each performance using the Matlab function binofit (shown as error bars in **Figure 1D**). We compared performance between shape pairs using a permutation analysis in which we shuffled the identity of correct and incorrect trials. For these shuffled vectors, we then computed the average proportion correct and recorded the difference between these averages for each combination of pairs over 10,000 iterations. When the actual difference was outside of the 95% confidence interval of this distribution, we declared the difference significant.
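The two behavioral statistics can be sketched as follows. This is an illustrative Python sketch, not the authors' Matlab code: `clopper_pearson` computes an exact binomial interval by bisection (the interval type that binofit is documented to return), and `permutation_test` assumes that "shuffling the identity of correct and incorrect trials" means pooling the trial outcomes of two pairs and reassigning them at random. All function names and the example outcomes are hypothetical.

```python
import math
import random


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact binomial confidence interval for k correct trials out of n."""
    def bisect(f, target):
        # f must be increasing on [0, 1]; returns p such that f(p) ~= target
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2; upper bound: P(X <= k | p) = alpha / 2
    lower = 0.0 if k == 0 else bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else bisect(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


def permutation_test(outcomes_a, outcomes_b, n_iter=10000, seed=0):
    """Compare proportion correct between two shape pairs (1 = correct, 0 = error)."""
    rng = random.Random(seed)
    n_a = len(outcomes_a)
    observed = sum(outcomes_a) / n_a - sum(outcomes_b) / len(outcomes_b)
    pooled = list(outcomes_a) + list(outcomes_b)
    null = []
    for _ in range(n_iter):
        rng.shuffle(pooled)  # assumption: outcomes are exchangeable under the null
        null.append(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
    null.sort()
    interval = (null[int(0.025 * n_iter)], null[int(0.975 * n_iter) - 1])
    significant = not (interval[0] <= observed <= interval[1])
    return observed, interval, significant


# Hypothetical outcomes for two shape pairs, 100 trials each
easy_pair = [1] * 95 + [0] * 5
hard_pair = [1] * 55 + [0] * 45
difference, null_interval, is_significant = permutation_test(easy_pair, hard_pair, n_iter=2000)
```

The difference is declared significant when it falls outside the central 95% of the shuffled distribution, mirroring the criterion described above.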

# Neural Responses

For each neuron, we calculated the number of spikes elicited by each shape per trial, averaged across the 4 s stimulus presentation time. Then, we subtracted baseline activity, which was calculated as the average number of spikes in a 2 s interval preceding each stimulus presentation. Units were included when they showed a net response above 2 Hz for at least one of the shapes (i.e., nonresponsive units were excluded; similar results were obtained if they were included).
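A minimal sketch of the baseline subtraction and the inclusion criterion, assuming spike counts are stored per shape and per trial. The dictionary layout, the function names, and the example counts are hypothetical; only the 4 s stimulus window, the 2 s baseline window, and the 2 Hz threshold come from the text.

```python
def net_rates(stim_counts, base_counts, stim_dur=4.0, base_dur=2.0):
    """Mean baseline-subtracted firing rate (Hz) for each shape.

    stim_counts / base_counts: dict mapping shape -> list of spike counts per trial.
    """
    net = {}
    for shape, counts in stim_counts.items():
        evoked = sum(counts) / len(counts) / stim_dur
        baseline = sum(base_counts[shape]) / len(base_counts[shape]) / base_dur
        net[shape] = evoked - baseline
    return net


def is_responsive(net, threshold_hz=2.0):
    """A unit is kept if at least one shape drives a net response above threshold."""
    return max(net.values()) > threshold_hz


# Hypothetical unit with two shapes and two trials per shape
stim = {"shape_1": [40, 44], "shape_2": [12, 8]}
base = {"shape_1": [4, 4], "shape_2": [4, 4]}
unit_net = net_rates(stim, base)
unit_ok = is_responsive(unit_net)
```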

To determine how well a population of neurons can discriminate between different stimuli, we implemented a linear classifier read-out similar to the one used by Rust and Dicarlo (2010) (see also Hung et al., 2005; Li et al., 2007; Vangeneugden et al., 2011). This read-out scheme is one possible way to assess the amount of information a population of units could provide to a downstream neuron, assuming this neuron applies a nonlinear operation on the summed inputs. Starting with the spike count responses of a population of N neurons to P presentations of M images, each presentation of an image resulted in a response vector x with a dimensionality of N by 1, where repeated presentations (trials) of the same image can be envisioned as forming a cloud in an N-dimensional space. Linear support vector machines (SVM) were trained and tested in pair-wise classification for each possible pair of shapes (6 shapes result in 15 unique pairs). A subset of the population vectors (trials) collected for both shapes was used to train the classifier. Performance was measured as the proportion of correct classification decisions for the remaining vectors/trials not used for training (i.e., standard cross-validation). The penalty parameter C was set to 0.5 (as in Rust and Dicarlo, 2010) for every analysis.
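The structure of this pair-wise, cross-validated read-out can be sketched as below. To keep the example dependency-free, the linear SVM is replaced by a simpler stand-in linear classifier (projection onto the difference of the two class means); it is not the SVM with C = 0.5 used in the study. The function names, the synthetic responses, and the train/test split sizes are hypothetical.

```python
import itertools
import random


def mean_vector(trials):
    """Element-wise mean over a list of equal-length response vectors."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def fit_linear_readout(trials_a, trials_b):
    """Stand-in for the linear SVM: weights = difference of class means."""
    mu_a, mu_b = mean_vector(trials_a), mean_vector(trials_b)
    w = [a - b for a, b in zip(mu_a, mu_b)]
    bias = -sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu_a, mu_b))
    return w, bias


def predict_is_a(model, x):
    w, bias = model
    return sum(wi * xi for wi, xi in zip(w, x)) + bias > 0


def pairwise_accuracy(responses, n_train, seed=0):
    """responses: dict shape -> list of trial vectors (one entry per neuron)."""
    rng = random.Random(seed)
    scores = {}
    for a, b in itertools.combinations(sorted(responses), 2):
        trials_a, trials_b = responses[a][:], responses[b][:]
        rng.shuffle(trials_a)
        rng.shuffle(trials_b)
        model = fit_linear_readout(trials_a[:n_train], trials_b[:n_train])
        # Held-out trials only: these were never seen during training
        held_out = [(x, True) for x in trials_a[n_train:]]
        held_out += [(x, False) for x in trials_b[n_train:]]
        correct = sum(predict_is_a(model, x) == label for x, label in held_out)
        scores[(a, b)] = correct / len(held_out)
    return scores


# Synthetic population: 6 shapes, 5 neurons, 12 trials, well-separated means
data_rng = random.Random(1)
demo = {
    s: [[3.0 * s + data_rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(12)]
    for s in range(6)
}
demo_scores = pairwise_accuracy(demo, n_train=8)
```

With six shapes the loop visits the 15 unique pairs mentioned above; swapping in an actual linear SVM would only change `fit_linear_readout` and `predict_is_a`.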

For correlations with behavior, we retained the data for the 6 shape pairs, which were also used in the behavioral experiment.

#### Reliability and Significance of SVM Performance

To equalize the number of cells and trials used across visual areas, we applied a resampling procedure. On every new iteration, we selected a new subset of cells (without replacement) with the number of cells equal to the lowest number of cells recorded in a single visual area, and a random subset of trials (without replacement). We averaged over 100 resampling iterations to obtain confidence intervals for the performance. We also computed chance performance by repeating the same analysis 100 times using shuffled condition labels (thus 100 times 100 resampling iterations).
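One resampling iteration could look like the following sketch. The per-area cell counts are the numbers of retained units reported in the Results; the trial counts (12 available, 10 used) and all function names are assumptions made for illustration.

```python
import random


def subsample_population(n_cells_by_area, n_trials_available, n_trials_used, rng):
    """Draw equal-sized random subsets of cells per area plus a random subset of trials."""
    n_min = min(n_cells_by_area.values())
    cells = {
        area: sorted(rng.sample(range(n), n_min))  # sampling without replacement
        for area, n in n_cells_by_area.items()
    }
    trials = sorted(rng.sample(range(n_trials_available), n_trials_used))
    return cells, trials


def shuffled_labels(labels, rng):
    """Shuffled condition labels, used to estimate chance performance."""
    out = list(labels)
    rng.shuffle(out)
    return out


resample_rng = random.Random(0)
retained_units = {"V1": 88, "LM": 63, "LI": 131, "LL": 68, "TO": 63}
cell_idx, trial_idx = subsample_population(retained_units, 12, 10, resample_rng)
```

Repeating this draw 100 times and classifying on each draw gives the distribution from which the mean and confidence interval are taken; repeating the whole procedure with `shuffled_labels` gives the chance distribution.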

# Chi-Square and Permutation Analysis

We used a chi-square statistic to assess how well neural classification performance for the six pairs matched either physical dissimilarity or behavioral performance. We used the formula:

$$\text{ChiSq} = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where the index i indicates the ith stimulus pair of n pairs. O represents the observed values, in our case the classifier performance based on neural population responses to the ith pair. E indicates the expected values, in our study either physical dissimilarity or behavioral discriminability.

We employed permutation statistics to test the null hypothesis that the matching of shapes is not important. To destroy all pairwise relations, we shuffled the vector of observed values (O). We excluded shuffles that had one or more elements in their original position (the pattern of results is identical without this restriction). We tested for a significant dependency between both sets of six performances by shuffling the O-values over all unique permutations (N = 265 after selection, out of 720 total). P-values were calculated as the proportion of permuted values that were more extreme than the actually observed value.
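The count of 265 follows from combinatorics: of the 720 permutations of six elements, exactly 265 leave no element in its original position. The sketch below enumerates them and computes a permutation p-value for the chi-square statistic. The `neural` and `behavior` vectors are hypothetical illustration values, not data from the study, and treating smaller chi-square values as "more extreme" (a closer match than expected by chance) is an assumption.

```python
import itertools


def chi_square(observed, expected):
    """Sum over pairs of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def derangements(n):
    """All permutations of range(n) with no element left in its original position."""
    return [
        perm
        for perm in itertools.permutations(range(n))
        if all(perm[i] != i for i in range(n))
    ]


def permutation_p_value(observed, expected):
    actual = chi_square(observed, expected)
    null = [
        chi_square([observed[j] for j in perm], expected)
        for perm in derangements(len(observed))
    ]
    # Assumption: "more extreme" means a chi-square at least as small as the actual one
    p = sum(v <= actual for v in null) / len(null)
    return actual, p, len(null)


# Hypothetical performances for six shape pairs
neural = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55]
behavior = [0.97, 0.88, 0.81, 0.72, 0.58, 0.52]
chi2_value, p_value, n_permutations = permutation_p_value(neural, behavior)
```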

# Permutation Analysis of Correlation Values

We used similar procedures as described in the previous section to analyze the correlation data.

# Data Analysis to Compare Neural Population Discriminability with Behavioral Difficulty

Based upon earlier work in humans and other primates (Dicarlo and Cox, 2007; Op de Beeck et al., 2008), we expect a high correlation between V1 discriminability and physical, pixel-based (or V1-simulated) distances between stimuli but not between V1 and behavioral discriminability. At the same time, we expect a high correspondence between TO and behavioral discriminability, but not between TO and physical distances. To test this prediction, we constructed the transformation index H that captures this relation:

$$H = \left[ Z(\text{TO}, \text{Behavior}) - Z(\text{TO}, \text{Pix}) \right] - \left[ Z(\text{V1}, \text{Behavior}) - Z(\text{V1}, \text{Pix}) \right]$$

where TO and V1 refer to the neural population discriminability of the 6 shape pairs in area TO and V1, respectively. Behavior corresponds to animal performance on these pairs; and Pix refers to the pixel-wise difference between the shapes of these pairs. The operator Z corresponds to the sample Pearson correlation between both performances after Fisher-Z transformation. High values of the index would provide support for our assumption that there is a transition from pixel-related discriminability in V1 to behavior-related discriminability in TO. This index was also calculated for permuted data (same procedure as above) and we used the 95th percentile value as the threshold for significance of the obtained index.

# Results

# Behavioral Shape Discrimination

One specific hallmark of the ventral visual pathway in primates is that neural responses and neural discriminability are more closely related to behavioral performance in higher-level regions than in, e.g., V1, where we expect more correspondence with measures of local pixel-level differences.

To test whether this is true in rodents as well, we first obtained behavioral data from 12 rats about the relative discriminability of different shape pairs. These rats were trained in a visual water task (see **Figure 1A**) to discriminate between different shape pairs, one shape pair per rat. Six shape pairs of the 15 possible pairs were included and two rats were trained per shape pair (see **Figure 1B**), one in each batch of six rats. We equated the length of training across rats/pairs. Based upon the primate literature, we would expect that those shape pairs associated with the best behavioral discrimination performance at the end of training would also be associated with a higher neural discriminability in the higher areas of the identified rat visual pathway, but not in area V1. Given that animals have to find the target shape while moving around in a water maze, it would also be unlikely that a simple V1 representation, lacking position invariance, would be sufficient to drive the animals' decision process.

After 17 and 16 behavioral training sessions for the first and second batch respectively, average performance across all rats reached the criterion of 70% correct. We noticed clear differences in performance between the shape pairs (**Figure 1C**). A few shape pairs were associated with performance close to 100% correct, while two other shape pairs were associated with performance close to the chance level of 50%. This variation in performance across shape pairs generalized from the first to the second batch of six animals (blue and red bars in **Figure 1D**): the variation in performance across shape pairs was highly correlated between the two batches (r = 0.92, P = 0.009, N = 6 shape pairs). To assess whether the ordering was important, we pooled the data for the two animals per pair and performed a permutation test to compare the average performance between all possible combinations of shape pairs (see Methods). We found all differences to be significant, except for those between pairs 2–3, 3–4, and 5–6. We conclude that the order matters for most pairs and that correlations based on these data are meaningful.

With the data of just the first batch of rats, it would have been conceivable that the differences between shape pairs were related to interindividual differences between the rats, given that each shape pair was tested in a different animal. However, the near-perfect replication of the across-pair variation in performance in the second batch of animals argues against this alternative hypothesis. This alternative hypothesis is also not consistent with the fact that all rats had shown a very similar performance in the preceding shaping phase (mean = 0.91, SD = 0.06, N = 12), in which rats were trained in the general task layout using full-field white and black stimuli. To quantify this, we paired rats that would receive the same shape pair in the next phase, calculated their average performance during the last four shaping sessions and performed a paired t-test [t(5) = 0.4008; P = 0.7051, N = 6]. The small differences in performance were also not correlated (r = 0.05, P = 0.9245, N = 6), excluding any preexisting similarities between the rats that would explain the striking similarity in performance for the shape pairs. This suggests that the two batches start out as fairly homogeneous groups that react in a consistent way when confronted with different stimulus pairs.

We quantified the average time animals needed to make a decision; this includes swim time from the start of the trial until the animal passed the divider. Median reaction times were 5.73 and 5.47 s for the two batches [N = 1045 and 1145 trials; Q25–75 = (4.94, 9.30) and (4.80, 6.76) s]. These values are comparable to the presentation duration used for the electrophysiological recordings.

# Neural Discrimination Performance

The data described here form a subset of a larger dataset reported earlier (Vermaercke et al., 2014b). This earlier study characterized responses of single neurons and populations in rat primary visual cortex (V1) and 4 extrastriate areas (LM, LI, LL, and newly found area TO). We focus here on the extent to which each of the five defined cortical areas allows the discrimination of the six shape pairs. While showing these shape stimuli, we recorded from a total of 631 (114, 104, 166, 107, 140 for areas V1, LM, LI, LL, and TO, respectively) neurons. After selecting responsive neurons to which each shape had been presented at least 12 times, we retained 413 single units (88, 63, 131, 68, and 63; this yields 63 neurons per SVM subsampling). The percentage of responsive neurons was 77, 61, 79, 64, and 45 for areas V1, LM, LI, LL, and TO, respectively. Between neurons there was a large variation in exact receptive field position (Vermaercke et al., 2014b).
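The unit counts in this paragraph are internally consistent, which can be checked with a few lines of arithmetic. The variable names are illustrative; all numbers are those reported above.

```python
areas = ["V1", "LM", "LI", "LL", "TO"]
recorded = [114, 104, 166, 107, 140]  # neurons recorded per area
retained = [88, 63, 131, 68, 63]      # responsive units with at least 12 presentations

total_recorded = sum(recorded)
total_retained = sum(retained)
# Percentage of recorded neurons retained per area, rounded to whole percent
percent_retained = [round(100 * k / n) for k, n in zip(retained, recorded)]
# The smallest area sets the population size used in each SVM subsampling
n_per_resample = min(retained)
```

The totals come out at 631 and 413, the percentages at 77, 61, 79, 64, and 45, and the smallest retained population (LM and TO) gives the 63 neurons used per subsampling.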

Averaged across the six shape pairs, we found reasonable and strongly significant population decoding performance in every area [**Figure 3A**; V1 = 92.33%, t(5) = 15.7517, P < 0.0001; LM = 82.54%, t(5) = 15.7517, P < 0.0001; LI = 83.53%, t(5) = 15.7517, P < 0.0001; LL = 78.00%, t(5) = 15.7517, P < 0.0001; TO = 71.99%, t(5) = 15.7517, P < 0.0001; error bars show SEMs]. When performing the permutation analysis using shuffled trial labels, we obtained chance-level estimates, of which the 95th percentile is shown as red horizontal bars in **Figure 3A**; all corresponding p-values for each area fall below the 0.0001 level.

We also tested whether differences between areas reached significance by doing a similar permutation analysis using shuffled area labels. We did this for all pair-wise comparisons and found classification performances in all areas to be significantly different, except for the difference between LM and LI. All these analyses gave similar results when performed after matching the average firing rates between areas (data not shown, see Vermaercke et al., 2014b for details on a similar matching procedure). On a more detailed level of analysis, we find that performance for all shape pairs is fairly high in V1, but shows a differential pattern in higher areas (see **Figure 3B**). Discrimination performance for four out of six shape pairs decreases slightly over areas; for the two other pairs, the decrease in performance is stronger.

Because we are interested in correlating neural responses to behavioral performance, we performed a control analysis to rule out that correlations with area V1 would be distorted/diminished by a ceiling effect caused by a generally high neural discriminability. To control for this, we included a progressively lower number of cells (N = 63, 40, 25, 10) for each SVM resampling. This brings down the average performance level, which would allow a pattern that might have been compressed by overall high performance to reappear. The curves in **Figure 3C** confirmed that this manipulation was effective: SVM models based on a smaller number of cells show a lower overall performance. When examining the shape of the curve, we find only minor changes in the relative differences between shape pairs.

# Correlations between Pixel-Based Differences, Neural Responses, and Behavior

We combined data from the behavior and electrophysiology experiment to determine which of the cortical areas are more likely to underlie shape discrimination. We also included a measure of pixel-wise differences, which captures low-level similarity of the shapes [see Methods; responses of a simulated population of V1 neurons (V1Sim) yielded highly similar results]. The correlation between PIX and behavioral performance was non-significant (r = 0.110, P = 0.84, N = 6 shape pairs), which potentially allows us to find differential correspondences between the neural responses and either physical properties or the behavioral output of the animal.

**Figure 4A** shows scatter plots of the neural discriminability against either the pixel-based differences (top row) or the behavioral discrimination performance (bottom row). For physical dissimilarity, we pooled both measures, PIX and V1Sim, because they were highly correlated. For the behavioral results, we pooled the performances of both animals that had to learn the same pair (error bars are calculated over both animals; individual error bars are shown in **Figure 1D**). On a qualitative level, the dots seem to be close to the identity line in V1 for physical measure and diverge in higher areas. The opposite trend is seen for BEH where correspondence improves drastically toward higher areas.

To quantify these effects, we use two separate measures of correspondence: chi-square (see Methods) and Pearson correlations. Both are presented with permutation statistics (see Methods).

FIGURE 3 (caption, continued) | [...] chance toward higher areas. (C) Results for the control analysis in which we reduced the number of units included in individual sub-samplings of the SVM classifier. Average performance decreases with lower number of cells included, but the overall pattern of classification is preserved. The order of shape pairs corresponds to that in Figures 1B,D.

**Figure 4B** shows chi-square values for both PIX (black bars, V1 = 0.08 P < 0.0001, LM = 0.16 P = 0.4300, LI = 0.26 P = 0.3429, LL = 0.35 P = 0.3755, TO = 0.39 P = 0.1858) and BEH (gray bars, V1 = 0.75 P = 0.1518, LM = 0.23 P = 0.0076, LI = 0.21 P = 0.0220, LL = 0.14 P = 0.0336, TO = 0.08 P = 0.0308). The red vertical lines indicate the distribution of values obtained through the permutation analysis (see Methods). If a bar is outside of the overlaid red line, the observed value is significant and we can reject the null hypothesis that the order of pairs is not important. This shows us that neural response patterns in V1 and PIX are more similar than expected by chance. The neural responses in the four extrastriate areas show a significant correspondence with BEH.

In **Figure 4C** we show the correlation values for both PIX (black bars, V1 r = 0.88, P = 0.02; LM r = −0.09, P = 0.86; LI r = −0.10, P = 0.85; LL r = −0.07, P = 0.89; TO r = −0.27, P = 0.60) and BEH (gray bars, V1 r = 0.44, P = 0.38; LM r = 0.91, P = 0.01; LI r = 0.85, P = 0.03; LL r = 0.84, P = 0.04; TO r = 0.83, P = 0.04); again, the red vertical lines indicate the distribution of values obtained through the permutation analysis (see Methods). The correlation with PIX is only significant for neural data in V1, while BEH correlates significantly with the response patterns obtained in extrastriate areas. Here we report correlations including only the six shape pairs used in the behavioral experiment. The correlations with pixel-based differences show a very similar pattern when calculated using data from all 15 possible shape pairs (these correlations are reported in Vermaercke et al., 2014b).

Thus, chi-square values and correlations show a consistent effect for TO compared to V1, with a strong correlation between neural discriminability and behavioral performance in TO and no correlation in V1. For chi-square values the change from V1 to TO seems to occur gradually, with intermediate results in the intermediate brain regions, while for correlations all non-V1 areas have a strong correlation with behavior.

We did not make a priori predictions about the nature of shape representations in intermediate areas along the pathway. Predictions were very clear-cut, however, for how the representation of shape should differ between the two extreme areas: we expected V1 neural discriminability to correlate well with pixel-based stimulus differences, and TO neural discriminability to correlate well with behavioral performance. We constructed a "transformation index" that captures this shift in the nature of shape representations in one value (see Methods). This index yields a single number that tells us how the similarity of neural responses to stimuli (PIX) and to behavior (BEH) changes from V1 to TO. For chi-square, this value was −0.9763, outside of the range (−0.0399, 0.0434), with significance p < 0.001. For correlations, the empirically observed transformation index was 2.37, outside of the range (−2.1930, 1.7294), with significance p < 0.05. Thus, the prediction of a transformation in how shape is represented from V1 to TO was confirmed by the data.
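As a worked check, the transformation index can be recomputed from the rounded values reported for **Figures 4B,C**. The sketch assumes that the correlation version applies the Fisher-Z transform (the inverse hyperbolic tangent) to each Pearson r, as described in the Methods, and that the chi-square version applies the same bracketed differences to the raw chi-square values; the latter is an assumption, since the Methods define the index only for correlations. Function names are illustrative.

```python
import math


def fisher_z(r):
    """Fisher-Z transform of a Pearson correlation."""
    return math.atanh(r)


def transformation_index(to_beh, to_pix, v1_beh, v1_pix, transform=lambda x: x):
    """H = [f(TO, BEH) - f(TO, PIX)] - [f(V1, BEH) - f(V1, PIX)]."""
    return (transform(to_beh) - transform(to_pix)) - (
        transform(v1_beh) - transform(v1_pix)
    )


# Rounded correlations reported for Figure 4C
h_corr = transformation_index(0.83, -0.27, 0.44, 0.88, transform=fisher_z)

# Rounded chi-square values reported for Figure 4B (no transform applied)
h_chi = transformation_index(0.08, 0.39, 0.75, 0.08)
```

With these rounded inputs, `h_corr` comes out at about 2.37 and `h_chi` at about −0.98, in line with the reported 2.37 and −0.9763; the small gap in the chi-square case is what one would expect from rounding of the inputs.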

#### Fine Transition of Representations in V1

We performed a similar analysis with the V1 data from the orthogonal penetrations in which we distinguished between upper and lower layers. This is a relevant additional dataset because in the diagonal recordings the V1 data are biased toward the upper layers (see Vermaercke et al., 2014b). Thus, we can consider upper-layer recordings in the orthogonal penetrations as a replication attempt of the results from the diagonal penetrations, while the lower-layer recordings provide new data to test whether there is already a transformation of shape selectivity within V1.

We classified all units beyond a depth of 500 µm as belonging to the lower layers (for more information, see Vermaercke et al., 2014b). We recorded in total from 131 neurons in three animals (V1 Upper or V1U: 61, V1 Lower or V1L: 70). After selection based on responsiveness and number of trials, we retained 44 units in V1U and 57 units in V1L. Using these new (non-overlapping) V1 data we find a finer transition within V1. In terms of average classifier performance for the six shape pairs, both subdivisions of V1 achieve high scores (V1U = 94.11%, V1L = 88.78%, see **Figure 5B**; the data for V1 and LM from the previous section are replotted for reference, colors match those in **Figure 3A**). When we run our permutation analysis on the scatterplot data shown in **Figure 5A**, we find that V1 and V1U show significant chi-square values for PIX (V1: P = 0.0138, V1U: P = 0.0055, see **Figure 5C**), while V1L and LM show chi-square values for BEH that are significant or close to significance (V1L: P = 0.0507, LM: P = 0.0151).

The correlations between neural responses in V1 and V1U and PIX show a similar pattern (V1: r = 0.8821, P = 0.0140, V1U: r = 0.8367, P = 0.0062, see **Figure 5D**), and V1L and LM relate more to BEH (V1L: r = 0.7982, P = 0.0202, LM: r = 0.9072, P = 0.0106). Parametric tests show that the correlation between V1L and BEH is not significant (P = 0.057), which contradicts the result from the permutation statistic, so it would be prudent to say that the lower layers in V1 may form an intermediate step between upper layers in V1 and LM.

FIGURE 5 | Summary of neural data collected in upper and lower layers of V1. (A) Scatter plots showing neural discriminability of six shape pairs in upper and lower layers of V1 relative to physical dissimilarity (average between PIX and V1Sim measures; top row) and behavioral performance (average for two rats; bottom row). Results from Figure 4 of the V1 and LM recordings from the diagonal penetrations are re-plotted here for visual comparison. Error bars indicate SEM over both dissimilarity measures in top row, SEM over both animals in bottom row and SEM for neural data in both. (B) Average SVM classification performance for the six pairs in upper layers of V1 (V1U, shown in black) and lower layers of V1 (V1L, shown in black). We also show the data for V1 and LM (gray bars) shown in Figure 3A, for comparison. Overall performance is high in all four areas. Error bars show SEM for six shape pairs. (C) Chi-square values for the scatterplots shown in (A). V1U results are very similar to the results obtained in V1 in the diagonal recordings. V1L results fall in between V1 and LM from the diagonal recordings. Red vertical lines indicate the random distribution obtained through permutation analysis. Stars indicate significant chi-square values at the 0.05 level. (D) The correlations for the data shown in (A). Again, V1U is more comparable to the data we collected in V1 during diagonal recordings and V1L forms an intermediate step in between V1 and LM. Red vertical lines indicate the random distribution obtained through permutation analysis. Stars indicate significant correlations at the 0.05 level.

The pattern that emerges is that the representation in the upper layers is most similar to PIX/V1Sim while lower layers are already shifted partially toward the extrastriate regions (see **Figure 5A**). One interpretation could be that upper layers receive information from thalamus and after initial processing, transmit it further to downstream areas. After this first step of information reformatting, neural discriminability in V1L starts to resemble behavioral performance and this becomes even clearer in area LM.

# Discussion

We obtained a behavioral measure of shape similarity from two independent groups of rats. We also recorded neural responses to individual stimuli in yet another group of naïve rats. Taken together, both datasets allowed us to determine which cortical area is most likely to underlie behavior. As expected, primary visual cortex encodes the stimuli in terms of simple features, which is well captured by pixel similarity and convolution-type models. Higher areas show more similarity to behavioral responses, with the highest area, TO, showing the best fit. These results indicate that visual information is transformed from representing simple features to a representation that is used to drive behavior, a process reminiscent of the ventral stream in non-human primates (Op de Beeck et al., 2001; Dicarlo et al., 2012). As reported by Vermaercke et al. (2014b), neural responses in area TO also tend to be most robust to changes in stimulus position, which would make these responses more reliable for use in behavioral decision making. At least to some degree, invariance is needed to complete a swim trial, so performance would not be expected to depend purely on physical differences between stimuli. The representation of a shape in V1 is highly dependent on its retinal position, which changes drastically during swimming. Basing performance on the population response in V1 would be sub-optimal during a swim task, even though V1 is better able to reliably encode patterns in the outside world. At least in our untrained animals, the neural data show that even though responses to shapes are reduced in higher areas, the representation becomes more informative for the task, as populations of neurons in these areas prefer the same shape in different positions.

The previously reported work of Vermaercke et al. (2014b) examined many properties of neurons in multiple areas along a diagonal track through lateral visual cortex. Based on retinotopy, latency and, to some extent, receptive field size, they defined five different areas. Using neural responses elicited by the six shapes, they were able to show that the representation of information changes over areas. Moreover, by presenting stimuli at different positions within the receptive field, they found evidence for increasing generalization performance for the same shape at the other position, indicative of position tolerance. Taken together, these data indicate that the five areas are part of a hierarchical network that may be involved in shape processing. The current study focuses on a subset of the shape pairs to investigate how well naïve animals would be able to differentiate between them at the behavioral level. By quantifying the representations in each of the areas, we were able to pinpoint some of the transformations the visual information undergoes. There appears to be a sharp transition between areas V1 and LM; however, as shown in Table 1 of Vermaercke et al. (2014b), the pattern of transition between areas depends on what feature is being investigated. Some features show a stepwise pattern (not always V1 vs. other areas), while other properties (e.g., orientation tuning) change gradually over areas.

The present study is obviously limited by the simplicity of the stimuli used, and the low number of different stimuli. Future studies should be conducted with more stimuli and with more stimulus pairs. This would require a more automated setup, unlike the labor-intensive visual water maze used in the present study. Typically, rats show relatively fast learning curves and high accuracy rates in this visual discrimination water maze, more so than often obtained in tasks using liquid or food rewards (Zoccolan et al., 2009; Meier et al., 2011; Tafazoli et al., 2012; Vermaercke and Op de Beeck, 2012; Alemi-Neissi et al., 2013). The level of motivation might be considerably higher when animals have to escape from a water tank. Despite these benefits, the visual-water task includes a low number of trials per session, and each trial has to be started manually by the experimenter. This limits the number of stimuli for which reliable performance estimates can be obtained.

Future studies could make use of parametric stimulus sets that are constructed to test specific predictions on how rats process visual objects (e.g., rotated views of objects, morphs between two known prototypes, different classes of objects etc.). Here we correlated behavioral data with neurophysiological recordings in other, naïve animals. Ideally, future studies would perform the neural recordings during the execution of the behavioral task so that direct and more causal relations between neural responses and behavioral outcomes can be investigated.

As a follow-up to the present study, we continued training with the first batch so that all animals eventually were trained in all six pairs, followed by a recall phase in which performance for all pairs was checked. The data from this further testing are hard to interpret because of interference between the different shape pairs (e.g., already higher than chance performance on the first day of a new pair), but in the present context it is relevant that average performance in this recall phase was well above 70% correct for each animal (75.86, 78.82, 83.30, 86.08, 79.68%). For one pair (+ vs. H) performance was still rather low (66.34%), suggesting that it might be extremely hard for the rats to disentangle the representations of both stimuli. Nevertheless, discrimination performance was above chance even for this pair, indicating that all the shape pairs can eventually be learned by the animals, most likely even up to close to 100% correct with long enough training. For further work it would be interesting to investigate the neural representation in these areas during and after training. Using chronically implanted electrodes or two-photon imaging, it should even be possible to monitor neural populations and to characterize how the representations in the different cortical areas change due to the training. Causal manipulations within the same animal (e.g., lesions, optogenetics or pharmacology) will be crucial in shedding light on the importance of each visual area for shape processing and behavior.

# Acknowledgments

This work was funded by the Research Council Leuven (grant GOA/12/008), and the Fund for Scientific Research FWO-Flanders (grants G.0819.11 and G.0A39.13). BV was supported by a postdoctoral fellowship from the Research Council Leuven and is currently a postdoctoral fellow of FWO-Flanders. The authors thank Vincent Smaers for his assistance during the collection of the behavioral data.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Vermaercke, Van Den Bergh, Gerich and Op de Beeck. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Nicholas J. Priebe<sup>1</sup> and Aaron W. McGee<sup>2</sup>\***

<sup>1</sup> Section of Neurobiology, School of Biological Sciences, University of Texas at Austin, Austin, TX, USA

<sup>2</sup> Developmental Neuroscience Program, Saban Research Institute, Children's Hospital of Los Angeles, Department of Pediatrics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

#### **Edited by:**

Andrea Benucci, RIKEN Brain Science Institute, Japan

#### **Reviewed by:**

Tim Murphy, The University of British Columbia, Canada Dennis Eckmeier, Cold Spring Harbor Laboratory, USA Prakash Kara, Medical University of South Carolina, USA

#### **\*Correspondence:**

Aaron W. McGee, Developmental Neuroscience Program, Saban Research Institute, Children's Hospital of Los Angeles, Department of Pediatrics, Keck School of Medicine, University of Southern California, 4650 W Sunset Blvd, Los Angeles, CA 90027, USA e-mail: amcgee@usc.edu

Genetic programs controlling ontogeny drive many of the essential connectivity patterns within the brain. Yet it is activity, derived from the experience of interacting with the world, that sculpts the precise circuitry of the central nervous system. Such experience-dependent plasticity has been observed throughout the brain but has been most extensively studied in the neocortex. A prime example of this refinement of neural circuitry is found in primary visual cortex (V1), where functional connectivity changes have been observed both during development and in adulthood. The mouse visual system has become a predominant model for investigating the principles that underlie experience-dependent plasticity, given the general conservation of visual neural circuitry across mammals as well as the powerful tools and techniques recently developed for use in rodent. The genetic tractability of mice has permitted the identification of signaling pathways that translate experience-driven activity patterns into changes in circuitry. Further, the accessibility of visual cortex has allowed neural activity to be manipulated with optogenetics and observed with genetically-encoded calcium sensors. Consequently, mouse visual cortex has become one of the dominant platforms to study experience-dependent plasticity.

**Keywords: ocular dominance plasticity, visual cortex, binocularity, inhibition, development**

The establishment of normal primary visual cortex (V1) binocularity and depth perception (stereopsis) in humans depends critically on visual experience, particularly during development (McKee et al., 2003). Disrupting concordant vision between both eyes early in life generates amblyopia, a visual deficiency that cannot be explained by alterations in retinal function of the affected eye (Hubel and Wiesel, 1965; Lepard, 1975; Kiorpes et al., 1998). Amblyopia can arise either from a difference in depth of focus between the two eyes (anisometropia) or from the eyes not properly moving in parallel (strabismus), and it is thought to occur in 1–5% of the human population (Webber and Wood, 2005). Amblyopia results in a number of deficits in spatial vision, including reduced visual acuity and depth perception (Levi et al., 1979; Harwerth and Levi, 1983; McKee et al., 2003). While patching the non-affected eye is the current standard of care for improving function of the affected eye, this approach is less effective after adolescence, a time in life characterized by the close of what is termed the critical period for brain circuit plasticity. For each sensory system there exists a developmental period in which experience has a remarkable role in shaping cortical connectivity and beyond which this influence is mostly lost (Simons and Land, 1987; Lendvai et al., 2000; Zhang et al., 2002; de Villers-Sidani et al., 2007; Poo and Isaacson, 2007). Understanding the mechanisms that both govern and drive experience-dependent plasticity during the critical period, as well as those that control the timing of the critical period, could provide therapeutic interventions to improve recovery from amblyopia and other neurodevelopmental disorders.

### **CONSERVATION OF NEURAL CIRCUITRY FOR VISION**

The functional convergence of right and left eye information occurs in V1; binocular integration within V1 has become the primary platform for studying experience-dependent plasticity. Normally, the information from the two eyes is combined in V1 to generate a three-dimensional representation of the visual world: because the two eyes are horizontally offset they signal distinct perspectives on the visual scene, and those distinct signals are used to compute the distances of objects in the world. The monocular signals from the two retinae leave the eye via the optic nerve and partially cross at the optic chiasm. In mammals, the axons of neurons located in the nasal portion of the retina cross the midline in the optic chiasm and project to subcortical targets on the contralateral side via the optic tract. Neurons from the temporal portion of the retina, in contrast, project to ipsilateral subcortical targets. This specific crossing pattern ensures that animals with frontally-positioned eyes (e.g., cat, ferret, primate) will have signals from both eyes for corresponding regions of the retinae. Importantly, the projections from the contralateral and ipsilateral eyes innervate separate sections of their subcortical targets. For example, retinal ganglion cell axons that innervate the visual thalamus, the lateral geniculate nucleus (LGN), provide inputs to separate portions of the LGN, and thus the LGN relay cells that project to V1 are monocular. The binocularity observed in V1 is therefore primarily due to a mixing of monocular inputs from the LGN relay cells.

Because experience drives similar changes in both the functional response properties of cortical neurons and the anatomical projections to visual cortex from the thalamus across mammals (Antonini and Stryker, 1993; Antonini et al., 1999), the mouse, with its ease of accessibility, genetic tractability, and compendium of available tools, techniques, and resources, has become a standard system for investigating both the governing principles and mechanisms necessary for activity-dependent plasticity. That said, while the mouse has a number of advantages as a model system, it is important to note that in addition to similarities, there also exist large differences between rodents and other mammals that have been studied previously. One of the primary differences is the positioning of the two eyes (**Figure 1**). In the rodent the eyes are positioned laterally, in contrast to the frontal location of human eyes. This hemi-panoramic vision has consequences for studying cortical binocularity, as the visual world seen by both eyes in front of the mouse is small, covering only the central 50° (Drager, 1978), compared to 135° in man. Therefore, much of the mouse visual system is devoted to monocular, rather than binocular, vision. This difference in eye placement is evident at the optic chiasm: in man approximately 45% of retinal ganglion cell axons project to the ipsilateral LGN, whereas in the mouse only 4% of retinal ganglion cell axons project to the ipsilateral LGN (Dräger, 1974; Godement et al., 1984). Additionally, the anatomical organization of the LGN is distinct in the human and mouse (**Figure 1**). The human LGN contains multiple segregated eye-specific laminae, whereas the mouse LGN is not laminar but dominated by the contralateral eye with only a small ipsilateral patch (Dräger, 1974). Finally, the functional and anatomical organization of eye-specific signals in primary visual cortex (V1) differs between primates and mice. In primates, V1 is well characterized by a regular columnar organization for ocular dominance (OD) (Hubel and Wiesel, 1977; Adams et al., 2007). V1 neurons across cortical layers share preference for one eye over the other eye, and this ocular preference changes gradually at regular intervals across the surface of cortex. Because of this organization, primate V1 neurons near one another share functional selectivity. In mice, however, no such columnar organization has been observed, and V1 neurons near one another have little functional relationship to each other (Gordon et al., 1996; Antonini et al., 1999).

[Figure 1 caption, only partially recovered: "... nucleus (LGN), whereas the contralateral eye (red) provides the ... termed ocular dominance columns."]

Despite these differences in functional architecture across mammals, visual experience sculpts the selectivity of neurons in all mammals examined to date. The effects of activity on neural circuitry are particularly pronounced within the developmental critical period. During the critical period, the functional response properties of neurons, particularly OD, may be manipulated by perturbing the incoming signals from the periphery. Changes resulting from such manipulation are durable, generally persisting through adulthood. Hubel and Wiesel demonstrated OD plasticity in cat by occluding one eye (monocular deprivation, MD) (Wiesel and Hubel, 1963) or disrupting the alignment of the two eyes (strabismus) (Hubel and Wiesel, 1965) during the developmental critical period. After MD, V1 neurons responded strongly to the open eye and weakly to the closed eye; after strabismus, V1 neurons were far less binocular than in normal animals. This decrease in binocularity arises in part from the disruption of normal synaptic integration of binocular inputs by simple cells in visual cortex (Scholl et al., 2013b). In concert with these functional changes, anatomical correlates of experience-dependent plasticity have also been observed. The LGN relay cells that provide inputs to V1 neurons undergo a period of refinement during development in the cat and the primate (Rakic, 1976; Hubel et al., 1977; LeVay et al., 1978; Löwel, 1994). Initially, the right and left eye thalamocortical projections intermix in layer IV, but over the course of development these projections become increasingly patchy and periodic.

These same patterns of activity-dependent changes during the critical period have been observed not only in mouse V1 but in all mammals in which they have been tested (e.g., rabbit: Van Sluyters and Stewart, 1974; rat: Maffei et al., 1992; cat: Wiesel and Hubel, 1963; sheep: Martin et al., 1979; hamster: Emerson et al., 1982; macaque: Hubel et al., 1977; marmoset: DeBruyn and Casagrande, 1981). Both MD and strabismus generate changes in the functional response properties of V1 neurons, causing V1 neurons to be more sensitive to the open eye in MD (Wiesel and Hubel, 1963), and less binocular following disruptions of simultaneous patterned activity from the two eyes (Hubel and Wiesel, 1965; Gordon and Stryker, 1996). Monocular deprivation also causes anatomical shifts in the thalamocortical projection, enhancing the growth of the thalamocortical axonal arbors associated with the open eye (Antonini et al., 1999).
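The shift in eye preference described above is often summarized with a single number. As a minimal sketch, and as an assumption not taken from this article, one common convention is a contrast-style ocular dominance index, ODI = (C − I)/(C + I), where C and I are the response amplitudes to the contralateral and ipsilateral eye. The function name and the response values below are hypothetical and purely illustrative.

```python
def ocular_dominance_index(contra: float, ipsi: float) -> float:
    """Return (contra - ipsi) / (contra + ipsi).

    +1 means a purely contralateral response, -1 purely ipsilateral,
    and 0 means both eyes drive the neuron equally.
    """
    total = contra + ipsi
    if total <= 0:
        raise ValueError("summed response must be positive")
    return (contra - ipsi) / total


# Hypothetical response amplitudes (arbitrary units), not data from any study.
baseline = ocular_dominance_index(contra=10.0, ipsi=5.0)
# Assumed effect of depriving the contralateral eye: its response is halved.
after_md = ocular_dominance_index(contra=5.0, ipsi=5.0)

print(f"baseline ODI = {baseline:.2f}, after MD = {after_md:.2f}")
```

Under this convention, depriving the contralateral eye moves the index toward zero or negative values, whereas a loss of binocularity across a population would appear as single-neuron indices clustering near +1 and −1.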

### **GENETIC DISSECTION OF OD PLASTICITY**

Many specific genes have been identified as necessary for OD plasticity in mice (**Figure 2**). The products of such genes are known to operate at different locations within the neuron, from components of the postsynaptic density (Taha and Stryker, 2002; Taha et al., 2002; Sawtell et al., 2003; Ranson et al., 2013) to transcription factors in the nucleus (Pham et al., 1999; Mower et al., 2002) and proteins redistributed to the dendritic compartment that regulate protein stability and turnover (Tagawa et al., 2005; McCurry et al., 2010; Shepherd and Bear, 2011). These genes can be broadly categorized into two groups: (1) necessary pieces of the neural machinery to drive changes in the strength of synaptic connections; and (2) controllers of when and how much plasticity is induced.

The genes required for OD plasticity overlap with those that have been implicated in other forms of plasticity, particularly long-term potentiation (LTP) and long-term depression (LTD) (but see Rao et al., 2004). A critical synaptic factor that appears to be the first step in OD plasticity is activity at the N-methyl-D-aspartate (NMDA) receptor, which is required for synaptic plasticity and OD plasticity (Bear et al., 1990; Sawtell et al., 2003). Because the NMDA receptor (NMDAR) is voltage-dependent, opening only when the neuron is already depolarized, it signals the coincident activation of incoming synaptic inputs and the activation of the neuron itself. N-methyl-D-aspartate channels are permeable to sodium and potassium and, importantly, allow the influx of calcium. It is this calcium influx that initiates the signaling cascade that eventually leads to changes in synaptic weight and that is required for normal OD plasticity.

Indeed, the calcium influx triggers a number of molecular pathways required for OD plasticity. It has been demonstrated previously that disrupting the interaction between incoming calcium and CaMKII (Taha et al., 2002), cAMP (Beaver et al., 2001; Fischer et al., 2004) or calcineurin (Yang et al., 2005) interferes with OD plasticity during the critical period. These initial calcium-driven signals lead directly or indirectly, through additional kinases such as extracellular signal-regulated kinase (ERK; Di Cristo et al., 2001), to the activation of activity-dependent regulators of gene expression, including the cyclic AMP response element-binding protein (CREB; Pham et al., 1999, 2001). Thus, perturbing the calcium signaling pathway by weakening or eliminating a step in the cascade diminishes both synaptic plasticity and OD plasticity, providing strong evidence that synaptic modifications are a central and necessary component for the functional changes in selectivity of neurons in V1 during the critical period (Silva, 2003; Taha and Stryker, 2005).
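The coincidence-detection property of the NMDA receptor described above can be illustrated numerically. The sketch below is an assumption layered on the text, not a model used in this article: it uses a widely cited phenomenological expression for voltage-dependent magnesium block (the Jahr and Stevens form), in which the unblocked fraction is 1 / (1 + ([Mg²⁺]/3.57 mM) · exp(−0.062 · V)), with V in millivolts. The function names, the 1 mM magnesium concentration, and the example voltages are hypothetical.

```python
import math


def nmda_unblocked_fraction(v_mv: float, mg_mm: float = 1.0) -> float:
    """Fraction of NMDA channels free of Mg2+ block at membrane potential v_mv.

    Assumed phenomenological form; constants 3.57 mM and 0.062 per mV
    come from that model, not from the article.
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))


def nmda_drive(glutamate_bound: bool, v_mv: float) -> float:
    """Relative channel opening: needs presynaptic glutamate AND depolarization."""
    return nmda_unblocked_fraction(v_mv) if glutamate_bound else 0.0


# Illustrative conditions (hypothetical voltages in mV).
conditions = [
    ("input only, cell at rest", True, -70.0),
    ("depolarized, no input", False, -20.0),
    ("input + depolarization", True, -20.0),
]
for label, glutamate, voltage in conditions:
    print(f"{label:28s} relative opening = {nmda_drive(glutamate, voltage):.3f}")
```

In this toy description, appreciable opening (and hence calcium influx) occurs only when synaptic input coincides with postsynaptic depolarization, which is the logic the paragraph above attributes to the receptor.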

### **GENETIC AND CIRCUIT REGULATION OF THE CRITICAL PERIOD**

In parallel with the molecular signals necessary to drive plasticity, an additional set of genes governs the timing of the critical period. While the ecological benefit of constraining plasticity to a narrow time window (P20–P32 in mice) is unclear, the conditions required for plasticity are now being uncovered. Opening the critical period requires a discrete maturation of inhibitory cortical circuitry (Levelt and Hübener, 2012). The differentiation of inhibitory neurons expressing the calcium binding protein parvalbumin (PV) precedes the onset of the critical period (Huang et al., 1999), and it has been demonstrated that OD plasticity may be induced earlier in mouse V1 by artificially increasing inhibition (Fagiolini and Hensch, 2000; Iwai et al., 2003). Indeed, increasing levels of Brain-Derived Neurotrophic Factor (BDNF), which may accelerate the maturation of inhibitory circuitry, drives a precocious critical period for OD plasticity in mouse V1 (Hanover et al., 1999; Huang et al., 1999). Reducing the amount of GABA<sub>A</sub>-mediated inhibition in cortex, either by deleting GAD65 (glutamic acid decarboxylase 65 kD), an enzyme required for synthesis of the inhibitory neurotransmitter GABA, or by deleting the gene NARP, a pentraxin molecule required for normal excitatory drive onto inhibitory neurons during development, prevents opening of the critical period (Fagiolini and Hensch, 2000; Gu et al., 2013). Another method to delay inhibitory neuron development, and thus the critical period, is to dark-rear animals (Huang et al., 1999). Only once those animals are moved into normal lighting conditions does the critical period open. Thus, the amount of cortical inhibition, particularly inhibition mediated by PV interneurons, appears to be an essential factor in controlling the opening of the critical period for OD plasticity.

Extracellular signals play a critical role at the closure of the critical period. For example, the distribution of perineuronal nets (PNNs), which contain chondroitin sulfate proteoglycans (CSPGs) that are components of the extracellular matrix that inhibit axonal growth, plateaus at the end of the critical period (Pizzorusso et al., 2002). The distribution of myelination in visual cortex also plateaus as the critical period closes (McGee et al., 2005) and intracortical synaptogenesis begins to decline (Morales et al., 2002). Two genes related to these alterations to the extracellular environment of visual cortex are required to close the critical period. Nogo receptor 1 (NgR1) is a neuronal receptor for both CSPGs and several inhibitors of neurite outgrowth associated with myelin membranes (McGee and Strittmatter, 2003; Dickendesher et al., 2012). Mice that lack NgR1 continue to display OD critical period plasticity into adulthood (McGee et al., 2005). The cartilage link protein (Crtl1) also plays an essential role in closing the critical period for OD plasticity. Crtl1 is a neuronal product that triggers the formation of the PNNs (Carulli et al., 2010). Normally Crtl1 is upregulated in V1 as the critical period closes; mice lacking Crtl1 retain OD plasticity into adulthood like the NgR1 mutant mice (Carulli et al., 2010). In addition to these two proteins that interact with the extracellular matrix, a third gene, Lynx1, an important regulator of cholinergic tone that increases at the end of the critical period, is also required to close the critical period. Mice lacking Lynx1 continue to display OD plasticity into adulthood, indicating that cholinergic signaling also plays a role in closing the critical period (Morishita et al., 2010).

Upon closure of the critical period, OD plasticity is attenuated but not absent in V1. Partial shifts in OD can still be detected by single-unit recordings, though these require longer periods of MD (e.g., 6+ days in adults vs. 4 days during the critical period) (Hofer et al., 2006). During the critical period, OD plasticity appears to proceed in two stages that overlap considerably: a weakening of responses to the deprived eye followed by a homeostatic strengthening of the non-deprived eye (Frenkel and Bear, 2004; Hofer et al., 2006). This latter homeostatic component of OD plasticity requires Tumor Necrosis Factor alpha (TNFα; Kaneko et al., 2008). Intriguingly, adult plasticity is primarily confined to a slow strengthening of the non-deprived eye by a distinct mechanism that is largely independent of TNFα but requires CaMKII (Ranson et al., 2012).

### **REACTIVATING VISUAL PLASTICITY IN THE ADULT**

One major focus of research into OD plasticity has been to understand how, and whether, plasticity may be enhanced in adults to improve recovery from neurological disorders. The first approach demonstrating that the critical period for visual plasticity could be reopened involved injecting immature astrocytes into adult cat visual cortex (Muller and Best, 1989). Several pharmacologic and environmental manipulations subsequently have been reported to restore developmental OD plasticity to the adult visual system of rats and mice. One approach has been to disrupt the extracellular signals that prevent synaptogenesis and neurite outgrowth. Injection of chondroitinase ABC degrades the CSPGs present in PNNs surrounding PV interneurons. This treatment yields modest OD plasticity (Pizzorusso et al., 2002). How loss of these PNNs affects the function of PV interneurons or impacts cortical circuitry is not yet clear. An alternative approach has been to alter the activity of inhibitory interneurons, and thus the balance between excitation and inhibition in V1. Several strategies have been employed to do this, including direct injection of immature inhibitory neurons (Southwell et al., 2010), dark exposure (He et al., 2006), administration of fluoxetine (Maya Vetencourt et al., 2008), and environmental enrichment (Sale et al., 2007). Direct reduction of overall cortical inhibition by infusing GABA<sub>A</sub> antagonists also partially restores OD plasticity (Harauzov et al., 2010). The degree to which these approaches may affect excitatory to inhibitory balance is not yet known (Morishita and Hensch, 2008).

Classic genetics, pharmacology, and environmental manipulations have revealed important aspects of both the regulation and mechanisms of OD plasticity in mouse. The combination of sophisticated tools for manipulating and measuring neuronal function in mice is now permitting the dissection of the progression of experience-dependent plasticity through the cortical circuit with greater cell-type specificity and temporal precision. For example, a recent study revealed that OD plasticity requires a decrease in inhibitory drive from a specific inhibitory cell type (Kuhlman et al., 2013). In this study, Kuhlman et al. discovered with cell-attached recordings *in vivo* that an early event following MD during the critical period is a paradoxical increase in neuronal responsiveness of pyramidal (PYR) neurons in layer (L) 2/3 to visual stimulation of either eye. This disinhibition results from a decrease in excitatory drive onto L2/3 PV neurons from L4 and is only observed with MD during the critical period. Interestingly, decreasing the activity specifically of PV neurons with designer receptors exclusively activated by designer drugs (DREADDs) (Armbruster et al., 2007; Ferguson et al., 2010) in concert with MD in adult mice results in visual plasticity indistinguishable from what is observed during the critical period. These experiments are a compelling demonstration of the utility of emerging techniques available for mouse to investigate how plasticity may originate and propagate through cortical circuitry. These available genetic and molecular tools will permit experiments in the mouse that are very difficult, at a minimum, to undertake in other animal model systems.

### **OD PLASTICITY AND ACUITY**

Short periods of MD (2–4 days) during the critical period in both mouse and cat shift OD, whereas longer MD (long-term MD, LTMD, 10 or more days) results in poor acuity in the deprived eye (Giffin and Mitchell, 1978; Prusky and Douglas, 2003). LTMD throughout the critical period has been employed as a model of amblyopia in cats and rodents for decades. The effects of LTMD on acuity may stem from a combination of changes in the periphery as well as in cortical circuitry. Lid closure can cause changes in the shape of the eye (Wallman et al., 1978), potentially disrupting optics, thus creating either myopia or hyperopia in one eye (Kiorpes and Wallman, 1995). Unequal refractive error in the eyes can then lead to changes in the cortical circuitry (e.g., Kiorpes et al., 1998). One model is that loss of cortical responsiveness to the deprived eye reduces visual acuity and the subsequent close of the critical period consolidates this visual impairment. Approaches that reactivate developmental visual plasticity, particularly when any anisometropia is corrected, may therefore be expected to improve recovery from LTMD.

Several manipulations in rodents that enhance OD plasticity also improve visual acuity following LTMD (Morishita and Hensch, 2008). Treatment with chondroitinase ABC to block extracellular signals, and environmental enrichment in combination with briefly closing the previously non-deprived eye (reverse suture), restore visual acuity in the deprived eye to normal (Pizzorusso et al., 2006; Sale et al., 2007), as do dark exposure, administration of fluoxetine, and deletion of either the Lynx1 or NgR1 gene (He et al., 2006; Morishita and Hensch, 2008; Morishita et al., 2010; Stephany et al., 2014). This string of correlations has led to the model that OD plasticity and the recovery of acuity in rodents following LTMD are linked. However, genetic dissection of the requirement for NgR1 to close the critical period reveals that these facets of visual plasticity are dissociable. While completely abolishing expression of NgR1 permits both OD plasticity and recovery of acuity after LTMD, restricting deletion of NgR1 to PV interneurons maintains developmental OD plasticity in the adult but is not sufficient to improve acuity after LTMD (Stephany et al., 2014). The ability to make such specific, targeted changes in protein expression illustrates the power that the mouse model can provide to our understanding of cortical neural circuitry.

### **AUTISM AND OD PLASTICITY**

It is the hope that understanding the conditions that support critical period plasticity will eventually yield therapeutic approaches for acutely reactivating developmental plasticity, aiding in the correction of amblyopia as well as the spectrum of neurologic disorders, including autism (LeBlanc and Fagiolini, 2011), brain injury (Maurer and Hensch, 2012), and perhaps even prevention of neurodegeneration. In this regard, the sensitivity of the mouse cortex to visual disruption is particularly useful for exploring how genes implicated in syndromic forms of neurodevelopmental disorders may alter the relationship between experience and neural circuit refinement in the developing brain.

For example, OD plasticity has been examined in mouse models of Fragile X syndrome (FXS; Dölen et al., 2007) and Angelman's syndrome (Yashiro et al., 2009; Sato and Stryker, 2010). Fragile X syndrome is a leading cause of developmental mental impairment and although symptoms vary in severity and expression, characteristic deficits include reduced intellectual abilities, hyperactivity, increased seizure susceptibility, and impaired visuo-spatial processing (Pfeiffer and Huber, 2009). Mice lacking a functional gene for fragile X mental retardation 1 (*FMR1*) phenocopy some aspects of FXS and have deficits in OD plasticity. Whereas MD during the developmental critical period decreases deprived eye responses in normal (wildtype) mice, *FMR1* mutants exhibit a potentiation of open eye responses similar to the visual plasticity resident in the adult visual system (Frenkel and Bear, 2004; Dölen et al., 2007). Whether *FMR1* mutant mice are responsive to LTMD is as yet unknown. Interestingly, *FMR1* mutant mice also display an imbalance of neocortical excitation and inhibition (Gibson et al., 2008).

Angelman's syndrome is caused by mutations that disrupt expression of ubiquitin E3 ligase (*UBE3A*), a gene sensitive to genomic imprinting (Kishino et al., 1997; Matsuura et al., 1997). Symptoms of Angelman's syndrome include mental impairment, seizures and behavioral abnormalities (Clayton-Smith and Laan, 2003). Ubiquitin E3 ligase mutant mice do not exhibit OD plasticity with short (3-day) MD during the critical period as measured by either visually-evoked potentials or optical imaging of intrinsic signals (Yashiro et al., 2009; Sato and Stryker, 2010), but instead display limited OD plasticity with LTMD both during the critical period and as adults. Ubiquitin E3 ligase mutant mice also possess a deficit in the balance of excitatory and inhibitory cortical neurotransmission (Wallace et al., 2012). This phenotype is reminiscent of the mice mutant for GAD65 (above) in which the maturation of inhibitory cortical circuitry is impaired (Fagiolini and Hensch, 2000). Whether enhancing inhibition rescues visual plasticity in the *UBE3A* mice, akin to the effects of diazepam on GAD65 mutants, has not been reported.

As both *FMR1* and *UBE3A* mutant mice display aberrant E/I balance, these associated deficits in experience-dependent visual plasticity may share a common circuit-level dysfunction. OD plasticity was evaluated in both *FMR1* and *UBE3A* mutants with visually-evoked potentials (Dölen et al., 2007; Yashiro et al., 2009) and optical imaging of intrinsic signals (Sato and Stryker, 2010), techniques with less temporal and spatial specificity than either single-unit recordings or emerging approaches to study OD plasticity such as cell-attached recordings *in vivo* and calcium imaging (Kuhlman et al., 2013). As recent studies have begun to dissect with greater precision the interaction between components of the cortical circuitry that drive OD plasticity, this model may continue to improve as a useful framework for understanding whether mutations in other genes also linked to syndromic forms of autism spectrum disorders, including neuroligin 3 (*NLGN3*), Src Homology-3 domain and multiple ankyrin repeat domains protein 3 (*SHANK3*), and Methyl CpG binding protein 2 (*MECP2*), interfere with experience-dependent plasticity conserved within neocortex.

### **DIRECTIONS OF FUTURE VISION RESEARCH IN MOUSE**

A compendium of tools is now available for selectively expressing or deleting genes with various drivers of Cre recombinase (CRE), manipulating the activity of specific neuronal populations with optogenetics, and measuring the activity of populations of neurons with genetically-encoded calcium indicators. These techniques are essential tools to dissect how experiences shape cortical circuitry. For example, by combining specific CRE drivers (Madisen et al., 2012) with CRE-dependent genetically-encoded calcium indicators, it may be possible to monitor plasticity during MD in specific cortical layers or subsets of interneurons with chronic calcium imaging *in vivo*. Similar experiments could then be performed on various mutant mice that lack OD plasticity in order to determine how and where plasticity is disrupted by these mutations, as well as within which neuronal populations these genes operate.

Importantly, the utility of the mouse is not restricted to OD plasticity. The mouse may serve as a model system for examining several outstanding questions in vision research. Several characteristics of visual circuitry are conserved between mouse and carnivores, including linear vs. nonlinear spatial summation, contrast-invariant tuning, and selectivity for stimulus parameters such as orientation and spatial frequency (Niell and Stryker, 2008). Thus, although mouse V1 lacks OD columns and possesses relatively poor spatial vision, it may nonetheless serve as a beneficial model system for investigating these properties of visual circuitry and potentially others, such as disparity tuning (Scholl et al., 2013a) and/or simple and conserved relationships and connectivity between V1 and higher visual areas (Marshel et al., 2011). Overall, despite its small size and relatively simple architecture, the mouse visual system will continue to offer unique advantages for studying how experience shapes neural circuitry, allowing the field to ask, and answer, key questions with far-reaching relevance.

### **REFERENCES**

experience-dependent plasticity in developing visual cortex. *Neuron* 58, 673–680. doi: 10.1016/j.neuron.2008.04.023


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 July 2014; accepted: 18 September 2014; published online: 02 October 2014*.

*Citation: Priebe NJ and McGee AW (2014) Mouse vision as a gateway for understanding how experience shapes neural circuits. Front. Neural Circuits 8:123. doi: 10.3389/fncir.2014.00123*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Priebe and McGee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Treatment of amblyopia in the adult: insights from a new rodent model of visual perceptual learning

#### **Joyce Bonaccorsi<sup>1</sup>, Nicoletta Berardi<sup>1,2</sup> and Alessandro Sale<sup>1</sup>\***

<sup>1</sup> Department of Medicine, Institute of Neuroscience CNR, National Research Council (CNR), Pisa, Italy

<sup>2</sup> Department of Psychology, Florence University, Florence, Italy

#### **Edited by:**

Davide Zoccolan, International School for Advanced Studies, Italy

#### **Reviewed by:**

Stephen D. Van Hooser, Brandeis University, USA Konrad Lehmann, Universität Jena, Germany

#### **\*Correspondence:**

Alessandro Sale, Department of Medicine, Institute of Neuroscience CNR, National Research Council (CNR), Via Moruzzi 1, 56100 Pisa, Italy e-mail: sale@in.cnr.it

Amblyopia is the most common form of impairment of visual function affecting one eye, with a prevalence of about 1–5% of the total world population. Amblyopia usually derives from conditions of early functional imbalance between the two eyes, owing to anisometropia, strabismus, or congenital cataract, and results in a pronounced reduction of visual acuity and severe deficits in contrast sensitivity and stereopsis. It is widely accepted that, due to a lack of sufficient plasticity in the adult brain, amblyopia becomes untreatable after the closure of the critical period in the primary visual cortex. However, recent results obtained both in animal models and in clinical trials have challenged this view, unmasking a previously unsuspected potential for promoting recovery even in adulthood. In this context, non-invasive procedures based on visual perceptual learning, i.e., the improvement in visual performance on a variety of simple visual tasks following practice, emerge as particularly promising to rescue discrimination abilities in adult amblyopic subjects. This review will survey recent work regarding the impact of visual perceptual learning on amblyopia, with a special focus on a new experimental model of perceptual learning in the amblyopic rat.

**Keywords: amblyopia, visual acuity, environmental enrichment, perceptual learning, GABAergic inhibition**

### **AMBLYOPIA**

#### **DEFINITION AND PECULIARITIES OF THE DISORDER**

Amblyopia (from the Greek, *amblyos*-blunt; *ops*-vision), also called "lazy eye", is a developmental abnormality usually associated with physiological alterations in the visual cortex occurring early in life (Ciuffreda et al., 1991; Holmes and Clarke, 2006). In humans, this pathology occurs in approximately 1–5% of the population, and is generally associated with an early history of abnormal visual experience due to binocular misalignment (strabismus), image degradation (high refractive error, astigmatism, and anisometropia), or form deprivation (congenital cataract and ptosis). The rare amblyogenic condition called congenital or early-acquired media opacity causes a form of amblyopia called deprivation amblyopia, the most severe and damaging type of amblyopia. In this case, cataracts, corneal lesions, or ptosis block or distort retinal image formation.

Regardless of its etiology, amblyopia is usually unilateral: visual acuity of one eye is reduced with respect to the other eye. Associated symptoms include poor stereoscopic depth perception, and low contrast sensitivity and reduced motion sensitivity in the weaker eye. In the clinical setting, however, the damage produced by amblyopia is generally expressed as a loss of visual acuity in an apparently healthy eye, despite appropriate optical corrections.

In contrast with early investigations indicating the retina as the primary site of amblyopia (Hess, 2001), many studies have confirmed that the retina exhibits normal physiology in amblyopic subjects (Sherman and Stone, 1973; Kratz et al., 1979; Baro et al., 1990); the lateral geniculate nucleus of the thalamus (LGN), instead, appears to be affected to some extent by sensory deprivation in one eye, with some cells exhibiting less than normal peripheral suppression and with a profound atrophy in the geniculate layers receiving inputs from the deprived eye (Wiesel and Hubel, 1963). The current consensus, however, is that amblyopia mostly originates from alterations in neural circuitries in the primary visual cortex (V1; Levi and Harwerth, 1978; Blakemore and Vital-Durand, 1986; Hess, 2001; Barrett et al., 2004), due to a combination of altered visual experience and high neuronal plasticity in the developing cortical circuits.

Development of visual system circuits depends on the interaction between genetic programs and experience-driven plasticity processes (Goodman and Shatz, 1993; Katz and Shatz, 1996), the latter being required for a proper refinement of neural circuits (Weliky, 2000; Lewis and Maurer, 2009). Critical periods (CPs) are time windows in early postnatal life during which plasticity is enhanced and neural circuits display a heightened sensitivity to acquire instructive and adaptive signals from the external environment. CPs for experience-dependent plasticity are widespread in the animal kingdom (Berardi et al., 2000), and have been demonstrated not only for the visual, auditory and somatosensory systems, but also for cognitive functions, including acquisition of song in birds and language in humans (Doherty, 1997; Doupe and Kuhl, 1999; Berardi et al., 2000; Hensch, 2004).

It is now clear that there are different CPs not only for different functions (even within the same sensory system; e.g., Harwerth et al., 1986, 1990), but also for different parts of the brain (even within different layers of V1; LeVay et al., 1980), and distinct CPs for recovery from and for induction of sensory deprivation effects (Berardi et al., 2000). The CP is not a simple, age-dependent maturational process, but is rather a series of critical developmental events controlled in a use-dependent manner. In agreement with this concept, a total absence of sensory inputs leads to a delay in the functional and anatomical maturation of the visual system. For example, the visual cortex of animals reared in darkness from birth (dark rearing, DR) displays prominent physiological deficits, including reduced orientation and direction tuning, lower cell responsiveness and increased latency, larger receptive field (RF) sizes, altered spontaneous activity, rapid habituation to repeated stimulus presentation, immature ocular dominance (OD) distribution and lower visual acuity (Frégnac and Imbert, 1978; Timney et al., 1978; Benevento et al., 1992; Fagiolini et al., 1994; Pizzorusso et al., 1997). Moreover, animals reared from birth in complete darkness have a delayed CP time course, with abnormal levels of plasticity persisting into adulthood (Mower, 1991; Fagiolini et al., 1994; Iwai et al., 2003).

The CP for the development of amblyopia closes around 6–8 years of age in humans (Worth, 1903; von Noorden, 1981). Alterations in visual experience caused by strabismus or high anisometropia with onset beyond this age do not result either in the severe loss of visual acuity for the affected eye or in the severe reduction in binocular vision caused by altered visual experience with an earlier onset. What is more important, however, is that if the correction of strabismus or anisometropia is delayed past this age, recovery of visual acuity and binocular vision is almost absent; indeed, the magnitude of the recovery is progressively reduced as the corrective intervention is made at progressively increasing ages during childhood, with negligible recovery obtained after 8 years of age. That is, in addition to the occurrence of a CP for the establishment of amblyopia, there is also a sensitive period for a successful treatment of this pathology (see Lewis and Maurer, 2009).

#### **NEURAL MECHANISMS UNDERLYING AMBLYOPIA**

Much of our current understanding of the neural mechanisms underlying amblyopia derives from studies on animal models, revealing that major pathological changes in this pathology occur at the cortical level.

In animal models, amblyopia can be easily induced by imposing a reduction of inputs from one eye by lid suture (monocular deprivation, MD) during the CP. This treatment dramatically decreases V1 binocularity, shifting the physiological responsiveness of visual cortical neurons towards the open eye. As a direct consequence, the visual acuity of the deprived eye is strongly reduced and its contrast sensitivity is blunted (Wiesel and Hubel, 1963; Hubel and Wiesel, 1970; Olson and Freeman, 1975; Movshon and Dürsteler, 1977; Olson and Freeman, 1980). In their pioneering experiments, Hubel and Wiesel observed that, in kittens, the susceptibility to the effects of MD starts suddenly near the beginning of the fourth week of life, remains robust between the sixth and eighth weeks, and then declines completely after the third month, thus defining a CP for MD effectiveness. MD starting in adulthood produced no detectable outcome (Hubel and Wiesel, 1970; Olson and Freeman, 1980). The effects of MD and the existence of a CP for OD plasticity have been subsequently described also in several other species of mammals (Van Sluyters and Stewart, 1974; Hubel et al., 1977; Blakemore et al., 1978; LeVay et al., 1980; Emerson et al., 1982; Fagiolini et al., 1994; Horton and Hocking, 1997; Issa et al., 1999). While the effects of MD can be reversed to a limited extent during the CP by reversing the condition of visual deprivation, the same deficits become irreversible later on (Wiesel and Hubel, 1965; Movshon, 1976; Van Sluyters, 1978; Blakemore et al., 1981; Antonini and Stryker, 1998).

Similar to higher mammals, MD in rodents shifts the physiological responsiveness of neurons in the binocular zone of V1 towards the open eye, and this plasticity is confined to a well-defined CP (Dräger, 1978; Fagiolini et al., 1994; Gordon and Stryker, 1996). At least in the mouse, this is due to a rapid weakening of the deprived-eye responses, accompanied by a delayed strengthening of the open-eye responses which results from mechanisms of homeostatic plasticity (Frenkel and Bear, 2004; Kaneko et al., 2008; Cooke and Bear, 2010). Anatomical changes accompany functional plasticity in the developing visual cortex of the mouse, as they do in higher mammals (Antonini et al., 1999; Mataga et al., 2004; Oray et al., 2004).

#### **TREATMENTS FOR AMBLYOPIA**

Theoretically, the basic strategy for treating amblyopia is to provide a clear retinal image, and then to correct the OD deficit, as early as possible, during the period of visual cortex plasticity. The methods currently most used in the treatment of human amblyopia, including refractive correction applied alone or in combination with occlusion or atropine, are known as "passive methods". Occlusion therapy with patching of the dominant eye has been widely used as the primary treatment for amblyopia (Loudon and Simonsz, 2005). The success of patching seems to correlate with the actual number of hours that the eye is patched (Loudon et al., 2002) but is also dependent on the severity of amblyopia, binocular status, fixation pattern, the age at presentation and patient compliance (Loudon et al., 2003; Stewart et al., 2005).

Atropine penalization is recognized as a valid alternative to patching for amblyopia therapy (Foley-Nolan et al., 1997; Simons et al., 1997; Pediatric Eye Disease Investigator Group, 2002). Atropine paralyzes accommodation and blurs near vision, encouraging the use of the amblyopic eye. It has been reported that atropine is as effective as patching, but that patching effects are initially faster, while atropine displays a better compliance (Pediatric Eye Disease Investigator Group, 2002). Another major difference between the two treatments is that in atropine penalization vision is binocular in the sense that the image at the fovea of the dominant (non-amblyopic) eye is degraded, while input to the amblyopic eye is not affected; in contrast, binocularity is impaired in the patching treatment.

A better strategy might be to couple passive methods with treatments in which certain tasks are prescribed to be performed by the patient: these "active" interventions could encourage a better involvement of the amblyopic eye and directly promote patient compliance, if the task is sufficiently attractive. Pleoptics is a method for visual diagnosis and training that employs monocular techniques for the detection and elimination of eccentric fixation and amblyopia: a bright ring of light is flashed around the fovea to temporarily "blind" or saturate the photoreceptors surrounding the fovea, which eliminates vision from the eccentric fixation point and forces fixation to the fovea. Typically, pleoptic treatments have to be performed several times a week in order to effectively enhance the effects elicited by occlusion therapy. Most practitioners, however, have found pleoptics to be no better than standard occlusion therapy (VerLee and Iacobucci, 1967; Fletcher et al., 1969). Another proposed active procedure was the so-called CAM treatment (Campbell, 1968), consisting of a high contrast square wave grating that rotates slowly, at about one revolution per minute. The treatment was based on the findings that spatial frequency- and orientation-specific filters in the visual system are activated by rotation. The CAM treatment was found not to be effective (Keith et al., 1980; Crandall et al., 1981; Tytla and Labow-Daily, 1981).

It has been suggested that binocular stimulation may be important for the treatment of amblyopia; indeed, animal research indicates that binocular stimulation promotes binocular cortical connections during recovery from deprivation amblyopia (Mitchell and Sengpiel, 2009). Experimental models of patching therapy for amblyopia applied to animals rendered amblyopic by a prior period of early MD indicate that the benefits of a patching therapy can be heightened when combined with critical amounts of binocular visual input each day (Mitchell and Sengpiel, 2009). Recent studies (Baker et al., 2007; Mansouri et al., 2008; Vedamurthy et al., 2008) provided new information on how signals from the amblyopic and non-amblyopic eyes can impact on each other and on binocular vision (see also Mitchell and Duffy, 2014 for a recent review).

While amblyopia can often be reversed when treated early (Wu and Hunter, 2006), successful treatments are not generally possible in adults. Recently, several studies in the visual system clarified some of the mechanisms that limit plasticity to early life, showing that the adult brain is not "hardwired" with fixed neural circuits; on the contrary, following specific treatments, it can reacquire a certain degree of plasticity even well after the end of the CP (see Bavelier et al., 2010). Treatments for amblyopia in adulthood are focused on promoting cortical plasticity by reducing those factors that actively limit adult plasticity, or by exploiting endogenous permissive factors; under these favorable conditions, circuit rewiring may be facilitated in the mature brain, inducing recovery from amblyopia. Thus, several pharmacological attempts have been made to enhance adult visual cortical plasticity, acting on factors which are also thought to contribute to its developmental time course.

While, early in development, glutamatergic excitation appears to dominate cortical circuits, accumulating evidence supports a pivotal role for late-developing excitatory and inhibitory (E/I) circuit balance in the opening and successive time-course modulation of CPs. For example, the onset of visual cortical plasticity is delayed by genetic disruption of GABA synthesis or a slowing down of the maturational state of perisomatic inhibition (Hensch, 2005). Conversely, application of benzodiazepines or other treatments that accelerate GABA circuit function triggers premature plasticity (Di Cristo et al., 2007; Sugiyama et al., 2008). These manipulations are so powerful that animals of identical chronological age may be at the peak, before, or past their sensitive period, depending on how the maturational state of their GABA circuitry has been altered. The E/I circuit balance points to a possible mechanism for enhancing recovery of function in adulthood, suggesting that a reduction of GABAergic transmission could be a crucial step for the restoration of plasticity processes in adulthood (Hensch, 2005; Baroncelli et al., 2011). In agreement with this, a recent study showed that a pharmacological reduction of intracortical inhibition obtained through the infusion of either MPA (an inhibitor of GABA synthesis) or picrotoxin (a GABA<sub>A</sub> antagonist) directly into the visual cortex reactivates OD plasticity in response to MD in adult rats (Harauzov et al., 2010).

The release of endogenous neuromodulators, such as norepinephrine, acetylcholine, serotonin, or dopamine, may also act on visual plasticity by adjusting a favorable E/I balance (Kasamatsu and Pettigrew, 1976; Bear and Singer, 1986; Kilgard and Merzenich, 1998; Bao et al., 2001; Goard and Dan, 2009). In agreement with this, it has been demonstrated that chronic treatment with the selective serotonin reuptake inhibitor (SSRI) fluoxetine reinstates OD plasticity following MD and promotes recovery of normal visual functions in adult amblyopic animals, acting through a pronounced reduction of intracortical inhibition (Maya Vetencourt et al., 2008). Since SSRIs are approved by the Food and Drug Administration, their use for treating amblyopia appears to be a very promising approach. Another recent indication that neuromodulatory systems affect plasticity in adulthood comes from the demonstration that a genetic manipulation of nicotinic cholinergic transmission promotes visual cortex plasticity after the end of the CP (Morishita et al., 2010).

On the basis of recent findings indicating that environmental experience can lead to epigenetic modifications of brain chromatin status, use of epigenetic drugs can be a promising strategy also for recovery from amblyopia (Zhang and Meaney, 2010). It has been shown that a developmental downregulation of experience-dependent regulation of histone H3 and H4 acetylation is involved in the closure of the CP (Putignano et al., 2007). Recently, Silingardi et al. (2010) found that a chronic intraperitoneal administration of valproic acid, a histone deacetylase inhibitor, drives recovery from visual acuity deficits in adult rats rendered amblyopic by long-term MD.

Finally, following the demonstration that extracellular matrix perineuronal nets (PNNs) drastically limit adult brain plasticity (Pizzorusso et al., 2002), Pizzorusso et al. (2006) showed that treatment of adult animals with chondroitinase ABC (an enzyme degrading chondroitin sulphate proteoglycans, i.e., critical components of the extracellular matrix), coupled with reverse suture (i.e., the deprivation of the previously open eye and opening of the previously deprived eye), produces a full recovery of both OD and visual acuity in amblyopic rats (replication of this finding in cats, however, has recently been shown to fail; Vorobyov et al., 2013). These authors also found that the decrease in spine density caused by long-term MD was recovered by the chondroitinase ABC treatment, suggesting that a possible mechanism underlying the recovery from amblyopia could be the formation of synaptic contacts on the newly formed spines by the inputs from the formerly deprived eye. Some of the effects elicited by chondroitinase ABC could be mediated by modifications of intracortical inhibitory circuits occurring after PNN degradation, bringing parvalbumin (PV) interneurons back to a more juvenile-like status (Hensch, 2005). Strikingly, a specific transfer of the orthodenticle homeobox 2 (Otx2) homeoprotein into GABAergic interneurons expressing PV has been shown to be a critical trigger for both the opening and closure of the CP of plasticity in the developing mouse visual cortex (Sugiyama et al., 2008). Endogenous Otx2 is captured by specific binding sites in PNNs placed on the surfaces of PV cells, with a short amino acid domain containing an arginine-lysine doublet, called RK peptide, directly mediating Otx2 binding to PNNs (Beurdeley et al., 2012).
Chondroitinase ABC reduces the amount of endogenous Otx2 in PV cells, and infusion of RK peptide disrupts endogenous Otx2 localization to PV cells and PNN expression, leading to restoration of binocular vision in adult amblyopic mice (Beurdeley et al., 2012).

A better strategy for amblyopia treatment would be to induce an endogenous recapitulation of the brain states that promote plasticity in a non-invasive but targeted manner. Amblyopic rats subjected to complete visual deprivation by dark exposure for 10 days recover significant vision once allowed to see binocularly, an effect mediated by a modulation of the balance between excitation and inhibition (He et al., 2007). However, translation of this treatment to humans is debatable as the proportional length of dark exposure required is likely to be quite long. A more promising approach is environmental enrichment (EE). EE is an experimental protocol specifically designed to investigate the influence of the environment on brain and behavior (Rosenzweig and Bennett, 1996; van Praag et al., 2000; Diamond, 2001; Sale et al., 2014). "Enriched" animals are reared in large groups in wide cages where a variety of toys, tunnels, nesting material and stairs are present and changed frequently. Thus, EE aims at optimizing environmental stimulation by providing the animals with the opportunity to attain high levels of voluntary physical activity, spontaneous exploration, cognitive activity and social interaction. We showed that EE promotes a complete recovery of visual acuity and OD in adult amblyopic animals (Sale et al., 2007). Recovery of plasticity was associated with a marked reduction of GABAergic inhibition in the visual cortex, as assessed by brain microdialysis. Moreover, a decreased cortical inhibition was demonstrated also at the synaptic level, using the *in vitro* paradigm of LTP of layer II–III field potentials induced by theta-burst stimulation from the white matter (WM–LTP). The WM–LTP is normally not present in the adult as a result of the maturation of inhibitory circuits (Kirkwood and Bear, 1994; Huang et al., 1999), but it can be restored if GABA-mediated inhibition is reduced (Artola and Singer, 1987; Kirkwood and Bear, 1994).
Notably, the ability of the cortex to undergo WM-LTP was fully reinstated in the visual cortex of EE adult rats (Sale et al., 2007). The reduction of cortical inhibition in EE rats was also paralleled by an increased expression of the neurotrophin BDNF and a lower density of PNNs in the visual cortex contralateral to the recovering (previously amblyopic) eye.

### **VISUAL PERCEPTUAL LEARNING**

Perceptual learning (PL) is currently considered one of the most promising active strategies for treating amblyopia in adulthood.

#### **DEFINITION AND VARIETY OF THE PHENOMENON**

Perceptual learning is the improvement in performance on a variety of simple sensory tasks, following practice. In visual perception, such tasks, often called discrimination tasks, involve identifying small differences in simple visual attributes, such as position, orientation, texture or shape.

Visual PL has been documented in a wide range of perceptual tasks: stimulus orientation discrimination (Vogels and Orban, 1985; Shiu and Pashler, 1992; Schoups et al., 1995; Matthews and Welch, 1997; Matthews et al., 1999); motion direction discrimination (Ball and Sekuler, 1982, 1987; Ball et al., 1983; Matthews and Welch, 1997); discrimination of differences in the waveforms of two grating stimuli (Fiorentini and Berardi, 1980, 1981; Berardi and Fiorentini, 1987); detection of visual gratings (De Valois, 1977; Mayer, 1983); texture discrimination (Karni and Sagi, 1991, 1993; Ahissar and Hochstein, 1996); discrimination of changes in spatial frequency within simple or complex plaid patterns (Fine and Jacobs, 2000); ability to detect small differences in the depth of two targets (Fendick and Westheimer, 1983; Westheimer and Truong, 1988); ability to perceive depth in random-dot stereograms (Ramachandran and Braddick, 1973); ability to discriminate between 10 band-pass Gaussian filtered noise textures (Gold et al., 1999a); object (Furmanski and Engel, 2000) and face recognition (Gold et al., 1999b). Training can improve the discrimination of small differences in the offset of two lines (Vernier acuity), even though initial thresholds are already in the hyperacuity range (McKee and Westheimer, 1978). In addition, a number of studies indicate that visual acuity can improve with practice also in hyperacuity tasks (Bennett and Westheimer, 1991; Poggio et al., 1992; Fahle and Edelman, 1993; Beard et al., 1995; Saarinen and Levi, 1995; Fahle and Morgan, 1996).

An important component of visual PL is the rate at which learning occurs. For some visual tasks, the learning effect has been found to take place within an hour or two (Fiorentini and Berardi, 1980, 1981; Shiu and Pashler, 1992; Fahle et al., 1995; Liu and Vaina, 1998). In some studies, learning is practically complete after a few hundred trials (Fiorentini and Berardi, 1980, 1981), showing fast saturation. For other tasks, there is an initial fast saturating phase of learning, which is then followed by a slow phase where the performance continues to improve from one daily session to the next one, until a stable optimal level is reached (Karni and Sagi, 1991). Interestingly, Karni and Sagi (1993) found that an improvement between sessions occurs only if the two sessions are separated by at least 6–8 h, suggesting the existence of a consolidation period.

Visual PL shows a high specificity for the features of the stimuli used in the task. Many studies reported that the visual performance is typically improved on test trials that use the same stimuli as those used during training, and that the achieved performance often returns to baseline levels when test trials adopt even mildly different stimuli. A specificity of learning has been found for the orientation of lines and gratings (Ramachandran and Braddick, 1973; McKee and Westheimer, 1978; Fiorentini and Berardi, 1980, 1981; Karni and Sagi, 1991; Poggio et al., 1992; Fahle and Edelman, 1993; Schoups et al., 1995) or the direction of motion (Ball and Sekuler, 1982, 1987), and for the retinal location of the stimuli used in the learning procedure (Fiorentini and Berardi, 1981; Ball and Sekuler, 1987; Karni and Sagi, 1991; Shiu and Pashler, 1992; Schoups et al., 1995). Fiorentini and Berardi (1980) found that practice improved discrimination between complex gratings, and that the achieved improvement did not transfer to stimuli rotated by 90°.

In most cases, visual PL is not restricted to the eye employed, i.e., if the training process is monocular, learning transfers completely or partially to the untrained eye (Fiorentini and Berardi, 1981; Ball and Sekuler, 1982; Beard et al., 1995; Schoups et al., 1995); this indicates that the learning process occurs more centrally with respect to the site where the inputs from the two eyes converge. Texture discrimination is an exception in this respect, showing little interocular learning transfer (Karni and Sagi, 1991; Schoups and Orban, 1996).

#### **NEURAL CHANGES UNDERLYING VISUAL PERCEPTUAL LEARNING**

The selectivity of visual PL for basic attributes of the stimuli, such as orientation (Ramachandran and Braddick, 1973; McKee and Westheimer, 1978; Fiorentini and Berardi, 1980, 1981; Karni and Sagi, 1991; Poggio et al., 1992; Fahle and Edelman, 1993; Schoups et al., 1995), motion direction (Ball and Sekuler, 1982, 1987) and even retinal location (Fiorentini and Berardi, 1981; Ball and Sekuler, 1987; Karni and Sagi, 1991; Shiu and Pashler, 1992; Schoups et al., 1995), suggests the involvement of early stages in cortical visual processing, where neurons have relatively small receptive fields (RFs), are selective for stimulus features such as orientation, size, chromatic properties and direction of motion, and the visual topography is most precisely mapped.

The specificity of learning for basic visual features does not imply that the changes underlying learning occur only in the early stages of the visual system. Cortical changes associated with PL can also occur in intermediate visual stages. Changes have been reported in the tuning properties of cells in V4 in monkeys trained in an orientation discrimination task, whereas no such tuning changes were observed in V1 (Ghose et al., 2002; Yang and Maunsell, 2004). Yang and Maunsell (2004) were the first to demonstrate that PL modifies basic neuronal response properties at an intermediate level of visual cortical processing (V4). They found that training on an orientation discrimination task changes the response properties of V4 neurons: after training, neurons in V4 with RFs overlapping the trained location had stronger responses and narrower orientation tuning curves than neurons with RFs in the opposite, untrained hemifield. Moreover, neurons with preferred orientations near the trained one showed the largest modifications.

The idea that changes associated with PL occur exclusively in early or intermediate visual areas has been challenged by the results of neurophysiological studies in monkeys (Chowdhury and DeAngelis, 2008; Law and Gold, 2008). In one of these studies (Law and Gold, 2008), learning to evaluate the direction of visual motion did not change the responses of cells in the middle temporal area (MT), a region highly responsive to motion, but did change the responses of cells in the lateral intraparietal area (LIP), a region that is known to represent the transformation of visual motion signals into responses by saccadic eye movements. However, PL-induced changes in MT have also been reported. For example, Zohary et al. (1994) studied the simultaneous activity of pairs of neurons recorded with a single electrode in MT while monkeys performed a direction discrimination task, exploring the relationship between inter-neuronal correlation and behavioral and stimulus parameters. They reported that spike counts from adjacent neurons were noisy and only weakly correlated, but that even this small amount of correlated noise could affect signal pooling, suggesting a relationship between neuronal responses and psychophysical decisions.
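The way weak correlated noise can limit signal pooling may be illustrated with a minimal sketch. It assumes, purely for illustration, that all pooled neurons have the same response variance and the same pairwise correlation `r`; the function name `pooled_snr` and the value `r = 0.2` are hypothetical choices, not values taken from the studies cited above. Under these assumptions the variance of the average of `n` responses is `sigma**2 * (1 + (n - 1) * r) / n`, so the signal-to-noise ratio of the pool cannot exceed `signal / (sigma * sqrt(r))`, however many neurons are averaged.

```python
import math


def pooled_snr(n, r, signal=1.0, sigma=1.0):
    """Signal-to-noise ratio of the average response of n neurons.

    Assumptions (illustrative only): every neuron carries the same mean
    signal, has noise standard deviation sigma, and every pair of neurons
    shares the same noise correlation r.
    """
    # Variance of the mean of n equally correlated variables.
    variance_of_mean = sigma ** 2 * (1 + (n - 1) * r) / n
    return signal / math.sqrt(variance_of_mean)


if __name__ == "__main__":
    r = 0.2  # hypothetical weak correlation
    for n in (1, 10, 100, 1000, 10000):
        independent = pooled_snr(n, 0.0)
        correlated = pooled_snr(n, r)
        print(f"n={n:>5}  independent={independent:8.2f}  correlated={correlated:6.2f}")
    # Upper bound approached by the correlated pool as n grows.
    print("ceiling:", 1.0 / math.sqrt(r))
```

With independent noise the ratio grows as the square root of `n`, whereas with correlated noise it flattens after a modest number of neurons, which is the sense in which a small correlation can affect pooling.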

Attention exerts a significant influence on many types of PL. Some studies found that a conscious effort to direct focused attention plays an important role in gating visual plasticity, suggesting that focused attention must be directed to a feature in order for it to be learned (Shiu and Pashler, 1992; Ahissar and Hochstein, 1993; Herzog and Fahle, 1998; Gilbert et al., 2001; Schoups et al., 2001). Little or no transfer of learning has been reported between two tasks that used the same visual stimuli but involved judgments on different stimulus attributes (either orientation of local elements or global shape) (Ahissar and Hochstein, 1993). It has also been demonstrated that the discrimination of orientation of lines did not improve when subjects attended to a different feature (brightness rather than orientation of the line) (Shiu and Pashler, 1992). Furthermore, an electrophysiological study in monkeys demonstrated that PL resulted in the sharpening of orientation tuning curves only for V1 cells with RFs overlapping the spatial location of the training task (Schoups et al., 2001). Additionally, it has been shown that PL is task-dependent: there is no transfer of learning of a particular feature between tasks involving similar stimuli but using a different procedure (Li et al., 2004; Huang et al., 2007).

However, evidence from studies of "task-irrelevant" learning shows that PL can also occur in the absence of focused attention to the learned feature (Watanabe et al., 2001; Seitz and Watanabe, 2003; Nishina et al., 2007). A follow-up study demonstrated that this task-irrelevant kind of learning was highly specific for local motion of the stimuli, as opposed to the global motion, and that learning was retained for months after training (Watanabe et al., 2002). These findings indicate that focused attention is not necessary for PL, but task-irrelevant learning might not occur simply as a result of exposure to a stimulus. Seitz and Watanabe (2005) proposed a model for task-irrelevant learning that can also explain task-relevant learning. According to this model, PL occurs through the coincidence of diffusive signals driven by a task activity (reinforcement signals) and signals induced by the presentation of a stimulus (stimulus-driven signals). In this model, the task target induces both reinforcement signals and stimulus-driven signals; thus, when a task-irrelevant feature and a reinforcement signal interact with an appropriate temporal relationship, learning of task-irrelevant features can occur.
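The coincidence idea in this model can be caricatured in a few lines of code. The sketch below is a toy illustration, not the formal model of Seitz and Watanabe (2005): the function name `coincidence_learning`, the 0.3 s coincidence window and the learning rate are all hypothetical. A feature is strengthened only on presentations that fall close in time to a reinforcement signal, regardless of whether the feature is relevant to the task.

```python
def coincidence_learning(stimulus_times, reinforcement_times,
                         window=0.3, learning_rate=0.1):
    """Toy coincidence rule (all parameter values are assumptions).

    The strength of a stimulus feature grows by learning_rate each time
    a presentation of that feature occurs within `window` seconds of a
    reinforcement signal. Presentations without a nearby reinforcement
    leave the strength unchanged.
    """
    strength = 0.0
    for t_stim in stimulus_times:
        coincident = any(abs(t_stim - t_reinf) <= window
                         for t_reinf in reinforcement_times)
        if coincident:
            strength += learning_rate
    return strength


if __name__ == "__main__":
    presentations = [1.0, 2.0, 3.0]
    # Feature paired in time with task-driven reinforcement: it is learned.
    print(coincidence_learning(presentations, [1.1, 2.1, 3.1]))
    # Same exposure, reinforcement arrives too late: no learning.
    print(coincidence_learning(presentations, [1.5, 2.5, 3.5]))
    # Mere exposure without any reinforcement: no learning.
    print(coincidence_learning(presentations, []))
```

The three calls mirror the three cases discussed in the text: temporally paired presentation, unpaired presentation, and simple exposure.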

Gilbert et al. (2009) proposed that PL is associated with long-term modification of cortical circuits. In this view, top-down influences of attention, expectation and the nature of the perceptual task interact with experience-dependent modification processes at the early level of the visual system. Both anatomical and physiological data show that V1 neurons can integrate information over an area much larger than their RFs measured with oriented lines, and that this functional property is due to a large extent to the axonal arbors of cortical pyramidal cells (Gilbert and Wiesel, 1979, 1983; Rockland and Lund, 1982; Stettler et al., 2002). The horizontal connections link orientation columns with similar orientation preference (Stettler et al., 2002), and account for the majority of the inputs that neurons receive, with over 76% of excitatory inputs arising from outside their resident hypercolumn (Stepanyants et al., 2009). Thus, these long-range connections provide neurons with selectivity for features more complex than the ones predicted from their RFs, endowing neurons with context-dependent responses.

#### **CELLULAR MECHANISMS UNDERLYING PERCEPTUAL LEARNING**

Despite recent progress in localizing the visual areas involved in PL, elucidation of the underlying mechanisms at the cellular level remains a challenge. Learning is thought to rely on changes in neuronal circuits in brain areas specific for the practiced task, leading to long-lasting modifications in synaptic efficacy (synaptic plasticity). While the notion that synaptic plasticity underlies learning is widely accepted for declarative memory processes mediated by temporal lobe areas or for implicit forms of memory such as classical conditioning (Kandel, 2009), the specific role of synaptic plasticity in PL, a form of implicit memory, remains unclear. It has been shown that motor skill learning leads to long-lasting synaptic plasticity changes in the primary motor cortex (M1; Rioult-Pedotti et al., 2000) and, in the visual system, changes in V1 activity have been documented following visual PL both in monkeys and humans (e.g., Schoups et al., 2001; Li et al., 2008; Yotsumoto et al., 2008). At present, however, there is no conclusive evidence for the presence of synaptic plasticity phenomena in V1 in correlation with visual PL.

Several possible cellular mechanisms have been proposed to account for the effects of PL. One possibility is that the number of neurons representing the learned stimulus increases after training; this mechanism has been found mainly in the auditory (Recanzone et al., 1993) and somatosensory (Recanzone et al., 1992) cortex. In the visual system, PL appears to be mediated primarily by changes in the response strength or tuning of individual neurons, rather than large-scale spatial reorganization of the cortical network, as found in the auditory and somatosensory systems.

Schoups et al. (2001) demonstrated that changes in V1 orientation tuning accompany improved performance in orientation discrimination in adult monkeys. They did not find an increase in the proportion of neurons tuned to the trained orientation; instead, they reported an increase in the slope of the tuning curve at the trained orientation for neurons with preferred orientations lying between 12° and 20° from the trained one. The authors suggested that learning is correlated with changes in the tuning curves of a specific group of neurons that are most sensitive to small changes near the trained orientation and are thus relevant for detecting an orientation difference. Therefore, sharpening the tuning curves of cells whose steepest parts coincide with the trained attribute can improve discrimination of trained features, leading to more selective and less overlapping cortical representations. In contrast, Ghose et al. (2002) found that PL caused only a small reduction in the response amplitude of V1 and V2 cells tuned to the trained orientation, suggesting that the psychophysical change is mediated by top-down influence for the trained task, and not by an improved neural representation of orientation in early visual areas.
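Why neurons tuned slightly away from the trained orientation are the most informative can be seen with a small numerical sketch. It assumes an idealized Gaussian tuning curve with a hypothetical width of 15° (chosen only for illustration, not fitted to the data of Schoups et al., 2001) and asks which preferred orientation yields the steepest tuning-curve slope at the trained orientation.

```python
import math


def response(theta, preferred, sigma=15.0, peak=1.0):
    """Idealized Gaussian orientation tuning curve (hypothetical parameters)."""
    return peak * math.exp(-((theta - preferred) ** 2) / (2 * sigma ** 2))


def slope_at(theta, preferred, sigma=15.0, d=1e-3):
    """Absolute slope of the tuning curve at `theta`, by central difference."""
    hi = response(theta + d, preferred, sigma)
    lo = response(theta - d, preferred, sigma)
    return abs(hi - lo) / (2 * d)


trained = 0.0  # trained orientation, in degrees

# Slope at the trained orientation for neurons whose preferred orientation
# lies 0 to 45 degrees away from it.
slopes = {offset: slope_at(trained, trained + offset) for offset in range(0, 46)}
best_offset = max(slopes, key=slopes.get)

print(best_offset, slopes[0], slopes[best_offset])
```

For a Gaussian curve the slope is zero at the peak and maximal one standard deviation away from it, so a neuron tuned exactly to the trained orientation changes its response least for small orientation differences, whereas a neuron tuned about one tuning width away changes it most.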

Very few studies involving visual PL have been performed in rodents. Stimulus-induced vision restoration (visual training) has been proposed to be achievable in a plethora of different types of visual field impairments due to retinal or brain damage (e.g., stroke, amblyopia, age-related macular degeneration) (reviewed in Sabel et al., 2011). With the declared aim to investigate whether cortical plasticity might depend on the temporal coherence of visual stimuli, Matthies et al. (2013) showed that substantial OD plasticity can be triggered in adult mice visually stimulated by the presentation of moving square wave gratings during a period of MD, even within very short periods of time (2 days). Frenkel et al. (2006) previously described a different form of experience-dependent response enhancement (called stimulus-selective response potentiation, SRP) in the visual cortex of awake mice. They found that repeated exposure to grating stimuli with a specific orientation results in a potentiated response evoked by the test stimulus. The long-lasting enhancement of visual responses increased gradually over the training sessions, was specific for the orientation of the grating stimuli used, and occurred in both juvenile and adult mice. Moreover, these authors reported that SRP induced through one eye did not transfer to the contralateral eye, suggesting the involvement of early stages of visual processing. While in primates the neural substrate involved in PL may have a deep dependence on training specificity, in rodents the relationship between learning and neural changes may be simpler. The effects observed by Frenkel et al. (2006) are consistent with a cortical change induced by PL, even if the stimulus-induced plasticity of SRP is not a form of perceptual learning, since no specific task was required.
Interestingly, this cortical modification is more similar to the increase in fMRI response obtained in the human visual cortex after PL (e.g., Furmanski et al., 2004) than to the results obtained with single-unit recordings in monkey V1 (e.g., Schoups et al., 2001). Moreover, visual neurons can respond to non-visual inputs if they are paired with visual stimuli in a learning task: after training rats in a task that associates visual stimuli with a subsequent reward, Shuler and Bear (2006) found that a significant proportion of neurons showed activity that correlated with the time at which the reward was given.

Given that PL is able to promote neural plasticity in early visual areas, possibly determining the potentiation of the visual connections active during learning, it could be exploited to facilitate recovery from conditions in which deficits in a set of visual neural connections lead to visual impairments. In the last two decades, there has been a progressive increase in studies that have tested and developed visual rehabilitation programs based on PL. We shall now discuss the possible application of PL for amblyopia treatment.

# **PERCEPTUAL LEARNING AS A POTENTIAL TREATMENT FOR AMBLYOPIA**

PL has been shown to remarkably improve visual functions in amblyopia on a wide range of tasks, including Vernier acuity (Levi and Polat, 1996; Levi et al., 1997), positional acuity (Li and Levi, 2004; Li et al., 2005, 2007), contrast sensitivity (Polat et al., 2004; Zhou et al., 2006; Huang et al., 2008), and first-order and second-order letter identification (Levi, 2005; Chung et al., 2006, 2008). While practicing each of these tasks results in improved visual performance, the high specificity of PL and the lack of transfer of PL effects to untrained orientations (Levi and Polat, 1996; Levi et al., 1997; Li and Levi, 2004) or from a Vernier acuity task to a detection task (Levi and Polat, 1996; Levi et al., 1997) can reduce its therapeutic value in the treatment of amblyopia. However, it has been shown that in various tasks (e.g., Vernier acuity, position discrimination and contrast sensitivity) PL appears to transfer, at least in part, to improvements in visual acuity measured, for example, with the Snellen chart (Levi and Polat, 1996; Levi et al., 1997; Li and Levi, 2004; Polat et al., 2004; Zhou et al., 2006; Huang et al., 2008). Additionally, other impaired visual functions, such as stereoacuity and visual counting (Li and Levi, 2004; Li et al., 2007), improved with PL, as did visual acuity. Importantly, in adults with normal vision the improvements obtained through PL last for months, or even years (e.g., Karni and Sagi, 1993), and Li et al. (2004) reported that the improvement in visual acuity in the amblyopic eye induced by position discrimination training was long-lasting (from 3 to 12 months). Moreover, the improvement in visual acuity was still present 12 months after the end of learning (Polat et al., 2004) and, in a few cases, with a level of retention of approximately 90% (Zhou et al., 2006).

We recently reported that visual PL induces long-term potentiation (LTP) of intracortical synaptic responses in rat V1 (Sale et al., 2011). To elicit visual PL, we first trained a group of adult animals in a forced-choice visual discrimination task that requires them to distinguish between two vertical gratings differing only in their spatial frequency; then, we made the two stimuli progressively more similar to each other (**Figure 1A**), until the animals' performance reached a steady plateau. This task requires activation of V1 circuits, as indicated by the strong selectivity of PL for the orientation of the gratings employed during training (Sale et al., 2011). Control animals only learned an association task, i.e., they were only required to discriminate between a grating and a homogeneous gray panel (**Figure 1B**), matching the overall swim time and number of training days in the water maze with those of PL rats.

Within 1 h from the last discrimination trial, LTP from layer II-III of V1 slices appeared occluded in PL animals compared to controls (**Figure 1**), both when testing its inducibility in vertical connections (stimulating electrode placed in layer IV) and when stimulating at the level of horizontal connections (stimulating electrode placed in layer II/III). Moreover, a significant shift toward increased amplitude of fEPSPs was found in the input/output curves of trained animals compared to controls (Sale et al., 2011). Thus, the data fulfill two of the most commonly accepted criteria used to relate LTP with learning, i.e., occlusion and mimicry, demonstrating that the improvements displayed by PL rats in discriminating visual gratings of progressively closer spatial frequencies can be explained in terms of long-term increments of synaptic efficacy in V1, the same cortical area at work during perception. This is consistent with the critical role for LTP in mediating learning processes previously reported in other brain areas such as the amygdala, the hippocampus and the motor cortex (Rogan et al., 1997; Rioult-Pedotti et al., 1998; Whitlock et al., 2006).

Since a potentiation of synaptic transmission might help the recovery process of visual responses for the long-term deprived eye, practice with visual PL through the amblyopic eye is expected to favor a functional rescue in amblyopic animals. In agreement with evidence on human subjects, a marked recovery of visual functions was evident in amblyopic rats subjected to visual PL (Baroncelli et al., 2012; **Figures 2A,B**), while no recovery occurred in two control groups in which the treatment did not induce LTP in V1, i.e., in rats that only learned the associative visual task and in animals that were trained only until the first step of the discrimination procedure between the test and the reference grating (**Figure 1C**), without proceeding further with a progression of finer discrimination trials (Baroncelli et al., 2012). Since these two control groups were matched to the animals trained in the PL procedure in terms of overall swim time in the water maze, their lack of recovery clearly indicates that the physical exercise component associated with our PL procedure does not contribute to the recovery of vision. This conclusion could seem at odds with the results showing a full recovery of both OD and visual acuity in adult amblyopic rats subjected to a period of intense physical exercise in a running wheel (Baroncelli et al., 2012). However, the lack of recovery found in the two control groups could be due to the purely forced nature of the exercise imposed on them: while running rats performed a form of totally voluntary movement, physical activity in the water maze is necessarily forced and artificially imposed. Several lines of evidence suggest that forced exercise and voluntary exercise exert different effects on brain and behavior.
For example, forced and voluntary exercise differentially affect monoamine neurotransmitters (Dishman et al., 1997), hippocampal PV expression (Arida et al., 2004), hippocampal brain-derived neurotrophic factor and synapsin-1 expression (Ploughman et al., 2005), longevity and body composition (Narath et al., 2001), taste aversion learning (Masaki and Nakajima, 2006) and open-field behavior (Burghardt et al., 2004). On the other hand, the marked rescue of visual abilities obtained in PL rats underscores the importance and effectiveness of visual practice and incremental training in driving recovery from amblyopia.

The recovery effect achieved by trained rats persisted for quite a long time, outlasting the end of the treatment by at least 14 days (**Figure 2B**), corresponding to 20 months or more in the timescale of human life.

Our results also revealed a transfer effect in two distinct ways: first, the recovery of visual acuity was not limited to stimuli of the same orientation as that used during the PL procedure, but was also present for orthogonal stimuli; second, even though rats practiced discriminating visual gratings in the 0.1–0.6 c/deg range, they displayed a discrimination improvement in a range of higher spatial frequencies, with final VA values in the range of 0.9–1.0 c/deg (Baroncelli et al., 2012).

One of the clearest advantages in the use of animal models of human pathologies is the possibility to investigate the underlying molecular mechanisms. Recovery of visual abilities in PL animals was accompanied by a robust decrease of the inhibition-excitation balance, crucially involved in the regulation of plasticity both during development and in adulthood (Hensch, 2005; Morishita and Hensch, 2008; Spolidoro et al., 2009; Harauzov et al., 2010; Sale et al., 2010; Baroncelli et al., 2011; van Versendaal et al., 2012; Kuhlman et al., 2013). These results provide the first evidence that PL is associated with reduced inhibition/excitation balance in V1. The relative strength of excitatory and inhibitory connections has been suggested to be impaired during development in amblyopic human subjects and cortical over-inhibition could underlie the degradation of spatial vision abilities (Polat, 1999; Levi et al., 2002; Wong et al., 2005). Repetitive transcranial magnetic stimulation, which increases cortical excitability, transiently improves contrast sensitivity in adult amblyopes, likely acting on the excitation/inhibition balance (Thompson et al., 2008). The reduction of intracortical inhibition could be downstream from the modulation of neuromodulatory release, such as the potentiation of serotonin transmission: it has been demonstrated that the infusion of an inhibitor of 5-HT can counteract the decrease in number of GAD67 expressing cells induced by EE (Baroncelli et al., 2010), and, moreover, it has been reported that serotonin can inhibit GABA release via a presynaptic mechanism, probably by regulating the availability of transmitter vesicles (Wang and Zucker, 1998).

(c/deg) spatial frequency (SF) grating (reference grating) from a 0.712 c/deg SF grating (test grating) and then learned to distinguish the two gratings

LTP from layer II-III of V1 slices is occluded in PL animals compared to controls, at the level of both vertical and horizontal connections.

As stated previously, we found that PL increases the synaptic strength of intracortical connections in V1. Li and Gilbert suggested a mechanism for PL based on the interaction between feedback and horizontal connections (Gilbert et al., 2009; Gilbert and Li, 2013). In this view, visual responses are dependent on the behavioral context, according to the perceptual task performed, and the contextual influence can be mediated by horizontal connections within V1 (Gilbert et al., 2009), since these long-range connections provide neurons with selectivity for complex features (Gilbert and Wiesel, 1979; Li and Gilbert, 2002; Stettler et al., 2002). Thus, with PL practice, it is possible that the horizontal connections could mediate a synchronized output response for the stimulus used in the task, by recruiting neurons that show selectivity for similar orientation and that are engaged in the perceptual task. It is known that synchronized electrical activity in the gamma frequency band is correlated with conscious processing of sensory stimuli and higher cognitive functions such as attention and memory, and that these gamma oscillations can occur locally within a brain region or be distributed in a brain-wide manner among different regions (Gray and Singer, 1989; Gray et al., 1989; Tiitinen et al., 1993; Desmedt and Tomberg, 1994; Gray and McCormick, 1996; Miltner et al., 1999; Tallon-Baudry and Bertrand, 1999; Fries et al., 2001, 2002; Brosch et al., 2002; Laurent, 2002; Sederberg et al., 2003; Gruber et al., 2004; Tallon-Baudry et al., 2005; Axmacher et al., 2006; Jokisch and Jensen, 2007; Melloni et al., 2007). In the visual system, Gray and Singer (1989) recorded a gamma oscillatory field potential that was strongly correlated with visual stimuli specific for the orientation preference, demonstrating that neurons within a given orientation column show stimulus-dependent selectivity. Moreover, the same authors demonstrated that synchronized activity was also present across orientation columns: they found that neural responses were selective for features of the visual stimulus and that the neurons involved are located in superficial layers; thus, the likely candidates for mediating the synchronization are horizontal connections (Gray et al., 1989; Engel et al., 1990; Gray and McCormick, 1996).

The top-down influence could play a significant role in PL by selecting an appropriate contextual influence, mediated by long-range horizontal connections within each cortical area (Gilbert et al., 2009). The majority of V1 cortical output is sent to V2, and most of the feedback connections come from V2, even if V1-V2 circuitry is more complex than previously thought (Sincich and Horton, 2005), with the recent demonstration that V2 exerts a modulatory effect on V1 through feedback projections that end in layer IV of V1 (De Pasquale and Sherman, 2013). Furthermore, V1 receives feedback connections from other visual areas, including V4, MT, and the inferotemporal cortex, and it has also been suggested that connections from higher- to lower-order visual areas might be mediated by a cortex-to-thalamus-to-cortex pathway (Sherman, 2005).

### **CONCLUSIONS**

These findings can be used to outline a general theoretical model concerning the cellular processes underlying visual PL in V1. Such a model requires taking into account the strategy employed by the trained rats, which practiced the discrimination between gratings while they were highly motivated to find the hidden platform. In this process, an involvement of extra-V1 projections is very likely to take place. An interaction between the appropriate V1 intrinsic connections and the top-down feedback signals associated with the expectations of the behavioral task is a possible explanation for the induction of a potentiation process. The strong excitatory projections received by V1 and coming from higher order areas like V2, the secondary motor cortex, the temporal association cortex and the perirhinal cortex (Coogan and Burkhalter, 1993; Bai et al., 2004) could carry information about the animal's behavioral and motivational state, setting the early visual areas in a specific working mode that allows the comparison of already stored representations with new bottom-up information concerning the stimulus characteristics (Gilbert et al., 2009; Gilbert and Li, 2013). This loop may have a fundamental role in PL. It is likely that the event represented by the finding of the submerged platform is associated with a given spatial frequency value and that this association forms the basis for further comparisons performed during subsequent exposures to the new spatial frequencies of the test grating. It is plausible that a simultaneous firing of higher centers' projections carrying top-down signals and intrinsic V1 neurons selective for the stimulus parameter may lead to the induction of a synaptic potentiation process of V1 connections, which eventually underlies the improvement in sensory discrimination (**Figure 2C**).


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 06 June 2014; accepted: 27 June 2014; published online: 16 July 2014*.

*Citation: Bonaccorsi J, Berardi N and Sale A (2014) Treatment of amblyopia in the adult: insights from a new rodent model of visual perceptual learning. Front. Neural Circuits 8:82. doi: 10.3389/fncir.2014.00082*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Bonaccorsi, Berardi and Sale. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Mapping arealisation of the visual cortex of non-primate species: lessons for development and evolution

# **Jihane Homman-Ludiye and James A. Bourne \***

Bourne Group, Australian Regenerative Medicine Institute, Monash University, Clayton, VIC, Australia

#### **Edited by:**

Davide Zoccolan, International School for Advanced Studies, Italy

**Reviewed by:**

James C. Vickers, University of Tasmania, Australia

Lutgarde Arckens, Catholic University, Belgium

#### **\*Correspondence:**

James A. Bourne, Bourne Group, Australian Regenerative Medicine Institute, Monash University, Building 75, Clayton, VIC 3800, Australia e-mail: james.bourne@monash.edu

The integration of the visual stimulus takes place at the level of the neocortex, organized in anatomically distinct and functionally unique areas. Primates, including humans, are heavily dependent on vision, with approximately 50% of their neocortical surface dedicated to visual processing, and possess many more visual areas than any other mammal, making them the model of choice to study visual cortical arealisation. However, in order to identify the mechanisms responsible for patterning the developing neocortex and specifying area identity, as well as to elucidate the events that have enabled the evolution of the complex primate visual cortex, it is essential to gain access to the cortical maps of alternative species. To this end, species including the mouse have driven the identification of cellular markers that possess an area-specific expression profile, the development of new tools to label connections, and technological advances in imaging techniques enabling the monitoring of cortical activity in a behaving animal. In this review we present non-primate species that have contributed to elucidating the evolution and development of the visual cortex. We describe the current understanding of the mechanisms supporting the establishment of areal borders during development, mainly gained in the mouse thanks to the availability of genetically modified lines, as well as the limitations of the mouse model and the need for alternate species.

**Keywords: cortical patterning, guidance molecules, neocortex, cell markers**

### **INTRODUCTION**

The visual cortex, responsible for providing the visual sensory experience, is a feature common to all mammalian species, however large or small. Located at the occipital pole of the brain, the visual cortex receives, integrates and interprets the information relayed from the eye via subcortical nuclei.

Despite the seemingly homogeneous appearance of the neocortical surface, the visual cortex is subdivided into cytologically and functionally unique modules, forming a mosaic of adjoining areas. Although all share the 6-layer organization of the neocortex, each visual area (cortex) exhibits a characteristic laminar cytoarchitecture with subtle differences in layer thickness and cell density, which enables cytological identification. The neuroanatomist Korbinian Brodmann took advantage of this attribute to map the neocortex of various species, utilizing Nissl substance (cresyl violet) staining to reveal the distinct areal borders within the cortical sheet. These maps (Brodmann, 1909) were the first evidence of the arealisation of the neocortex, and have since been refined with more sophisticated anatomical and functional mapping.

The visual message is complex and composed of many features, including shape, color, speed or direction of a moving object, which are each processed in a dedicated visual area. The processing of visual information is a stepwise process, with inputs first relayed from the thalamus to the primary visual area (V1) and from there sequentially despatched to "extrastriate" areas organized in a hierarchical fashion through reciprocal connections (Felleman and Van Essen, 1991). The highest order areas in the hierarchy receive a refined message and perform complex integrative and associative processing (Goldman-Rakic, 1988; Mountcastle, 1997). The basic principles of functional organization are relatively conserved across species; however, the number of visual areas varies across species, depending on the priority placed upon vision as a source of sensory input. Additional areas allow for in-depth, refined processing, providing a more elaborate representation of the visual scene.

Many groups, using a variety of techniques and animal models including rodents, primates and carnivores, have been involved in defining visual cortical maps, resulting in the evolution of diverse nomenclature systems. In his seminal study, Korbinian Brodmann numbered the cortices according to cytoarchitectural criteria (Brodmann, 1909), with V1 originally classified as area 17 and the second visual area (V2) classified as area 18. The emergence of electrophysiological techniques and functional mapping led to a method of nomenclature relating to the area's role, such that the primary visual area gained its name V1. Other visual areas were named depending on their position relative to V1. This system rapidly proved limited as more interleaving areas were identified, and also because the diverse brain morphologies made it difficult to correlate maps between species. Therefore, a new system was devised, based this time on the spatial position of the area on the cortical surface. Examples include area V5 in the primate, which is also known as the middle temporal area (MT). In addition, mouse V2 is often referred to as the lateromedial area (LM; Wang and Burkhalter, 2007; Wang et al., 2012). To date, there is still no uniform system, and the nomenclature varies at the authors' discretion, giving rise to confusion. This is nowhere clearer than when disputes arise over different territories in the visual cortex, or when areas are subdivided.

The specific limits of each cortical area have also been the cause of much dispute, often due to the approach used, as each method is based around a particular functional or anatomical property and it is difficult to reconcile maps obtained using distinct strategies. This is clearly illustrated in the visual cortical map of the mouse, a model in which somatosensory and olfactory systems dominate and the small brain size limits accurate electrophysiological mapping (Wagor et al., 1980). Two concurrent studies attempted to resolve this longstanding issue using separate methods. One mapped the cortical fields lateral to V1 that receive direct inputs from V1, revealed by triple anterograde fluorescent tracing and electrophysiology (Wang and Burkhalter, 2007). The second was based on the expression of the cytoskeletal marker nonphosphorylated neurofilament (NNF), characterized by its area-specific profile in the visual cortex, combined with neuronal activity markers (Van der Gucht et al., 2007). Both groups concluded that discrete extrastriate areas exist in the mouse neocortex; however, the studies conflicted on the number and location of the areas identified. The tracing study demarcated seven domains, each comprising a complete map of the entire visual field, in the region lateral to V1, compared to two subdivisions revealed by early response genes and NNF immunoreactivity. This example highlights the difficulty of reconciling maps generated using distinct methodologies, although the multimodal nature of areas beyond V1 in the mouse adds a level of complexity.

Arealisation is not limited to demarcating the spatial plan of cortical areas; great efforts are put into understanding other aspects of arealisation, including the evolutionary events that have led to the emergence of new cortical areas in higher species during the expansion of the neocortical surface and why the addition of new areas is more advantageous than the enlargement of preexisting ones. Major progress has been made in understanding the evolution of cortical areas by defining the visual maps of a large number of species on different branches of the phylogenetic tree and comparing the cortical organization, number of areas or relative position of areas fulfilling equivalent functions. For example, the existence of two processing streams in the primate, the dorsal "where" and ventral "what" pathways (Mishkin and Ungerleider, 1982; Ungerleider and Mishkin, 1982; Kravitz et al., 2011), has also recently been purported to be a feature of the mouse visual cortex (Wang et al., 2012), suggesting that it is not exclusive to primates and that it must have arisen much earlier in the evolution of the visual cortex.

A prerequisite to a comparative approach is the availability of a wide range of cortical maps including atypical species such as the monotremes (e.g., echidna) or the eusocial naked mole rat (Hassiotis et al., 2004; Matsunaga et al., 2011), which is sometimes difficult to achieve using electrophysiological mapping. Therefore, researchers have taken advantage of alternative properties of visual cortical areas to consistently define their borders including molecular and chemical markers, which allow the use of fixed brain tissue.

Molecular markers are extremely powerful at demarcating visual areas, including at early stages of development, essentially before eyes open or the visual system has begun to function. They have therefore prompted major progress in the field of embryonic arealisation, which addresses how the position and identity of individual areas are specified in the developing neocortex. At the onset of corticogenesis, cortical areas progressively acquire their positional identity under the influence of molecular regulators differentially distributed across the developing brain (for review see O'Leary et al., 2007). The potential of creating transgenic animals in which the expression of the cortical patterning factors is perturbed has contributed to the prominence of the mouse in the field.

In this review, we will detail the molecular markers routinely used to define visual cortical areas and the animal models in which this has been employed. We will then comment on the importance of non-primate maps in clarifying the evolutionary relationship between visual areas and cortical expansion. Finally, we will present the current understanding of the mechanisms and actors underlying the specification of areal borders, consisting mainly of studies performed in the mouse but also including recent data from other non-primate species.

# **HOW ARE VISUAL CORTICAL AREAS DEFINED?**

Visual areas can be characterized by many anatomical and functional features. The limiting factor has usually been the unavailability of tools to efficiently detect these specific features. Historically, the characterization of visual cortical organization has been achieved using simple cellular staining techniques, such as Nissl substance (cresyl violet) staining, which stains the rough endoplasmic reticulum, or labeling for the pan-neuronal transcription factor NeuN. The technique is extremely effective at demarcating cortical layers and therefore areas for which layer thickness and/or cell density vary markedly from their immediate neighbors (**Figures 1A,B**). This is especially the case in primates for early areas such as V1, as the cytoarchitecture of higher order areas is more homogeneous in terms of their laminar pattern and cell number (Rockel et al., 1980). Therefore, for many years it was not possible to accurately demarcate the extrastriate visual areas of most species. The application of new staining methods and advances in antibody technology have helped characterize more area-specific features, enabling identification of discrete cortical nuclei. The techniques presented here are organized according to the specific feature they reveal.

#### **CONNECTIVITY**

Individual areas establish a unique network of inputs and outputs with other cortical areas and subcortical domains. For V1, thalamic afferents form essentially glutamatergic synapses with layer 4 neurons (López-Bendito and Molnár, 2003). During development, thalamic neurons transiently take up serotonin from the extracellular environment; the "borrowed" neuromodulator is then transported along axons to the neocortex, where it is released in areas receiving thalamocortical projections (Lebrand et al., 1996). Therefore, simple immunolabeling for the neurotransmitter is capable of accurately demarcating V1 in the mouse (Chou et al., 2013; Vue et al., 2013).

It is also possible to directly label the tracts using the physical properties of dyes that are transported along the axon from the cell body to the synapse (anterograde) or from the synapse to the cell body (retrograde). These tracers, largely fluorescent, can be used to map the connections emerging from an area of interest or the regions projecting onto the region of interest. This approach, recently utilized in the mouse (Wang et al., 2011) and the rat (Watakabe et al., 2012), can be combined with 3D modeling to provide details on the functional relationship between areas. Additionally, these paradigms can also be applied in developmental studies to determine when areas become wired together and therefore the relative hierarchy of individual areas (e.g., the establishment of thalamocortical connections in the mouse) (Little et al., 2009; Deck et al., 2013). Laramée et al. (2013b) used a combination of red anterograde and green retrograde fluorescent tracers in mice to investigate the consequences of visual deprivation (congenital anophthalmia and perinatal enucleation) on the topography of projections from V1 to extrastriate areas and callosal connections, revealing an important disorganization and reinforcing the importance of retinal input in the establishment of corticocortical circuits. The authors also investigated in the same mice the effect of early loss of sensory-driven activity on the afferent cortical and subcortical projections to V1 using retrograde tracer injection. They traced direct projections from the somatosensory and auditory cortices onto V1 in all three animal groups, demonstrating that multimodality is not a consequence of congenital/perinatal blindness (Charbonneau et al., 2012).

Finally, projections can also be traced by viral-mediated expression of reporter proteins. For example, enhanced green fluorescent protein (EGFP) can be expressed under the control of a neuron-specific promoter, such as that for synapsin. The viral particles reach the cell body by retrograde transport and express the reporter protein, which then distributes into the dendrites and collaterals (Tomioka, 2006). This robust Golgi-like stain allows the reconstruction of the dendritic arbor and the morphology of neurons projecting to a specific region (Laramée et al., 2013a).

#### **CELLULAR ACTIVITY**

Certain areas can also be demarcated based on their metabolic activity, which is directly linked to cytochrome oxidase activity in the cell. Therefore, a simple staining technique can be used to quantitatively examine cellular activity in different visual cortices, and compartments within them (Wong-Riley, 1979). This technique is routinely used to locate the representation of the whiskers in the "barrel fields" of the rodent somatosensory cortex (e.g., Li et al., 2013) but is also able to demarcate V1 from the extrastriate areas and is used in many species including the mouse (Airey et al., 2005), cat (Wong-Riley, 1979), ferret (Innocenti et al., 2002), gray squirrel (Wong and Kaas, 2008) and short-tailed opossum (Wong and Kaas, 2009). In non-rodent species such as the cat and the squirrel monkey, cytochrome oxidase staining in V1 following monocular enucleation reveals characteristic blobs reflecting the columnar organization of visual inputs from the remaining eye (Wong-Riley, 1979; Carroll and Wong-Riley, 1984).

Visual areas can also be functionally identified by following transient changes in intracellular calcium levels associated with neuronal firing, revealed by synthetic indicators or genetically encoded calcium indicators (GECIs). GECIs are less invasive and less damaging for the tissue than synthetic indicators and allow for chronic *in vivo* measurements; however, early generations produced inferior signals. New GCaMP variants have been engineered that offer improved photostability and calcium sensitivity, including GCaMP3, which is capable of detecting calcium transients with an amplitude linearly dependent on action potential number (Tian et al., 2009). Adeno-associated virus AAV2 coding for *GCaMP3* under the control of the *synapsin-1* promoter was recently used in combination with 2-photon imaging to decipher stimulus preferences in the visual cortex of awake behaving mice (Andermann et al., 2013). The authors reveal that the posterior medial (PM) and the anterior lateral (AL) areas present similar orientation selectivity but different spatial and temporal frequency preferences; PM neurons respond best to slow-moving stimuli and AL neurons to fast-moving targets. These results were confirmed by flavoprotein fluorescence imaging (Tohmi et al., 2014). Two-photon calcium imaging is a cutting-edge approach that requires pre-existing knowledge of the cortical map to determine calcium indicator injection sites; however, it allows systematic functional mapping in small animal models, comparable to electrophysiology.

Neuronal activity also triggers the expression of immediate early genes (IEGs), such as *zif268* and *cFos* (**Figure 1D**). IEGs are activated transiently and rapidly in response to cellular activity, and monitoring their expression by immunostaining or RNA *in situ* hybridization can efficiently label visual territories in the vervet monkey, cat, mouse and rat (Chaudhuri et al., 1995; Lyford et al., 1995; Zangenehpour and Chaudhuri, 2002). To achieve an optimal signal-to-noise ratio, experimental animals are first subjected to a period of dark adaptation, to reduce the basal activity level to a minimum, followed by a brief, intense light stimulation period immediately prior to perfusion. This technique is particularly effective for determining ocular dominance in mouse V1 by specifically blocking the input from one eye (e.g., eyelid closure or enucleation) during the phase of light stimulation (Van der Gucht et al., 2007). IEGs are also advantageous for studying neuroplasticity, especially during development, and have been utilized for this purpose in the mouse (Van Brussel et al., 2011; Nys et al., 2014).

The markers presented above are extremely effective at demarcating V1 and associated subcompartments in non-primate species, but they prove limited in demarcating higher order areas. Higher order areas do not exhibit sharp cytoarchitectural differences, especially in rodents; however, their cellular composition varies greatly, which can be captured with cell-type specific markers.

#### **CELL-SPECIFIC MARKERS**

The most frequently used cell-specific protein for mapping visual cortical areas in numerous species has been the nonphosphorylated isoform of high molecular weight neurofilament (NNF). The protein is an intermediate filament, a major component of the neuronal cytoskeleton, and development of a specific antibody—SMI-32 (Sternberger and Sternberger, 1983), led to an explosion in the capacity to further demarcate the extrastriate visual cortex of a number of species. NNF is specifically expressed in the basal and apical dendrites of excitatory cortical neurons in layers 2, 3, 5 and 6 and reveals specific details of the cell morphology (**Figure 1C**). Immunolabeling against NNF reveals the morphology of the dendritic tree, which varies dramatically across visual areas and across cortical layers. NNF expression profile has been established in a large number of non-primate species, including cat, ferret, mouse, rat (van der Gucht et al., 2001; Van der Gucht et al., 2007; Sia and Bourne, 2008; Homman-Ludiye et al., 2010) and is remarkably conserved across equivalent visual areas leading to a clearer understanding of the evolution of species within an order (e.g., in the cat and the ferret visual cortex (van der Gucht et al., 2001; Homman-Ludiye et al., 2010)). In the visual cortex, NNF protein content directly correlates with the conduction speed of an axon (Hoffman et al., 1987; Lawson and Waddell, 1991) and primary sensory cortical areas across modalities exhibit the highest concentration of NNF expression. High levels of NNF protein are found in fast-conducting fibers and cortical areas belonging to the dorsal visual processing stream (Gutierrez et al., 1995; Chaudhuri et al., 1996; Bourne and Rosa, 2003). This property, initially demonstrated in primate species, is conserved in carnivores (van der Gucht et al., 2001; Homman-Ludiye et al., 2010) and rodents (Van der Gucht et al., 2007). 
Furthermore, NNF can be used to demonstrate the maturation of visual cortical areas, as it is only expressed in structurally mature neurons. This feature has been used to map the development of areas in the visual cortex, primarily in the nonhuman primate (Bourne et al., 2005; Bourne and Rosa, 2006), demonstrating that the middle temporal area (MT) matures early, in parallel with V1 (Bourne and Rosa, 2006; Bourne et al., 2007). Unfortunately, this property of NNF has not yet been exploited in other species.

Visual cortical areas also exhibit a distinctive expression profile of chondroitin sulphate proteoglycans (CSPGs). CSPGs constitute the extracellular matrix of most neurons; they are highly heterogeneous (Matthews et al., 2002) and are first detected at late developmental stages, where they are believed to contribute to the transition to an extracellular environment non-permissive to migration (Celio et al., 1998). The antibody clone Cat-301 detects CSPGs and therefore labels the cell body and proximal dendrites (McKay and Hockfield, 1982; Zaremba et al., 1989) around synapses but not the synaptic cleft (McKay and Hockfield, 1982; Hockfield et al., 1990). In the cat and Old World monkey neocortex, Cat-301 labeling is restricted to layers 3 and 5 in most areas and additionally extends to layers 4 and 6 in primary sensory areas (Hendry et al., 1988), with a high degree of variation across association cortex areas, which allows areal borders to be demarcated. The visual cortices of several non-primate species, such as the cat (Hendry et al., 1988) and the ferret (Homman-Ludiye et al., 2010), can be demarcated using the Cat-301 antibody. In the visual cortex, similarly to NNF, Cat-301 is preferentially associated with dorsal stream areas in nonhuman primates (Hendry et al., 1988; Hof et al., 1995).

**FIGURE 1 | Demarcation of the primary and secondary visual areas in the mouse adult neocortex using different markers**. Nissl cell staining **(A)** and the neuronal marker NeuN **(B)** are not sufficient to demarcate areal boundaries compared to the pyramidal neuron marker nonphosphorylated neurofilament (NNF, **C**) and the early response gene cFos **(D)**, strongly expressed in V1 compared to adjacent lateral and medial secondary visual areas (V2L and V2M, respectively). The interneuronal markers Calbindin **(E)** and Parvalbumin **(F)** display strong laminar differences with a higher density of Calbindin+ cells in layers 2–4. Stronger Calbindin signal in V1 layer 4 is very efficient at demarcating the borders with adjacent secondary areas. WM, white matter. Scale bar in **(F)**: 500 µm.

In addition to markers such as NNF and Cat-301, visual areas can also be defined according to the distribution of GABAergic interneuron subtypes. In particular, interneurons expressing the calcium-binding proteins Calbindin-D28k (Cb) and Parvalbumin (Pv) reveal complementary subpopulations of GABAergic interneurons differentially distributed across visual cortical areas (**Figures 1E,F**). Their developmental expression profile has been well documented in the primate visual cortex, revealing an early onset of Cb during corticogenesis and a later upregulation of Pv, around birth in layers 4–6 (Hendrickson et al., 1991), but this has yet to be translated into non-primate species. The early expression of Cb is very dynamic in terms of amount, laminar distribution and cell types labeled. After birth and in the adult brain, Cb expression stabilizes in the supragranular layers, whereas Pv expression tends to be associated with cells in the infragranular layers. The two interneuron subpopulations do not overlap in the cat visual cortex (Demeulemeester et al., 1988, 1991), but this is less clear in rodents. The role of these molecules remains poorly understood beyond calcium buffering, but it has been suggested that Cb is associated with the formation of synapses and Pv with the onset of functional activation during cortical maturation (Hendrickson et al., 1991). Cb and Pv have been extensively used to map the neocortex of numerous nonprimate species, including the gray squirrel (Wong and Kaas, 2008), the echidna (a monotreme) and marsupials such as the opossum, dunnart, antechinus and phascogale (Hassiotis et al., 2004; Ashwell et al., 2008; Wong and Kaas, 2009), in combination with myelin and cytoarchitectural markers. In the opossum, which also possesses a relatively small V1, the expression of Pv is restricted to V1 and does not extend into adjacent areas, while Cb is almost absent from the brain (Wong and Kaas, 2009). In contrast, the highly visual gray squirrel exhibits a high level of Pv and Cb expression across most of the neocortex (Wong and Kaas, 2008), and Pv is very weakly expressed in the limited visual cortex of the echidna (Hassiotis et al., 2004).
The comparison of these maps confirms that the expression of the calcium-binding proteins Cb and Pv is highly dependent on the activity of a visual area and is upregulated in the visual cortex of species relying on visual input to interact with their environment. Their expression is therefore related to their function in buffering calcium within the cell.

The 36-amino acid Neuropeptide Y (NPY) is involved in synaptic transmission, cerebral blood flow regulation, and inhibition of neuronal excitability (Raghanti et al., 2013) and is predominantly expressed by GABAergic interneurons. NPY+ interneurons exhibit bipolar, bitufted or multipolar morphology and are more concentrated in layers 2, 3 and 6. In the macaque, NPY+ neurons exhibit an area-specific distribution (Kuljis and Rakic, 1989a) with a high inter-animal variability. In the cat, NPY immunopositive neurons are homogeneously distributed across striate and extrastriate areas 17, 18 and 19, accumulating in layers 5 and 6 where they account for 0.2% and 1.5% of the total neuronal population, respectively (Demeulemeester et al., 1988). Whilst no difference in NPY distribution was originally detected in the rat visual cortex (Allen et al., 1983), a more recent analysis of *NPY* mRNA distribution established a two-fold increase in expression in V2 compared to V1 at postnatal day 21 (Obst and Wahle, 1995). Visual activity is required to maintain the phenotype of supragranular NPY+ neurons in the rat V1 (Obst et al., 1998). The non-uniform laminar distribution of NPY in axons across areas is less variable between animals than the density of NPY-containing somata (Kuljis and Rakic, 1989a,b). Therefore, the relative density of NPY-containing axons can be used as an additional chemoarchitectonic criterion to demarcate and characterize cortical areas. This method can be extended to multiple non-primate species, as comparable pattern and density variations of NPY+ neurons have been observed in dolphin, manatee, walrus, seal and elephant (Butti et al., 2011), and in species belonging to the superorder Xenarthra (tree sloths and armadillos) and the clade Afrotheria (hyraxes and elephants) (Sherwood et al., 2009). In these species, NPY distribution is again concentrated in layers 5 and 6 and the underlying white matter (Butti et al., 2011).

Since cortical areas are classically defined by anatomical and functional criteria (Kaas, 1995), maps based on a single criterion can be inaccurate, making it difficult to reconcile different studies. An example of this can be observed in the demarcation of the mouse visual cortex, where different criteria have resulted in different maps (Van der Gucht et al., 2007; Wang and Burkhalter, 2007). By combining the markers and methods presented above, investigators have been extremely successful in mapping the visual cortex of a variety of species that differ in their reliance on vision, which provides an opportunity to retrace the evolution of the visual cortex. Achieving this goal requires developing a consensus on the visual cortical map of a particular species, but also across species, and on the specific criteria necessary to define each cortical area. This is of particular importance as advances in technology provide a great opportunity to identify new areas.

# **EVOLUTION AND HOMOLOGY OF VISUAL CORTICAL AREAS**

The fissure pattern and the overall size of the brain of long extinct species can be deduced from endocasts of their fossilized skulls but being soft tissue, the brain is not preserved making it impossible to establish how the organization of cortical fields has been remodeled across evolution. To retrace the steps that have led to the variety of modern cortical maps, including the complex primate visual cortex, investigators have devised a comparative approach under the principle that the different levels of visual cortex complexity displayed by current species illustrate different steps along the evolutionary path (for review, see Krubitzer and Hunt, 2007). By comparing cortical maps across mammalian orders, one can determine which features are homologous, and therefore inherited from a common ancestor. For example, it was believed that the organization of visual areas into a dorsal stream, specialized in interpreting information relating to the position of an object, and a ventral stream dedicated to object recognition (Mishkin and Ungerleider, 1982; Ungerleider and Haxby, 1994) was exclusively present in primate species. The recent discovery of two processing streams in the mouse visual cortex (Wang et al., 2012) suggests that this trait is homologous in rodents and primates and probably appeared early on in evolution. The diversity of environments colonized by mammals imparts valuable information regarding the stability of the visual system and it is therefore crucial to investigate the largest variety of species possible, facilitated by the use of non-electrophysiological approaches. Some features are actively defended against change across niches such as the specification of V1 and V2 areas, which are both present in the mole rat despite being subterranean and virtually blind (Matsunaga et al., 2013). Alternatively, other characteristics have appeared in a specific lineage as an adaptation to modifications of the ecological niche (Bullock, 1984).

Two important aspects to consider when comparing the visual cortical maps of separate species are brain size and ecological niche (Finlay et al., 2014). Originally, mammals were nocturnal (Hall et al., 2012), and in every order today we find nocturnal species possessing a smaller brain and a rudimentary visual system compared to the large-brained diurnal species (Ross, 2000). However, it is now evident that a larger brain is not equivalent to a more complex brain (Manger, 2005). The recent comparative analysis of the visual cortex of the cat and the ferret, two carnivores that diverged approximately 55 million years ago (Bininda-Emonds et al., 1999), revealed the same number of visual areas despite the cat brain being 6-fold larger (30 g versus 5 g) (van der Gucht et al., 2001; Manger et al., 2005; Homman-Ludiye et al., 2010). Similarly, the highly visual marmoset monkey (*Callithrix jacchus*) has more visual areas and enhanced visual ability compared to the cat, despite a comparatively smaller brain. Therefore, the evolutionary expansion of the neocortical surface (Rakic et al., 2009) does not directly correlate with the addition of visual areas in higher species (Kaas, 1997). It has been proposed that the complexity of neural systems, corresponding to the number of cortical divisions and subcortical nuclei, increases with the establishment of a new mammalian order (Manger, 2005).

Analysis of the visual system of the squirrel, a highly visual diurnal arboreal rodent that shares similar ecological constraints with primates, demonstrates more similarity with mammals that are more closely related to primates, such as the tree shrew, than with the mouse (Paolini and Sereno, 1998; Campi and Krubitzer, 2010). This includes the presence of a five-layered laminated LGN compared to the three-layered rat LGN (Kaas et al., 1972; Montero, 1993) and a pulvinar nucleus (Baldwin et al., 2011), a thalamic nucleus absent in most rodents. This observation suggests that the ecological niche exerts more pressure than the boundaries of a phylogenetic group (Campi and Krubitzer, 2010). Some features, including the presence of a complex pulvinar nucleus, reflect adaptive changes or specialization at the level of individual species, taxon or niche (Finlay et al., 2014). Suggestions that the rodent lateral posterior nucleus (LPN) is the equivalent of the pulvinar nucleus (see Lyon et al., 2003a,b; Kaas and Lyon, 2007) are supported by a recent study demonstrating the importance of the superior colliculus-LPN-higher visual areas pathway and that connections with different higher order areas are segregated to specific discrete domains in the LPN (Tohmi et al., 2014). However, this organization does not compare to the functional parcellation and exquisite cytoarchitecture characteristic of the primate pulvinar nucleus. The investigation of the developmental origin of the LPN and pulvinar nucleus in rodents and primates will certainly help resolve this ambiguity.

Although it has been suggested that a larger brain does not correlate with a more complex brain (Manger, 2005), the addition of new areas is certainly concomitant with the expansion of the cortical surface; however, it is unclear whether one event prompted the other. The generation of a larger neocortical sheet occurred through modifications of the cell cycle and division mode of cortical progenitors, including expansion of the progenitor pool by increasing cell cycle re-entry. Forcing cell cycle re-entry by upregulating the cell cycle regulators Cdk4 and CyclinD1 in the mouse appears to recapitulate the evolutionary expansion of the cortical surface without thickening of the cortical layers (Nonaka-Kinoshita et al., 2013). Indeed, the human neocortex is 1000 times larger than that of the mouse but only twice as thick (Blinkov and Glezer, 1968; Rakic, 1995). A study in the macaque suggested that differences in cell cycle regulation could also be observed at the level of a single area, revealing higher proliferation rates in V1 compared to V2 (Lukaszewicz et al., 2005). Analysis of the ferret, sheep, cat and mouse neocortex confirmed that mitotic cells do not distribute evenly during development; however, this study demonstrated that fast cycling progenitors accumulate in regions undergoing the greatest tangential expansion, corresponding to presumptive gyri (Reillo et al., 2011). It is therefore possible that the more intense proliferation in the macaque V1 compared to V2 is a topological feature independent of area identity or function, and reflects the lateral expansion of the primary visual cortex leading to the formation and folding of the calcarine sulcus. The folding of the neocortical sheet is an important feature in the elaboration of a larger neocortex (Zilles et al., 2013), as it helps maintain a reasonable head-to-body size ratio.
The pattern of gyri and sulci exhibits inter-individual variation but is largely conserved within a species suggesting a genetic control. Local regulation of *Trnp1* (Stahl et al., 2013) and *GPR56* (Bae et al., 2014) in the mouse induces the formation of folds in the smooth rodent brain, illustrating the importance of multispecies approaches.
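
The relationship between cell cycle re-entry and progenitor pool expansion described above can be illustrated with a deliberately simple sketch. It is not taken from any of the cited studies: the function name, the parameter `reentry_fraction` and the assumption that every progenitor divides exactly once per cycle are hypothetical simplifications chosen only to show why raising the fraction of daughter cells that re-enter the cycle enlarges the progenitor pool, and hence the potential tangential extent of the cortical sheet.

```python
def simulate_progenitor_pool(initial_progenitors, reentry_fraction, n_cycles):
    """Toy model (an assumption, not a published model).

    Each progenitor divides once per cycle into two daughters.
    A fraction `reentry_fraction` of daughters re-enters the cell cycle
    as progenitors; the remainder exits the cycle as neurons.
    Returns a list of (cycle, progenitors, cumulative_neurons).
    """
    progenitors = float(initial_progenitors)
    neurons = 0.0
    history = []
    for cycle in range(1, n_cycles + 1):
        daughters = 2.0 * progenitors
        progenitors = daughters * reentry_fraction
        neurons += daughters * (1.0 - reentry_fraction)
        history.append((cycle, progenitors, neurons))
    return history


if __name__ == "__main__":
    # With half of the daughters re-entering, the pool size stays constant;
    # with a higher fraction, the pool grows at every cycle.
    for fraction in (0.5, 0.75):
        final = simulate_progenitor_pool(100, fraction, 5)[-1]
        print(fraction, final)
```

In this sketch the pool is multiplied by `2 * reentry_fraction` at each cycle, so any value above 0.5 yields geometric growth of the progenitor pool. The model says nothing about cortical thickness or folding, which depend on factors it does not represent.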

While we have gained a better grasp of the principles of the evolution of the visual cortex and the mechanisms underlying the expansion of the cortical surface, the driving forces leading to the emergence of new visual areas with novel functions and an original identity remain largely unknown. Elucidating the developmental regulation that controls the patterning of the neocortex and the specification of visual area identity will undoubtedly provide answers regarding the evolution of the visual cortex, including the advantage of adding more areas instead of developing new functions in pre-existing ones.

# **GENETIC SPECIFICATION OF NEOCORTICAL DOMAINS**

Cortical layers originate from the proliferation of progenitor cells (PCs) in the neurogenic compartment of the developing neocortex lining the surface of the ventricle. PCs in the ventricular and subventricular zones (VZ; SVZ) divide symmetrically to generate two progenitor daughter cells, amplifying the pool of PCs and expanding the ventricular surface laterally (**Figure 2**). Alternatively, asymmetrical PC division gives rise to a single neuron and a PC, or to an intermediate progenitor cell (IPC) and a PC. IPCs are the main source of cortical neurons; they reside in the SVZ, where they divide to produce two neurons or two IPCs (Haubensak et al., 2004; Kawaguchi et al., 2008; Pontious et al., 2008; Kowalczyk et al., 2009). Gyrencephalic species exhibit an enlarged SVZ, divided into inner and outer compartments, ISVZ and OSVZ respectively, which is absent in non-gyrencephalic rodents (Smart et al., 2002; Lukaszewicz et al., 2005; Zecevic et al., 2005; Dehay and Kennedy, 2007; Bayatti et al., 2008; Martínez-Cerdeño et al., 2012). In addition to IPCs, the OSVZ contains radial glia cells similar to those found in the VZ, but they lack an apical process attaching them to the VZ and possess a single basal process along which the cell body moves during the cell cycle (Hansen et al., 2010; Reillo et al., 2011; Shitamukai et al., 2011; Gertz et al., 2014). OSVZ radial glia cells (oRGCs) self-renew and generate neurons directly, participating in the gyrification of larger brains, but are also found in limited numbers in the mouse cortex (Wang et al., 2011). The newborn neurons then migrate along a radial process in an inside-out fashion to form the cortical layers (Kriegstein and Noctor, 2004; Molyneaux et al., 2007), where they mature and establish short-range connections with neighboring cells and long-range connections with other areas or subcortical regions (for review see Marín and Rubenstein, 2003).

migratory stream, guided by a combination of attracting and repulsive cues. LGE, lateral ganglionic eminence; MGE, medial ganglionic eminence; POA, preoptic area.

GABAergic interneurons populate the neocortex through a different mode (**Figure 2**): most are born in subcortical domains, the ganglionic eminences (GE) and the pre-optic area (POA; Gelman et al., 2009; Zimmer et al., 2011; Sultan et al., 2013), and migrate tangentially until they reach the neocortex, where they switch to a radial mode to integrate into the cortical network (Nery et al., 2002; Ang et al., 2003; Marín and Rubenstein, 2003). This migration mode has been demonstrated in the mouse; however, studies suggest that in nonhuman primates, additional waves of interneurons are generated locally in the neocortex and migrate radially along a route similar to that followed by pyramidal neurons (Letinic et al., 2002; Rakic, 2002). The controversial hypothesis of locally born neocortical interneuron populations is appealing because it provides a mechanism by which interneurons might have adjusted to the increasing distance between the traditional interneurogenic sites and the neocortex during the evolutionary expansion of the brain. Recent evidence arguing against a neocortical pool of interneuron progenitors in the embryonic macaque and human (Ma et al., 2013) endeavored to close the debate; however, the study focused on early stages of neocorticogenesis and did not analyze later waves of neocortical interneurons, which most likely originate locally as they are born in a brain of larger dimensions. In addition, the authors analyzed the interneurons emerging from the GE exclusively, without taking into account the contribution of the POA, recently demonstrated as a source of interneurons in the mouse (Gelman et al., 2009; Zimmer et al., 2011).
Considering the substantial increase in the proportion of interneurons in the neocortex during evolution, from 15% of the total neuronal population in the mouse neocortex to 24–30% in primates (for review see Rudy et al., 2011), it is plausible that sites of interneuron genesis have increased rather than disappeared, supporting the hypothesis of neocortical interneuron progenitors. Alternative intermediate models, such as the ferret or the cat, with a complex brain likely to comprise a mixed interneuronal population similar to the primate but a simpler visual cortex, will without a doubt play an important role in resolving the debate. Encouragingly, interneuron migratory routes are beginning to be characterized in the developing ferret brain, in the context of cortical dysplasia (Poluch et al., 2008; Abbah and Juliano, 2013).

Although the generation of cortical neurons and interneurons is well characterized, progress on area patterning has been slow. Two opposing models of cortical patterning were originally proposed to explain the phenomenon. The "tabula rasa" hypothesis states that the neocortex begins as a blank slate and is patterned solely by the innervation of thalamic afferents (O'Leary, 1989), while the "protomap" hypothesis argues that cortical identity is predetermined, already present in PCs in the neurogenic zones and subsequently transferred to the progeny (Rakic, 1988). The current view is that both mechanisms are in play (for review see O'Leary et al., 2007). Areas initially acquire their identity through a combination of intrinsic molecular programs, and their borders are later refined via signals carried by the thalamic axons, which also provide the functional identity of the cortical domains. The precedence of intrinsic over extrinsic signals in conferring area position suggests that new areas could arise from a modification of the gene expression profile present in a particular cortical region at a given time. In order to identify the modifications that have led to more areas, one must first understand the regulatory events in a simple brain with fewer cortical areas, such as that of the mouse, which also affords the potential for manipulating gene expression at the cellular level.

The first step of cortical patterning is achieved through the graded expression of transcription factors and homeobox genes along the axes of the brain, defining domains that each carry a unique combination. In the embryonic mouse brain, the transcription factor Paired Box 6 (Pax6) is expressed in a high anterior/low posterior and high lateral/low medial gradient (Walther and Gruss, 1991; Stoykova and Gruss, 1994). The transcription factor Emx2 is expressed in an opposing gradient, with low anterior/high posterior and low lateral/high medial gradients (Gulisano et al., 1996; Mallamaci et al., 1998). Removing either transcription factor (TF) dramatically affects the organization of cortical areas. In *Emx2* knock out (KO) mice, the anterior territories, including the somatosensory cortex and the motor cortex, expand and take over more posterior domains, leading to a reduction of the visual cortex. The situation is reversed in *Pax6* KO mice, where the visual cortex expands rostrally with detrimental effects on anterior areas (Bishop et al., 2000). This pivotal finding demonstrates that *Emx2* is capable of repressing the "anterior identity" and specifying visual identity in the immature cortical plate (Bishop et al., 2000). Similarly, the transcription factor *COUP-TFI* is upregulated in the caudoventral portion of the neocortex (Liu et al., 2000) and promotes caudal area identity including the visual areas (Armentano et al., 2007), in part by downregulating *Pax6* expression along the dorsoventral axis and blocking the "anterior identity" (Faedo et al., 2008).
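
The logic of opposing gradients can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than a measured quantity or a published model: the gradients are taken to be linear along a normalized anterior-posterior axis, the "visual" (posterior) territory is arbitrarily defined as the region where Emx2 exceeds Pax6, and the names `visual_border`, `emx2_dose` and `pax6_dose` are hypothetical.

```python
def visual_border(emx2_dose=1.0, pax6_dose=1.0, n_positions=101):
    """Toy model (an assumption, not a biophysical description).

    Position x runs from 0 (anterior) to 1 (posterior).
    Pax6 is modeled as high anterior / low posterior, Emx2 as the opposite.
    Returns the most anterior position at which Emx2 exceeds Pax6,
    taken here as the anterior border of the posterior (visual) territory,
    or None if Emx2 never exceeds Pax6.
    """
    for i in range(n_positions):
        x = i / (n_positions - 1)
        pax6 = pax6_dose * (1.0 - x)
        emx2 = emx2_dose * x
        if emx2 > pax6:
            return x
    return None


if __name__ == "__main__":
    print("balanced gradients:", visual_border())
    print("reduced Emx2:", visual_border(emx2_dose=0.5))
    print("reduced Pax6:", visual_border(pax6_dose=0.5))
```

Under these assumptions, lowering the Emx2 dose moves the border posteriorly, shrinking the posterior territory, whereas lowering the Pax6 dose moves it anteriorly, in the same direction as the phenotypes reported by Bishop et al. (2000). Setting a dose to zero makes one territory disappear entirely in this sketch, which overstates the knock-out phenotypes; partial doses are used in the example for that reason.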

Gradients of transcription factors across the embryonic neocortex are established by diffusible morphogens, including BMPs, Wnts and Fgfs. Fgf8 and, to a lesser extent, Fgf17 are secreted by the anterior neural ridge (ANR) and contribute to promoting anterior identity by negatively regulating the expression of *Emx2* and *COUP-TFI* (Garel et al., 2003; Grove and Fukuchi-Shimogori, 2003; Cholfin and Rubenstein, 2007). Fgf8 upregulates the expression of the zinc-finger transcription factor *Sp8* (O'Leary and Sahara, 2008), which inhibits Emx2 by direct interaction (Zembrzycki et al., 2007); Sp8 therefore contributes to the specification of anterior territories and represses visual identity (Borello et al., 2014). Using genetic models of loss and gain of function, target genes regulated by Pax6 are slowly being identified (Quinn et al., 2007), shedding light on how graded regional identity is propagated from PCs in the neurogenic zones to mature cortical neurons in order to establish areal boundaries. Recent evidence suggests that positional identity is maintained across successive differentiation stages and zones by a specific cascade of transcription factors. Tbr2 expression in IPCs, directly activated by Pax6 (Sansom et al., 2009), is detected in a high rostral/low caudal gradient across the SVZ (Bulfone et al., 1999; Krüger and Braun, 2002; Bedogni et al., 2010), reminiscent of the Pax6 expression profile in the VZ. The conditional loss of *Tbr2* (also known as *Eomes*) in the mouse neocortex at embryonic day 11 (E11) leads to the downregulation of rostral markers in the CP at E14.5 (Arnold et al., 2008; Sessa et al., 2008; Elsen et al., 2013) and to perturbation of the anterior regional identity, resulting in disorganized somatosensory "barrel fields" (Elsen et al., 2013).
Therefore, in addition to promoting IPC genesis, Tbr2 participates in cortical patterning and relays Pax6 positional information (Elsen et al., 2013) in neurons entering the cortical plate by activating the expression of the transcription factor Tbr1 (Englund et al., 2005). *Tbr1* expression is reduced in *Tbr2* conditional knockout mice (Elsen et al., 2013), and anterior patterning is disorganized in *Tbr1* mutants (Arnold et al., 2008; Sessa et al., 2008), suggesting that *Tbr1* carries the rostral identity in the cortical neurons. A similar genetic sequence for the specification of the visual cortex has not yet been identified; however, the transcription factor *Bhlhb5* (also known as *Bhlhe22*) is expressed in a profile similar to that of *Emx2* and is thought to regulate the acquisition of posterior identity in cortical neurons (Joshi et al., 2008). *Bhlhb5* is therefore a prime candidate for visual cortex patterning. The patterning of subcompartments within visual areas also comprises an intrinsic component. Researchers investigating the development of ocular dominance columns in the cat visual cortex recently identified heat shock protein 90 alpha (Hsp90α) as being specifically associated with ipsilateral connections. They reveal that clusters of cells expressing Hsp90α form in the visual cortex 2 weeks before the development of the columns, setting the initial pattern for ocular dominance columns (Tomita et al., 2013). The absence of such columns in rodents precludes this line of research from being pursued in these species.

Candidate genes responsible for cortical patterning and visual area specification have mainly been identified in the mouse, and it is not yet known to what degree their roles translate to higher species. The patterning function of Pax6 resides in its graded distribution across the anteroposterior axis during development, as demonstrated in the mouse. However, Pax6 is consistently expressed in oRGC throughout the OSVZ of gyrencephalic species (Reillo et al., 2011), suggesting that Pax6 might have lost its patterning properties during neocortical expansion. Quantitative studies comparing gene expression levels in various regions of the brain in gyrencephalic species, for example by microarray and quantitative real-time polymerase chain reaction, are needed to validate the area specification pathways identified in the mouse. The specification of discrete visual areas is genetically controlled, but their functional identity is carried by axons emerging from the visual relay nuclei of the thalamus and projecting to layer 4 of the neocortex. Recently in the mouse, new genetic models that specifically obliterate input to the neocortex, combined with molecular demarcation of area borders, have enabled the elucidation of the role of cortical afferents in area specification. By specifically deleting the expression of the transcription factor *COUP-TFI* in the dorsal lateral geniculate nucleus (dLGN), researchers have demonstrated that geniculocortical inputs drive the genetic distinction between primary and higher-order areas (Chou et al., 2013; Vue et al., 2013). Vue and colleagues also reveal that the surface area of V1 in the mouse varies with changes in the size of the LGN (Vue et al., 2013). These results are summarized in **Figure 3**.

The refinement of gene transfer techniques, in particular *in utero* electroporation, can help to bridge the gap with other species. This technique allows for gene transfer in restricted portions of an epithelium by application of a series of electric pulses (Saito and Nakatsuji, 2001). Groups around the world are taking advantage of this technique to characterize the genes involved in visual cortex patterning in species more dependent on vision, like the ferret, which therefore offer a more relevant substrate (Kawasaki et al., 2012). Undoubtedly, the combinatorial distribution of transcription factors has increased with the addition of new visual areas, through modification of their expression domains and/or the timing of their expression. We are getting closer to breaking the code underlying the specification of a large number of areas, in particular with the development of microarrays for a large number of species and of next generation sequencing, which identifies all the gene products present in a given region, including noncoding regulatory sequences (Ayoub et al., 2011; Belgard et al., 2011; Bernard et al., 2012; Oeschger et al., 2012). However, it is important to also decipher how these genes affect individual cell behavior, which ultimately leads to the formation of characteristic areal boundaries and the specific function of areas within a specific domain, such as the visual cortex.

# **MOLECULAR CONTROL OF VISUAL CORTICAL AREALISATION**

The transcription factors discussed above exhibit graded expression throughout the developing cortical compartments, and it is not known how their "blurry" limits are translated into the sharp boundaries characteristic of the visual areas in the mature neocortex. Spatiotemporal mapping of the visual cortex in different species demonstrates a combinatorial expression of guidance molecules that is dynamically regulated during development. Each subtype of guidance molecule defines a permissive or repulsive environment for subsets of cortical neurons. Remarkably, during development the expression of guidance molecules demonstrates sharp boundaries, often matching the borders of the putative area. In addition, guidance cues distributed in an area-specific profile help to guide intracortical connections as well as connections between the neocortex and subcortical regions, thereby contributing to the specification of an area's functional identity.

Guidance cues are traditionally divided into two categories: secreted molecules that diffuse in the extracellular space, and membrane-bound molecules that are attached to the cell surface and require close proximity between the two interacting cells. Interaction between the ligand and its specific receptor(s) expressed on the surface of the target cell elicits a cascade of intracellular reactions leading to the reorganization of the cytoskeleton. Signaling pathways promoting microtubule polymerization attract responsive cells towards the source of ligand. Conversely, collapse of the microtubule scaffold results in repulsion, and the target cell moves away from the source of guidance molecule. The migratory response to a particular guidance molecule is highly influenced by the environment and by the combination of receptors and coreceptors expressed on the target cell; thus, the same guidance molecule can be either attractive or repulsive (Lehigh et al., 2013).

# **EPH/EPHRINS**

The first evidence implicating guidance molecules in area formation came from the selective expression of *EphA* family members in the developing macaque neocortex (Donoghue and Rakic, 1999). Eph receptors (A and B) belong to the large family of tyrosine-kinase receptors and are activated by cell surface ligands, the *ephrins*. Ephrin-As are attached to the membrane via a glycosyl phosphatidylinositol (GPI) anchor, while the ephrin-Bs are transmembrane (Flenniken et al., 1996; Brückner and Klein, 1998). Activation of the receptor often results in repulsion of the cell (Gale and Yancopoulos, 1997; Hattori et al., 2000). The receptor-ligand interaction is also capable of eliciting a response in the ligand-bearing cell, a phenomenon known as reverse signaling (Holland et al., 1996; Gale and Yancopoulos, 1997). Eph/ephrin signaling is involved in many aspects of development, including blood vessel formation and the topographic organization of retinal projections; animals with defective Eph/ephrin signaling usually exhibit aberrant connectivity (Friedman and O'Leary, 1996; Gale and Yancopoulos, 1997; Flanagan and Vanderhaeghen, 1998; Frisén et al., 1998; Feng et al., 2000; Helmbacher et al., 2000). The *Eph/ephrin* RNA expression profile in the embryonic primate neocortex reveals an area-specific patterning, providing the first evidence of the early specification of presumptive functional domains (Donoghue and Rakic, 1999). Similar analysis in the mouse demonstrates that *EphA6* expression is restricted to the posterior pole of the developing neocortex, suggesting a selective guidance mechanism for excitatory neurons into the future visual cortex (Yun et al., 2003). The specific expression of *EphA6* in the presumptive visual cortex is independent of thalamic inputs, as it is not affected in *Mash1* KO animals, which fail to develop inputs from the LGN (Nakagawa et al., 1999; Yun et al., 2003).
*EphA7* and *ephrin-A5* are mutually exclusive and absent from the presumptive visual cortex, with *EphA7* restricted to the anterior end of the developing mouse neocortex and *ephrin-A5* delineating a specific domain in the middle of the A-P axis (Yun et al., 2003). In *Mash1*−/− mice, the *EphA7* expression domain expands posteriorly and overlaps with *ephrin-A5* to define a new region (Yun et al., 2003). In addition to steering excitatory neurons to appropriate neocortical areas, activation of EphA7 by ephrin-A5 controls brain size by regulating apoptosis of neural progenitors (Depaepe et al., 2005). The discrete ephrin-A5 expression profile suggests that EphA7/ephrin-A5-dependent apoptosis takes place in an area-specific manner, providing an additional regulatory mechanism for area specification. Ephrin-B1 also contributes to excitatory neuron migration by restricting their lateral migration and maintaining the columnar organization of the progeny of a single progenitor cell (Dimidschstein et al., 2013). Unfortunately, this study does not take into account the arealisation of the neocortex. We can hypothesize differential ephrin-B1 regulation at the border between two areas, where the lateral spread of cortical neurons would be more strictly controlled than within an area in order to segregate different populations. In addition to its roles during development, ephrin-B1 expression is sustained in the postnatal and adult marmoset monkey visual cortex (*Callithrix jacchus*, Teo et al., 2012), suggesting a role in the maintenance of connectivity and ongoing neuroplasticity that needs to be further investigated and confirmed in other species.

We recently described the EphA4 expression profile during development in the visual cortex of the marmoset monkey (Goldshmit et al., 2014), revealing major differences from the mouse, including robust expression of EphA4 on glial cells in the adult, which normally disappears in rodents at the end of neurogenesis. This finding implies that EphA4 serves additional functions in the primate visual cortex compared with the mouse. Although these roles have yet to be characterized, it will be important to analyze the expression of Eph/ephrin family members in other species to identify potential modifications and relate them to the evolution of the neocortex. Despite the prevalence of Eph/ephrin signaling in corticogenesis, few studies have been performed in nonprimate species other than the mouse, with the exception of a functional study of the ferret retinothalamic projections (Huberman et al., 2005).

# **CADHERINS**

Another example of guidance molecules implicated in arealisation is the family of adhesion molecules known as cadherins. Cadherins are glycoproteins expressed at the cell surface. These molecules engage in homophilic binding, conferring preferential adhesiveness on cell populations in a calcium-regulated manner (for review, see Redies and Takeichi, 1996; Takeichi, 2007). Cells expressing the same cadherin within a larger population will specifically aggregate with each other and separate from cells expressing different cadherins. In addition to this qualitative segregation, cells expressing different levels of the same cadherin will also selectively associate, adding a quantitative variable (Steinberg and Takeichi, 1994). These properties make cadherins ideal candidates to sort cells across presumptive cortical areas. A thorough study of the expression profile of 10 cadherins in the ferret visual cortex, from early embryonic stages to adulthood, demonstrates a dynamic area-specific and layer-specific expression profile (Krishna et al., 2009). The authors identified several cadherins differentially expressed across the V1/V2 border, with cadherin20 and protocadherin10 selectively expressed in V1 and cadherin8 and -11 restricted to V2. As in the ferret visual cortex, cadherins exhibit a graded and areal pattern in the mouse neocortex that is independent of thalamocortical inputs, confirming that the initial steps of arealisation are intrinsically regulated (Nakagawa et al., 1999). These observations in nonprimate species have emphasized the crucial role of cadherins in controlling the selective migration of neurons into particular visual areas, prompting similar mapping studies in a primate model, the marmoset monkey (Matsunaga et al., 2013).

# **SEMAPHORINS**

The Semaphorin family comprises secreted and membrane-bound proteins characterized by an N-terminal semaphorin domain and an immunoglobulin loop. Members exposed at the cell surface contain an additional GPI anchor and an intracellular C-terminal domain (Kolodkin et al., 1993). They interact with Plexin and Neuropilin (Npn) receptors but are also capable of activating the vascular endothelial growth factor receptor (VEGFR) through the formation of a receptor complex with Npn (Kolodkin et al., 1997). Semaphorins regulate the migration of a large range of cells, including interneurons (Zimmer et al., 2010; Hernández-Miranda et al., 2011) and endothelial cells (Kutschera et al., 2011). They also control axon pathfinding in the central and peripheral nervous systems (Deck et al., 2013). In the somatosensory system, Sema6A guides thalamic projections to the appropriate domain in the dorsal neocortex. In the absence of *Sema6A*, the thalamocortical axons project to a more ventral region of the neocortex, leading to a disorganized barrel field (Little et al., 2009) and modification of cortical domain identity. The barrel field is characteristic of rodent models; it is therefore not known whether the patterning potential of Sema6A is conserved in other species.

Using a comparative approach, our laboratory demonstrated that the secreted Sema3A interacts with Npn1 to regulate area-specific neuron migration in the mouse and the marmoset monkey visual cortex (Homman-Ludiye and Bourne, 2013). Moreover, we suggest that Sema3A, despite being homogeneously expressed throughout the developing mouse neocortex (Giger et al., 1998; Polleux et al., 2000), contributes to patterning posterior identity in the mouse through differential expression of its receptor *Npn1* in presumptive V1. The volume of V1 is reduced in *Sema3A* KO animals, and this reduction is compensated by an expansion of anterior fields (Homman-Ludiye and Bourne, 2013).

With 20 members interacting with a wide variety of receptor complexes, semaphorins are strong candidates for fine-tuning the migration of cortical neurons into the appropriate cortical domains. Semaphorin activity can also be modulated by components of the extracellular matrix, including CSPG (Kantor et al., 2004), for which maps illustrating arealised expression in the visual cortex are available in non-primate species (Homman-Ludiye et al., 2010; van der Gucht et al., 2001). It will therefore be extremely useful to compare the profiles of CSPG and semaphorins in a given species in order to postulate potential functional interactions between members of the two families.

# **CONCLUSION**

The visual cortex is one of the most studied neocortical domains, possibly because of the prominent role of vision in a number of species. A large part of vision research is undertaken in primate species; however, the organization of the visual system is robust and well conserved across evolution, allowing comparison of human gene expression with analogous data in the mouse (Lein et al., 2007). Even virtually blind subterranean species retain a visual cortex (Crish et al., 2006; Matsunaga et al., 2011). Therefore, non-primate species can be examined to understand the evolution and development of visual cortical areas, especially those of humans, for which tissue, including embryonic tissue, is difficult to source and which do not offer the opportunity for genetic modification available in the mouse.

Utilizing a wide variety of species can help us understand the major traits of cortical arealisation, as these are expected to present the fewest cross-species differences, and identify what makes the human visual cortex unique. A recent study reveals that heavy selection pressure weighs on genes responsible for setting the basic structure of brain organization, whilst the genes exhibiting cross-species differences have non-widespread expression patterns. This indicates either a reduced selection pressure on these genes or that distinct, subtle changes, rather than global changes, may be favored in divergent species (Zeng et al., 2012). The results reported in this study support the use of the mouse as a good model system for understanding human brain function, while pointing out important differences in cellular organization between mouse and human brains and in the functions individual genes may play in each species.

In summary, it is evident that understanding the complexity of a specific sensory system, whether its evolution, development or function, relies on the analysis of multiple species. While the principal focus has been on primates and rodents, evidence indicates the importance of other species in completing this story. The next decade will most likely focus on closing the gap in our knowledge through comparative studies employing molecular tools, which will not only assist in addressing questions of evolution and development but also in tackling specific neurological issues.

# **REFERENCES**


extant Carnivora (Mammalia). *Biol. Rev. Camb. Philos. Soc.* 74, 143–175. doi: 10.1017/s0006323199005307


O'Leary, D. D. M., Chou, S.-J., and Sahara, S. (2007). Area patterning of the mammalian cortex. *Neuron* 56, 252–269. doi: 10.1016/j.neuron.2007.10.010


independently. *Neuroscience* 156, 118–128. doi: 10.1016/j.neuroscience.2008.07.002


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 April 2014; accepted: 19 June 2014; published online: 04 July 2014*. *Citation: Homman-Ludiye J and Bourne JA (2014) Mapping arealisation of the visual cortex of non-primate species: lessons for development and evolution. Front. Neural Circuits 8:79. doi: 10.3389/fncir.2014.00079*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Homman-Ludiye and Bourne. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Visual cortical areas of the mouse: comparison of parcellation and network structure with primates

# **Marie-Eve Laramée<sup>1</sup> and Denis Boire<sup>2</sup>\***

<sup>1</sup> Laboratory of Neuroplasticity and Neuroproteomics, Department of Biology, KU Leuven-University of Leuven, Leuven, Belgium <sup>2</sup> Département d'anatomie, Université du Québec à Trois-Rivières, Trois-Rivières, QC, Canada

#### **Edited by:**

Andrea Benucci, RIKEN Brain Science Institute, Japan

#### **Reviewed by:**

Stephen D. Van Hooser, Brandeis University, USA Arianna Maffei, SUNY Stony Brook, USA

#### **\*Correspondence:**

Denis Boire, Département d'anatomie, Université du Québec à Trois-Rivières, 3351 Boul des Forges CP500, Trois-Rivières, QC G9A 5H7, Canada e-mail: denis.boire@uqtr.ca

Brains have evolved to optimize sensory processing. In primates, complex cognitive tasks must be executed and evolution led to the development of large brains with many cortical areas. Rodents do not accomplish cognitive tasks of the same level of complexity as primates and have retained small brains in both relative and absolute terms. But is a small brain necessarily a simple brain? In this review, several aspects of the visual cortical networks are compared between rodents and primates. The visual system is used as a model to evaluate the level of complexity of the cortical circuits at the anatomical and functional levels. The evolutionary constraints are first presented in order to appreciate the rules for the development of the brain and its underlying circuits. The organization of sensory pathways, with their parallel and cross-modal circuits, is also examined. Other features of brain networks, often considered as imposing constraints on the development of underlying circuitry, are also discussed and their effects on the complexity of the mouse and primate brain are examined. In this review, we discuss the common features of cortical circuits in mice and primates and see how these can be useful in understanding visual processing in these animals.

**Keywords: evolution, sensory pathways, feedforward, feedback, hierarchy, connectivity, cross-modal, connectome**

# **IS THE MOUSE BRAIN SIMPLE?**

The mouse presents many advantages for the study of neural functions, circuits and their underlying genetic and molecular mechanisms. Its small size and ease of breeding offer significant advantages over larger, less prolific mammals that are more costly to house and care for. The mouse is a small mammal and a small rodent, and its brain is small in both absolute and relative terms. A frequently shown bivariate log-log plot of brain size against body size clearly places rodents in the lowest portion of the minimum convex polygon for all mammals. The encephalization quotient of some of the smallest brained rodents is comparable to that of monotremes and marsupials (Striedter, 2004). The question here is whether the small size of the mouse brain also reflects its level of complexity. Is a small brain also a simpler brain?
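
For readers unfamiliar with the measure, an encephalization quotient compares observed brain mass with the brain mass expected for an animal of that body mass. The sketch below uses one commonly cited formulation (Jerison's, with expected brain mass = 0.12 × body mass^(2/3), masses in grams). The choice of this formulation and its constants is an assumption, since the text does not state which one underlies the values it discusses, and the example masses are hypothetical round numbers rather than measured data.

```python
def encephalization_quotient(brain_g: float, body_g: float,
                             coefficient: float = 0.12,
                             exponent: float = 2.0 / 3.0) -> float:
    """Observed brain mass divided by the allometrically expected brain mass.

    The default coefficient and exponent follow one common formulation and
    are assumptions here; other reference curves give different values.
    """
    if brain_g <= 0 or body_g <= 0:
        raise ValueError("masses must be positive")
    expected_brain_g = coefficient * body_g ** exponent
    return brain_g / expected_brain_g


if __name__ == "__main__":
    # Hypothetical animals: the second has a body 10**1.5 times heavier and
    # a brain 10 times heavier, i.e. brain mass scaled as body mass**(2/3).
    small = encephalization_quotient(brain_g=0.4, body_g=20.0)
    large = encephalization_quotient(brain_g=4.0, body_g=20.0 * 10 ** 1.5)
    print(f"small animal EQ: {small:.3f}")
    print(f"large animal EQ: {large:.3f}")
```

Under these assumptions the two hypothetical animals obtain the same quotient despite a tenfold difference in absolute brain mass, which is why the quotient, rather than absolute size, is used when comparing species of very different body size.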

Size has a particular significance in the evolutionary history of mammals because the earliest mammals emerged from particularly small ancestors and were not brainier than their reptilian ancestors (Kaas, 2011; Rowe et al., 2011). Throughout the evolution of mammals, an increase in relative brain size has appeared independently in several groups, namely primates, cetaceans (whales and dolphins) and elephants. A great evolutionary radiation followed the initial increase in relative brain size, suggesting that more encephalized species were better at invading new niches or adaptive zones. Rodents appear to contradict this trend. With more than 2000 species and 30 different families, the order Rodentia is the most diverse order of placental mammals (Jansa and Weksler, 2004; Wilson and Reeder, 2005). It is quite stunning that over 40% of all mammalian species are rodents. They are found on all continents and exhibit a wide range of lifestyles, from terrestrial, arboreal and desert-living to aquatic and fossorial, and some even achieve amazing feats of gliding flight. Body size varies more than 1000-fold and brain size 200-fold. Yet, despite this tremendous adaptive radiation, the encephalization quotients of rodents are quite similar.

# **BRAIN SIZE AND NUMBER OF BRAIN AREAS**

The relationship between complexity and brain size is not clear cut. The general principle that larger brains are more complex is generally considered as fact. In their seminal comparative studies of brain size in Insectivores, Chiroptera and Primates, Stephan et al. considered that: ". . . increased size is almost always accompanied by progressive differentiation. . ." (Stephan et al., 1981). This view is challenged by an alternative hypothesis proposing that: ". . .changes in the complexity of neural systems, in terms of the number of identifiable subdivisions, occur only during the evolutionary events leading to the establishment of a new mammalian order." Therefore, within an order, all species should have the same organization of nuclear systems regardless of life history, brain size and time since evolutionary divergence (Manger, 2005). This hypothesis has been verified for the differentiation of cholinergic, catecholaminergic and orexinergic nuclear masses in rodents (Kruger et al., 2012), visual cortical areas in carnivores, somatosensory and motor areas in primates and cortical areas in monotremes (Manger, 2005). This particular hypothesis questions the proposal that an increase in brain size necessarily leads to an increase in brain complexity. It implies that the higher levels of complexity of neural systems observed in the larger brains of primates would depend not on size but on other factors. This hypothesis is interesting and should be further studied. As yet, there is no direct test or robust cladistic analysis of the relationship between brain size, either absolute or relative, and the complexity of the component neural systems.

There is another interesting corollary to this hypothesis. Although the mouse is amongst the smallest rodents, its brain would be neither more complex nor any simpler than that of other rodents, regardless of the diversity of lifestyles and brain size. This does not mean that all rodents are identical, but proposes that they should have the same complement of nuclear masses and cortical areas. In this respect, a recent comparison of the cortical organization in several rodents representative of the main suborders, life history traits and levels of encephalization shows a general common pattern of neocortical organization, as well as diversity in the relative size of the different sensory fields and in the central magnification factors within these fields (Campi et al., 2007, 2011; Campi and Krubitzer, 2010; Krubitzer et al., 2011). This survey of rodent cortex shows a quite striking common set of cortical areas that can be found in numerous other orders of mammals. The authors do, however, propose several differences in the number of cortical areas in different species that would challenge the hypothesis of Manger (2005). For instance, although the ubiquity of the location and presence of the primary visual area in all rodents is not questioned, the number and parcellation scheme of extrastriate visual areas in rodents remain a matter of debate. There have been several attempts to decipher the organization of extrastriate cortices in the mouse (Wagor et al., 1980; Schuett et al., 2002; Van Der Gucht et al., 2007; Wang and Burkhalter, 2007; Garrett et al., 2014) and rat (Espinoza, 1983) as well as in a few other rodents (Thompson et al., 1950; Hall et al., 1971; Kaas et al., 1972; Tiao and Blakemore, 1976; Choudhury, 1978; Espinoza, 1983; Espinoza et al., 1992), and it is not yet clear that all rodents have the same complement of visual areas, as Manger's hypothesis would require.

# **ORGANIZATION OF RODENT VISUAL AREAS**

In the early literature, Rose had proposed that V1 is surrounded by at least five distinct visual extrastriate areas (Rose, 1929). However, there is no clear cytoarchitectonic differentiation of these areas lateral and medial to V1, and Caviness (1975) proposed that the primary visual cortex is flanked by only the two lateral and medial areas, 18a and 18b respectively. Tracing experiments have shown that V1 projects to several distinct sites in the cortices lateral and medial to V1 in mouse (Olavarria et al., 1982; Olavarria and Montero, 1989; Wang and Burkhalter, 2007; Wang et al., 2012) and rat (Montero et al., 1973). Electrophysiological mapping (Wagor et al., 1980) and optical imaging (Schuett et al., 2002) also suggest the presence of several medial and lateral extrastriate areas in the mouse, although the number and parcellation do not strictly correspond to the anatomical findings. In addition, neurofilament staining revealed delineation of monocular and binocular V1, in addition to two lateral and five medial extrastriate areas (Van Der Gucht et al., 2007). More recent anatomical and functional studies provide quite convincing evidence for the presence of at least 9 extrastriate areas surrounding V1 in mice that exhibit distinct functional properties (Wang and Burkhalter, 2007; Andermann et al., 2011; Marshel et al., 2011; Roth et al., 2012; Wang et al., 2012; Glickfeld et al., 2013, 2014). Whether similar areas are also present in other rodents has not been adequately investigated. According to Manger's hypothesis, these visual areas would be very similar in all rodents. This hypothesis has yet to be thoroughly tested.

The comparison of mice and rats with squirrels is highly relevant. Squirrels are diurnal rodents and rely more on vision than the nocturnal Muridae. In this respect, they have higher encephalization quotients and larger visual cortical areas than murids (Krubitzer et al., 2011). Anatomical (Kaas et al., 1989) and electrophysiological mapping (Hall et al., 1971) of the lateral extrastriate cortex in squirrels has led to the suggestion that there is one single visual field representation therein and more visual areas lateral to V2. This conclusion, in light of the more recent information in mice, is rather surprising in that it would suggest a less elaborate parcellation of visual cortical fields in a diurnal, highly visual rodent than in a less visual, nocturnal rodent. These results on the visual fields of the mouse therefore challenge the present understanding of the evolution of the visual cortex and of its organization.

In the present state of our understanding of the homologies between visual cortical areas in mammals, it is generally accepted that the earliest mammals had a primary visual cortex located in the occipital region of the cortical sheet, which appears to be common to all mammals, and that this V1 is flanked laterally by a single area V2 that would also be common to all mammals. This is the simple extrastriate cortex hypothesis (Rosa and Krubitzer, 1999). The opposing "complex hypothesis" states that V1 shares its lateral border and representation of the vertical meridian with multiple visual areas (Rosa and Krubitzer, 1999). The arguments for the simple and complex hypotheses have been set out in detail in the review by Rosa and Krubitzer on this specific subject and will not be repeated here (Rosa and Krubitzer, 1999). We do believe, however, that some points should be reconsidered. The simple hypothesis is supported by the fact that a single representation of the visual field lateral to V1 and making up V2 is found in squirrels and that Sciuridae are considered as representative of the ancestral rodents (see Robinson et al., 1997; in Rosa and Krubitzer, 1999). The tracing of the V1 projections to lateral cortices in squirrels shows a patchy distribution of efferents (Kaas et al., 1989) not much different from what has recently been shown as indications of multiple extrastriate areas in mice (Wang and Burkhalter, 2007). This patchy distribution is presently interpreted, as in primates, to represent connections between related modules from V1 to V2 within a single visual field representation without notable discontinuities. Indeed, in monkeys, cytochrome oxidase-dense blobs of V1 project to thin stripes in V2 (Livingstone and Hubel, 1983, 1984; Sincich and Horton, 2002, 2005; Sincich et al., 2007) and interblobs of V1 preferentially project to V2 thick and pale stripes (Xiao and Felleman, 2004; Sincich et al., 2010).
The modular hypothesis for the visual projections to lateral V2 in squirrels is rather surprising given that there are no demonstrated modules in their visual cortex. There is no evidence for ocular dominance columns (Weber et al., 1977) and, although there are abundant orientation selective neurons, there are no orientation maps in the primary visual cortex of squirrels (Van Hooser et al., 2005a,b; Van Hooser and Nelson, 2006). In addition, the long range intrinsic connections within the primary visual cortex do not show a patchy distribution (Van Hooser et al., 2006) as is shown in mammals with modular visual cortices (Gilbert and Wiesel, 1983; Callaway and Katz, 1990; Malach et al., 1993; Ruthazer and Stryker, 1996; Bosking et al., 1997; Wang and Burkhalter, 2007). In mammals that exhibit functional maps, intrinsic long-range connections in the visual cortex selectively link neurons with similar functional properties and this is apparent by their patchiness (Rockland and Lund, 1982; Rockland et al., 1982; Gilbert and Wiesel, 1983; Malach, 1989; Bosking et al., 1997). Although one study reported the intrinsic connectivity of the squirrel visual cortex to show a patchy distribution (Kaas et al., 1989), another account using retrograde tracing shows no evidence for this patchiness (Van Hooser et al., 2006).

These questions support the need for a reassessment of the distribution and of the retinotopic and functional map organization of the extrastriate visual areas in the squirrel. For the moment, there is no clear evidence that the squirrel is all that different from other rodents. The null hypothesis would state that the squirrel has multiple extrastriate areas adjoining V1, each comprising a complete representation of the visual field, as in mice. The internal organization would, as in other rodents, lack functional maps and have local connections that are not patchy (Burkhalter, 1989; Rumberger et al., 2001).

# **ON SIZE AND CONNECTIONS**

It is generally accepted as a clear trend in mammalian brain evolution that greater brain size is correlated with an increase in the number of distinct cortical areas (Campos and Welker, 1976; Kaas, 1987) and increased cortical folding. Several hypotheses have been proposed for mechanisms explaining the appearance in evolution of novel cortical areas. The parcellation theory of Ebbesson (1980), although it has been largely discredited, makes several important observations. In its initial formulation, the theory states that complexity and novel brain structures arise through the parcellation of extant structures and by the selective loss of connections of the novel "daughter aggregates". The objections to this theory will not be reviewed here, but the main problem is its hard stance on the loss of connections as the main mechanism for novelty and differentiation of brain structures (Striedter, 2004). The interesting aspect of this theory, however, is the link between divergence of brain areas and connectivity. The parcellation model could predict that increasing the number of cortical areas would lead to more specialized, less globally connected individual areas.

Another hypothesis has been proposed to explain the formation of novel cortical areas by the aggregation and pulling out of cortical modules. One of the key observations towards understanding this model of cortical evolution by modular aggregation is the presence within cortical areas of heterogeneities, or modules, that can be distinguished by specific functional and structural properties (Krubitzer and Huffman, 2000). Such modules are exemplified by whisker barrels, blob and interblob patches of the visual cortex of primates, orientation-specific columns, etc. Krubitzer proposed that these modules could represent intermediate stages in the emergence of a cortical area. These modules would be under two opposing selective pressures. In some instances the elements of these modules would aggregate under the pressure to decrease connection length and increase transmission speed, whereas in other circumstances these modules would be pressed to "pull out" of the area where they are located to form a new cortical area (Krubitzer and Huffman, 2000). These two models of cortical arealization both suggest a link between the multiplication of areas and connectivity.

It is further suggested that this pulling out of specific modules would explain the formation of novel cortical areas and the type of connectivity between the areas within the whole network. As in the parcellation hypothesis, the brain would then evolve toward a less global connectivity and greater segregation of modules. One of the main driving forces for this process would be the optimization of the network through the maximization of processing complexity with minimal costs (Ringo, 1991; Ringo et al., 1994; Cherniak et al., 2004; Chklovskii and Koulakov, 2004). The increase in the number of cortical areas through this process is hypothesized to shape the network structure of the cortex (Krubitzer, 2009) in that there are fewer long-range connections and more short connections in larger brains, as is typical of small-world networks (Bassett and Bullmore, 2006; Krubitzer, 2009).

This proposed model of cortical arealization by modular aggregation and exclusion (see **Figure 1**) would predict that the initial random cortical map has a low clustering coefficient and low node degrees and thus heterogeneous connections. With increasing complexity, neurons start to connect more with other functionally related neurons. This connectivity model leads to the emergence of the scale-free network architecture characterized by higher node degrees and by the appearance of cortical hubs. As functional subnetworks are regrouping, they are pulled out of the initial map to give rise to specialized areas and more specific modules. This results in a higher clustering coefficient and in a small-world network architecture. One could predict that the cortical areas in the mouse would be more highly interconnected than in primates. A recent network analysis of the visual areas of the mouse supports this prediction (Wang et al., 2012). Indeed although the network of visual areas in the mouse approaches a small-world topology because of the numerous extrastriate areas and the evidence for two functional streams as in primates, each area has a much greater connectivity with all the other areas and most of these connections are reciprocal. This will have important functional consequences on the balance between global synchronization and segregation of modules within the cortical network (see below).
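The network terms used above (node degree, connection density, clustering coefficient, path length) can be made concrete with a minimal sketch. The graphs below are toy examples, not the mouse or primate connectomes, and all function names are hypothetical helpers written for this illustration. The sketch compares a sparse ring lattice, the same lattice with two long-range shortcuts (a small-world-like graph), and a fully connected graph in which every area is linked to every other.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Undirected ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def complete_graph(n):
    """Every node linked to every other node."""
    return {i: set(range(n)) - {i} for i in range(n)}


def with_shortcuts(adj, shortcuts):
    """Copy of adj with extra undirected edges (long-range shortcuts)."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    for a, b in shortcuts:
        new[a].add(b)
        new[b].add(a)
    return new


def density(adj):
    """Fraction of all possible undirected edges that are present."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (breadth-first search)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ring = ring_lattice(20, 4)
graphs = {
    "ring lattice": ring,
    "ring + shortcuts": with_shortcuts(ring, [(0, 10), (5, 15)]),
    "fully connected": complete_graph(20),
}
for name, g in graphs.items():
    print(f"{name:18s} density={density(g):.2f} "
          f"clustering={average_clustering(g):.2f} "
          f"path length={mean_path_length(g):.2f}")
```

In this toy setting, a few shortcuts shorten paths while leaving the graph sparse and clustering fairly high, whereas the fully connected graph reaches short paths and high clustering only by paying for every possible connection. The second case is closer in spirit to the densely and reciprocally interconnected mouse visual areas described above.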

**Figure 1 |** … column, first level) and the functional architecture of the area (random distribution). During evolution, neurons of similar functions gathered together (left column) to form functional clusters (right column; shaded red, green and blue zones). Those clusters were initially highly interconnected with each other but, as they were pulled out of the initial map, their segregation became more and more clear and connections between the clusters became less numerous. This resulted in more functionally homogeneous areas (shaded red, green and blue ovals) separated by areas highly connected with all clusters with heterogeneous properties (gray areas). The high number of connections between different clusters and the presence of several hubs (purple dots) in the network corresponds to a scale-free architecture (middle … (highlighted areas of the network). Those modules contain provincial hubs (orange dots) that represent areas highly connected with other areas of the same module. Intermediate areas, which are also connected with other intermediate and multisensory areas, can be considered as connector hubs (turquoise dots). This organized structure resulted in the development of the cortical hierarchy and of the small-world network architecture (middle column, third level). In the left column, colored dots are cell bodies and colored lines represent cortical projections. In the middle column, dots are areas and lines are connections between those areas. In the right column, red, blue and green dots or areas indicate different functional properties. Gray color indicates a heterogeneous function.

#### **SALT-AND-PEPPER LAYOUT IN RODENT CORTEX**

The visual cortex in many species is highly segregated into modules that are distinct with respect to their functional properties and connectivity. Typically, there are ocular dominance columns that receive thalamic input from eye-specific geniculate layers. These have been demonstrated quite clearly in Old World monkeys (Hubel and Wiesel, 1968, 1972; LeVay et al., 1985) and more recently in New World monkeys (Markstahler et al., 1998; Fonta et al., 2000; Xu et al., 2005; Kaskan et al., 2007; Takahata et al., 2014). In the primary visual cortex there are cytochrome oxidase (CO) rich blobs and interblobs (Wong-Riley, 1979; Horton and Hubel, 1981) that have specific connectivity with thick stripes and thin stripes of the extrastriate cortex V2 (see references above). Ocular dominance columns and CO blobs are spatially registered in Old World monkeys but not in New World monkeys (Adams and Horton, 2009). In addition, there are functional columns of orientation selectivity in which cells respond to a specific stimulus orientation in primates (Hubel et al., 1978; Blasdel and Salama, 1986), carnivores (Hubel and Wiesel, 1963; Grinvald et al., 1986; McConnell and LeVay, 1986; Rao et al., 1997), ungulates (Clarke et al., 1976) and tree shrews (Humphrey and Norton, 1980; Bosking et al., 1997).

On the other hand, the visual cortex of rodents is organized in what has been coined a salt-and-pepper distribution of cells, without a columnar grouping of cells that share functional properties (see Ohki and Reid, 2007; Kaschube, 2014). Indeed, even if neurons of the visual cortex exhibit specific functional specializations such as orientation selectivity, they show no evidence of structured functional maps in mice (Niell and Stryker, 2008; Van den Bergh et al., 2010), rats (Girman et al., 1999) or even in more visual diurnal and larger brained rodents such as squirrels (Van Hooser et al., 2005a). However, there is recent evidence for ocular dominance domains in the visual cortex of rats (Laing et al., 2014). Such domains have not been shown in other rodents.

There is, however, some evidence that the output of the visual cortex of the mouse is organized in functionally distinct streams of information. As in monkeys, extrastriate areas are organized in dorsal and ventral streams, with anterolateral (AL) and lateromedial (LM) being the two gateways to these pathways, respectively (Wang et al., 2012). Neurons in AL have a greater orientation or direction selectivity and are tuned to lower spatial frequencies than those in anteromedial (AM; Marshel et al., 2011). Two independent studies show that extrastriate visual areas receive inputs from functionally distinct neurons of V1 (Glickfeld et al., 2013; Matsui and Ohki, 2013). These selective projections from the primary visual cortex indicate that parallel processing of these functional properties starts at least in V1, even though the neurons in the primary visual cortex are not grouped together in functionally homogeneous modules as in monkeys.

It was believed that the brains of mice and rats were too small and that they did not have sufficient visual acuity to require functional maps. The absence of such maps in the squirrels argues against the hypothesis that brain size and higher visual performance are related to the formation of functional maps (Van Hooser et al., 2005a). It has been considered that the columnar organization is not critical for the emergence of the basic functional cell types in the visual cortex such as orientation and direction selectivity (Van Hooser, 2007).

This salt-and-pepper distribution has often been considered as the manifestation of a random organization of close local cortical connections, in agreement with Peters' rule, which dictates that axons make random connections with dendrites in proportion to their occurrence in the neuropil with no local specificity (see DeFelipe et al., 2002; and Ohki and Reid, 2007 for discussion and references). Although there are some examples which could support a random probabilistic local cortical connectivity (Kalisman et al., 2005), there are several studies demonstrating that the fine local cortical circuitry is highly structured and not a probabilistic function of distance between cells. Indeed, there is evidence for the existence of more highly connected neurons that appear to form structured local subnetworks in the visual cortex of rodents (Song et al., 2005; Yoshimura and Callaway, 2005; Yoshimura et al., 2005). Moreover, at least some subnetworks seem to be related to orientation selectivity (Hofer et al., 2011; Ko et al., 2011). In addition, in the mouse, clonally related neurons have similar orientation selectivity and, even if some do not share this preferred orientation, it suggests that cell lineage is involved in the development of response selectivity and in the determination of the structure of cortical subnetworks (Ohtsuki et al., 2012). These authors suggested that the strong connectivity between sister cells (Yu et al., 2012) establishes a network of neurons that share similar functional properties (Ohtsuki et al., 2012) that could explain the salt-and-pepper organization of the rodent visual cortex. Clonally related neurons share a significant degree of functional properties and neurons of different clones are intermingled in the mouse (Ohtsuki et al., 2012), whereas they undergo less extensive radial dispersion in the monkey (Kornack and Rakic, 1995) and could contribute to the formation of more homogeneous functional columns.
However, they note that this explanation is contradicted by the more radially dispersed clonally related neurons in the ferret cortex (Reid et al., 1997). As an alternate scenario, they propose that in species with functional modules in the cortex, each single column could derive from multiple clones and that some mechanisms may act to assemble functionally similar neurons. The initial understanding of the presence of these columns was that they were the result of evolutionary pressure to minimize cortical wiring (Hubel and Wiesel, 1977) and simulations suggest that wiring economy appears as a likely mechanism for grouping of neurons in such columns (Koulakov and Chklovskii, 2001).

#### **CAN THE SALT-AND-PEPPER LAYOUT OF MOUSE CORTEX BE OPTIMAL?**

Wiring length minimization predicts that a salt-and-pepper layout should yield a connectivity pattern with no preferences for a specific orientation (Chklovskii and Koulakov, 2004; see also Kaschube, 2014 for discussion). However, it has been suggested that orientation selectivity can emerge in a salt-and-pepper distribution of specific functional cell types and a random connectivity between these cells when there is a specific local connectivity in which the large untuned excitatory and inhibitory components balance out (Hansel and van Vreeswijk, 2012).

There is an increasing body of work that supports the idea that there is not one canonical micro-network in the cortex but multiple more or less interrelated and possibly also parallel subnetworks within the visual cortex in rodents. For example, it has been shown that highly interconnected neurons in layers 2–3 are also preferentially connected to a subgroup of layer 4 neurons (Yoshimura et al., 2005). Furthermore, these authors have shown that connections to layers 2–3 coming from layer 5 pyramidal neurons and from layer 2–3 and 4 inhibitory interneurons do not respect these connection-defined subgroups, providing opportunities for information exchange between these fine-scale cortical subnetworks (Yoshimura et al., 2005). In addition, they have shown that fast-spiking interneurons establish reciprocal connections with specific subgroups of pyramidal neurons (Yoshimura and Callaway, 2005). There is no simple and general rule of connectivity between neighboring neurons, and different connection rules seem to apply to the different subgroups of neurons. For example, there are also specific connectivity patterns within cells of the visual cortex that are related to cortical output streams. Layer 5 pyramidal cells project to several subcortical targets, namely the striatum, superior colliculus and thalamic nuclei. The probability of connections between these output neurons is related to the pre- and postsynaptic target of the neurons. Specifically, the frequency of connection between corticostriatal pyramidal neurons is greater than between corticocortical or corticotectal pyramidal neurons. Moreover, corticocortical neurons are more than three times more likely to maintain local connections with neighboring corticotectal pyramids than with any corticocortical or nonadjacent corticotectal pyramids (Brown and Hestrin, 2009).

If a rule of wiring efficiency or minimization governs the formation of columns of functionally similar neurons, this would mean that the wiring costs of one or possibly several subnetworks become limiting factors, possibly as brain size increases. Wiring cost optimization should consider the competing costs of local fine-scale wiring, local intermodular wiring and long-distance connectivity (see **Figure 2**). Simulations strongly suggest that functional maps in the cortex arise to minimize the cost of wiring, namely between cells with similar orientation specificities (Koulakov and Chklovskii, 2001).
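The dependence of wiring cost on both the layout and the connection rule can be illustrated with a deliberately simple sketch. This is a toy one-dimensional model built for this illustration, not the simulation of Koulakov and Chklovskii (2001). The layouts, the labels standing in for preferred orientations, and the two connection rules (`like_to_like_cost` and `local_cost`) are all assumptions.

```python
from itertools import combinations


def like_to_like_cost(layout):
    """Total wire length when every pair of cells sharing a label is connected."""
    return sum(abs(i - j)
               for (i, a), (j, b) in combinations(enumerate(layout), 2)
               if a == b)


def local_cost(layout, radius):
    """Total wire length when every pair of cells within `radius` is connected,
    whatever their labels."""
    return sum(abs(i - j)
               for i, j in combinations(range(len(layout)), 2)
               if abs(i - j) <= radius)


# Twelve cells on a line, three hypothetical orientation preferences (0, 1, 2).
columnar = [0] * 4 + [1] * 4 + [2] * 4   # similar cells grouped together
interleaved = [0, 1, 2] * 4              # similar cells intermingled

for name, layout in (("columnar", columnar), ("interleaved", interleaved)):
    print(f"{name:12s} like-to-like cost = {like_to_like_cost(layout):3d}, "
          f"local (radius 2) cost = {local_cost(layout, 2):3d}")
```

Under the like-to-like rule, grouping similar cells shortens the wiring, which is the intuition behind map formation. Under the purely local rule the two layouts cost the same. A cortex whose subnetworks follow a mixture of such rules would therefore face competing pressures rather than a single optimum.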

The salt-and-pepper organization of the rodent cortex could simply be the best available compromise for wiring efficiency for the rodent visual system. There is no reason to believe that there is only one optimal solution that would apply to all subnetworks. Each type of cortical subnetwork is likely under different constraints for efficiency and economy of wiring. The forces at work to bring together functionally related cells in a columnar map seem to have favored orientation selectivity in many cases as in primates, carnivores and tree shrews. These forces could simply be counterbalanced by others that apply to other structural and functional properties within these competing subnetworks, resulting in an intermingling of functionally different neurons even though functionally similar neurons might maintain strong interconnectivity. The identification of connectivity at the single cell level combined with genetic analysis of individual neurons will allow for the identification of the wiring optimization constraints for each of the cortical subnetworks. It is proposed here that the optimization of the wiring between small scale and between mesoscale networks will be instrumental in understanding the origin of the modular organization of the cortex in primates and of the salt-and-pepper layout of neurons in rodents.

There is also evidence suggesting that the wiring economy in rodent and primate brains is not governed by the same rules. The white and gray matter increase in size with the increase in neuronal number in both rodents and primates, but they scale differently (Ventura-Antunes et al., 2013). Indeed, in primates, the white matter increases at a slower rate than the number of neurons. As a result, for a given number of cortical neurons, there is a smaller volume of white matter in primates than in rodents (Ventura-Antunes et al., 2013). As pointed out by these authors, connectivity decreases with growth in small-world networks, whereas the increase in size in rodents results in a constant connectivity fraction, as in a uniform network (Ventura-Antunes et al., 2013). This supports the idea that the wiring constraints are different in rodents and primates.
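The contrast between a decreasing and a constant connectivity fraction can be made explicit with a small arithmetic sketch. It is one simplified reading of the two regimes, not the scaling analysis of Ventura-Antunes et al. (2013). The functions `fixed_degree` and `fixed_fraction` and the values of `k` and `f` are hypothetical, and "connections" are abstract counts rather than measured white matter volumes.

```python
def fixed_degree(n_neurons, k):
    """Each neuron keeps k targets regardless of network size.

    Returns (fraction of possible targets contacted, total connections)."""
    fraction = k / (n_neurons - 1)
    total = n_neurons * k
    return fraction, total


def fixed_fraction(n_neurons, f):
    """Each neuron contacts a constant fraction f of all other neurons.

    Returns (connections per neuron, total connections)."""
    per_neuron = f * (n_neurons - 1)
    total = n_neurons * per_neuron
    return per_neuron, total


k, f = 100, 0.1  # illustrative values only
for n in (1_001, 10_001, 100_001):
    frac, total_k = fixed_degree(n, k)
    per, total_f = fixed_fraction(n, f)
    print(f"N={n:>7d}  fixed degree: fraction={frac:.4f}, total={total_k:.2e}  |  "
          f"fixed fraction: per neuron={per:.0f}, total={total_f:.2e}")
```

With a fixed number of targets per neuron, the connected fraction falls as the network grows and the total number of connections rises in proportion to the number of neurons. With a fixed fraction, the total rises roughly with the square of the number of neurons, which gives an intuition for why a network that maintains its connectivity fraction would need disproportionately more wiring as it grows.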

The salt-and-pepper cortex of rodents is not necessarily a simple random or even suboptimal cortical organization, but the expression of constraints different from those of modular cortices. The mouse offers many opportunities for the study of the wiring rules and development of cortical subnetworks with more genetic tools than primates. Investigations at this scale of cortical microcircuitry in primates will be necessary to know what they have in common with the mouse.

# **SENSORY PATHWAYS**

The brain of the mouse has fewer cortical areas than primates. However, mice, just like primates, have sensory systems that require several cortical areas to process information from the periphery. The small size of their brain and the fewer cortical areas compared to primates could suggest that either some aspects of the sensory processing are simpler in mice than in primates or that the small size and less differentiated cortex represents the optimal evolutionary solution for the mouse.

#### **ASCENDING SENSORY PATHWAYS**

It is generally believed that ascending lemniscal sensory pathways are organized in parallel channels reaching the primary sensory cortices from which information is then distributed to more specific cortical networks for further analysis. There is indeed almost no cross talk between sensory pathways except for a few cross-projections in which the inferior colliculus (Tokunaga et al., 1984; Shore et al., 2000; Zhou and Shore, 2004, 2006) and cochlear nuclear complex (Wolff and Künzle, 1997) receive trigeminal afferents. The senses come together nevertheless quite significantly in the superior colliculus, where important multisensory interactions are elaborated (Stein and Meredith, 1993). The multisensory interactions that take place in these layers of the superior colliculus do not give rise to ascending multisensory pathways to the cortex, but rather form descending streams involved in motor pathways for body orientation. As a result, primary sensory cortices receive unisensory ascending projections from specific thalamic nuclei (but see below). Unisensory cortices then give rise to parallel feedforward streams of information processing through cortical networks that eventually reach multisensory processing areas, mainly located in the frontal, temporal and parietal lobes, where unified multisensory percepts are believed to be elaborated for conscious perception and action. Multisensory areas can, in return, send modulatory feedback projections to lower cortical areas.

#### **VISUAL STREAMS IN THE MOUSE**

As in primates, extrastriate areas of mice were shown to be distributed in two functional streams. Anatomical and calcium imaging experiments showed that lateral areas LM, laterointermediate (LI), posterior (P) and postrhinal (POR) project to the ventral stream and that lateral areas AL, rostrolateral (RL) and anterior (A) and medial areas posteromedial (PM) and AM are associated with the dorsal stream (Wang et al., 2011, 2012; Glickfeld et al., 2013). The functional properties of the neurons situated in these extrastriate areas also seem to correspond to what is usually found in primates (Andermann et al., 2011; Marshel et al., 2011; Roth et al., 2012; Glickfeld et al., 2013). The functional properties of extrastriate areas therefore seem to have been either conserved or convergent during evolution, although the properties of the neurons will be fine-tuned to fulfill their role in a way that suits each species (see Huberman and Niell, 2011 for review).

#### **CROSS-MODAL PATHWAYS IN PRIMATES**

There is increasing evidence showing that combining information from the different sensory modalities is important in perception and cognition (Murray and Wallace, 2012; Stein, 2012). In classical models of cortical organization, multisensory integration occurs only in high-order association cortices (Felleman and Van Essen, 1991). In monkeys, several areas of the parietal, temporal and frontal lobes are clearly involved in multisensory processing. Multisensory convergence in the cortex of the superior temporal sulcus (STS) was demonstrated by its responsiveness to visual, auditory and somatosensory stimuli (Desimone and Gross, 1979). The cortical areas of the STS receive visual projections from parietal (Seltzer and Pandya, 1978, 1994) and temporal cortices (Boussaoud et al., 1990; Kaas and Morel, 1993; Saleem et al., 2000), auditory projections from the auditory belt (Morel et al., 1993) and parabelt areas (Seltzer and Pandya, 1978, 1994; Hackett et al., 1998) and somatosensory projections from parietal cortex (Neal et al., 1988; Seltzer and Pandya, 1994; Lewis and Van Essen, 2000). There are several areas in the intraparietal sulcus (IPS) where visual, auditory and somatosensory information converge (Cavada and Goldman-Rakic, 1989; Blatt et al., 1990; Hackett et al., 1998; Beck and Kaas, 1999; Lewis and Van Essen, 2000; Nakamura et al., 2001). It is noteworthy here that these sensory inputs to high order association cortices originate from high order sensory cortices and not from primary sensory cortical areas.

In primates, very few neurons project directly from one primary sensory area to another (Falchier et al., 2002; Clavagnier et al., 2004). In rodents, anatomical evidence revealed multimodal inputs in areas surrounding primary sensory cortices in rats (Paperna and Malach, 1991) and mice (Laramée et al., 2011). In contrast to monkeys there are significant direct cross-modal connections between primary sensory areas in marsupials and rodents. They have been observed in opossums (Kahn et al., 2000; Karlen et al., 2006; Dooley et al., 2013), gerbils (Budinger et al., 2000, 2006, 2008; Henschke et al., 2014), prairie vole (Campi et al., 2010), mice (Wang and Burkhalter, 2007; Charbonneau et al., 2012) and rats (Stehberg et al., 2014). Electrophysiological recordings detected multisensory neurons (suprathreshold response to inputs to more than one sensory modality) in the primary cortices of opossums (Karlen et al., 2006), whereas their incidence was quite low in the center of unisensory cortices of rats but increased in their periphery and in higher areas (Wallace et al., 2004). In monkeys, multisensory neurons (suprathreshold response) were only detected in higher areas (Schroeder et al., 2001; Schroeder and Foxe, 2002; Fu et al., 2003; Ghazanfar et al., 2005; Kayser et al., 2005). What is surprising here is that crossmodal connections in rodents result in multisensory suprathreshold responses in primary sensory cortices, whereas they remain undetected in primates. Only spatially and temporally coherent cross-modal stimuli that result in multisensory integration (see Stein and Stanford, 2008 for review) can functionally reveal cross-modal connections in low order cortical areas in primates (Molholm et al., 2002; Ghazanfar et al., 2005; Lakatos et al., 2007; Kayser et al., 2008). This indicates that feedback crossmodal inputs reaching unisensory cortices in monkeys only have a subthreshold influence on the post-synaptic neurons (Allman et al., 2009). 
The difference between mice and monkeys regarding the presence or absence of multisensory neurons in low order cortical areas could therefore simply be the consequence of the number and strength of cross-modal inputs reaching these areas.

#### **CROSS-MODAL PATHWAYS IN RODENTS**

The presence of quite strong direct cross-modal connections between low order cortical areas in the mouse compared to primates is in agreement with the hypothesis that cortical areas form by the pulling out of specific functional modules. If this is the case, one would expect a higher prevalence of cross-modal connections between primary sensory areas, and a higher number of multisensory neurons in areas that are usually considered as unisensory, in more primitive mammals. There is indeed a lot of evidence showing that the primary sensory cortices receive more cross-modal projections from other primary sensory cortices in the opossum (Kahn et al., 2000; Karlen et al., 2006; Dooley et al., 2013) and rodents (Budinger et al., 2000, 2006, 2008; Campi et al., 2010; Charbonneau et al., 2012; Henschke et al., 2014) than in primates (Falchier et al., 2002; Clavagnier et al., 2004).

The actual sensory maps in ancestral mammals are not known, but it is hypothesized that cross-modal cortical connectivity was greater than in the more derived and segregated cortices (Schneider, 2014). The greater multimodality of the primary sensory cortices in rodents and marsupials would support the idea that the parcellation of unimodal areas from an initial multimodal cortex is incomplete (Schneider, 2014). This does not mean that the rodent cortex is suboptimal: evolution is an ongoing process and each species is a compromise between many competing constraints. Rather, this less segregated state of primary sensory cortices might be, as mentioned earlier, the appropriate adaptive optimum for the behavioral requirements of these animals.

Instead of taking place in very high level temporal and parietal cortices as in primates, multisensory integration in the mouse cortex is achieved in the primary sensory cortices and in the secondary sensory cortices. The greater intermodular connectivity between the visual, somatosensory and auditory cortices (see further) than in primates indicates that these areas of multisensory convergence have not segregated and expanded into the multitude of areas observed in primates. Visual extrastriate areas in the mouse are not unimodal in that they show much evidence for multisensory integration. There are important concentrations of multimodal neurons in the periphery of the primary visual cortex of the rat (Paperna and Malach, 1991). The lateral extrastriate cortex receives direct projections from the primary auditory cortex that terminate on dendrites of neurons that project directly to the primary visual cortex in the mouse (Laramée et al., 2011). The implication of extrastriate areas in multisensory processing is supported by the strong activation of the lateral part of V2 (V2L) following an audio-visual task in the rat (Hirokawa et al., 2008) and by the abundant potential connectivity among multimodal areas surrounding unisensory cortices (Paperna and Malach, 1991). In addition, direct projections from the primary auditory cortex (A1) to V2 have been demonstrated in other rodents such as gerbils (Budinger et al., 2000), prairie voles (Campi and Krubitzer, 2010) and rats (Miller and Vogt, 1984). These projections can further support multisensory processing in V1 through direct feedback connections to V1, which were observed in primates (Rockland and Pandya, 1979; Tigges et al., 1981), tree shrews (Lyon et al., 1998), cats (Squatrito et al., 1981; Symonds and Rosenquist, 1984a,b; Olavarria, 1996) as well as rodents (Olavarria and Montero, 1981, 1989, 1990; Simmons et al., 1982; Coogan and Burkhalter, 1990, 1993).
Also, area 2 in mouse, known as the auditory dorsal field, receives projections from auditory, visual and somatosensory cortices as well as from parietal cortices and is clearly involved in multisensory processing (Hishida et al., 2014). Furthermore, a recent study elegantly demonstrated that cross-modal information conveyed by multisensory parietal cortex is implicated in the development of the visual field maps in the primary visual cortex in the mouse (Yoshitake et al., 2013).

The mouse is therefore a very interesting model for the study of cross-modal sensory integration at the level of the primary sensory cortices. These studies are relevant to cross-modal plasticity of the sensory cortices, in particular following the loss of vision. Many studies have shown that the visual cortex is activated by other sensory modalities in blind humans (Wanet-Defalque et al., 1988; Kujala et al., 1995a,b, 2005; Sadato et al., 1996; Cohen et al., 1997; Leclerc et al., 2000; Weeks et al., 2000; Burton et al., 2002a,b, 2004, 2006; Burton, 2003; Théoret et al., 2004; Gougoux et al., 2005; Voss et al., 2006, 2008; Weaver and Stevens, 2007; Collignon et al., 2009, 2011). One case is of particular significance. It has been demonstrated that, in sighted humans, blindfolding induces cross-modal activation of the visual cortex (Pascual-Leone et al., 2005). This demonstrates that there are cross-modal pathways that are functional but possibly silent or subthreshold in the normal visual cortex in humans. Cross-modal pathways in primates and mice are most likely different because, as discussed above, the direct cross-modal pathways are more robust in the mouse. The mouse nevertheless offers better opportunities than primates to understand these direct routes and their functional significance.

# **CORTICAL HIERARCHY**

Information processing for perception and action appears to require a hierarchical structure of cortical architecture with a dual mode of connectivity between areas by either feedforward or feedback connections. Feedforward and feedback connections are respectively involved in bottom-up and top-down flow of information in the cortex. In primates, feedforward projections arise mostly from supragranular layer 3b, but also from infragranular layer 5, whereas feedback projections originate mainly from infragranular layer 6, but also from layers 2/3a (Rockland and Pandya, 1979; Markov et al., 2014). The laminar distribution of their axon terminals is also distinct; feedforward neurons project onto the granular layer, whereas feedback connections target supragranular and infragranular layers and avoid layer 4 (Rockland and Pandya, 1979). In rodents, feedforward projections arise mostly from supragranular layers and feedback projections mostly originate from infragranular layers. The projection patterns of feedback connections are quite similar to those found in primates, but the feedforward connections show some differences. In addition to layer 4, feedforward axons in rodents also target the supragranular and infragranular layers (Coogan and Burkhalter, 1990). The difference between feedforward and feedback axonal projections in rodents is therefore the presence or absence of axon terminals in layer 4, respectively.
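The rodent criterion stated above (terminals present in layer 4 for feedforward projections, absent for feedback projections) can be written as a simple rule. The following is only a minimal sketch: the function name and the encoding of layers as text labels are illustrative assumptions, not a method taken from the cited studies.

```python
def classify_projection(terminal_layers):
    """Label a rodent corticocortical projection from the layers its terminals reach.

    `terminal_layers` is a hypothetical encoding: an iterable of layer labels
    such as "1", "2/3", "4", "5", "6".
    """
    layers = {str(layer) for layer in terminal_layers}
    if not layers:
        raise ValueError("at least one terminal layer is required")
    # Rule from the text: terminals in layer 4 mark a feedforward projection,
    # their absence marks a feedback projection.
    return "feedforward" if "4" in layers else "feedback"


if __name__ == "__main__":
    print(classify_projection(["2/3", "4", "5"]))
    print(classify_projection(["1", "2/3", "5", "6"]))
```

The rule deliberately ignores terminal density, so it would not capture graded or mixed laminar patterns.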

Bottom-up and top-down pathways allow the identification of the hierarchical relationship between two cortical areas (Rockland and Pandya, 1979; Maunsell and van Essen, 1983; Felleman and Van Essen, 1991; Coogan and Burkhalter, 1993; Scannell et al., 1995). With this organization scheme, the visual system comprises two functional streams with several hierarchical levels in primates (Maunsell and van Essen, 1983; Felleman and Van Essen, 1991; Barone et al., 2000; Vezoli et al., 2004; Markov et al., 2014) and cats (Scannell et al., 1995). A similar organization has also been recently described in mice even if they have fewer cortical areas than primates (Wang et al., 2012). This suggests that, notwithstanding the small size of the brain and the limited number of cortical areas in the mouse, a hierarchical scaffold is still present. Moreover, the ubiquity of the hierarchical organization of the cortex in these diverse animals suggests that it emerged in a quite distant common ancestor, and that it is a very efficient strategy or design for sensory processing.

# **MODELS OF CORTICAL ORGANIZATION**

The study of the mouse visual cortex by Wang et al. (2012) suggests a similar hierarchical organization in mice and primates, with fewer areas and potentially fewer hierarchical levels in the mouse. This suggests that the rules governing the establishment of cortical circuits have been conserved during evolution. Models have been developed over the years to study how cortical circuits are established in primates, but also in other species. Early evidence suggested that cortical connections depend on the hierarchical relationship between two interconnected areas, with areas of the same hierarchical level being highly connected. However, further investigations using connectivity matrices revealed that only a small percentage of connections actually fit the hierarchical model (Scannell et al., 1995). This indicated that other factors also participate in the establishment of cortico-cortical connections. Mitchison (1991) proposed that cortico-cortical connections should be organized in a way that optimizes cortical wiring in order to limit energy costs. This theory led to the "nearest neighbors" model, which stipulates that adjacent areas are highly connected and distant areas are weakly connected. This model fits quite well with the anatomical evidence from the visual system (Young, 1992) and neocortex (Young, 1993) of primates and the neocortex of cats. The alternate "next-door-neighbor-or-next-door-but-one" model proposes that connections between adjacent areas are strong, those between areas that have few common neighbors are moderate, and those between areas having only one common neighbor are weak. This model was shown to fit better with the connectivity profiles than the nearest neighbor model (Young, 1992; Scannell et al., 1995) and could constitute a trade-off in terms of energy and biochemical costs.
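To make the two ingredients of these models concrete, the sketch below derives them from a table of area borders: whether two areas are adjacent (the "nearest neighbors" prediction) and which neighbors they share (the quantity used by the next-door-but-one variant). The areas A to D and their borders are invented for illustration only, and the thresholds that would separate moderate from weak connections are deliberately left out because they are not specified here.

```python
# Hypothetical border map: each area is listed with the areas it touches.
BORDERS = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}


def are_adjacent(borders, a, b):
    """True when areas a and b share a border."""
    return b in borders[a]


def nearest_neighbor_prediction(borders, a, b):
    """'Nearest neighbors' model: adjacent areas strongly, other areas weakly connected."""
    return "strong" if are_adjacent(borders, a, b) else "weak"


def common_neighbors(borders, a, b):
    """Areas bordering both a and b (excluding a and b themselves)."""
    return (borders[a] & borders[b]) - {a, b}


if __name__ == "__main__":
    for pair in [("A", "B"), ("A", "C"), ("A", "D")]:
        print(pair,
              nearest_neighbor_prediction(BORDERS, *pair),
              sorted(common_neighbors(BORDERS, *pair)))
```

Comparing such predictions with a measured connectivity matrix is what allows the fit of each model to be scored.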

Since the 2000s, a new approach has been used to understand how cortico-cortical circuits are established. Instead of looking only at the presence or absence of connections, the number of neurons projecting from a given area is now counted relative to the total number of neurons projecting to the area of interest, in order to determine the weight, or strength, of the connections (Vezoli et al., 2004). In the macaque visual cortex, connections were found to be very dense between neighboring areas and weaker with more distant areas (Markov et al., 2011). A close relationship between the strength of the connections and the hierarchical distance was also demonstrated (Markov et al., 2014). The study of Markov et al. (2011) also elegantly demonstrated that the density of cortico-cortical connections obeys a lognormal distribution spanning nearly six orders of magnitude, regardless of the cortical areas. Other studies have also found this lognormal organization of cortico-cortical connections, spanning about five orders of magnitude, in the neocortex of monkeys (Ercsey-Ravasz et al., 2013) and mice (Oh et al., 2014). In the visual system of mice, a lognormal distribution was also found but spanned only two to three orders of magnitude (Wang et al., 2012).
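The weighting scheme described above, and the way a span in orders of magnitude follows from it, can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the source areas X, Y and Z and their cell counts are invented, and the function names are ours rather than those of the cited studies.

```python
import math


def connection_weights(labeled_counts):
    """Weight of each input = its labeled neurons / all labeled neurons for this target."""
    total = sum(labeled_counts.values())
    if total <= 0:
        raise ValueError("no labeled neurons")
    # Sources with zero labeled neurons are treated as absent connections.
    return {src: n / total for src, n in labeled_counts.items() if n > 0}


def span_in_orders_of_magnitude(weights):
    """log10 of the ratio between the strongest and the weakest existing connection."""
    values = list(weights.values())
    return math.log10(max(values) / min(values))


if __name__ == "__main__":
    # Invented counts of retrogradely labeled neurons in three source areas.
    counts = {"X": 100_000, "Y": 1_000, "Z": 10}
    weights = connection_weights(counts)
    print(weights)
    print(round(span_in_orders_of_magnitude(weights), 2))
```

Because the span depends on the weakest connection that can still be detected, it is sensitive to the sampling depth of the tracing experiment.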

The number of orders of magnitude spanned by the lognormal distribution indicates the range between the weakest and the strongest connections in a system. As mentioned above, the distribution of cortico-cortical connections depends on the physical and hierarchical distances between areas, nearby areas having stronger connections and thus higher connectivity indexes (Markov et al., 2011). In monkeys, the span was found to be slightly above five orders of magnitude for the whole neocortex and visual system. A span of five orders of magnitude was also found in the mouse neocortex, whereas its visual system reached only two to three, depending on the extrastriate area. The similar span in the neocortex of both mice and primates (about five orders of magnitude) could indicate that, although mice have a smaller brain than primates, similar relative physical and hierarchical distances and similar intensities of connections between cortical areas can be found in both species. In the visual system, however, the smaller span in the mouse (two to three orders of magnitude) compared to primates (about five) could indicate that fewer hierarchical steps are involved in visual processing. This would be consistent with the fact that the visual system of rats (and possibly mice) consists of only 3 hierarchical levels (Coogan and Burkhalter, 1993), whereas the visual system of primates has up to 10 levels (Felleman and Van Essen, 1991; Markov et al., 2014). These results also suggest that the visual cortical network in mice is less complex than in primates.

# **ON COMPLEXITY**

Simplicity or complexity of the brain is not easily defined, and a single metric that can allow a scaling of different species with regard to complexity remains elusive. We will not review here theories on complexity, as a very insightful review of the definition of complexity in the brain is provided by Sporns and collaborators (see Sporns, 2011). More specifically, they propose that complexity in brain circuits emerges through the interaction and equilibrium between the functional segregation of defined local areas and the interactions between these areas (Tononi et al., 1994; Sporns, 2011). In neuronal systems, each component should have some distinct functional properties and functional autonomy, and these should be linked in such a way that allows for system-wide coordination. There is no doubt the brain is composed of functionally segregated subnetworks at levels of organization ranging from cellular to brain-wide systems. The cerebral cortex is typically organized in areas that have distinct functional properties and connections and hence cytoarchitectonic features such as the relative importance of cortical layers. This group proposed a measure of neural complexity that "reflects the interplay between functional segregations and integration within a neural system" (Tononi et al., 1994). In this model (see **Figure 1**), cortico-cortical connections are links between nodes (cortical areas), which are clustered into modules (e.g., sensory systems). The connections between modules are established by two levels of hubs: connector hubs transfer information between modules and provincial hubs are highly connected with all nodes of the module and with the connector hubs. The complexity of the network will be dependent on the functional and anatomical parcellation of groups of neurons and the connectivity within and between these groups or areas of the cerebral cortex.
Small-world architectures are characterized by high node clustering and short path lengths, whereas scale-free networks are characterized by a small number of highly connected hubs (see Sporns, 2011). In this sense, scale-free networks scale lower in modularity and could be less complex than small-world or hierarchical modular networks, in which the higher modularity would support greater functional segregation of the nodes. A series of studies by this group showed that greater system complexity arises in hierarchical modular small-world type networks (see Sporns, 2011 for a more complete bibliography).
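The two quantities that define a small-world architecture, node clustering and path length, can be computed directly from a list of connections. The sketch below is only an illustration: it assumes an undirected, unweighted and connected graph, and the six-node toy network (two triangles joined by one link) is invented rather than taken from any connectome dataset.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def mean_clustering(graph):
    """Average clustering coefficient over all nodes."""
    return sum(clustering_coefficient(graph, n) for n in graph) / len(graph)


def mean_path_length(graph):
    """Average shortest path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy network: two triangles (a-b-c and d-e-f) joined by the single link c-d.
TOY = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

if __name__ == "__main__":
    print(mean_clustering(TOY), mean_path_length(TOY))
```

In practice these values are compared with those of randomized networks with the same number of nodes and links; high clustering combined with a path length close to the random case is what is usually taken as the small-world signature.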

#### **BRAIN NETWORKS**

The network analyses performed on mouse anatomical data<sup>1</sup> suggest that the mouse cortex is organized in modules linked by connector hubs, as in primates, and exhibits high levels of clustering, as in higher mammals. A small-world architecture is therefore also a feature of the mouse cortical network (Oh et al., 2014; see also Sporns and Bullmore, 2014 for critical comments; Wang et al., 2012). However, whereas cortical networks in cats and macaques (Hilgetag et al., 2000; Sporns et al., 2002) and humans (He et al., 2007; Iturria-Medina et al., 2007, 2008; and see Sporns, 2011 for more complete references) exhibit a clear small-world architecture, with high clustering, short path lengths and multiple hierarchical levels, there is evidence for high node clustering and hub nodes in mouse cortical networks. This organization is more consistent with a scale-free architecture, and the mouse network has therefore been considered intermediate between small-world and scale-free architectures (Sporns and Bullmore, 2014).

In the mouse visual system, more specifically, the organization of the network also shows some modularity and some properties of small-world networks, but, like the whole cortical network, it shows less distinct modularity and quite high connectivity between modules, even though some particular areas appear to be positioned to act as hubs for specific pathways (see Wang et al., 2012). There is evidence for functional modules that could correspond to a dorsal and a ventral stream of processing as in primates. There is however a wealth of weak connections both within and between these modules. The abundance of weak intermodular connections has important functional consequences (Goulas et al., 2014). Greater intermodular connectivity increases the global synchronization of the whole network, whereas less intermodular connectivity shifts the dynamic balance toward a greater local network synchronization and functional segregation between modules (Gómez-Gardeñes et al., 2010; Zhao et al., 2011; and see also Goulas et al., 2014 for more discussion). This would indicate that the visual system network in the mouse is based on a scaffold similar to that of monkeys, being close to a small-world network and having two similar streams of information flow, but would be less functionally segregated than that of monkeys, mainly because of the many weak links between all the network components.

Network analyses of cortical connectivity are largely based on the assumption that the strength of a connection is a function of the number of terminals or synapses in a given connection. This view of an anatomical democracy has been challenged by recent evidence that glutamatergic corticocortical connectivity is not functionally homogeneous. Indeed, studies have shown functional classes of glutamatergic postsynaptic responses that appear to be correlated with presynaptic terminal size (Covic and Sherman, 2011). Moreover, these authors define functional classes in which corticocortical class 1B connections terminate on postsynaptic sites with ionotropic receptors whereas class 2 corticocortical connections terminate on postsynaptic sites with metabotropic receptors (Covic and Sherman, 2011; De Pasquale and Sherman, 2011, 2013). This functional heterogeneity strongly suggests that not all cortical contacts exert the same influence on postsynaptic neurons. Network analyses based only on terminal or neuron numbers might not provide a sufficient overview for understanding the functional architecture of cortical connectivity.

<sup>1</sup>The connectome of the mouse is being produced by several endeavors such as the Allen Brain Atlas (http://www.brain-map.org/), the Brain Architecture Project (http://brainarchitecture.org/) and the Mouse Connectome Project (http://www.mouseconnectome.org/). The Allen Brain Atlas and the Brain Architecture Project are also working on the connectome of other species.

# **CONCLUDING REMARKS**

While primates evolved to become large animals with large brains, mice remained small and so did their brains. The mouse brain has both similarities and differences with the primate brain. It is different in that it has fewer cortical areas with fewer visual areas and extensive cross-modal and intermodular cortical connections. Ocular dominance columns are also lacking and, instead, a salt-and-pepper organization is found in mouse visual cortex. Nevertheless, the brains of mice and primates share a similar hierarchical organization based on largely reciprocal feedforward and feedback connections. In addition, cortical connectivity follows similar distance rules in that close areas are more strongly interconnected than distant areas. The visual cortical areas of mice and primates are also similar in that the extrastriate areas are distributed in two functional streams that share many similar functional properties.

Overall, these features show that although the mouse and primate brains differ in absolute and relative size, in the number of hierarchical levels and in the diversity of cortical areas and their modular parcellation, several key features are shared between these animals. Cortical connections develop according to similar wiring rules even though the optimal solutions for wiring economy appear to be different. In the visual system, extrastriate areas are organized in similar functional streams even though the primary visual cortex exhibits very different modular organizations in mice and primates.

# **ACKNOWLEDGMENTS**

This work was supported by a FRQS postdoctoral fellowship to Marie-Eve Laramée and institutional (UQTR) sabbatical support and an NSERC grant to Denis Boire.

# **REFERENCES**


superior temporal visual areas in the macaque. *J. Comp. Neurol.* 296, 462–495. doi: 10.1002/cne.902960311


in macaque visual cortex. *J. Comp. Neurol.* 522, 225–259. doi: 10.1002/cne.23458


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 August 2014; accepted: 09 December 2014; published online: 07 January 2015*.

*Citation: Laramée M-E and Boire D (2015) Visual cortical areas of the mouse: comparison of parcellation and network structure with primates. Front. Neural Circuits 8:149. doi: 10.3389/fncir.2014.00149*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2015 Laramée and Boire. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# A simpler primate brain: the visual system of the marmoset monkey

# *Samuel G. Solomon<sup>1</sup>\* and Marcello G. P. Rosa<sup>2,3,4</sup>\**

<sup>1</sup> Department of Experimental Psychology, University College London, London, UK

<sup>2</sup> Department of Physiology, Monash University, Clayton, VIC, Australia

<sup>3</sup> Monash Vision Group, Monash University, Clayton, VIC, Australia

<sup>4</sup> Australian Research Council Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, VIC, Australia

#### *Edited by:*

Davide Zoccolan, International School for Advanced Studies, Italy

#### *Reviewed by:*

Gregor Rainer, University of Fribourg, Switzerland

Jude F. Mitchell, University of Rochester, USA

#### *\*Correspondence:*

Samuel G. Solomon, Department of Experimental Psychology, University College London, 26 Bedford Way, London WC1P 0AH, UK e-mail: s.solomon@ucl.ac.uk; Marcello G. P. Rosa, Department of Physiology, Monash University, Clayton, VIC 3800, Australia e-mail: marcello.rosa@monash.edu

Humans are diurnal primates with high visual acuity at the center of gaze. Although primates share many similarities in the organization of their visual centers with other mammals, and even other species of vertebrates, their visual pathways also show unique features, particularly with respect to the organization of the cerebral cortex. Therefore, in order to understand some aspects of human visual function, we need to study non-human primate brains. Which species is the most appropriate model? Macaque monkeys, the most widely used non-human primates, are not an optimal choice in many practical respects. For example, much of the macaque cerebral cortex is buried within sulci, and is therefore inaccessible to many imaging techniques, and the postnatal development and lifespan of macaques are prohibitively long for many studies of brain maturation, plasticity, and aging. In these and several other respects the marmoset, a small New World monkey, represents a more appropriate choice. Here we review the visual pathways of the marmoset, highlighting recent work that brings these advantages into focus, and identify where additional work needs to be done to link marmoset brain organization to that of macaques and humans. We will argue that the marmoset monkey provides a good subject for studies of a complex visual system, which will likely allow an important bridge linking experiments in animal models to humans.

**Keywords: vision, retina, thalamus, striate cortex, extrastriate cortex, Callitrichidae**

# **INTRODUCTION**

Despite advances in non-invasive techniques for study of the living human brain, animal studies are still a necessary approach for understanding the nervous system. Many of the biochemical and physiological operations carried out by neurons represent common, fundamental functions that need to be carried out by all nervous systems. Moreover, the basic anatomical plan of organization of the mammalian nervous system is constrained by a common set of developmental mechanisms, which lead to a similar set of subdivisions and interconnections among adults of different species (Krubitzer, 2007). For these reasons non-primate animal models are often appropriate for addressing scientific questions that cannot be explored in humans. Yet, while it is important to recognize the fundamental similarity of nervous systems in general, and mammalian brains in particular, there are also clear variations, which often translate into marked differences in sensory, motor, and cognitive capacities (e.g., Padberg et al., 2007; Buckner and Krienen, 2013; Chaplin et al., 2013b; Fjell et al., 2014).

The visual system is a case in point. The evolution of human societies has been linked to the emergence of a sophisticated visual system, which we share with other primates. For most of the evolution of humans as a species, the capacity to see the world in sharp, colorful, three-dimensional detail, to understand, differentiate, and remember objects in complex contexts, and to use vision to guide skilful behavior has been important to survival.

Whereas other animals have eyes that afford higher acuity (e.g., Fox et al., 1976; Reymond, 1987) or more complex color vision (Marshall and Oberwinkler, 1999; Sabbah et al., 2010), it is the balance between evolution of the eye and brain, including in many cases specific anatomical characteristics, that sets primates apart from other groups of animals, including members of other mammalian orders. Thus, research on non-human primates remains, in many cases, the only way to gain insight into many neural systems that are of particular importance to human cognition and health.

The most widely used non-human primate models in neuroscience research, including the visual system, are the various species of the genus *Macaca* (macaque monkeys; for discussion, see Rosa and Tweedale, 2005; Manger et al., 2008). However, the macaque is not always the best model for investigating the primate visual system. As we will argue below, these limitations become particularly obvious when one considers emerging technologies for physiological and developmental studies of the visual system. We propose that the marmoset monkey (*Callithrix* spp.) offers distinct advantages in many contexts, which allow new avenues of investigation of visual anatomy and function. Although no single species is likely to represent the "ideal" model for every scientific question, the marmoset can provide a powerful counterpart to macaque for understanding brain systems that are sufficiently derived, in evolutionary terms, to demand investigation in a primate.

Here we describe the current state of knowledge of the organization of the marmoset visual system, from the retina to the cortex. In order to make this review tractable, we will generally only include references to the work done in marmosets; comparative references can be found within those primary sources. Many features of the marmoset visual system are shared with macaques and humans, and we will not repeatedly highlight those similarities. When applicable, we will note differences, particularly those that may be important in experimental design. We will demonstrate that, unlike as recently as 20 years ago, there is now a substantial body of knowledge on the visual system of the marmoset, which provides a strong foundation for future work.

#### **THE MARMOSET BRAIN**

In general, the term "marmoset" refers to over 20 species of South American monkeys of the family Callitrichidae, which are characterized by small body size, agile movements, and the presence of claw-like nails on the hands and feet. By far the most commonly used species in laboratory studies is the common marmoset (*Callithrix jacchus*); in this review the term "marmoset" will refer to this species. Marmosets naturally live in family groups of 10–15 individuals, are day-active, and inhabit the upper canopy of forested areas, although they are highly adaptable and can be found in urban fringe areas. The adult body size rarely exceeds 20 cm (excluding the long, non-prehensile tail), and body weight is approximately 300 g (Stevenson and Rylands, 1988). Gestation is approximately 5 months, and breeding females generally give birth twice a year, most frequently to non-identical twins. Sexual maturity is reached around 18 months, and the average life span in captivity is about 13 years (Chandolia et al., 2006; Nishijima et al., 2012). Marmosets remain in their social group until adulthood and are cooperative in caring for their offspring.

**Figure 1** illustrates the external morphology of the marmoset brain, with visual and visual association cortical areas highlighted. The marmoset brain (∼8 g) is approximately 12 times smaller in volume than that of the rhesus macaque, and 180 times smaller than the human brain (Stephan et al., 1981). **Figure 1** readily conveys one of the key advantages of the marmoset as a model for studies of the visual system: the relatively smooth topology of the cerebral cortex. Thus, in marmosets the vast majority of the visual cortex lies exposed on the surface of the cerebral hemispheres. The only known exceptions are those portions of visual cortex buried in the banks of the calcarine sulcus: that is, the representation of the peripheral visual field in the primary visual cortex (V1; Fritsches and Rosa, 1996), small sectors of the peripheral representation in the second visual area (V2; Rosa et al., 1997), and area prostriata (Yu et al., 2012).

# **THE MARMOSET EYE**

#### **OPTICS AND PHOTORECEPTOR DISTRIBUTION**

**FIGURE 1 | Lateral (left) and medial (right) views of the marmoset cerebral cortex, showing the location of visual areas.** The images are representations of the reference brain reconstructed in detail by Paxinos et al. (2012). Names within parentheses indicate the names of likely homologous areas in macaque brain. Colors denote different subdivisions of visual cortical pathways, as follows. Magenta: primary visual cortical area (V1). Pink: visuotopically organized areas of extrastriate cortex. Green: posterior parietal cortex. Dark blue: inferior temporal cortex. Light blue: polysensory areas of the superior temporal cortex. Orange: "limbic" visual areas. Yellow: frontal cortex visual association areas, including frontal eye fields. Abbreviations: 8aV, cytoarchitectural area 8a ventral; 23V, cytoarchitectural area 23 ventral; AIP, anterior intraparietal area; DA, dorsoanterior area (probable homolog of macaque area V3a); DI, dorsointermediate area; DM, dorsomedial area (probable homolog of macaque area V6); FST, fundus of superior temporal area; FSTv, fundus of superior temporal ventral area (probable homolog of macaque cytoarchitectural areas PGa and IPa); ITc, caudal inferior temporal area (probable homolog of macaque area TEO); ITd, dorsal inferior temporal area; ITv, ventral inferior temporal area; LIP, lateral intraparietal area; MIP, medial intraparietal area; MST, medial superior temporal area; MT, middle temporal area (probable homolog of macaque area V5); MTC, middle temporal crescent (probable homolog of macaque area V4T); OPt, cytoarchitectural area OPt; PEC, cytoarchitectural area PE caudal; PG, cytoarchitectural area PG; PGM, cytoarchitectural area PG medial; PPM, posterior parietal medial area (probable homolog of macaque area V6a); ProSt, area prostriata; STP, superior temporal polysensory area (probable homolog of macaque cytoarchitectural area TPO); TF/TL, cytoarchitectural areas TF and TL; V1, primary visual area; V2, second visual area; VIP, ventral intraparietal area; VLA, ventrolateral anterior area (probable homolog of macaque area V4); VLP, ventrolateral posterior area (probable homolog of macaque area V3).

The marmoset eye is large compared to its body weight and brain size, with a diameter of about 11 mm. For details, we direct the reader to the fine schematic marmoset eye provided by Troilo et al. (1993). The size of the marmoset eye is such that near the fovea the retina samples the image with a resolution of about 128 μm/degree. The major distinguishing feature of the primate retina, the *fovea centralis*, appears morphologically similar in the marmoset and Old World monkeys. Cone photoreceptors are small and packed at high density, rod photoreceptors and blood vessels are absent, and the post-receptoral elements are displaced across the retina by up to 1 mm from the photoreceptors (Wilder et al., 1996). The combination of cone density and optical clarity means that potential visual acuity is much higher at the fovea than anywhere else. Cone density reaches approximately 200,000 cones/mm<sup>2</sup> in the marmoset fovea (Troilo et al., 1993; Wilder et al., 1996; see also Finlay et al., 2008), similar to the peak cone density in macaques and humans (Curcio et al., 1987). The spatial resolution of the photoreceptor mosaic in the marmoset is therefore estimated to be close to 30 cycles/degree, which is near the spatial acuity found in behavioral measurements (Ordy and Samorajski, 1968). Rod photoreceptors are effectively absent from the fovea – they rise to a peak density of approximately 70,000 rods/mm<sup>2</sup>, at about 15° from the fovea (Goodchild et al., 1996; Wilder et al., 1996). The absolute size of the rod-free foveal zone is similar in marmosets and larger primates (Franco et al., 2000; Finlay et al., 2008), and the ratio of cones to rods in peripheral marmoset retina is higher than that in macaque and human retina (Wilder et al., 1996), so marmoset vision may be cone-dominated over a larger fraction of the visual field. Functional correlates of these species differences are yet to be established.
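The estimate of about 30 cycles/degree can be reproduced from the two figures given above, the retinal scale (about 128 μm/degree) and the peak cone density (about 200,000 cones/mm<sup>2</sup>). The sketch below is only a back-of-envelope check: it assumes that foveal cones form a regular hexagonal mosaic, which is our simplifying assumption and not a statement made in the studies cited, and the function names are illustrative.

```python
import math


def cone_spacing_mm(density_per_mm2):
    """Center-to-center spacing of a hexagonal mosaic with the given density."""
    # For a hexagonal lattice, density = 2 / (sqrt(3) * spacing**2).
    return math.sqrt(2.0 / (math.sqrt(3.0) * density_per_mm2))


def nyquist_cycles_per_degree(density_per_mm2, mm_per_degree):
    """Highest grating frequency a hexagonal mosaic can sample without aliasing."""
    spacing = cone_spacing_mm(density_per_mm2)
    # Rows of a hexagonal lattice lie sqrt(3)/2 * spacing apart, and the
    # Nyquist limit is one cycle per two rows.
    cycles_per_mm = 1.0 / (math.sqrt(3.0) * spacing)
    return cycles_per_mm * mm_per_degree


if __name__ == "__main__":
    density = 200_000   # cones/mm^2, peak foveal value quoted in the text
    scale = 0.128       # mm of retina per degree of visual angle (128 um/degree)
    print(round(cone_spacing_mm(density) * 1000, 2), "um cone spacing")
    print(round(nyquist_cycles_per_degree(density, scale), 1), "cycles/degree")
```

Under these assumptions the cone spacing is close to 2.4 μm and the sampling limit falls just above 30 cycles/degree, in line with the estimate in the text; optical quality and post-receptoral pooling, which the sketch ignores, can only lower the acuity actually achieved.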

The relatively short gestation time of marmosets makes it easier to study the developing eye and retina, including the emergence of an avascular zone at the fovea and the associated changes in neural organization (Hendrickson et al., 2006, 2009; Springer et al., 2011). The fovea emerges relatively late in marmoset development, but develops rapidly (Hendrickson et al., 2006, 2009). Recent adaptive optics measurements (Coletta et al., 2010) confirm that marmosets are generally hyperopic in early life and become myopic with age. The rapid postnatal maturation of marmosets makes them useful in understanding the neural changes that accompany developmental disorders, including myopia (Troilo and Judge, 1993; Nickla et al., 2002; Troilo et al., 2007), retrograde degeneration triggered by lesions of the visual pathway (Hendrickson et al., 2013), normal aging (Böhm et al., 2013), and potentially diseases related to primate retinal specialization, such as foveal detachment and macular degeneration.

#### **CONE PHOTORECEPTOR CLASSES**

The spectral sensitivity of a photoreceptor is defined by the type of opsin that it expresses, and primate cone photoreceptors can be divided into two classes – those most sensitive to shorter ("blue") wavelengths, and those most sensitive to longer ("red," "green") wavelengths (reviewed by Jacobs, 2008). Shorter wavelengths are subject to greater scatter by the atmosphere and optics, and are not focused at the same point as longer wavelengths, making them less useful for fine spatial vision. Cones most sensitive to short wavelengths (S-cones; "blue"; peak wavelength 423 nm) are relatively rare (5–10% of all cones), are smaller than other cones (Martin and Grünert, 1999), and show some molecular similarities to rods (Craft et al., 2014). These S-cone photoreceptors appear more irregularly distributed in the marmoset (Martin et al., 2000) than in macaque and other Old World monkeys and apes, and are present (at low density) at the center of the fovea (Martin and Grünert, 1999; Hendrickson et al., 2009); some other quantitative aspects of S-cone distribution may also differ from those in the macaque and human retina (Curcio et al., 1991).

In primates the opsins associated with sensitivity to medium–long wavelengths are encoded on the X-chromosome. In macaques and humans, the genes for opsins most sensitive to long (L-cones; "red") and medium (M-cones; "green") wavelengths lie in sequence, and a locus control region controls which opsin is expressed in an individual photoreceptor. In marmosets and several other New World monkey species there is instead a single locus, where distinct opsins are encoded as allelic variants (Jacobs, 2008). In the marmoset three alleles code opsins that are most sensitive to 543, 556, or 563 nm (Travis et al., 1988; Tovée et al., 1992; Williams et al., 1992; Hunt et al., 1993; Shyue et al., 1995); which opsin is expressed in females is dictated by inactivation of one of the X-chromosomes early in development. The result is that male marmosets are dichromatic ("red–green color blind"), because the longer wavelength photoreceptors all have the same peak sensitivity. Those female marmosets carrying two distinct alleles are trichromatic, with color vision that depends on the particular combination of opsins present. There is a good match between the capacity for color vision as predicted from opsin genotype and that observed behaviorally: in particular, trichromatic females show behavioral color vision consistent with the presence of cone-opponent mechanisms in the red–green region of the visible spectrum (Tovée et al., 1992; for similar behavioral work in marmosets other than *C. jacchus*, see also Pessoa et al., 2005; Caine et al., 2010). At mesopic luminances both rods and cones are active, providing a potential source of "trichromacy" in dichromatic marmosets, and there is some evidence that dichromatic marmosets can exploit this potential source of chromatic information (Freitag and Pessoa, 2012).
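The relation between opsin genotype and predicted color vision phenotype described above can be enumerated explicitly. The following sketch takes only the three peak sensitivities from the text; the representation of a genotype as a tuple of X-linked alleles and the function names are illustrative assumptions.

```python
from itertools import combinations_with_replacement

# Peak sensitivities (nm) of the three X-linked opsin alleles named in the text.
ALLELES_NM = (543, 556, 563)


def predicted_phenotype(x_alleles):
    """Predict color vision from the X-linked alleles (one for males, two for females)."""
    if len(x_alleles) not in (1, 2):
        raise ValueError("expected one (male) or two (female) X-linked alleles")
    # Together with S-cones, one distinct opsin gives two cone classes
    # (dichromacy) and two distinct opsins give three (trichromacy).
    return "trichromat" if len(set(x_alleles)) == 2 else "dichromat"


def enumerate_genotypes():
    """All male and female genotypes with their predicted phenotype."""
    males = [(allele,) for allele in ALLELES_NM]
    females = list(combinations_with_replacement(ALLELES_NM, 2))
    return {genotype: predicted_phenotype(genotype) for genotype in males + females}


if __name__ == "__main__":
    for genotype, phenotype in enumerate_genotypes().items():
        sex = "male" if len(genotype) == 1 else "female"
        print(sex, genotype, phenotype)
```

The enumeration yields three dichromatic male genotypes, three dichromatic (homozygous) female genotypes and three trichromatic (heterozygous) female genotypes, so a colony can contain up to six predicted dichromatic and three predicted trichromatic phenotypes.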

The polymorphic variation of red–green color vision in marmosets forms a natural model for understanding the impact of red–green color blindness on subsequent visual processing (Jacobs, 2008). As yet no anatomical correlates of color blindness have been found in the retina (Chan and Grünert, 1998; Chan et al., 2001; Jusuf et al., 2006a,b), thalamus, or primary visual cortex (Goodchild and Martin, 1998; Solomon, 2002). The presence of large numbers of dichromatic individuals should also make it possible to ask whether the introduction of novel photoreceptor opsins can be exploited by plasticity in subsequent neural representations, which may directly or indirectly model future treatments of photoreceptor degeneration (Mancuso et al., 2009). Indeed, intraocular injections of adeno-associated virus vectors can be used to convert marmoset ganglion cells and other inner retinal cell types into photosensitive cells, by expression of channelrhodopsins (Ivanova et al., 2010). This may offer an approach for development of treatments for blindness caused by retinal degenerative diseases.

#### **OTHER RETINAL NEURONS AND OUTPUT PATHWAYS**

Parallel pathways emerge in the output of cone photoreceptors, which in primates distribute their signals to at least nine different classes of bipolar cells (Boycott and Wässle, 1991; Chan et al., 2001). These in turn provide input to at least 15 morphological classes of retinal ganglion cell (Percival et al., 2009, 2011, 2013; Moritoh et al., 2013). In the marmoset, the peak ganglion cell density is ∼550,000 ganglion cells/mm<sup>2</sup>, so each foveal cone is sampled by at least two ganglion cells (Wilder et al., 1996). These parallel pathways within the retina, and their subsequent targets in the brain, are remarkably similar in macaques and marmosets. Criteria used for morphological classification of horizontal cells, bipolar cells, amacrine cells, and ganglion cells in macaques are generally just as suitable for classification of the same cells in marmosets (Ghosh et al., 1996; Chan et al., 1997, 2001; Chan and Grünert, 1998; Jusuf et al., 2004; Szmajda et al., 2008), providing that the smaller eye and retina of the marmoset are taken into account. Some differences in protein expression (assessed by antibody binding) are apparent, but these appear minor (Chan et al., 2001; Puller et al., 2014). Specifically, antibodies to recoverin stain flat midget bipolar cells in macaque but do not stain any bipolar cells in marmoset retina; antibodies to the carbohydrate epitope CD15 stain only DB6 cells in macaque retina but stain two populations of bipolar cells in marmoset (Andressen and Mai, 1997; Chan et al., 2001). It is not known if there are functional correlates of these differences in expression. Recent work has successfully developed organotypic tissue culture of the marmoset retina (Moritoh et al., 2013; Percival et al., 2014). This method gives a new complementary line of analyses of the retinal circuitry underlying parallel visual pathways.

As in all mammals studied to date, most ganglion cells in the marmoset retina can be classified as "ON-center" or "OFF-center" (Protti et al., 2014). A smaller number of ganglion cells respond well to both the onset and offset of light ("ON–OFF"). Retinal ganglion cells generally show classical center-surround receptive field organization, with a smaller excitatory center surrounded by a larger inhibitory surround. This center-surround organization is already present in the bipolar cells that provide excitatory input to ganglion cells, and the surround of ganglion cells is likely augmented by amacrine cells in the inner retina (Protti et al., 2014).
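Center-surround organization of this kind is often summarized with a difference-of-Gaussians model, in which a narrow excitatory Gaussian and a broad inhibitory Gaussian are subtracted. The sketch below is not drawn from the studies cited here; the function name and all parameter values are assumptions chosen only to show how such a receptive field gives band-pass spatial-frequency tuning.

```python
import math


def dog_spatial_frequency_response(freq_cpd, k_center, r_center_deg,
                                   k_surround, r_surround_deg):
    """Response amplitude of a difference-of-Gaussians receptive field.

    freq_cpd is the grating spatial frequency in cycles/degree. Each Gaussian
    mechanism has peak sensitivity k and radius r (degrees); its contribution
    is k * pi * r**2 * exp(-(pi * r * freq)**2). The surround is subtracted
    from the center.
    """
    def mechanism(k, r):
        return k * math.pi * r ** 2 * math.exp(-((math.pi * r * freq_cpd) ** 2))

    return mechanism(k_center, r_center_deg) - mechanism(k_surround, r_surround_deg)


# Illustrative parameters only (hypothetical, not fitted to any recorded cell).
PARAMS = dict(k_center=1.0, r_center_deg=0.05, k_surround=0.01, r_surround_deg=0.3)

for f in (0.0, 2.0, 20.0):
    print(f, dog_spatial_frequency_response(f, **PARAMS))
```

With these values the response is larger at 2 cycles/degree than for a uniform field (0 cycles/degree), because the broad surround cancels part of the center signal only at low spatial frequencies; at 20 cycles/degree the small center itself no longer resolves the grating and the response falls toward zero.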

Around 90% of the ganglion cells project to the lateral geniculate nucleus (LGN) of the thalamus (Jusuf et al., 2006b; Szmajda et al., 2008). The LGN of the marmoset has a basic laminar organization, which emerges before birth (Garey and de Courten, 1983). The size of the LGN increases rapidly after birth, without an increase in the number of neurons, and stabilizes at about 6 months of age (Fritschy and Garey, 1986b, 1988). Retinal input arrives mainly at two dorsal parvocellular layers and two ventral magnocellular layers, each receiving dominant input from either the contralateral or the ipsilateral eye. These layers are embedded in a matrix of smaller koniocellular neurons (**Figure 2**; Le Gros Clark, 1941; Kaas et al., 1978; Spatz, 1978; Solomon, 2002). In the marmoset koniocellular neurons are well segregated from the principal layers in two particular zones, one ventral to the magnocellular layers (K1), and one between the internal parvocellular and magnocellular layers (K3). This segregation has allowed targeting of koniocellular zones for electrophysiological recordings (see below) and anatomical tracing, so much of what we know about the koniocellular visual pathways in simian primates stems from work in marmoset.

Most retinal ganglion cells are of the midget class, and project to the parvocellular layers of the LGN (Goodchild et al., 1996; Gomes et al., 2005; Jusuf et al., 2006a). Within about 10° of the fovea, ON- and OFF-type midget ganglion cells appear to get input from a single midget bipolar cell (Ghosh et al., 1996; Goodchild et al., 1996; Telkes et al., 2008), which in turn receives input from a single cone photoreceptor (Chan et al., 2001). Thus the midget-parvocellular system provides a way in which the signals of individual cone photoreceptors located in and near the fovea can be passed largely independently to the LGN. Note, however, that while in macaque the midget bipolar cells contact single cones out to at least 8 mm (40°), in marmosets the midget bipolar cells get convergent input from multiple cones at eccentricities above 1 mm (8°; Wässle et al., 1994; Telkes et al., 2008). In addition the density of ganglion cells falls more rapidly with eccentricity in marmoset than macaque (Wilder et al., 1996).

#### **FIGURE 2 | The two major retino-thalamic pathways in marmoset. (A)** Camera lucida drawings of representative midget (parvocellular-pathway) and parasol (magnocellular-pathway) ganglion cells in marmoset retina, each located about 1 mm from the fovea (reproduced from Ghosh et al., 1996). **(B)** Photomicrograph of the LGN, showing the pairs of parvocellular (P) and magnocellular (M) layers; the dorsal-most P layer and ventral-most M layer get input from the contralateral eye; the internal layers get input from the ipsilateral eye. These layers are embedded in a matrix of koniocellular cells that lie between the principal layers, including two prominently segregated zones (K1, K3). Scale bar = 0.5 mm. **(C)** Peristimulus time histograms of the responses of representative OFF P- and M-cells to brief (0.2 s) decrements in light from a gray background. The P-cell shows sustained response, the M-cell shows transient response (reproduced from Cheong and Pietersen, 2014). Y-axis scale bars 50 impulses/s. Thick black bar shows the time and duration of the stimulus. **(D)** Spatial-frequency tuning of representative P- and M-cells for drifting achromatic gratings, modulated at 4 Hz (adapted from White et al., 2001). Y-axis scale bars 20 impulses/s. **(E)** Contrast response of representative P- and M-cells for drifting gratings of optimal spatial frequency (adapted from Cheong and Pietersen, 2014).

The ON and OFF parasol ganglion cells form the next most populous class of ganglion cell; these draw on multiple diffuse bipolar cells (Chan et al., 2001; Gomes et al., 2005; Eriköz et al., 2008) and project to the magnocellular layers of the LGN (Szmajda et al., 2008). The number of bipolar cells, and thus cone photoreceptors, converging onto a single midget or parasol ganglion cell increases with distance from the fovea (Jusuf et al., 2006b; Telkes et al., 2008). Neurons in the parvocellular and magnocellular layers project to V1 (Solomon, 2002; Cheong and Pietersen, 2014) and there are about as many LGN neurons projecting to V1 as there are likely retinal afferents to the LGN (ca. 400,000; Fritschy and Garey, 1986b; Solomon, 2002), suggesting that there is limited mixing of retinal signals in the LGN. This is consistent with simultaneous recordings from nearby LGN cells, which show little evidence of common retinal input (Cheong et al., 2011).

One well established pathway through the koniocellular zones of the LGN is that formed by the small bistratified ganglion cell, which in the macaque retina is characterized by strong blue–yellow color sensitivity (Dacey and Lee, 1994). Anatomical work shows very similar retinal morphology and connectivity for a small bistratified ganglion cell type in the marmoset (Ghosh et al., 1996, 1997; Ghosh and Grünert, 1999), which projects to the koniocellular zones of the LGN, particularly K3 (Szmajda et al., 2008). As described below, recordings from the dorsal koniocellular zones in the marmoset LGN, particularly K3, show the presence of neurons with blue–yellow color sensitivity (Martin et al., 1997; White et al., 1998); these neurons can be antidromically activated by electrical stimulation of V1 (Cheong and Pietersen, 2014), to which many koniocellular LGN neurons project (Solomon, 2002). The characteristics of other retinal ganglion cells projecting to the koniocellular layers are less well defined, although for some, their retinal morphology and laminar projection is becoming clearer (Szmajda et al., 2008; Percival et al., 2013). Recent work suggests that the ventral koniocellular zone (K1) is a particular target of the narrow thorny ganglion cell class (Percival et al., 2014). Neurons in this region can project to extrastriate regions of the visual cortex (Warner et al., 2010), and this network potentially provides a direct route from the retina to extrastriate cortex, which mediates residual visual capabilities following lesions of V1 (Rodman et al., 1989; Rosa et al., 2000; Yu et al., 2013).

Anterograde labeling techniques show that there are substantial projections from the retina to non-geniculate thalamic areas including the pulvinar complex (Warner et al., 2010), pregeniculate nucleus (potentially homologous to the intrageniculate leaflet and ventral geniculate nucleus of rodents: Lima et al., 2012), and smaller projections to the midline and dorsomedial thalamic nuclei (Cavalcante et al., 2005; de Sousa et al., 2013). There are also projections from the retina to the accessory optic system, including the medial terminal nucleus (Weber and Giolli, 1986). Retinal projections to the hypothalamus include the suprachiasmatic nucleus, as well as diffuse projections to several other regions (Costa et al., 1999). Systematic studies of the retinal projection to the superior colliculus, nucleus of the optic tract, and pretectum, among others, are lacking. The organization of ganglion cells that comprise these non-geniculate pathways has also not been clarified in the marmoset. Intrinsically photosensitive retinal ganglion cells (which express melanopsin) are morphologically similar in marmosets and macaques (Jusuf et al., 2007). Their central projections include the LGN (Szmajda et al., 2008), but other targets are possible.

# **FUNCTIONAL PROPERTIES OF NEURONS IN THE SUBCORTICAL VISUAL SYSTEM**

There is now a substantial body of work describing the functional properties of neurons in the retino-geniculate pathway, as we review below. Among subcortical areas other than the LGN, neuronal recordings have only been reported from superficial layers of the superior colliculus (Tailby et al., 2012; see Bourne and Rosa, 2003a for a description of the laminar organization of this nucleus). Parvocellular, magnocellular, and koniocellular neurons are generally well segregated in the marmoset LGN (Kaas et al., 1978; Bourne and Rosa, 2003b), allowing correlation of functional properties with the anatomical position of recorded neurons. In particular, work in the marmoset suggests that the functional properties of neurons in the parvocellular and magnocellular layers are each relatively homogeneous, whereas neurons in the koniocellular zones form a more heterogeneous population.

Extracellular recordings from the LGN, generally obtained under opiate anesthesia, show that the receptive fields of neurons are very similar to those in macaques (**Figure 2**; Kremers et al., 1997, 2001; Kremers and Weiss, 1997; Solomon et al., 1999; White et al., 2001; Solomon et al., 2002; Forte et al., 2005). Neurons in the parvocellular layers have small receptive fields, low contrast sensitivity, a generally linear contrast–response function and a sustained response to an effective stimulus. Neurons in the magnocellular layers have larger receptive fields, higher contrast sensitivity, a saturating contrast–response function and a transient response to an effective high contrast stimulus. Magnocellular cells show contrast adaptation, such that sensitivity drops during prolonged presentation of an effective stimulus (Camp et al., 2009, 2011). Magnocellular neurons also show the presence of a strongly suppressive region surrounding the classical receptive field (Felisberti and Derrington, 2001; Solomon et al., 2002; Webb et al., 2002, 2005; Kilavik et al., 2003; Kremers et al., 2004). Neurons in the parvocellular layers are less susceptible to contrast adaptation, and show weaker suppressive surrounds.
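The contrast between "generally linear" and "saturating" contrast–response functions can be illustrated with a hyperbolic-ratio function, a form often used to describe such data. The sketch below is an assumption-laden illustration rather than a model taken from the cited recordings: the function names and the parameter values for the two example cells are hypothetical.

```python
def contrast_response(contrast, r_max, c50):
    """Hyperbolic-ratio contrast response: r_max * c / (c + c50).

    contrast lies in the range 0..1; c50 is the semi-saturation contrast.
    """
    if contrast < 0:
        raise ValueError("contrast must be non-negative")
    return r_max * contrast / (contrast + c50)


def contrast_gain(r_max, c50):
    """Initial slope of the function (impulses/s per unit contrast)."""
    return r_max / c50


# Illustrative parameter values only (hypothetical, not taken from the data).
P_CELL = dict(r_max=60.0, c50=2.0)   # large c50: nearly linear over 0..1
M_CELL = dict(r_max=60.0, c50=0.1)   # small c50: high gain, early saturation
```

With these example values, doubling contrast from 0.5 to 1.0 raises the response of the parvocellular-like cell by about two thirds, but raises that of the magnocellular-like cell by less than a tenth, because the latter is already close to saturation. The same small `c50` gives the magnocellular-like cell a much higher initial contrast gain.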

Measurements with drifting gratings reveal that the receptive fields of parvocellular neurons in the parafovea are usually less than 0.1° in diameter, and can resolve greater than 10 cycles/degree (Kremers and Weiss, 1997; White et al., 2001; Martin et al., 2011). Magnocellular neurons have larger receptive fields; because they are very sensitive to contrast their spatial resolution can be as high as that of parvocellular neurons for low contrast stimuli (White et al., 2001). Among both parvocellular and magnocellular neurons, receptive field size increases with distance from the fovea and the response becomes more transient; however, at any given eccentricity, magnocellular neurons have larger receptive fields, shorter visual latencies, and more transient responses than parvocellular neurons (White et al., 2001; Solomon et al., 2002; Pietersen et al., 2014; see also Silveira and de Mello, 1998).

The presence of dichromatic and trichromatic individuals makes the marmoset a natural model to study normal red–green color vision, anomalous color vision and color-blindness. Recordings from parvocellular neurons in the LGN show that if an individual female expresses two photoreceptor opsins in the middle-long wavelength range (see above) then cone-opponent receptive fields can be identified, as long as the receptive fields are close to the fovea (Yeh et al., 1995; White et al., 1998; Blessing et al., 2004; Martin et al., 2011). The chromatic properties of these receptive fields are very similar to those of parvocellular-pathway neurons in the macaque and there is no evidence that the presence of red–green color responses in trichromatic animals is associated with a change in the achromatic response properties of cells in the retino-geniculate pathway (Blessing et al., 2004; Victor et al., 2007; Martin et al., 2011). That achromatic signals are independent of chromatic signals in parvocellular cells is consistent with the idea that chromatic processing is achieved by mechanisms that are primarily concerned with spatial analysis (Ingling and Martinez-Uriegas, 1983; Paulus and Kröger-Paulus, 1983). Overall, however, the segregation of cone-opponent inputs to center and surround of the receptive field is more pronounced in macaque than in marmoset (Buzás et al., 2006). This may reflect higher convergence of cone photoreceptors onto the receptive fields of ganglion cells outside of the fovea.

Neurons in koniocellular zones of the marmoset LGN show diverse response properties. Many respond well to achromatic stimuli (Solomon et al., 1999), and their receptive fields are generally larger than those of parvocellular and magnocellular neurons at the same eccentricity from the fovea (White et al., 2001). Some are "ON–OFF" (White et al., 2001; Solomon et al., 2010), some are suppressed by the presence of any stimulus (Solomon et al., 2010), and some are selective for orientation (Cheong et al., 2013). The most prominent functional characteristic is that many koniocellular neurons in K3 and K4 show strong functional input from short wavelength (S-) cones, responding well to an increase ("blue-ON"; Martin et al., 1997; White et al., 1998; Hashemi-Nezhad et al., 2008; Tailby et al., 2008, 2010) or decrease ("blue-OFF"; Szmajda et al., 2006; Tailby et al., 2008; Solomon et al., 2010) in S-cone activation. A small subset of neurons in and around the magnocellular layers shows highly non-linear spatial summation (White et al., 2001), although it remains unclear if these are a subset of magnocellular neurons, or part of a koniocellular pathway. Finally, koniocellular cells in the LGN show slow rhythms in spiking activity (Cheong et al., 2011). Spiking activity of nearby koniocellular cells waxes and wanes at the same time, and these slow rhythms appear to be correlated with changes in the EEG state as measured in the visual cortex. The meaning of this slow rhythm is unknown, and it is not known if the phenomenon is common to marmosets and macaques.

#### **PRIMARY VISUAL CORTEX (V1)**

### **STRUCTURE AND TOPOGRAPHIC ORGANIZATION**

V1 is the largest single area in the marmoset brain, with a surface area of approximately 200 mm<sup>2</sup> in each hemisphere (Pessoa et al., 1992; Missler et al., 1993a; Fritsches and Rosa, 1996). Marmoset V1 is also very large in relative terms in comparison with that in other species of monkey, including the macaque (20% versus 10% of the total area of the neocortex; Rosa and Tweedale, 2005; Chaplin et al., 2013b). The retinotopic map found in V1 of the marmoset is very similar to that described for the macaque and other diurnal primates (Fritsches and Rosa, 1996; Schira et al., 2012; Chaplin et al., 2013a; **Figure 3**). The foveal representation is highly magnified, occupying ∼20% of the surface area, and about 60% of V1 is dedicated to the central 10° of the visual field (Chaplin et al., 2013a). The peak magnification factor near the representation of the center of the fovea has been estimated to be 4–5 mm/degree, about 40% of the equivalent value in the macaque (Van Essen et al., 1984; Dow et al., 1985), and this proportional relationship is maintained throughout the visual field. The representations of the upper and lower contralateral quadrants are nearly symmetrical in size. As in other primates (e.g., Silveira et al., 1989; Azzopardi and Cowey, 1993), the magnification factor follows the sampling density of ganglion cells, but detailed analysis shows that representation of the foveal field in V1 greatly exceeds that expected from the retinal ganglion cell density (Chaplin et al., 2013a). This magnification of central vision in V1 is likely due to greater divergence in the retino-geniculo-cortical pathways serving foveal vision, compared to those serving peripheral vision (Chaplin et al., 2013a).

The laminar organization of marmoset V1 (**Figure 4A**) is similar to that seen in other diurnal primates, as revealed by the distribution of Nissl stain, and several neurochemical markers (Gebhard et al., 1993; Spatz et al., 1994; Goodchild and Martin, 1998; Solomon, 2002; Bourne et al., 2007). Although the layers of V1 are fully formed at birth, many important developmental events occur postnatally, with marked changes particularly within the first 3 months (Missler et al., 1993a,b; Spatz et al., 1994; Bourne et al., 2005; Fonta et al., 2005; Ribic et al., 2011). The reader should note that some studies (e.g., Spatz, 1975a; Vogt Weisenhorn et al., 1995; Elston et al., 1996, 1999; Solomon, 2002; Bourne and Rosa, 2003b) have employed a nomenclature of cortical layers in V1 that differs from the more commonly used Brodmann scheme (Hassler, 1966; see Casagrande and Kaas, 1994 for a discussion of the relative merits of the two schemes). The main difference to keep in mind is that in the Hassler scheme the layers IVa and IVb of the Brodmann nomenclature are considered subdivisions of layer III.

Relatively little is known about the distribution of cell types and interlaminar connections in marmoset V1. The few studies that have addressed neuronal morphology in this area have concentrated primarily on dendritic architecture, with respect to columnar domains (Malach, 1992), projection patterns (Vogt Weisenhorn et al., 1995; Elston and Rosa, 2006) or postnatal development (Fritschy and Garey, 1986a; Oga et al., 2013). One possible point of interest is the fact that most, if not all, layer IVb cells, which form the projection to the middle temporal area (MT), have an unambiguously pyramidal morphology (Vogt Weisenhorn et al., 1995; Elston and Rosa, 2006), as opposed to spiny multipolar in the macaque (Yabuta et al., 2001; see, however, Elston and Rosa, 1997).

#### **CONNECTIONS OF V1**

Perhaps surprisingly, our knowledge of the afferent connections of V1 in the marmoset still has many gaps. As expected from studies in other simian primates, anterograde tract tracing has shown strong projections from the LGN to layers IVcα (IVα in Hassler's nomenclature) and IVcβ (IVβ), as well as a weaker projection to layer VI, and patchy projections to supragranular layers (Spatz, 1979; DeBruyn and Casagrande, 1981). Analysis of retrograde tracing shows that the projection to supragranular layers arises primarily from koniocellular LGN neurons, whereas parvocellular and magnocellular LGN neurons project primarily to layers IV and VI (Solomon, 2002). A projection from the lateral pulvinar complex to V1 has been demonstrated, but its laminar targets have not been determined (Dick et al., 1991). Other subcortical projections to marmoset V1 have not yet been investigated in any detail.

Substantially more research is also needed on the issue of the intrinsic connectivity of V1 in the marmoset. Knowledge of horizontal connections would specify how signals are pooled across visual space and functional domains (e.g., orientation columns): to date, we know only that periodic horizontal connections have been shown between neurons in supragranular layers (Solomon, 2002), which have similar periodicity to the distribution of cytochrome oxidase "blobs." Knowledge of intralaminar connections would help specify the flow of information through V1 (e.g., Douglas and Martin, 1991), but the distribution of interlaminar connections has remained virtually unexplored, with the exception of a demonstration of projections from layer VI to the superficial layers (I and II; Divac et al., 1987).

Additional inputs to V1 arise in "feedback" connections from various other cortical areas. These connections originate primarily from infragranular layers in those areas (e.g., Spatz, 1977; Rosa and Tweedale, 2000), but their precise laminar targets in V1 have not been determined. Feedback projections originate mainly from other topographically organized areas, but also include smaller projections from subdivisions of the caudal parietal and inferior temporal cortices (Rosa and Tweedale, 2000; Lyon and Kaas, 2001). No study has mapped the entire pattern of extrastriate projections to V1, but injections into the central visual field representation (Rosa and Tweedale, 2000; Lyon and Kaas, 2001) label neuronal projections from V2, the ventrolateral posterior (VLP) and ventrolateral anterior (VLA) areas (likely homologs of areas V3 and V4 in the macaque; Rosa and Manger, 2005), the dorsomedial area, DM (V6; Rosa et al., 2013), MT (V5), and the middle temporal crescent [MTC; V4 transitional (V4t)]. Less dense, but clear projections were also detected from the dorsoanterior area, DA (a likely homolog of V3a; Rosa and Schmid, 1995) and other areas forming the occipitoparietal transition, as well as the caudal inferior temporal cortex (ITc). Overall, this pattern conforms to that described by studies using fluorescent tracers in the macaque (Perkel et al., 1986; Rockland and Van Hoesen, 1994; Markov et al., 2014).

**(A)** Photomicrographs of neighboring coronal sections through V1, showing the laminar structure as revealed by staining for cytochrome oxidase (left) and Nissl substance (right). Scale bar = 0.5 mm. Reproduced from Solomon (2002). The terminology of layers follows that defined by Brodmann. **(B)** Tuning for grating orientation and direction in two representative V1 neurons. Left: orientation selective neuron, responding equally well to gratings of appropriate orientation, in both directions of drift (adapted from Cheong et al., 2013). Right: direction selective neuron (adapted from Tinsley et al., 2003). **(C)** Spatial frequency tuning of representative parafoveal V1 neuron (adapted from Yu and Rosa, 2014); the

Rosa, 2014): response is suppressed in large sizes, showing presence of extraclassical receptive field modulation, or suppressive surround. Scale bars in **(B–D)** show 20 impulses/s. **(E)** Distribution of orientation selectivity amongst V1 neurons in marmoset. The abscissa shows an orientation selectivity index based on the circular variance (higher numbers indicate poorer tuning); the ordinate shows half-width at half-height of a von Mises function fit to the tuning curve. The inset at right shows orientation tuning of example neurons that are indicated in the plot. Adapted from Yu and Rosa (2014).

Knowledge about the projection of V1 to extrastriate cortex in the marmoset comes mainly from retrograde tracer injections in extrastriate areas, which suggest that, as in other primates, V1 sends reciprocal projections to most, if not all areas from which it receives afferents (e.g., Spatz, 1977; Krubitzer and Kaas, 1990; Lyon and Kaas, 2001; Rosa et al., 2005, 2009; Palmer and Rosa, 2006a,b). Projections to V2 arise throughout the upper layers of V1 (from layer II to layer IVb), but there is also a small projection from layer VI. As in macaques, layer IVb (IIIc in the Hassler nomenclature) contains the majority of neurons that project to thick cytochrome oxidase "stripes" in area V2 (Federer et al., 2009, 2013). In addition, layer IVb is also the primary source of V1 input to areas MT (Spatz, 1977; Palmer and Rosa, 2006a) and DM (Rosa et al., 2009; Jeffs et al., 2013); however, the morphology of cells projecting to MT and DM differs in detail (Vogt Weisenhorn et al., 1995).

Finally, callosal fibers provide interhemispheric connections between left and right V1, which may be important in linking the representations of the left and right visual hemifields (Choudhury et al., 1965). These callosal connections appear to be more extensive than those reported in the macaque (Cusick et al., 1984; Spatz and Kunz, 1984; Rosa and Manger, 2005). Most callosal neurons are found along the border between V1 and V2 (the representation of the vertical meridian), but can also be found more than 1 mm within V1.

#### **COLUMNAR ORGANIZATION OF V1**

The presence or absence of ocular dominance columns (ODCs) in marmosets remains a matter of interest. Early work suggested that marmosets lack ODCs in adulthood (Spatz, 1979, 1989; DeBruyn and Casagrande, 1981), although they can be transiently induced by silencing the input from one eye (Markstahler et al., 1998). Functional measurements in adults also suggest weak segregation of ocular dominance (Sengpiel et al., 1996; Roe et al., 2005). It is likely that ODCs form transiently during development (Spatz, 1989; Chappert-Piquemal et al., 2001): monocular lid suture during development can stabilize these ODCs into adulthood (Sengpiel et al., 1996). Transient or unstable expression of ODCs in marmosets is consistent with observations in some other New World monkeys, where the pattern and presence of ODCs varies from animal to animal (Adams and Horton, 2003). This variability may suggest that expression of ODCs is not necessary, or does not advantage any particular visual function; rather, the segregation of ocular inputs observed in adults of some primate species may simply reflect "leftovers" of a stochastic developmental process (Horton and Adams, 2005). Strong evidence for functional ODCs, in electrophysiological or optical imaging experiments, has not been reported in any individual marmoset (Sengpiel et al., 1996; Schiessl and McLoughlin, 2003; Roe et al., 2005).

As in cats and macaques, but unlike in rodents, V1 in the marmoset shows a columnar organization of orientation preference. Optical imaging reveals regions of relatively homogenous orientation preference ("iso-orientation domains") interspersed with regions of rapid change ("pinwheels"; Liu and Pettigrew, 2003; Roe et al., 2005; McLoughlin and Schiessl, 2006; Buzás et al., 2008; Valverde Salzmann et al., 2011).

The upper layers of marmoset V1 are also characterized by patchy ("blob" like) distribution of staining for cytochrome oxidase, a marker of metabolic activity, and these blobs align with the axon terminals of koniocellular LGN neurons (Solomon, 2002; Roe et al., 2005; Federer et al., 2009; Valverde Salzmann et al., 2012). Neurons in blobs are often thought to be important for color vision, but there is no difference in the distribution of blobs in dichromatic and trichromatic marmosets (Solomon, 2002). Optical imaging studies of spatial organization of chromatic responses in the marmoset have found no spatial organization of the blue–yellow chromatic response or the achromatic response across the cortical surface (Roe et al., 2005; Buzás et al., 2008; Valverde Salzmann et al., 2012). However, spatial non-uniformity has been identified in trichromatic animals, such that the "red–green" chromatic response is more likely to be found in cytochrome-oxidase "blobs" (Valverde Salzmann et al., 2012). Finally, whereas in Old World macaques and New World capuchin monkeys blobs lie at the center of ODCs (Livingstone and Hubel, 1984; Rosa et al., 1991), in marmosets, where such columns seem largely absent, blobs appear to form a hexagonal array (Solomon, 2002).

#### **FUNCTIONAL PROPERTIES OF V1 NEURONS**

Although the literature on single unit response properties in the marmoset visual cortex is still small relative to that in the macaque, there has been substantial progress, particularly over the last decade. To date, analyses of the response properties in V1 of the marmoset have been made under either barbiturate (Sengpiel et al., 1996; Roe et al., 2005; McLoughlin and Schiessl, 2006) or, more commonly, opiate anesthesia. Quantitative measurements from visual neurons in awake marmosets are not yet available, but the recent demonstration of the animals' ability to maintain fixation and perform visual tasks under head fixation (Mitchell et al., 2014), combined with the success of marmosets for single-unit recordings in other sensory systems (Lu et al., 2001; see Wang et al., 2008 for review), suggests that this situation will change substantially in the coming years. As we show below, there is little to differentiate the functional properties of neurons in V1 of marmosets and other primates.

The spatial response properties of marmoset V1 neurons strongly resemble those described in the macaque (**Figures 4B–D**). The degree of orientation selectivity varies between neurons (**Figure 4E**), but throughout V1 the majority of neurons (∼80%) show clear orientation preference. Quantitative analyses show that the orientation bandwidth (half width at half height) is on average 22–29° (Sengpiel et al., 1996; Bourne et al., 2002; Forte et al., 2005; Zinke et al., 2006; Cheong et al., 2013; Yu and Rosa, 2014). Some neurons in marmoset V1 show "simple" responses to drifting gratings, with the response modulated at the temporal frequency of the drift, and consistent with spatially offset ON and OFF subregions. The remainder show "complex" responses to drifting gratings, with an increase in the mean rate but no modulation of discharge. In some studies the prevalence of simple cells is 5–15% (Sengpiel et al., 1996; Yu and Rosa, 2014); other work finds approximately equal prevalence of simple and complex cells (Webb et al., 2003; Forte et al., 2005; Nowak and Barone, 2009). The reason for this discrepancy is not clear, and may be related to specific conditions of the tests conducted (Crowder et al., 2007); the latter estimates are nearer those found in macaques.
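Two of the measures discussed above lend themselves to a short worked sketch: an orientation selectivity index based on circular variance, and the ratio of response modulation at the drift frequency (F1) to the mean rate (F0), which is a common way to separate simple from complex responses. The code below is a minimal illustration under stated assumptions. The function names are hypothetical, the input data are synthetic, and the threshold of F1/F0 > 1 for "simple" is a widely used convention rather than a criterion stated in this review.

```python
import cmath
import math


def circular_variance(orientations_deg, responses):
    """Orientation selectivity index: 0 = sharply tuned, 1 = untuned.

    Orientation is periodic over 180 degrees, so angles are doubled before
    computing the normalized length of the response-weighted vector sum.
    """
    total = sum(responses)
    if total <= 0:
        raise ValueError("responses must sum to a positive value")
    vector = sum(r * cmath.exp(2j * math.radians(theta))
                 for theta, r in zip(orientations_deg, responses))
    return 1.0 - abs(vector) / total


def f1_f0_ratio(rates, n_cycles):
    """Ratio of the drift-frequency component (F1) to the mean rate (F0).

    rates are evenly spaced firing-rate samples spanning exactly n_cycles
    cycles of the drifting grating.
    """
    n = len(rates)
    f0 = sum(rates) / n
    if f0 <= 0:
        raise ValueError("mean rate must be positive")
    component = sum(r * cmath.exp(-2j * math.pi * n_cycles * k / n)
                    for k, r in enumerate(rates)) / n
    return 2.0 * abs(component) / f0
```

A neuron that responds equally at all orientations has a circular variance of 1, and one that responds at a single orientation has a circular variance of 0, matching the convention that higher values indicate poorer tuning. A half-wave rectified sinusoidal response gives an F1/F0 ratio near π/2, whereas a steady elevation in rate with only a small ripple gives a ratio well below 1.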

The preferred spatial frequency (**Figure 4C**) among V1 neurons depends strongly on eccentricity from the fovea: preferred spatial frequency is ca. 1.1 cycles/degree within 5◦ of the fovea, and 0.14 cycles/degree at eccentricities beyond 50◦ (Sengpiel et al., 1996; Forte et al., 2005; Yu et al., 2010). At least for receptive fields in parafoveal visual space, the peak spatial frequency of V1 cells is comparable to that of marmoset LGN cells (Forte et al., 2005) and is about half that in macaque V1 (Foster et al., 1985), as expected from the smaller eye of the marmoset. Neurons in V1 are less responsive to low spatial frequencies and uniform fields than neurons in the LGN, and show correspondingly tighter bandwidth for spatial frequency (Forte et al., 2005; Martin et al., 2011).

Qualitative and quantitative analyses reveal that direction selectivity (**Figure 4B**), in response to either moving bars or drifting gratings, is evident in approximately 20% of marmoset V1 neurons (Sengpiel et al., 1996; Bourne et al., 2002; Yu and Rosa, 2014). These neurons are more likely to be found in the infragranular layers than the supragranular layers, and are absent from the granular layers (IVcα and IVcβ; Yu and Rosa, 2014). Most neurons are generally sensitive to motion orthogonal to the preferred orientation, and are incapable of extracting motion direction independent of contour orientation (Tinsley et al., 2003); the signals of some broadly tuned neurons are less dependent on contour orientation, and may be an early stage in complex motion analysis (Tinsley et al., 2003; see also Barraclough et al., 2006; Guo et al., 2006). On average, neurons in marmoset V1 prefer temporal frequencies of ca. 4 Hz throughout the visual field. In the central visual field, the preferred temporal frequency is generally independent of the spatial frequency, suggesting that the receptive fields of most neurons are not extracting a measure of retinal image speed. This may be different from the case in the macaque, where speed sensitivity in the corresponding region of V1 is apparent in a subpopulation of complex cells (Priebe et al., 2006). The proportion of neurons showing speed sensitivity increases in the peripheral visual field representation of marmoset V1 (Yu et al., 2010).

Neurons in marmoset V1 show a broad distribution of contrast sensitivity – some are sensitive to very low contrasts and others only respond at high contrast. Many neurons in marmoset V1 display a saturating contrast response function (Webb et al., 2003), which is usually taken as evidence for some form of contrast gain control. As in the macaque, other evidence for gain control is found in around half of V1 neurons, which show the presence of suppressive surrounds similar to those found in the LGN (**Figure 4D**). On average, making the stimulus larger than the preferred size reduces the response by about 30% (Webb et al., 2003; Bourne et al., 2004; Yu and Rosa, 2014). The large size of these suppressive surrounds makes many neurons selective for the size of a textured stimulus – the preferred size depends on eccentricity from the fovea, with a diameter of 1.4◦ in the parafovea, and about 10◦ at eccentricities beyond 50◦ (Webb et al., 2003; Yu and Rosa, 2014). Unlike in the LGN these surrounds can be orientation tuned: they are most evident during the presentation of gratings or contours that are aligned to the preferred orientation of the classical receptive field (Webb et al., 2003).
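The preferred size and the strength of surround suppression described above can both be read from a size-tuning curve. The following Python sketch shows one simple way to do so, defining a suppression index as the fractional reduction from the peak response to the response at the largest stimulus tested. The function name and the example values are hypothetical, and the cited studies may use fitted models rather than raw measurements.

```python
def size_tuning_summary(diameters, responses):
    """Return (preferred_diameter, suppression_index) from a size-tuning curve.

    diameters : stimulus diameters in degrees
    responses : mean response (spikes/s) at each diameter
    """
    pairs = sorted(zip(diameters, responses))          # order by diameter
    pref_d, r_peak = max(pairs, key=lambda p: p[1])    # peak of the curve
    r_large = pairs[-1][1]                             # response to largest stimulus
    index = (r_peak - r_large) / r_peak if r_peak > 0 else float("nan")
    return pref_d, index

# Illustrative (made-up) values: peak at 1.4 deg, 30% suppression
pref, si = size_tuning_summary([0.5, 1.0, 1.4, 2.0, 4.0, 8.0],
                               [5.0, 14.0, 20.0, 18.0, 15.0, 14.0])
```

An index of 0 indicates no suppression, and an index of 1 indicates that the largest stimulus abolishes the response.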

The majority of neurons in marmoset V1 can be driven by stimulation of either eye (Sengpiel et al., 1996), including those in layer IV. The percentage of binocular cells appears higher than that in macaques and other species of New World primate that show well-defined ODCs (Rosa et al., 1992). No study has yet investigated the sensitivity of neurons in the marmoset visual cortex to binocular disparity. The interocular distance of the marmoset is much smaller than that of larger primates; the range of depths that can be usefully discriminated from binocular disparity should be correspondingly smaller, but no behavioral or physiological evidence is currently available. Knowledge of disparity sensitivity early in the visual pathway will be necessary to understand mechanisms of depth perception in the marmoset.

There has been limited investigation of the chromatic response of neurons in marmoset V1. No study has characterized the response of V1 neurons to modulation along the red–green dimension of color space, which is present only in trichromatic animals; some work has investigated the response to blue–yellow modulation (Buzás et al., 2008; Hashemi-Nezhad et al., 2008). As in macaques, many neurons respond weakly to blue–yellow color but strong responses to blue–yellow color (that is, sensitivity similar to that of blue–yellow color-responsive cells in the LGN) are rare.

Finally, some of the spiking variability of cortical neurons is shared with other cortical neurons, as evidenced by correlations in the activity ("noise correlations") of pairs of neurons. In V1 of marmoset, as in macaque, these noise correlations are dominated by short time-scales (<1 s), are slightly higher in pairs of neurons with similar functional characteristics, and extend over long distances (>1 mm; Cheong et al., 2011; Solomon et al., 2014).
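As an illustration of how noise correlations of this kind are commonly estimated, the Python sketch below normalizes spike counts within each stimulus condition and then correlates the pooled, normalized counts of two neurons. This is a generic procedure assumed for illustration and is not a description of the exact analyses in the studies cited above; the function names and data layout are hypothetical.

```python
import math
from statistics import mean, pstdev

def zscore(counts):
    """Normalize trial-by-trial spike counts for a single stimulus."""
    m, s = mean(counts), pstdev(counts)
    return [(c - m) / s for c in counts] if s > 0 else [0.0] * len(counts)

def noise_correlation(counts_a, counts_b):
    """Pearson correlation of z-scored spike counts of two neurons.

    counts_a, counts_b : dict mapping each stimulus to a list of spike
    counts, one per trial, with trials in the same order for both neurons.
    Z-scoring within each stimulus removes tuning (signal) differences, so
    that only shared trial-to-trial variability contributes.
    """
    za, zb = [], []
    for stim in counts_a:
        za += zscore(counts_a[stim])
        zb += zscore(counts_b[stim])
    ma, mb = mean(za), mean(zb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(za, zb))
    var_a = sum((x - ma) ** 2 for x in za)
    var_b = sum((y - mb) ** 2 for y in zb)
    return cov / math.sqrt(var_a * var_b)
```

The width of the counting window sets the time scale of the correlations that are measured, so the same calculation can be repeated with different windows to compare short and long time scales.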

# **SECOND VISUAL AREA, V2**

#### **STRUCTURE AND TOPOGRAPHIC ORGANIZATION**

In common with other simian primates, marmoset area V2 forms a continuous belt that wraps around V1, except at the rostral end of the calcarine sulcus, where area prostriata is located (Rosa et al., 1997; **Figure 1**). The vertical meridian of the visual field is represented along the border with V1; the horizontal meridian is represented along the anterior border, where V2 abuts areas of the "third visual complex" (Jeffs et al., 2009; Rosa et al., 2013). Following the topology of V1, the lower visual field is represented in dorsal V2, and the upper visual field is represented in ventral V2. Whereas in the macaque V2 is nearly as large as V1 (Olavarria and Van Essen, 1997), in the marmoset it is only half as large, with a surface area of about 100 mm<sup>2</sup> in each hemisphere (Rosa, 2002). The representation of the central visual field appears emphasized in V2, relative to V1, with approximately half of the surface area of V2 dedicated to the representation of the central 5◦ (Rosa et al., 1997).

#### **CONNECTIONS OF V2**

There have been no detailed studies of the pattern of subcortical projections to marmoset V2, although early work confirmed that, as in most primates, thalamic afferents largely originate in the inferior and lateral subdivisions of the pulvinar complex (Dick et al., 1991), and are topographically organized (Kaske et al., 1991). In addition to the V1 input described above, major cortical afferents to V2 originate in the third visual complex (DM/V6 and VLP/V3), the fourth visual area (VLA/V4), the motion-sensitive areas MT and MTC, and other dorsal extrastriate areas (in particular, the dorsoanterior area, DA/V3a; Jeffs et al., 2009, 2013). These inputs are topographically organized. Much smaller projections to V2 arise from areas in the occipitoparietal transition (Jeffs et al., 2013), likely extending into the lateral intraparietal area (LIP), the fundus of the superior temporal area (FST), the caudal ITc (ITc/TEO), and the prefrontal cortex (primarily, area 8aV, which likely includes the frontal eye field; Burman et al., 2006; Reser et al., 2013).

#### **COLUMNAR ORGANIZATION OF V2**

Like other simian primates (e.g., Livingstone and Hubel, 1984), marmoset V2 displays well-defined, stripe-like modular compartments, which are best visualized by stains for cytochrome oxidase (Rosa et al., 1997; Lyon and Kaas, 2001; Roe et al., 2005; Jeffs et al., 2009). Cytochrome oxidase-rich stripes can be further classified as thin or thick, which alternate with cytochrome oxidase-poor (or "pale") interstripes. Each point in the visual field is sampled by a thin stripe, a thick stripe, and a pair of interstripes (Rosa et al., 1997). These stripes can also be defined by their inputs from V1. Neurons within V1 "blobs" project to thin stripes in V2, those at the borders of blobs project to the thick stripes, and those in the center of "interblob" regions project to interstripes. This last projection can be further distinguished, based on the laminar location of the V1 afferents, into parallel streams that target alternating interstripes (Federer et al., 2009). Specifically, "pale-lateral" interstripes receive 10% of their V1 input from layer IVb, while the "pale-medial" interstripes receive no IVb input, a finding that has recently been confirmed in macaque (Federer et al., 2013). Some of the details of connectivity between V1 and V2 may differ between marmosets and macaques, but the functional organization of this system in macaques remains a topic of ongoing debate (e.g., Livingstone and Hubel, 1984; Xiao and Felleman, 2004; Sincich et al., 2010; Federer et al., 2013).

#### **FUNCTIONAL PROPERTIES OF V2 NEURONS**

Functional work on marmoset area V2 has been limited. The receptive field diameter of V2 neurons is 2–3 times greater than that in V1 (Rosa et al., 1997), but the neurons show a similar range of spatial and temporal properties, including orientation and direction selectivity, to those in V1 (Lui et al., 2005; Barraclough et al., 2006). The relationship with cytochrome oxidase modules has not been studied in detail, although one optical imaging study shows that regions with poor selectivity for orientation are coincident with "thin" cytochrome oxidase stripes, whereas regions with strong orientation selectivity coincide with the interstripes (Roe et al., 2005) and thick stripes (Federer et al., 2009). As in other primates (Malach et al., 1994), the imaged orientation domains in marmoset V2 are considerably larger than those in V1 (Liu and Pettigrew, 2003; McLoughlin and Schiessl, 2006).

#### **AREAS PROSTRIATA AND 23V**

Area prostriata is a narrow (1–2 mm wide) belt of cortex that separates the representation of the far peripheral visual field in V1 from the hippocampal formation, near the rostral tip of the calcarine sulcus. Area prostriata is distinct from V2, with low myelination and a poorly developed layer IV. Similar to the macaque, in marmosets prostriata provides input to the peripheral representations of several visual areas, as well as to many other sensory and association areas, extending as far as the frontal pole (Palmer and Rosa, 2006b; Burman et al., 2011; Reser et al., 2013; see Yu et al., 2012 for review). Area prostriata is adjoined by area 23V (23 ventral), a subdivision of the posterior cingulate cortex with which it shares many connections, including projections to the peripheral representations of MT and the medial superior temporal area (MST; Palmer and Rosa, 2006b) and frontal visual association areas (Reser et al., 2013). Based on its location relative to V2, area 23V seems to correspond to the scene-selective area of the retrosplenial cortex, described by Nasr et al. (2011) in other species.

Although area prostriata has traditionally been regarded as a high-order "limbic" visual association area, recent work in marmoset (Yu et al., 2012) suggests that it may be part of a primordial visual pathway parallel to that coursing through V1, which enables rapid response to events in peripheral vision and multisensory integration (Smiley and Falchier, 2009; Rockland, 2012). The subcortical afferents to this region are unclear, but neurons in area prostriata show short latency responses and broad tuning along the dimensions of orientation, direction, and spatial and temporal frequency; that is, their functional properties resemble those of neurons at early stages of visual processing. The receptive fields are, however, enormous (30–50◦ in diameter), and are concentrated in the peripheral visual field (Yu et al., 2012).

#### **"THIRD TIER" VISUAL CORTEX (AREAS DM, VLP, AND 19M)**

The third tier visual areas are those that lie adjacent to the anterior border of V2, and in the marmoset these are exposed on the surface of the brain, rendering them more readily accessible to modern experimental techniques including multielectrode array recording, optogenetics, and imaging. Electrophysiological studies demonstrate at least two areas, each forming a near complete representation of the contralateral hemifield: areas DM (V6) and VLP (V3; **Figure 5**). Fragmentary evidence suggests the existence of at least one additional area, near the midline (19M; **Figure 1**). DM and VLP may also be separated by an anatomically distinct subdivision, the dorsointermediate area (DI; Krubitzer and Kaas, 1990; Rosa and Schmid, 1995; see **Figure 1**), about which virtually nothing is known.

#### **AREA DM**

Area DM contains representations of the upper and lower visual fields, both of which lie adjacent to V2 (Rosa et al., 2005, 2013; Jeffs et al., 2013). At first sight, this organization seems to differ from that described in the corresponding region in the macaque brain, in which the dorsal cortex that is anterior to V2 is usually thought to contain only the lower visual field representation of area V3 (Gattass et al., 1988). However, anatomical evidence reveals strong similarities between marmoset DM and macaque area V6 [Rosa and Tweedale, 2001; Rosa et al., 2013; note that V6 overlaps partially with the "parietooccipital area" (PO) of other nomenclatures; Neuenschwander et al., 1994; Galletti et al., 2005]. Like macaque V6, marmoset DM is heavily myelinated, a characteristic which allows it to be easily distinguished from V2 and other subdivisions of the third tier complex, and obtains its predominant input from layer IVb neurons in V1; smaller projections arise in more superficial layers of V1 (Krubitzer and Kaas, 1993; Vogt Weisenhorn et al., 1995; Rosa et al., 2009; Jeffs et al., 2013). In addition, both marmoset DM and macaque V6 show a relatively large representation of the peripheral visual field, in comparison with most other visual areas.

**FIGURE 5 | Schematic organization of visual cortex in the marmoset.** "Unfolded" representation prepared using the technique of Van Essen and Maunsell (1980). Discontinuities in the representation, introduced to minimize distortion, are indicated by the arrows. Continuous black lines indicate the main cortical folds, including the lips and fundi of the lateral and calcarine sulci, the fundi of the superior temporal and intraparietal dimples, and the limits of the medial, ventral, and orbital surfaces. The inset on the lower left shows a lateral view of the intact marmoset brain, with boundaries of some visual areas indicated to help orientation. Colors indicate visual areas that have been mapped using electrophysiological techniques; other areas are simply indicated by labels in their approximate location. For abbreviations, see legend of **Figure 1**. The light gray dashed outlines indicate the borders of the primary auditory (A1), motor (M1), and somatosensory (S1) areas, for orientation. The topographic organization of visual areas is indicated according to the following symbols: white squares, representations of the vertical meridian (VM); black circles, representations of the horizontal meridian (HM); "+," representations of upper contralateral quadrant; "−," representations of the lower contralateral quadrant; red dashed lines, isoeccentricity lines (numbers indicate eccentricity from the fovea, in degrees).

In addition to the V1 projections, most cortical afferents to marmoset DM originate in extrastriate areas, including VLP and VLA, motion-sensitive areas MT, MTC, and MST, occipitoparietal transition areas [DA and PPM (medial posterior parietal area); see below], and other dorsal areas of the caudal posterior parietal cortex (in particular, LIP). Smaller cortical projections from the granular frontal cortex (primarily 8aV), rostral premotor cortex, ventral parietal cortex (primarily cytoarchitectural fields OPt and PG) and parahippocampal cortex (primarily TF) have also been described (Rosa et al., 2009; Jeffs et al., 2013; Burman et al., 2014b). Finally, subcortical projections from the pulvinar complex, centrolateral and centromedial thalamic nuclei, and claustrum, have been documented (Dick et al., 1991; Rosa et al., 2009).

The receptive fields of neurons in DM are about twice the diameter of those in V2 (Rosa and Schmid, 1995), although many neurons show larger, facilitatory, fields, suggesting a role in integrating contours across large regions of the visual field (Lui et al., 2006, 2013). Most neurons are orientation selective, and include some with remarkably narrow orientation tuning (Lui et al., 2006). Direction selectivity is observed in a minority of the neurons (Rosa and Schmid, 1995; Lui et al., 2006), although this deserves more careful study, particularly with respect to the peripheral visual field representation. These properties contrast sharply with those observed in MT, another densely myelinated area that receives projections from layer IVb of V1 (Lui et al., 2013).

#### **AREA VLP**

Area VLP, which lies lateral to DM, is the likely homolog of the third visual area (V3, or area 19) found in most mammals (Rosa and Manger, 2005). In VLP the lower visual field is represented on the dorsolateral cortical surface, and the upper visual field on the tentorial surface (Rosa and Tweedale, 2000; Jeffs et al., 2013). Over half of VLP is devoted to the central 5◦ of the visual field, and there is little if any representation beyond 50◦. The myeloarchitecture of VLP is similar to that of "ventral V3" (also known as the ventral posterior area, VP) in macaque and capuchin monkeys (Gattass et al., 1988; Rosa et al., 1993). Also similar to V3, the anterior border of VLP is formed by a representation of the vertical meridian of the visual field. VLP sends topographically organized projections to, and receives them from, the central visual field representations of areas V1 (Rosa and Tweedale, 2000; Lyon and Kaas, 2001), V2 (Jeffs et al., 2009, 2013), MT (Palmer and Rosa, 2006a,b), and DM (Rosa et al., 2009), but the full pattern of connections is yet to be determined. Quantitative measurements of response properties are not yet available, but direction selectivity is rare. Most cells prefer slow moving stimuli, and receptive fields are not much larger than those in area V2 (ca. 1◦ in diameter near the center of the fovea; Rosa and Tweedale, 2000). Preliminary evidence based on functional MRI suggests that VLP is closely affiliated with the ventral stream of visual processing (Ciuchta et al., 2013).

#### **AREA 19M**

Adjacent to the representation of the lower visual quadrant periphery of V2 (Rosa and Schmid, 1995), along the midline of the cortex, is area 19M (also named the "parietooccipital medial area," POm). Area 19M lacks the heavy myelination that characterizes the adjacent DM, but shares with this area connections with MT and the frontal oculomotor fields (Palmer and Rosa, 2006b; Reser et al., 2013). The visual field representation encompasses the upper and lower visual fields, and the representation of the peripheral visual field seems expanded relative to that of V1 and V2. Area 19M is likely to overlap in part with the "medial visual area" described in the owl monkey (Allman and Kaas, 1976).

# **MIDDLE TEMPORAL AREA, MT**

#### **STRUCTURE AND TOPOGRAPHIC ORGANIZATION**

Area MT, which as in other primates is characterized by dense myelination (Spatz, 1977; Rosa and Elston, 1998; Bourne et al., 2007; Bock et al., 2009), lies posterior to the lateral sulcus (**Figures 1** and **6**). Marmosets (and probably other species of Callitrichidae) are the only simian primates in which MT is entirely exposed on the surface of the cortex, creating unique opportunities for studies using imaging, intracellular or multielectrode array analyses. The size of MT in the marmoset is approximately 13 mm<sup>2</sup> in each hemisphere, making it about 6.5% the size of V1; these estimates are similar to those in other simian primates (Pessoa et al., 1992; Rosa, 2002). The representation of the central visual field is less emphasized than in V1: whereas the central 5◦ around the fixation point project to about 40% of the volume of V1, the corresponding region only occupies 20% of MT (Rosa and Elston, 1998).

**FIGURE 6 | The middle temporal area (MT) of marmoset.** **(A)** Photomicrograph of adjacent coronal sections, showing the histological distinctiveness of area MT revealed by myelin (left) and Nissl (right) stains. MT stands out as heavily myelinated in comparison with most cortical areas. Although the boundaries are less obvious, MT can also be identified in Nissl stained sections by the thinner and denser layer IV, and by the thicker layer VI, in comparison with adjacent areas. Scale bar = 1 mm. **(B)** Direction tuning for gratings and plaids in two representative direction-selective MT neurons. The left panel illustrates the responses of a "component-cell," which shows bi-lobed tuning for plaids, as if it responded to the individual gratings that comprise the plaid. The right panel shows the responses of a "pattern-cell," which has similar direction tuning to gratings and plaids. **(C)** Spatial frequency tuning of a representative "component cell" in the peripheral representation of MT; the response to low spatial frequencies is negligible. **(D)** Tuning for the size of a patch of drifting grating, of optimal spatial frequency, showing large receptive field size of neurons in area MT. Scale bars in **B** show 20 impulses/s. **(B–D)** adapted from Solomon et al. (2011).

#### **CONNECTIONS OF MT**

The main thalamic afferents to MT originate in the inferior subdivision of the pulvinar complex, with smaller inputs from koniocellular layers K1 and K3 in the LGN (Dick et al., 1991; Warner et al., 2010, 2012), and intralaminar nuclei (Spatz, 1975b). Sparse projections also arise from the claustrum (Spatz, 1975b).

In addition to the V1 input described above, which primarily projects to lower layer III and upper layer IV of area MT (Spatz, 1977), major cortical afferents to MT originate in V2, in surrounding motion-sensitive areas (MTC, MST, and the fundus of superior temporal sulcus area, FST; Krubitzer and Kaas, 1990), and in other dorsal extrastriate cortex areas (in particular, DM, DA, 19M, and PPM). In comparison, input from ventral stream areas is minor (Palmer and Rosa, 2006a). Additional inputs arise in the posterior parietal cortex (primarily LIP), prefrontal cortex (primarily area 8aV; Reser et al., 2013), and parahippocampal cortex (TF). For quantitative analysis of these and other cortical projections, the reader is directed to Palmer and Rosa (2006a,b). Projections from area MT include a strong projection onto V1 (Spatz, 1977) and most, if not all areas from which it receives afferents (Krubitzer and Kaas, 1990). The pattern of intrinsic connections within marmoset area MT has not yet been explored.

#### **FUNCTIONAL PROPERTIES OF MT NEURONS**

As in all primates so far studied, the connections and functional properties of area MT in marmoset are consistent with a role in motion analysis and the control of eye movements. The response properties of marmoset MT neurons strongly resemble those described in the macaque. The degree of direction selectivity varies between neurons, but throughout MT the majority of neurons (80–90%) show clear direction selectivity (**Figure 6B**), whether the stimulus is a moving grating, bar or dot field (Rosa and Elston, 1998; Solomon et al., 2011; Lui et al., 2013). Among these neurons there is a bias for motion radial from the fovea, particularly in the representation of the peripheral visual field (Rosa and Elston, 1998). Quantitative analyses show tuning bandwidth (half width at half height) of directionally selective neurons is around 33◦ for drifting gratings (Solomon et al., 2011) and slightly broader for moving bars, kinetic contours or dot fields (Solomon et al., 2011; Lui et al., 2012, 2013).
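The tuning bandwidth quoted above, the half width at half height, can be estimated directly from a sampled tuning curve. The Python sketch below is one minimal way to do this, using linear interpolation between sampled directions. It assumes evenly spaced directions covering 360◦ and baseline-subtracted responses; the function name is hypothetical, and published analyses often fit a smooth function to the data first.

```python
def half_width_at_half_height(responses):
    """Half width at half height (degrees) of a circular tuning curve.

    responses : baseline-subtracted mean responses at evenly spaced
                directions that together cover 360 degrees
    """
    n = len(responses)
    step = 360.0 / n
    peak = max(range(n), key=lambda i: responses[i])
    if responses[peak] <= 0:
        return float("nan")  # no positive response to characterize
    half = responses[peak] / 2.0

    def distance_to_half(direction):
        # Walk away from the peak until the response falls to half height,
        # then interpolate linearly between the two straddling samples.
        for k in range(1, n):
            prev = responses[(peak + direction * (k - 1)) % n]
            cur = responses[(peak + direction * k) % n]
            if cur <= half:
                return step * (k - 1 + (prev - half) / (prev - cur))
        return float("nan")  # response never falls to half height

    # Average the two flanks of the tuning curve
    return (distance_to_half(+1) + distance_to_half(-1)) / 2.0
```

With directions sampled every 30◦, a neuron that responds at 60 spikes/s to its preferred direction and 20 spikes/s to the two neighboring directions has an interpolated half width of 22.5◦.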

Direction-selective neurons in area MT of the macaque are distinguished from those in V1 by their capacity to signal motion direction independently of contour orientation. This is most commonly revealed by comparing responses to drifting gratings, and plaids formed by the superposition of two such gratings (**Figure 6B**). Some neurons respond to plaids with bimodal direction tuning curves, as if they "see" each of the components of the plaid ("component cells"), and others respond to the overall motion direction of the plaid and not that of its components ("pattern cells"); other neurons respond in an intermediate way. In both qualitative and quantitative aspects the signatures of this motion integration are the same in MT of marmosets and macaques (Solomon et al., 2011; McDonald et al., 2014).
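A common way to place a neuron on the continuum between component and pattern behavior is to compare its plaid tuning with two predictions derived from its grating tuning, using partial correlations. The Python sketch below illustrates this approach. It is a generic version assumed for illustration, not the exact procedure of the studies cited above: the function names are hypothetical, the plaid is assumed to consist of two gratings offset symmetrically from the plaid direction, and the statistical classification step (usually applied after Fisher's z-transform) is omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def partial_correlations(grating, plaid, half_angle_steps):
    """Return (R_component, R_pattern) for one neuron.

    grating, plaid   : responses at the same evenly spaced directions
                       covering 360 degrees (plaid direction = direction
                       of the whole pattern)
    half_angle_steps : half the angle between the plaid's two gratings,
                       in units of the direction sampling step
    """
    n, s = len(grating), half_angle_steps
    # Pattern prediction: plaid tuning equals grating tuning
    pattern_pred = list(grating)
    # Component prediction: sum of grating tuning shifted to each component
    component_pred = [grating[(i - s) % n] + grating[(i + s) % n]
                      for i in range(n)]
    rc = pearson(plaid, component_pred)
    rp = pearson(plaid, pattern_pred)
    rpc = pearson(pattern_pred, component_pred)
    # Partial correlations remove the overlap between the two predictions
    r_comp = (rc - rp * rpc) / math.sqrt((1 - rp ** 2) * (1 - rpc ** 2))
    r_patt = (rp - rc * rpc) / math.sqrt((1 - rc ** 2) * (1 - rpc ** 2))
    return r_comp, r_patt
```

Neurons with a high pattern correlation and a low component correlation behave as pattern cells, the converse holds for component cells, and neurons for which neither value clearly dominates fall into the intermediate group. The calculation is undefined if the plaid response matches one prediction exactly, which does not arise with measured data.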

Receptive field sizes in area MT are much larger than those in V1 (**Figure 6D**), and as in the macaque, the average receptive field diameter is similar to the receptive field eccentricity (i.e., a receptive field centered at 10◦ eccentricity will be about 10◦ wide). Each point in the visual field projects onto 1–1.5 mm of the surface of area MT (Rosa and Elston, 1998). Most neurons in marmoset MT show a "complex" response to drifting gratings, with an unmodulated increase in the mean firing rate (e.g., Solomon et al., 2011). The preferred spatial frequency (**Figure 6C**) depends weakly on eccentricity from the fovea: it is about 0.2 cycles/degree within 5◦ of the fovea, and 0.1 cycles/degree at eccentricities beyond 30◦ (Lui et al., 2007a). Neurons are generally insensitive to modulation of uniform fields, but show broad bandwidth for spatial frequency (Lui et al., 2007a; Solomon et al., 2011). The preferred temporal frequency is in the range 4–12 Hz, increasing in the peripheral field. In about one-third of neurons, the preferred temporal frequency depends on the spatial frequency, suggesting that the receptive fields of these neurons are extracting a measure of retinal image speed (Lui et al., 2007a). Responses to drifting dot-fields show that the speed tuning of neurons can appear low-pass, band-pass, or high-pass (Solomon et al., 2011).
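The link between temporal frequency tuning and speed follows from the fact that the speed of a drifting grating is its temporal frequency divided by its spatial frequency. A neuron tuned for speed should therefore prefer higher temporal frequencies at higher spatial frequencies, whereas a neuron with independent (separable) tuning should prefer the same temporal frequency throughout. One simple way to express this, sketched below in Python, is the slope of the relationship between preferred temporal frequency and spatial frequency on logarithmic axes. The function name is hypothetical and the approach is an illustration, not necessarily the analysis used in the cited work.

```python
import math

def speed_tuning_slope(spatial_freqs, preferred_tfs):
    """Least-squares slope of log2(preferred TF) against log2(SF).

    spatial_freqs : spatial frequencies tested (cycles/degree)
    preferred_tfs : preferred temporal frequency (Hz) at each of them
    A slope near 0 indicates separable tuning; a slope near 1 indicates a
    constant preferred speed (TF / SF).
    """
    xs = [math.log2(sf) for sf in spatial_freqs]
    ys = [math.log2(tf) for tf in preferred_tfs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
            sum((x - mx) ** 2 for x in xs))
```

For example, a neuron that prefers 4 Hz at every spatial frequency has a slope of 0, while a neuron that always prefers 20◦/s has a slope of 1.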

Neurons in marmoset MT show very high contrast sensitivity, and a saturating contrast–response function, with the contrast to achieve a half-maximum response ca. 0.13 (Solomon et al., 2011). Many neurons also show the presence of "suppressive surrounds." On average, making a grating patch larger than the preferred size (generally similar to receptive field size) reduces the response by 40–50% (Lui et al., 2007b; Solomon et al., 2011). The inhibitory surrounds of marmoset MT neurons are primarily aligned with the receptive field length (i.e., perpendicular to the optimal direction of motion), so that end-inhibition tends to be stronger than side-inhibition (Lui et al., 2007b, 2013).
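Saturating contrast–response functions are frequently described with a hyperbolic ratio (Naka-Rushton) function, in which the semi-saturation contrast is the contrast that evokes half of the maximum response. The Python sketch below evaluates such a function with the half-maximum contrast of 0.13 quoted above. The choice of this particular model and the exponent of 2 are assumptions made for illustration, and are not taken from the studies cited.

```python
def naka_rushton(contrast, r_max=1.0, c50=0.13, exponent=2.0):
    """Hyperbolic ratio model of the contrast-response function.

    contrast : stimulus contrast between 0 and 1
    r_max    : maximum response (here normalized to 1)
    c50      : contrast giving a half-maximum response
    exponent : steepness of the function (assumed value)
    """
    cn = contrast ** exponent
    return r_max * cn / (cn + c50 ** exponent)

# Response relative to maximum at a few contrasts
responses = {c: naka_rushton(c) for c in (0.05, 0.13, 0.5, 1.0)}
```

Under these assumptions the response reaches half of its maximum at a contrast of 0.13 and exceeds 90% of its maximum by a contrast of 0.5, which illustrates how little additional response is gained at high contrast.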

Like many other visual cortical areas, MT in the marmoset lies exposed on the cortical surface and is accessible to multielectrode arrays. Recent work has exploited this anatomical convenience to measure the spatiotemporal distribution of neural correlations in anesthetized animals, and its impact on the neural codes that populations of neurons in MT can provide (McDonald et al., 2014; Solomon et al., 2014). This work shows that the spiking activity of neurons within about 1.5 mm of each other (that is, neurons with overlapping receptive fields) can be tightly synchronized (<0.05 s), and that this synchrony is stronger in pairs of neurons with similar direction preference (Solomon et al., 2014). Superimposed on this are slower correlations (with time scales in the range of 0.2–1 s), which extend across much of MT and therefore link neurons with very dissimilar functional properties. These observations are consistent with the idea that correlations over short time scales reflect common driving input or direct connectivity between neurons, while those over longer time scales reflect modulation in the gain of larger networks.

#### **COLUMNAR ORGANIZATION OF MT**

Electrophysiological recordings approximately tangential to the cortical surface show smooth changes in direction preference in MT. Nearby neurons must have generally similar direction preference, as multiunit activity is well tuned for direction (McDonald et al., 2014), and recordings with laminar probes inserted approximately perpendicular to the cortical surface also exhibit a preponderance of similar direction preferences along each probe (Solomon et al., 2014). These observations are all consistent with the columnar organization of direction preference in marmoset MT. In addition, staining for myelin in marmoset MT reveals quasi-periodic bands, which may align with the distribution of transcallosal afferents arising in the contralateral area MT (Krubitzer, 1995). Functional correlates of this banding pattern have not yet been identified, and it does not appear to be associated with discontinuities in retinotopy (unlike, for example, the discontinuities associated with cytochrome oxidase stripes in area V2).

#### **DEVELOPMENT AND PLASTICITY OF MT**

The rapid postnatal development of marmosets has been instrumental in allowing studies of cortical maturation and plasticity. Area MT undergoes neurochemical maturation in parallel with V1, and ahead of all other visual areas, suggesting that MT may act as an "anchor point" that guides the maturation of cortical areas (Rosa, 2002; Bourne and Rosa, 2006; Warner et al., 2012; Buckner and Krienen, 2013). Indeed, many of the response properties of MT neurons can develop even when V1 is lesioned in early postnatal life, including normal receptive field topography and short latency responses to visual stimuli. Direction selectivity, however, the characteristic functional feature of neurons in MT, fails to develop in the absence of V1 (Yu et al., 2013). The effects of V1 lesions are age-dependent, as lesions in adults substantially reduce the proportion of responsive neurons in MT, but do not abolish direction selectivity (Rosa et al., 2000); the latter observation is in line with results in the macaque (Rodman et al., 1989).

#### **THE "MT SATELLITES": AREAS MST, FST, MTC**

As in other primates studied, area MT is neighbored by a complex of areas that have strong interconnections with area MT, and contain relatively high proportions of neurons showing motion selectivity (Krubitzer and Kaas, 1990; Palmer and Rosa, 2006a). These areas might provide complementary or higher stages of motion processing.

#### **AREA MST**

The medial superior temporal area (MST) lies anterior to MT, near the tip of the lateral sulcus (Krubitzer and Kaas, 1990). The pattern of visual field representation suggests that this area may be further subdivided, although whether this is warranted remains unclear (Rosa and Elston, 1998). The vast majority of neurons in MST show strong direction selectivity, and have receptive fields predominantly in the peripheral visual field, which are on average larger than those in the corresponding part of area MT. Area MST forms one of the main sources of feedback-type projections to MT (i.e., projections that originate primarily from infragranular neurons; Palmer and Rosa, 2006a). As in the macaque (Boussaoud et al., 1990), marmoset MST receives a small but distinct projection from the representation of peripheral vision in V1, as well as strong inputs from areas MT and MTC (Palmer and Rosa, 2006b). Other inputs arise in dorsal and medial extrastriate areas that emphasize peripheral vision (DM, DA, 19M, 23V, area prostriata), in FST, in visual association areas in the posterior parietal cortex (primarily LIP and PPM), in the superior temporal polysensory cortex (STP/TPO), in the parahippocampal cortex (TF) and in frontal lobe areas (primarily 8aV and 8aD; Palmer and Rosa, 2006b). Finally, MST has sparse connections with motor and premotor areas (Burman et al., 2014a,b), and with caudal auditory association areas (Palmer and Rosa, 2006b), suggesting roles in visuomotor and polysensory integration.

#### **AREA FST**

Another major source of feedback-type connections to marmoset MT is area FST (Krubitzer and Kaas, 1990; Palmer and Rosa, 2006a,b). Unlike MST, FST lacks the marked emphasis on peripheral vision, and fewer neurons show clear direction selectivity (Rosa and Elston, 1998). Other than the major projection to MT, FST also projects to other visual areas (e.g., V2 and DM; Jeffs et al., 2009; Rosa et al., 2009) and frontal area 8aV (Reser et al., 2013). FST may be a major node of integration between the dorsal and ventral streams of processing (Rosa and Elston, 1998).

#### **AREA MTC**

The MTC area forms a topographically organized, horseshoe-shaped ring around much of MT (**Figures 1** and **5**), and may be related to the "V4t" area described in the macaque (Gattass et al., 1988). Area MTC is a major source of input to MT, but unlike in FST and MST these connections originate in equal proportion in the supragranular and infragranular layers, suggesting that they are better thought of as lateral, rather than feedforward or feedback, connections (Palmer and Rosa, 2006a). Receptive fields are, on average, slightly larger than those in MT (Rosa and Elston, 1998), and only half of the neurons show clear direction selectivity. By comparison with MT, MTC receives input from a wider variety of frontal areas, including subdivisions of the ventrolateral and orbital frontal cortices, as well as oculomotor centers (Burman et al., 2006).

### **OCCIPITOPARIETAL AND CAUDAL PARIETAL AREAS**

Anterior to DM lie areas of cortex that are cytoarchitecturally intermediate between the "classical" (area 19-type) extrastriate cortex and the posterior parietal (areas 5 and 7-type) cortex. This region of cortex is likely to be a site of visuomotor integration, and includes areas whose likely counterparts in macaque are buried deep in the annectant gyrus and parietooccipital sulcus. Receptive field topography and response properties suggest at least two subregions: area DA, which contains neurons with clear visual receptive fields, and a medial region (PPM) where visual responses are harder to obtain in anesthetized preparations. Both DA and PPM are heavily interconnected with areas DM, MT, and MST, suggesting that they are part of the dorsal stream of visual processing (Palmer and Rosa, 2006a,b; Burman et al., 2008; Rosa et al., 2009; Jeffs et al., 2013). In addition, they have reciprocal interconnections with frontal motor, premotor, and oculomotor areas (Burman et al., 2006, 2008; Reser et al., 2013).

### **AREA DA**

Area DA (or one of its subdivisions) is likely to be homologous to macaque area V3a. DA is topographically organized and includes neurons with relatively large receptive fields, which grow from ∼5° diameter in the central representation to ∼30° in the periphery (Rosa and Schmid, 1995; Jeffs et al., 2013). The topographic organization is complex, with some evidence for two visuotopic maps (Rosa and Schmid, 1995).

### **AREA PPM**

Based on its connectivity and location, area PPM is likely to correspond to macaque area V6a (Burman et al., 2008; Paxinos et al., 2012). Many neurons in area PPM do not respond to simple visual stimuli under anesthesia (Rosa and Schmid, 1995; Rosa et al., 2009); among those that do respond, receptive fields are very large and diffuse. Area PPM is adjoined anteriorly by putative homologs of macaque area PEC (caudal subdivision of cytoarchitectural area PE) and PGM (medial subdivision of cytoarchitectural area PG), which form other connectional nexuses between visual areas and the premotor centers of the frontal lobe (Burman et al., 2008; Reser et al., 2013).

# **POSTERIOR PARIETAL CORTEX**

This region comprises a series of architecturally distinct fields (Rosa et al., 2009; Paxinos et al., 2012; Reser et al., 2013) but knowledge of their functional properties and precise boundaries requires further study, preferably in awake-behaving preparations. Among the best characterized subdivisions is a putative homolog of area LIP, which as in the macaque forms strong projections to MT and to the frontal eye fields, and is heavily myelinated in comparison with other "intraparietal" areas (Rosa et al., 2009; Reser et al., 2013). Likely homologs of the medial and ventral intraparietal areas (MIP and VIP), of medial parietal area PGM (medial subdivision of PG) and of ventral parietal areas OPt, PG, PFG, and PF have also been suggested, based on cyto- and myeloarchitecture (Rosa et al., 2009; Paxinos et al., 2012). Large visual receptive fields have been recorded in the likely homologs of OPt and LIP/VIP (Rosa and Tweedale, 2000; Rosa et al., 2005). As in other primates, large lesions that include multiple subdivisions of the posterior parietal cortex result in contralateral neglect (Marshall et al., 2002).

# **VENTRAL STREAM AREAS**

Our knowledge of the ventral stream areas of the marmoset is still in its infancy. The location and topographic organization of the likely homolog of area V4 (VLA) have been mapped in detail. In addition, area ITc has been defined, which bears strong resemblance to macaque area TEO in terms of location, cytoarchitecture, receptive field size and topography (Rosa and Tweedale, 2000). Both areas are preferentially activated by complex visual stimuli (Ciuchta et al., 2013). Areas VLA and ITc both send feedback-type connections to the central representations of areas V1 (Rosa and Tweedale, 2000; Lyon and Kaas, 2001) and V2 (Jeffs et al., 2013). Whereas VLA also sends topographically organized connections to dorsal stream areas DM and MT, projections from ITc to dorsal stream cortical areas appear to be very sparse (Palmer and Rosa, 2006a; Rosa et al., 2009; Jeffs et al., 2013).

The rostral subdivisions of the inferior temporal cortex of the marmoset are known primarily from histological analyses (Burman et al., 2011; Paxinos et al., 2012), which suggest a close resemblance with these regions in the macaque (**Figure 1**). Dorsal (ITd) and ventral (ITv) cytoarchitectural areas are currently recognized, but these are likely to include multiple functional subdivisions. Although full reports of the response properties of neurons in the different subdivisions of the inferior temporal cortex have yet to appear, there have been preliminary reports of subregions containing face-selective cells (Tamura and Fujita, 2007; Hung et al., 2013). In addition, it has been established that lesions of the marmoset inferior temporal cortex result in deficits in visual object discrimination (Ridley et al., 2001).

# **FRONTAL ASSOCIATION AREAS**

The frontal eye field of the marmoset has been identified based on both physiological (Blum et al., 1982) and cytoarchitectural (Burman et al., 2006; Burman and Rosa, 2009) criteria. As in the macaque, the frontal eye field is approximately coincident with cytoarchitectural area 8aV, although it may extend further ventrally to include area 45 (Reser et al., 2013). Most (if not all) extrastriate areas have connections with the frontal eye field, but projections from V1 are absent. There is some topography in the connections between extrastriate cortex and the frontal lobe, with the anterior part of area 8aV receiving connections from neurons with receptive fields in peripheral vision, and the posterior part receiving connections from those representing central vision (Reser et al., 2013). Other areas of the frontal lobe, including areas 8aD and 8C, and the rostral premotor cortex, receive sparse projections from extrastriate cortex. These projections originate primarily from dorsal stream visual areas such as MST, FST, LIP, and 19M (Reser et al., 2013; Burman et al., 2014b), and may have a role in the visual guidance of motor activity.

# **SUMMARY AND CONCLUSION**

We set out to establish the current state of knowledge on the visual system of the marmoset. The most studied stages of visual processing, in the marmoset as in the macaque, are the retina, LGN, and cortical areas V1, V2, and MT. We have shown that in marmosets the corpus of knowledge available for these areas is now solid enough to allow high-level experimental design that exploits the advantages that marmoset monkeys may provide. Among these areas there appear to be no substantive functional or anatomical properties that distinguish marmosets from macaques, provided that the smaller eye and polymorphic color vision of the former are taken into account. Indeed, the simpler geometry of the thalamus and cortex in the marmoset has already allowed sharper understanding of the relationship between structure and function in LGN and MT.

The last decade has seen rapid progress in the establishment of robust protocols for electrophysiology in anesthetized preparations (Yu and Rosa, 2010), structural MRI (Bock et al., 2009), functional MRI (Belcher et al., 2013; Liu et al., 2013), optical imaging (Valverde Salzmann et al., 2012), and behavioral study of eye movements (Mitchell et al., 2014), among other important developments. Although the full extent to which marmosets can be trained in visual tasks has yet to be established, there are indications that, given appropriate training, they can offer reliable performance in tests requiring relatively complex cognitive processes (Dias et al., 1996; Spinelli et al., 2004; Rygula et al., 2010; Tokuno and Tanaka, 2011). In addition, we have not touched on one of the strong advantages of the marmoset in developing primate models of normal vision and visual dysfunction – the potential for genetic modification (Sasaki et al., 2009). The precise functional organization of visual cortex, combined with the availability of embryonic tissue, rapid postnatal maturation and potential for genetic manipulation, means that the marmoset may provide a tractable model for the study of the detailed molecular events that guide development of the primate cerebral cortex (Bourne et al., 2005; Teo et al., 2012; Goldshmit et al., 2014; Homman-Ludiye and Bourne, 2014). For these reasons we suggest that the marmoset is a sufficient model of primate vision.

Away from the areas of intense research interest mentioned above, our understanding of the visual system in marmosets, macaques, and humans remains incomplete. In the case of most other extrastriate areas, as well as visual association areas of the parietal, temporal and frontal lobes, further comparative work is required to solidify knowledge regarding homologies between primate species. We believe, moreover, that the marmoset will be a necessary model for understanding the roles of these areas in vision. This is because most of these areas appear to be particular specializations of the primate cortex, and in the marmoset these areas lie exposed on the cortical surface, amenable to cellular-resolution imaging and large-scale electrophysiological recording. We invite the reader to imagine what may be learnt by measuring population activity simultaneously from all visual areas between V1 and MST, together with parietal areas such as LIP, during active vision in normal adults. This is already technically achievable. Now imagine what may be learnt about detecting and treating the visual deficits that accompany normal aging and retinal disease, or understanding the brain plasticity that follows stroke.

Finally, the inter-individual organization of marmoset groups has many parallels to human societies, including strong family and peer interactions during development. Marmosets may provide a natural model of visual communication and its development (e.g., Kawai et al., 2014). In conjunction with recently developed techniques for genetic manipulation, which will soon allow transgenic lines with expression of genes known to represent risk factors (Kishi et al., 2014), marmosets will likely become particularly important in understanding the physiological, anatomical, and cognitive correlates of mental disorders, such as schizophrenia and autism.

#### **AUTHOR CONTRIBUTIONS**

Samuel G. Solomon and Marcello G. P. Rosa conceived and wrote this review.

#### **ACKNOWLEDGMENTS**

We thank Alessandra Angellucci, Ulrike Grünert, Paul Martin, Luiz Carlos Silveira and Rowan Tweedale for comments on previous versions of this review. We also thank Ulrike Grünert and Jonathan Chan for the high-quality reproductions of the photomicrographs used in **Figures 2** and **6**, respectively, and Tristan Chaplin for the brain reconstructions used in **Figures 1** and **3**. Samuel Solomon receives funding from the National Health and Medical Research Council (NHMRC; 511967, 1005427, 1027913) and Australian Research Council (ARC) Centre of Excellence in Vision Science. Marcello Rosa receives funding from the NHMRC (1003906, 1020839, 1028710, 1054055) and ARC (SR1000006, CE140100007, DP140101968).

#### **REFERENCES**


superior temporal visual areas in the macaque. *J. Comp. Neurol.* 296, 462–495. doi: 10.1002/cne.902960311


cortex in the marmoset (*Callithrix jacchus*). *Exp. Brain Res.* 84, 233–253. doi: 10.1007/BF00231444


motion, and stimulus size from center to far periphery. *Vis. Neurosci.* 31, 85–98. doi: 10.1017/S0952523813000448


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 June 2014; accepted: 22 July 2014; published online: 08 August 2014. Citation: Solomon SG and Rosa MGP (2014) A simpler primate brain: the visual system of the marmoset monkey. Front. Neural Circuits 8:96. doi: 10.3389/fncir.2014.00096 This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Solomon and Rosa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A chicken model for studying the emergence of invariant object recognition

# *Samantha M. W. Wood\* and Justin N. Wood\**

Department of Psychology, University of Southern California, Los Angeles, CA, USA

#### *Edited by:*

Davide Zoccolan, International School for Advanced Studies, Italy

#### *Reviewed by:*

David D. Cox, Harvard University, USA Jakob H. Macke, University College London, UK

#### *\*Correspondence:*

Samantha M. W. Wood and Justin N. Wood, Department of Psychology, University of Southern California, 3620 South McClintock Avenue, Los Angeles, CA 90089, USA e-mail: samantha.m.w.wood@gmail.com; justin.wood@usc.edu

"Invariant object recognition" refers to the ability to recognize objects across variation in their appearance on the retina. This ability is central to visual perception, yet its developmental origins are poorly understood. Traditionally, nonhuman primates, rats, and pigeons have been the most commonly used animal models for studying invariant object recognition. Although these animals have many advantages as model systems, they are not well suited for studying the emergence of invariant object recognition in the newborn brain. Here, we argue that newly hatched chicks (*Gallus gallus*) are an ideal model system for studying the emergence of invariant object recognition. Using an automated controlled-rearing approach, we show that chicks can build a viewpoint-invariant representation of the first object they see in their life. This invariant representation can be built from highly impoverished visual input (three images of an object separated by 15° azimuth rotations) and cannot be accounted for by low-level retina-like or V1-like neuronal representations. These results indicate that newborn neural circuits begin building invariant object representations at the onset of vision and argue for an increased focus on chicks as an animal model for studying invariant object recognition.

**Keywords: invariant object recognition,** *Gallus gallus***, chicks, imprinting, controlled rearing**

#### **INTRODUCTION**

Humans and other animals can recognize objects despite tremendous variation in how objects appear on the retina (due to changes in viewpoint, size, lighting, and so forth). This ability—known as "invariant object recognition"—has been studied extensively in adult animals, but its developmental origins are poorly understood. We have not yet characterized the initial state of object recognition (i.e., the state of object recognition at the onset of vision), nor do we understand how this initial state changes as a function of specific visual experiences.

Researchers have long recognized that studies of newborns are essential for characterizing the initial state of visual cognition; however, methodological constraints have hindered our ability to study invariant object recognition in newborn humans. First, human infants cannot ethically be raised in controlled environments from birth. Consequently, researchers have been unable to study how specific visual experiences shape the initial state of invariant object recognition. Second, it is typically possible to collect just a small number of test trials from each newborn human. As a result, researchers have been unable to measure newborns' first visual object representations with high precision.

Here, we describe an automated controlled-rearing approach with a newborn<sup>1</sup> animal model—the domestic chick (*Gallus gallus*)—that overcomes these two limitations.

#### **NEWLY HATCHED CHICKS AS A NEWBORN ANIMAL MODEL**

Animal models provide a critical tool in the investigation of visual processing machinery. To date, nonhuman primates have been the model of choice for studying invariant object recognition because their visual systems closely mirror our own. Studies of primates have revealed many important characteristics about object recognition, including the nature of its underlying computations and the architecture of its neural substrates (reviewed by DiCarlo et al., 2012; see also Yamins et al., 2014). There is also growing evidence that rats and pigeons may be promising animal models for studying object recognition because they, too, have invariant object recognition abilities (Zoccolan et al., 2009; Soto et al., 2012; Tafazoli et al., 2012; Wasserman and Biederman, 2012; Alemi-Neissi et al., 2013). These animal models enable experimental techniques that are difficult to perform with primates. For instance, rat studies allow the application of a wide range of techniques including molecular and histological approaches, two-photon imaging, and large-scale recordings from multiple brain areas. However, while primates, rodents, and pigeons have many advantages as model systems, these animals are not well suited for studying the *initial state* of object recognition because they cannot be raised in strictly controlled environments from birth<sup>2</sup>.

These three animal models all require parental care. Thus, after birth or hatching, the newborns must be raised in environments that contain a caregiver. Experience with this caregiver could significantly shape the newborn's object recognition mechanisms by providing clues about which retinal image changes are identity-preserving transformations and which are not. Indeed, studies of monkeys and humans show that object recognition machinery changes rapidly in response to statistical redundancies in the organism's environment (e.g., Wallis and Bulthoff, 2001; Cox et al., 2005), with significant neuronal rewiring occurring in as little as one hour of experience with an altered visual world (Li and DiCarlo, 2008, 2010). There is also extensive behavioral evidence that primates begin encoding statistical redundancies soon after birth (e.g., Saffran et al., 1996; Kirkham et al., 2002; Bulf et al., 2011). These findings allow for the possibility that even early emerging object recognition abilities (e.g., abilities emerging days, weeks, or months after birth) are learned from experience with objects early in postnatal life.

<sup>1</sup>The term "newborn" is used to refer to an animal at the beginning of the postembryonic phase of their life cycle.

<sup>2</sup>Rats and mice can be reared in darkness. However, dark rearing prevents complete microcircuit maturation in the visual cortex (Ko et al., 2014), produces abnormalities in local cortical connectivity (Ishikawa et al., 2014), and alters the long-term development of GABAergic transmission (Morales et al., 2002). Further, rats and mice cannot be raised from birth in controlled, lighted environments (i.e., environments devoid of objects and agents). In contrast, chicks can be raised in controlled, lighted environments immediately after hatching. Thus, with chicks, it is possible to examine how patterned visual input drives the emergence of object recognition at the beginning of the post-embryonic phase of the animal's life cycle.

Analyzing the initial state of invariant object recognition therefore requires a newborn animal model with two characteristics: (1) the animal can develop invariant object recognition abilities and (2) the animal's visual environment can be strictly controlled from the very beginning of the post-embryonic phase of its life cycle (i.e., to prevent learning from visual object experiences). Chicks meet both of these criteria. First, newly hatched chicks develop invariant object recognition abilities rapidly (Wood, 2013, 2014a). For example, chicks can build a viewpoint-invariant representation of the first object they see in their life (Wood, 2013, 2014a). Chicks also have other advanced object recognition abilities, including the ability to bind color and shape features into integrated color-shape units at the onset of vision (Wood, 2014b). Second, chicks can be raised from hatching in environments devoid of objects and caregivers (Vallortigara, 2012; Wood, 2013). Unlike newborn primates, rodents, and pigeons, newly hatched chicks do not require parental care and are immediately able to explore their environment.

In addition, chicks imprint to objects seen soon after hatching (e.g., Bateson, 2000; Horn, 2004). Chicks develop a strong attachment to their imprinted objects, and will attempt to spend most of their time with the objects. This imprinting behavior can be used to test chicks' object recognition abilities without training (Regolin and Vallortigara, 1995; Bolhuis, 1999; Wood, 2013). Imprinting in chicks is also subject to a critical period (Lorenz, 1937). Once the critical period ends, the chick can be presented with over one hundred test trials without significantly changing the chick's representation of their imprinted object (e.g., Wood, 2013, 2014a,b). This makes it possible to measure each chick's first visual object representation with high precision.

Notably, studies of chicks can also inform human visual development because birds and mammals use similar neural mechanisms. At a macro-level, avian and mammalian brains share the same large-scale organizational principles: both are modular, small-world networks with a connective core of hub nodes that includes prefrontal-like and hippocampal structures (Shanahan et al., 2013). Further, avian and mammalian brains have homologous cortical-like cells and circuits for processing sensory information (Jarvis et al., 2005; Wang et al., 2010; Dugas-Ford et al., 2012; Karten, 2013). Although these neural circuits are organized differently in birds and mammals (nuclear vs. layered organization, respectively), they share many similarities in terms of cell morphology, the connectivity pattern of the input and output neurons, gene expression, and function (Saini and Leppelsack, 1981; Karten and Shimizu, 1989; Karten, 1991, 1997; Butler, 1994; Medina and Reiner, 2000; Reiner et al., 2005). For instance, in chicken neural circuitry, sensory inputs are organized in a radial columnar manner, with lamina-specific cell morphologies, recurrent axonal loops, and re-entrant pathways, typical of layers 2–5a of mammalian neocortex (reviewed by Karten, 2013). Similarly, long descending telencephalic efferents in chickens contribute to the recurrent axonal connections within the column, akin to layers 5b and 6 of the mammalian neocortex. The avian visual wulst also has circuitry and physiological properties that are similar to the mammalian visual cortex (Karten, 1969, 2013). For example, like the cat and monkey visual cortex, the visual wulst includes precise retinotopic organization, selectivity for orientation, and selectivity for direction of movement (Pettigrew and Konishi, 1976). Together, these studies indicate that birds and mammals use homologous neural circuits to process visual information. Thus, controlled-rearing experiments with chicks can be used to inform the development of vision in humans.

Finally, while chickens have less advanced visual systems than humans, this should not be seen as a problem. When attempting to understand a particular phenomenon, it is often valuable to use the simplest system that demonstrates the properties of interest. Pioneering research in neuroscience and genetics has relied heavily on this strategy—for example, researchers have used *Aplysia* to study the physiological basis of memory storage in neurons (e.g., Kandel, 2007), *C. elegans* to study the mechanisms of molecular and developmental biology (e.g., Brenner, 1974), and *Drosophila* to study the mechanisms of genetics (e.g., Bellen et al., 2010). In a similar vein, the study of newly hatched chicks can offer an important window onto the emergence of high-level visual abilities like invariant object recognition.

### **AN AUTOMATED CONTROLLED-REARING APPROACH FOR STUDYING INVARIANT OBJECT RECOGNITION**

Historically, newborn subjects' behavior has been quantified through direct observation by trained researchers. While direct observation has revealed many important insights about human development, this approach has limitations: researchers can only observe a small number of subjects simultaneously, and there are constraints on the resolution of these observations.

Recent technological advances in automated image-based tracking provide a solution to these limitations by allowing researchers to collect large amounts of precise and accurate behavioral data (Dell et al., 2014). Further, image-based tracking uses a digital recording of the animal's behavior, which maintains an objective view of events. This increases the repeatability of analyses, while allowing subjects to be tracked with high spatiotemporal resolution. Finally, and perhaps most importantly, automated approaches eliminate the possibility of experimenter bias (e.g., bias that may occur when coding the subject's behavior, presenting stimuli to the subject, or deciding whether to include the subject in the final analysis).

To study the initial state of invariant object recognition, we used an automated controlled-rearing approach. This *complete data* controlled-rearing technique allows researchers to raise newly hatched chicks for several weeks within controlled-rearing chambers (for details see Wood, 2013). We use the term *complete data* because the chambers track and record *all* of the chicks' behavior (9 samples/second, 24 h/day, 7 days/week), providing a complete digital record of each subject's behavior across their lifespan. This technique produces hundreds of hours of data for each subject, allowing researchers to measure chicks' emerging visual-cognitive abilities with high precision.

Importantly, our controlled-rearing chambers also make it possible to control all of the chicks' visual object experiences. The chambers contain no real-world (solid, bounded) objects, and object stimuli are presented to the chick by projecting virtual objects onto two display walls situated on opposite sides of the chamber. Thus, the chicks' visual object experiences are limited to the virtual objects presented on the display walls.

#### **THE PRESENT EXPERIMENT**

The current study builds on a previous study that examined whether newly hatched chicks can build invariant object representations at the onset of vision (Wood, 2013). In this previous study, chicks were raised for one week in controlled-rearing chambers that contained a single virtual object that could only be seen from a limited 60° viewpoint range. In their second week of life, we then measured whether chicks could recognize the virtual object across a variety of novel viewpoints. The majority of subjects successfully recognized the object across the novel viewpoints, which shows that chicks can build a viewpoint-invariant representation of the first object they see in their life.

The present study extends this finding in three ways. First, we significantly reduced the amount of visual object input available to the subjects. In Wood (2013), the chicks were shown a virtual object that moved smoothly over time through a 60° viewpoint range at 24 images/second, whereas in the present study, the chicks were shown a virtual object that moved abruptly over time through a 30° viewpoint range at 1 image/second (see **Figure 1**). Thus, compared with Wood (2013), the chicks in the present study observed a smaller number of unique images of the object (3 unique images vs. 72 unique images), a smaller range of movement (30° viewpoint range vs. 60° viewpoint range), and unnatural (abrupt) vs. natural (smooth) object motion. The abrupt object motion was unnatural because it caused the object's features to move large distances across the retina instantaneously, breaking the spatiotemporal contiguity of the images. The present study therefore provided a particularly strong test of whether chicks can build invariant object representations from impoverished visual input.

Second, we tested chicks' object recognition abilities across a systematically varying recognition space. Each chick's object recognition abilities were tested across 27 different viewpoint ranges; the viewpoint ranges canvassed a uniform recognition space in which the object was rotated −60° to +60° in the azimuth direction and −60° to +60° in the elevation direction (in 15° increments; see **Figure 4**). Thus, we were able to examine whether chicks' recognition performance varied as a function of the object's degree of rotation.

Third, we investigated whether chicks' recognition abilities could be explained by some low-level features of the test animations, by quantifying the similarity between the input images and the test images. We quantified image similarity in terms of both pixel-like similarity and V1-like similarity, akin to previous studies that tested object recognition in adult rats (Zoccolan et al., 2009; Tafazoli et al., 2012).
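As a rough illustration of what a pixel-level similarity measure can look like, the sketch below compares a test image with a set of input images. The choice of Euclidean distance on grayscale pixel values, and the function names `pixel_distance` and `distance_to_input_set`, are assumptions made for illustration only; the text above does not specify the metric, and the V1-like measure is not modeled here.

```python
import math


def pixel_distance(image_a, image_b):
    """Euclidean distance between two equally sized grayscale images.

    Each image is a list of rows of pixel intensities. The metric is an
    assumption for illustration, not the measure used in the study.
    """
    same_rows = len(image_a) == len(image_b)
    same_cols = all(len(ra) == len(rb) for ra, rb in zip(image_a, image_b))
    if not (same_rows and same_cols):
        raise ValueError("images must have the same dimensions")
    squared = sum(
        (a - b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    )
    return math.sqrt(squared)


def distance_to_input_set(test_image, input_images):
    """Smallest pixel distance from a test image to any input-phase image."""
    return min(pixel_distance(test_image, img) for img in input_images)
```

Under this kind of measure, a test image that is far from every input image in pixel space cannot be matched by simple image overlap, so above-chance recognition of it would argue against a purely low-level explanation.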

# **EXPERIMENT**

#### **METHODS**

#### *Subjects*

Ten chicks of unknown sex were tested. No subjects were excluded from the analyses. Fertilized eggs were incubated in darkness in an OVA-Easy incubator (Brinsea Products Inc., Titusville, FL, USA). We maintained the temperature and humidity at 99.6°F and 45%, respectively, for the first 19 days of incubation. On day 19 of incubation, the humidity was increased to 60%. The eggs were incubated in darkness to ensure that no visual input would reach the chicks through their shells. After hatching, we moved the chicks from the incubator room to the controlled-rearing chambers in complete darkness. Each chick was raised singly within its own chamber.

#### *Controlled-rearing chambers*

The controlled-rearing chambers measured 66 cm (length) × 42 cm (width) × 69 cm (height). The floors of the chambers consisted of black wire mesh suspended 1″ over a black surface by transparent, plexiglass beams. Object stimuli were presented to the subjects by projecting virtual objects onto two display walls (19″ LCD monitors with 1440 × 900 pixel resolution) situated on opposite sides of the chambers. The other two walls of the chambers were white, high-density plastic. We used matte (non-reflective) materials for both the walls and the floor to avoid incidental illumination. The chambers contained no rigid, bounded objects other than the virtual objects presented on the display walls. See Figure 1 in Wood (2013) for a picture of the chambers.

Food and water were provided *ad libitum* within transparent, rectangular troughs in the ground (66 cm length × 2.5 cm width × 2.7 cm height). Grain was used as food because grain does not behave like a rigid, bounded object (i.e., grain does not maintain a solid, bounded shape). All care of the chicks was performed in darkness with the aid of night vision goggles.

The controlled-rearing chambers recorded all of the chicks' behavior (24 h/day, 7 days/week) with high precision (9 samples/second) via micro-cameras (1.5 cm diameter) embedded in the ceilings of the chambers and automated image-based tracking software (Ethovision XT, Noldus Information Technology, Leesburg, VA, USA). This software calculated the amount of time each chick spent within zones (22 cm × 42 cm) next to each display wall. In total, 3,360 h of video footage (14 days × 24 h/day × 10 subjects) were collected and analyzed for the present study.

#### *Input phase*

During the input phase (the first week of life), chicks were raised in environments that contained a single virtual object. Four chicks were presented with Object 1 and six chicks were presented with Object 2 (see **Figure 1**). The object animations contained just three unique images of the object: a front view and two side views with ±15° azimuth rotations. The images changed at a rate of 1 image/second. From a human adult's perspective, the objects appeared to undergo apparent motion, rocking back and forth through a 30° viewpoint range along a frontoparallel vertical axis. The virtual object was displayed on a uniform white background, and appeared for an equal amount of time on the left and right display walls. The object switched walls every 2 h, following a 1-minute period of darkness (**Figure 2**).

#### *Test phase*

During the test phase (the second week of life), we examined whether each chick had built a viewpoint-invariant representation of their imprinted object by using an automated two-alternative forced choice testing procedure. On each test trial, the imprinted object was shown on one display wall and an unfamiliar object was shown on the other display wall. We then measured the amount of time chicks spent in proximity to each object. If chicks successfully recognized their imprinted object, then they should have spent a greater proportion of time in proximity to the imprinted object compared to the unfamiliar object. The imprinted object was shown from 81 different test viewpoints, consisting of all possible combinations of 9 azimuth rotations (−60°, −45°, −30°, −15°, 0°, +15°, +30°, +45°, +60°) and 9 elevation rotations (−60°, −45°, −30°, −15°, 0°, +15°, +30°, +45°, +60°). To equate the direction of object motion across the input and test phases, the 81 viewpoints were organized into 27 different viewpoint ranges, each containing three images. Like the input object animation, each of the 27 test animations showed the imprinted object rotating back and forth ±15° along the azimuth rotation axis. **Figure 4** shows how the 81 individual viewpoints were organized into the 27 test animations.
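The organization of viewpoints into animations can be sketched in a few lines of Python. This is only an illustration: it assumes that each animation is centred on an azimuth of −45°, 0°, or +45° at a single fixed elevation, which is one grouping consistent with the description above (Figure 4 gives the actual layout). The names `AZIMUTH_CENTRES` and `viewpoint_ranges` are hypothetical.

```python
from itertools import product

ANGLES = list(range(-60, 61, 15))  # 9 rotations: -60, -45, ..., +60 degrees

# All 81 test viewpoints as (azimuth, elevation) pairs.
viewpoints = list(product(ANGLES, ANGLES))

# Assumed grouping (not confirmed by the text): each animation rocks
# +/-15 degrees in azimuth around a centre of -45, 0, or +45 degrees,
# at one fixed elevation.
AZIMUTH_CENTRES = (-45, 0, 45)

viewpoint_ranges = [
    [(centre + offset, elevation) for offset in (-15, 0, 15)]
    for elevation in ANGLES
    for centre in AZIMUTH_CENTRES
]

# Flatten the ranges to check that every viewpoint is used exactly once.
covered = [vp for vr in viewpoint_ranges for vp in vr]
print(len(viewpoints), len(viewpoint_ranges), sorted(covered) == sorted(viewpoints))
```

Under this assumed grouping, the three azimuth triples at each of the nine elevations account for all 81 viewpoints without overlap.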

The unfamiliar object was similar to the imprinted object in terms of its size, color, motion speed, and motion trajectory. Further, on all of the test trials, the unfamiliar object was presented from the same frontal viewpoint range as the imprinted object from the input phase. Presenting the unfamiliar object from this frontal viewpoint range maximized the similarity between the unfamiliar object and the imprinting stimulus. Thus, to recognize their imprinted object, chicks needed to generalize across large, novel, and complex changes in the object's appearance on the retina. The test trials lasted 17 min and were separated from one another by 32 min rest periods. During the rest periods, we projected the animation from the input phase onto one display wall and a white screen onto the other display wall. The test trials and rest periods were separated by 1 min periods of darkness. On each day of the test phase, chicks were presented with each viewpoint range one time, for a total of 27 test trials per day. Thus, each chick received 189 test trials over the course of the experiment. The 27 viewpoint ranges were presented in a randomized order during each day of the test phase.
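The trial counts above follow from simple arithmetic, sketched here. The cycle structure (test trial, darkness, rest period, darkness) is an assumption about how the 1 min dark periods were interleaved; the durations and counts are taken from the text.

```python
TRIAL_MIN = 17        # duration of one test trial
REST_MIN = 32         # duration of one rest period
DARK_MIN = 1          # darkness between trials and rest periods
TRIALS_PER_DAY = 27   # one trial per viewpoint range per day
TEST_DAYS = 7         # the test phase covered the second week of life

# Assumed cycle: test trial, darkness, rest period, darkness.
cycle_min = TRIAL_MIN + DARK_MIN + REST_MIN + DARK_MIN
hours_per_day = cycle_min * TRIALS_PER_DAY / 60
total_trials = TRIALS_PER_DAY * TEST_DAYS

print(cycle_min, hours_per_day, total_trials)
```

Under this assumption, the 27 daily cycles take just under 23 h, and the total number of trials per chick matches the 189 reported.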

#### **RESULTS**

#### *Overall performance*

**FIGURE 2 | A schematic showing how the virtual objects were presented on the two display walls during the input phase (top) and the test phase (bottom).** During the input phase, chicks observed a single virtual object rotating abruptly back and forth through a 30° viewpoint range. During the test phase, chicks were presented with regularly scheduled test trials. During the test trials, the imprinted object was shown on one display wall and an unfamiliar object was shown on the other display wall. The imprinted object was shown from a variety of novel viewpoints, whereas the unfamiliar object was always shown from the same frontal viewpoint range as the imprinted object during the input phase. This maximized the pixel-level and V1-level similarity between the unfamiliar object and the imprinting stimulus. Thus, to recognize their imprinted object, chicks needed to generalize across large, novel, and complex changes in the object's appearance on the retina.

To test whether performance was significantly above chance, we used intercept-only mixed effects models (also called "multilevel models"). Since we collected multiple observations from each subject, it was necessary to use an analysis that can account for the nested structure of the data (Aarts et al., 2014). The mixed effects models were fitted in R (www.r-project.org). First, we computed the number of test trials in which chicks preferred their imprinted object over the unfamiliar object. A chick was scored as having preferred its imprinted object on a trial if its object preference score was greater than 50%. The object preference score was calculated with the formula:

$$\text{Object preference score} = \frac{\text{time by imprinted object}}{\text{time by imprinted object} + \text{time by unfamiliar object}}$$

Accordingly, test trials were scored as "correct" when subjects spent a greater proportion of time with their imprinted object, and "incorrect" when they spent a greater proportion of time with the unfamiliar object. Chicks spent more time with their imprinted object on 59% (SEM = 3%) of the test trials (see **Figure 3**).
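The scoring rule can be illustrated with a minimal sketch. It assumes that the object preference score is the time spent next to the imprinted object divided by the total time spent next to either object, and that a trial with exactly equal times counts as not correct; the function names and the example durations are hypothetical.

```python
def object_preference_score(time_imprinted, time_unfamiliar):
    """Proportion of zone time spent next to the imprinted object."""
    total = time_imprinted + time_unfamiliar
    if total == 0:
        raise ValueError("no time recorded near either display wall")
    return time_imprinted / total


def score_trial(time_imprinted, time_unfamiliar):
    """Return True ("correct") when the score exceeds 0.5 (i.e., 50%)."""
    return object_preference_score(time_imprinted, time_unfamiliar) > 0.5


# Hypothetical trials: (seconds by imprinted object, seconds by unfamiliar object).
trials = [(600, 300), (200, 700), (450, 450), (800, 100)]
outcomes = [score_trial(a, b) for a, b in trials]
percent_correct = 100 * sum(outcomes) / len(outcomes)
print(outcomes, percent_correct)
```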

We used a mixed effects logistic regression model (R package lme4) to test whether performance was significantly greater than chance. We fitted the model with test trial outcome (binary: correct or incorrect) as the dependent variable, an intercept as the fixed effect, and a random intercept for the subject-effect. The fixed effect intercept was positive and significant [*b* = 0.394, *z* = 2.857, *p* = 0.004], which indicates that chicks' recognition performance was significantly greater than 50% (chance performance). Chicks' recognition performance was also significantly above chance when the analysis did not include the test trials where the imprinted object was shown from the familiar viewpoint range [*b* = 0.365, *z* = 2.636, *p* = 0.008].
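To read the logistic intercept on the probability scale, the log-odds can be back-transformed with the inverse logit, and the reported *z* value can be converted to a two-sided *p* value under a standard normal reference distribution. The sketch below does only that; it does not refit the mixed model, and the helper names are hypothetical. Because the intercept is conditional on the random subject effect, the back-transformed value describes a typical chick and need not equal the raw percentage of correct trials.

```python
import math


def inverse_logit(b):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-b))


def two_sided_p_from_z(z):
    """Two-sided p value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values reported in the text for the full analysis.
b, z = 0.394, 2.857
prob_correct = inverse_logit(b)
p_value = two_sided_p_from_z(z)
print(round(prob_correct, 3), round(p_value, 3))
```

An intercept of 0 corresponds to a probability of 0.5, which is why testing the intercept against 0 is a test against chance.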

Second, we confirmed these results with a similar analysis on the object preference scores (i.e., the proportion of time chicks spent with the imprinted object compared to the unfamiliar object). Because the significance of the intercept indicates whether the intercept is significantly different from 0, we subtracted 50% from each object preference score. Thus, the adjusted object preference scores ranged from −50 to +50%, with an adjusted object preference score of 0 indicating equal time spent with the imprinted object and unfamiliar object. We fitted a linear mixed effects model (R package nlme) with the adjusted object preference score as the dependent variable, an intercept as the fixed effect, and a random intercept for the subject-effect. Again, the fixed effect intercept was positive and significant [*b* = 0.072, *t*(1878) = 3.015, *p* = 0.003], which provides further evidence that chicks' recognition performance was significantly higher than 50% (chance performance). Chicks' recognition performance was also significantly above chance when the analysis did not include the test trials where the imprinted object was shown from the familiar viewpoint range [*b* = 0.068, *t*(1808) = 2.828, *p* = 0.005].

With this controlled-rearing method we were able to collect a large number of test trials from each chick. Thus, we were able to examine whether each subject was able to build a viewpoint-invariant representation of their imprinted object. To do so, we computed whether each subject's performance across the test trials exceeded chance level (using one-tailed binomial tests). Six of the 10 subjects successfully built an invariant object representation [*p*s ≤ 0.05]<sup>3</sup>. When the analysis did not include the familiar viewpoint range from the input phase, 5 of the 10 chicks performed significantly above chance (see **Figure 3**). Thus, many of the chicks successfully built an invariant object representation that generalized across novel viewpoints.
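A one-tailed binomial test of this kind can be sketched with the Python standard library. The per-chick counts below are hypothetical, since the individual counts are not given in the text; the 189 trials per chick and the Bonferroni threshold for 10 tests follow the description above and the accompanying footnote.

```python
from math import comb


def binomial_p_one_tailed(successes, n, p_chance=0.5):
    """P(X >= successes) for X ~ Binomial(n, p_chance)."""
    return sum(
        comb(n, k) * p_chance**k * (1 - p_chance) ** (n - k)
        for k in range(successes, n + 1)
    )


N_TRIALS = 189
ALPHA = 0.05
N_SUBJECTS = 10
bonferroni_alpha = ALPHA / N_SUBJECTS  # threshold after correcting for 10 tests

# Hypothetical numbers of correct trials for three chicks.
correct_counts = {"chick_A": 130, "chick_B": 108, "chick_C": 95}

for chick, k in correct_counts.items():
    p = binomial_p_one_tailed(k, N_TRIALS)
    print(chick, f"{p:.4g}", p <= ALPHA, p < bonferroni_alpha)
```

In this illustration, a chick can pass the uncorrected threshold yet fail the corrected one, which is why fewer subjects remain significant after the Bonferroni correction.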

To ensure that all of the chicks successfully imprinted to the virtual object (i.e., developed an attachment to the object), we examined whether the chicks showed a preference for the imprinted object during the rest periods in the test phase. All 10 subjects spent the majority of the rest periods in proximity to the imprinting stimulus [mean = 88% of trials; SEM = 2%; one-tailed binomial tests, all *p* < 10<sup>−9</sup>]. Thus, it is possible to imprint to an object but fail to build a viewpoint-invariant representation of that object (see also Wood, 2013).

#### *Correlations of object recognition performance across subjects*

As shown in **Figure 3**, there was substantial variation in chicks' recognition abilities. To examine whether chicks' recognition abilities were correlated with one another, we measured the correlation in performance across the viewpoint ranges for each pair of chicks. Specifically, we computed the percentage of time spent with the imprinted object for each viewpoint range for each chick. The correlations in performance between all pairs of chicks are shown in **Figure 5**. Performance was highly correlated across the subjects: out of the 45 subject pairs, 44 were positively correlated and only 1 pair was negatively correlated. Overall, the average correlation between subjects was *r* = 0.35 (SEM = 0.03). These correlation values were significantly different from 0 (no correlation), *t*(44) = 8.72, *p* < 0.001. Despite the substantial range of variation in performance across subjects, the chicks' recognition abilities were nevertheless highly correlated with one another.
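The pairwise analysis can be sketched as follows. The performance values are synthetic (a shared viewpoint profile plus chick-specific noise), so the resulting numbers are illustrative only; the point is the structure of the computation: one Pearson correlation per pair of chicks across the 27 viewpoint ranges, followed by a one-sample *t* statistic on the 45 correlations.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)
N_CHICKS, N_RANGES = 10, 27

# Synthetic data: percentage of time with the imprinted object per viewpoint range.
shared = [random.gauss(60, 8) for _ in range(N_RANGES)]
performance = [[s + random.gauss(0, 10) for s in shared] for _ in range(N_CHICKS)]

# One correlation per unordered pair of chicks: 10 choose 2 = 45 pairs.
pair_r = [
    pearson_r(performance[i], performance[j])
    for i, j in combinations(range(N_CHICKS), 2)
]

mean_r = sum(pair_r) / len(pair_r)
sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in pair_r) / (len(pair_r) - 1))
t_stat = mean_r / (sd_r / math.sqrt(len(pair_r)))  # one-sample t against 0, df = 44
print(len(pair_r), round(mean_r, 2), round(t_stat, 2))
```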

#### *Analysis of change in performance over time*

To examine whether recognition performance changed over the course of the test phase, we calculated the percentage of time chicks spent in proximity to the imprinted object versus the unfamiliar object as a function of test day. The results are shown in **Figure 6**. Performance remained stable across the test phase [one-way ANOVA, *F*(6) = 0.224, *p* = 0.968]. Chicks' recognition behavior was spontaneous and robust, and cannot be explained by learning taking place across the test phase. Chicks immediately achieved their maximal performance and did not significantly improve thereafter.

#### *Analysis of viewpoint effects*

To test whether recognition performance varied as a function of the degree of viewpoint change, we calculated chicks' mean object preference scores for each of the elevation viewpoint change magnitudes (i.e., ±60°, ±45°, ±30°, ±15°, 0°). The correlation between the magnitude of viewpoint change and performance did not approach significance [*r* = −0.06, *p* = 0.93]. Thus, when chicks first begin to recognize objects, their performance does not decline with larger changes in viewpoint.

<sup>3</sup>Four of the 10 subjects performed significantly higher than chance level after a Bonferroni correction for 10 independent tests [10 subjects; *p* < 0.005].

**FIGURE 3 | Recognition performance for the overall group (top) and the individual subjects (bottom).** The dark gray bars denote the percentage of correct trials, and the light gray bars denote the proportion of time subjects spent with the imprinted object. These graphs do not include the test trials in which the imprinted object was shown from the familiar viewpoint range from the input phase. The subjects are ordered by performance. The red dashed lines show chance performance (50%). P-values denote the statistical difference between the number of correct and incorrect trials as computed through mixed effects models (top graph) and one-tailed binomial tests (bottom graph).

In general, however, chicks' recognition performance was lower when the object was presented from negative elevation rotations (see **Figure 4**). When the object was presented from negative elevation rotations, a smaller portion of the object was visible to the subject (see **Figure 4**). Thus, chicks' recognition performance (i.e., the percentage of time spent with the imprinted object versus unfamiliar object) was positively correlated with the number of foreground (object) pixels that were visible on the screen [*r* = 0.41, *p* < 0.01]. One possible explanation for this effect is that the negative elevation rotations occluded discriminative features that were used to recognize the object. For instance, a recent study with adult rats who were trained to distinguish between these same two virtual objects showed that rats built sub-features of objects that were smaller than the entire object (Alemi-Neissi et al., 2013). When these sub-features were occluded with "bubble masks" (Gosselin and Schyns, 2001), rats' recognition abilities declined. It would be interesting for future studies to use this bubble masking approach with chicks to characterize the specific features used to recognize objects at the onset of vision.

#### *Analysis of object stimuli and performance*

Did chicks need high-level (invariant) object representations to succeed in this experiment? Previous studies have shown that chicks do not use overall brightness as a low-level cue to distinguish between these two virtual objects (Wood, 2014a), and that chicks' early emerging invariant object recognition abilities cannot be explained by retina-like (pixel-wise) representations when recognition is tested across more extreme azimuth and elevation rotations (Wood, 2013).

To extend these previous analyses, we quantified the similarity between the input animations and the test animations in two ways. First, we computed the amount of image variation between the input animations and the test animations from a retina-like (pixel-level) perspective. For each animation, we (1) measured the brightness level of each pixel in each of the three unique object images, (2) compared each image from the test animation to each image from the input animation (i.e., by comparing the brightness level of each corresponding pixel across the images and taking the absolute difference), and (3) calculated the average pixel-level difference between the three unique images from the input and test animations (i.e., the first test image was compared to the first, second, and third input image; the second test image was compared to the first, second, and third input image; and the third test image was compared to the first, second, and third input image). Recognition performance (i.e., the object preference scores) did not vary as a function of the pixel-level difference between the input animations and test animations [linear regression: *b* = −7.08 × 10<sup>−8</sup>, *t*(52) = −1.29, *p* = 0.20].
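The three-step pixel-level comparison can be sketched as below. This is an illustration, not the authors' code: images are represented as nested lists of brightness values, and the per-image difference is taken as the mean absolute difference over pixels, which is an assumption (the text does not say whether pixel differences were summed or averaged within an image). The toy 2 × 2 images are hypothetical.

```python
def mean_abs_pixel_difference(image_a, image_b):
    """Mean absolute brightness difference over corresponding pixels."""
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def animation_difference(test_images, input_images):
    """Average the image difference over all test-by-input image pairs."""
    pair_diffs = [
        mean_abs_pixel_difference(test_img, input_img)
        for test_img in test_images
        for input_img in input_images
    ]
    return sum(pair_diffs) / len(pair_diffs)


# Toy 2 x 2 grayscale "images" (hypothetical brightness values).
input_images = [
    [[0, 0], [0, 0]],
    [[10, 10], [10, 10]],
    [[20, 20], [20, 20]],
]
test_images = [
    [[10, 10], [10, 10]],
    [[10, 10], [10, 10]],
    [[10, 10], [10, 10]],
]

difference = animation_difference(test_images, input_images)
print(round(difference, 3))
```

Because every test image is compared with every input image, an animation compared with itself does not yield a difference of zero unless its three images are identical.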

**FIGURE 5 | A similarity matrix showing the correlation in performance for each pair of subjects.** The order of the subjects in the matrix is determined by a hierarchical cluster analysis. The cells are color-coded by correlation value: green values = positive correlation in performance; red values = negative correlation in performance. The color scale reflects the full range of possible correlation values.

Second, we computed the amount of image variation between the input animations and the test animations from a V1-level perspective. To do so, we used a Gabor measure of similarity with the Gabor jet model: a multi-scale, multi-orientation model of V1 complex-cell filtering developed by Lades et al. (1993). The general parameters and implementation followed those used by Xu and Biederman (2010), which can be downloaded at http://geon.usc.edu/GWTgrid\_simple.m. For each unique image in each animation, we measured the magnitude of activation values that the image produced in a set of 40 Gabor jets (8 orientations × 5 scales). We measured the dissimilarity between two images by computing one minus the correlation between their Gabor jet activation values. Thus, the dissimilarity between two images could range from 0 (perfect positive correlation) to 2 (perfect negative correlation). Finally, we calculated the average Gabor jet dissimilarity across all three unique images of the animations (i.e., the first test image was compared to the first, second, and third input image; the second test image was compared to the first, second, and third input image; and the third test image was compared to the first, second, and third input image). Recognition performance (i.e., the object preference scores) did not vary as a function of Gabor jet dissimilarity between the input animations and test animations [linear regression: *b* = −0.11, *t*(52) = −1.04, *p* = 0.30].
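The dissimilarity measure itself (one minus the correlation between activation vectors) is simple to sketch. The code below does not implement the Gabor filtering; it assumes the activation magnitudes have already been computed and are supplied as plain lists, and the short example vectors are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def jet_dissimilarity(jet_a, jet_b):
    """One minus the correlation: 0 = identical pattern, 2 = fully inverted."""
    return 1.0 - pearson_r(jet_a, jet_b)


# Hypothetical activation vectors (real ones would be much longer).
jet = [0.2, 0.9, 0.4, 0.7]
inverted = [-v for v in jet]

print(jet_dissimilarity(jet, jet), jet_dissimilarity(jet, inverted))
```

Because the correlation is bounded between −1 and +1, the dissimilarity is bounded between 0 and 2, as stated above.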

Additionally, to confirm that chicks' recognition performance could not be explained by retina-like or V1-like representations, we tested whether models based on pixel-level or V1-level representations could successfully predict object identity in this experiment. Specifically, we generated a pixel-level model and a V1-level model that predicted object identity based on the image differences between the test animations and the input animation. For each viewpoint range, we measured (1) the difference between the test animation of the imprinted object and the input animation of the imprinted object (within-object difference), and (2) the difference between the test animation of the unfamiliar object and the input animation of the imprinted object (between-object difference; see **Figure 7**). If the within-object difference was smaller than the between-object difference, then the model was "correct" for that viewpoint range. Conversely, if the between-object difference was smaller than the within-object difference, then the model was "incorrect" for that viewpoint range. The retina-like (pixel-level) model was correct on only 20% of the viewpoint ranges, while the V1-level (Gabor jet) model was correct on only 28% of the viewpoint ranges. Unlike the chicks' recognition performance, which was significantly above chance (50%) levels, both low-level models performed significantly below chance levels [pixel-level intercept-only logistic regression: *b* = −1.36, *z* = −4.04, *p* < 0.0001; V1-level intercept-only logistic regression: *b* = −0.96, *z* = −3.15, *p* = 0.002].
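The decision rule and the below-chance test can be sketched together. For an ordinary intercept-only logistic regression (no random effects), the fitted intercept is the log-odds of the observed proportion and its standard error has a closed form, so the Wald statistic can be computed directly. The counts used here (11 and 15 correct out of 54 conditions) are the ones reported in this section; the function names are hypothetical, and the output should match the reported values only approximately, since the last digit can differ with rounding or software.

```python
import math


def model_is_correct(within_diff, between_diff):
    """A model is "correct" when the imprinted object's test animation is
    closer to the input animation than the unfamiliar object's animation."""
    return within_diff < between_diff


def intercept_only_logit(n_correct, n_total):
    """Intercept, Wald z, and two-sided p for an intercept-only logistic model."""
    n_incorrect = n_total - n_correct
    b = math.log(n_correct / n_incorrect)            # log-odds of being correct
    se = math.sqrt(1 / n_correct + 1 / n_incorrect)  # standard error of the log-odds
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return b, z, p


for name, n_correct in (("pixel-level", 11), ("V1-level", 15)):
    b, z, p = intercept_only_logit(n_correct, 54)
    print(f"{name}: b = {b:.2f}, z = {z:.2f}, p = {p:.2g}")
```

A negative intercept means the model chose the wrong object more often than not, which is what performing below chance amounts to here.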

To compare the models' performance to the chicks' performance, we computed the average percentage of time chicks spent with the imprinted object versus the unfamiliar object for each viewpoint range. If chicks spent more time, on average, with the imprinted object than the unfamiliar object, then the chicks were "correct" for that viewpoint range. Conversely, if chicks spent more time with the unfamiliar object than the imprinted object, then the chicks were "incorrect" for that viewpoint range. For each model and for the chicks, there were 54 conditions (27 viewpoint ranges × 2 imprinted objects). The chicks were correct on 35 conditions and incorrect on 19 conditions. The pixel-level model was correct on 11 conditions and incorrect on 43 conditions. The V1-level model was correct on 15 conditions and incorrect on 39 conditions. Chi-square tests comparing the number of correct and incorrect conditions for the chicks and the models found significant differences between chicks' recognition performance and both models' recognition performance [pixel-level model versus chick performance: *χ*<sup>2</sup>(1, *N* = 108) = 21.81, *p* < 10<sup>−5</sup>; V1-level model versus chick performance: *χ*<sup>2</sup>(1, *N* = 108) = 14.90, *p* < 10<sup>−3</sup>].
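The chi-square statistics can be recomputed from the reported counts with the standard formula for a 2 × 2 table. The sketch below uses the Pearson statistic without a continuity correction; that choice is an assumption, made because it reproduces the reported values. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Rows: chicks, then model; columns: correct, incorrect (counts from the text).
chi2_pixel, p_pixel = chi_square_2x2(35, 19, 11, 43)
chi2_v1, p_v1 = chi_square_2x2(35, 19, 15, 39)

print(round(chi2_pixel, 2), p_pixel < 1e-5)
print(round(chi2_v1, 2), p_v1 < 1e-3)
```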

Overall, the within-object difference was greater than the between-object difference, at both the pixel level and the V1 level. Thus, in principle, chicks could have succeeded in this experiment by preferring the test animation that was the most different from the input animation (i.e., a novelty preference). To test this possibility, we analyzed the test trials in which the imprinted object was presented from the familiar viewpoint range from the input phase. If chicks had a novelty preference, then they should have avoided the imprinted object on the trials in which the test animation of the imprinted object was identical to the input animation of the imprinted object. Contrary to this prediction, chicks spent significantly more time with the imprinted object than the unfamiliar object when the imprinted object was presented from the familiar viewpoint range [logistic mixed effects regression: *b* = 1.514, *z* = 2.794, *p* = 0.005; linear mixed effects regression: *b* = 0.180, *t*(60) = 3.062, *p* = 0.003]. Thus, chicks did not simply have a preference for the novel animation in this experiment.

Together, these analyses indicate that chicks build invariant object representations that cannot be explained by low-level retina-like (pixel-wise) or V1-like neuronal representations. Rather, chicks build selective and tolerant object representations, akin to those found in higher levels of the visual system.

#### *Comparison to prior studies*

The virtual objects used in this study were the same as those used in Wood (2013). However, in the current study, each imprinting and test animation only contained three unique images showing the objects rotating abruptly at a rate of 1 image/second, while in Wood (2013), the virtual objects moved smoothly over time through a 60° viewpoint range at 24 images/second. To test whether the impoverished visual stimuli used in the current experiment impaired chicks' object recognition abilities, we compared performance in the current study to chicks' performance in Wood (2013). **Figure 8** shows the mean recognition performance from both studies. A one-way ANOVA showed that performance was significantly higher in Wood (2013) than in the current study [*F*(1) = 4.239, *p* = 0.05]. Thus, experience with smooth, continuous object motion over a larger viewpoint range appears to facilitate the development of invariant object recognition. However, additional studies are needed to determine the relative importance of each of these factors (i.e., the number of unique object images, the type of object movement, and the size of the viewpoint range) on chicks' ability to build invariant object representations.

# **GENERAL DISCUSSION**

In this study, we examined whether newly hatched chicks can build invariant object representations from highly impoverished visual input (i.e., three images of a single virtual object separated by 15° azimuth rotations). Impressively, many of the chicks successfully built an invariant object representation soon after hatching, which shows that experience with a rich visual world filled with diverse objects is not necessary for developing invariant object recognition. This finding opens up largely unexplored experimental avenues for probing the initial state of invariant object recognition and charting how that initial state changes as a function of specific visual experiences.

#### *Implications of our findings and comparison with previous studies*

We have previously reported invariant object recognition in newly hatched chicks (Wood, 2013, 2014a); the present study extends this previous research in five ways. First, these results provide an existence proof that newly hatched chicks can build invariant object representations from extremely impoverished visual input. In previous studies (Wood, 2013, 2014a), chicks were shown objects that moved smoothly over time (24 frames/second), thereby presenting large numbers of unique and gradually changing images of the objects. Conversely, in the present study, the object animations were far more sparse (i.e., there were only three unique images of the object), which interrupted the natural temporal stability of the visual object input (i.e., the objects did not change smoothly over time). Thus, the chicks never observed their imprinted object (or any other object) move with smooth, continuous motion. Nevertheless, some of the chicks were able to build an invariant object representation from this impoverished input. For these subjects, three unique images of an object were sufficient input to build an invariant object representation.

Second, these results suggest that it is possible to impair invariant object recognition in newly hatched chicks by presenting abnormally patterned visual input. Although group performance was above chance, performance was significantly lower compared to previous experiments in which the virtual object moved smoothly over time and rotated through a larger viewpoint range (Wood, 2013; see **Figure 8** for comparison of performance between studies). Thus, newborn visual systems appear to operate best over a specific type of patterned visual input. It would be interesting for future studies to characterize the nature of this 'optimal space' of visual object input.

Third, these results indicate that invariant object recognition in newly hatched chicks is not subject to the well-documented "viewpoint effect" observed in studies of human adults (i.e., larger viewpoint changes lead to greater costs in object recognition performance; Tarr et al., 1998; Hayward and Williams, 2000). We tested chicks on a wide range of viewpoints, consisting of systematic 15° changes in azimuth and elevation rotations. This allowed us to test whether objects presented from larger viewpoint changes are more difficult to recognize than objects presented from smaller viewpoint changes. We found no significant differences in chicks' recognition abilities across the larger versus smaller viewpoint changes. Chicks were able to build invariant object representations that generalized beyond the imprinted viewpoint range, but the degree of generalization did not vary as a function of the degree of viewpoint change.

Fourth, we demonstrated that chicks' object recognition abilities cannot be explained by low-level retina-like or V1-like neuronal representations. Prior experiments have confirmed that chicks' object recognition abilities could not be explained by overall brightness (Wood, 2014a) or retina-like (pixel-wise) similarity (Wood, 2013, 2014a). Here, we performed additional analyses using simulated Gabor jet activation to measure the V1-like similarity between the input animations and the test animations. We found that chicks' recognition performance did not vary as a function of the V1-like similarity between the input and test animations. Further, we found that neither a model using pixel-like representations nor a model using V1-like representations was able to successfully predict object identity in this experiment (**Figure 7**). These results indicate that chicks build selective and tolerant object representations, akin to those found in higher-level cortical visual areas (DiCarlo et al., 2012).

Finally, our results provide evidence that invariant object recognition emerges in a consistent manner across different newborn subjects. The chicks' patterns of recognition performance across the individual viewpoints were strongly correlated with one another (**Figure 5**). This suggests that there are constraints on the development of invariant object recognition in newborn visual systems. However, the data also revealed substantial variation in chicks' object recognition abilities (see **Figure 3**). Despite being raised in identical visual environments, some chicks were able to recognize their imprinted object robustly across the novel viewpoints, whereas other chicks were not. Future studies could use this controlled-rearing method to further examine both the nature of the constraints on early emerging object recognition abilities and the sources of the individual variation across subjects.

In summary, the present study provides additional evidence that the domestic chick is a promising animal model for studying the emergence of invariant object recognition in a newborn visual system (see also Wood, 2013, 2014a). We have shown how a fully automated controlled-rearing technique can be used to study the initial state of invariant object recognition in newly hatched chicks with high precision. Thus far, our approach indicates that newborn neural circuits are surprisingly powerful, capable of building invariant object representations from impoverished input at the onset of vision.

#### **ACKNOWLEDGMENTS**

This research was funded by the University of Southern California and by National Science Foundation CAREER Grant BCS-1351892 to JNW. The experiment was approved by the University of Southern California Institutional Animal Care and Use Committee.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fncir.2015.00007/abstract

### **REFERENCES**


Saffran, J. R., Aslin, R. N., and Newport, E. L. (1996). Statistical learning by 8-month-old infants. *Science* 274, 1926–1928. doi: 10.1126/science.274.5294.1926

Saini, K. D., and Leppelsack, H. J. (1981). Cell types of the auditory caudomedial neostriatum of the starling (*Sturnus vulgaris*). *J. Comp. Neurol*. 198, 209–229. doi: 10.1002/cne.901980203


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 June 2014; accepted: 03 February 2015; published online: 26 February 2015.*

*Citation: Wood SMW and Wood JN (2015) A chicken model for studying the emergence of invariant object recognition. Front. Neural Circuits 9:7. doi: 10.3389/fncir.2015.00007*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2015 Wood and Wood. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Mechanisms of object recognition: what we have learned from pigeons

# **Fabian A. Soto<sup>1</sup>\* and Edward A. Wasserman<sup>2</sup>**

<sup>1</sup> Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA

<sup>2</sup> Department of Psychology, University of Iowa, Iowa City, IA, USA

#### **Edited by:**

Davide Zoccolan, International School for Advanced Studies, Italy

#### **Reviewed by:**

Hans P. Op De Beeck, University of Leuven, Belgium

Justin N. Wood, University of Southern California, USA

#### **\*Correspondence:**

Fabian A. Soto, Department of Psychological and Brain Sciences, University of California, Santa Barbara, Building #251, Santa Barbara, CA 93106, USA e-mail: fabian.soto@psych.ucsb.edu

Behavioral studies of object recognition in pigeons have been conducted for 50 years, yielding a large body of data. Recent work has been directed toward synthesizing this evidence and understanding the visual, associative, and cognitive mechanisms that are involved. The outcome is that pigeons are likely to be the non-primate species for which the computational mechanisms of object recognition are best understood. Here, we review this research and suggest that a core set of mechanisms for object recognition might be present in all vertebrates, including pigeons and people, making pigeons an excellent candidate model to study the neural mechanisms of object recognition. Behavioral and computational evidence suggests that error-driven learning participates in object category learning by pigeons and people, and recent neuroscientific research suggests that the basal ganglia, which are homologous in these species, may implement error-driven learning of stimulus-response associations. Furthermore, learning of abstract category representations can be observed in pigeons and other vertebrates. Finally, there is evidence that feedforward visual processing, a central mechanism in models of object recognition in the primate ventral stream, plays a role in object recognition by pigeons. We also highlight differences between pigeons and people in object recognition abilities, and propose candidate adaptive specializations which may explain them, such as holistic face processing and rule-based category learning in primates. From a modern comparative perspective, such specializations are to be expected regardless of the model species under study. The fact that we have a good idea of which aspects of object recognition differ in people and pigeons should be seen as an advantage over other animal models. From this perspective, we suggest that there is much to learn about human object recognition from studying the "simple" brains of pigeons.

**Keywords: object recognition, categorization, invariance, learning, pigeon**

Visually recognizing objects in the environment has a clear advantage for the survival and reproduction of any organism. Among many functions, it allows an animal to respond adaptively to sources of food, conspecifics, and possible threats. Although object recognition poses difficult computational problems (Rust and Stocker, 2010), humans and animals alike learn to respond similarly to nonidentical objects from the same category (categorization) as well as to respond differently to individual objects from the same category (identification).

Primates possess what are believed to be the most sophisticated visual systems among mammals. However, there is another vertebrate group that has also evolved highly advanced visual systems: birds (Shimizu and Bowers, 1999; Husband and Shimizu, 2001). For this reason, birds are the non-primate group in which high-level vision has been the most studied, and the pigeon is the species chosen in the majority of such studies (for reviews, see Cook, 2001; Wasserman and Zentall, 2006; Lazareva et al., 2012). This research has demonstrated impressive visual capabilities in pigeons, including the ability to detect and categorize many different classes of objects in a variety of conditions.

Object categorization and recognition have been studied in pigeons for 50 years, resulting in the accumulation of a large body of behavioral data (for previous reviews, see Huber, 2001; Kirkpatrick, 2001; Lazareva and Wasserman, 2008; Zentall et al., 2008). This accumulated knowledge affords us a unique opportunity for studying mechanisms of visual categorization that might be common to all amniote vertebrates (birds, reptiles, and mammals), which share a common evolutionary ancestor and basic organizational properties of their visual systems (Shimizu and Bowers, 1999; Husband and Shimizu, 2001; Shimizu, 2009). For these reasons, recent efforts in this line of research have been directed toward understanding the computational mechanisms that can explain the accumulated data. Here, we review the literature on object recognition and categorization in pigeons, with a special emphasis on the likely mechanisms involved, their plausible neurobiological substrates, and their evolution across vertebrates.

We will focus almost exclusively on object recognition and categorization. The large body of research on associative categories (i.e., stimulus equivalence; for a review, see Zentall et al., 2014) and artificial polymorphous categories (e.g., Lea et al., 2006) will be touched on only briefly here, and only in reference to related phenomena in object categorization. Furthermore, we will ignore categorization based on abstract stimulus properties, such as variability (Wasserman and Young, 2010), numerosity (Emmerton, 2001), and relational properties (Vasconcelos, 2008).

The review will be organized as follows. In section Behavioral Research on Object Categorization by Pigeons, we will review basic research on object categorization by pigeons. Because pigeons are assumed to have little or no experience with the objects presented to them in categorization experiments, an important part of this research has focused on object category *learning* instead of visual object *representation*, which is different from the focus of most human research (Soto and Wasserman, 2012b). Much like research in the area of perceptual learning in people (Lu et al., 2011), the evidence suggests that the learning of object categories by pigeons might result from the enhancement of selective readout from visual areas at a postvisual level, rather than from the direct modification of visual representations. Thus, a full account of what we know about object categorization in pigeons cannot focus exclusively on vision; we will review the learning mechanisms that might operate in non-visual areas of the pigeon brain in sections The Role of Error-driven Reinforcement Learning and Learning of Abstract Category Representations.

In section Visual Object Representation, we will turn to studies that have more directly assessed visual object representation in pigeons. We will show that many aspects of this research can be explained by feedforward processing of shape information, as implemented in models of primate vision.

In section The Evolution of Mechanisms of Object Recognition in Vertebrates: A Working Hypothesis, we will propose our current working hypothesis regarding the evolution of object recognition mechanisms in vertebrates, aiming toward explaining similarities and differences between pigeons and people (and other primates) found in behavioral studies. Finally, we argue in section The Neurobiological Mechanisms of Object Recognition: What We *Can* Learn From Pigeons that the pigeon could and should be used as an animal model of *some* of the computational processes involved in object recognition by people.

# **BEHAVIORAL RESEARCH ON OBJECT CATEGORIZATION BY PIGEONS**

# **BASIC TASKS AND RESULTS**

Two basic tasks have been used to study object categorization by pigeons. Early research used go/no-go tasks, in which a single response is rewarded in the presence of some stimuli (go trials), but not in the presence of other stimuli (no-go trials). In the first published study in this area, by Herrnstein and Loveland (1964), pigeons were rewarded after pecking at a response key when a photograph included people, but they were not rewarded for responses to photographs without people. More recent research has used forced-choice tasks (see **Figure 1A**), in which several responses are made available at the same time (introduced by Bhatt et al., 1988). Pigeons are rewarded only when they peck at the response key assigned to the presented stimulus.

A large number of studies using both of these tasks have shown that pigeons can learn to categorize objects through feedback and, more importantly, pigeons can generalize discriminative performance to novel objects never seen before. The typical pattern of results is high performance with novel objects, but at a slightly lower level of accuracy than with the original training objects.

Pigeons are capable of learning categories comprising natural objects (Herrnstein and Loveland, 1964; Herrnstein et al., 1976; Herrnstein and De Villiers, 1980; Bhatt et al., 1988; Aust and Huber, 2001, 2002), human-made objects (Bhatt et al., 1988; Wasserman et al., 1988; Lazareva et al., 2004, 2006), scene gist (Kirkpatrick et al., 2014), cartoons (Matsukawa et al., 2004), human face identity (Soto and Wasserman, 2011), gender (Troje et al., 1999; Huber et al., 2000) and emotional expression (Jitsumori and Yoshihara, 1997), and even paintings from different artists (Watanabe et al., 1995; Watanabe, 2001).

The fact that pigeons can accurately classify new objects from known categories suggests that their brains can extract visual properties which are invariant across diverse members of such object categories. However, the information that the pigeon visual system extracts from images is even richer, allowing them to flexibly categorize the same images at different levels. For example, pigeons can learn *pseudocategorization* tasks (**Figure 1B**), in which photographs containing objects from several categories are randomly assigned to different sets (Herrnstein and De Villiers, 1980; Wasserman et al., 1988). Focusing on category-relevant visual information would actually hinder performance in pseudocategorization tasks. Thus, the birds must be capable of extracting many different object properties from photographs, some of them invariant across members of the category and others specific to a particular object.

In line with this idea, studies that have directly manipulated object properties in photographs have found that many features simultaneously control pigeons' performance in a categorization task (e.g., Huber et al., 2000; Aust and Huber, 2002; Lea et al., 2013), with variations in performance being well explained as a linear function of the presence or absence of such features (Huber and Lenz, 1993; Jitsumori and Yoshihara, 1997).

An important aspect of human categorization is that the same object can be flexibly categorized at several different hierarchical levels. For example, the photograph of a human can be categorized at the so-called "basic" level as a person, at the "superordinate" level as an animal, and at the "subordinate" level as "John". Pigeons, too, have shown the ability to flexibly categorize the same objects (cars, chairs, flowers, and people) at different levels, depending on task demands (**Figure 1D**; Lazareva et al., 2004). The procedure used to train such flexible categories is illustrated in **Figure 2**. When the photograph of a human is presented together with four response keys, the pigeons learn to classify it at the basic level (**Figure 2A**), whereas when the photograph is presented together with two different response keys, the pigeons learn to classify it at the superordinate level of "natural object" (**Figure 2B**), comprising both people and flowers.

**(C), and superordinate categorization (D)**. Panels of different colors represent different responses assigned to the enclosed images.

The success of pigeons in the task shown in **Figure 2** is evidence for the flexibility in their categorization skills. However, it could be argued that learning to give the same response to two object categories is a far cry from forming a common superordinate representation for them. Other evidence shows that pigeons do in fact learn common superordinate representations in this type of task. For example, when objects from two perceptually dissimilar categories are associated with the same response, new learning obtained with objects from one of the categories automatically transfers to objects from the other category (Wasserman et al., 1992; Astley and Wasserman, 1998, 1999). This transfer suggests that training with a common response leads to the emergence of a single representation for both categories, which then mediates new learning about either of them. Such learning of a common representation for all stimuli associated with the same response is not restricted to superordinate categories, as it can be found after training with basic categories (Vaughan and Herrnstein, 1987) and with pseudocategories composed of two or more perceptually-dissimilar stimuli (Vaughan, 1988). This learning phenomenon, named stimulus equivalence in the behavioral literature (for a review, see Zentall et al., 2014), can also be found when members of a category share a common association with a particular stimulus or reward, instead of with a specific response.

In summary, the basic features of object category learning in pigeons are the following. First, pigeons can learn a variety of complex object categories and transfer this learning to novel objects. Second, pigeons can flexibly classify the same object according to different criteria (e.g., pseudocategories and superordinate categories). Third, pigeons extract a rich variety of visual properties from photographic images and use them in combination to learn the structure of object categories. Finally, pigeons learn common abstract representations for all members of the same trained category.

#### **VARIABLES THAT AFFECT OBJECT CATEGORY LEARNING**

Several factors affect both the speed with which pigeons learn new object categories and the level to which they can generalize this knowledge to unseen objects. One of the factors that has a strong effect on object categorization by pigeons is the similarity relations between objects in the same category (within-category similarity) and between objects in different categories (between-category similarity) included in the same training task. It is generally believed that natural basic object categories have a higher level of within-category similarity than between-category similarity, a property termed "perceptual coherence". For this reason, several early studies sought evidence as to whether pigeons could perceive and use such perceptual coherence for categorization, in contrast to just learning object categories by rote memorization of the images.

For example, Astley and Wasserman (1992) rewarded pigeons for pecking at photographs from a target category and measured to what extent the pecking response generalized to nonrewarded test objects. Some of these test objects belonged to the target category and others belonged to different categories. Higher responding to objects from the target category would be an indication that pigeons perceive within-category similarity as being higher than between-category similarity. Such categorical generalization was high early in the experiment, but slowly fell as pigeons acquired experience with non-rewarded presentations of the test stimuli.

Several pieces of evidence suggest that the perceptual coherence of object categories biases pigeons to group objects together into basic categories, even when this categorical bias goes against the prevailing task demands and is therefore costly in terms of earned food reward. One example comes from experiments comparing the learning of real categories and pseudocategories (**Figures 1A,B**). When the perceptual coherence of categories is eliminated by randomly assigning objects to pseudocategories, learning of the task slows down compared to when perceptual coherence is maintained (Herrnstein and De Villiers, 1980; Wasserman et al., 1988).

A categorical bias is also clearly observable in "subcategorization" tasks, in which two different responses are assigned to objects from the same category. In one experiment (Wasserman et al., 1988), illustrated in **Figure 1C**, objects from one category were assigned to two separate response keys, and objects from a second category were assigned to two other response keys. In this task, if the pigeons randomly choose a response key, then they get 25% correct responses. Pigeons can also learn about which two response keys are associated with each category, in which case they get 50% correct responses, but 50% categorical errors. Thus, this categorization strategy leads to above-chance performance, but it is not the strategy leading to the best payoff. The optimal strategy is learning to identify each individual stimulus and its correct response. When Wasserman et al. estimated the percentage of trials in which the pigeons were following each strategy, they found the results shown in **Figure 3A**. Although it is not the best strategy, pigeons first learn to categorize stimuli, and only later learn to identify them.
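The payoff arithmetic behind these three strategies can be written out explicitly. The following is a minimal sketch, assuming the layout described above (two categories, two response keys per category) and treating a "categorical error" as a peck at the other key assigned to the correct category; the function and constant names are illustrative only.

```python
from fractions import Fraction

# Task layout assumed from the text: 2 categories, 2 response keys per category.
N_CATEGORIES = 2
KEYS_PER_CATEGORY = 2
N_KEYS = N_CATEGORIES * KEYS_PER_CATEGORY


def expected_outcomes(strategy):
    """Return (P(correct), P(categorical error), P(other error)) for a strategy.

    A categorical error is a peck at the wrong key of the correct category.
    """
    if strategy == "guess":          # peck any of the four keys at random
        correct = Fraction(1, N_KEYS)
        categorical = Fraction(KEYS_PER_CATEGORY - 1, N_KEYS)
    elif strategy == "categorize":   # pick at random among the category's keys
        correct = Fraction(1, KEYS_PER_CATEGORY)
        categorical = 1 - correct
    elif strategy == "identify":     # know the key assigned to each stimulus
        correct = Fraction(1)
        categorical = Fraction(0)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return correct, categorical, 1 - correct - categorical


for name in ("guess", "categorize", "identify"):
    print(name, [str(p) for p in expected_outcomes(name)])
```

Under these assumptions, the categorization strategy doubles the proportion of correct responses relative to guessing, yet it still leaves half of the trials unrewarded, which is why identification is the only strategy that maximizes payoff.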

The categorical strategy shown by pigeons in the early blocks in **Figure 3A** is not optimal, but it does produce better reward payoff than guessing. Soto and Wasserman (2010b; see also Soto et al., 2012) found that a similar categorical bias can be found using a go/no-go subcategorization task. In this task, responses to a group of objects never produce reward, yet early in training pigeons respond to them at the same level as to rewarded objects from the same category. That is, pigeons learn first to categorize objects in subcategorization tasks, regardless of whether or not this strategy produces reward.

The previous experiments all suggest that pigeons perceive the within-category similarity of objects in natural photographs to be higher than their between-category similarity. This result is not trivial; it is important that the category structures that pigeons are biased to learn are exactly those that are likely to be encountered in the natural environment (see Smith et al., 2010). However, even when they are learning artificial categories, pigeons (Cook and Smith, 2006) and primates (Blair and Homa, 2003; Smith et al., 2010) show a bias to learn perceptually-coherent category representations before learning information about specific stimuli.

Differences in between-category similarity also play a role in category learning. For example, Aust and Huber (2002) concluded that how much responding to the trained category "person" generalized to similar or related categories (such as dolls, primates, mammals, and birds) depended on how many features were shared by the categories.

When pigeons are concurrently trained to classify the same categories at both basic and superordinate levels, it is usually found that they learn the basic task faster for some categories and the superordinate task faster for other categories (Lazareva et al., 2004, 2010; Lazareva and Wasserman, 2009). Lazareva et al. (2010) obtained estimates of the similarity among four object categories by analyzing generalization data through multidimensional scaling. Then, they showed that such similarity estimates could predict whether the basic or superordinate levels would show an advantage for different pairs of categories. A superordinate-level advantage is seen when the two categories in a superordinate set are perceptually similar, whereas a basic-level advantage is seen when the two categories in a superordinate set are perceptually dissimilar. This result is interesting because it is in line with one of the hypotheses put forward to explain the basic-level advantage in humans (Rosch et al., 1976).

Factors related to the training regime also affect category learning and generalization. One such factor is the number of different objects in each category presented during training (Kendrick et al., 1990; Astley and Wasserman, 1992; Wasserman and Bhatt, 1992). As shown in **Figure 4A**, learning of the categorization task is slowed and transfer of performance to new images is enhanced with a higher number of training exemplars. In the extreme case, in which training images are never repeated, pigeons can still learn the object categories, but learning is slower than when the training exemplars are repeated (Bhatt et al., 1988).

Another important training factor for studies using a go/no-go task is whether responses are rewarded to images showing the category, in what is called a feature-positive task, or to images not showing the category, in what is called a feature-negative task. For example, Edwards and Honig (1987; see also Aust and Huber, 2001, 2002) trained pigeons to discriminate photographs of various scenes from photographs of the same scenes with people in them. Their results, reproduced in **Figure 5A**, show that pigeons were quite fast in learning the feature-positive discrimination, in which responses to people were rewarded, but they were slow in learning the feature-negative discrimination, in which responses to scenes without people were rewarded. In fact, learning of the feature-negative discrimination was as slow as learning a pseudocategorization task, suggesting that pigeons do not show any benefit from perceptual coherence when responses to the category are not rewarded.

Patterns of generalization also vary for feature-positive and feature-negative tasks. Aust and Huber (2001) trained pigeons with the "people" category in feature-positive and feature-negative tasks. After training, pigeons were presented with new combinations of background scenes and people that involved contradictory information. For example, either familiar or novel people, which were associated with one outcome during training (e.g., reward), could be presented on familiar backgrounds, which were followed by the opposite outcome during training (e.g., no reward). The authors found that feature-positive training led to generalization of the response learned for people to these conflicting test stimuli, whereas feature-negative training led to no preference to respond or to inhibit responding to conflicting test stimuli. Again, this finding suggests that learning about the whole category is possible only when responses to the category are rewarded, but not when responses to the category are not rewarded.

# **THE ROLE OF ERROR-DRIVEN REINFORCEMENT LEARNING**

What learning mechanisms could give rise to the features of object category learning we reviewed in the previous section? We have recently shown (Soto and Wasserman, 2010b, 2012c) that most of this research can be explained by a model implementing two simple assumptions. The first assumption is that objects from any category are represented by a large common collection of features or "elements", with different categories involving different probabilities that an object from the category will activate each of those common elements. When the probability of activation of an element is high in a particular category, that element is activated by several different objects from that category, rendering it relatively *category-specific*. When the probability of activation of an element is low in a particular category, only a few objects from the category activate the element, rendering it relatively *stimulus-specific*.

The second assumption is that category learning proceeds by strengthening connections between such elemental representations and responses through error-driven learning. As in some reinforcement learning systems (Kaelbling et al., 1996; Sutton and Barto, 1998), on each trial, the model selects an action that is likely to maximize predicted reward, usually the action with the strongest connections to active elements. The difference between the predicted reward and the reward obtained after the response is made (the reward prediction error) determines how much the connection between the active elements and the chosen action should be modified (Rescorla and Wagner, 1972).

[Figure caption fragment] **pigeons (Wasserman and Bhatt, 1992)**. Category size increases the […] increases generalization to novel objects from the trained categories (right).

Note that this model is deliberately abstract regarding object representation: the elements do not have specific semantic content (i.e., they do not represent specific features), they only play different roles depending on what information they carry about the category. Furthermore, specific object and category representations are irrelevant, as they are randomly sampled in each simulation and the results of many simulations are then averaged to generate predictions. This process allowed us to ignore many questions about visual representation, while testing to what extent our two simple assumptions can explain pigeons' behavior. The resulting learning model is compatible with any account of visual processing which produces representations in line with our assumptions; indeed, we expanded the model in precisely this direction, as we will see below.
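The two assumptions can be sketched in a few lines of code. This is an illustrative toy, not the published implementation: the function names, the learning rate, the activation probabilities, and the purely greedy action choice are all assumptions made for the example (the text says only that the chosen action is "usually" the one with the strongest connections).

```python
import random


def sample_object(activation_probs, rng):
    """Sample one object: the set of common elements it activates.

    Elements with a high probability behave as category-specific,
    elements with a low probability as stimulus-specific.
    """
    return {i for i, p in enumerate(activation_probs) if rng.random() < p}


def predicted_reward(weights, active, action):
    """Summed connection strength from the active elements to one action."""
    return sum(weights[i][action] for i in active)


def choose_action(weights, active, n_actions):
    """Greedy choice (a simplification); ties go to the lowest-numbered action."""
    return max(range(n_actions), key=lambda a: predicted_reward(weights, active, a))


def update(weights, active, action, reward, lr=0.1):
    """Error-driven update of the connections to the chosen action only."""
    error = reward - predicted_reward(weights, active, action)
    for i in active:
        weights[i][action] += lr * error
    return error


# Hypothetical category: two category-specific and four stimulus-specific elements.
rng = random.Random(0)
probs = [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
weights = [[0.0, 0.0] for _ in probs]
for _ in range(50):
    active = sample_object(probs, rng)
    action = choose_action(weights, active, n_actions=2)
    update(weights, active, action, reward=1.0 if action == 0 else 0.0)
```

Because the elements carry no semantic content, only their activation probabilities matter, which is what allows predictions to be averaged over many randomly sampled representations.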

Our model specifies the conditions leading to the control of actions by category-specific elements, yielding categorization learning; it also specifies the conditions leading to the control of actions by stimulus-specific elements, yielding identification learning. For example, all instances of the categorical bias discussed in the previous section are the result of differences in the rate at which category-specific and stimulus-specific elements are presented in a typical categorization task. Because category-specific elements are shared by many objects, they are presented often and their connections with responses can be modified faster. Stimulus-specific elements are presented less often and they support slower learning. In short, category-specific elements have a repetition advantage over stimulus-specific elements.

As seen in **Figure 3B**, this repetition advantage can explain the reliance on a categorization strategy shown by pigeons during the early stages of learning in a subcategorization task (Wasserman et al., 1988; see **Figure 3A**). Early in training, category-specific elements quickly strengthen their connections with the two different responses with which a category is paired, producing above-chance accuracy. However, this tendency also results in a large proportion of categorical errors due to within-category generalization. To reduce such categorical errors, the connections between stimulus-specific elements activated by particular objects and the incorrect response become inhibitory. This inhibitory learning is slow due to the low rate of presentation of stimulus-specific elements, but it eventually leads to better discrimination performance at the end of training by canceling generalized excitation from one subcategory to the other.

Note how strongly the repetition advantage effect depends on the number of objects included in the training set. With just one object, the effect does not occur because all types of elements are presented equally often. As the number of objects increases, category-specific elements are presented quite often (in the extreme, on each trial from the same category), whereas presentations of stimulus-specific elements become more and more rare (in the extreme, once for each object repetition). As shown in **Figure 4B**, this analysis explains the effect of category size on learning rate and generalization. With a small category size, the same elements are repeated on each trial and learning about a specific stimulus is fast. However, there is no repetition advantage effect for category-specific elements and generalization to new objects is poor, as it depends on control by such common elements. The opposite is true when category size is increased.
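The dependence on category size can be illustrated with a deliberately stripped-down simulation. The sketch below assumes that every object activates exactly one shared (category-specific) element plus one unique (stimulus-specific) element, that every response is rewarded, and that objects are presented in a fixed cycle; none of these simplifications come from the original model, which samples elements probabilistically.

```python
def train_category(n_objects, n_trials=80, lr=0.1):
    """Error-driven training on one rewarded category of `n_objects` objects.

    Returns the weight of the shared element, which is all that a novel
    object from the category can activate, plus the unique-element weights.
    """
    w_shared = 0.0
    w_unique = [0.0] * n_objects
    for t in range(n_trials):
        i = t % n_objects                      # cycle through the training set
        error = 1.0 - (w_shared + w_unique[i])  # reward minus predicted reward
        w_shared += lr * error                 # shared element: updated every trial
        w_unique[i] += lr * error              # unique element: once per cycle
    return w_shared, w_unique


# Generalization to a novel object = strength of the shared element alone.
generalization = {n: train_category(n)[0] for n in (1, 2, 8)}
for n, g in generalization.items():
    print(f"category size {n}: generalization {g:.3f}")
```

In this toy setup the shared element receives every update that any unique element receives, so with enough training its weight should tend toward n/(n + 1) for a category of n objects: one half when a single object is trained, and closer to one as the category grows. That is one way to see why control by common elements, and hence generalization, improves with category size.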

Some particular features of error-driven learning help explain other results. For example, the faster learning of feature-positive discriminations, reproduced by the model in **Figure 5B**, stems from the fact that such discriminations require the model to first learn to respond to a number of rewarded stimuli and then to inhibit generalized responding to non-rewarded stimuli. This two-stage process is a signature of error-driven learning: inhibitory learning does not occur without an excitatory context to provide negative prediction errors, so excitatory learning must occur first. In the feature-positive discrimination, the repetition of category-specific elements boosts excitatory learning at the beginning of training, whereas in the feature-negative discrimination, pigeons must first learn to respond to each individual background independently, which takes longer. For a detailed explanation of other feature-positive effects, as well as many other results from the literature, see Soto and Wasserman (2010b).

#### **DIRECT EMPIRICAL EVIDENCE FOR ERROR-DRIVEN LEARNING**

More recent experiments, motivated by the model described in the previous section, have led to more direct evidence for the role of error-driven learning in object categorization by pigeons. The important insight provided by the model is that different tasks can be used to manipulate the connections between different types of elements (category-specific and stimulus-specific) and responses.

One example is the blocking design illustrated in **Figure 6A**. In the blocking condition (Soto and Wasserman, 2010b,d), objects from the same category are first assigned to different responses in a pseudocategorization task (Phase 1). According to the model, accurate performance in this task requires strong connections between stimulus-specific elements and the correct responses. Once the pseudocategorization task is learned, it is possible to transform it into a true categorization task by dropping half of the trials, as shown in the middle panel of **Figure 6A** (Phase 2). Under normal circumstances, experience with this new categorization task should lead to strong control by category-specific elements and good generalization to new objects when they are presented during a test (Phase 3). In a control condition, pigeons were exposed only to this categorization task and a generalization test (Phases 2 and 3). In the blocking condition, however, the stimulus-response mapping is already known at the beginning of Phase 2; thus, pigeons should make few, if any, errors in predicting the correct response for each of the stimuli in this phase. No prediction error means no category learning; so, the model predicts less generalization of categorical performance to new objects in the blocking group than in the control group.

The predictions of the model and the performance of pigeons with novel objects in each condition are shown in **Figures 6B,C**, respectively. It can be seen that pigeons showed the predicted pattern of results. This blocking effect, analogous to effects found in Pavlovian conditioning (Kamin, 1969), is direct evidence that object category learning in pigeons is driven by reward prediction error.
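The logic of the blocking prediction can be reproduced with a small deterministic simulation. This is a sketch under strong simplifying assumptions that are not part of the original model: each object activates one category-specific and one stimulus-specific element, both responses receive feedback on every trial (rather than only the chosen one), and the stimulus sets, element names, and training lengths are hypothetical.

```python
RESPONSES = ("R1", "R2")


def train(weights, trials, n_passes=200, lr=0.1):
    """Delta-rule training; `trials` is a list of (active elements, correct response)."""
    for _ in range(n_passes):
        for active, correct in trials:
            for r in RESPONSES:
                target = 1.0 if r == correct else 0.0
                error = target - sum(weights.get((e, r), 0.0) for e in active)
                for e in active:
                    weights[(e, r)] = weights.get((e, r), 0.0) + lr * error
    return weights


def preference(weights, active, response):
    """Predicted reward for `response` minus predicted reward for the other response."""
    def value(r):
        return sum(weights.get((e, r), 0.0) for e in active)
    other = "R2" if response == "R1" else "R1"
    return value(response) - value(other)


def obj(category, index):
    """One object = its category-specific element plus its own stimulus-specific element."""
    return (f"cat_{category}", f"obj_{category}{index}")


# Phase 1 (pseudocategorization): each category is split across both responses.
phase1 = [(obj(c, i), "R1" if i <= 2 else "R2") for c in "AB" for i in (1, 2, 3, 4)]
# Phase 2 (categorization): half the trials are dropped, so A -> R1 and B -> R2.
phase2 = [(obj("A", 1), "R1"), (obj("A", 2), "R1"), (obj("B", 3), "R2"), (obj("B", 4), "R2")]

blocked = train(train({}, phase1), phase2)   # Phases 1 and 2
control = train({}, phase2)                  # Phase 2 only

novel_A = ("cat_A",)                         # a new object shares only the category element
blocking_pref = preference(blocked, novel_A, "R1")
control_pref = preference(control, novel_A, "R1")
print(f"blocking: {blocking_pref:.3f}  control: {control_pref:.3f}")
```

In this toy version, the blocking group enters Phase 2 with essentially no prediction error, so the category-specific element stays equally connected to both responses and novel objects produce no preference, whereas the control group develops a clear preference for the category's response.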

The blocking effect also helps to explain some contradictory results in the literature. For example, Sutton and Roberts (2002) used a design very similar to that of Astley and Wasserman (1992) to study the "perceptual coherence" of object categories, but found that generalization was the same to objects from any category, not only the target category. We have shown (Soto and Wasserman, 2010b) that Sutton and Roberts' results can be explained as a blocking effect, in which elements common to all of the object categories acquire control over performance early in training.

Other studies have found evidence of an *overshadowing* effect in category learning (Soto and Wasserman, 2012a; Soto et al., 2012). **Figure 7A** shows a schematic representation of the training tasks given to pigeons in one of these experiments (Soto and Wasserman, 2012a). On each trial, two different objects were presented to the pigeons. In the overshadowing condition, these objects came from two categories that were both informative about the correct response. For example, in **Figure 7A**, both airplanes and chairs were consistently associated with Response 1. Here, the category-specific elements of both categories should acquire control over behavior quite fast, quickly reaching a point at which performance is good and learning stops. At this point, the two categories overshadow each other: each acquires only a proportion of the response control that it would have gained had it been presented alone. In the control condition, two objects are presented in each trial, but a single target category is informative about the correct response. In the example in **Figure 7A**, butterflies and cars are informative about correct responses, but people and flowers are not. In both conditions, category learning was tested by presenting pigeons with new objects from the trained categories. As shown in **Figure 7B**, the model predicts that performance with the target categories (red bars) should be impaired in the overshadowing condition compared to the control condition. As shown in **Figure 7C**, this prediction of the model matched the pigeons' behavior. Furthermore, performance with the competing categories (blue bars) was also close to the model's predictions.
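Overshadowing follows from the same shared-error principle. The sketch below is a deliberately simplified illustration and not the published model: it assumes a single scalar outcome (1.0 when Response 1 is correct, 0.0 otherwise), a hypothetical learning rate, and hypothetical element names. In the overshadowing condition a target and a competitor category always appear together and both predict the outcome. In the control condition the target appears with an uninformative category that is also present on trials with the opposite outcome.

```python
def train(trials, elements, learning_rate=0.1):
    """Delta-rule learning over named stimulus elements with a shared error."""
    weights = {name: 0.0 for name in elements}
    for active, outcome in trials:
        error = outcome - sum(weights[name] for name in active)
        for name in active:
            weights[name] += learning_rate * error
    return weights


# Overshadowing: two informative categories are always shown together.
overshadowing_trials = [(("target", "competitor"), 1.0)] * 400
overshadowing = train(overshadowing_trials, ["target", "competitor"])

# Control: the target is paired with an uninformative category, which also
# accompanies a second target category that signals the opposite outcome.
control_trials = [(("target", "uninformative"), 1.0),
                  (("other_target", "uninformative"), 0.0)] * 200
control = train(control_trials, ["target", "other_target", "uninformative"])

print("target weight, overshadowing:", round(overshadowing["target"], 2))
print("target weight, control:      ", round(control["target"], 2))
```

Under these assumptions the target category ends with less associative strength in the overshadowing condition than in the control condition, because the informative competitor absorbs part of the shared prediction error.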

#### **PREDICTION ERROR AS A GENERAL MECHANISM OF OBJECT CATEGORY LEARNING**

Given the accumulated evidence suggesting that error-driven learning plays an important role in object categorization by pigeons and the fact that this form of learning is widespread across species and tasks (Siegel and Allan, 1996; Bitterman, 2000; Macphail and Bolhuis, 2001), it seems likely that similar mechanisms underlie object categorization in primates, including humans.

A repetition advantage effect for category-specific properties seems to be as important in people as it is in pigeons. For example, the categorical bias effects and category size effects that are pervasive in the pigeon literature can also be found in people and other primates (Homa et al., 1973; Smith and Minda, 1998; Minda and Smith, 2001; Smith et al., 2010). As indicated earlier, such effects result naturally from the interaction of a repetition advantage for category-specific information and error-driven learning (see Soto and Wasserman, 2010b).

The results of behavioral experiments suggest that error-driven learning plays an important role in object categorization in people. Just as with pigeons, when people are trained to solve a discrimination task by memorizing individual objects in photographs and their assigned responses, they are impaired in detecting a change in the training circumstances in which all of the presented objects are sorted according to their basic-level categories (Soto and Wasserman, 2010d). That is, people show a category blocking effect, as illustrated in **Figure 6D** (see also Gluck and Bower, 1988; Shanks, 1991; Nosofsky et al., 1992).

We have proposed (Soto and Wasserman, 2012c) that underlying these behavioral similarities is an evolutionarily conserved learning mechanism that might be implemented in the basal ganglia, which are homologous structures in birds and mammals (Reiner, 2002; Reiner et al., 2005). Many studies implicate the basal ganglia in visual categorization and other visual discrimination tasks in people and other primates (Ashby and Ennis, 2006; Seger, 2008; Shohamy et al., 2008). The basal ganglia receive input from most sensory areas and send output to motor areas, which allows for the sensory integration and response selection functions necessary for category learning. The input nuclei of the basal ganglia, collectively known as the striatum, receive dopaminergic input from the substantia nigra pars compacta (Durstewitz et al., 1999; Nicola et al., 2000; Reiner et al., 2005), and the plasticity of cortical-striatal synapses depends on the presence of this dopaminergic input (Centonze et al., 2001; Reynolds and Wickens, 2002). As there is considerable evidence that the activity of these dopaminergic neurons is correlated with reward-prediction error (Montague et al., 1996; Schultz, 1998, 2002; Waelti et al., 2001; Suri, 2002), cortical-striatal synapses (pallial-striatal synapses in birds) may mediate the error-driven learning of associations between visual representations and responses. As proposed by our model, learning in the striatum during object categorization tasks would require activity of the presynaptic visual neurons (stimulus elements in the model), activity of the postsynaptic striatal neurons (actions in the model), and the presence of a dopaminergic signal (reward prediction error in the model).
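The three requirements listed above can be written as a multiplicative, "three-factor" update. The sketch below is only an illustration of that idea under stated assumptions: the function name, the learning rate, and the purely multiplicative form are hypothetical choices, not the equations of the published model.

```python
def three_factor_update(weights, pre, post, prediction_error, learning_rate=0.1):
    """Return updated weights[action][element].

    A weight changes only when three factors coincide:
      pre[element]      presynaptic activity (stimulus element in the model)
      post[action]      postsynaptic activity (selected action in the model)
      prediction_error  dopamine-like reward prediction error
    """
    return [
        [w + learning_rate * prediction_error * post[a] * pre[e]
         for e, w in enumerate(row)]
        for a, row in enumerate(weights)
    ]


# Two actions x two stimulus elements, all weights starting at zero.
initial = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.0]    # only element 0 is active
post = [1.0, 0.0]   # only action 0 was selected

rewarded = three_factor_update(initial, pre, post, prediction_error=0.5)
no_error = three_factor_update(initial, pre, post, prediction_error=0.0)

print(rewarded)  # only the synapse from element 0 to action 0 changes
print(no_error)  # no prediction error, so no learning
```

With this form, removing any one of the three factors leaves the corresponding synapse unchanged, which matches the requirement stated in the text.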

# **LEARNING OF ABSTRACT CATEGORY REPRESENTATIONS**

Some of the features of object category learning in pigeons mentioned in section Basic Tasks and Results cannot be explained by the reinforcement learning account described in the previous section. In particular, a model that only learns associations between stimulus properties and responses cannot explain the vast behavioral evidence that pigeons (and other vertebrates) learn a common representation for all members of a category associated with the same response ("stimulus equivalence"; for a review, see Zentall et al., 2014).

Evidence from a neurophysiological study suggests that such a common representation may have a substrate in the nidopallium caudolaterale (NCL), where neurons can be found that respond similarly to perceptually dissimilar stimuli associated with a common response (Kirsch et al., 2009). These results were interpreted as indicating that categorization learning established category-selective coding of the stimuli in NCL, and they are similar to findings in the primate prefrontal cortex (PFC; Freedman et al., 2001, 2002, 2003).

Just as is the case of the primate PFC, the avian NCL receives massive dopaminergic projections from the midbrain (Wynne and Güntürkün, 1995; Durstewitz et al., 1999; Kröner and Güntürkün, 1999) as well as input from neurons in both visual and sensorimotor areas (Leutgeb et al., 1996; Kröner and Güntürkün, 1999). NCL is thus particularly well suited to integrate information from several different sensory modalities through dopamine-modulated learning.

This result is important because the observation that neurons in lateral PFC come to respond selectively to the category of a stimulus and other behaviorally relevant factors in an object categorization task (Freedman et al., 2001, 2002, 2003) has led to wide acceptance, among primate researchers, of the hypothesis that PFC is the most critical site for object category learning (Freedman et al., 2003; Jiang et al., 2007; Serre et al., 2007). One possibility is that primate PFC and avian NCL implement learning of a common abstract representation for objects belonging to the same category, whereas stimulus-response associative learning is implemented in the basal ganglia (Antzoulatos and Miller, 2011). This possibility could explain why the PFC does not seem to be necessary for performance and generalization of category learning in monkeys (Minamimoto et al., 2010).

# **VISUAL OBJECT REPRESENTATION**

In the previous sections, we reviewed a line of research in pigeons that focused on object category learning. A different, but related line of research in pigeons has been heavily influenced by the human literature on invariant object recognition. As in the human literature, this line of research has been strongly focused on questions about object representation, such as: Which object properties are important for object recognition in pigeons? Can pigeons extract invariant object representations? Can pigeons show invariant object recognition after limited experience with an object? The following two sections will focus on this literature.

# **INVARIANCE IN OBJECT RECOGNITION BY PIGEONS**

Following the human literature, much research in object recognition by pigeons has focused on whether or not this species can show recognition that is invariant to changes in identity-preserving variables, such as rotation, scaling, illumination, etc. In general, the results of psychophysical experiments all point to the same conclusion: pigeons' object recognition after training with a single object image is controlled by a variety of properties that are irrelevant to object identification. In order to show invariant object recognition, pigeons require training with variations in such irrelevant properties.

For example, experiments that have explored whether pigeons show view-invariant object recognition after being trained with only one object view have uniformly found significant costs of object rotation on accuracy, regardless of the type of object used to generate the experimental stimuli (Cerella, 1977; Lumsden, 1977; Wasserman et al., 1996; Peissig et al., 1999, 2000, 2002; Friedman et al., 2005). Similarly, other experiments have found that, after experience with a single image view, pigeons' object recognition is affected by variations in size (Larsen and Bundesen, 1978; Pisacreta et al., 1984; Peissig et al., 2006), shading (Cabe, 1976; Cook et al., 1990; Young et al., 2001), and position (Kirkpatrick, 2001).

Although object recognition in people is far from being completely invariant (Jolicoeur, 1987; Hayward and Tarr, 1997; Tarr et al., 1998; Kravitz et al., 2008), it is clear that humans show greater invariance than do pigeons (Biederman and Ju, 1988; Biederman and Cooper, 1992; Biederman and Gerhardstein, 1993; Hayward, 1998). For example, people, but not pigeons, have been shown to exhibit view-invariant recognition when they are tested with the appropriate stimuli (Biederman and Gerhardstein, 1993) and show view-invariant recognition of novel views of an object which are interpolated between experienced views (Spetch and Friedman, 2003). Furthermore, some factors that are known to foster view-invariance in people do not have the same effect in pigeons. People show rotation costs when recognizing bent-paperclip objects (e.g., Edelman and Bülthoff, 1992), but these costs are reduced when a single diagnostic geometrical volume ("geon") is added to each object (Tarr et al., 1997). The same results are not observed in pigeons (Spetch et al., 2001), which show decrements in performance as a function of rotational distance regardless of the object components.

On the other hand, pigeons show generalization behavior that is closer to true view invariance as the number of training views is increased (Wasserman et al., 1996; Peissig et al., 1999, 2002). This finding is essentially another manifestation of the category size effect described earlier and can be explained in the same way: that is, as arising from a repetition advantage effect for view-invariant properties, which are repeated often across different views and therefore are frequently paired with the correct responses.

If this explanation of view-invariance learning in pigeons is correct, then it should be possible to arrange conditions in which training with multiple views of an object does not lead to higher invariance, by reducing the advantage of view-invariant properties over other properties during training. Soto et al. (2012) recently tested this hypothesis by training pigeons with object views similar to those shown in **Figure 8A**. In the training images for the overshadowing condition, across variations in viewpoint, there is a pronounced feature that is not view-invariant and that can perfectly predict object identity: the orientation of the main axis. The repetition of this feature across views should produce something akin to the category overshadowing effect explained earlier and impair view-invariance learning. On the other hand, in the control condition, pigeons are trained with the same views of less elongated objects; this training eliminates the competing non-invariant feature of main-axis orientation, which should result in higher invariance. **Figure 8B** shows that performance with new views was above chance for the control condition and below chance for the overshadowing condition, just as predicted.

Humans do sometimes show rotational costs in object recognition tasks (Hayward and Tarr, 1997; Tarr et al., 1998), which diminish after training with multiple views (Mash et al., 2007). These findings raise the possibility that view-invariance learning in people might follow similar principles as in pigeons, being driven by prediction errors. A role for error-driven learning has been found in human object categorization (Soto and Wasserman, 2010d) and there is evidence that categorization and identification depend on similar neural representations and computations (e.g., Hung et al., 2005).

This possibility has remained unexplored in the primate literature, which has focused instead on looking for evidence of unsupervised learning of invariant object representations (Cox et al., 2005; Li and DiCarlo, 2008, 2010, 2012). It must be noted that the evidence gathered so far does not rule out a role for reward prediction error in invariance learning. In the monkey experiments carried out by Li and DiCarlo (2008, 2010), for example, animals were rewarded for looking at the presented objects. In similar human experiments (Cox et al., 2005), people were engaged in a task that involved "correct" and "incorrect" responses and learning was not observed when experience was delivered passively (Li and DiCarlo, 2012). Thus, these experiments do involve presentation of explicit and implicit rewards and clearly raise the possibility that learning is driven by reinforcement (Li and DiCarlo, 2010). Although one study (Li and DiCarlo, 2012) reported evidence of unsupervised learning independent of reward magnitude and timing, it did not show that reward is not *necessary* for invariance learning. On the other hand, Yamashita et al. (2010) have provided evidence that reward-based discrimination, and not simple exposure, is necessary for invariance learning at least under some circumstances.

The role of unsupervised learning mechanisms in object recognition by pigeons has also remained unexplored. As our discussion of the primate literature shows, one reason is that it is quite difficult to study unsupervised learning in isolation from the influence of reward, particularly in nonhuman animals. This is an important issue that should be addressed by future research.

#### **WHAT INFORMATION IS EXTRACTED FROM IMAGES BY PIGEONS?**

Despite the difference in invariant recognition shown by people and pigeons, there is considerable evidence that the two species rely on similar image information during object recognition tasks (Wasserman and Biederman, 2012). For example, both primates and pigeons seem to extract nonaccidental properties from images of geons and rely heavily on them for recognition (e.g., Biederman and Bar, 1999; Vogels et al., 2001; Kayaert et al., 2005; Gibson et al., 2007; Lazareva et al., 2008). Gibson et al. (2007) trained pigeons and people to discriminate four simple objects, each shown from a single viewpoint. Using the Bubbles technique (Gosselin and Schyns, 2001), it was determined that both species relied more heavily on image properties that are relatively invariant across changes in viewpoint, such as cotermination and other edge properties, than on properties that vary across changes in viewpoint, such as shading. This result is depicted in the leftmost group of bars in **Figure 9**.

Results such as those shown in **Figure 9** do not mean that pigeons rely *only* on view-invariant properties for object recognition. As mentioned earlier, pigeons are sensitive to changes in object viewpoint, size, location, and shading, which means that all of these properties are extracted and used by pigeons during object recognition tasks. The inability of pigeons to show one-shot view invariance is not the result of an inability to extract view-invariant representations. Instead, it is more likely that pigeons extract a rich variety of visual properties from images and can only gradually learn to focus on those that are relevant for a given task through a reinforcement learning mechanism.

Several experiments have found evidence that pigeons represent not only local shape properties, but also the spatial structure of objects (Van Hamme et al., 1992; Wasserman et al., 1993; Kirkpatrick-Steger and Wasserman, 1996; Kirkpatrick-Steger et al., 1998). In one study, Van Hamme et al. (1992) trained pigeons to recognize line drawings of objects, similar to those shown in **Figure 10A**, in which half of an object's contour was deleted. This technique allowed the experimenters to train the pigeons with one contour image and to test them with its complement, which shared no local features with the training stimulus. As shown in **Figure 10B**, pigeons recognized these complementary contours with considerable accuracy, suggesting that their visual system could infer object structure from the partial contours seen during training.

Furthermore, when both shape and spatial relations can be used as cues to solve a recognition task, pigeons rely on both of them and show a trade-off between their reliance on one source of information vs. the other; that is, the more a pigeon relies on shape for recognition, the less it relies on spatial information, and vice-versa (Kirkpatrick-Steger and Wasserman, 1996). Such trade-offs can be explained as another form of overshadowing: when two object properties are equally reliable for identification, they compete for control of performance.

#### **FEEDFORWARD SHAPE PROCESSING CAN EXPLAIN OBJECT RECOGNITION IN PIGEONS**

Comparative studies have revealed similarities and differences in high-level vision by pigeons and people not only at the behavioral level, as described in the previous section, but also at the neurobiological level. Although primate and avian visual systems are each organized into two main visual pathways, the tectofugal pathway is used for complex visual discrimination tasks in pigeons, whereas the thalamofugal pathway is used for such tasks in primates (Shimizu and Bowers, 1999; Wylie et al., 2009). Still, these pathways show similar functional organization, which has led to the proposal that they might be analogous (Shimizu and Bowers, 1999). For example, the avian tectofugal pathway and its pallial targets are organized into parallel subdivisions in charge of processing motion and shape (Wang et al., 1993; Shimizu and Bowers, 1999; Laverghetta and Shimizu, 2003; Nguyen et al., 2004; Fredes et al., 2010), which is similar to the organization of the primate thalamofugal pathway and its cortical targets (Mishkin et al., 1983; Ungerleider and Haxby, 1994).

Furthermore, there is evidence that one of the main mechanisms thought to be responsible for visual shape processing in the primate thalamofugal pathway is also at work in the avian tectofugal pathway. This mechanism, first proposed by Hubel and Wiesel (1962, 1968), relies on feedforward processing across visual areas that are hierarchically organized in terms of the complexity of the visual information that they represent. Neurons at each level of the system integrate information from neurons at the previous level to build selectivity for shape features of increasing complexity and tolerance to variables such as size and location (for a short review and references, see Soto and Wasserman, 2012c). Li et al. (2007) found that the receptive fields of neurons in the pigeon nucleus isthmus (sensitive to oriented gratings) are constructed by feedforward convergence of receptive fields from neurons in the tectum (which have center-surround organization), as proposed by the hierarchical model of Hubel and Wiesel (1962, 1968). Also in accord with hierarchical processing, there is a large increase in receptive field size from early to later areas in the avian tectofugal pathway (Engelage and Bischof, 1996).
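The way a feedforward hierarchy builds selectivity and tolerance can be sketched with two toy stages. The code below is a one-dimensional illustration under stated assumptions, not an implementation of HMAX or of any model cited here: the function names, the edge template, and the pooling size are all hypothetical. A "simple" stage matches a template at every position (selectivity), and a "complex" stage takes the maximum over neighboring positions (tolerance to location).

```python
def simple_stage(signal, template):
    """Template matching at every position: builds feature selectivity."""
    n = len(template)
    return [sum(s * t for s, t in zip(signal[i:i + n], template))
            for i in range(len(signal) - n + 1)]


def complex_stage(responses, pool_size):
    """Max pooling over neighboring positions: builds position tolerance."""
    return [max(responses[i:i + pool_size])
            for i in range(0, len(responses), pool_size)]


edge = [-1, 1]                         # hypothetical "rising edge" template
image = [0, 0, 1, 1, 1, 1, 1, 1]       # edge between positions 1 and 2
shifted = [0, 0, 0, 1, 1, 1, 1, 1]     # same edge, shifted by one position

simple_a = simple_stage(image, edge)
simple_b = simple_stage(shifted, edge)
complex_a = complex_stage(simple_a, pool_size=4)
complex_b = complex_stage(simple_b, pool_size=4)

print(simple_a, simple_b)    # simple responses differ after the shift
print(complex_a, complex_b)  # pooled responses are identical
```

The pooled units respond identically to the small shift while the position-specific units do not. Larger shifts that cross a pooling boundary would still change the pooled response, which is one way such a hierarchy yields limited rather than complete invariance.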

Thus, hierarchical and feedforward processing of shape information, a central mechanism for most current neurocomputational theories of object recognition in primates (e.g., Fukushima, 1980; Perrett and Oram, 1993; Riesenhuber and Poggio, 1999, 2000; Rolls and Milward, 2000; Serre et al., 2007), might be widespread across vertebrate visual systems. If this is true, then behavioral differences between pigeons and people must be explained by some other mechanism. We (Soto and Wasserman, 2012c) recently offered a proof of concept for this hypothesis, by showing that a hierarchical model of object recognition in the *primate* ventral stream (a version of the HMAX model described in Serre et al., 2007), coupled with a reinforcement learning model (see Section The Role of Error-driven Reinforcement Learning), can explain much of the available behavioral data in object recognition by *pigeons* reviewed in sub-sections Invariance in Object Recognition by Pigeons and What Information Is Extracted From Images by Pigeons?

The success of this model was surprising for two reasons. First, the model could better explain pigeon behavior than human behavior. Like that of pigeons, but unlike that of people, the model's recognition was strongly affected by changes in viewpoint, size, and shading. In the case of size, the model could even reproduce the logarithmic relation between physical and perceived object size that has been found in pigeons (Peissig et al., 2006). Furthermore, invariant recognition was not fostered by variables that seem to do so for people, such as adding geons to paperclip objects.

Second, although this model uses a "bag of features" to mediate object representation, the results of several simulations showed that such representations can be much richer than one would initially assume. As shown in **Figure 10C**, the model has no problem reproducing the ability of pigeons to recognize objects from their complementary contours (Van Hamme et al., 1992). This result was originally interpreted as showing that a feature-based representation (such as that proposed by Cerella, 1986), lacking explicit information about the spatial relations among features, could not explain object recognition in pigeons. This interpretation is only partially correct, because the simulated results suggest that the feature pool in the model can *implicitly* represent information about spatial structure.

The model also reproduces the bias to rely on nonaccidental properties in geon recognition found in people and pigeons (Gibson et al., 2007), as depicted in **Figure 9**. The model reproduces this bias even though it was not designed to do so, unlike other theories of object recognition (structural description theories; see Biederman, 1987). Instead, the bias emerges in the hierarchical model from simple principles of biological visual computing and because the features in the model have been trained through exposure to natural images (see Serre et al., 2007). Coterminations and elongated edges are both quite common in natural images (Geisler et al., 2001) and they could reliably distinguish between the objects used by Gibson et al. (2007).

The success of the hierarchical model in explaining the pigeon behavioral data has no equal in the current literature. Together with the results of neurophysiological studies (Engelage and Bischof, 1996; Li et al., 2007), the success of this model suggests that feedforward and hierarchical processing of visual information play important roles in object recognition by pigeons, as they do in primates.

# **THE LIMITS OF GENERALITY: PIGEONS' RECOGNITION OF HUMAN FACES**

Up to this point, we have focused on the mechanisms of visual object recognition that are likely to be shared by pigeons and people. However, the evolutionary lineages of both species diverged more than 300 million years ago; surely, we can expect their visual systems to show important differences due to adaptive specialization.

For example, it is likely that there are specialized mechanisms<sup>1</sup> of face perception in people and other primates (Pascalis and Kelly, 2009). However, a comparative analysis requires taking into account the fact that face recognition is a complex form of behavior, likely to result from the interaction of many mechanisms, including general processes shared with other species (de Waal and Ferrari, 2010; Shettleworth, 2010). Determining which aspects of human face perception are due to specialized vs. general mechanisms requires comparative research; here, pigeons are becoming a key species to determine the role of general recognition processes (Soto and Wasserman, 2011).

Only a handful of behavioral studies have compared human face recognition by pigeons and people. They have led to a complex pattern of results, suggesting that some properties of face perception in people are likely to be the result of specialized processes, whereas others might result from general processes. Regarding specialized processes, it has been found that, while people and other primates show an advantage in discriminating upright faces over inverted faces, the same advantage is not found in pigeons (Phelps and Roberts, 1994). It is widely believed that faces are perceived in a "holistic" or "configural" way to a larger extent than other objects (for reviews, see Maurer et al., 2002; Richler et al., 2008) and inversion effects have been proposed as a manifestation of holistic face perception (Farah et al., 1995). That is, holistic processing might be a specialized mechanism for face perception in primates.

Surprisingly, other studies have shown similarities in the way people and pigeons process human faces. For example, both species use information near the eyes and chin to discriminate gender and they use information near the mouth to discriminate emotion (Gibson et al., 2005). Also, in both people (e.g., Schweinberger and Soukup, 1998; Fox and Barton, 2007; Ellamil et al., 2008; Fox et al., 2008) and pigeons (Soto and Wasserman, 2011), recognition of emotional expression depends on variations in identity, whereas recognition of identity is relatively independent of variations in emotion. It is possible that the origin of this latter interaction in people is decisional rather than perceptual (Soto et al., 2014), which would make the similarity across species easier to reconcile with the existence of specialized face perception processes in primates.

Overall, these results challenge the common assumption that a specialized human face perception system must underlie all observed aspects of human face recognition, being somehow "encapsulated", or free from the influence of more general processes. Furthermore, they serve to underscore the fact that the evolution of a face recognition system did not solely involve the specialization of perceptual processes, but also the specialization of the human face as an efficient transmitter of social signals (Smith et al., 2005; Schyns et al., 2009). The human face could have been specialized through evolution to transmit signals that would be easily decoded by existing visual processes. If such visual processes are also present in birds, then the fact that some aspects of face recognition are similar in pigeons and people seems less surprising.

# **THE EVOLUTION OF MECHANISMS OF OBJECT RECOGNITION IN VERTEBRATES: A WORKING HYPOTHESIS**

The ultimate goal of comparative studies of high-level vision is to understand how biological visual systems have evolved mechanisms to solve the challenging computational problems posed by the environment (Soto and Wasserman, 2010a). It is likely that some of the computational problems that are posed by object recognition are present in many environments, leading to the evolution of a core system of processes that are required to solve object recognition tasks across species. Other computational problems may be specific to the environment of one or a few species, leading to the evolution of more specialized processes.

**Figure 11** represents our current working hypothesis regarding the evolution of mechanisms of object recognition in birds and mammals. This diagram is a useful way to summarize what is known about the evolution of a complex form of behavior in a large group of animals. The outer part of the diagram consists of a phylogenetic tree, which provides information about the evolutionary relations among species that are being compared. The leaves in this tree include the genera that are most commonly studied in comparative cognition. There is no information about the object recognition abilities of most of these genera; so, they are included simply as a reference. The genera that have been studied to some extent are highlighted: *Homo* (i.e., humans), *Macaca* (i.e., macaques) and *Columba* (i.e., pigeons). *Rattus* (i.e., rats) is also highlighted, as recent studies have started to shed light on their object recognition skills (e.g., Zoccolan et al., 2009; Brooks et al., 2013).

<sup>1</sup>Note that *specialized* and *general* are used here to refer to the distribution of a cognitive mechanism across species, with specialized referring to a mechanism that can be found in only a few species and general referring to a mechanism that can be found across a variety of species. The distribution of a mechanism across species should in turn depend on whether the computational problem solved by such mechanism is widespread across environments (see Soto and Wasserman, 2012c). Importantly, how a mechanism is distributed across species is different from the issue of whether such mechanism is domain-general or domain-specific. Thus, when we propose that any complex ability is likely to be influenced by specialized processes, we mean processes that are only present in one or a few species (e.g., language), not processes that are domain-specific.

The center of the diagram provides information about which species are thought to possess a specific mechanism. Each concentric circle of a different color represents a different hypothetical mechanism. To know which mechanisms are hypothesized in each species, we can draw an imaginary line from that species to the center of the diagram. If the line crosses a colored area in the circle representing a particular mechanism, then this means that the species is thought to possess that specific mechanism. The core system of mechanisms that are shared by many species is shown at the center of the diagram, by circles that are completely colored. More specialized mechanisms are shown toward the periphery.

As illustrated in **Figure 11**, at least three processes seem to be part of the core system of object categorization in vertebrates: error-driven learning, feedforward processing of visual information, and learning of a common representation for objects in the same category. Of these, there is considerable evidence that error-driven learning is a core mechanism that is present across vertebrates and used in all object categorization tasks. Furthermore, the best candidate structures for implementing this mechanism, the basal ganglia, are homologous across amniote vertebrates, suggesting that this is an evolutionarily conserved mechanism. There is also considerable evidence for feedforward visual processing in primates, but the evidence in other species is less clear. In pigeons, only computational evidence and a couple of neurophysiological studies support this hypothesis, so clearly more research is necessary. There is also evidence of learning common representations across all vertebrates, coming from the literature on learned equivalence (see Zentall et al., 2014). Regarding these two latter mechanisms, current neurobiological evidence suggests that they are not implemented in homologous structures across vertebrates, although they are implemented in structures thought to be analogous in birds and primates. These analogous mechanisms could have evolved separately in these different groups, due to similar evolutionary pressures.

Two more specialized mechanisms have been proposed for primates, as shown in **Figure 11**. We warn that the proposed distribution of these mechanisms across species is highly speculative. Still, the evidence suggests a specialized mechanism for "holistic" face processing in people and other primates, which is not present in birds. It is also likely that birds have evolved specialized mechanisms of visual categorization; for example, flight might have had an important impact on birds' evolved ability to categorize scenes from different perspectives (Kirkpatrick et al., 2014).

The evolution of a specialized rule-based learning mechanism in primates (and perhaps other mammals) could explain a number of differences found between these species and birds–including many of the differences reviewed here. So, this hypothesis merits more detailed discussion.

There is a growing body of evidence suggesting that at least two learning systems may underlie the categorization abilities of people (e.g., Ashby et al., 1998; Ashby and Ell, 2001; Ashby and Valentin, 2005). One of them is a procedural learning system, believed to be implemented by the circuitry of the basal ganglia and based on slow, error-driven associative learning. The other is a rule-based learning system, believed to be implemented in the PFC and based on hypothesis testing supported by working memory and executive attention. This rule-based system can easily learn category structures in which good performance requires selectively attending to a single dimension, while ignoring other dimensions.

Recent comparative studies (for a review, see Smith et al., 2012a) have suggested a dissociation between these learning systems in people, rhesus monkeys (Smith et al., 2010), and capuchin monkeys (Smith et al., 2012b). On the other hand, neither pigeons (Berg and Grace, 2011; Smith et al., 2011) nor rats (Vermaercke et al., 2014) have shown evidence of such dissociations, even when tested with the same stimuli and similar procedures as people. These results have been interpreted as evidence that the rule-based categorization system is present in primates, but is not found in other mammals and birds.

Assuming that this interpretation is correct, how can we explain the differences between people and pigeons in object recognition tasks? Rule-based learning in people is extremely fast (Smith et al., 2011, 2014) and it generalizes perfectly across irrelevant stimulus dimensions (Casale et al., 2012). Thus, after limited exposure to a specific object, people can selectively attend to those visual dimensions that are important for object identification and ignore those visual dimensions that are irrelevant, such as viewpoint, shading, size, etc. Such learning would require that people separately represent relevant and irrelevant shape dimensions, so that attention can select some dimensions while ignoring others (Demeyer et al., 2007). The results of psychophysical studies agree with this idea: people encode shape information separately from viewpoint information (Stankiewicz, 2002; Blais et al., 2009).

Pigeons, on the other hand, may only slowly learn to select relevant information and ignore irrelevant information through the procedural learning system. That is why pigeons do not show invariant object recognition unless they are trained with variations in irrelevant object dimensions.

This hypothesis also explains why people, but not pigeons, exhibit view-invariant recognition of bent-paperclip objects when a geon has been added to them (Spetch et al., 2001). An ideal observer analysis shows that the task of recognizing objects composed of both bent-paperclips and geons across changes in viewpoint is very difficult, whereas the task of recognizing geons by themselves across changes in viewpoint is very simple (Tjan and Legge, 1998). This analysis suggests that the reason why people show view-invariant recognition of bent-paperclip objects when a geon is added is because they can quickly learn to selectively attend to the geon in order to decrease task difficulty. Pigeons might not be able to show such fast changes of selective attention.

Finally, the hypothesis of a rule-based mechanism present in primates, but not birds, can also explain why many research findings suggest that people and pigeons extract similar information from images, but show performance differences on invariance tests. Similarities could be due to similar visual processing, whereas differences could be due to differences in post-visual processing.

Still, the value of the multiple systems hypothesis depends on how well future research can eliminate alternative explanations of the comparative results. For example, it is possible that pigeons do possess a rule-based mechanism but, unlike primates, do not perceive the dimensions of line width and orientation used by Smith et al. (2011) as separable and thus cannot selectively attend to them. Indeed, some evidence suggests that these dimensions might interact for pigeons (Berg and Grace, 2011; Berg et al., 2014); so, an urgent issue is to determine whether such perceptual interactions do exist using traditional tests of separability adapted to animal research (e.g., Blough, 1988; Soto and Wasserman, 2010c, 2011) or, better still, adapting tests of separability that control for the influence of non-perceptual factors (Ashby and Soto, in press; Soto et al., 2014).

Another possibility is that quantitative differences in visual processes may explain behavioral differences between pigeons and people. Feedforward visual processing gradually increases tolerance to identity-preserving variables across several hierarchically organized layers (see Serre et al., 2007). If the pigeon visual system has a smaller number of layers than the human visual system, then we could expect pigeons to show object recognition that is more sensitive to changes in size, rotation, etc.
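The relation between the number of layers and tolerance can be illustrated with a deliberately minimal sketch. It assumes a one-dimensional "image" containing a single feature and a hierarchy made only of non-overlapping max-pooling stages; the function names are hypothetical, and the sketch is far simpler than the model of Serre et al. (2007), which also includes feature-tuning stages.

```python
def max_pool(signal, width=2):
    """Non-overlapping max pooling over a 1-D list."""
    return [max(signal[i:i + width]) for i in range(0, len(signal), width)]

def feedforward(signal, n_layers):
    """Pass the signal through n_layers successive pooling stages."""
    for _ in range(n_layers):
        signal = max_pool(signal)
    return signal

def one_hot(position, size=8):
    """A 1-D 'image' with a single feature at the given position."""
    return [1 if i == position else 0 for i in range(size)]

def tolerated_positions(n_layers, size=8):
    """Count feature positions whose top-layer output matches that of position 0."""
    reference = feedforward(one_hot(0, size), n_layers)
    return sum(feedforward(one_hot(p, size), n_layers) == reference
               for p in range(size))

for n_layers in (1, 2, 3):
    print(n_layers, "layer(s):", tolerated_positions(n_layers), "positions")
```

Under these toy assumptions, each added pooling stage doubles the range of positions that yield the same top-layer response, so a shallower hierarchy tolerates a smaller range of shifts. This is the kind of quantitative difference that the possibility described above appeals to.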

Although this is an interesting possibility, it cannot explain why primates, but not pigeons, seem to use two different strategies to categorize artificial stimuli varying along dimensions that are not identity-preserving in natural objects (width and orientation of lines, see Smith et al., 2012a). Furthermore, this hypothesis cannot explain why people show invariant recognition in some behavioral studies after experience with a single image of a novel object. Such behavioral invariance (in contrast to the invariance shown by neurons), requires a readout mechanism that is able to ignore variations along identity-preserving variables (Goris and Op de Beeck, 2009, 2010). The availability of a rule-based readout mechanism in people would allow one to explain why humans can show invariant recognition after experience with a single image of an object. The absence of such a readout mechanism in pigeons would explain why this species does not show this behavior.

If the hypothesis of multiple learning systems turns out to be correct, then future research will be required to determine exactly which aspects of the rule-based system are specialized in primates. As indicated earlier, the NCL is an area of the pigeon brain that seems to support the same executive functions as the primate PFC (Güntürkün, 2005). Thus, it is likely that some of the mechanisms involved in the rule-based system are available to pigeons, and the main difference from people is either merely quantitative or restricted to a few of the processes involved in rule learning.

One possibility is that pigeons do not deploy selective attention in the same way as primates (Smith et al., 2012a) or that they do not perceive any visual dimensions independently, but process all stimuli holistically (Berg et al., 2014). These ideas are in line with studies of compound generalization in pigeon associative learning, which suggest that pigeons process visual stimulus compounds as configurations rather than as the simple sum of their component elements (e.g., Rescorla and Coldwell, 1995; Aydin and Pearce, 1997), whereas people show much more elemental processing in analogous tests (e.g., Collins and Shanks, 2006; Soto et al., 2009). Although pigeons might deploy some forms of dimensional attention during categorization tasks (Mackintosh and Little, 1969; but see Hall and Channell, 1985; Castro and Wasserman, 2014), perhaps the fast switching of dimensional attention that is required for testing hypotheses about category rules is unique to primates (for more on selective attention in pigeons, see Zentall, 2012; Vyazovska et al., 2014).

Although the rule-based system is also thought to require holding hypotheses about possible rules in working memory, it has been shown that neurons in the pigeon NCL, the area of the avian brain also thought to be involved in learning of abstract category representations (Kirsch et al., 2009), have working memory functions similar to those of neurons in the primate PFC (Diekamp et al., 2002; Rose and Colombo, 2005). This fact makes it unlikely that working memory is the critical component of the rule-based system that is absent in pigeons.

# **THE NEUROBIOLOGICAL MECHANISMS OF OBJECT RECOGNITION: WHAT WE CAN LEARN FROM PIGEONS**

The neuroscience community has focused almost exclusively on nonhuman primates for studying the neurobiology of visual cognition, perhaps due to their evolutionary proximity to humans. From a truly comparative standpoint, however, other animals are just as useful as nonhuman primates for the study of the core processes involved in visual object recognition. Using pigeons as an animal model for the study of object recognition offers many advantages. The most important advantage, as demonstrated by the present review of the literature, is that we know far more about pigeons' object recognition abilities than about those of any other species, excluding people and rhesus macaques. Furthermore, comparative data are available for most human results in the pigeon literature, so we have a good idea as to just what is similar and different in people and pigeons; such parallel data sets help us understand the limits of our generalizations from the animal model to humans. Finally, behavioral and neurobiological evidence suggests that birds possess highly advanced visual systems, comparable to those of primates in their level of sophistication (Shimizu and Bowers, 1999; Cook, 2001; Husband and Shimizu, 2001; Wasserman and Zentall, 2006).

Given these advantages, it is rather puzzling that pigeons are not being used more widely as a model for the neurobiological basis of object recognition (and other forms of high-level vision). Worse still, neuroscientists studying object recognition in primates have thus far ignored the behavioral and neurobiological literature on pigeons as a source of information for their own research. This omission suggests an implicit belief that this literature is useless for understanding human vision, perhaps due to the evolutionary distance between pigeons and people. We believe that this position comes both from an unfortunate but popular misconception about the pigeon brain and from the failure to adopt a truly comparative approach in the study of visual and cognitive neuroscience.

The reluctance to accept the idea that anything about the primate brain can be learned from the study of the avian brain might have its origins in the old terminology used to describe bird brains, which suggested that these consist entirely of basal ganglia (Colombo and Scarf, 2012). This perspective is now outdated (Reiner et al., 2004, 2005; Jarvis et al., 2005), as there is considerable evidence that an important proportion of the avian brain consists of pallial areas, many of them homologous to cortical areas in mammals.

Current thinking in comparative psychology recognizes that most forms of complex behavior are the result of many underlying processes, some of them specialized in a single species, others shared across many species, and most somewhere in between these extremes (de Waal and Ferrari, 2010; Shettleworth, 2010; Soto and Wasserman, 2012c). No species will provide a perfect animal model of human behavior. For example, comparative studies have found differences between the human brain and that of other primates–including great apes–across all studied levels of organization, from genes to the size and connectivity of large areas (Preuss, 2011).

All of this work suggests that the only way to appropriately use animal models is by understanding what is shared and what is not between people and each specific model animal. Unfortunately, a much more common approach is to choose a model animal based on face validity and to glibly assume that the mechanisms underlying behavior in the model animal are similar to those in people. The belief that a species that is closer to people in the phylogenetic tree must provide a better model for any cognitive process is one manifestation of such reliance on face validity. Underlying this idea is the (clearly incorrect) assumption that the rate of evolutionary change is fixed across traits, environments, and species. From a truly comparative perspective, researchers should avoid relying on face validity to choose the species that they study. Instead, they should rely on the results of comparative studies–including behavioral research. In precisely this respect, the pigeon offers many manifest advantages.

We propose that pigeons can provide an excellent animal model for the study of the core processes involved in visual object recognition. Only in the study of specialized processes may other models be proven to afford a better alternative. In those cases, researchers should seek strong behavioral evidence regarding the computational mechanisms involved, just as has been done in pigeons over the last 50 years. After such research is performed, we would be in a better position to determine exactly what we are studying when we investigate object recognition in such species. Fortunately, we do not need to take another 50 years in order to reach a good understanding of the mechanisms of object recognition in rats, cats, and other mammals, as we can learn from the successes and failures of the pigeon research.

We have shown here that the behavioral study of object recognition in pigeons has yielded important insights into the general computational mechanisms used by vertebrates to solve this vital visual task and into the evolution of these mechanisms. Similarly, we believe that much will be learned about the neurobiology of object recognition from the study of the "simple" brains of pigeons.

### **REFERENCES**


of instances defining the prototype. *J. Exp. Psychol.* 101, 116–122. doi: 10.1037/h0035772


Riesenhuber, M., and Poggio, T. (2000). Models of object recognition. *Nat. Neurosci.* 3, 1199–1204. doi: 10.1038/81479


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 June 2014; accepted: 15 September 2014; published online: 13 October 2014*.

*Citation: Soto FA and Wasserman EA (2014) Mechanisms of object recognition: what we have learned from pigeons. Front. Neural Circuits 8:122. doi: 10.3389/fncir.2014.00122*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Soto and Wasserman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Illusory patterns are fishy for fish, too

# *Christian Agrillo\*, Maria Elena Miletto Petrazzini and Marco Dadda*

*Department of General Psychology, University of Padova, Padova, Italy \*Correspondence: christian.agrillo@unipd.it*

#### *Edited by:*

*Davide Zoccolan, International School for Advanced Studies, Italy*

#### *Reviewed by:*

*Davide Zoccolan, International School for Advanced Studies, Italy Vera Schluessel, Rheinische Friedrich-Wilhelm Universität Bonn, Germany*

**Keywords: visual illusion, comparative perception, fish, illusory pattern, animal models**

It has been widely recognized that size, shape, and distance perception are not the mere translation of images in the eyes, as retinal images are inherently ambiguous. Some form of knowledge and/or assumptions by unconscious inductive inference seems to be necessary (Gregory, 1997). With respect to this topic, visual illusions are a valuable tool for understanding the neuro-cognitive systems underlying visual perception by indirectly revealing the hidden constraints of the perceptual system in a way that normal perception cannot. In humans, such constraints have been often summarized as the so-called "Gestalt principles," which can be briefly described by the motto "the whole is greater than the sum of its parts" (Wertheimer, 1938). Almost a century of experimental investigation on visual illusions has broadened our comprehension of the perceptual mechanisms that enable us to perceive figures and forms instead of just a collection of lines and curves. Such mechanisms are highly adaptive, as they allow for a quick and stable picture of the environment, enabling an appropriate motor response in every context (Ikin and Turner, 1972).

Given their high ecological value, there is little reason to believe that selective pressures to develop a visual system that is able to segregate objects from the background have acted only on hominids. Indeed, over the last decade, research has demonstrated that both apes and monkeys are deceived by illusory patterns. For instance, baboons perceive the Zöllner illusion (Benhar and Samuel, 1982), capuchin monkeys perceive the Müller-Lyer illusion (Suganuma et al., 2007), and rhesus monkeys perceive a numerosity illusion (Beran, 2006; Beran and Parrish, 2013), thus showing that the organization of visual information is similar between human and non-human primates.

Despite the existence of a large number of studies, it is still unclear to what extent previous experience plays a role in how the brain/mind interprets and reconstructs physical reality (Hebb, 1949; Bod, 2002; Quinn and Bhatt, 2006). For practical and ethical reasons, it is very difficult to manipulate experiences during developmental periods in human and non-human primates. Furthermore, as primates lack independence at birth, different procedures are used for studying newborns, juveniles, or adults, presenting one of the major drawbacks when studying the development of visual perception in primates, i.e., the difficulty of devising experimental paradigms applicable to different ages (Bisazza et al., 2010). The recent discovery that even relatively simple organisms like fish, whose divergence seemingly occurred approximately 450 million years ago (Kumar and Hedges, 1998), also perceive visual illusions, as humans do, paves the way for the use of new animal models to investigate the relative contribution of genes and experience.

Redtail splitfin, for instance, were shown to be able to perceive illusory contours (Sovrano and Bisazza, 2009). Fish were required to discriminate between a square or a triangle and the corresponding background. After reaching a learning criterion, subjects performed test trials in the presence of two stimuli: one consisted of a subjective figure (triangle or square) induced by interruption or spatial phase-shift of diagonal lines; the other consisted of a series of diagonal lines only. In a subsequent test, two figures were presented: one in which pacmen were positioned in order to reproduce the Kanizsa triangle or square, and one in which the same pacmen were scrambled in different positions so as to prevent an impression of a subjective figure. Discrimination of orientation, rather than discrimination of shape, was also tested in a second experiment. Subjects were initially trained to discriminate between a vertical and a horizontal line with real physical contours. In test trials, vertically and horizontally oriented illusory lines were presented, created either through interruption or spatial phase-shift of diagonal lines (see **Table 1**). Redtail splitfin were found to perceive illusory contours in both experiments.

Wyzisk and Neumeyer (2007) successfully trained goldfish to discriminate between triangles and squares. After reaching the learning criterion, the authors presented a Kanizsa triangle and a Kanizsa square, and found that goldfish were able to discriminate between the two patterns based on the illusory contours. Goldfish showed high orientation sensitivity with respect to the pacmen generating the illusory patterns. Interestingly, if black lines were superimposed on a Kanizsa triangle or square, the illusory perception was disrupted, as has also been reported in humans, suggesting the existence of an end-stopped property similar to that of the V2 neurons found in monkeys (von der Heydt, 2004).

Data collected on redtail splitfin and goldfish are particularly interesting as the two species are only distantly related. According to recent estimates, the Acanthopterygii, the group to which redtail splitfin belong, and the Ostariophysi, the group to which goldfish belong, diverged more than 250 million years ago (Steinke et al., 2006). The fact that even distantly related species perceive illusory contours suggests the existence of orientation-selective neurons (responding to edges, lines, or bars of high contrast) in a wide range of teleost fish. Also, more recent evidence further suggests similar perceptual mechanisms in fish and primates: reef fish tested in their natural environment exhibited amodal completion, as they tried to attack their own mirror image even when they could see only a fragmented image of themselves (Darmaillacq et al., 2011). It is interesting to note that fish did not attack their image when they could see only a portion of the body in a single square, thus showing that their aggressive behavior was not simply triggered by some specific body features, such as color. Amodal completion was also reported in another fish species, the redtail splitfin (Sovrano and Bisazza, 2008).

**Table 1 | Summary of static illusory patterns investigated in teleost fish (chronological order).**

These studies have theoretical implications in the debate surrounding human visual perception. It has been suggested that a single unit-formation process may underlie modal (the perception of both real and subjective contours) and amodal completion, as completion processes would depend on a common underlying mechanism connecting edges across gaps (Kellman et al., 1998; Palmer, 1999). Fish species reported in the literature (**Table 1**) showed a successful perception of both modal and amodal completion. This finding indirectly aligns with the idea of a single mechanism for the two processes. Nonetheless, we believe that future research on newborn and juvenile fish will provide even more useful insights, especially in the debate surrounding the developmental trajectories of Gestalt principles. Due to their relatively short lifespan and independence at birth, fish represent an excellent experimental model for studying the development of perception and cognition. Indeed, recent studies have already adopted fish to study the ontogeny and the developmental trajectories of perceptual and cognitive systems (Bisazza et al., 2010; Miletto Petrazzini et al., 2013). Given that adult fish vision seems to be based on Gestalt principles, the development of such principles may be now investigated using newborn/juvenile fish as a model.

A validated method exists to study cognition and perception in newborn fish (Miletto Petrazzini et al., 2012). This method involves introducing two stimuli (e.g., two different geometric figures) at the opposite ends of the tank and delivering food near the discriminative stimulus. Discrimination is inferred from the proportion of time spent near the trained stimulus during final probe trials. The method has been shown to be very rapid (only 12 reinforced trials) and successful in discrimination tasks (e.g., circle vs. triangle), thus making it a good candidate for investigating the ontogeny of Gestalt principles in rapidly growing species, such as fish. Based on previous literature, the focus should be given initially to illusory patterns called "Fictions" (including illusory contours) in the classification advanced by Gregory (1997). First, it would be interesting to see whether Gestalt principles are inherent and, if so, which ones; if they are not, it would be challenging to study their developmental trajectory and the influence of maturation and experience.
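As a rough illustration of how discrimination can be inferred from probe-trial data, the following sketch computes the proportion of time spent near the trained stimulus and compares the group mean with the chance level of 0.5 using a one-sample *t* statistic. The function names and the times listed are hypothetical and are not taken from Miletto Petrazzini et al. (2012); the choice of statistic is likewise an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def preference_index(time_trained, time_other):
    """Proportion of choice time spent near the trained stimulus."""
    total = time_trained + time_other
    if total <= 0:
        raise ValueError("subject spent no time near either stimulus")
    return time_trained / total

def one_sample_t(values, chance=0.5):
    """t statistic for the difference between the sample mean and chance."""
    n = len(values)
    return (mean(values) - chance) / (stdev(values) / sqrt(n))

# Hypothetical probe-trial times in seconds: (near trained, near other).
probe_times = [(70, 50), (80, 40), (66, 54), (75, 45)]
indices = [preference_index(trained, other) for trained, other in probe_times]

print("preference indices:", [round(i, 3) for i in indices])
print("t statistic vs. chance:", round(one_sample_t(indices), 2))
```

An index reliably above 0.5 across subjects would indicate that the fish discriminate the trained stimulus from the alternative; an index near 0.5 would indicate no preference.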

The use of zebrafish, one of the main model organisms for neurobiology studies of vision and neurodevelopmental genetics, is especially welcome, given the possibility to extend the investigation on illusory perception with genetic and neuroanatomic aspects. The anatomical, physiological, and genetic components of the zebrafish visual system have been widely investigated in both larval and adult individuals (e.g., Bilotta and Saszik, 2001). Several studies indicate that zebrafish are capable of high-level motion processing. In particular, two visually guided behaviors have received great attention in the literature: the optokinetic response (OKR) and the optomotor response (OMR). The OKR is a consistent behavior in which objects moving across the visual field evoke stereotyped eye movements (Neuhauss, 2003; Huang and Neuhauss, 2008). These eye movements consist of two distinct components: a smooth pursuit movement and a fast saccade which resets the eyes once the object has left the visual field (Portugues and Engert, 2009). A small hindbrain area in rhombomere 5 has been found to be necessary for this response to occur properly (Schoonheim et al., 2010). Neuhauss et al. (1999) found that the zebrafish mutant *belladonna* (*bel*) often displays an OKR opposite to the direction of movement of the objects. Interestingly, Huang et al. (2009) found that a subset of the same mutants also display atypical circular swimming patterns ("looping") as a result of illusionary self-motion perception. On the other hand, the OMR occurs when a whole-field moving stimulus is presented and the fish turn and swim according to the perceived motion direction (Neuhauss et al., 1999; Portugues and Engert, 2009). Mutants with visual defects, such as the *lakritz* (*lak*) mutant, which lacks a large subset of retinal ganglion cells, fail the OMR test (Baier, 2000).

In humans, both OKR and OMR have been hypothesized to be involved in different visual illusions (Schor et al., 1984; Riecke et al., 2009). In this sense, the use of mutant zebrafish with opposite OKR, or lacking OMR, will play a key role in verifying the influence of both neural mechanisms in the perception of illusory patterns in a way that is not possible with primates.

Small brains are likely to provide important insights with respect to the ancient philosophical question of how the visual system builds our reality.

# **ACKNOWLEDGMENTS**

The authors would like to thank Alexandra Protopopova and the reviewers for their useful comments.

# **REFERENCES**


Darmaillacq, A. S., Dickel, L., Rahmani, N., and Shashar, N. (2011). Do reef fish, *Variola louti* and *Scarus niger*, perform amodal completion? Evidence from a field study. *J. Comp. Psychol.* 125, 273–277. doi: 10.1037/a0024295


visually-induced self-motion illusion (circular vection) in virtual reality. *ACM Trans. Appl. Percept*. 6, 7–27. doi: 10.1145/1498700.1498701


*Received: 02 July 2013; accepted: 07 August 2013; published online: 28 August 2013.*

*Citation: Agrillo C, Miletto Petrazzini ME and Dadda M (2013) Illusory patterns are fishy for fish, too. Front. Neural Circuits 7:137. doi: 10.3389/fncir.2013.00137*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2013 Agrillo, Miletto Petrazzini and Dadda. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The brain creates illusions not just for us: sharks (*Chiloscyllium griseum*) can "see the magic" as well

# *Theodora Fuss\*, Horst Bleckmann and Vera Schluessel*

*Department for Comparative Sensory Biology and Neurobiology, Institute of Zoology, Rheinische Friedrich-Wilhelms-University Bonn, Bonn, Germany*

#### *Edited by:*

*Davide Zoccolan, International School for Advanced Studies, Italy*

#### *Reviewed by:*

*Andreas Nieder, University of Tübingen, Germany Guy Wallis, University of Queensland, Australia*

#### *\*Correspondence:*

*Theodora Fuss, Department for Comparative Sensory Biology and Neurobiology, Institute of Zoology, Rheinische Friedrich-Wilhelms-University Bonn, Poppelsdorfer Schloss, Meckenheimer Allee 169, 53115 Bonn, Germany e-mail: thfuss@uni-bonn.de*

Bamboo sharks (*Chiloscyllium griseum*) were tested for their ability to perceive subjective and illusionary contours as well as line length illusions. Individuals were first trained to differentiate between squares, triangles, and rhomboids in a series of two-alternative forced-choice experiments. Transfer tests then elucidated whether Kanizsa squares and triangles, grating gaps and phase-shifted abutting gratings were also perceived and distinguished. The visual systems of most vertebrates and even invertebrates perceive illusionary contours despite the absence of physical luminance, color or textural differences. Sharks are no exception to the rule; all tasks were successfully mastered within 3–24 training sessions, with sharks discriminating between various sets of Kanizsa figures and alternative stimuli, as well as between subjective contours in >75% of all tests. However, in contrast to Kanizsa figures and subjective contours, sharks were not deceived by Müller-Lyer (ML) illusions. Here, two center lines of equal length are set between either two arrowheads or two arrow tails, in which case the line featuring the two arrow tails appears to be longer to most humans, other primates and birds. In preparation for this experiment, lines of varying length, and lines of unequal length randomly featuring either two arrowheads or two arrow tails on their ends, were presented first. Both sets of lines were successfully distinguished by most sharks. However, during presentation of the ML illusions, sharks failed to choose consistently: they either succumbed to side preferences or chose at chance level.

**Keywords: optical illusion, Kanizsa, subjective contour, Müller-Lyer deception, elasmobranch,** *Chiloscyllium griseum*

# **INTRODUCTION**

Illusionary contours, such as Kanizsa squares or triangles, are misreadings of visual information by the brain; instead of processing merely the actual information coming from the retina, the brain adheres to preconceptions and assumes what is most likely to be seen, based on previous experiences and neural wiring (Kandel et al., 2000). In this respect, vision is a creative, interactive process that depends on both the real properties of a visual object and contextual interactions and prior experiences, which are organized by processing different pieces of information (e.g., shape or color) according to system-specific rules (Kandel et al., 2000). The most famous examples of such phenomena are provided by the "Kanizsa figures," which are produced when the brain is fooled into seeing a square or a triangle, without there actually being a physical counterpart (Kanizsa, 1974). The triangle illusion, for example, is created by the arrangement of three Pacmen figures positioned with their open angles of 60° all pointing inwards to the same region (see **Figure 1**). In the absence of any lines or color changes, this arrangement itself is sufficient to evoke the impression in the viewer of there being distinct contours forming a triangle. This impression is strengthened by the fact that the illusionary triangle also appears to be brighter than the background despite a homogeneous luminance. In the field of Gestalt psychology, Kanizsa figures and other illusions are explained using the principle that the brain first assesses objects as a whole or an entity prior to or instead of paying attention to individual components or parts. Additionally, if parts are lacking and an object is incomplete, an entirety will be imagined whenever possible. Accordingly, objects that are close together also tend to be perceived as belonging together.

Several studies have shown that teleosts, like mammals, birds and even insects, can be deceived by optical illusions (e.g., Nieder, 2002; Agrillo et al., 2013), perceive illusionary contours, e.g., Kanizsa figures (Wyzisk, 2005; Wyzisk and Neumeyer, 2007) and can recognize partly occluded or fragmented objects (Sovrano and Bisazza, 2008, 2009; Darmaillacq et al., 2011). Very recently, a review on illusionary contours in teleosts was published by Agrillo et al. (2013) but so far, the ability to perceive illusionary contours has not been tested in any elasmobranch (sharks and rays). Elasmobranchs belong to the class Chondrichthyes (cartilaginous fishes), which represents the oldest extant jawed vertebrates. Recent research has finally been shedding light onto the previously neglected and often disputed cognitive abilities within this group, specifically with regard to learning and memory. Results indicate that the once popular disclaimer "primitive fish with primitive brains" is well and truly out of date and that sharks and rays can solve many cognitive tasks to the same extent as other vertebrates (reviewed by Guttridge et al., 2010; Schluessel and Bleckmann, 2005, 2012; Kuba et al., 2010; Spaet et al., 2010; Schwarze et al., 2013; Fuss et al., 2014a,b,c; Kimber et al., 2014). Nonetheless, many questions regarding cognition in elasmobranchs still remain unanswered, whereas cognition in teleosts has been studied in much more detail and has been summarized in a detailed review by Brown et al. (2011). This study aimed to determine if the shark brain can be deceived by optical illusions, i.e., if it follows the same rules and principles with regard to the creative vision process as other vertebrate brains. The ability to perceive illusionary contours that lack a physical counterpart (Petry and Meyer, 1987; Schumann, 1900) shows that the visual system contains inferences about the world beyond available sensory information, whether on low levels (Paradiso et al., 1989; von der Heydt, 1995) or on a cognitive basis (Gregory, 1972; Rock and Anson, 1979). Accordingly, optical illusions can provide valuable information on the processing of sensory stimuli in the brain and the neural basis of form vision.

**FIGURE 1 | The experimental setup located within the experimental basin, inside the white pavilion.** The keyhole-shaped setup consisted of a Starting Compartment (SC), a decision area and a frosted screen for projections, featuring a divider allowing for unambiguous choice-making (left and right). For the projections, a LED beamer was used. Sharks were placed within the SC at the start of each trial. 1 = feeders, 2 = frosted screen for projection, 3 = cable pulls to release feeders, 4a = guillotine door, 4b = cable pull to open guillotine door, 5 = ceiling mounted fluorescent tubes (above pavilion roof).

Three experiments were conducted to test the perception of illusionary contours in sharks, i.e., (1) Kanizsa figures, (2) Subjective contours, and (3) Müller-Lyer (ML) illusions. Gray bamboo sharks (*Chiloscyllium griseum*) are small, benthic sharks that naturally occur in the Indo-West Pacific (Compagno et al., 2005). They primarily inhabit shallow waters, such as lagoons and inshore environments, sea grass meadows as well as rocky and coral reef environments, occupy small territories and feed on benthic prey (Compagno et al., 2005). Sharks are more distantly related to teleosts and other vertebrates than birds are to mammals or mammals are to each other and shark brains are very differently organized and structured from teleost brains due to divergent developmental processing (Northcutt, 1977; Wullimann and Mueller, 2004; Nieuwenhuys, 2009). Experiments were therefore aimed to allow for new insights into the processing of (subjective) sensory information in the brain in one of the most ancient vertebrate groups.

# **MATERIALS AND METHODS**

#### **ANIMALS AND HOUSING FACILITIES**

Nine juvenile bamboo sharks (*Chiloscyllium griseum*, 4 male, 5 female, TL: 25–40 cm) were kept in aquaria (1 × 0.5 × 0.5 m) connected to each other and to the experimental setup, providing constant environmental conditions (conductivity, temperature, and pH). The system was filled with aerated, filtered salt water [conductance: about 50 mS (ca. 1.0217 kg/dm³)] at 26 ± 2°C. Food (small pieces of squid, fish, or shrimp) was only available during the experimental training. Experiments were conducted during daylight hours; there was a 12 h light: 12 h dark cycle. Individuals were identified by phenotypic characteristics.

# **SET-UP**

Experiments were performed by using the same octagonal experimental basin as well as the same setup as outlined previously (Fuss et al., 2014c). The gray PVC setup (**Figure 1**) featured a Starting Compartment (SC, 0.51 × 0.35 m), a decision area (113.5 × 0.87 × 0.35 m) and a frosted screen for projection (0.92 × 0.35 m) and was placed within an octagonal experimental basin (2.5 × 2.5 × 0.35 m) made out of transparent Perspex featuring a white covered floor (**Figure 1**). During experiments, the basin was filled with water to a depth of about 0.3 m. To exclude uncontrolled cueing as well as other potentially disturbing external influences, the basin was surrounded by a white pavilion (3.0 × 3.0 × 2.5 m). Ceiling mounted fluorescent tubes allowed an even illumination during the experiments (above pavilion roof; Osram L 18 W, Lumilux Cool White, Germany).

A light gray guillotine door (0.43 × 0.23 m) confined the SC (0.43 × 0.3 × 0.35 m), in which sharks were placed before each trial. Independent of the type of trial/experiment, the experimenter was situated behind the SC. The guillotine door was controlled manually by using a cable pull. A 0.33 m long divider attached to the frosted screen separated a left from a right division, thereby allowing for unambiguous decision making in response to the two stimuli displayed on the screen (**Figure 1**). For projections, an LED projector situated at a distance of 1.3 m from the screen was used (**Figure 1**). The bluish-green stimuli used during all experiments were displayed on a light gray background. According to Hart et al. (2011), the maximum absorbance (λmax) of cone visual pigments in the very closely related shark species *Chiloscyllium punctatum* was found at 531.8 ± 6.7 nm, i.e., in the blue-to-green range of visible light. As sharks were usually swimming close to the bottom, stimuli were projected at a height of 3 cm above the ground. To reward sharks for a correct decision, feeders were installed just above both stimuli, which allowed food to be dropped into the setup manually using a cable pull from the experimenter's position at the opposite side of the experimental setup (**Figure 1**). For a correct choice to be recorded by the experimenter, sharks had to press their nose against the wall just below/onto the positive stimulus. Selected sessions were videotaped. Both feeders were baited during all trials to exclude unintentional cueing. Additionally, the water in the maze was stirred after every trial to preclude any olfactory cues after a reward was given (which could bias the shark's choice of arm in subsequent trials).

empty triangle, whereas group 2 was trained vice versa. During the T2 transfer tests, sharks were expected to choose the Kanizsa figure resembling the stimulus they had been trained on.

# **TRAINING**

Training followed the schedule outlined previously (Fuss et al., 2014c). The behavioral experiments consisted of three phases: 1—acclimatization, 2—training (regular trials), and 3—transfer trials. Experiments were conducted as two-alternative forced-choice experiments. After successful training of the first stimulus set (phase 1), performance was tested in the remaining pairs.

# *Phase 1—acclimatization*

Before training, sharks were allowed to become familiar with the experimental setup by swimming freely throughout the entire setup for up to 20 min at a time. The guillotine door was open, both divisions displayed the same 2D object (circle) and feeders were in place. Once a shark swam freely throughout the maze and looked for food being dropped from the feeders (i.e., nearby the 2D objects), training commenced (**Figure 1**).

### *Phase 2—training*

Before each trial, both feeders were baited and the water stirred. At the beginning of each regular trial the shark was placed in the SC. To start a trial, the shark had to push against the guillotine door with its snout. A trial lasted for a maximum of 2 min. A choice was made as soon as the shark touched the frosted screen on the opposite end of the setup with its snout. The two stimuli (**Figures 2**–**4**) to be discriminated were displayed simultaneously (one in each division) and switched randomly between the left and the right side of the screen (**Figure 1**) to avoid direction conditioning. Five alternating rotational schemes were used, so as to vary the succession of stimuli shown on a particular side between sessions. A correct choice was rewarded with food. During the inter-trial-interval (ITI), the shark was allowed to swim freely throughout the entire setup for 30 s, before it was gently guided back into the SC. The next trial started as soon as the shark pushed against the guillotine door. If a shark did not choose within the allocated 2 min, the trial was terminated. Training sessions were carried out 5 days per week; each session consisted of ten trials. Training was completed as soon as a learning criterion of ≥70% correct choices on three subsequent sessions was reached (χ²(1) ≤ 0.05; to prove statistical significance). If an animal did not reach the criterion within 30 training sessions it was excluded from further training.
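The learning criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sessions_to_criterion`, its parameters, and the example scores are hypothetical and do not come from the study; only the thresholds (ten trials per session, ≥70% correct, three consecutive sessions, a limit of 30 sessions) are taken from the text.

```python
from typing import Optional, Sequence


def sessions_to_criterion(correct_per_session: Sequence[int],
                          trials_per_session: int = 10,
                          threshold: float = 0.70,
                          run_length: int = 3,
                          max_sessions: int = 30) -> Optional[int]:
    """Return the 1-based session at which the criterion is first met.

    The criterion is met once `run_length` consecutive sessions each reach
    at least `threshold` correct choices. Returns None if this does not
    happen within `max_sessions` sessions (the animal would be excluded).
    """
    run = 0  # number of consecutive sessions at or above the threshold
    limited = list(correct_per_session)[:max_sessions]
    for session_number, n_correct in enumerate(limited, start=1):
        if n_correct / trials_per_session >= threshold:
            run += 1
        else:
            run = 0
        if run == run_length:
            return session_number
    return None


# Hypothetical numbers of correct choices (out of ten) in successive sessions.
example_scores = [5, 6, 7, 8, 6, 7, 7, 9]
print(sessions_to_criterion(example_scores))
```

In this made-up example the run of two good sessions (sessions 3 and 4) is broken by session 5, so the criterion is only reached with the third consecutive good session at the end of the list.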

# *Phase 3—transfers*

Transfer tests were conducted during which the sharks had to perform under altered conditions. Up to two transfer trials were interspersed randomly with ten regular trials within one session and separated by at least five regular trials from each other (resulting in 12 trials per session). Transfer trials remained unrewarded to prevent any kind of learning with respect to the new situation. During this phase, a maximum of eight regular trials (out of ten) were rewarded (random selection) irrespective of choice. This served to prepare the fish for transfer trials (so as to keep the fish from realizing that only transfer trials were unrewarded and therefore not worth participating in).
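A session layout of this kind could be generated as in the following sketch. It assumes the maximum case of two transfer trials per session, and it assumes that a transfer trial may fall at any position, including the first or last trial; the text does not specify either point. The function name `build_transfer_session` and the trial labels are hypothetical.

```python
import itertools
import random
from typing import List, Optional


def build_transfer_session(n_regular: int = 10,
                           n_transfer: int = 2,
                           min_gap: int = 5,
                           n_rewarded: int = 8,
                           rng: Optional[random.Random] = None) -> List[str]:
    """Return one session as a list of trial labels.

    Transfer trials are separated by at least `min_gap` regular trials, and
    `n_rewarded` of the regular trials are picked at random to be rewarded.
    """
    rng = rng or random.Random()
    total = n_regular + n_transfer
    # All sets of transfer positions that respect the minimum spacing.
    valid = [combo for combo in itertools.combinations(range(total), n_transfer)
             if all(b - a - 1 >= min_gap for a, b in zip(combo, combo[1:]))]
    transfer_positions = set(rng.choice(valid))
    regular_positions = [i for i in range(total) if i not in transfer_positions]
    rewarded = set(rng.sample(regular_positions, n_rewarded))
    session = []
    for i in range(total):
        if i in transfer_positions:
            session.append("transfer")            # never rewarded
        elif i in rewarded:
            session.append("regular_rewarded")
        else:
            session.append("regular_unrewarded")
    return session


print(build_transfer_session(rng=random.Random(1)))
```

Enumerating all valid position sets and drawing one of them at random gives every admissible layout the same probability, which a simple "draw and retry" scheme would also achieve but less transparently.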

# **EXPERIMENT 1: KANIZSA FIGURES**

# *Experiment 1a*

During training, there were two groups: group 1 (*n* = 4) learned to recognize an empty square as the positive, rewarded stimulus over a filled one, whereas group 2 (*n* = 4) learned to recognize an empty triangle over a filled one (**Figure 2**). As soon as the learning criterion was reached, the transfer phase commenced. During the transfer tests (T1) it was tested if sharks preferentially chose the Kanizsa figure (group 1: resembling a square, group 2: resembling a triangle) over seven different randomized Pacmen figures (**Figure 2**). Each shark participated in 28 transfer tests.
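To give a sense of what "preferentially chose" means for 28 transfer tests, the sketch below computes how many choices of one figure are needed before an exact two-sided binomial test against chance (0.5) reaches *p* ≤ 0.05. The helper names are hypothetical, and the choice of a two-sided exact test is an assumption for illustration; the statistical procedure actually used is the one described under Data Analysis.

```python
from math import comb
from typing import Optional


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials at chance level 0.5."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n  # P(X >= k)
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n  # P(X <= k)
    return min(1.0, 2 * min(upper, lower))


def min_correct_for_significance(n: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest number of correct choices (at or above n/2) with p <= alpha."""
    for k in range((n + 1) // 2, n + 1):
        if binomial_two_sided_p(k, n) <= alpha:
            return k
    return None


print(min_correct_for_significance(28))  # 28 transfer tests, as in experiment 1
print(min_correct_for_significance(30))  # 30 transfer tests, as in experiments 2 and 3b
```

Under these assumptions, at least 20 of 28 choices (or 21 of 30) must fall on the same figure for the preference to count as significant.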

# *Experiment 1b*

Group 1 (*n* = 3) was trained to recognize an empty square as the positive, rewarded stimulus over an empty triangle, whereas Group 2 (*n* = 4) was trained to recognize an empty triangle over an empty square. Following successful training, sharks were presented with a series of 28 transfer tests (T2) with the aim to determine whether the Kanizsa figure resembling the positive stimulus during regular trials (group 1: a square, group 2: a triangle) was chosen over the alternative one (**Figure 2**).

#### **EXPERIMENT 2: SUBJECTIVE CONTOURS**

All sharks (*n* = 8) were trained to choose a white square presented on diagonal lines (**Figure 3**), while the negative stimulus was a rhomboid. Following successful training, sharks were presented with subjective contours in a series of transfer tests. During 30 transfer tests of experiment 2a (T3) a correct choice was recorded, if the subjective contour defining a square by using grating gaps within the white lines was chosen over a rhomboid (**Figure 3**). In experiment 2b, sharks were then presented with a second series of 30 transfer tests (T4) and tested if the subjective contour defining a square by using phase-shifted abutting gratings was chosen over the subjective contour defining a rhomboid (**Figure 3**).

# **EXPERIMENT 3: SIZE RATIOS AND MÜLLER-LYER DECEPTION**

# *Experiment 3a*

All sharks (*n* = 8) learned to distinguish between two lines of different lengths (6 vs. 3 cm, 6 vs. 4 cm, 6 vs. 5 cm, 6 vs. 5.5 cm; see **Figure 4**). The longer of the two lines served as the positive, rewarded stimulus. Following successful training on the first pair, sharks were presented with a series of ten transfer tests (T5) before they continued with training of the next pair. Transfer tests of experiment 3a (5 vs. 5 cm) served to test whether other cues aside from the length of the lines helped the shark to recognize the positive stimulus and to determine behavior (**Figure 4**).
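One simple way to express how the four line pairs increase in difficulty is the relative length difference between the two lines. The short sketch below is only a worked calculation on the line lengths given above; the function name `relative_difference` is hypothetical and the measure itself is an illustrative choice, not one used in the study.

```python
def relative_difference(longer_cm: float, shorter_cm: float) -> float:
    """Length difference as a fraction of the longer (rewarded) line."""
    return (longer_cm - shorter_cm) / longer_cm


# Line pairs used in experiment 3a (longer line first, in cm).
pairs = [(6.0, 3.0), (6.0, 4.0), (6.0, 5.0), (6.0, 5.5)]
for longer, shorter in pairs:
    print(f"{longer} vs. {shorter} cm: {relative_difference(longer, shorter):.1%}")
```

The relative difference shrinks from one half for the first pair to roughly one twelfth for the last pair.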

# *Experiment 3b*

All sharks (*n* = 8) learned to distinguish between two unequally sized lines (center line 6 vs. 3 cm) equipped with either arrow "heads" or arrow "tails" (i.e., "correct" or "inverted" arrowheads, see **Figure 4**). The longer line served as the positive, rewarded stimulus. The line length and orientation of arrowheads switched randomly between the left and the right side of the screen (**Figure 4**). Transfer tests (T6) were performed to test whether sharks are deceived by ML illusions. Accordingly, in these transfer test trials sharks were presented with two center lines of equal length (5 cm) but with differently oriented arrowheads (**Figure 4**). Each shark participated in 30 transfer tests.

#### **DATA ANALYSIS**

The average trial time, the percentage of correct choices and the percentage of right and left choices were recorded for each session for each individual. A χ² test was performed to test for significant side preferences of individuals. To prove statistical significance of learning success, the learning criterion was established to be ≥70% correct choices in three consecutive sessions (χ²(1) ≤ 0.05). Sign and binomial tests were run to determine whether those sharks that did not reach the learning criterion within 30 sessions still chose the positive (rewarded) stimulus significantly more often than the negative (unrewarded) stimulus. A Mann-Whitney *U* test was used to determine if the average trial times differed significantly between the regular training trials and the transfer test trials for each individual as well as for groups. Sign and binomial tests as well as the 95% confidence intervals of a proportion (both by using the absolute numbers of decisions) were calculated for each individual as well as for the group(s) to determine whether sharks preferred one symbol or one side significantly over the other. To test for differences between the two groups (experiment 1), a Wilcoxon signed rank test was used. For all tests, *p* ≤ 0.05 was considered significant and *p* ≤ 0.001 highly significant.
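The 95% confidence interval of a proportion mentioned above can be computed from the absolute numbers of decisions as in the following sketch. The text does not state which interval method was used; the Wilson score interval shown here is an assumption chosen for illustration, and the function name `wilson_interval` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int,
                    confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 20 choices of the positive stimulus in 28 transfer tests.
low, high = wilson_interval(20, 28)
print(f"{low:.3f} to {high:.3f}")
```

If the interval excludes 0.5, the choices differ from chance at the corresponding level; in the hypothetical example the lower bound lies above 0.5.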

# **RESULTS**

Nine sharks participated in the experimental training procedure (Shark 1 died after experiment 1a and was replaced by Shark 9 at the beginning of experiment 2). The following section will summarize individual results for those nine sharks as well as for the group. Group results include only those sharks, which finished a phase successfully.

# **ACCLIMATIZATION**

Sharks (*n* = 9) needed on average 11.22 ± 3.27 sessions to acclimatize to the maze, perform the starting procedure and retrieve food from the feeders. Initial side preferences were only observed in one individual [χ²<sub>Shark1</sub>(1) = 0.014, χ²<sub>Shark2</sub>(1) = 0.295, χ²<sub>Shark3</sub>(1) = 0.604, χ²<sub>Shark4</sub>(1) = 0.795, χ²<sub>Shark5</sub>(1) = 0.188, χ²<sub>Shark6</sub>(1) = 0.434, χ²<sub>Shark7</sub>(1) = 0.796, χ²<sub>Shark8</sub>(1) = 1, χ²<sub>Shark9</sub>(1) = 0.604].
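A side-preference test of this kind can be illustrated with a χ² goodness-of-fit test with one degree of freedom, comparing left and right choices against an even split. The sketch below is an assumption about the general form of such a test, not a reproduction of the analysis; the function name `side_preference_chi2` and the counts are hypothetical.

```python
from math import erfc, sqrt
from typing import Tuple


def side_preference_chi2(left: int, right: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test (1 df) against a 50:50 split.

    Returns the test statistic and its p-value. For one degree of freedom
    the upper tail probability equals erfc(sqrt(chi2 / 2)).
    """
    expected = (left + right) / 2
    chi2 = (left - expected) ** 2 / expected + (right - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts of left and right choices for one animal.
chi2, p = side_preference_chi2(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A perfectly even split gives a statistic of 0 and a p-value of 1, whereas increasingly lopsided counts drive the p-value toward 0.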

# **EXPERIMENT 1: KANIZSA FIGURES**

In **Figure 5** a representative learning curve of one individual (Shark 7) is provided for the different phases of experiment 1 until the learning criterion was reached. Additionally, average trial time per session is given in seconds. Group results of the transfer trials during experiment 1a and 1b are summarized in **Figure 6**.

**FIGURE 5 | Experiment 1.** Shown is the performance of Shark 7 as % of correct choices (symbolized by triangles; left ordinate) per session as well as the average trial time in seconds (symbolized by gray bars; right ordinate) per session per phase until the learning criterion was reached.

#### *Experiment 1a*

Sharks needed on average 10.13 ± 6.29 sessions (group 1: 8.00 ± 2.45, group 2: 12.25 ± 8.62) to complete training successfully. On average 10.90 ± 2.24 s per trial were needed (group 1: 12.15 ± 4.15 s, group 2: 10.17 ± 2.58 s) to make a decision (for individual details see **Table 1**).

During transfer tests, all but one shark (Shark 3, **Table 1**) chose the "correct" figure (the corresponding Kanizsa figure) significantly more often than the incorrect one (**Table 1**). All sharks solved the T1 transfer tests on average within 12.17 ± 10.64 s per trial (**Table 1**). This was not significantly different from the regular training trials during the transfer test phase, neither for any individual nor for group 1 (**Table 1**). In group 2, there were no significant differences between the regular and transfer trials for any individual, but there was a significant difference for the group as a whole (**Table 1**).

# *Experiment 1b*

Sharks needed on average 7.57 ± 5.35 sessions (group 1: 7.33 ± 6.66, group 2: 7.75 ± 5.25) to complete training successfully (for individual details please compare **Table 1**). On average 9.95 ± 3.83 s per trial were needed (group 1: 11.80 ± 4.92 s, group 2: 8.90 ± 2.39 s) to make a decision.

The whole group solved the T2 transfer tests on average within 10.51 ± 8.54 s per transfer trial (**Table 1**). This was not significantly different from the regular training trials during the transfer test phase, neither for any individual nor for group 1 (**Table 1**). In group 2, no significant differences were found for individual sharks, but a significant difference was found for the group as a whole (**Table 1**). During transfer tests, all but one shark (Shark 3, **Table 1**) chose the correct figure significantly more often than the incorrect one (**Table 1**).

**Table 1 | Part 1: Statistics on the** *performance* **during regular training trials and transfer tests during Experiment 1: Kanizsa figures (Experiments 1a and 1b).**

*p > 0.05 not significant, p ≤ 0.05 significant (\*), p ≤ 0.01 significant (\*\*), p ≤ 0.001 significant (\*\*\*).* #*Shark 1 died between Experiment 1a and 1b and did therefore not participate in Experiment 1b.*

**Table 1 | Part 2: Statistics on the** *average trial times* **[s] during regular training trials and transfer tests during Experiment 1: Kanizsa figures (Experiments 1a and 1b).**

*p > 0.05 not significant, p ≤ 0.05 significant (\*), p ≤ 0.01 significant (\*\*).* #*Shark 1 died between Experiment 1a and 1b and did therefore not participate in Experiment 1b.*

In comparison, there was no significant difference between group 1 and group 2 in the absolute number of correct choices during transfer test trials between Experiment 1a and 1b (NPH two samples: *Z* = −1.323, *p* = 0.186; Wilcoxon signed rank test: *Z* = −1.105, *p* = 0.375). Additionally, there was no significant difference in the average trial time to solve the regular training trials or the transfer test trials for any individual shark; a significant difference was found only for group 2 and for all sharks combined (**Table 1**).

# **EXPERIMENT 2: SUBJECTIVE CONTOURS**

**Figure 7** provides a representative learning curve of one individual (Shark 8) for the different phases of experiment 2 until the learning criterion was reached. Additionally, the average trial time per session is given in seconds. Group results of the transfer trials during Experiment 2a and 2b are summarized in **Figure 8**.

Sharks needed on average 11.13 ± 8.44 sessions to complete training successfully (**Table 2**). They needed on average 9.88 ± 3.48 s per training trial to make a decision.

**FIGURE 7 | Experiment 2.** Shown is the performance of Shark 8 as % of correct choices per session (symbolized by boxes; left ordinate) as well as the average trial time (s) per session (symbolized by gray bars; right ordinate) per phase until the learning criterion was reached.

All sharks solved the T3 transfer tests on average within 12.57 ± 14.94 s per trial, T4 transfer tests on average within 11.10 ± 7.99 s per trial. During transfer tests, all but one shark (Shark 9, **Table 2**) chose the correct figure (the corresponding square) significantly more often than the incorrect one (**Table 2**). There was no significant difference regarding the average trial time of regular vs. transfer test trials (T3 and T4; **Table 2**) for any shark.
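The comparison of trial times between regular and transfer trials relies on the Mann-Whitney *U* test named in the Data Analysis section. The sketch below shows the rank-based calculation under stated assumptions: it uses a normal approximation without tie or continuity correction, which is a simplification (small samples are usually evaluated with exact tables), and the function name `mann_whitney_u` and the trial times are hypothetical.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return U for sample x and an approximate two-sided p-value.

    Tied values receive the average of the ranks they span. The p-value
    comes from a normal approximation without tie or continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return u_x, erfc(abs(z) / sqrt(2))


# Hypothetical trial times in seconds for regular and transfer trials.
regular_times = [8.2, 9.5, 7.9, 10.1, 9.0, 8.7]
transfer_times = [9.8, 12.4, 8.9, 11.0]
print(mann_whitney_u(regular_times, transfer_times))
```

A *U* value close to half the product of the two sample sizes indicates that neither set of trial times tends to rank above the other.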

# **EXPERIMENT 3: SIZE RATIOS AND MÜLLER-LYER DECEPTION**

In **Figures 9**, **10** representative learning curves of two individuals (**Figure 9**: Shark 3, **Figure 10**: Shark 5) are provided for the different phases of experiment 3 until the learning criterion was reached. Additionally, the average trial time per session is given in seconds. Group results of the transfer trials during Experiment 3a and 3b are summarized in **Figure 11** (for individual details please compare **Tables 3**, **4**).

# *Experiment 3a*

Six out of eight sharks completed training of the first size pair (6 vs. 3 cm) successfully. On average it took 6.17 ± 3.37 sessions to reach the learning criterion, and a decision was made within 7.20 ± 1.44 s per trial (**Table 3**). Two sharks (Shark 4 and Shark 9) were not able to solve the task (**Table 4**) and were therefore excluded from further training and testing. During T5 transfers (5 vs. 5 cm), only Shark 6 as well as all sharks grouped together showed a significant side preference (**Table 3**). There were no significant differences between regular and transfer trial times for five out of six sharks (the exception being Shark 6), but there was a significant difference for the whole group combined (**Table 3**).

All sharks, which were successful in solving the first size pair, were also able to complete training on the second (6 vs. 4 cm). On average 5.33 ± 2.52 sessions were needed to reach the learning criterion (**Table 3**). On average a decision was made within 7.46 ± 1.92 s per trial (**Table 3**).

During transfers (5 vs. 5 cm), only Shark 5 as well as all sharks grouped together (**Table 3**) showed a significant side preference. There was also a significant difference for two sharks (Shark 2, Shark 5) as well as for the whole group regarding average trial time i.e., regular vs. transfer test trials (**Table 3**).

Three out of six sharks, which were successful in solving the second size pair were also able to complete training on the third one (6 vs. 5 cm). On average 3.67 ± 1.16 sessions were needed to reach the learning criterion (**Table 3**). A decision was made on average within 6.90 ± 1.35 s per training trial (**Table 3**). Three sharks (Shark 2, Shark 6, Shark 8) were not able to solve the task (**Table 4**), and were excluded from further training. Shark 6 preferred the negative stimulus (i.e., the shorter of the two lines) significantly over the positive one.

During T5 transfer tests (5 vs. 5 cm), all but one shark (Shark 3) and all sharks grouped together showed a significant side preference (Shark 5, Shark 7, **Table 3**). There was no significant difference for any but one shark (Shark 5) as well as for the whole group in the average trial time to solve the regular training trials or the transfer test trials (**Table 3**).

None of the three sharks, which were successful in solving the third size pair, was able to complete training on the fourth size pair (6 vs. 5.5 cm) within the allocated 30 training sessions (**Table 4**).

#### *Experiment 3b*

Six out of eight sharks completed training on two lines of varying lengths featuring differently oriented arrowheads (**Figure 4**) successfully. On average 8.00 ± 6.32 sessions were needed to reach the learning criterion (**Table 3**; group results only refer to those individuals that reached the learning criterion within 30 training sessions). A decision was made on average within 7.72 ± 2.26 s per training trial (**Table 3**). Two sharks (Shark 2, Shark 9) did not solve the task (**Table 4**), and were excluded from further training. During T6 transfers (ML deception: two center lines of equal length with differently oriented arrowheads, **Figure 4**), three out of six sharks (Shark 4, Shark 5, Shark 6, **Table 3**) showed a significant side preference. In contrast, only one shark showed a distinct preference for the inverted arrowheads (Shark 6, **Table 3**). Three sharks (Shark 3, Shark 5, Shark 7) as well as all sharks grouped together showed significantly different average trial times to solve the transfer compared to the regular trials (**Table 3**).

**Table 2 | Part 1: Statistics on the** *performance* **during regular training trials and transfer tests during Experiment 2: subjective contours.**

*p > 0.05 not significant, p ≤ 0.05 significant (\*), p ≤ 0.01 significant (\*\*), p ≤ 0.001 significant (\*\*\*).*

# **DISCUSSION**

The visual experience of a line or an edge usually corresponds to a discontinuity in the intensity, wavelength, or spectral composition of the radiation that stimulates two contiguous areas of the retina (Kanizsa, 1974). The visual system accomplishes the organization of these contextual interactions by processing sensory information about shape, color, distance, and movement of objects according to its own rules (Kandel et al., 2000). Thus, form perception and the underlying neuronal mechanisms require a general representation of object boundaries, independent of how they are defined (Nieder and Wagner, 1999). Contour detecting cells within the visual system are unlikely to account for this phenomenon, but rather the subjective surface is generated by a visual system that has a tendency to complete certain figural elements (Kanizsa, 1976; Gerbino and Salmaso, 1987; Purghé and Coren, 1992). The brain appears to have expectations derived from both experience and intrinsic wiring for vision that form the basis for the assumptions it makes about what is to be seen in the visual world (Kanizsa, 1979; Day and Kasperczyk, 1983; Kandel et al., 2000).

One of the visual abilities essential to form perception is the reconstruction of contours absent from the retinal image (Nieder and Wagner, 1999) and the brain's association of certain parts of a scene to form a recognizable object while downgrading other parts (Kandel et al., 2000). Optical illusions demonstrate certain organizational mechanisms of visual perception and are known to be closely related to cortical processes in different vertebrates, such as humans (Bertenthal et al., 1980; Wede, 2008), cats (Bravo et al., 1988; De Weerd et al., 1990), monkeys (Vallortigara, 2004, 2008; Nielsen et al., 2006, 2008), owls (Nieder and Wagner, 1999; Nieder, 2002), and chickens (Vallortigara, 2006). There are several indications that parts of the fish telencephalon, such as the lateral and medial pallium, could be considered as homologous to parts of the mammalian telencephalon, such as the hippocampus and the amygdala (e.g., Northcutt, 1977, 1981, 1995; Salas et al., 1996a, 2003; Wullimann and Mueller, 2004; Durán et al., 2008, 2010; Nieuwenhuys, 2009; Martín et al., 2011). Nonetheless, other brain regions, such as the midbrain (e.g., in pigeons) may be involved in processing of illusionary contours as well (Niu et al., 2006).

**Table 2 | Part 2: Statistics on the** *average trial time* **[s] during regular training trials and transfer tests during Experiment 2: subjective contours.**

*p > 0.05 not significant.*

Several aspects regarding the perception of optical illusions, such as the ability to reconstruct incomplete, partly occluded objects or subjective contours have already been successfully tested in a range of teleosts (e.g., Schuster and Amtsfeld, 2002; Wyzisk and Neumeyer, 2007; Sovrano and Bisazza, 2008, 2009; Siebeck et al., 2009). The present study aimed to behaviorally investigate the perception of Kanizsa figures (experiment 1), subjective contours (experiment 2), and the perception of the ML deceptions (experiment 3) in juvenile gray bamboo sharks (*Chiloscyllium griseum*).

Sharks needed on average ten sessions in the first and eight sessions in the second part of experiment 1 to discriminate successfully between squares and triangles. During the following two sets of transfer tests, all but one shark chose the corresponding Kanizsa figure significantly more often than any of the seven different randomized Pacmen figures (**Table 1**) that were presented as alternatives. All but one shark significantly preferred the Kanizsa figure, which most closely resembled the positive training stimulus. While other factors, such as symmetry features of the Pacmen figures, could have potentially influenced the choosing process in the transfer tests of experiment 1a, the results of the transfer trials in experiment 1b clearly show that this was not the deciding criterion implemented by sharks. These data, indicating that Kanizsa figures were easily perceived as squares and triangles, were supported by the data collected on trial time; there was no significant difference in the average trial time needed to make a definite choice during the regular training trials (choosing between two "real" symbols) compared to the transfer trials (choosing between Kanizsa figures; **Figure 2**, **Table 1**). Although group 2 performed slightly better than group 1, the recognition and differentiation of square-shaped as well as triangle-shaped Kanizsa figures was equally effective for both groups (**Table 1**). Results clearly show that sharks can perceive Kanizsa figures. As in humans, images of the Kanizsa squares or triangles had to emerge from fictional contours supplied by the brain, pointing to a similar or analogical organizational mechanism of visual perception to the "filling-in" mechanism found in mammals (Kellman et al., 1998; Kandel et al., 2000). Comparable results were also found in goldfish, *Carassius auratus* (Wyzisk, 2005). However, in the goldfish, square and triangle discriminations seemed to be based on very specific features of these forms, since not the entire figure was needed to retain the discrimination ability.

ordinate) as well as the average trial time (s) per session (symbolized by gray bars; right ordinate) per phase until the learning criterion was reached.

**FIGURE 10 | Müller-Lyer deception.** Shown is the performance of Shark 5 as the percentage of correct choices per session (symbolized by boxes; left ordinate) as well as the average trial time (s) per session (symbolized by gray bars; right ordinate) per phase until the learning criterion was reached.

In experiment 2 sharks chose a white square presented on white diagonal lines over a rhomboid within 11.13 ± 8.44 sessions (**Table 2**). During T3 transfer tests, sharks had to choose the subjective contour defining a square by using grating gaps within the white lines. All but one shark chose the correct subjective contour representing a square significantly more often than the trained negative stimulus representing a rhomboid (**Table 2**). When facing subjective contours defining a square by using phase-shifted abutting gratings (T4 tests), all sharks maintained the high level of performance of the first transfer tests (**Table 2**). Again, all but one shark appeared to implement easily what they had learned during training. This is supported by the nearly constant average trial times during T3 and T4 transfer trials compared to regular training trials. The results indicate that sharks are capable of perceiving subjective contours as shown previously also for redtail splitfins (*Xenotoca eiseni*, Sovrano and Bisazza, 2009), barn owls (*Tyto alba*, Nieder and Wagner, 1999; Nieder, 2002), chickens (Vallortigara, 2006), and primates (Vallortigara, 2004, 2008). Barn owls, for example, which were trained to discriminate between two real shapes, were also able to distinguish between the corresponding illusionary contours and showed a clear preference for the positive training stimulus. Nieder and Wagner concluded that the birds recognized the illusionary contours as "true" objects by "filling-in" the missing edges. Surprisingly, goldfish were unable to recognize phase-shifted illusionary squares (Wyzisk and Neumeyer, 2007); however, results of this study could have been negatively influenced by methodological errors regarding the line sizing (Sovrano and Bisazza, 2009).

Considering the combined results of the first two experiments, it seems unlikely that the sharks focused on single feature elements of the stimulus, such as edges or lines, instead of the overall shape. Interruptions and boundary discontinuities, for example, were present in both stimuli (i.e., Kanizsa figures and subjective contours with grating gaps or phase-shifted abutting gratings) and could not have aided in the discrimination process. Instead, it is likely that sharks applied the concepts of "filling-in" (Kandel et al., 2000) or "(a)modal completion" (Michotte et al., 1964/1991; Kanizsa et al., 1993; Singh, 2004), which occurs when parts of an object are camouflaged by an overlying surface, which projects the same luminance and color as the nearer object (Singh, 2004). In the case of the Kanizsa figures, the "incomplete" Pacmen figures appeared as fully uninterrupted circles, partially hidden behind an occluding figure. In the case of the subjective contours with grating gaps or phase-shifted abutting gratings, a continuous square (or rhomboid) was recognized on a background of white diagonal lines (i.e., completing the lines amodally behind the illusory surface, Michotte et al., 1964/1991; Kanizsa et al., 1993).

Goldfish (Wyzisk and Neumeyer, 2007) and redtail splitfins (Sovrano and Bisazza, 2008) can recognize and "mentally complete" partly occluded objects, which represents another form of amodal completion. Sovrano and Bisazza (2008) trained redtail splitfins to discriminate between a complete and an amputated disc. The fish then performed in test trials in which hexagonal polygons produced or averted the impression of a partial occlusion of the disc. Fish behaved as if they were experiencing visual completion of the partly occluded stimuli (Sovrano and Bisazza, 2008). The perception of amodal completion and the perception of subjective contours both seem to use the same basic mechanisms to deal with occlusion problems (Kellman and Shipley, 1991; Kellman et al., 2001, 2005).

**Table 3 | Part 1: Statistics on regular training trials and transfer tests during Experiment 3: Size pairs and Müller-Lyer deception (Experiments 3a and 3b).**

*p > 0.05 not significant, p ≤ 0.05 significant (\*), p ≤ 0.01 significant (\*\*), p ≤ 0.001 significant (\*\*\*).*

#*Results are only shown for those individuals who reached the learning criterion within 30 training sessions. Accordingly, group results refer only to these individuals.*

In preparation for the ML deception, training involved the discrimination of two lines of different length (experiment 3). As the ML deception evokes only a slight, not very pronounced size illusion, the difference in length between the two lines was reduced gradually with continuous training. Six out of eight sharks selected the longer of the two lines significantly more often than expected by chance in two size pairs (6 vs. 3 cm, 6 vs. 4 cm) within 6 and 5 sessions, respectively (**Table 3**). Three sharks even discriminated 6 vs. 5 cm within 4 sessions (**Table 3**). In this task, sharks performed much better than goldfish (Wyzisk, 2005), which chose at chance level (50% correct) when presented with lines of 5 vs. 3 cm, 6 vs. 4 cm, or 5 vs. 2 cm. When presented with two lines of equal length (T5), sharks chose according to chance level or developed side preferences (**Table 3**). In the following task, sharks were presented with two lines of varying lengths (6 vs. 3 cm) randomly featuring differently oriented ends (either two arrowheads or two arrowtails, **Figure 4**). Six sharks were able to reach the learning criterion on average within 8 sessions.
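To illustrate what "choosing at chance level" means in a two-alternative task, the following minimal sketch computes an exact two-sided binomial p-value against a 50% chance level. It is an illustration only and not the authors' analysis code: the function name `binomial_p_two_sided` and the trial counts are hypothetical, and the study's actual session lengths and test settings are not reproduced here.

```python
from math import comb


def binomial_p_two_sided(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value against a fixed chance level."""
    # Probability of every possible number of correct choices under chance.
    pmf = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[correct]
    # Sum all outcomes that are no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical session of 10 two-alternative trials (not data from the study).
for correct in (5, 8, 10):
    print(correct, round(binomial_p_two_sided(correct, 10), 4))
```

With so few trials, even 8 correct choices out of 10 would not differ significantly from chance in this sketch, which is one reason why performance is usually evaluated over many trials or sessions.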

In the ML deception, two center lines of equal length, one featuring two inverted and the other two normal arrowheads, appear to be unequal in length due to the differently oriented arrowheads, which evoke a spatial impression. Humans judge the size of an object by comparing it to its immediate surroundings; thus, the spatial relationship of objects helps to interpret the image. Humans perceive the lines to be unequal because the brain uses shape and the experience from the spatial sense as an indicator of size (Kandel et al., 2000). As is typical for many illusions, knowing that the lines are equal does not prevent humans from being misled by this illusion (Kandel et al., 2000). Surprisingly though, not all human cultures react equally to these illusions (Rivers, 1901), with Europeans being more susceptible than cultures such as Inuits, Aborigines or Africans (Segall et al., 1966; Berry, 1968). Most likely, several factors, such as eye pigmentation or enhancement through a "carpentered" environment, contribute to these intercultural differences (Jahoda, 1971). The results of the present study revealed a very different response to the ML deception than expected or found in most humans. Surprisingly, the sharks were not tricked by the "length confusion" but displayed the same behavior as found when lines of equal length (featuring no or randomly oriented arrowheads and tails; T6 tests) were presented. While three sharks developed a significant side preference (the other three chose according to chance level), only one shark showed a distinct tendency for a specific arrow (i.e., arrowtails, **Table 3**). For some unknown reason, three sharks as well as the whole group made their choice significantly faster during the transfer tests compared to regular trials (**Table 3**). Overall though, sharks seemed to identify the length of the center lines, irrespective of the surrounding elements.
Thus, the results presented here for sharks are consistent with the results found in goldfish during an earlier study (Wyzisk, 2005) and stand in contrast to results obtained from other species such as gray parrots (Pepperberg et al., 2008), pigeons (Nakamura et al., 2006), chickens (Winslow, 1933), ring doves (Warden and Baar, 1929), capuchin monkeys (Suganuma et al., 2007), and rhesus macaques (Tudusciuc and Nieder, 2010).

Potentially, results could have been different if other versions of the Müller-Lyer illusion had been tested, such as the Brentano variation (as used e.g., by Pepperberg et al., 2008; variation in the lengths or thickness of the center lines or the angle of the arrows or both). The present study obviously cannot exclude this, but the original version of the ML illusion that was tested here was not perceived. As potential mechanisms were not investigated any further, it is impossible to decide which strategies may have been used. However, seeing oriented line terminations in the stimuli is not the same as perceiving an illusory contour. In fact, in all experiments sharks had to pay attention to the length of the lines, not to the orientation of arrows. Illusionary trials (i.e., T6) were randomly interspersed with regular training trials, featuring lines of different length with arrowheads and tails, and results were always significant in those trials. Accordingly, sharks did not look for anything but the line length and all sharks with the exception of one were proven not to pay attention to the arrow-formation. All other sharks showed side preferences, a common response if animals do not know what to choose. As there was no difference in the length of the lines, it is irrelevant if sharks have low or high visual acuity.

**Table 4 | Sign and binomial test and 95% confidence interval to determine if those sharks that did not reach the learning criterion within 30 sessions for single size pairs (Experiment 3a) or the Müller-Lyer deception (Experiment 3b) chose the positive (rewarded) stimulus significantly more often than the negative (unrewarded) stimulus.**

*p > 0.05 not significant, p ≤ 0.05 significant (\*), p ≤ 0.01 significant (\*\*), p ≤ 0.001 significant (\*\*\*).*

During experiment 3a, lines of 6 vs. 5 cm were still told apart by some sharks, a difference that would have about equaled the length difference between the two versions shown in the Müller-Lyer tests in experiment 3b (including the arrowheads, not just the center lines). So if acuity was good enough to distinguish 6 and 5 cm (experiment 3a), then it should have been good enough to distinguish the length of the illusionary figures (experiment 3b, T6), at least in some animals. The homogeneous response, in that none of the sharks solved the task, clearly indicates that no difference was observed, as there was none. This recalls the fact that not all humans perceive the ML illusion (as it evokes only a very slight deception) and not all humans react equally to it (Rivers, 1901; Segall et al., 1966; Berry, 1968).

Visual perception is a creative process, not only in humans, mammals, birds and teleosts but also in bamboo sharks, a representative of one of the oldest vertebrate groups. The present results reveal that bamboo sharks have the ability to perceive or reject optical illusions. Moreover, they provide information on the evolutionary origin and development of selected cognitive abilities and the characteristics of shared or non-shared neural mechanisms. Lastly, as found in other cognition experiments, the present results highlight the behavioral variability found among individuals trained in the same procedure and using the same training schedule. The often observed, apparently erratic nature of individual learning success is part of this variability, as are the sharks' different capabilities regarding the perception of optical illusions.

#### **ACKNOWLEDGMENTS**

We would like to thank Slawa Braun for animal caretaking, maintenance, and repairs. We are specifically grateful to the "Haus des Meeres" in Vienna for supplying the animals used during this study. The research reported herein was performed under the guidelines established by the current German animal protection law.

# **REFERENCES**


Gerbino, W., and Salmaso, D. (1987). The effect of amodal completion on visual matching. *Acta Psychol.* 65, 25–46. doi: 10.1016/0001-6918(87)90045-X


Wyzisk, K., and Neumeyer, C. (2007). Perception of illusory surfaces and contours in goldfish. *Vis. Neurosci.* 24, 291–298. doi: 10.1017/S095252380707023X

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 January 2013; accepted: 03 March 2014; published online: 20 March 2014. Citation: Fuss T, Bleckmann H and Schluessel V (2014) The brain creates illusions not just for us: sharks (Chiloscyllium griseum) can "see the magic" as well. Front. Neural Circuits 8:24. doi: 10.3389/fncir.2014.00024*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Fuss, Bleckmann and Schluessel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# What can fish brains tell us about visual perception?

#### **Orsola Rosa Salva<sup>1</sup>, Valeria Anna Sovrano<sup>1,2</sup>\* and Giorgio Vallortigara<sup>1,2</sup>**

<sup>1</sup> Center for Mind/Brain Sciences, University of Trento, Rovereto, Trento, Italy

<sup>2</sup> Dipartimento di Psicologia e Scienze Cognitive, University of Trento, Rovereto, Trento, Italy

#### **Edited by:**

Andrea Benucci, RIKEN Brain Science Institute, Japan

#### **Reviewed by:**

Filippo Del Bene, Institut Curie, France Gonzalo G. De Polavieja, Instituto Cajal, CSIC, Spain

#### **\*Correspondence:**

Valeria Anna Sovrano, Center for Mind/Brain Sciences and Dipartimento di Psicologia e Scienze Cognitive, University of Trento, Palazzo Fedrigotti, C.so Bettini 31, Rovereto, Trento 38068, Italy e-mail: valeriaanna.sovrano@unitn.it

Fish are a complex taxonomic group, whose diversity and distance from other vertebrates well suits the comparative investigation of brain and behavior: in fish species we observe substantial differences with respect to the telencephalic organization of other vertebrates and an astonishing variety in the development and complexity of pallial structures. We will concentrate on the contribution of research on fish behavioral biology for the understanding of the evolution of the visual system. We shall review evidence concerning perceptual effects that reflect fundamental principles of the visual system functioning, highlighting the similarities and differences between distant fish groups and with other vertebrates. We will focus on perceptual effects reflecting some of the main tasks that the visual system must attain. In particular, we will deal with subjective contours and optical illusions, invariance effects, second order motion and biological motion and, finally, perceptual binding of object properties in a unified higher level representation.

**Keywords: perceptual organization, fish, chondrichthyes, osteichthyes, visual system, visual illusions, color constancy, perceptual binding**

# **THE FISH AS A MODEL OF OBJECT PROCESSING IN THE VISUAL SYSTEM**

Fish represent a highly complex taxonomic group, whose divergence from the other vertebrates is estimated to have occurred approximately 450 million years ago (Kumar and Hedges, 1998). Jawless fish (*Agnatha*) represent one of the oldest vertebrate forms (Foley and Janvier, 1993). Cartilaginous fishes (*Chondrichthyes*), which appeared about 400 million years ago, represent the oldest extant jawed vertebrates and preserve a number of their ancestral traits, having evolved at a much slower rate than other classes (Martin et al., 1992). Unlike mammals and birds, fish do not actually represent a single clade, but a paraphyletic collection of taxa, including jawless, cartilaginous and bony-fish species (Nelson, 2006). Within the bony fishes, we find the Actinopterygii or ray-finned fishes, which alone represent the largest subclass of vertebrates, comprising more than 30 thousand species (mostly belonging to the superorder of Teleosts). This great taxonomic diversity within fish species, and the phylogenetic distance that separates fish from other vertebrates, present an invaluable opportunity for the comparative investigation of brain and behavior in an evolutionary perspective. We will here concentrate on the contribution of research on the behavioral biology of fish for the understanding of the evolution of the visual system.

Many fish species rely mainly on vision, using it to guide a wide range of behaviors (Guthrie, 1986; Brown et al., 2011). Not surprisingly, it has been demonstrated that fish have well developed visual capabilities that match those of other vertebrates (von Frisch, 1914; Douglas and Djamgoz, 1990; Vallortigara, 2004; Brown et al., 2011). In the literature we find a number of studies on the perception of shape and color in fish species, showing for example that several Teleost fishes have excellent trichromatic color vision (Beauchamp, 1978), as well as the capacity to discriminate two- and three-dimensional shapes (Schaller, 1926; Herter, 1929, 1930; Hager, 1938; Meesters, 1940; Mackintosh and Sutherland, 1963; Sutherland, 1964; Mark, 1966; Wyzisk, 2005; Wyzisk and Neumeyer, 2007; Siebeck et al., 2009; Schluessel et al., 2012; Gierszewski et al., 2013). Motion perception has also been studied in fish, with particular attention to model organisms such as zebrafish. Shortly after hatching, zebrafish innately respond to movement with a characteristic optomotor response (Clark, 1981; Neuhauss et al., 1999). Different species of fish, from Elasmobranchs to Teleosts, have also revealed sophisticated cognitive abilities in the visual domain, distinguishing various shapes from their mirror image counterparts (Gierszewski et al., 2013) and succeeding in visual categorization tasks (Schluessel et al., 2012, 2014a,b; Schluessel, 2014).

With regard to the physiological substrate of vision, at the peripheral level the functioning of the fish visual system has been extensively studied (especially in morphology and electrophysiology). The great variety of taxonomic groups and ecological niches that we observe in fish, together with their long evolutionary history, account for the surprising diversity documented in the organization and function of eyes of different species (Douglas and Djamgoz, 1990). In contrast, until recently, less was known about the organization and function of higher visual processing stations in the telencephalon, especially in comparison with other, better-studied taxa. In fish, as in amphibians and sauropsids, we do not observe a layered structure resembling the mammalian neocortex, even though of course the general Bauplan of the vertebrate brain is respected (Wullimann, 1997; Northcutt, 2011). In recent years, our knowledge of the brain functioning and neuroecology of various fish groups has greatly increased (for Teleosts see Broglio et al., 2011; for Elasmobranchs see Collin, 2012; Yopak, 2012a,b). This has revealed an astonishing variety in the development and complexity of pallial structures in different fish species, sometimes even when considering species belonging to close groups (Mueller et al., 2008; Mueller and Wullimann, 2009; Rodríguez-Moldes, 2009; e.g., Actinopterygii differ from all other vertebrates in that their telencephalon develops by eversion of the lateral walls and has no lateral ventricles; different species however show great variation in the degree of eversion, and thus in the pallial architecture, Nieuwenhuys, 2011). As has been the case for other non-mammalian vertebrates, in the last decade scientists have started to recognize that the fish telencephalon is not composed mostly of basal ganglia (subpallium), but includes wide pallial regions that bear homologies with the mammalian neocortex.
These pallial structures potentially serve functions similar to the neocortex, instead of being simply devoted to olfactory processing (Wullimann and Mueller, 2004; Jarvis et al., 2005; Portavella and Vargas, 2005; Rodriguez et al., 2006; Costa et al., 2011). Despite these increasingly recognized homologies, the fish brain clearly has less computational power than that available to the primate cortex (Van Essen et al., 1992; Hansel and Sompolinsky, 1996; Kawai et al., 2001; Hill et al., 2003; Horton and Adams, 2005). Thus, the investigation of the perceptual and cognitive functioning of fish can provide information about the complexity of the neural circuitry required for a given function. This is especially true for those visual phenomena that have been traditionally considered limited to humans and only a few other mammals.

We shall review evidence obtained in different fish species concerning perceptual effects that reflect fundamental principles of the visual system functioning. We will highlight the similarities and differences between distant fish groups and with other vertebrates. Across most animal species the visual system faces similar challenges and must fulfill similar requirements to allow meaningful interaction with physical objects and adaptive responses to the external environment. In subsequent sections of the paper, we will focus on four primary tasks that a functional visual system must attain:



# **VISUAL INTERPOLATION PROCESSES: AMODAL COMPLETION AND ILLUSORY CONTOURS**

Visual illusions are instances of systematic discrepancy between a physical description of distal or proximal stimuli and perception. As such, they provide important insight into how the visual system operates (Bruce et al., 2003). In particular, some illusions provide information on how the visual system integrates sensory stimulation into a unified representation (Nieder, 2002). The perception of illusory contours (which are not determined by a contrast gradient in the physical world, **Figure 1**) and the amodal completion of partially occluded objects are primary examples of the visual system's ability to interpolate visual information (Kanizsa, 1979). Both of these phenomena reflect grouping mechanisms that promote processing of objects as wholes and underlying neural mechanisms that represent object boundaries regardless of how they are defined in the sensory input (Sekuler and Palmer, 1992; Palmer, 1999; Kellman et al., 2001, 2005; see Nieder, 2002 for a review of neural mechanisms). These traits are likely to have emerged as a consequence of the adaptive need to segregate in a unitary percept partially occluded objects or objects presented through degraded visual information. In fact, form perception is possible because the visual system processes sensory information about shape, color, distance, and movement of objects according to its own system-specific rules (Kandel et al., 2000). Subjective contours are the manifestation of these principles, the action of a network that is predisposed to complete certain figural elements (Kanizsa, 1976; Gerbino and Salmaso, 1987; Purghé and Coren, 1992; Nieder and Wagner, 1999). The application of these processing principles allows the brain to reconstruct contours missing from the retinal image (Nieder and Wagner, 1999) and to selectively merge only some parts of the visual scene (Kandel et al., 2000).
When perceiving subjective or amodal contours, the visual system's response is based on assumptions about the likely state of things in the external world, rather than on the actual retinal input (Kanizsa, 1979; Day and Kasperczyk, 1983; Kandel et al., 2000). These assumptions are of course not to be understood as conscious explicit inferences, but rather reflect the action of prewired adaptive mechanisms available in the absence of previous experience at the individual level (e.g., Regolin and Vallortigara, 1995).

As we have mentioned above, a similar neural computational mechanism is purported to underlie both modal perception of illusory contours and amodal completion (Kellman and Shipley, 1991; Kellman et al., 2005) (e.g., filling-in mechanisms known in mammals Kellman et al., 1998; Kandel et al., 2000). Comparative research in fish has contributed to support this claim, revealing that species that are sensitive to one of the phenomena tend to also perceive the other (e.g., see Sovrano and Bisazza, 2008, 2009 for redtail splitfin fish).

Moreover, evidence obtained in fish species has helped to understand the phylogenesis of this mechanism. The demonstration of susceptibility to amodal completion and illusory contours in this highly diverse taxonomic group, in addition to birds and mammals, suggests a conserved trait that is widespread in vertebrates and inherited from a common ancestor, rather than a case of convergent evolution in the different classes. In this regard it is particularly interesting to consider the high phylogenetic diversity of the fish species that respond to illusory contours and amodal completion. For example, illusory contours are perceived by teleosts as distant as Ostariophysi (goldfish, *Carassius auratus*) and Acanthopterygii (redtail splitfin fish, *Xenotoca eiseni*) (Wyzisk and Neumeyer, 2007; Sovrano and Bisazza, 2009). Surprisingly, while in the study of Sovrano and Bisazza (2009) redtail splitfins were also able to recognize illusory geometric shapes created by phase shifts or by interruption of diagonal lines, the goldfish tested by Wyzisk and Neumeyer (2007) could not recognize phase-shifted illusory shapes. However, this discrepancy may be due to a methodological problem in the stimuli of Wyzisk and Neumeyer, which consisted of very thin lines, reducing the strength of the illusory perception.

Similarly, amodal completion is observed in two species of Acanthopterygii (*Variola louti* and *Scarus niger*), in addition to the redtail splitfin fish (Sovrano and Bisazza, 2008; Darmaillacq et al., 2011; **Figure 2**). Recently it has been found that even cartilaginous fish (bamboo sharks, *Chiloscyllium griseum*) are susceptible to amodal completion and illusory contours (Fuss et al., 2014), despite being the oldest extant vertebrates and having conserved many of their ancestral traits (Martin et al., 1992).

Remarkable similarities in the distinctive traits of the visual interpolation effects observed in humans and in fish species further support the presence of a conserved mechanism. For example, both in goldfish and in human beings the perception of Kanizsa figures is disrupted by the superimposition of black lines (von der Heydt, 2004; Wyzisk and Neumeyer, 2007). This result, in humans, is considered consistent with the idea that neurons at the level of V2 are responsible for the perception of illusory contours. In primates, 60% of V2 neurons respond to illusory contours (von der Heydt et al., 1984), the same percentage observed in the visual Wulst of owls (Nieder and Wagner, 1999)<sup>1</sup>. This seems to suggest that forebrain structures should provide the neural basis of these phenomena in fish as well. However, a recent study in pigeons challenged the view that forebrain structures are mainly responsible for the perception of illusory contours. This study showed that pre-tectal neurons are capable of responding to real and subjective contours alike (Niu et al., 2006). Whether similar mesencephalic mechanisms are involved in the perception of illusory contours in fish is a question that calls for empirical investigation, in order to shed light on the phylogenesis of this trait in different classes.

# **GEOMETRICAL ILLUSIONS AND HIERARCHICAL PROCESSING**

Another widely studied class of perceptual phenomena, associated with grouping mechanisms, is that of geometrical size illusions, in which properties of a target stimulus, such as length, width, or diameter, are distorted by the surrounding context, providing an important tool for the study of perceptual integration of local elements into global context. Both mammalian and avian species are susceptible to geometrical illusions. For example, let us consider the Ponzo perspective illusion, in which two identical horizontal segments look different in length in the context of two converging lines, with the segment that is closer to the point of convergence appearing longer than the other. This illusion has been demonstrated in horses (Timney and Keil, 1996), monkeys (Bayne and Davis, 1983; Barbet and Fagot, 2002; see also Fujita, 1996), chimpanzees (Fujita, 1997), and pigeons (Fujita et al., 1991, 1993; Fujita, 2006; Nakamura et al., 2006, 2009). Similarly, the Müller-Lyer illusion (in which a line segment with two arrows facing outwards at the end appears longer than one with arrows facing inwards) deceives capuchins and rhesus monkeys (Suganuma et al., 2007; Tudusciuc and Nieder, 2010), as well as gray parrots (Pepperberg et al., 2008) and ring doves (Warden and Baar, 1929).

<sup>1</sup> Interestingly, perception of subjective contours and neurons performing interpolation operations that can support them have been found also in insects (van Hateren et al., 1990; Horridge et al., 1992).

A less clear case seems to be that of the Ebbinghaus illusion, in which a central circle surrounded by large circular inducers is perceived as smaller than an identical circle surrounded by small inducers (**Figure 3**). This is one of the strongest geometrical illusions in humans (Ebbinghaus, 1902), but seems absent or even reversed in non-human primates (Parron and Fagot, 2007) and birds (pigeons and bantams, Nakamura et al., 2008, 2014). In humans this illusion reflects the action of grouping mechanisms (as revealed by the fact that the strength of the illusion is influenced by the distance between the central target and the surrounding inducers, Roberts et al., 2005). Thus, the difficulty of obtaining evidence of its presence in non-human species seems to indicate a radical difference in the functioning of these mechanisms between our species and non-human animals. It has also been suggested that the neural circuitry underlying the perception of the Ebbinghaus illusion might have evolved recently in mammals or even in the primate lineage (Parron and Fagot, 2007; Nakamura et al., 2008). However, this would be surprising given the evidence of widespread susceptibility to amodal completion and illusory contours (reflecting the action of interpolation and grouping mechanisms) in vertebrates ranging from mammals to different fish classes (see Section Visual interpolation processes: amodal completion and illusory contours). Notably, the three studies that failed to demonstrate human-like perception of the Ebbinghaus illusion in non-human animals all involved training and testing of the animals with touch screens, which require the subjects to perform a manipulative response (touching or pecking) and, in the case of pecking, also force a very close view of the stimuli when emitting the response. In humans, the Ebbinghaus illusion is also reduced when tested through motor tasks requiring a manipulative response (Aglioti et al., 1995; Danckert et al., 2002).
This is in line with an involvement of the human neocortex, where two independent neural pathways, the ventral and the dorsal stream, are responsible for visual awareness and for action control, respectively (Goodale and Milner, 1992). Moreover, forcing the subjects to inspect the stimuli from a close distance could have prompted them to pay attention only to the central target or to its immediate proximity. This could have caused the direction of the illusion to be reversed, transforming it into an assimilation illusion (analogous to what is observed in humans when the distal portion of the inducers is not visible, Oyama, 1960; Weintraub, 1979). In support of this interpretation, human-like perception of the Ebbinghaus illusion was reported in a recent study with domestic chicks that employed a more naturalistic training procedure, based on incidental learning, and a test procedure allowing the animals to observe the stimuli at a freely chosen looking distance (Rosa Salva et al., 2013). In this study, subjects that were habituated to finding food behind a screen depicting, for instance, a small orange circle, and then tested with the illusory configurations, preferred to look behind the screen depicting the perceptually smaller circle. Thus, when appropriate procedures are used, avian species are also found to be susceptible to the Ebbinghaus illusion. Most interestingly, using similar naturalistic training and testing procedures, we have recently been able to demonstrate the perception of this geometric illusion in teleost fish, finding that redtail splitfin fish also perceive the Ebbinghaus illusion as a contrast illusion (Sovrano, 2014; Sovrano et al., submitted). Different groups of fish were trained to locate the exit marked by a bigger or a smaller orange circle, in order to escape from the test arena and rejoin conspecifics. When tested with the illusory configurations, fish trained on the bigger orange circle preferred to approach the circle that appeared perceptually bigger in the Ebbinghaus display (i.e., the orange circle surrounded by small gray inducers).
Similarly, fish reinforced on the smaller orange circle preferred to approach the illusory display in which the central circle appeared perceptually smaller (being surrounded by big inducers).

Moreover, in contrast with previous unsuccessful attempts with goldfish (Wyzisk, 2005; but see Herter, 1930 for an earlier report with small sample size, finding discrepant results), a recent study has demonstrated that teleost fish (redtail splitfin) can perceive the Müller-Lyer illusion (Müller-Lyer, 1889), as humans and other vertebrates do (Sovrano, 2014; Sovrano et al., in preparation; **Figure 4**). Fish were trained to discriminate between two lines of different length. Reinforcement was provided by the possibility of rejoining conspecifics by escaping from the test arena through an exit marked by a longer or a shorter line. Then fish were presented with two lines of the same length with two arrow-shaped inducers facing inwards or outwards. Subjects chose the stimulus that, on the basis of the perception of the Müller-Lyer illusion, appeared deceptively longer or shorter, consistent with the condition of training. Curiously enough, another existing study investigating the perception of the Müller-Lyer display in a fish species revealed that bamboo sharks are not deceived by this illusion (Fuss et al., 2014). Elasmobranchs (sharks and rays) belong to the class of cartilaginous fishes. Thus, a possibility for reconciling these contradictory results would be to hypothesize that cartilaginous and bony fish differ in their ability to perceive geometric illusions in general, or the Müller-Lyer display in particular. This would have important implications for our understanding of the phylogenesis of the visual system, indicating that the neural substrate for the perception of this geometrical illusion could have evolved after the separation of cartilaginous and bony fish. Due to the great phylogenetic distance between sharks and teleosts, and in particular to divergent developmental processing (Northcutt, 1977; Wullimann and Mueller, 2004; Nieuwenhuys, 2009), major differences can be observed in the brain organization between these different classes, justifying the idea of a real dissociation of perceptual mechanisms available to cartilaginous and bony fishes. Notably, bamboo sharks tested in the same study were able to perceive Kanizsa figures and illusory contours (Fuss et al., 2014). This could indicate that the perception of subjective contours depends on conserved neural mechanisms that emerged earlier in phylogenesis than those underlying the perception of the Müller-Lyer illusion, which could have evolved after the divergence of cartilaginous and bony fish. Another possible interpretation would be, of course, that the mechanism allowing perception of subjective contours has an adaptive value in a wider range of species, including Elasmobranchs, and has thus evolved independently multiple times. However, caution is needed before venturing too far with evolutionary interpretations on the basis of data collected in only two species and in two studies that employed different training methodologies. In the study of Fuss et al. (2014) sharks were food reinforced for pressing their nose against the wall just below/onto the positive stimulus, implying a very close inspection of the stimuli. In contrast, redtail splitfins learned to use line length to orient in the test tank and locate its exit.
Also, for the bamboo sharks tested by Fuss and colleagues, learning the line-length discrimination task proved much more difficult than the other tasks trained in this study (e.g., in Experiment 3a only three sharks out of eight managed to learn to discriminate three pairs of lines based on their length, and none of them was able to learn the fourth pair proposed). Bamboo sharks thus seem not to be very sensitive to differences in line length in general, even when these differences are real rather than illusory. Interestingly, the goldfish trained by Wyzisk (2005), which also did not seem to perceive the Müller-Lyer illusion, performed even worse in learning the line discrimination task than the bamboo sharks. It is thus possible to hypothesize that the illusion itself could also affect sharks and goldfish, but that its extent, in the version tested by Fuss et al. (2014), would not be enough to create a sufficiently pronounced difference in perceived line length to reliably sustain performance. In fact, one of the six individuals tested in Experiment 3b seemed to be affected by the illusion, systematically choosing the display with inverted arrowheads. In humans, too, the Müller-Lyer illusion evokes only a slight deception and does not affect all individuals (Rivers, 1901; Segall et al., 1966; Berry, 1968), revealing again a striking similarity between the mechanisms present in very distant species.

The perception of geometrical illusions, such as those created by the Ebbinghaus or Müller-Lyer displays, has been often linked to the tendency of a species or of an individual to apply either a more global or a local processing strategy (Parron and Fagot, 2007; Nakamura et al., 2009, 2014; Rosa Salva et al., 2013). In fact, the tendency of the visual system to process visual configurations as wholes, rather than focusing on single details in isolation, allows contextual elements surrounding the target object to distort its perception. Since the seminal work of Navon (1977, 1981), hierarchical stimuli have been used to investigate the interplay of local and global processing in different species and in different tasks. In hierarchical stimuli a bigger global configuration is created by the juxtaposition of many smaller figures (**Figure 5**). The human species seems to be endowed with a remarkably globally-oriented perceptual style that makes us see "*the forest before the trees*" (Navon, 1977). That is to say, in most situations we tend to prioritize the processing of the bigger configuration (global level), rather than of the smaller figures composing it. On the contrary, evidence obtained in non-human primates and in some other species seemed to indicate a general tendency to prioritize the local information about the individual shapes, bringing some authors to suggest that a globally-oriented perceptual style would be limited to humans, with the possible exception of some great apes (e.g., Fagot and Deruelle, 1997; Deruelle and Fagot, 1998; Cavoto and Cook, 2001). Over the years evidence accumulated indicating that this is likely to be an extreme oversimplification. 
For instance, depending on the context of the current task and on viewing conditions, humans can display a locally oriented perceptual style (Kimchi, 1992), whereas pigeons (traditionally considered an exemplar case of locally oriented perception, Cerella, 1980; Cavoto and Cook, 2001) are able to flexibly switch the focus of their attention between the local and the global level (Fremouw et al., 1998, 2002). Notably, the first clear demonstration of global dominance in the perception of hierarchical stimuli in non-human animals was obtained a few years ago in redtail splitfin fish trained according to the same general procedure described above for the demonstration of the Ebbinghaus and Müller-Lyer illusions (Truppa et al., 2010). Again, this suggests that, when ecologically valid training and testing procedures are used, it is possible to demonstrate remarkable similarities in the grouping mechanisms employed by the visual system of fish and of other vertebrates, despite great phylogenetic distance.

**processing of hierarchical stimuli in Xenotoca eiseni**. On the left side are the consistent stimuli presented, in which the same shape is represented at the global and local level; on the right side are the inconsistent stimuli, in which the shape information provided by the local and the global level conflict. Across the three different conditions (**A**, **B** and **C**), stimuli differed in absolute size and in the density of the local elements.
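
The construction of hierarchical (Navon-type) stimuli can be illustrated with a minimal sketch. It is not the procedure used to generate the stimuli in the studies cited above: the 5x5 shape templates, the function names and the text-grid rendering are all hypothetical choices made for illustration. A global shape is built by placing a copy of a local shape wherever the global template is filled, so that consistent stimuli share the same shape at both levels and inconsistent stimuli do not.

```python
# Hypothetical 5x5 templates; '#' marks a filled cell, '.' an empty one.
TEMPLATES = {
    "cross": [
        "..#..",
        "..#..",
        "#####",
        "..#..",
        "..#..",
    ],
    "square": [
        "#####",
        "#...#",
        "#...#",
        "#...#",
        "#####",
    ],
}


def hierarchical_stimulus(global_shape, local_shape):
    """Return text rows in which copies of local_shape tile the global_shape."""
    global_rows = TEMPLATES[global_shape]
    local_rows = TEMPLATES[local_shape]
    blank = "." * len(local_rows[0])
    rows = []
    for g_row in global_rows:
        # Each row of the global template expands into one band of local shapes.
        for l_row in local_rows:
            rows.append(" ".join(l_row if cell == "#" else blank for cell in g_row))
    return rows


def is_consistent(global_shape, local_shape):
    """Consistent stimuli show the same shape at the global and local level."""
    return global_shape == local_shape


if __name__ == "__main__":
    # Inconsistent stimulus: a global cross made of local squares.
    print("\n".join(hierarchical_stimulus("cross", "square")))
```

Varying the template size or the spacing between local elements would correspond, loosely, to manipulating absolute size and element density.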

# **INVARIANCE EFFECTS: IS CORTEX NEEDED FOR INVARIANT COLOR PERCEPTION?**

Some of the visual illusions mentioned above have been hypothesized to reflect the action of adaptations evolved to ensure invariance in perception, despite huge variations in the physical parameters of the retinal input (e.g., Gregory, 1963; but see Humphrey and Morgan, 1965). For example, the Ponzo perspective illusion might involve the same mechanisms that give rise to the perception of size invariance (the tendency to perceive the absolute size of a known object, despite differences in the size of the pattern projected on the retina when the object is viewed from various distances) (Gregory, 1963; Fujita, 1996; but see Georgeson and Blakemore, 1973; Newman and Newman, 1974). Research on fish has a long tradition of investigating size invariance, which has been demonstrated repeatedly in Actinopterygii species (Herter, 1930; Douglas et al., 1988; Schuster et al., 2004; Frech et al., 2012). More recently, form invariance has also been shown in Malawi cichlids (*Pseudotropheus* sp., Schluessel et al., 2014b; see Wood, 2013 for evidence that the ability to form viewpoint-invariant representations of 3D objects represents a core and experience-independent cognitive trait).

Here we will concentrate on an exemplar case, describing the contribution of fish as an animal model of the physiological basis of color invariance, the mechanisms by which the visual system recognizes an object as having a consistent color regardless of the spectral composition of the light reflecting from it at a given moment (see Foster, 2011 for a comprehensive review on this phenomenon). Simultaneous color contrast is a related phenomenon to color invariance. In this case the perceived hue of a small visual region is altered by the presence of a colored surround: gray regions are perceived as of a hue complementary to that of the surround, whereas colored regions assume a hue "away" from that of the surround (Graham and Brown, 1965).

At the behavioral level, research on a very popular model organism, the goldfish, has demonstrated that this species is able to make color-constant judgments, implying the perception of color invariance (Ingle, 1985; Neumeyer et al., 2002). Simultaneous color contrast has been demonstrated in various teleost species, including goldfish and two other cyprinids (*Tinca vulgaris* and *Barbus paripentazona*), two cichlids (*Hemichromis bimaculatus* and *Pterophyllum scalare*) and a gasterosteid, the three-spined stickleback (*Gasterosteus aculeatus*) (Herter, 1950; Dörr and Neumeyer, 1997).

One of the most relevant models for understanding how the visual system could implement color invariance and color contrast effects is the *retinex* model by Edwin Land (Land, 1959a,b, 1983; McCann and Benton, 1969; Land and McCann, 1971; Land et al., 1983). This model theorizes a mechanism that computes, for each visual region, the relations between spectral features, based on the comparison of the lightness information provided by each photoreceptor system, and then collates them between distant regions<sup>2</sup>. The term *retinex* was coined by combining the words retina and cortex, due to the uncertainty about the location of the neural substrate for these computations. Neural mechanisms underlying color invariance have been identified over the years: partial chromatic adaptation (within-class cone adaptation), spatial comparisons of cone and cone-opponent signals, and invariant responses. These operate at different levels in the visual system. An incomplete chromatic adaptation takes place in the retina's horizontal cells and in the geniculate nucleus (Creutzfeldt et al., 1991a,b; Lee et al., 1999). In line with Land's hypothesis, recordings in the retina of goldfish revealed that the horizontal-cell network modulates the processing of cone signals so as to render the ratio of the responses of the three cone systems stable across illumination conditions (Kamermans et al., 1998; Kraaij et al., 1998).
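
The ratio principle behind this kind of account can be made concrete with a toy calculation. The sketch below is not Land's retinex algorithm. It assumes a simplified diagonal model in which each cone class responds with the product of surface reflectance and illuminant intensity in its own band, and it uses the mean surround response as the comparison term. The reflectance and illuminant values, and all function names, are hypothetical. Under these assumptions, raw cone responses change with the illuminant, whereas the within-class ratio of target to surround does not.

```python
import math


def cone_responses(reflectance, illuminant):
    """Toy diagonal model: response = reflectance x illuminant, per cone class."""
    return tuple(r * e for r, e in zip(reflectance, illuminant))


def ratio_descriptor(target, surround):
    """Per cone class, ratio of the target response to the mean surround response."""
    n_classes = len(target)
    means = [sum(s[i] for s in surround) / len(surround) for i in range(n_classes)]
    return tuple(t / m for t, m in zip(target, means))


def describe_scene(target_reflectance, surround_reflectances, illuminant):
    """Return the raw target response and its surround-relative descriptor."""
    target = cone_responses(target_reflectance, illuminant)
    surround = [cone_responses(s, illuminant) for s in surround_reflectances]
    return target, ratio_descriptor(target, surround)


def same_descriptor(a, b, rel_tol=1e-9):
    """Compare two descriptors up to floating-point rounding."""
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


# Hypothetical reflectances (long, middle, short band) and two illuminants.
TARGET = (0.6, 0.3, 0.2)
SURROUND = [(0.5, 0.5, 0.5), (0.2, 0.4, 0.8), (0.7, 0.6, 0.1)]
WHITE = (1.0, 1.0, 1.0)
REDDISH = (1.8, 1.0, 0.5)

if __name__ == "__main__":
    raw_white, desc_white = describe_scene(TARGET, SURROUND, WHITE)
    raw_red, desc_red = describe_scene(TARGET, SURROUND, REDDISH)
    print("raw responses differ:", raw_white != raw_red)
    print("ratio descriptors match:", same_descriptor(desc_white, desc_red))
```

The invariance holds here only because the illuminant multiplies target and surround alike within each cone class; real scenes and real photoreceptors depart from this idealization.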

However, retinal adaptation mechanisms act locally and are not sufficient to fully explain the phenomena associated with color invariance. It is thus believed that, in the primate visual system, computations over spatially extended regions accounting for non-local effects take place in the primary visual cortex V1 or at higher stages of processing (e.g., V4) (Foster, 2011). Two different mechanisms for color invariance have been discovered in primate V1. The first one is still involved in computations over less spatially extended regions and is based on double-opponent neurons that present both color and spatial opponency. This allows the computation of local ratios of cone activity, in line with the predictions of the *retinex* model. Double-opponent cells, before being identified in the primary visual cortex of macaques (Conway, 2001; Conway and Livingstone, 2006), were first discovered in the goldfish retina (Daw, 1967), providing a neural substrate that could partially support color invariance in this species. However, this mechanism can compute the relations between reflectance of nearby areas only. It is thus not sufficient to fully explain color invariance, which involves effects over more spatially extended regions (Land, 1983; Land et al., 1983). In monkeys, networks supporting such comparisons have been identified in V1 and V4 (see Foster, 2011 for a review). In fish there are no known cortex homologs, prompting a question about which neural substrate supports this shared phenomenon within such a differently organized visual system.

<sup>2</sup> In fact, variations in spectral composition of distant regions of the visual field affect the perceived color of an object as much as in nearby regions (Land, 1983; Land et al., 1983).

# **SECOND ORDER MOTION AND BIOLOGICAL MOTION**

Up to now we have explored the perception of static visual objects, with particular attention to grouping mechanisms ensuring the perception of objects as units segregated from the background and to mechanisms that allow objects' properties to be perceived as constant, despite the continuous variation of the physical input reaching the retina. We will now examine the contribution of research on fish species to our understanding of the mechanisms underlying the perception of two peculiar kinds of motion, second-order motion and biological motion. We want to warn the reader, however, that this is a somewhat arbitrary distinction that we follow for the sake of argumentation. For instance, it is well known that motion is an extremely important cue for object-background segregation (biological or agentive motion represents a paradigmatic case in this regard; Bertenthal and Pinto, 1994; Oram and Perrett, 1996; Giese and Poggio, 2003; Ibbotson, 2007; Nishida, 2011).

Objects that are moving in space are changing their current state and need to be more closely monitored than static objects. Immediate recognition and effective processing of movement in a visual scene is thus crucial for survival and widespread in animal species. In contrast, only vertebrates having a more sophisticated visual system (i.e., an elaborated cortex, such as that of mammals) were traditionally supposed to be able to perceive second-order motion (Ohzawa, 1999). Second-order motion is a peculiar type of motion impression elicited by stimuli in which only second-order features, such as contrast, texture or flicker, are moving (also known as non-Fourier motion) (Ramachandran et al., 1973; Chubb and Sperling, 1988; Cavanagh and Mather, 1989). There is electrophysiological, psychophysical and neuropsychological evidence that, in the cortex of mammals, second-order motion is processed by a dedicated stream (Albright, 1992; Zhou and Baker, 1993; Smith et al., 1998; Baker, 1999). This supported the view according to which the perception of second-order motion would represent an instance of "higher level" motion processing, limited to primates and a few other mammals. Despite that, we now know that zebrafish larvae show an optomotor response to motion stimuli that is qualitatively similar to what is observed in primates, reacting in the same way to first- and second-order motion (Orger et al., 2000; see Theobald et al., 2008 for subsequent evidence of second-order motion perception in invertebrates). This strongly undermines the idea that a primate-like organized visual cortex is necessary to perceive second-order motion, suggesting that this is already processed in earlier stages of the vertebrate visual system (possibly even on the basis of retinal sensitivity to some second-order features, Shapley and Victor, 1978; Demb et al., 2001).
However, it is also possible to hypothesize that similar computations to those occurring in the primate cortex to support the perception of second-order motion are carried out by circuitry located in pallial structures of the fish telencephalon (see Jarvis et al., 2005 for a review on the homologies between non-mammalian pallium and mammalian neocortex).

Not all forms of motion are equally relevant for survival: objects belonging to biologically relevant categories, such as conspecifics, prey and predators, can be recognized thanks to the presence of specific movement patterns, typical of animate creatures in general or of a given species in particular. Humans' extreme sensitivity to the motion of biological creatures (biological motion) has been revealed using so-called point-light displays (PLD; Johansson, 1973). In these stimuli only about a dozen isolated points of light are visible, strategically placed on the major limb joints of a moving person (or animal) and presented on an otherwise homogeneous background. As a consequence, PLD provide very little information about the shape or outline of the moving figure, selectively presenting the motion information. Despite the very sparse visual information available in PLD, as soon as these are put in motion, the impression of a moving animate creature is immediately and inevitably elicited in human adults. Human observers are also able to extract rapidly and effortlessly a large amount of information from PLD of biological motion, even in conditions of degraded visual presentation (Runeson and Frykholm, 1983; Bertenthal and Pinto, 1994; Neri et al., 1998; Sumi, 2000; Troje, 2002; Thurman and Grossman, 2008; Alaerts et al., 2011; Sokolov et al., 2011; Pavlova, 2012; Krüger et al., 2013). Specialized neural circuits for the processing of biological motion have been found in the temporal cortex of human and non-human primates (in the superior temporal sulcus, STS, Oram and Perrett, 1994; Grossman et al., 2000; Vaina et al., 2001; Jastorff et al., 2012). This cortical specialization emerges during ontogenesis through the interaction between predisposed mechanisms that prioritize the processing of some specific motion features typical of animate creatures and the extensive expertise we gain by constant exposure to and processing of this sort of stimulus.
In fact, the ability to recognize biological motion depicted in PLD, and the tendency to pay preferential attention to this stimulus, is already present in newborn infants (Simion et al., 2008). Most interestingly, analogous abilities and predispositions to process semi-rigid motion had been previously reported in visually naive newly hatched chicks and quails (Yamaguchi and Fujita, 1999; Regolin et al., 2000; Vallortigara et al., 2005; Vallortigara and Regolin, 2006), suggesting the presence of conserved mechanisms in distant vertebrate species (Johnson, 2006; Troje and Westhoff, 2006; Vallortigara, 2012). Conditioning procedures have been used to show that other species of mammals and birds can also be trained to discriminate biological motion, which could support the idea of homologous mechanisms (Perrett et al., 1990; Omori and Watanabe, 1996; Dittrich et al., 1998; Tomonaga, 2001; Troje and Aust, 2013). However, one of the most remarkable features of human perception of biological motion is the fact that processing of PLD occurs in an effortless and preattentive manner (e.g., Thornton and Vuong, 2004). To understand whether similar mechanisms are also employed by non-human species it is important to test for the presence of spontaneous responses to biological motion stimuli. Until recently, Galliformes were the only group in which researchers had demonstrated a spontaneous response to biological motion resembling what is observed in humans, with the possible exception of female marmosets (Brown et al., 2010). Nothing at all was known about the ability to perceive biological motion in classes other than mammals and birds. To fill this gap, Nakayasu and Watanabe (2014) exploited the spontaneous tendency of medaka fish (*Oryzias latipes*, another member of the class of Actinopterygii, family Adrianichthyidae) to increase shoaling behavior when seeing moving conspecifics.
In this study, medaka fish spent significantly more time swimming along a screen on which they could see a PLD of a swimming conspecific than along a screen on which a PLD of a rigid motion was visible, indicating that visual mechanisms for the detection of biological motion could be evolutionarily more conserved than previously thought. In addition, medakas proved to be able to discriminate different kinds of biological motion, preferring the motion pattern of conspecifics to human motion and being particularly sensitive to the smoothness and the speed of the movement. This is particularly relevant since, in our species too, the speed of movement can drastically alter the perception of biological motion, with abnormal speeds giving the impression of unnatural (e.g., robotic or moon-walk) movements (Barclay et al., 1978). Moreover, both humans (Kozlowski and Cutting, 1977; Barclay et al., 1978; Cai et al., 2011) and the fish tested by Nakayasu and Watanabe (2014) seem to be more affected when the movement sequences are slowed down than when velocity is increased.

# **BINDING OF MULTIPLE PROPERTIES OF VISUAL OBJECTS IN A UNIFIED REPRESENTATION**

In the first part of this review we have concentrated mainly on early visual processes that, starting from a fragmented retinal input, support the creation of a unitary object-percept with invariant properties (e.g., perceptual grouping mechanisms that ensure the processing of an object as a whole, involved in the perception of subjective contours and geometric illusions and possibly in invariance effects; Sekuler and Palmer, 1992; Palmer, 1999; Kellman et al., 2001, 2005). However, in order to interact effectively with objects in the real world, organisms must also conduct more advanced sensory processing that allows them to bind the multiple properties of a given object into a unified higher-level representation. So, after an initial stage of processing carried out by specialized detectors responding selectively to different properties, such as shape, color and movement (Zeki and Shipp, 1988), the visual system must perform the challenging task of perceptual binding in order to allow adaptive behavior in the real world (Treisman, 1996; Roskies, 1999; Wolfe and Cave, 1999). Computationally, binding is considered a highly demanding task, requiring sophisticated neural circuitry to support it. Together with the fact that conjunction tasks seem to be particularly difficult for non-human primates (Smith et al., 2004), the absence of clear-cut evidence of this ability in invertebrates and in vertebrates with "simpler" nervous systems supported the view that only the mammalian cortex (Zeki and Shipp, 1988; Shafritz et al., 2002; Robertson, 2003; Botly and De Rosa, 2009; DiCarlo et al., 2012) or the avian pallium (Cook, 1992; Blough and Blough, 1997; Jarvis et al., 2005; Katz et al., 2010) could provide a neural substrate with enough computational power for binding (Shettleworth, 2008).
In the monkey brain, for example, a higher-level associative region (the superior temporal polysensory area, STPa) contains neurons whose response is driven by a conjunction of the properties of form and motion of walking agents (Van Essen et al., 1992; Oram and Perrett, 1996). Given the seemingly universal adaptive value of the capacity to bind multiple object features in a unified representation, however, it would be surprising if no other animals with complex behavior, outside the mammal and avian classes, had evolved this capability. In fact, earlier reports of binding-like abilities in invertebrates and anuran species (Ewert et al., 1979; Schubert et al., 2002) were recently followed by the demonstration that zebrafish can use feature binding to direct their shoaling behavior (Neri, 2012). To demonstrate true perceptual binding, the animal must, for example, discriminate between two multiple-object sets, each set containing both features in different objects, with the sole cue for discrimination being the way in which the two features are combined in the same visual object (Shepard et al., 1961; Treisman, 1996; Wolfe and Cave, 1999). In the study by Neri (2012), zebrafish spontaneously chose to associate with a "natural" movie of swimming conspecifics rather than with a backward version of the movie, while they did not react to another violation that also created an unfamiliar visual scene (movie presented upside down). In the backward movie, movement and shape information were both still present and virtually unaltered, but were inconsistent with each other. To distinguish the original movie from the backward one, fish needed to integrate form and motion, performing a conjunction task on two attributes that, in primates, are processed by different cortical regions (Zeki and Shipp, 1988; see Sajovic and Levinthal, 1982; Klar and Hoffmann, 2002; Masseck and Hoffmann, 2008, 2009; for evidence of dedicated centers for processing motion information in fish species).
This result was then replicated in the same study (Neri, 2012) with computer generated stimuli that were more controlled, even though less natural: an image representing a side view of a zebrafish was moved along a linear trajectory, which could be either consistent or inconsistent with the orientation of the image of the zebrafish (the direction toward which it was facing). As long as a sufficient number of individuals was depicted in this artificial animation, zebrafish were able to direct their response on the basis of the conjunction of motion direction and shape orientation, even when stimuli were constructed using images of another species (needlefish, *Xenentodon*) or when only the frontal part of a zebrafish image was visible.
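
The logic of a true binding test, as described above, can be sketched in a few lines. The code is an illustrative abstraction and not the analysis used in the cited study: the feature labels, display contents and function names are hypothetical. Two displays qualify as a conjunction test only if they contain the same individual features in the same numbers and differ solely in how those features are paired within objects.

```python
from collections import Counter


def feature_counts(display, index):
    """Count how often each value of one feature occurs in a display."""
    return Counter(obj[index] for obj in display)


def conjunction_counts(display):
    """Count how often each combination of features occurs in a display."""
    return Counter(display)


def requires_binding(display_a, display_b):
    """True if the displays match on every single feature but not on conjunctions."""
    n_features = len(display_a[0])
    same_single_features = all(
        feature_counts(display_a, i) == feature_counts(display_b, i)
        for i in range(n_features)
    )
    return same_single_features and (
        conjunction_counts(display_a) != conjunction_counts(display_b)
    )


# Hypothetical displays: each object is (shape orientation, motion direction).
FORWARD = [("faces_left", "moves_left"), ("faces_right", "moves_right")]
BACKWARD = [("faces_left", "moves_right"), ("faces_right", "moves_left")]
ALL_LEFT = [("faces_left", "moves_left"), ("faces_left", "moves_left")]

if __name__ == "__main__":
    # Same orientations and same motion directions, different pairings.
    print(requires_binding(FORWARD, BACKWARD))
    # These displays already differ in single features, so no binding is needed.
    print(requires_binding(FORWARD, ALL_LEFT))
```

A discrimination between the first pair of displays cannot be solved by attending to orientation or motion alone, which is what makes it diagnostic of binding.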

The implications of these results for our understanding of the way the visual system supports such sophisticated operations are apparent if we consider the vast disparity in available circuitry between primate and teleost (Van Essen et al., 1992; Hansel and Sompolinsky, 1996; Kawai et al., 2001; Hill et al., 2003; Horton and Adams, 2005). This means that the computations necessary for supporting perceptual binding need much less complex neural circuitry than we previously believed (Treisman, 1996; Shafritz et al., 2002; Robertson, 2003). Interestingly, a recent work on imprinting in domestic chicks revealed that these newborn and visually naive subjects spontaneously bind color and shape features into integrated representations at the onset of their experience with visual objects (Wood, 2014), suggesting the presence of a core mechanism devoted to this fundamental task.

# **CONCLUSIVE REMARKS**

We have reviewed studies that reveal the mechanisms used by the visual system of fish for adaptive object perception. The fundamental functioning principles that allow the appreciation of objects as unified entities, segregated from the background and characterized by invariant properties seem to be shared between species belonging to distant vertebrate classes, including the oldest extant jawed vertebrates. Moreover, Actinopterygii belonging to two different orders are able to perceive second-order motion and biological motion, whose perception in humans is ascribed to the action of specialized cortical areas, and to bind motion and shape properties of a single object in a higher order representation. Perceptual binding, in particular, is intimately linked to higher-level cognitive phenomena such as attention (Treisman, 1996; Robertson, 2003) and has been traditionally considered a computationally challenging task, requiring the full power of the mammalian neocortex.

One of the most important implications of these results is that they challenge the assumption that only the mammalian neocortex (or the avian pallium, Jarvis et al., 2005) has the computational power required to perform the sophisticated operations needed to perceive some of the above-mentioned phenomena. The evidence reviewed in this paper must be interpreted in the context of the increasingly recognized presence of pallial structures in the fish telencephalon (e.g., Mueller and Wullimann, 2005, 2009). Nevertheless, the undeniable disparity in available circuitry between primates and teleosts still needs to be considered (Van Essen et al., 1992; Hansel and Sompolinsky, 1996; Kawai et al., 2001; Hill et al., 2003; Horton and Adams, 2005). Existing studies in fish have already given insight into the neural mechanisms that support some of these shared abilities (e.g., in the case of color invariance), providing a most fruitful ground for further investigation. Another crucial aspect highlighted by research in fish is the similarity in the characteristics of the effects observed in distant classes of vertebrates. For example, both in fish and in humans the perception of Kanizsa figures is disrupted by the same manipulation (von der Heydt, 2004; Wyzisk and Neumeyer, 2007), and the perception of biological motion is similarly affected by changes in speed (Kozlowski and Cutting, 1977; Barclay et al., 1978; Cai et al., 2011; Nakayasu and Watanabe, 2014). These remarkable similarities may indicate an analogous organization, in distant vertebrates, of the neural circuitry involved in these perceptual effects.

On the basis of the above-mentioned evidence that suggests "cortical-like" computational circuitry in fish, we can identify some important avenues for future research. First of all, it is necessary to increase our knowledge of the organization and origins of the pallial structures in the fish telencephalon. Only by describing in greater detail the homologies between these structures and those composing the mammalian neocortex will we be able to fully grasp the implications of the behavioral similarities that we have described here. A very promising approach in this regard is that offered by Mueller and Wullimann (2009), who used the zebrafish as a genetic model to search for developmental similarities between teleosts and mammals, with a focus on early gene expression. These authors propose that the telencephalon of teleosts has evolved by partial eversion, recognizing homologies with all four mammalian pallial areas. In the light of the principle that recognition of homologies is independent of function and connectivity, we face some intriguing related questions. For example, are these similar perceptual functions implemented by homologous structures? Do these similar functions require structurally similar circuits sharing some specific patterns of connectivity? And, going back to behavioral research, what is it possible to do with such brains? What is the role of homologies and structural analogies in the determination of the cognitive functions available to an organism?

Fish are an excellent model to investigate perceptual phenomena, not only for their great taxonomic diversity and peculiarly organized telencephalon, but also for the presence of sophisticated visually guided behavior, allowing one to investigate not only perceptual organization, but also higher cognitive visual functions (Schluessel et al., 2012, 2014a,b; Gierszewski et al., 2013; Schluessel, 2014). In addition to being amenable to traditional training procedures, fish perceptual abilities can also be investigated through more naturalistic incidental learning tasks allowing the animal to freely choose the viewing distance from the stimuli (Truppa et al., 2010; Sovrano et al., submitted; in preparation). In this regard it is important to consider the evidence that we have summarized on the perception of the Ebbinghaus illusion, of the Müller-Lyer illusion and on the processing of hierarchical stimuli (see Section Geometrical illusions and hierarchical processing). These three cases beautifully exemplify the importance of the availability of a number of procedures that can be employed in the same species. This possibility is a necessary prerequisite for a meaningful comparison of the results obtained in different species. We have, in fact, seen that the task context may actually account for the apparent inter-species differences observed in the susceptibility to perceptual phenomena. In addition to advocating caution with the interpretation of evidence obtained in very diverse settings, we can also propose an avenue for further research. Future studies should systematically explore, on the same set of animal models, the effect of the different tasks that are typically applied to different species. For example, it would be interesting to adapt to fish species the touch-screen/Skinner-box procedures that are usually employed with pigeons and other birds.
Fish can be trained to respond by touching the stimuli or pressing a button in order to obtain a food reward in the close proximity of the visual display. In this case, would they flexibly change their response similarly to what is seen in avian species? Would they adopt a more locally oriented perceptual style and a smaller attentional focus? It is interesting to note that the bamboo sharks tested by Fuss et al. (2014), which did not seem to perceive the Müller-Lyer illusion, were trained to respond by pressing their snout on the stimuli. Unfortunately, this is the only study that investigated the perception of this illusion in a cartilaginous fish species. We are thus unable to draw firm conclusions from this evidence, pointing once again to the need of a systematic investigation of this issue.

Most interestingly, recent studies have also started to exploit the spontaneous shoaling behavior of fish (Neri, 2012). This offers a great opportunity to study homologies in phenomena such as biological motion, whose perception in humans stands out for occurring in an effortless and preattentive manner (e.g., Thornton and Vuong, 2004). Indeed, spontaneous social responses to biological motion have been shown in naive chicks (Vallortigara et al., 2005; Vallortigara and Regolin, 2006), and, recently, also in medaka fish (Nakayasu and Watanabe, 2014). This highlights another promising avenue for future research, which could put the study of perceptual processes and of their neural bases in the context of social behavior. A similar approach has been used with galliform chicks. Research in domestic chicks revealed that they are endowed with a set of unlearned perceptual and cognitive mechanisms that predispose them to appropriate social interactions. These early mechanisms are, thus, tightly linked to the evolutionary pressures posed by the social environment. Overall, chicks' perceptual and cognitive predispositions ensure preferential processing of stimuli associated with conspecifics, direct imprinting toward appropriate stimuli, maintain the brood cohesion and facilitate social learning (e.g., Regolin and Vallortigara, 1995; Johnston et al., 1998; Rosa Salva et al., 2009, 2010, 2011, 2012, in press; Daisley et al., 2010; Mascalzoni et al., 2010; Regolin et al., 2011; Vallortigara, 2012). With regard to fish species, a related approach can be found in the work of Rui Oliveira. This research is centered on the study of social competence and of the cognitive processes involved in it, with an integrative approach and a particular focus on the zebrafish as an animal model (Oliveira, 2012; Taborsky and Oliveira, 2012).
Among other things, these studies aim to understand how the brain translates social information into flexible behavioral responses, how this affects individual fitness, and how this process is constrained by the individual developmental history or by tradeoffs with other adaptive competences (Taborsky and Oliveira, 2012). Teleost fish represent an ideal model to identify basic information processing mechanisms that provide the functional building blocks of social behavior across different species with varying social systems. In fact, among teleosts we have a pronounced diversity of social systems in closely related species. This allows for planned phylogenetic comparisons of perceptual and cognitive abilities. Moreover, model species such as zebrafish also offer genetic tools for the study of selected neural circuits (Oliveira, 2012), making this a most promising field of research for future interdisciplinary studies.

Future studies should thus capitalize on the potential insights offered by fish species to understand the evolution of the vertebrate visual system, especially by further investigating the neural correlates of perceptual organization in species belonging to distant taxa. In this regard, an important aim for future work should be to increase our knowledge of the perceptual abilities of species specifically selected because of their informative value, based on their phylogenetic relation with other species of known perceptual abilities. A particular case is that of jawless fish (Agnatha), such as lampreys and hagfish, whose susceptibility to some fundamental perceptual phenomena has never been tested, despite their great phylogenetic interest.

#### **ACKNOWLEDGMENTS**

We thank Dr. Sang Ah Lee for revising the English text.

## **REFERENCES**


Herter, K. (1930). Weitere Dressurversuche an Fische. *Z. Vgl. Physiol.* 11, 730–748.

Herter, K. (1950). Uber simultanen Farbkontrast bei Fishen. *Z. Vgl. Physiol.* 60, 283–300.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 July 2014; accepted: 09 September 2014; published online: 29 September 2014*.

*Citation: Rosa Salva O, Sovrano VA and Vallortigara G (2014) What can fish brains tell us about visual perception? Front. Neural Circuits 8:119. doi: 10.3389/fncir.2014.00119 This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Rosa Salva, Sovrano and Vallortigara. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Concept learning and the use of three common psychophysical paradigms in the archerfish (Toxotes chatareus)

# *Cait Newport<sup>1</sup>\*, Guy Wallis<sup>2,3</sup> and Ulrike E. Siebeck<sup>1</sup>*

<sup>1</sup> Laboratory for Visual Neuroethology, School of Biomedical Sciences, The University of Queensland, Brisbane, QLD, Australia

<sup>2</sup> Centre for Sensorimotor Neuroscience, School of Human Movement Studies, The University of Queensland, Brisbane, QLD, Australia

<sup>3</sup> Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia

#### *Edited by:*

David D. Cox, Harvard University, USA

#### *Reviewed by:*

Ronen Segev, Ben-Gurion University, Israel

Philip M. Meier, Brain Corporation, USA

#### *\*Correspondence:*

Cait Newport, Laboratory for Visual Neuroethology, School of Biomedical Sciences, The University of Queensland, Building 64, Brisbane, QLD 4072, Australia e-mail: c.newport@uq.edu.au

Archerfish are well known for their specialized hunting technique of spitting water at prey located above the water line. This unique ability has made them a popular focus of study as researchers try to understand the mechanisms involved in targeting and spitting. In more recent years, archerfish have also become an increasingly popular model for studying visual discrimination and learning in general. Until now, only the alternative forced-choice (AFC) task has been used with archerfish; however, they may be capable of learning other classical discrimination tasks. As well as providing alternative, and potentially more efficient, means for testing their visual capabilities, these other tasks may also provide deeper insight into the extent to which an organism with no cortex can grasp the concepts underlying these tasks. In this paper, we consider both the matched-to-sample (MTS) and the odd-one-out (OOO) tasks, as they require the subject to learn relatively sophisticated concepts rather than a straight stimulus-reward relationship of the kind underlying AFC tasks. A variety of line drawings displayed on a monitor were used as stimuli. We first determined if archerfish could complete the MTS and OOO tests and then evaluated their ability to be retrained to new stimuli using a 4-AFC test. We found that archerfish were unable to learn the MTS and had only a limited capacity for learning the OOO task. We conclude that the MTS and OOO are impractical as paradigms for behavioral experiments with archerfish. However, the archerfish could rapidly learn to complete an AFC test and select the conditioned stimulus with a high degree of accuracy when faced with four stimuli, making this a powerful test for behavioral studies testing visual discrimination. In addition, the fish were able to learn the concept of oddity under particular training circumstances. This paper adds to the growing evidence that animals without a cortex are capable of learning some higher order concepts.

**Keywords: visual discrimination, behavior, matched-to-sample, alternative forced-choice, odd-one-out**

# **INTRODUCTION**

For many organisms, vision represents the primary source of sensory information for guiding behavior. However, to date, the majority of what we have learnt about the processing of visual information has been gleaned through the study of a remarkably small range of higher vertebrates (cat, rabbit, monkey, and human). Because these animals all possess a cerebral cortex, many visual tasks, including object recognition, have been investigated in the context of the considerable processing capacity which a cortex provides, permitting the development of complex and/or highly specialized models of how we solve specific visual recognition tasks. However, there is evidence to suggest that much simpler models may be sufficient to explain certain visual recognition abilities. One way to understand more about the mechanisms underlying visual recognition is to determine how animals lacking a cortex process complex visual information. If, for example, animals without a cortex are able to perform specific tasks competently, it suggests that, in that instance at least, specialized cortical systems may well not be required after all. Conversely, if they struggle to perform a task this may indicate a significant processing contribution of the cortex in that case. Fish represent an ideal model organism as they lack a cortex, yet show sophisticated visual behaviors and can be trained to complete behavioral experiments.

The majority of our knowledge about the visual system of fish comes from the fields of morphology and electrophysiology, with only relatively few studies choosing to employ behavioral experiments to explore the animal's visual abilities. Psychophysical (behavioral) tests offer an important means for determining properties of the visual capabilities of fish (e.g., absolute sensitivity, contrast sensitivity, spatial resolution, spectral sensitivity) but they can also be designed to provide important information about the underlying mechanisms of information processing. One area that has been explored behaviorally is how fish discriminate and/or categorize shapes. These studies have shown that fish can perform seemingly complex visual tasks such as image categorization (Schluessel et al., 2012), amodal completion (Sovrano and Bisazza, 2008), and perception of illusory contours (Wyzisk and Neumeyer, 2007). A range of species have been used in experiments including goldfish (Mackintosh and Sutherland, 1963; Bowman and Sutherland, 1969, 1970; Sutherland, 1969; Sutherland and Bowman, 1969; Douglas et al., 1988; Wyzisk and Neumeyer, 2007), redtail splitfin (Truppa et al., 2010), cichlids (Schluessel et al., 2012), damselfish (Mussi et al., 2005; Siebeck et al., 2009, 2010), groupers (Darmaillacq et al., 2011), parrotfish (Darmaillacq et al., 2011), weakly electric fish (Schuster and Amtsfeld, 2002; von der Emde et al., 2010), rays (Van-Eyk et al., 2011), and archerfish (Schuster et al., 2004; Schlegel et al., 2006; Segev et al., 2007; Ben-Simon et al., 2012b; Gabay et al., 2013; Newport et al., 2013; Rischawy and Schuster, 2013).

Archerfish are becoming increasingly popular as subjects for visual discrimination studies due in part to their unique hunting technique of knocking down insects in overhanging foliage using a jet of water. Several studies have focused on the mechanisms required for spitting (Milburn and Alexander, 1976; Waxman and McCleave, 1978; Elshoud and Koomen, 1985; Timmermans, 2000; Timmermans and Vossen, 2000; Timmermans, 2001; Rossel et al., 2002; Timmermans and Souren, 2004; Schuster et al., 2006; Schlegel and Schuster, 2008; Vailati et al., 2012) as well as their visual capabilities (Braekevelt, 1985a,b; Temple et al., 2010; Ben-Simon et al., 2012b; Temple et al., 2013). Recently a number of studies have also focused on the neural mechanisms of visual discrimination (Schuster et al., 2004; Schlegel et al., 2006; Segev et al., 2007; Ben-Simon et al., 2012a; Ben-Tov et al., 2013; Gabay et al., 2013; Rischawy and Schuster, 2013).

The goal of visual discrimination studies is to understand the circumstances under which a subject can perform relevant learning and discrimination, and beyond that, the robustness of the underlying representations to new exemplars of a target or to other objects within a category. In general terms, discrimination tasks in fish operate in a manner not unlike those conducted on human subjects. Visual stimuli are presented to the subject and some form of behavior is recorded as a response. Psychophysics tests can rely on observations of innate behaviors such as optomotor response or eye movements, as well as learned behaviors instantiated through classical and/or operant conditioning. Archerfish are particularly well suited for operant conditioning experiments as they are easily trainable, highly motivated, and their method of stimulus selection (i.e., hitting stimuli with a jet of water) produces an easily measurable response.

There are a number of psychophysical tests that can be employed to test the visual capabilities of fish (Schuster et al., 2011); however, a common approach is the two-alternative forced-choice (2-AFC) task. In this task, subjects are conditioned to associate a particular stimulus with a reward. The test involves identifying the conditioned stimulus (S+) when it is presented together with a single unconditioned distracter stimulus (S−). Archerfish have also been trained successfully to complete a 4-AFC task in which S+ is one of four stimuli (Ben-Simon et al., 2012b; Newport et al., 2013). While this test can be used to answer a wide range of questions about what an animal can discriminate, the conditioning process can be arduous as subjects have to be retrained to a new set of S+/S− stimuli following the completion of a particular experiment.

There are other psychophysical tests that do not require conditioning to particular stimuli but instead rely on the subject's ability to learn associative rules such as the matched-to-sample (MTS) and odd-one-out (OOO) tasks. In the MTS task, the goal is for the subject to match a sample stimulus with a comparison stimulus (S+) shown in the presence of a distractor (S−). The sample can either be shown together with the comparison and distractor stimuli (simultaneous MTS), or the sample can be shown prior to the comparison and distractor stimuli being presented (delayed MTS). The delayed MTS can be used as a test of both working memory and visual discrimination ability. In a complementary method to the MTS, called the oddity-from-sample (OFS), the rewarded stimulus is the one which does not match the sample. Both reward systems require that subjects are able to discriminate the stimuli and to remember the sample.

The OOO task requires that subjects select a stimulus that is different amongst a set of like distracters. Unlike the delayed MTS/OFS paradigm, the OOO places only weak, if any, demands on working memory; subjects must simply discriminate between stimuli. However, crucially, in both types of task, subjects must learn the general concept of the task rather than simply associating a particular stimulus with a reward. Although conceptually more challenging, the advantage of the MTS/OFS and OOO tasks is that subjects do not need to be continually retrained to new stimuli. Not only can this decrease the time required to run an experiment, but it also means that the discrimination capabilities of the subject can be tested with many stimuli, not just a particular conditioned one. It also makes it possible to reverse the role of test stimuli between target and distractor, reducing the chance that behavior is being driven by some inherent affinity the subject has for a particular visual feature, brightness level, etc.

Knowing that archerfish can complete the MTS/OFS and OOO would be useful for the design of future discrimination experiments for several practical reasons, but may also provide insights into the cognitive abilities of these fish, namely their capacity for concept learning. Humans are notable in the animal kingdom for their extensive use of advanced concepts which are the foundation for the creation of language and numbers. Learning concepts can provide significant advantages to animals by allowing them to transfer previously gained knowledge to new objects and situations. As a result, it is reasonable to assume that these abilities did not arise solely in humans but have origins in other animals. Indeed, reports that animals can learn concepts (see Zentall et al., 2008, for a review of concept learning in animals) provide further evidence for this hypothesis. In humans the area of the brain associated with conceptual learning is the cortex (Martin, 2007; Binder and Desai, 2011). If fish, which lack a cortex, are unable to learn either of these tests it may suggest that they have trouble learning the associated concept and that the cortex is a requirement of higher learning. Likewise, if archerfish are able to learn the OOO task but not the delayed MTS/OFS task it may imply that they can learn concepts but do not have an adequate working memory. As a result, the inability of fish to perform a specific task may be just as telling as their ability to do it.

Most visual experiments involving fish have so far used AFC tasks; however, Goldman and Shapiro (1979) and Zerbolio and Royalty (1983) did show that goldfish could complete a simultaneous MTS/OFS task. In a more recent study, experimenters were unable to train cichlids to complete a similar simultaneous MTS task (Gierszewski et al., 2013). It is important to note that all tests with fish have used a simultaneous MTS/OFS test where three stimuli were presented in each trial (the sample, S+ and S−). While subjects could solve the task by matching the sample with the comparison stimulus, it could also be solved by simply selecting or avoiding the stimulus that is different from the other two. As a result, it is impossible to determine if the fish had learned a matching task or an oddity task. We aimed to determine if archerfish could complete either or both of these tasks. To ensure that the fish were learning the MTS/OFS and not simply solving based on oddity, a delayed MTS/OFS was tested for the first time. As a comparison, the archerfish were additionally trained to complete a 4-AFC test. These results were used to evaluate how quickly archerfish could be retrained to new stimuli.

# **MATERIALS AND METHODS**

#### **SUBJECTS**

Seven large-scale archerfish (*Toxotes chatareus*; Hamilton, 1822) were purchased from local suppliers. The total length ranged from 6 to 10 cm. All fish were kept in accordance with The University of Queensland Animal Ethics Committee approval (AEC Approval number: SBMS/241/12). Subjects were housed in individual aquaria (30 cm × 30 cm × 60 cm) that served as both a holding and experimental tank. The fish were kept under a 12:12 h light:dark cycle using full spectrum fluorescent lights (F36T8/840, Cool White, Crompton, Australia) and supplied with recirculating fresh water maintained at 24 ± 0.5°C. Opaque dividers were placed between aquaria to ensure fish were unable to see each other, and therefore eliminate the possibility of observational learning. Fish were fed mini pellets (Cichlid Gold®, Kyorin Co. Ltd., Japan) daily as part of experiments. The fish had different levels of previous experience; however, all subjects had at least been pre-trained to spit at stimuli presented on a monitor, following methods described in Newport et al. (2013).

#### **APPARATUS**

Stimuli were displayed on a 15-inch LCD monitor (SyncMaster 153v, Samsung) with a Plexiglas housing. This was suspended above the aquaria and oriented parallel to the water's surface, as described in Newport et al. (2013). The stimuli were presented in different positions on the monitor depending on the experimental paradigm (see General Procedure). All stimuli were created using Microsoft PowerPoint and Adobe Photoshop CS5 (**Table 1**) and were 2.5 cm × 2.5 cm in size.

#### **GENERAL PROCEDURE**

Our aim was to test whether archerfish could learn the concepts required to solve OOO and MTS tests. A total of four experiments were conducted: (1) the OOO, (2) the delayed MTS/OFS, (3) the simultaneous MTS/OFS and (4) the 4-AFC. Different approaches can be used to train subjects and because neither the OOO nor MTS/OFS tests had been tested in archerfish before, the ideal training procedure was unknown. As a result, a series of training approaches was attempted so that if one method did not work, it would be possible to progress to a new one. A variety of simple line drawings (see **Table 1**) were used as stimuli in all experiments. These stimuli were chosen because Newport et al. (2013) showed that archerfish were able to easily discriminate these shapes. In our previous study, archerfish were trained to discriminate four shapes using a 4-AFC test (one S+ and three different S− stimuli). These results not only showed that archerfish use a variety of strategies when making decisions about stimuli but also that they are able to discriminate four trained shapes from 60 novel ones. Here, we also use shapes because they are easily discriminable by archerfish and therefore any breakdown in performance was more likely due to problems with the test itself and not the stimuli used. Methods for each experiment are described in detail below but see **Table 1** for a summary of all methods and stimuli used.

In all experiments, archerfish selected a stimulus by hitting it with a jet of water (referred to as 'a hit'). The fish were rewarded with one food pellet each time they correctly hit S+. Incorrect choices terminated the trial without a reward and stimuli were removed from the monitor, except in some initial training sessions (the first 1–2 sessions) where the fish were given the opportunity to select various stimuli until they hit S+, at which point they were rewarded. This was to help the fish learn which stimulus was correct. In all following sessions the fish were only given one chance to make a selection. A squeegee was then used to remove water from the Perspex® monitor cover. The next trial began after a brief delay. An individual was considered to have successfully learned the task once performance was significantly different from chance for two consecutive sessions (see Statistical Analysis for statistical calculations).

#### *Odd-one-out*

Four fish (Fish 1, 2, 3, and 4) were trained to select the odd stimulus (S+) out of three other identical stimuli (S−). Four shapes (S1, S2, S3, and S4) were used as stimuli (**Table 1**) and all shapes could be both rewarded and unrewarded depending on whether they were acting as S+ or S−. In any given trial, only two of the four possible stimuli were presented, one being S+ and the other S−. There were four stimulus display positions on the monitor (monitor coordinates: −200 150, 200 150, −200 −150, and 200 −150) and the positions of all stimuli were randomized in all experiments with the constraint that S+ was never in the same position in consecutive trials (**Figure 1A**). Sessions were run until each subject completed 10 sessions (20 trials per session). If the subjects were able to successfully complete the task, two transfer sessions were run in which the four familiar shapes were exchanged for four novel stimuli. The transfer tests served to show if the fish had learned the concept of the OOO test in which case they should be able to transfer this knowledge to new stimuli.
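The OOO trial structure described above can be sketched in a few lines of Python. This is only an illustration, not the software used in the study: the function name `make_ooo_session` and the dictionary layout are hypothetical, and for brevity the sketch draws the two shapes at random on each trial without balancing how often each shape appears. The four monitor positions, the 20 trials per session, and the rule that S+ never occupies the same position in consecutive trials are taken from the text.

```python
import random

# Four display positions (monitor coordinates given in the text).
POSITIONS = [(-200, 150), (200, 150), (-200, -150), (200, -150)]
SHAPES = ["S1", "S2", "S3", "S4"]  # any shape can act as S+ or S-


def make_ooo_session(n_trials=20, rng=None):
    """Build one odd-one-out session as a list of trial dictionaries."""
    rng = rng or random.Random()
    trials = []
    last_pos = None
    for _ in range(n_trials):
        # Two of the four shapes per trial: one odd item (S+), one repeated (S-).
        s_plus, s_minus = rng.sample(SHAPES, 2)
        # S+ may not reuse the position it had on the previous trial.
        pos = rng.choice([p for p in POSITIONS if p != last_pos])
        layout = {p: (s_plus if p == pos else s_minus) for p in POSITIONS}
        trials.append({"s_plus": s_plus, "s_minus": s_minus,
                       "s_plus_pos": pos, "layout": layout})
        last_pos = pos
    return trials
```

Passing a seeded generator, for example `make_ooo_session(20, random.Random(1))`, gives a reproducible session that can be inspected before use.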

#### *Matched-to-sample/oddity-from-sample*

*Delayed MTS/OFS.* In the delayed MTS/OFS paradigm, the subject was first presented with a sample followed by a pair of comparison stimuli, one of which was identical to the sample. In the MTS task, the subject must select the comparison stimulus that matches the sample to receive a food reward. In the OFS task, the subject must select the stimulus that is different to the sample (**Figure 1B**). Two fish (Fish 3 and 6) were trained to the MTS task and a further two fish (Fish 5 and 7) were trained to the OFS task throughout all MTS and OFS experiments. The reason for training fish on both the MTS and the OFS tasks was that Newport et al. (2013) found that when archerfish learn a 4-AFC task, S− plays an important role in learning and the archerfish develop a strong association with S−. As a result, we hypothesized that archerfish may find the task easier if they were required to avoid the stimulus that matched the sample. Either approach provides a valid test of the fish's ability to discriminate the two stimuli.

See text for rewarded and unrewarded conditions.

The training consisted of two steps. In step 1, 10 different shapes were used as stimuli (**Table 1**) and all shapes were used as both S+ and S−. A trial began when the sample stimulus was displayed in the center of the monitor (monitor coordinates: 0 0). Once the archerfish hit the sample, the experimenter pressed a key, which removed the stimulus from the monitor and caused S+ and S− to be presented on either side of where the sample stimulus had been shown (monitor coordinates: −90 0, 0 90). The positions of S+ and S− were randomized under the constraint that S+ was never in the same position in more than two consecutive trials and that S+ and S− were presented on each side equally often. The fish were rewarded with one food pellet every time they hit S+. Incorrect choices terminated the trial without a reward and stimuli were removed from the monitor. Between trials, a squeegee was used to remove water that had accumulated on the Perspex® monitor cover. Daily training sessions consisted of 20 trials, except in rare cases where a fish would not complete every trial within a session due to variations in motivation. A total of 19 sessions was completed by all fish. In addition, the two fish that were trained to MTS were given an extra 10 pre-trials where only S+ was displayed after the sample. This was intended to reinforce the association between the sample and S+. The two fish trained to OFS were not given these pre-trials as it was impossible with this experimental design.
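One simple way to satisfy both randomization constraints for the two comparison positions (each side used equally often, and S+ never on the same side for more than two consecutive trials) is rejection sampling over shuffled sequences. The Python sketch below is an assumption about how such a sequence could be generated, not a description of the software actually used; the names `balanced_sides` and `longest_run` and the labels `"left"` and `"right"` are hypothetical.

```python
import random


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    best = run = 0
    prev = object()  # sentinel that equals no real item
    for item in seq:
        run = run + 1 if item == prev else 1
        best = max(best, run)
        prev = item
    return best


def balanced_sides(n_trials=20, max_run=2, rng=None):
    """Side of S+ on each trial: equal counts per side, no run longer than max_run."""
    if n_trials % 2:
        raise ValueError("n_trials must be even to balance the two sides")
    rng = rng or random.Random()
    sides = ["left", "right"] * (n_trials // 2)
    while True:
        # Reshuffle until the run-length constraint holds.
        rng.shuffle(sides)
        if longest_run(sides) <= max_run:
            return list(sides)
```

For a 20-trial session a few percent of all balanced shuffles satisfy the run constraint, so the loop typically ends after a few dozen attempts.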

In step 2, the number of stimuli was reduced to three, all of which were used as both S+ and S−. These stimuli were different to those presented in step 1 (**Table 1**). The number of stimuli was reduced because Goldman and Shapiro (1979) were successful at training goldfish to complete a simultaneous MTS task using only three stimuli. The procedures were otherwise identical to those of step 1. Sessions consisted of 20 trials and 10 sessions were completed.

**FIGURE 1 | […] sample (MTS/OFS), and the odd-one-out (OOO) task.** Stimuli were a range of black line drawings (not drawn to scale in figure) on a white background, presented on a computer monitor suspended directly above the aquarium. **(A)** Odd-one-out. The archerfish were presented with four stimuli, three identical S− and one different S+. These stimuli could appear in any of four possible positions on the monitor. The archerfish were required to select the single reward stimulus (S+). In this case the correct response is indicated as a dashed line representing a correctly aimed spit response. **(B)** Delayed MTS/OFS. The archerfish were presented with the sample stimulus in the middle of the monitor, shown here as S. The archerfish were required to hit the sample stimulus in order to trigger the display of the comparison stimuli and the removal of the sample. Of the two comparison stimuli, one stimulus was identical to the sample and the second stimulus was different from the sample. The fish was required to select the matching stimulus in the MTS test or select the different stimulus in the OFS test. In the figure, an example of a correct response is indicated as a spit to the reward stimulus. **(C)** Simultaneous matched-to-sample/oddity-from-sample. Similarly to the delayed MTS/OFS, a sample stimulus was presented in the middle of the monitor (S). However, once the archerfish hit the sample it remained on the monitor and the two comparison stimuli (S+ and S−) were immediately presented. The archerfish then selected either S+ or S− but selection of the sample stimulus was neither rewarded nor penalized.

*Simultaneous MTS/OFS.* The methods used by Goldman and Shapiro (1979) were replicated to train the archerfish to complete a simultaneous MTS/OFS task. The difference between the simultaneous and delayed MTS/OFS is that the sample remains in place and the two stimulus choices (S+ and S−) appear on either side of the sample once the archerfish hits the sample (**Figure 1C**). In this situation there is no consequence to hitting the sample (it is neither rewarded nor causes the termination of a trial), so this is still considered a two-choice test and selection frequency is expected to be 50% if at chance. All other components of the procedure were the same as in the delayed MTS/OFS step 2, including the stimuli used. A total of 40 sessions was attempted for all fish; however, due to variations in motivation not all fish completed the full 40 sessions.

At the conclusion of the simultaneous MTS experiment, a control test was run to determine if the archerfish could discriminate the three shapes. This was done in order to eliminate the possibility that the archerfish were unable to complete this task due to a breakdown in discrimination ability. Two fish (5 and 7) were presented with a 3-AFC task and were trained to select one S+ from two different S−. Each fish was trained to a different S+ to ensure that an individual S+ was not affecting performance. Stimuli were presented in the same positions as described for all MTS tasks. Fish 3 and 6 did not complete this control test due to a lack of motivation to participate in any further testing. Fish 5 was trained to select S1 and Fish 7 was trained to select S2 (**Table 1**).

#### *Four-alternative forced-choice*

The archerfish were trained to complete a 4-AFC test in which four stimuli were presented in each trial (one S+ and three identical S−). To determine how many sessions were required to retrain the fish to novel stimuli, the fish were then conditioned to two novel stimuli. A further test was run with another two novel stimuli to determine if retraining to new stimuli required fewer sessions when the fish had practice. Finally, a test was run with all three conditioned pairs in the same session. This was done to determine if archerfish could remember up to three conditioned stimulus pairs at the same time, which may allow for greater flexibility in the design of future experiments.

Four fish (Fish 1, 2, 3, and 4) were conditioned to discriminate between one cross (S+) and three identical squares (S−). There were four stimulus display positions on the monitor (monitor coordinates: −200 150, 200 150, −200 −150, and 200 −150) and the positions of all stimuli were randomized in all experiments with the constraint that S+ was never in the same position in consecutive trials. Sessions consisted of 20 trials and were run until each subject had completed a minimum training criterion of five sessions and reached an S+ selection frequency ≥70% in two consecutive sessions. This criterion was chosen because in order for a task to be used as a visual discrimination test, subjects should be able to complete each training task with a high degree of accuracy and should demonstrate consistency in their performance. This is to ensure that when analyzing performance during transfer tests with new stimuli, any changes in fish behavior are due to the new stimuli and not simply stochastic variation. Archerfish have been shown to reach accuracy levels of up to 95% when presented with a 4-AFC test with shapes as stimuli (Newport et al., 2013) which is much higher than required for significance.
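The training criterion can be expressed as a small check over per-session hit counts. The sketch below is a hypothetical helper (`sessions_to_criterion` is not a name from the study) and rests on one reading of the criterion that should be treated as an assumption: at least five sessions must have been completed, and the two most recent consecutive sessions must both show an S+ selection frequency of at least 70%.

```python
def sessions_to_criterion(hits_per_session, n_trials=20,
                          min_sessions=5, threshold_pct=70):
    """Return the number of sessions needed to meet the criterion, or None."""

    def passed(hits):
        # Integer arithmetic avoids floating-point issues at exactly 70%.
        return hits * 100 >= threshold_pct * n_trials

    for i in range(1, len(hits_per_session)):
        sessions_done = i + 1
        if (sessions_done >= min_sessions
                and passed(hits_per_session[i - 1])
                and passed(hits_per_session[i])):
            return sessions_done
    return None
```

With 20 trials per session, 70% corresponds to 14 correct hits, so a fish scoring 15 and 16 hits in its fourth and fifth sessions would meet the criterion at session five under this reading.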

The stimuli were then replaced with a second pair, a triangle (S+) and three identical stars (S−), and the same method was repeated as described above. After each fish had completed the required training sessions, a third pair of stimuli was introduced: an arrow (S+) and three identical crescents (S−). Once the fish had learned all three stimulus pairs, a test was run to determine if the fish could continue to complete the task when all three pairs were presented within the same session. For each trial, one pair was chosen at random with the restriction that the same pair was not shown in two consecutive trials and all pairs were shown equally often. Two test sessions were run. See **Table 1** for stimuli.

#### **STATISTICAL ANALYSIS**

Selection frequencies for each stimulus type (S+ or S−) were calculated for each condition per fish by tallying the number of hits for all trials per session. The raw data were analyzed using a Chi-square test. In both the AFC and OOO paradigms four stimuli are presented. As a result, the expected selection frequency of S+ if chosen at random is 25%. A selection frequency of S+ ≥45% (*n* = 20 trials) is statistically significant (*P* = 0.039). In the MTS/OFS task, only two stimuli can be chosen, so the expected selection frequency of either stimulus is 50% if chosen at random. A selection frequency ≥75% (*n* = 20 trials) is statistically significant (*P* = 0.025).
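The significance thresholds quoted above are consistent with a Chi-square goodness-of-fit test on hits versus misses (one degree of freedom). The sketch below illustrates that calculation using only the Python standard library; the function name is hypothetical, and treating the test as a two-category (S+ versus not S+) comparison without continuity correction is an assumption inferred from the reported *P* values rather than something stated in the text.

```python
import math


def chi_square_p_hits(hits, n_trials, p_chance):
    """P value of a 1-df Chi-square goodness-of-fit test on hits vs. misses."""
    expected_hits = n_trials * p_chance
    expected_misses = n_trials - expected_hits
    misses = n_trials - hits
    stat = ((hits - expected_hits) ** 2 / expected_hits
            + (misses - expected_misses) ** 2 / expected_misses)
    # Survival function of a Chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


# 45% of 20 trials = 9 hits (chance 25%); 75% of 20 trials = 15 hits (chance 50%).
for label, hits, p_chance in [("4 stimuli, 9/20", 9, 0.25), ("2 stimuli, 15/20", 15, 0.5)]:
    print(label, round(chi_square_p_hits(hits, 20, p_chance), 3))
```

Under these assumptions the two cases give approximately 0.039 and 0.025, matching the values in the text, whereas one fewer hit in either case (8/20 or 14/20) does not reach *P* < 0.05.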

A Chi-square test was used to test for positional bias. For the AFC test, the two test sessions were tested for positional bias, and for the OOO and MTS/OFS tests the last two sessions completed by the subject were tested (*n* = 40 trials). The expected selection frequency of each position is 25% in the AFC and OOO tests and 50% in the MTS/OFS tests. An additional test of the same sessions was done for stimulus selection bias using a Chi-square test. In both the AFC and OOO procedures, not all stimuli are presented within a trial; however, the presentation of each stimulus is balanced so that all stimuli are shown at equal frequencies within a session. Therefore, the expected selection frequency of each stimulus is 16.7% in the AFC test (six different stimuli) and 25% in the OOO test (four different stimuli). For MTS/OFS tests the expected S+ selection frequency is 10% with 10 stimuli and 33.3% with three stimuli. Only the last two sessions were used because training is a learning process and we therefore wanted to test only the sessions in which the fish was most likely exhibiting the learned behavior.

# **RESULTS**

#### **ODD-ONE-OUT**

Of the four fish tested, two individuals (Fish 1 and 2) were able to reach a significant selection frequency (S+ selection ≥45%) in two consecutive sessions (**Figure 2**). However, the accuracy of these subjects was variable and selection of S+ was significant in only some of the 10 sessions (two and five sessions, respectively). The other two fish (Fish 3 and 4) were also able to reach an accuracy above chance; however, performance was again inconsistent and significance was achieved in only two sessions out of 10 each. Because Fish 2 reached significance in three sessions, two transfer tests with new stimuli were completed; however, the S+ selection frequency was only 20% in the first session and 35% in the second, neither of which is significantly different from chance (session 1: *P* = 0.606; session 2: *P* = 0.302).

A test for positional and stimulus bias was run for all fish. Fish 2 was the only individual to exhibit a positional bias (*P* < 0.001), predominantly selecting stimuli in position 1 (position 1: 50%; position 2: 27.5%; position 3: 15%; position 4: 7.5%). This individual was also the only one to exhibit a significant stimulus bias (*P* < 0.05) selecting S4 in 45% of trials (S1: 32.5%; S2: 17.5%; S3: 5%).
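As an illustration of the bias tests, the percentages reported for Fish 2 can be converted back to counts out of 40 trials and compared with a uniform expectation. The sketch below is a standard-library approximation of such a test, not the authors' analysis script: the function names are hypothetical, and the closed-form survival function is valid only for positive integer degrees of freedom.

```python
import math


def chi2_sf(stat, df):
    """Survival function of the Chi-square distribution (positive integer df)."""
    half = stat / 2
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    m = (df - 1) // 2
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def uniform_bias_test(counts):
    """Chi-square statistic and P value against equal expected frequencies."""
    n = sum(counts)
    expected = n / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, chi2_sf(stat, len(counts) - 1)


# Fish 2, last two OOO sessions (40 trials), counts derived from the percentages.
position_counts = [20, 11, 6, 3]   # positions 1-4: 50%, 27.5%, 15%, 7.5%
stimulus_counts = [13, 7, 2, 18]   # S1-S4: 32.5%, 17.5%, 5%, 45%

for name, counts in [("position", position_counts), ("stimulus", stimulus_counts)]:
    stat, p = uniform_bias_test(counts)
    print(f"{name}: chi2 = {stat:.1f}, P = {p:.4f}")
```

Under these assumptions both tests are significant, in line with the positional bias (*P* < 0.001) and stimulus bias (*P* < 0.05) reported for this individual.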

#### **MATCHED-TO-SAMPLE/ODDITY-FROM-SAMPLE**

# *Delayed MTS/OFS*

Fish 3, 5, and 6 did not reach statistical significance after 19 sessions in step 1 (**Figure 3A**). Fish 7 did achieve an S+ selection frequency ≥75% in two out of 19 sessions, but never in consecutive sessions.

The final two sessions of step 1 for each fish were tested for a possible positional bias. Three of the fish (Fish 3, 5, and 6) exhibited a significant side bias (*P* < 0.001). While Fish 3 selected stimuli on the right side at a higher frequency, Fish 5 and 6 preferred stimuli on the left. Fish 7 showed no preference for either stimulus position (*P* = 0.114). None of the fish showed a preference for any of the 10 stimuli presented (Fish 3: *P* = 0.689; Fish 5: *P* = 0.941; Fish 6: *P* = 0.834; Fish 7: *P* = 0.534).

**FIGURE 2 | Discrimination performance as a function of time (binned by testing session), for four fish performing an odd-one-out task.** Two stimuli were selected for each trial from a pool of four possibilities. See **Table 1** for stimuli used. The dashed line at 45% indicates a statistically significant selection frequency of S+ and the dashed line at 25% indicates chance.

Following the delayed MTS/OFS task with 10 stimuli, a further 10 sessions were completed in which the number of stimuli presented was reduced to three. Two individuals, Fish 6 and 7, reached significance for one session each; the other two fish (Fish 3 and 5) did not (**Figure 3B**).

No fish exhibited a positional bias in the final two sessions (Fish 3: *P* = 0.527; Fish 5: *P* = 0.206; Fish 6: *P* = 0.527; Fish 7: *P* = 0.342); however, three of the fish did show a stimulus bias (Fish 3: *P* = 0.149; Fish 5: *P* = 3.74 × 10<sup>−3</sup>; Fish 6: *P* = 9.66 × 10<sup>−4</sup>; Fish 7: *P* = 5.44 × 10<sup>−3</sup>). All three fish avoided one of the stimuli; however, the stimulus avoided varied between fish (**Figure 4A**).

# *Simultaneous MTS/OFS*

None of the fish were able to achieve an S+ selection frequency ≥75% (**Figure 3C**). The number of sessions completed varied between fish: Fish 6 completed only 36 sessions and Fish 3 completed 37, whereas Fish 5 and 7 completed all 40. No fish exhibited a position bias in the final two sessions (Fish 3: *P* = 0.527; Fish 5: *P* = 0.107; Fish 6: *P* = 0.527; Fish 7: *P* = 0.342). All fish had a significant stimulus selection bias and avoided one of the stimuli; however, the stimulus avoided varied between fish (**Figure 4B**).

Both Fish 5 and 7 successfully learned the 3-AFC control test, reaching a statistically significant S+ selection frequency (≥55%) in two consecutive sessions within four sessions (**Figure 5**). These results indicate that the stimuli used could be discriminated by the fish.

#### **FOUR-ALTERNATIVE FORCED-CHOICE**

All fish were able to reach well above a statistically significant S+ selection frequency (≥45%) when presented with a cross (S+) and a square (S−) within 2–3 sessions (**Figure 6A**). They continued to reach ≥45% when presented with the second stimulus pair, a triangle (S+) and star (S−), but took 4–9 sessions to do so (**Figure 6B**). The final stimulus pair was an arrow (S+) and a crescent (S−). All fish reached ≥45% within 2–9 sessions (**Figure 6C**) and two of the fish (Fish 2 and 3) achieved 100% accuracy in all five sessions. Regardless of the stimuli presented, the fish were able to be re-trained and complete the task with different stimuli. All fish were able to select S+ at a frequency ≥45% when all three stimulus pairs were presented within the same session showing that they could complete the task (**Figure 7**).

# **DISCUSSION**

The overall aim of this project was to explore the ability of archerfish to solve two concept-based psychophysics tests. The MTS/OFS and the OOO tests both require that the subject learn a concept rather than simply learning to associate a particular stimulus with a reward. One benefit of these tests is that a large number of stimuli can be tested within a single experiment without having to continuously retrain the subject to new stimuli. Another benefit is that they can provide information about how subjects are able to learn to complete complex tasks. The results of the OOO test show that two out of four archerfish reached a statistically significant S+ selection accuracy in two consecutive sessions and therefore passed the test. In contrast, none of the four archerfish were able to reach statistical significance in two consecutive sessions in the delayed or simultaneous MTS/OFS test. Our findings indicate that some archerfish may be able to learn the concept-based OOO test; however, none was able to learn the MTS/OFS test regardless of the training procedure used. A 4-AFC test was then conducted as a comparison to the other tests and to assess how easily archerfish could be retrained to new stimuli. All archerfish reached a much higher S+ selection accuracy in the 4-AFC test with one S+ and three identical S− (present study) and in the 4-AFC test in which all four stimuli were different (Newport et al., 2013) than in the MTS/OFS or the OOO tests. We found that retraining archerfish to new stimuli required few sessions and that they could be trained to recognize up to three conditioned stimulus groups at once. In addition, we found that after training the fish to two different sets of stimuli, some individuals were able to achieve 100% accuracy within the first training session with new stimulus pairs. This would appear to indicate that archerfish are capable of generalizing their learning to novel stimuli, indicative of some degree of task-relevant conceptual learning, rather than merely stimulus-specific learning.

was 40 for each experiment and the selection frequency of each stimulus was tested for a selection preference using a Chi-square test.

The OOO test requires that subjects apply the concept of oddity to solve the task. It has been primarily used as a test for visual discrimination in primates but has been shown to be solvable by other animals such as pigeons (Blough, 1986), cats (Boyd and Warren, 1957), and goats (Roitberg and Franz, 2004). It has never before been tested in fish. In this test, each archerfish was given 10 training sessions (200 trials). The results of our experiments show that the four archerfish reached statistical significance in a combined 11 out of 40 sessions (2, 5, 2, and 2 sessions, respectively), yet only two of these fish (Fish 1 and 2) could do this in consecutive sessions. These results suggest that two of the fish had learned the task. The probability of reaching our learning criterion by chance in a particular session, and thereby getting a false positive result, is *P* = 0.0389 (*n* = 20 trials). Therefore, within the 10 sessions performed by each of the four fish, we would expect about two sessions to be positive due to chance (0.0389 × 10 × 4 ≈ 1.56). It is therefore unlikely that our observed results are simply due to false positives, and even less likely considering that two of the fish reached an S+ selection accuracy of ≥45% in consecutive sessions. However, there appears to be no learning curve whereby performance improves over the number of training sessions. In addition, when Fish 2 was given a transfer test in which the stimuli were changed for novel shapes, performance was at chance. True evidence that the concept of oddity has been learned requires that the subject apply the concept to novel stimuli. As a result, it appears as though the archerfish may have had only a limited understanding of this task, if any. This is somewhat surprising as this task is likely to be of ecological relevance to many species of fish.
For example, targeting rare prey in a group increases the chance of predatory fish catching their prey (Landeau and Terborgh, 1986; Theodorakis, 1989; Almany et al., 2007). However, it is possible that archerfish gain no such advantage in singling out a rare object and have therefore not developed this skill. Archerfish are generalist feeders that encounter many insect species in their natural environment. In order to catch insects, they must spit at many potential food sources and only make a decision about whether or not to ingest something after they have taken it into their mouth and "tasted" it. As a result, visually selecting an individual insect from a crowd may not provide any benefit to archerfish. It may be possible that other species, especially predators that hunt schooling fish, will prove more adept at the OOO task. Future experiments are required to test this hypothesis.

In the MTS/OFS test, subjects must apply the concept of matching to select or avoid a stimulus that is the same as a previously presented sample stimulus. A series of training procedures was used in an attempt to train the archerfish on the MTS/OFS test; however, the results of all three MTS/OFS training procedures are similar in that no fish was able to perform the task in two or more consecutive sessions. In step 1, all fish were allowed 19 sessions (380 trials) and in step 2, all fish were given a further 10 sessions (200 trials). In the simultaneous MTS/OFS two fish completed 40 sessions (800 trials) while one fish completed 36 sessions (720 trials) and another completed 37 sessions (740 trials). Although two fish (Fish 6 and 7) did reach significance on occasion, these occasions match the number of expected false positives. As was observed in the OOO test, there was no evidence of improved performance throughout the training period. The archerfish showed similar results in both the delayed and simultaneous MTS, making it unlikely that their poor performance was due to a lack of working memory alone. In addition, Newport et al. (2013) found evidence that when solving a task where multiple stimuli are presented, archerfish examined each stimulus individually, a behavior which would require some form of working memory. It is more likely that the archerfish lacked the ability to understand the relationship between the sample and the comparison stimuli and, as a result, did not learn the concept of "sameness/difference." Primates can learn this "sameness/difference" concept (e.g., Premack, 1976; Oden et al., 1988; Fagot et al., 2001; Wasserman et al., 2001; Young and Wasserman, 2001, 2002) and there is evidence that non-primate species such as bees (Giurfa et al., 2001), dolphins (Herman et al., 1989, 1994; Mercado et al., 2000), sea lions (Pack et al., 1991; Kastak and Schusterman, 1994) and pigeons (e.g., Blaisdell and Cook, 2005; Bodily et al., 2008) are also capable of doing so. Based on our results and those of Goldman and Shapiro (1979), Zerbolio and Royalty (1983) and Gierszewski et al. (2013), it appears as though the answer for fish may be dependent on the species and possibly their particular ecology.

to an arrow (S+) and a crescent (S−; **C**). The dashed line at 45% in all figures indicates a statistically significant selection frequency of S+ and the dashed line at 25% indicates chance. See **Table 1** for example stimuli.

**FIGURE 7 | Selection frequency (%) of S+ using a 4-AFC test where all three conditioned stimulus pairs were presented within a session.** The results of two testing sessions (n = 20 trials each) are presented for four subjects. The dashed line at 45% indicates a statistically significant selection frequency of S+. All subjects achieved an S+ selection frequency above chance.

Archerfish were then trained to complete a 4-AFC test. Although the 4-AFC test has been proven to provide reliable results (Ben-Simon et al., 2012b; Newport et al., 2013), it is limited by the fact that subjects must be conditioned to a particular stimulus. It was thought that retraining fish to new stimuli would take just as many sessions as initial training, but this had not yet been shown experimentally. Following the initial training, the archerfish were trained to two additional stimulus pairs. We found that the archerfish generally learned new S+/S− combinations in fewer sessions in step 3 than required for initial training. In the initial training test and the first test with new stimuli, all fish showed typical learning curves where accuracy generally increased as more sessions were completed. However, when the stimuli were changed for a third time, two fish were able to achieve an accuracy of 100% within the first session. In a 4-AFC test where all distractors are the same, it is possible to solve the task by simply applying the concept that the one stimulus that is different is the correct answer. The ability of some individuals to solve the task immediately suggests that the fish learned the concept of selecting the single S+ stimulus and could apply it to new stimuli. What differs between the OOO and 4-AFC tests is that the role of the stimuli did not change in the 4-AFC test. In the OOO test the same stimuli could be used as both S+ and S−, whereas in the AFC a particular stimulus could only represent either S+ or S−. For archerfish, the concept of oddity may break down once the same stimuli are used as both S+ and S−. It is possible that reassigning the role of a learned object is unnatural for archerfish. For example, if the fish had learned that an object had a negative association (i.e., it was unrewarded or inedible), it may be rare that the properties of that object would change to being positive (i.e., the object becoming more palatable). As a result, once archerfish learn the role of an object they do not easily reverse their association.

Not all fish applied this strategy; the others instead exhibited a learning curve similar to that observed in the previous two experiments, except that they selected S+ at a frequency higher than chance within the first session. The number of sessions required to learn each task was variable. In all tests, Fish 1 consistently required more sessions to learn than the other three fish. It is possible that this fish did not understand the task as easily as the others. Alternatively, archerfish individuals have been shown to apply different decision strategies when solving the AFC test (Newport et al., 2013). It is possible that Fish 1 was using a different strategy from the other fish that required more sessions to learn. A third alternative is that this individual had a different level of motivation for completing the task. A final test was completed in which the fish were faced with all three pairs of stimuli within the same session. This was done to determine if they could remember multiple conditioned stimuli at the same time. All four fish were able to complete this task. Although using new stimuli does require retraining, our results show that fish learn progressively faster with each new set. In addition, they can learn more than one set of stimuli at a time, meaning that more complex experiments can be designed.

It is interesting to note that when the archerfish did not grasp the MTS/OFS or OOO tasks, they did not simply choose stimuli at random but instead resorted to using at least two different strategies to solve the problem. When confronted with a difficult task it is common for fish to develop a strong preference for stimuli on a particular side (Northmore and Yager, 1975). In the case of the delayed MTS/OFS test where 10 different stimuli were used, three of the four fish tested developed a side bias. In experiments where fewer stimuli were used, such as the OOO test with four alternating stimuli, the simultaneous MTS/OFS and the delayed MTS/OFS with three stimuli, the fish generally developed a stimulus bias in which they had a hierarchical preference for stimuli.

The results of our experiments provide some interesting insight into the limitations of the fish brain. Because of the nature of the tests used, the poor performance of the archerfish when presented with the MTS/OFS and OOO tests could suggest a deficiency in working memory or an inability to learn concepts. Newport et al. (2013) found evidence from the 4-AFC test that archerfish consider stimuli independently and sequentially, based on the fact that the anatomy of their eye makes it unlikely they could view more than one stimulus at a time and the fact that there were variable reaction times when responding to different stimulus types. This indicates that archerfish have an adequate working memory to consider all stimuli on the monitor and therefore to at least perform the simultaneous MTS and OOO tests. The problem then may lie with concept learning. Traditionally it has been thought that the evolution of vertebrate brains has progressed linearly in increasing complexity. Fish, the most primitive vertebrate group, therefore would have the simplest brains and would be expected to be incapable of more complex tasks. However, there is increasing evidence that fish share similar learning and memory capabilities with other vertebrates and that these are based on equivalent or similar neural mechanisms and brain systems. For example, classical conditioning of simple motor responses such as eye blink responses occurs in the cerebellum in both mammals (Thompson and Steinmetz, 2009) and fish (Gómez et al., 2010). Similarly, emotional conditioning and spatial memory are linked to the telencephalon and cerebellum of fish and to homologous structures such as the amygdala and cerebellum of mammals (see Broglio et al., 2011, for a review of the neural mechanisms of cognition in fish).
In humans, the frontal cortex is generally associated with abstract rule learning (Strange et al., 2001; Koechlin et al., 2003; Bunge, 2004; Bor and Owen, 2007; Christoff and Keramatian, 2007) and therefore it is possible that since fish lack a cortex, they will be unable to learn concepts. However, the neural mechanisms of concept learning in fish have not yet been examined and it is impossible to say if fish have homologous structures that enable them to perform this task. The results of the AFC retraining described in this report suggest that archerfish are capable of learning some sort of relational concept and predatory fish are able to apply the concept of oddity to hunting prey (Landeau and Terborgh, 1986; Theodorakis, 1989; Almany et al., 2007). In addition, other animals lacking a cortex are capable of the concept-based MTS/OFS (bees: Giurfa et al., 2001; birds: Zentall and Hogan, 1974; goldfish: Goldman and Shapiro, 1979; Zerbolio and Royalty, 1983) and OOO (birds: Blough, 1986) tasks. The fact that both archerfish and cichlids (Gierszewski et al., 2013) appear incapable of learning the MTS/OFS task yet goldfish can, suggests that fish in general may have the neural mechanisms required for concept learning; however, different species may apply different decision rules which limit their performance. The ability to complete this task may come down to the general ecology of the species. Alternatively, it is possible that some species have evolved specialist hardware for this sort of task. Of course we cannot exclude the possibility that our training procedures did not adequately convey the task to the archerfish. Although we tried a range of training procedures, it is possible that different training techniques may elicit better performance. The combined evidence from fish, birds and bees, all of which lack a cortex, suggests that having a cortex is not a requirement for learning abstract relationships and concepts. However, many of these tests show that these animals can have limitations in their capabilities such as decreased performance when novel stimuli are introduced (Zentall and Hogan, 1974; Giurfa et al., 2001). It may be that a lack of cortex limits the flexibility of learning these concepts and that comprehension can only occur under specific conditions. However, one should be cautious in overinterpreting our results and more focused research in this field is required.

Although our results suggest that archerfish are incapable of learning the MTS/OFS and OOO tests, it is possible that they would be able to learn these under different experimental conditions. In our experiments we used a range of shapes as stimuli as previous studies have shown that archerfish are capable of discriminating a large number of shapes from four trained shapes (Newport et al., 2013). Shapes are a common stimulus class for behavioral studies and have previously been used in successful concept learning studies (e.g., Herman et al., 1989; Pack et al., 1991; Bodily et al., 2008), however, other studies have employed different stimuli such as colors (e.g., Goldman and Shapiro, 1979; Giurfa et al., 2001) and patterns (Giurfa et al., 2001). It is possible that although archerfish can discriminate shape stimuli, they may not be able to learn the concept of similarity based on this stimulus class. As a result, the use of different stimulus classes may yield different results. Pilot studies were run for the OOO test in which three different stimulus classes were tested: colors (red, blue, yellow, and gray), directional arrows and shapes, however, no difference in performance was found. When training animals, it can sometimes be difficult to successfully communicate the task, especially when trying to convey an abstract concept. Subtle changes in procedure can have an impact on the ability of the subject to understand the task. As a result, a range of training methods were attempted during pilot studies. For example, the feedback to errors in stimulus selection was varied in an attempt to make the consequences greater (i.e., a tone was played if the choice was incorrect or a timeout of 30 s was introduced before a new trial could commence). However, the methods described in this manuscript were those that were found to engender the most success when training archerfish to complete an AFC test. 
Future attempts to test concept learning in archerfish would likely have the most chance of success if they focused on changes in how the stimuli are presented. For example, in the OOO test described in this report four stimuli were presented, one of which the fish had to choose. Future experiments may be more successful if a much larger number of distractor stimuli were presented.

Another consideration is the duration of training. As this was the first time these paradigms have been attempted in archerfish, it is difficult to know how much training might be required. Evidence from other animals can be difficult to use as a guide as a range of factors can influence how many trials and sessions can be completed. For example, the number of trials that an animal can complete per session is highly variable. While animals such as baboons (Fagot et al., 2001) and pigeons (Bodily et al., 2008) can readily complete 96 trials per session, dolphins (Herman et al., 1989), and sea lions (Pack et al., 1991) typically only do between 8 and 28 trials. In behavioral experiments involving fish, they are commonly given between 6 and 10 trials (e.g., Siebeck et al., 2009, 2010; Truppa et al., 2010; Schluessel et al., 2012; Gierszewski et al., 2013), however, goldfish are capable of completing 100–120 (e.g., Goldman and Shapiro, 1979; Wyzisk and Neumeyer, 2007). Although archerfish have the motivation to complete a large number of trials in one session, we found during pilot experiments that archerfish performed best over long periods if given 20 trials per session. Because of the large variation in trial number that can be performed, it is difficult to compare the total number of trials required to learn a task between species. It is not known how much trial number affects the performance of fish and it is possible that the number of sessions is more relevant. Session number can also be difficult to use as a guideline because of the large discrepancies amongst different species. For example, pigeons were able to learn a MTS task within 11 sessions (Bodily et al., 2008) while bees and dolphins required 6 (Herman et al., 1989; Giurfa et al., 2001) and sea lions required 36 (Pack et al., 1991). 
Goldman and Shapiro (1979) reported that goldfish learned the simultaneous MTS and oddity-from-sample within 11–60 sessions; however, most individuals showed signs of improvement within the first 10 sessions. In this report as well as Gierszewski et al. (2013), a total of 40 sessions was attempted for the simultaneous MTS/OFS after the fish had already completed a total of 29 sessions for step 1 and 2 of the delayed MTS/OFS task. While it is possible that archerfish could eventually learn with more trials and sessions, we decided that any more than this would make the test impractical as a visual discrimination testing paradigm and therefore did not continue. In the case of the OOO, fewer sessions were completed. Despite the large number of sessions conducted in the combined MTS/OFS tests there was no improvement in performance with an increasing number of sessions, and we therefore considered it unlikely that conducting large numbers of sessions would improve our results. We found that in the MTS/OFS experiments, the archerfish eventually lost motivation and after about the first 10 sessions rarely changed their decision strategy (i.e., side or stimulus bias). In addition, in the simultaneous MTS/OFS experiments with goldfish, a large number of sessions were required for some individuals to reach significance; however, they at least showed some improvement within 10 sessions. In the case of the archerfish, no learning curve was observed whereby accuracy improved over time. For the purpose of identifying other paradigms that may be useful for future testing, completing more trials and sessions is impractical; however, future studies focused on concept learning in general may want to attempt more sessions. If that is the case, it may be useful to change the food reward to be smaller or less nutritious or to use an intermittent reward schedule.

Our results indicate that archerfish were unable to learn the MTS/OFS task and only a few individuals were able to significantly select S+ in the OOO task but showed inconsistent performance. Although it is possible that archerfish may be able to learn concepts under different experimental conditions, we conclude that both of these tests are poor choices for visual discrimination experiments involving archerfish. However, our results indicate that archerfish achieve a very high accuracy when completing a 4-AFC test and can be rapidly retrained to new stimuli. In a 4-AFC test in which the three S− stimuli are identical, archerfish can learn to select the single S+ stimulus and therefore require no retraining when new stimuli are presented. The ability of archerfish to select odd stimuli can be used in a similar way to a traditional OOO test, in which subjects learn to select the singleton stimulus, with the limitation that stimuli are not presented in the role of both S+ and S−. This report not only provides important insight into concept learning in fish but also provides a powerful new technique that can be added to the tool box of psychophysical experiments used to explore vision in fish.

# **ACKNOWLEDGMENTS**

Thank you to Yarema Reshitnyk, Sarah Van-Eyk, and Daniel Caj Christie for help with behavioral experiments. Thank you to David Lloyd and Amira Parker for technical support.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 November 2013; accepted: 04 April 2014; published online: 24 April 2014. Citation: Newport C, Wallis G and Siebeck UE (2014) Concept learning and the use of three common psychophysical paradigms in the archerfish (Toxotes chatareus). Front. Neural Circuits 8:39. doi: 10.3389/fncir.2014.00039*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Newport, Wallis and Siebeck. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# NEURAL CIRCUITS

# Spectral and spatial selectivity of luminance vision in reef fish

#### **Ulrike E. Siebeck<sup>1</sup>\*, Guy Michael Wallis <sup>2</sup> , Lenore Litherland<sup>1</sup> , Olga Ganeshina<sup>3</sup> and Misha Vorobyev<sup>3</sup>**

<sup>1</sup> School of Biomedical Sciences, The University of Queensland, Brisbane, QLD, Australia

<sup>2</sup> Centre for Sensorimotor Neuroscience, School of Human Movement Studies, The University of Queensland, Brisbane, QLD, Australia

<sup>3</sup> Department of Optometry and Visual Science, Auckland University, Auckland, AU, New Zealand

#### **Edited by:**

Davide Zoccolan, International School for Advanced Studies, Italy

#### **Reviewed by:**

Valeria Anna Sovrano, University of Trento, Italy

Martin Meyer, King's College London, UK

#### **\*Correspondence:**

Ulrike E. Siebeck, School of Biomedical Sciences, The University of Queensland, MacGregor Building (64), Brisbane, QLD 4072, Australia e-mail: u.siebeck@uq.edu.au

Luminance vision has high spatial resolution and is used for form vision and texture discrimination. In humans, birds and bees, the luminance channel is spectrally selective: it depends on the signals of the long-wavelength sensitive photoreceptors (bees) or on the sum of long- and middle-wavelength sensitive cones (humans), but not on the signal of the short-wavelength sensitive (blue) photoreceptors. The reasons for such selectivity are not fully understood. The aim of this study is to reveal the inputs of cone signals to high resolution luminance vision in reef fish. Sixteen freshly caught damselfish, *Pomacentrus amboinensis*, were trained to discriminate stimuli differing either in their color or in their fine patterns (stripes vs. cheques). Three colors ("bright green", "dark green" and "blue") were used to create two sets of color and two sets of pattern stimuli. The "bright green" and "dark green" were similar in their chromatic properties for fish, but differed in their lightness; the "dark green" differed from "blue" in the signal for the blue cone, but yielded similar signals in the long-wavelength and middle-wavelength cones. Fish easily learned to discriminate "bright green" from "dark green" and "dark green" from "blue" stimuli. Fish could also discriminate the fine patterns created from "dark green" and "bright green". However, fish failed to discriminate fine patterns created from "blue" and "dark green" colors, i.e., the colors that provided contrast for the blue-sensitive photoreceptor, but not for the long-wavelength sensitive one. High resolution luminance vision in damselfish, *Pomacentrus amboinensis*, does not have input from the blue-sensitive cone, which may indicate that the spectral selectivity of the luminance channel is a general feature of visual processing in both aquatic and terrestrial animals.

**Keywords: reef fish, operant conditioning, behavior, visual modeling, luminance vision**

# **INTRODUCTION**

Reef fish are famously colorful to human eyes, and often their colors are arranged in complex patterns that vary between species and frequently also between individuals of the same species. Most interest has been directed at understanding the function of these colors for intra- and inter-specific signaling (e.g., Frisch, 1912; Lorenz, 1962; Marshall, 2000; Cheney et al., 2009; Siebeck et al., 2010; Millar and Hendry, 2012) while investigations into visual processing of colors and patterns in fish are still comparatively rare as this field has only started to develop relatively recently. What we have learned about visual processing in fish is often surprisingly similar to what we know about visual processing in primates. Fish extract color information via color opponent cells (Kamermans et al., 1991; Patterson et al., 2002; Ramsden et al., 2008) and they possess direction/orientation selective ganglion cells in the retina which facilitate shape discrimination and the perception of illusory contours (Wyzisk and Neumeyer, 2007; Tsvilling et al., 2012).

Both fish and primates have typical vertebrate eyes but differ in some aspects of their design, e.g., optics, which is mostly due to differences in their terrestrial/aquatic lifestyles (Land, 1990). Both have a duplex retina with rods and cones; however, the spectral sensitivities and number of their photoreceptors differ (Lythgoe, 1979). In addition to single cones, fish also have double cones, which are two photoreceptor cells fused together (Marchiafava, 1985). The function of double cones has long been thought to involve motion detection, and it was thought that they did not contribute to color vision due to electrical coupling of their two members (Boehlert, 1978). However, a recent study on the trigger fish, *Rhinecanthus aculeatus*, showed that in this species, both members do contribute separately to color vision (Pignatelli et al., 2010). The spectral sensitivities and/or color vision abilities of fish have been investigated using a variety of methods, including behavioral experiments (e.g., Neumeyer, 1984; Risner et al., 2006; Siebeck et al., 2008), electrophysiological experiments (electroretinogram or ERG, e.g., Morita et al., 1997; Hughes et al., 1998; Hawryshyn et al., 2010), and microspectrophotometric (MSP) measurements of individual photoreceptor sensitivities (e.g., Losey et al., 2003; Waller, 2005; Marshall et al., 2006). Overall, results show that teleost fish can have up to five photoreceptor sensitivities, but also that not all of them necessarily contribute to color vision simultaneously (Sabbah et al., 2010). The number of different spectral photoreceptor types can therefore not be used to infer the dimensionality of the color vision system.

Color vision requires that the output of at least two photoreceptor types is compared, which is best demonstrated using behavioral experiments (Kelber et al., 2003; Kelber and Osorio, 2010) but can also be shown using ERG recordings under various illumination and background conditions (Hughes et al., 1998). Natural colors differ in hue as well as in brightness, and experiments designed to test color vision therefore must control for luminance cues (e.g., Kelber et al., 2003; Siebeck et al., 2008). This can be done through "gray card experiments" where animals are trained to pick out the colored stimulus from a range of stimuli that differ in brightness (Frisch, 1913). Alternatively, visual modeling can be used to design isoluminant stimuli, which are only discriminable if the animal has color vision, provided the photoreceptor sensitivities are known for the animal under investigation (Vorobyev et al., 2001; Pignatelli et al., 2010).

In a previous study, we showed with behavioral experiments that *Pomacentrus amboinensis* have color vision (Siebeck et al., 2008). The fish were not only able to discriminate yellow from blue of varying brightness levels but they could also generalize from one blue or yellow to other blue or yellow stimuli. We also know that this species is sensitive to ultraviolet (UV) light and uses complex UV patterns to discriminate between conspecific and heterospecific fish (Siebeck, 2004; Siebeck et al., 2010). Microspectrophotometric studies have shown that *P. amboinensis* have four spectral types of cone visual pigments peaking at 365 nm (UV sensitive), 485 nm (short-wavelength sensitive, S), 504 nm (middle-wavelength sensitive, M) and 526 nm (long-wavelength sensitive, L). The UV and middle-wavelength sensitive visual pigments are housed in single cones, while the short-wavelength sensitive and long-wavelength sensitive visual pigments are housed in double cones (Waller, 2005; Siebeck and Hart unpublished results).

In primates, parallel pathways exist for luminance and color processing, which differ not only in their spectral but also in their spatial properties (Livingstone and Hubel, 1988). In primates the combined outputs of the long-wavelength (L) and middle-wavelength (M) sensitive cones contribute to luminance vision, while all three cones contribute to color vision. The luminance channel has high spatial acuity while the color channel has low spatial acuity (Cavanagh et al., 1987). Similar parallel processing of color and luminance has been found in other terrestrial animals (for review see Osorio and Vorobyev, 2005). In honeybees, S, M and L cones contribute to color vision while only the L cones are involved in luminance vision (Backhaus, 1991; Giurfa et al., 1997). In birds, double cones housing the L visual pigment (the spectral sensitivity of the double cones is similar to the sum of L and M cones in primates) contribute to luminance vision and single cones to color vision (Osorio et al., 1999). Overall, it appears that the L-cone is generally involved in luminance vision in terrestrial animals. First hints about a potentially similar mechanism in fish came from a study that found different spectral sensitivity functions when goldfish were trained to discriminate a dark from a light field compared to when the fish were trained to discriminate a light from a dark field (Neumeyer et al., 1991). The authors hypothesized that when fish were trained on the dark field they learned to discriminate the stimuli based on "color" cues whereas the fish trained on the light field were using "luminance" cues, and proposed that separate color and luminance channels exist in these fish.

The aim of this study was therefore to directly test for the existence of such a luminance channel based on L/M cones and to assess whether this channel has spatial properties similar to the luminance channel found in primates. Specifically, we tested whether *P. amboinensis* are able to discriminate between stimuli with either low spatial frequency (solid colors) or high spatial frequency (checked color patterns) which were designed to be isoluminant for the L-cones and M cones. Visual modeling based on quantum catch calculations was used in order to select specific colors that selectively eliminated the contribution of the L-cones (Vorobyev and Osorio, 1998). Investigating the contribution of the L/M system to high spatial vision is particularly interesting in this species as they are able to discriminate between complex UV patterns when contrast is given in the UV only and fail to discriminate between size matched conspecifics and heterospecifics in the absence of UV signals (Siebeck et al., 2010).

# **MATERIALS AND METHODS**

### **FISH**

Fish were collected with hand nets while on SCUBA around Lizard Island, Australia (fisheries permit: PRM37727I; GBRMPA permit G05/13668.1). Throughout experimentation, the fish were maintained in individual aquaria (30 cm × 40 cm × 30 cm) exposed to natural sunlight, given a PVC tube for shelter and supplied with fresh seawater (flow-through system). Aquaria were cleaned daily and fish were fed as part of the experiments. Following the experiments, all fish were released onto the reef where they had been caught. Experiments were conducted during two field trips using 16 (exp 1) and 12 fish (exp 2). All experiments were conducted according to the animal welfare act Australia and approved by the ethics committee of the University of Queensland (ethics permit VTHRC/194/08/ARC/UQ).

# **STIMULI**

# **General**

Four sets of stimuli were created by printing (Epson Stylus Photo 1290) the selected colors in patches of 2 × 2 cm on photo paper (Epson glossy photopaper). The squares were then cut out and laminated (Ibico pouchMaster 9VT). Six replicate stimuli were created for each stimulus condition. Three colors were created, a light green, a dark green and a blue (for details see below). The two greens differed in brightness but not in chromaticity, while the dark green and blue were closely matched in terms of their L and M cones quantum catches (**Figure 1**).

The colors were either combined to patterns (stripes or checkerboards) or left as solid colors to create four conditions: (1) blue/dark-green checkerboards vs. blue/dark-green stripes; (2) dark-green/light-green checkerboards vs. dark-green/light-green stripes; (3) solid blue vs. solid dark-green; and (4) solid dark-green vs. solid light-green (**Figure 1**).

### **Visual modeling**

The spectral reflectance of the stimuli was also measured using the fiber-optic spectrometer and PX-2 pulsed xenon light source. The angle between illumination and measurement probes was held at 45° with a custom made holder fitted with collimating lenses. The receptor quantum catches relative to 100% reflecting white were calculated for each stimulus color. A tetrachromatic visual system was assumed on the basis of the photoreceptor sensitivity data (two single cones, λmax = 365 nm and 504 nm, and one double cone, λmax = 480 nm and 524 nm; S, M, L: Waller, 2005; UV, S, M, L: Siebeck and Hart unpublished results). The receptor spectral sensitivities were calculated using Govardovskii templates (Govardovskii et al., 2000) combined with the ocular media transmittance (Siebeck and Marshall, 2001, 2007). The illumination of the experimental arena was natural daylight with the UV part of the spectrum removed by the material shading the outdoor aquaria. Here, we report the quantum catches calculated using D65 standard daylight spectrum.

The process of identifying the required stimuli involved printing a large series of potential stimuli, laminating them, measuring their reflectance and calculating the quantum catches. This process was repeated until stimuli were found that fulfilled our prerequisites. The spectra of two stimuli with identical chromaticity but different lightness (dark green and light green) were adjusted so that the ratio of quantum catches for all four receptors was constant. The spectrum of a third color (blue) was adjusted so that the L- and M-cone quantum catches closely matched the L- and M-cone quantum catches of the dark green stimulus (**Figure 1**).
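The quantum catch criterion described above can be illustrated with a minimal sketch. It is not the authors' modeling code: it assumes a Gaussian curve as a stand-in for the receptor sensitivity (the study used Govardovskii templates combined with ocular media transmittance), a flat illuminant instead of D65, and made-up reflectance spectra. All function and variable names are hypothetical.

```python
import math

# Cone peak sensitivities (nm) as listed in the visual modeling section.
LAMBDA_MAX = {"UV": 365.0, "S": 480.0, "M": 504.0, "L": 524.0}
WAVELENGTHS = list(range(300, 701, 5))  # nm, assumed sampling grid


def sensitivity(wavelength, lambda_max, width=40.0):
    """Gaussian stand-in for a receptor sensitivity curve (assumption)."""
    return math.exp(-0.5 * ((wavelength - lambda_max) / width) ** 2)


def relative_quantum_catch(reflectance, illuminant, lambda_max):
    """Quantum catch of one receptor relative to a 100% reflecting white."""
    weights = [i * sensitivity(w, lambda_max)
               for i, w in zip(illuminant, WAVELENGTHS)]
    stimulus = sum(r * wt for r, wt in zip(reflectance, weights))
    white = sum(weights)  # reflectance of 1.0 at every wavelength
    return stimulus / white


def catches(reflectance, illuminant):
    """Relative quantum catches for all four receptor types."""
    return {name: relative_quantum_catch(reflectance, illuminant, peak)
            for name, peak in LAMBDA_MAX.items()}


# Hypothetical spectra: a flat illuminant, a greenish reflectance,
# and a darker copy of it scaled by a constant factor.
illuminant = [1.0] * len(WAVELENGTHS)
light_green = [0.2 + 0.6 * sensitivity(w, 530.0, 50.0) for w in WAVELENGTHS]
dark_green = [0.5 * r for r in light_green]

light = catches(light_green, illuminant)
dark = catches(dark_green, illuminant)

# Same chromaticity, different lightness: the dark/light ratio of quantum
# catches should be the same for every receptor.
ratios = {name: dark[name] / light[name] for name in LAMBDA_MAX}
print(ratios)
```

Scaling a reflectance spectrum by a constant changes every receptor's catch by the same factor, which is the property required of the light green/dark green pair. Matching only the L and M catches of a third color, as for the blue stimulus, would be checked by comparing just those two entries.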

#### **TRAINING**

The fish were trained using the method described in Siebeck et al. (2009). Briefly, the fish were trained to associate food with a colored stimulus (laminated printout presented on a board inserted into the aquarium for each trial), which they had to "tap" (push with their mouth) in order to receive a food reward (**Figure 1**). The food delivery was separated from the stimuli in time and location so that no olfactory cues were present while the fish were making their choices. Only once the fish had made a correct choice was the feeding tube, containing a mix of fish flakes (HBH Marine Flake Frenzy, Spanish Fork, UT, USA) and water, inserted into the aquarium and the food reward given.

In experiment 1, four fish were trained to each of the four conditions (**Figures 2**, **3**). Within each condition, two fish each were trained to each of the two stimuli (e.g., two fish were trained to stripes and the other two to checkers). This was done in order to control for a possible bias towards a particular stimulus. The second stimulus (distracter) was introduced once the fish had learned to swim to and tap the trained stimulus presented in one of two locations on the board in order to receive a food reward.

Experiment 1 left open the possibility that any difference in performance could be due to some characteristic of the different fish used rather than due to their ability to solve the experimental tasks. In experiment 2, we therefore controlled for this possibility by retraining the fish so that they had to complete both pattern conditions (conditions 1 and 2) as well as one condition with solid colors (condition 3 or 4). Each group of fish started with either condition 1 or condition 2 (patterns) before completing condition 3 or 4 (solid colors; **Figure 3**). Fish were randomly allocated to each group.

# **TESTING PROCEDURE**

In order to be able to discount a side preference from the selection results, the stimuli were presented in random positions counterbalanced across each testing session. The only constraint on the randomization process was that the stimuli never appeared in the same position more than twice in a row. If a fish took more than 2 min to complete the task, the board was removed and the next fish was tested.
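The randomization constraint described above can be sketched as follows. This is an illustrative reconstruction, not the procedure the authors actually used: it assumes 10 trials per session with S+ on each side equally often, and it uses simple rejection sampling. The function names are hypothetical.

```python
import random


def max_run_length(sequence):
    """Length of the longest run of identical consecutive items."""
    longest = current = 1
    for previous, item in zip(sequence, sequence[1:]):
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
    return longest


def session_positions(n_trials=10, max_run=2, rng=random):
    """Draw counterbalanced S+ positions with no run longer than max_run.

    n_trials is assumed to be even so both sides occur equally often.
    """
    positions = ["left", "right"] * (n_trials // 2)
    while True:
        rng.shuffle(positions)
        # Reject any shuffle in which a side repeats more than max_run times.
        if max_run_length(positions) <= max_run:
            return list(positions)


# Seeded generator so the example is reproducible.
print(session_positions(rng=random.Random(1)))
```

Rejection sampling keeps the sketch short and leaves every valid sequence equally likely, at the cost of occasionally discarding a shuffle.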

Two printed laminated stimuli were attached to a board which was then placed into the aquarium of the fish under investigation (**Figures 1, 3**). For each trial, the stimuli were randomly chosen from six replicate stimuli thus preventing the fish from using any cues specific to a particular replicate (e.g., slightly different cutting angle of the laminate can cause different reflections).

The stimuli were removed from the aquarium following a correct completion of the task and a food reward or a timeout (2 min). Fish were tested twice a day and made 10 choices in each session. Eight sessions were carried out for each condition in both experiments.

# **ANALYSIS**

The number of correct choices within each of the eight sessions was determined and, in each case, the last four sessions were used for further analysis. This was done to discount the learning phase during the first four sessions. GraphPad Prism was used to carry out the statistical tests. Two-tailed binomial tests were used to determine whether the observed choice frequency of each fish, as well as the average response of all fish within each condition, was different from chance, i.e., from a 1:1 (distracter: stimulus) selection.
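As a minimal sketch of this analysis step, the example below pools the last four sessions of one fish and applies an exact two-tailed binomial test against chance (0.5). The session counts are hypothetical, and the two-tailed p-value uses one common definition (summing all outcomes no more likely than the observed one); the statistics package used in the study may define it differently.

```python
from math import comb


def binomial_two_tailed(successes, trials, p=0.5):
    """Exact two-tailed binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed number of successes.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Hypothetical correct-choice counts for one fish: eight sessions of 10 trials.
correct_per_session = [5, 6, 7, 8, 8, 7, 9, 8]
trials_per_session = 10

last_four = correct_per_session[-4:]  # discard the learning phase
successes = sum(last_four)
trials = trials_per_session * len(last_four)

print(successes / trials, binomial_two_tailed(successes, trials))
```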

In experiment 1, the hypothesis was tested that the patterns (high frequency stimuli) created with the L-cone matched colors (blue-green patterns) would be harder to discriminate compared to the patterns created with colors that provided L-cone contrast (light green/dark green patterns). We also hypothesized that the solid colors (low frequency stimuli) blue vs. green would be easier to discriminate than the light green/dark green stimuli. Two-tailed *t*-tests were used to analyze the results.

**FIGURE 1 |** [...] *P. amboinensis* when looking at the three different experimental colors using D65 daylight illumination (top graphs). **(A)** Quantum catches for the light green/dark green stimuli are shown while on the right, the quantum catches for the dark green and blue stimuli are [...] minimize contrast to the L-cone (λmax 526 nm) and the light green color was selected to differ from the dark green color only in brightness (but not hue). **(C)** The three colors were combined to form four stimulus conditions (bottom row) with different spatial properties.

[...] **(A)** Each group of fish was trained to a different stimulus set and **(B)** each group of fish was retrained following the completion of 10 sessions for a particular stimulus set. Lines indicate retraining events. [...] each session (10 trials), S+ and S− were shown equally often on both sides. S+ indicates the rewarded stimulus and S− the distracter stimulus.

In experiment 2, repeated measures 2-factorial ANOVA was used to test (a) whether training sequence (condition 1 or condition 2 first) influenced the results; and (b) whether there was a significant difference between the fish's performance in the two pattern conditions.

# **RESULTS**

#### **EXPERIMENT 1**

All fish learned the task of tapping their reward stimulus within 3–4 days of capture so that testing could begin on day 5.

Condition 1 (patterns: blue/dark-green, stripes vs. checkers): none of the fish reached ≥70% correct choices in two consecutive sessions within the eight testing sessions. In the last four sessions, the fish made on average 59% (sd ±5) correct choices, which was not significantly different from chance (Binomial test: *p* = 0.081; **Figure 4**).

Condition 2 (patterns: light-green/dark-green, stripes vs. checkers): within 3–4 sessions all fish were able to discriminate the patterns at a level of at least 70% correct choices in two consecutive sessions. In the last four sessions, the fish made on average 71.5% (sd ±7.8) correct choices, which was significantly different from chance (binomial test: *p* = 0.0088; **Figure 4**).

**FIGURE 4 |** [...] when the solid color conditions were compared, but performance was significantly worse for fish trained to condition 1 (blue/green patterns) relative to condition 2 (dark/light green patterns; significance levels are given above the bars). Additionally, results are compared to chance level (50% accuracy; insets in bars). ns: not significant, \* *p* < 0.05; \*\* *p* < 0.01.

Condition 3 (simple colors: blue vs. dark-green): within 3–4 sessions, all fish were able to discriminate the colors and reached a level of at least 70% correct choices over at least two consecutive sessions. In the last four sessions, the fish made on average 88.2% (sd ±10) correct choices, which was significantly different from chance (binomial test: *p* < 0.0001; **Figure 4**).

Condition 4 (simple colors: dark-green vs. light-green): within 3–4 sessions, all fish were able to discriminate the colors with a frequency of at least 70% correct choices over at least two consecutive sessions. In the last four sessions, the fish made on average 80.3% (sd ±4.3) correct choices, which was significantly different from chance (binomial test: *p* < 0.0001; **Figure 4**).
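The learning criterion applied in the four conditions above (at least 70% correct choices in two consecutive sessions) can be expressed as a short check. This is only a sketch: the session scores are hypothetical, and the function name is not taken from the study.

```python
def first_session_meeting_criterion(percent_correct, threshold=70.0,
                                    consecutive=2):
    """Return the 1-based session at which the criterion is first completed.

    The criterion is met once `consecutive` sessions in a row reach at
    least `threshold` percent correct. Returns None if it is never met.
    """
    run = 0
    for session, score in enumerate(percent_correct, start=1):
        run = run + 1 if score >= threshold else 0
        if run >= consecutive:
            return session
    return None


# Hypothetical percent-correct scores over eight sessions for two fish.
learner = [50, 60, 70, 80, 90, 80, 80, 90]
non_learner = [50, 70, 60, 50, 70, 60, 60, 50]

print(first_session_meeting_criterion(learner))
print(first_session_meeting_criterion(non_learner))
```

Requiring consecutive sessions guards against counting a single lucky session as evidence of learning.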

Comparison of the different conditions supported the hypothesis that the green patterns are easier to discriminate than the blue/green patterns (two-tailed *t*-test: *t* = 2.46, df = 6, *p* = 0.048). No significant difference was found between the performance of the fish in conditions 3 and 4 (solid colors, two-tailed *t*-test: *t* = 1.24, df = 6, *p* = 0.26).

#### **EXPERIMENT 2**

Group 1: The group of fish initially trained to light/dark green patterns learned to discriminate the checked and striped patterns within 4–5 sessions (all three fish reached a level of ≥70% correct choices). Over the last four sessions, the fish reached an accuracy level of on average 83.3% (sd ±8.0) correct choices (**Figure 5**).

Following retraining to blue/green patterns, this group of fish was no longer able to discriminate the checked from the striped patterns. They reached a level of 59.2% (sd ±14.1) correct choices over the last four sessions (**Figure 5**).

Group 2: The fish initially trained to blue/green patterns were not able to discriminate the checked from the striped patterns in this condition. They reached a level of 62.5% (sd ±4.3) correct choices over the last four sessions (**Figure 5**).

**FIGURE 5 | Results of experiment 2**. Two groups of fish were trained to both pattern conditions, but in a different order. Group 1 fish were trained to light green-dark green patterns first and then retrained to blue-green patterns whereas group 2 experienced the opposite. In both cases, accuracy was significantly higher for dark-light green patterns and results for blue-green patterns were not significantly different from chance.

Following retraining to light/dark green patterns they learned to discriminate the checked and striped patterns from the first session on with at least 70% accuracy. Over the last four sessions they reached a level of 81.7% (sd ±10.1) correct choices (**Figure 5**).

#### **COMPARISON OF THE CONDITIONS**

The performance of the fish in the two conditions (blue/green and light/dark green) was found to be significantly different (repeated measures ANOVA: *F*(1,4) = 71.16, *p* = 0.0011). No influence of the training sequence was found (*F*(1,4) = 0.01198, *p* = 0.92) and no interaction existed between the factors training sequence and condition (*F*(1,4) = 0.95, *p* = 0.38). *Post hoc* multiple comparisons showed that performance of the fish was consistent for the two repetitions of each condition (Sidak's multiple comparison test).

Following the retraining to the second pattern condition the fish were randomly allocated into two groups. One was retrained to blue vs. green simple color, the other to light green vs. dark green. Over the last four sessions, animals allocated to group 1 reached a level of 80% (sd ±10.7) and those allocated to group 2 a level of 65% (sd ±21) correct choices.

# **DISCUSSION**

Despite the colorful nature of many coral reef fish patterns, limited knowledge exists about the visual processing of color and patterns in fish. In many animals, visual processing of color and luminance is achieved via parallel processing channels. We aimed to test whether there is a spatially selective luminance channel in the coral reef fish, *Pomacentrus amboinensis*, using a combination of visual modeling and behavioral experiments based on operant conditioning. We showed for the first time that, similar to what has been described for terrestrial animals, contrast to L- and/or M-cones is required for high spatial frequency pattern discrimination in reef fish.

In the first experiment, we compared the ability of fish to discriminate two colored squares, which differed in either luminance (light green and dark green) or hue (blue and dark green with near equal L- and M-cone quantum catch), or two patterned squares (checkers and gratings) made up of either of the two color combinations. All fish rapidly learned to associate a color or pattern with a food reward within the typical timeframe of 3–4 days post capture observed in previous studies (Siebeck et al., 2008, 2009). The fish trained to a solid color (blue or dark green) were able to discriminate their rewarded square from another colored square with high accuracy, irrespective of whether the squares differed in hue or brightness.

The fish trained to light green—dark green checked patterns were also able to discriminate their rewarded stimulus from the distractor (light green—dark green gratings), while the fish trained to blue-green patterns (no L and M-cone contrast) were unable to discriminate checkers from gratings. At this point our results could be explained in two possible ways. Either the group of fish trained to this condition were unable to learn or had motivational problems often seen in behavioral experiments (Newport et al., 2014), or L and/or M-cone contrast is indeed required for high spatial frequency pattern discrimination in these fish.

To exclude the possibility of motivational or learning problems, in the second experiment we used a repeated measures design, in which each fish acted as its own control. One group of fish was initially trained to the patterns that provided luminance contrast only (green/green) and then retrained to the patterns that did not provide L/M-cone contrast (dark green/dark blue). The other group of fish completed the experiment in the reverse order. Irrespective of the order of the conditions, the fish were only able to discriminate the patterns if contrast was provided for the L and M cones (i.e., the light green/dark green patterns). We can therefore conclude that the loss in discrimination ability in the L/M isoluminant condition was not due to a loss in motivation or learning difficulty, and that contrast to the L and M cones is indeed required for the discrimination of high frequency patterns. Following the pattern discrimination, the fish were retrained a second time to either of the two simple color conditions. Interestingly, the fish allocated to the chromatic contrast condition (blue vs. green) solved the task with much higher accuracy compared to those retrained to the luminance contrast condition (light green vs. dark green). This further demonstrates the spatial selectivity of the luminance channel.

Our findings imply that reef fish also process visual stimuli in separate channels and that not all cones contribute equally to color and luminance vision when processing static patterns. The luminance channel receives input from L and M cones in primates (Livingstone and Hubel, 1988), L cones only in bees (Giurfa et al., 1997) and probably from double cones containing L visual pigments in birds. Due to the electric coupling found in some fish double cones, the long-standing hypothesis has been that double cones are the most likely candidates for motion and luminance vision in fish (Boehlert, 1978; Lythgoe, 1979; Cameron and Pugh, 1991; McFarland, 1991). This hypothesis has recently been challenged by a study showing that both members of the double cones can contribute separately to color vision in a reef fish (Pignatelli et al., 2010). Our results show for the first time that L- and/or M-cone contrast is essential for pattern discrimination, but it is still unclear whether in fish L cones only, or L and M cones, contribute to luminance vision. What we can say, however, is that, as the double cones in *P. amboinensis* contain the S and L sensitive cones (rather than M/L cones), they do not form the luminance channel as previously proposed, and also that contrast to the S-cone alone is not sufficient for pattern discrimination. While there is previous evidence which supports the existence of a separate channel for large field motion processing in fish (optomotor response is mediated via the L-cones of zebrafish and goldfish; Schaerer and Neumeyer, 1996; Krauss and Neumeyer, 2003), and small field motion processing via M-cones of goldfish (Gehres and Neumeyer, 2007), our study is the first to demonstrate high spatial acuity of the luminance channel in fish.

Overall, it seems that processing visual information in parallel channels is a general feature of visual systems within the animal kingdom, despite many differences in eye design, such as different optics, the morphology and number of photoreceptors with different spectral sensitivities and also, perhaps most importantly, in brain size and processing power. In primates we know that these parallel channels, i.e., the parvocellular, magnocellular and koniocellular pathways, have their origin in the retina and follow all the way through to higher processing centres in the cortex where they feed into the ventral and dorsal streams (Ungerleider and Mishkin, 1982; Yoonessi and Yoonessi, 2011). Whether similar pathways exist in animals with smaller brains and reduced apparent processing power, such as fish and insects, is an exciting field for further investigation.

#### **ACKNOWLEDGMENTS**

We would like to thank the staff at the Lizard Island Research Station for their support during the experimental part of the study. The study was supported by an ARC discovery grant to Ulrike E. Siebeck and Guy M. Wallis (DP14010043). Guy M. Wallis was supported through a Future Fellowship from the Australian Research Council (FT100100020).


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 June 2014; accepted: 08 September 2014; published online: 30 September 2014*.

*Citation: Siebeck UE, Wallis GM, Litherland L, Ganeshina O and Vorobyev M (2014) Spectral and spatial selectivity of luminance vision in reef fish. Front. Neural Circuits 8:118. doi: 10.3389/fncir.2014.00118*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Siebeck, Wallis, Litherland, Ganeshina and Vorobyev. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Effects of dopamine on response properties of ON-OFF RGCs in encoding stimulus durations

# *Lei Xiao, Pu-Ming Zhang, Hai-Qing Gong and Pei-Ji Liang\**

*Department of Biomedical Engineering, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China*

#### *Edited by:*

*Andrea Benucci, RIKEN Brain Science Institute, Japan*

#### *Reviewed by:*

*Deborah Baro, Georgia State University, USA*

*David J. Margolis, Rutgers University, USA*

#### *\*Correspondence:*

*Pei-Ji Liang, Department of Biomedical Engineering, School of Biomedical Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China e-mail: pjliang@sjtu.edu.cn*

The response properties of single retinal ganglion cells (RGCs), such as spike count and response latency, are known to encode some features of visual stimuli. On the other hand, neuronal responses can be modulated by dopamine (DA), an important endogenous neuromodulator in the retina. In the present study, we investigated the effects of DA on the spike count and the response latency of bullfrog ON-OFF RGCs during exposure to different stimulus durations. We found that neuronal spike count and response latency both changed with stimulus duration, and exogenous DA (10 µM) markedly attenuated the stimulus-duration-dependent change in response latency. Information analysis showed that the information about light ON duration was mainly carried by the OFF response and *vice versa*, and the stimulation information was carried by both spike count and response latency. However, during DA application, the information carried by the response latency was greatly decreased, which suggests that the dopaminergic pathway is involved in modulating the role of response latency in encoding the information about stimulus durations.

**Keywords: retinal ganglion cell, dopamine, response latency, firing rate, information coding**

# **INTRODUCTION**

Neuronal responses have many aspects, including firing rate, response latency, and correlated activity patterns among neurons. How neurons transmit external information via these characteristics is still not fully understood (Averbeck and Lee, 2004; Field and Chichilnisky, 2007). Neuronal firing rate varies with the stimulus and thus encodes stimulus information (Richmond et al., 1987; Risner et al., 2010). In addition, some studies have shown that the timing of individual spikes, especially the timing of the first spike after stimulus onset (the response latency), also plays an important role in encoding information about certain stimulus features, such as stimulus contrast, location, moving speed, and direction (Gawne et al., 1996; Panzeri et al., 2001; Reich et al., 2001; Thiel et al., 2007; Gollisch and Meister, 2008; Risner et al., 2010; Nowak et al., 2011).

In the retina, dopamine (DA) is synthesized and released by dopaminergic interplexiform cells and amacrine cells in the inner retina during exposure to constant or flickering light (Witkovsky, 2004). Studies have shown that DA takes part in regulating circadian rhythmicity, retinal light and dark adaptation process, and contrast sensitivity, etc. (Witkovsky, 2004; Popova and Kupenova, 2011; Jackson et al., 2012). Besides, DA can also modulate neuronal properties, including electrical coupling between retinal neurons and glutamate-gated ionic currents, etc. (Witkovsky and Dearry, 1991; Maguire and Werblin, 1994; Bloomfield and Volgyi, 2009), which results in changes in the response characteristics of retinal ganglion cells (RGCs), such as firing rate, response latency, and receptive field size, etc. (Bonaventure et al., 1980; Witkovsky, 2004; Li et al., 2012).

Visual stimulation contains many important features, such as stimulus intensity, contrast, and duration. A previous ERG study showed that in a retinal DA-depleted mouse model, the amplitudes of ERG a-waves and b-waves (which respectively represent the function of rod photoreceptors and ON bipolar cells) exhibited significant deficits in light-adapted responses and contrast sensitivity (Jackson et al., 2012). It was also reported that the degree of depolarization of cone-driven OFF bipolar cells at light offset increased with the preceding light ON duration (Schwartz, 1974), and in our previous study, RGCs' responsiveness (including response latency and firing rate) was observed to change with stimulus duration (Xiao et al., 2014). In the present study, we examined the effects of DA on the stimulus-duration-dependent response changes and information coding.

Using a multi-electrode recording system, we investigated the coding strategy of single bullfrog ON-OFF RGCs in response to different stimulus durations, as well as the effects of DA on RGCs' responses and coding ability. Both the response latency and the spike count of the ON response varied with light OFF intervals and *vice versa*. Information analysis showed that response latency and spike count both carried information about stimulus durations. Application of exogenous DA (10 µM) increased neuronal firing rate and shortened neuronal response latency; it also attenuated the stimulus-duration-dependent response latency change and significantly decreased the information carried by the response latency. These results suggest that the dopaminergic pathway is involved in modulating the role of response latency in encoding information about stimulus durations.

# **MATERIALS AND METHODS**

#### **RETINAL RECORDING**

Experiments were performed on isolated bullfrog retinas at room temperature (22–26 °C) (Jing et al., 2010; Xiao et al., 2013, 2014). Bullfrogs were dark adapted for about 30 min prior to experiments. A piece of retina (about 4 × 4 mm²) was placed on a micro-electrode array (MEA, MMEP-4, CNNS UNT, USA) with the ganglion cell side contacting the electrodes, and superfused with oxygenated Ringer's solution. In pharmacological experiments, DA (10 µM; purchased from Sigma-Aldrich, St. Louis, MO, USA) was applied with the Ringer's solution.

Neuronal activities were recorded by the MEA, which consisted of 64 electrodes (8 µm in diameter) arranged in an 8 × 8 matrix with 150 µm tip-to-tip distance. Signals were amplified by a 64-channel amplifier (MEA workstation, Plexon Inc. Texas, USA; single-end amplifier, amplification 1000×, bandpass 100–8000 Hz), with each channel being sampled at a rate of 40 kHz (along with the stimulus). Spikes from individual neurons were sorted based on the principal component analysis (PCA) method (Zhang et al., 2004) as well as the spike-sorting unit in the commercial software Offline Sorter (Plexon Inc. Texas, USA). In order to get accurate data for spike train analysis, only single-neuron events confirmed by all the above-mentioned spike-sorting methods were used for further analyses (Li et al., 2012).

All procedures strictly conformed to the humane treatment and use of animals as prescribed by the Association for Research in Vision and Ophthalmology, and were approved by the Ethic Committee, School of Biomedical Engineering, Shanghai Jiao Tong University.

#### **STIMULATION PROTOCOLS**

Light stimuli were projected from a computer monitor onto the isolated retina via a lens system. Before application of the stimulation protocols, full-field sustained dim white light (38.9 nW/cm²) was given for 30 s to adjust the RGCs' sensitivity to similar levels (Jing et al., 2010; Xiao et al., 2013).

In our experiments, two stimulation protocols were applied: (1) stimulation with different light ON durations, in which light ON stimuli (77.7 nW/cm²) with durations of 1, 5, and 9 s were presented randomly in each trial and separated by 1-s full-field light OFF intervals (about 0.0015 nW/cm²), repeated for 30 trials (**Figure 1A**); (2) stimulation with different light OFF intervals, in which randomized light OFF intervals of 1, 5, and 9 s were separated by full-field 1-s light ON stimuli, also repeated for 30 trials (**Figure 1B**).
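The two protocols can be written down as a schedule of light segments. The Python sketch below is only an illustration: the function names, the fixed seed, and the reading that each of the three durations occurs once per trial in shuffled order are assumptions, not details stated in the text.

```python
import random

DURATIONS_S = (1, 5, 9)  # varied durations, from the protocol description
FIXED_S = 1              # fixed duration of the interleaved phase


def make_trial(vary, rng):
    """One trial as a list of (state, seconds) segments.

    Assumption: each varied duration occurs once per trial, in random order.
    """
    fixed = "OFF" if vary == "ON" else "ON"
    order = list(DURATIONS_S)
    rng.shuffle(order)
    trial = []
    for d in order:
        trial.append((vary, d))         # segment whose duration is varied
        trial.append((fixed, FIXED_S))  # 1-s separating segment
    return trial


def make_protocol(vary="ON", n_trials=30, seed=0):
    """Protocol 1 uses vary="ON"; protocol 2 uses vary="OFF"."""
    rng = random.Random(seed)
    return [make_trial(vary, rng) for _ in range(n_trials)]


protocol_on = make_protocol("ON")    # different light ON durations
protocol_off = make_protocol("OFF")  # different light OFF intervals
```

Under these assumptions every trial lasts 18 s (1 + 5 + 9 s of the varied phase plus three 1-s separators).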

#### **INFORMATION ESTIMATION**

**Figure 1 | (A)** Stimulus protocol with different light ON durations, in which 1-s, 5-s, and 9-s light ON were given randomly and separated by 1-s light OFF intervals in each trial, and repeated for 30 trials. **(B)** Stimulus protocol with different light OFF intervals, in which 1-s, 5-s, and 9-s light OFF intervals were presented randomly in each trial and were separated by 1-s light ON. Full-field sustained dim white light was given for 30 s before each stimulation protocol.

In the present study, the metric-space method was used to estimate the stimulus information carried by both the neuronal spike count and the response latency of the first spike after stimulus onset (Victor and Purpura, 1996). The metric-space method measures the distance between two spike trains as the minimum total cost of transforming one into the other by three elementary manipulations: adding and deleting spikes, each at a cost of unity, and shifting spike timing, which costs *q* per unit time of moving. The value of *q* expresses the relative sensitivity to the precise timing of the spikes (Victor, 2005).
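The spike-train distance built from these three manipulations is commonly computed with a dynamic program similar to edit distance. The following Python function is a minimal sketch of that standard approach, not the authors' code; the name `vp_distance` and the choice of seconds as the time unit (so that *q* is in s⁻¹) are assumptions.

```python
def vp_distance(a, b, q):
    """Minimum cost of turning spike train a into b (sorted spike times, s).

    Adding or deleting a spike costs 1; shifting a spike by dt costs q * |dt|.
    """
    n, m = len(a), len(b)
    # prev[j]: distance between the first i-1 spikes of a and first j of b
    prev = [float(j) for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [float(i)] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + 1.0,                               # delete a[i-1]
                curr[j - 1] + 1.0,                           # add b[j-1]
                prev[j - 1] + q * abs(a[i - 1] - b[j - 1]),  # shift spike
            )
        prev = curr
    return prev[m]
```

At *q* = 0 the shift is free, so the distance reduces to the difference in spike counts; for large *q* it is cheaper to delete and re-add every spike, and the distance approaches the total number of spikes in both trains.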

For a neuron with *N*<sub>tot</sub> spike trains elicited by *N*<sub>sti</sub> different stimuli, the shortest distance *D*[*q*](*r<sub>i</sub>*, *r<sub>j</sub>*) between one spike train (*r<sub>i</sub>*) and another spike train (*r<sub>j</sub>*) of this neuron (*i*, *j* = 1, 2, ..., *N*<sub>tot</sub>, and *i* ≠ *j*) is computed based on the above-mentioned three elementary manipulations. If one spike train (*r*) is elicited by a stimulus in class *s*<sub>α</sub>, the average distance *d*(*r*, *s*<sub>γ</sub>) from the spike train (*r*) to the spike trains elicited by stimuli of class *s*<sub>γ</sub> (γ = 1, 2, ..., *N*<sub>sti</sub>) is defined as (Victor and Purpura, 1996):

$$d(r, s_{\gamma}) = \left[ \left\langle D[q](r, r')^{z} \right\rangle_{r' \text{ elicited by } s_{\gamma}} \right]^{1/z}, \tag{1}$$

where angle brackets denote the average over all the spike trains (*r*′) elicited by a stimulus in class *s*<sub>γ</sub>, and *z* is arbitrarily set as −2 (Victor and Purpura, 1996). After computing the average distances for every spike train, the *N*<sub>tot</sub> spike trains can be classified into *N*<sub>sti</sub> response classes. This classification can be summarized by a matrix *N*(*s*<sub>α</sub>, *r*<sub>β</sub>), whose entries indicate the number of times that spike trains elicited by the stimulus class *s*<sub>α</sub> are classified into response class *r*<sub>β</sub>. If a spike train *r* is elicited by stimulus class *s*<sub>α</sub> (α = 1, 2, ..., *N*<sub>sti</sub>), it is classified into the response class *r*<sub>β</sub> (β = 1, 2, ..., *N*<sub>sti</sub>) when *d*(*r*, *s*<sub>β</sub>) is the minimum of all the average distances, and *N*(*s*<sub>α</sub>, *r*<sub>β</sub>) is incremented by 1. If *k* average distances share the minimum, the corresponding *k* elements of the matrix *N* are each incremented by 1/*k*.
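The classification step can be sketched in Python as follows. This is an illustrative reading of the procedure, assuming a precomputed pairwise distance matrix `D` and integer stimulus labels; the function names and the handling of zero distances for negative *z* (treated as an average distance of 0) are assumptions.

```python
def average_distance(dists, z=-2.0):
    """Power mean of Equation 1: [<D^z>]^(1/z) over the given distances."""
    if not dists:
        return float("inf")  # no other train in this class to compare with
    if z < 0 and any(d == 0 for d in dists):
        return 0.0           # assumed limit when an identical train exists
    return (sum(d ** z for d in dists) / len(dists)) ** (1.0 / z)


def confusion_matrix(D, labels, n_classes, z=-2.0):
    """N[alpha][beta]: trains from stimulus class alpha assigned to class beta.

    D[i][j] is the distance between trains i and j; labels[i] is the
    stimulus class (0 .. n_classes-1) that elicited train i.
    """
    N = [[0.0] * n_classes for _ in range(n_classes)]
    for i, alpha in enumerate(labels):
        d = []
        for gamma in range(n_classes):
            # distances to the other trains (j != i) elicited by class gamma
            others = [D[i][j] for j, g in enumerate(labels)
                      if g == gamma and j != i]
            d.append(average_distance(others, z))
        d_min = min(d)
        winners = [g for g, v in enumerate(d) if v == d_min]
        for beta in winners:
            N[alpha][beta] += 1.0 / len(winners)  # ties share the count
    return N
```

For example, with four trains labelled `[0, 0, 1, 1]`, within-class distances of 1 and between-class distances of 5, every train is assigned to its own class and the matrix is diagonal.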

*N*(*s*<sub>α</sub>, *r*<sub>β</sub>) denotes the number of response sequences elicited by stimulus class *s*<sub>α</sub> that are classified as responses elicited by stimulus class *s*<sub>β</sub>. Clustering performance can be quantified by the transmitted information *H* (Victor and Purpura, 1996):

$$H = \frac{1}{N_{\text{tot}}} \sum_{\alpha, \beta} N(s_{\alpha}, r_{\beta}) \left[ \log_2 N(s_{\alpha}, r_{\beta}) - \log_2 \sum_{\alpha} N(s_{\alpha}, r_{\beta}) - \log_2 \sum_{\beta} N(s_{\alpha}, r_{\beta}) + \log_2 N_{\text{tot}} \right], \tag{2}$$

In the present study, there are three stimulus classes with equal probability (*N*<sub>sti</sub> = 3), so the maximal value of the transmitted information (*H*) is log<sub>2</sub>3 ≈ 1.58 bits, reached when clustering is perfect (*N*(*s*<sub>α</sub>, *r*<sub>β</sub>) = *N*<sub>tot</sub>/3 for α = β and 0 otherwise), while random clustering leads to *H* = 0.
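As a check on these two limiting cases, Equation 2 can be evaluated directly on a confusion matrix. The Python sketch below is illustrative; the function name and the convention that empty entries contribute nothing to the sum are assumptions.

```python
from math import log2


def transmitted_information(N):
    """Transmitted information H (bits) of Equation 2 for a matrix N[alpha][beta]."""
    n_tot = sum(sum(row) for row in N)
    col_sums = [sum(row[b] for row in N) for b in range(len(N[0]))]
    h = 0.0
    for row in N:
        row_sum = sum(row)
        for b, n in enumerate(row):
            if n > 0:  # empty entries contribute 0 (n * log2(n) -> 0)
                h += n * (log2(n) - log2(col_sums[b])
                          - log2(row_sum) + log2(n_tot))
    return h / n_tot


perfect = [[30, 0, 0], [0, 30, 0], [0, 0, 30]]      # every train classified correctly
chance = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]  # uniform confusion
h_perfect = transmitted_information(perfect)  # log2(3) bits
h_chance = transmitted_information(chance)    # 0 bits
```

The matrix sizes used here (30 trains per class) are hypothetical and only serve to reproduce the two limits.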

The *H* value changes with the cost parameter *q*, so the information carried by different response components can be obtained at different *q* values (Victor and Purpura, 1996). When *q* = 0 s<sup>−1</sup>, *H*<sub>0</sub> represents the amount of information contained in the spike count or firing rate. If the peak value of *H* (*H*<sub>peak</sub>) occurs at *q* > 0 s<sup>−1</sup>, it implies that some information is contained in the temporal structure of the spike train. The information contributed by the response latency of the first spike is obtained by keeping only the first spike in each trial; trials in which no spike was fired are excluded (Reich et al., 2001).

#### **BIAS IN ESTIMATING THE INFORMATION**

Estimating the information using Equation 2 with a limited number of trials will cause a sampling bias (Panzeri and Treves, 1996). To estimate this bias, we used Equation 2 to recalculate the information *H* after randomly associating spike trains with stimuli. The average value of 10 such calculations (*Hbias*) is the estimated bias in *H* value estimation (Victor and Purpura, 1997).
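The shuffle procedure can be outlined as below. This is a sketch only: `info_fn` is a hypothetical callable standing in for the full pipeline (pairwise distances, classification, and Equation 2) evaluated with a given assignment of stimulus labels to spike trains, and the seed is an arbitrary choice.

```python
import random


def estimate_bias(labels, info_fn, n_shuffles=10, seed=0):
    """Mean information after randomly re-associating trains with stimuli.

    labels[i] is the stimulus class of spike train i; info_fn(labels)
    returns the information H computed with that labelling.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_shuffles):
        shuffled = list(labels)  # copy so the true labels stay untouched
        rng.shuffle(shuffled)    # break the stimulus-response association
        values.append(info_fn(shuffled))
    return sum(values) / len(values)
```

Shuffling the labels keeps the number of trains per stimulus class unchanged, so the shuffled estimates are affected by the same trial counts as the original one.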

# **RESULTS**

Our experiments were performed on bullfrog retinas. Bullfrog RGCs can be classified into four subtypes based on their response properties: sustained edge detector, convexity edge detector, changing contrast detector, and dimming detector (Maturana et al., 1960; Ishikane et al., 2005). In the present study, more than 90% of the recorded RGCs were changing contrast detectors, which respond transiently to both light ON and OFF stimuli; our analyses hereafter focus on these ON-OFF RGCs.

#### **DA EFFECTS ON NEURONAL RESPONSE LATENCY AND SPIKE COUNT OF ON-OFF RGCs DURING EXPOSURE TO DIFFERENT STIMULUS DURATIONS**

In the retina, DA is an important neuromodulator. Activation of DA receptors can influence RGCs' responses (Witkovsky, 2004). In the present study, exogenous DA (10µM) was applied to study whether DA took part in modulating ON and OFF response characteristics, including firing rate and response latency, during exposure to different stimulus durations.

Raster plots of an example neuron during exposure to different light ON durations in the control condition and during DA application are plotted in **Figures 2A,B**, respectively. The timing of the first spike after a stimulation switch was defined as the response latency (Greschner et al., 2006; Gollisch and Meister, 2008). Because bullfrog ON-OFF RGCs fired mostly within the first 200 ms after light ON and OFF transitions, only the first 200 ms of the responses to light ON and OFF stimulation were taken for further analysis.
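The two response measures defined here, latency of the first spike and spike count within the 200-ms window, can be extracted from a list of spike times as in the sketch below. The function name, the use of seconds, and the example spike times are hypothetical.

```python
def window_response(spike_times, switch_time, window_s=0.2):
    """Latency (s) and spike count in the window after a stimulation switch.

    Returns (None, 0) when no spike falls inside the window.
    """
    relative = [t - switch_time for t in spike_times
                if switch_time <= t < switch_time + window_s]
    latency = min(relative) if relative else None  # first spike after switch
    return latency, len(relative)


# Hypothetical spike train (s) with a light switch at t = 1.0 s
latency, count = window_response([0.95, 1.03, 1.08, 1.50], switch_time=1.0)
```

In this example the spikes at 0.95 s and 1.50 s fall outside the window, leaving two spikes and a latency of about 30 ms.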

Average response latencies and spike counts of the ON and OFF responses of the example neuron during exposure to different light ON durations are plotted in **Figures 2C,D**. In the control condition, the OFF response latency tended to be shortened and the spike count tended to be increased when light ON duration was prolonged, but the ON response did not exhibit obvious change. During DA application, average response latencies of both ON and OFF responses of this example neuron were shortened and spike counts were increased. On average, it was found that during DA application, ON and OFF response latencies did not exhibit obvious change with light ON duration, but the spike count of OFF response still tended to be increased.

The ON-time-dependent latency change of OFF response was quantified by the slope of linear fitting. For the example neuron, such a change was obviously attenuated during DA application [the linear fitting slope *k* = −1.099 and −0.093 in the control and DA conditions, respectively, while the relative difference of the fitting slopes (|*k*Con − *k*DA|/|*k*Con + *k*DA|) is 0.8440; **Figure 2C** inset]. The ON-time-dependent spike count change of OFF response in DA condition was similar to that in the control condition (the linear fitting slope *k* = 0.520 and 0.466 in the control and DA conditions, respectively, while the relative difference of the fitting slopes is 0.0548; **Figure 2D** inset).
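The slope comparison can be reproduced from the reported values. In the Python sketch below, `fit_slope` is an ordinary least-squares slope and `relative_slope_difference` implements |*k*Con − *k*DA|/|*k*Con + *k*DA|; the function names are assumptions, and the slopes passed in are the ones quoted for the example neuron.

```python
def fit_slope(x, y):
    """Least-squares slope of y against x (e.g. latency against ON duration)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def relative_slope_difference(k_con, k_da):
    """|k_con - k_da| / |k_con + k_da| for control and DA slopes."""
    return abs(k_con - k_da) / abs(k_con + k_da)


# Slopes reported for the example neuron's OFF response
latency_change = relative_slope_difference(-1.099, -0.093)  # about 0.844
count_change = relative_slope_difference(0.520, 0.466)      # about 0.055
```

A value near 1 indicates that the duration dependence largely disappeared under DA, while a value near 0 indicates that it was essentially unchanged.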

Statistical results obtained from 45 RGCs of 6 retinas show that the OFF response latency was significantly decreased when light ON duration was increased in the control condition (**Table 1**; paired *t*-test, *p* < 0.05), but in DA condition there was no obvious difference for the OFF response latency during exposure to different light ON durations (**Table 1**; paired *t*-test, *p* > 0.05). Spike count of OFF response was obviously increased with light ON duration, and such tendency was kept during DA application (**Table 1**).

It is well acknowledged that light increment and decrement can activate retinal ON and OFF pathways, respectively. Different synaptic circuitries and neurotransmitter receptors of ON and OFF pathways make light response of ON and OFF RGCs show some differences in response sensitivity, temporal kinetics, and receptive field size etc. (DeVries, 1999; Chichilnisky and Kalmar, 2002; Zaghloul et al., 2003; Margolis and Detwiler, 2007). Thus, the effects of DA on response characteristics of ON-OFF RGCs during exposure to different light OFF intervals were further studied.

During exposure to different light OFF intervals, the raster plots of an example neuron in the control and DA conditions are shown in **Figures 3A,B**, respectively. For this example neuron, only ON response properties changed, with latency shortened and spike count increased with light OFF interval, while OFF response properties did not exhibit obvious change (**Figures 3C,D**), which showed that ON-OFF RGCs' activities were mainly modulated by the preceding stimulus. During DA application, DA also shortened neuronal response latency and increased neuronal spike count, and the OFF-time-dependent latency change of ON response was obviously attenuated (**Figures 3C,D**).

*[Figure caption fragment: panels show the cell's responses during 1-s/1-s, 5-s/1-s, and 9-s/1-s (ON/OFF).]*

**Table 1 | Average response latency and spike count in response to different light ON durations in the control and DA conditions (Mean ± s.e.m.,** *n* **= 45 RGCs from 6 retinas).**

*Bold entries indicate response properties (response latency and spike count) which were significantly changed with light ON durations. p* < *0.05, paired t-test.*

The OFF-time-dependent response latency change of ON response was also quantified by the slope of linear fitting, and it was attenuated obviously during DA application (the linear fitting slope *k* = −0.620 and −0.026 in the control and DA conditions, respectively, and the relative difference of the fitting slopes is 0.9195; **Figure 3C** inset). In DA condition, the spike count of ON response was increased with light OFF interval, but the OFF-time-dependent spike count change of ON response was similar to that in the control condition (the linear fitting slope *k* = 0.246 and 0.229 in the control and DA conditions, respectively, and the relative difference of the fitting slopes is 0.0385; **Figure 3D** inset).

Statistical results from 23 neurons of 3 retinas also showed that the ON response latency was significantly decreased with light OFF interval in control condition, but in DA condition, it exhibited no obvious difference in exposure to different light OFF intervals, and the OFF-time-dependent spike count change of ON response was still kept with DA application (**Table 2**).

#### **ROLES OF NEURONAL RESPONSE LATENCY AND SPIKE COUNT IN ENCODING STIMULUS DURATIONS**

Though neuronal response latency and spike count both varied with stimulus durations, many reports suggested that the contribution of neuronal response latency and spike count during information encoding is not equal (Panzeri et al., 2001; Gollisch and Meister, 2008). We then analyzed the contribution of the response latency and the spike count in encoding the information about stimulus durations based on the metric-space method (Equations 1 and 2) (Victor and Purpura, 1996).

**Table 2 | Average response latency and spike count in response to different light OFF intervals in the control and DA conditions (Mean ± s.e.m.,** *n* **= 23 RGCs from 3 retinas).**

*Bold entries indicate response properties (response latency and spike count) which were significantly changed with light OFF intervals. p* < *0.05, paired t-test.*

**Figures 4A,B** show the results of applying the metric-space method to the entire sequence and to only the first spike of one example neuron's ON and OFF responses during exposure to different light ON durations. When considering the entire ON and OFF response sequences (**Figure 4A**), the total information carried by ON and OFF responses (the maximum information value) and the information carried by the spike count (the information value at the cost *q* = 0 s<sup>−1</sup>) can be estimated, while the information contributed by response latency reaches its maximum value when only the first spike is considered (**Figure 4B**) (Reich et al., 2001).

For the example neuron, the total information carried by the OFF response was about 0.39 bits, which was obviously higher than that carried by the ON response (0.05 bits); the information carried by the spike count and the response latency of OFF response were 0.19 bits and 0.16 bits, respectively (**Figures 4A,B**). The OFF response carried more information about light ON duration than the ON response did, which was consistent with the results that OFF response characteristics were changed obviously with light ON duration.

Statistical results from 179 neurons of 10 retinas showed that the total information carried by the OFF response was significantly higher than that carried by the ON response (**Figures 4C,E**; paired *t*-test, *p* < 0.05). It is also shown that for the OFF response, the information carried by spike count was a little higher than that carried by response latency (**Figures 4D,E**; paired *t*-test, *p* < 0.05).

Information encoding during exposure to different light OFF intervals was also analyzed based on the metric-space method. **Figures 5A,B** show the results of applying the metric-space method to the entire sequence and to only the first spikes of one example neuron's ON and OFF responses during exposure to different light OFF intervals. For this example neuron, the total information about light OFF interval carried by the ON response (about 0.62 bits) was obviously greater than that carried by the OFF response (about 0.12 bits), and the information carried by the spike count of the ON response (about 0.14 bits) was less than that carried by the response latency of the ON response (about 0.25 bits) (**Figures 5A,B**). Statistical results from 125 RGCs of 8 retinas also showed that the ON response carried more information than the OFF response (**Figure 5C**), and that the information carried by the response latency was significantly greater than that carried by the spike count (**Figures 5D,E**; paired *t*-test, *p* < 0.05).

#### **INFLUENCE OF DA ON INFORMATION CODING**

Our results showed that neuronal spike count and response latency both carried the information about stimulus durations. In pharmacological experiments, it was observed that 10µM DA could attenuate the stimulus-time-dependent response latency change, but it had little effect on the stimulus-time-dependent spike count change. So, the influence DA exerts on the capacity of information carried by the spike count and the response latency about stimulus durations was further examined.

Our results showed that DA did not obviously influence the total information carried by the entire ON- and OFF-sequence in response to different light ON durations (**Figure 6A**; paired *t*-test, *p* > 0.05, *n* = 45 cells from 6 retinas). On the other hand, during exposure to different light OFF intervals, DA did not obviously influence the total information carried by the entire OFF-sequence (**Figure 6D**; paired *t*-test, *p* > 0.05, *n* = 23 cells from 3 retinas), but tended to decrease the total information carried by the entire ON-sequence (**Figures 6D,F**; paired *t*-test, *p* < 0.05). Given that RGCs' responses were mainly modulated by the preceding stimuli (**Figures 2**, **3**) and that the information about light ON/OFF duration was mainly carried by the OFF/ON response (**Figures 4**, **5**), our analyses then focused on the effects of DA on information coding of the OFF response during exposure to different light ON durations and *vice versa*.

For the neuronal responses during exposure to different light ON durations, we calculated the information carried by the spike count and by the response latency of the OFF response in the control and DA conditions based on the metric-space method (Victor and Purpura, 1996). In the control condition, the information carried by the spike count of the OFF response was slightly higher than that carried by the response latency. Application of DA significantly decreased the information carried by the response latency (**Figures 6B,C**; paired *t*-test, *p* < 0.05), but it did not obviously change the information carried by the spike count (paired *t*-test, *p* > 0.05). During exposure to different light OFF intervals, though the response latency of the ON response carried more information than the spike count in the control condition, the information carried by the response latency also decreased significantly during DA application (**Figures 6E,F**; paired *t*-test, *p* < 0.05) and the information carried by the spike count did not change obviously (paired *t*-test, *p* > 0.05).

Information about stimuli can be carried by neuronal activity only when the response variability is correlated with the stimulation parameters (Borst and Theunissen, 1999). The effects of DA on the information coding by the spike count and response latency were consistent with the effects of DA on the stimulus-time-dependent spike count and response latency changes.

# **DISCUSSION**

In the present study, the effects of DA on rate coding and latency coding of single bullfrog ON-OFF RGCs during exposure to different stimulus durations were investigated. Spike count and response latency changed with the stimulus durations, and they both took part in encoding information about stimulus duration. DA at a concentration of 10 µM obviously attenuated the stimulus-duration-dependent response latency change and also decreased the information carried by the response latency. These results suggest that in the retina, the dopaminergic pathway is involved in modulating the role of response latency in encoding information about stimulus durations.

#### **EFFECTS OF DA ON THE STIMULUS-DURATION-DEPENDENT RESPONSE CHANGES**

DA is an important endogenous neuromodulator in the retina. It has been reported that RGCs' activities, including response latency and firing rate, can be influenced by DA (Bonaventure et al., 1980; Witkovsky, 2004). In the present study, we observed that a DA-related pathway took part in modulating stimulus-duration-dependent response latency changes.

It was reported that in the clawed frog, the vitreal DA concentration was 564 ± 109 nM in the light-adapted condition, and retinal DA uptake was saturated when the DA concentration in the bath was about 10 µM (Witkovsky et al., 1993). Therefore, in the present study, 10 µM DA was used to study the effects of DA on the stimulus-duration-dependent changes in RGCs' responses. There are two classes of DA receptors, D1 and D2. Previous studies showed that D2 receptors are more sensitive to DA than D1 receptors, and that light-induced dopamine release can desensitize D2 receptors (Witkovsky, 2004), so DA has concentration-dependent effects on different types of receptors. In our study, however, 10 µM DA could saturate the effects of both types of DA receptors, which helps to identify whether DA plays a role in modulating the stimulus-duration-dependent responses. Meanwhile, we focused on the stimulus-duration-dependent response changes and the effects of DA on them, and our experiments were performed at one level of brightness/contrast. Our experimental results showed that neuronal spike count increased with stimulus duration at the selected level of brightness/contrast, both in the control condition and during 10 µM DA application, which suggested that spike count was not saturated at this concentration of DA and this level of brightness/contrast. It was therefore feasible to study the stimulus-duration-dependent response changes under these conditions.

DA receptors have been found on retinal neurons, including RGCs (Witkovsky and Dearry, 1991). Activation of D1 and D2 receptors can modulate RGCs' excitability via regulating cAMP-dependent protein kinase in opposite ways (Witkovsky and Dearry, 1991). A study in mouse RGCs showed that blocking D1-type receptors decreased RGCs' response amplitude, whereas blocking D2-type receptors had an opposite effect (Yang et al., 2013). Recent experiments performed on bullfrog retina suggested that light-induced dopamine release activated D1-type receptors and desensitized D2-type receptors (Li et al., 2012), and application of exogenous DA shortened the response latency (Li and Liang, 2013), which was consistent with our present results. So, one possible mechanism for the ON-time-dependent OFF response changes is that prolonged light ON duration can increase DA release (Bloomfield and Volgyi, 2009), which will activate D1-type receptors in ON-OFF RGCs, and eventually increase the OFF responsiveness.

On the other hand, RGCs mainly receive excitatory inputs (glutamate) from bipolar cells, and modulation of bipolar cells' activities can directly influence RGCs' responses, especially the response latency. It was reported that the degree of depolarization of cone-driven OFF bipolar cells at light offset could be increased with the preceding light ON duration (Schwartz, 1974), which should elevate the OFF responsiveness of RGCs. In the retina, bipolar cells receive glutamatergic input from cones, and DA can enhance glutamate-gated currents (Maguire and Werblin, 1994), which can elevate the degree of depolarization of cone-driven OFF bipolar cells at light offset and RGCs' firing activities. DA release from dopaminergic amacrine cells is increased by light ON (Bloomfield and Volgyi, 2009). Another possible mechanism for the ON-time-dependent OFF response changes observed in our experiments is therefore that prolonged light ON duration can increase DA release (Bloomfield and Volgyi, 2009), which will enhance glutamate-gated current in the retina, and eventually shorten the response latency and enhance the firing rate of the OFF response.

However, DA release is decreased during darkness, so the DA effect can hardly explain the OFF-time-dependent ON response changes. It was reported that retinal ON and OFF pathways are asymmetric (DeVries, 1999; Chichilnisky and Kalmar, 2002), with the main difference being that ON bipolar cells possess metabotropic glutamate receptors (mGluRs) and OFF bipolar cells possess ionotropic glutamate receptors (iGluRs). Different mechanisms underlying the activation of these two types of receptors make postsynaptic cells exhibit opposite response polarities: activation of mGluRs indirectly closes cation channels through a signaling cascade that involves a G-protein, while glutamate-gated cation channels are directly opened when glutamate binds to iGluRs (Yang, 2004; Oesch et al., 2011).

For cone-driven ON bipolar cells, it was reported that their activities can be depressed by the intracellular calcium concentration via inhibitory feedback to cation channels (Snellman et al., 2008). So, one possible mechanism for the OFF-time-dependent ON response changes is that prolonged light OFF interval can increase glutamate released by cones (Yang, 2004; Oesch et al., 2011), which results in a decrement in the intracellular calcium concentration in ON bipolar cells and increases these cells' activity to the following light ON stimulation, and eventually elevates the ON responsiveness of RGCs.

In general, the timing of the first spike after a stimulation switch depends on the direct excitatory glutamatergic pathway from photoreceptors to ganglion cells via bipolar cells in the retina, whereas the firing rate depends on both direct excitatory input (glutamate) from bipolar cells and lateral inhibitory modulation (GABA and glycine) from amacrine cells. In the retina, DA can enhance glutamate-gated current (Maguire and Werblin, 1994). Our results showed that during 10 µM DA application, the stimulus-duration-dependent response latency change was attenuated, which suggested that DA application eliminated the stimulus-duration-dependent glutamate-gated current change. However, DA had no significant effect on the stimulus-duration-dependent firing rate change. Although application of exogenous DA (10 µM) caused an increase in RGCs' firing rate, the stimulus-duration-dependent firing rate change may be attributed to other mechanisms, such as the activities of GABAergic and glycinergic networks related to amacrine cells, as well as the stimulus-duration-dependent glutamate release by photoreceptors (Schmitz and Witkovsky, 1996).

#### **CODING STRATEGIES OF RETINAL ON AND OFF PATHWAYS**

Spike count and response latency are basic and important neuronal response properties, which are both involved in neuronal information coding. Some experimental studies showed that response latency could convey information about stimuli in addition to that encoded by spike count (Panzeri et al., 2001; Reich et al., 2001; Chase and Young, 2007; Storchi et al., 2012). Quantitative analysis also showed that some neurons carried more stimulus information in the spike count, whereas others carried more in the response latency (Panzeri et al., 2001; Reich et al., 2001; Storchi et al., 2012), which is similar to our results (**Figures 4**, **5**). Our statistical results showed that the response latency of the ON response carried more information than the spike count during exposure to different light OFF intervals, but the spike count of the OFF response carried more information about light ON durations, which suggested that ON and OFF pathways are asymmetric in encoding stimulus durations.

As mentioned, bipolar cells in retinal ON and OFF pathways exhibit asymmetric properties (Oesch et al., 2011). It was reported that ON cone bipolar cells can cross-inhibit OFF bipolar cells and OFF RGCs through the activation of AII amacrine cells (Margolis and Detwiler, 2007; Oesch et al., 2011), and thus extend the dynamic range of signaling in the OFF pathway (Manookin et al., 2008). Furthermore, ON and OFF RGCs have different excitatory and inhibitory synaptic inputs, which results in different spike time and spike count variability in these cells (Uzzell and Chichilnisky, 2004; Murphy and Rieke, 2006). These differences in the neural network between retinal ON and OFF pathways may induce different encoding strategies in neuronal ON and OFF responses (Zaghloul et al., 2003; Masland, 2012; Harris and Mrsic-Flogel, 2013; Xiao et al., 2013).

In addition, other temporal patterns of the neuronal response, such as inter-spike intervals and the precise timing of spikes other than the first one, may also carry stimulus information (Reich et al., 2001). This is regarded as the residual information, and it can be estimated as the difference between the total information carried by the entire spike sequence and that carried by the spike count and the response latency (Reich et al., 2001). In our present study, the residual information carried by the OFF response to different light ON durations (about 0.04 ± 0.01 bits, Mean ± s.e.m., **Figure 4**) was clearly less than that carried by the ON response to different light OFF intervals (0.15 ± 0.02 bits, Mean ± s.e.m., **Figure 5**). In the metric-space method, the value of *q* expresses the relative sensitivity to the precise timing of the spikes (Victor, 2005). In the present study, the temporal precision limitation for information capacity (Reich et al., 2001), which is a measure of the precision with which spike times can be used to distinguish one stimulus from others, was defined as 1000/*q<sub>max</sub>*, where *q<sub>max</sub>* is the value of *q* at which the information *H*(*q*) reaches its peak. We found that the ON response had more precise spike timing (1000/*q<sub>max</sub>* = 12.5 ± 2.6 ms, Mean ± s.e.m.) than the OFF response (1000/*q<sub>max</sub>* = 26.6 ± 3.7 ms, Mean ± s.e.m.) in distinguishing stimulus durations. Given that response latency is also one component of the temporal pattern of the neuronal response, these results further suggest that ON and OFF responses of bullfrog ON-OFF RGCs may adopt different coding strategies.
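
For readers unfamiliar with the metric-space method, the following minimal Python sketch illustrates the spike-time distance on which it is built (Victor, 2005) and the conversion of a cost parameter into a temporal precision. It is an illustration only, not the analysis code used in this study: the function names are hypothetical, and *q* is assumed to be expressed in s<sup>−1</sup> with spike times in seconds, so that 1000/*q* is in milliseconds.

```python
def spike_time_distance(train_a, train_b, q):
    """Spike-time distance between two sorted spike trains (times in seconds).

    Deleting or inserting a spike costs 1; shifting a spike by dt costs q * |dt|.
    The minimal total cost is found by dynamic programming.
    """
    n, m = len(train_a), len(train_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)            # delete all spikes of train_a
    for j in range(1, m + 1):
        cost[0][j] = float(j)            # insert all spikes of train_b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            shift = q * abs(train_a[i - 1] - train_b[j - 1])
            cost[i][j] = min(cost[i - 1][j] + 1.0,      # delete a spike
                             cost[i][j - 1] + 1.0,      # insert a spike
                             cost[i - 1][j - 1] + shift)  # shift a spike
    return cost[n][m]


def temporal_precision_ms(q_max):
    """Temporal precision 1000/q_max in ms, assuming q_max is given in 1/s."""
    return 1000.0 / q_max
```

At *q* = 0 the distance reduces to the difference in spike count, so only the number of spikes matters; as *q* grows, spikes must coincide ever more closely to be treated as the same event. Under the stated unit assumption, a temporal precision of 12.5 ms corresponds to *q<sub>max</sub>* = 80 s<sup>−1</sup>.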

#### **EFFECTS OF DA ON INFORMATION CODING**

DA plays an important modulatory role in the retina: it can modulate the retinal circadian clock, visual sensitivity, and gap-junctional connectivity between neurons, among other functions (Li and Dowling, 2000; Witkovsky, 2004). Some studies have also shown that DA can modulate the spatial and temporal patterns of RGC activity, and that these modulations might affect visual information processing (Li et al., 2012; Bu et al., 2014). It was recently reported that application of exogenous DA did not influence the tendency of neuronal firing rate change in response to different stimulation patterns, but decreased the correct rate of population-activity-based stimulation pattern discrimination (Li and Liang, 2013). In our present study, exogenous DA (10 µM) was used to probe the effects of DA on RGCs' responsiveness to different stimulus durations, and we likewise observed that DA did not influence the stimulus-time-dependent spike count change, but reduced the visual information encoded by the response latency of single RGCs.

Neurons can carry stimulus information only when the response variability is correlated with the stimulation parameters (Borst and Theunissen, 1999). In our present study, although exogenous DA (10µM) elevated neurons' responsiveness (Maguire and Werblin, 1994), it only attenuated the stimulus-time-dependent response latency change without affecting the stimulus-time-dependent spike count change. Hence, DA only decreased the information carried by response latency. These results suggested that in the retina, dopaminergic pathway may modulate the role of response latency in encoding the information about stimulus durations.

#### **CONCLUSIONS**

In the present study, exogenous DA (10 µM) and one level of brightness/contrast were used to probe the effects of DA on RGCs' responsiveness to different stimulus durations. We observed that DA clearly attenuated the stimulus-duration-dependent response latency change, but had little effect on the stimulus-duration-dependent firing rate change. Information analysis also showed that DA clearly decreased the information carried by the response latency. These results suggest that the dopaminergic pathway takes part in modulating the role of response latency in encoding information about stimulus durations.

#### **AUTHOR CONTRIBUTIONS**

Research questions: Lei Xiao, Pei-Ji Liang. Experimental design: Lei Xiao. Performed experiments: Lei Xiao, Hai-Qing Gong. Data analysis: Lei Xiao, Pu-Ming Zhang, Pei-Ji Liang. Manuscript preparation: Lei Xiao, Pu-Ming Zhang, Pei-Ji Liang.

### **ACKNOWLEDGMENTS**

This work was supported by grants from the National Foundation of Natural Science of China (No. 61075108, Pei-Ji Liang; No. 61375114, Pu-Ming Zhang) and the Graduate Student Innovation Ability Training Special Fund of Shanghai Jiao Tong University (Lei Xiao).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 December 2013; accepted: 12 June 2014; published online: 30 June 2014. Citation: Xiao L, Zhang P-M, Gong H-Q and Liang P-J (2014) Effects of dopamine on response properties of ON-OFF RGCs in encoding stimulus durations. Front. Neural Circuits 8:72. doi: 10.3389/fncir.2014.00072*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Xiao, Zhang, Gong and Liang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Method and software for using m-sequences to characterize parallel components of higher-order visual tracking behavior in *Drosophila*

#### *Jacob W. Aptekar<sup>1</sup>, Mehmet F. Keles<sup>1</sup>, Jean-Michel Mongeau<sup>1</sup>, Patrick M. Lu<sup>1</sup>, Mark A. Frye<sup>1</sup>\* and Patrick A. Shoemaker<sup>2</sup>*

*<sup>1</sup> Department of Integrative Biology and Physiology, Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, CA, USA <sup>2</sup> Tanner Research Inc., Monrovia, CA, USA*

#### *Edited by:*

*David D. Cox, Harvard University, USA*

#### *Reviewed by:*

*James E. Fitzgerald, Harvard University, USA Aurel A. Lazar, Columbia University in the City of New York, USA*

#### *\*Correspondence:*

*Mark A. Frye, Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles Young Drive, Los Angeles, CA 90095-7239, USA e-mail: frye@ucla.edu*

A moving visual figure may contain first-order signals defined by variation in mean luminance, as well as second-order signals defined by constant mean luminance and variation in luminance envelope, or higher-order signals that cannot be estimated by taking higher moments of the luminance distribution. Separating these properties of a moving figure to experimentally probe the visual subsystems that encode them is technically challenging and has resulted in debated mechanisms of visual object detection by flies. Our prior work took a white noise systems identification approach using a commercially available electronic display system to characterize the spatial variation in the temporal dynamics of two distinct subsystems for first- and higher-order components of visual figure tracking. The method relied on the use of single pixel displacements of two visual stimuli according to two binary maximum length shift register sequences (m-sequences) and cross-correlation of each m-sequence with time-varying flight steering measurements. The resultant spatio-temporal action fields (STAFs) represent temporal impulse responses parameterized by the azimuthal location of the visual figure, one STAF for first-order and another for higher-order components of compound stimuli. Here we review m-sequence and reverse correlation procedures, then describe our application in detail, provide Matlab code, validate the STAFs, and demonstrate the utility and robustness of STAFs by predicting the results of other published experimental procedures. This method has demonstrated how two relatively modest innovations on classical white noise analysis (the inclusion of space as a way to organize response kernels, and the use of linear decoupling to measure the response to two channels of visual information simultaneously) could substantially improve our basic understanding of visual processing in the fly.

**Keywords: vision, optomotor, psychophysics, fixation, attention,** *Drosophila***, system identification, dynamics**

# **INTRODUCTION**

Visual figure detection is a central capability demonstrated by sophisticated visual systems, including those of flies (Reichardt and Wenking, 1969; Reichardt and Poggio, 1976). In some species, this capability extends even to tracking targets that subtend less than one ommatidial facet, and thus fall below classical detection limits (O'Carroll and Wiederman, 2014). Such sensitivity implies that figure tracking capitalizes on highly specialized neural mechanisms. On the basis of physiological studies in flies (Dipterans), cells housed by third and fourth-order visual neuropils in these animals—i.e., the lobula plate and lobula—are strongly implicated in such functions. Neural elements have been identified that have distinct responses to discrete visual objects, including "figure detecting" (FD) cells (Egelhaaf, 1985a,b), "small target motion detector" (STMD) cells (O'Carroll, 1993; Nordström et al., 2006; Nordström and O'Carroll, 2006), and even some lobula plate tangential cells (LPTCs) (Lee and Nordström, 2012) that for years have been supposed to serve primarily wide-field optic flow analysis. However, although progress has been made in understanding phenomenological aspects of figure detection in flies, its computational basis is still largely unexplained—as are the ways in which it relates to the various other perceptual modes of vision, and how they all are transformed and recombined or selected to produce calibrated motor commands for control of visual orientation.

In earlier work, a white-noise-based systems identification technique that is conventionally used with linear systems was applied to characterize the optomotor reactions of flies to various modes of wide-field motion (Theobald et al., 2010a). More recently, we have reported several studies of *visual figure tracking* in fruit flies (Aptekar et al., 2012; Fox and Frye, 2014; Fox et al., 2014) in which we elaborated on this basic technique to develop a representation known as the *spatiotemporal action field* (STAF). A *STAF* is defined as a *function of time and space* that represents a *temporal impulse response* for some behavioral reaction, evaluated as a function of the *position* of a feature in the visual field. It provides a dynamical model of optomotor behavior over some limited range of operating conditions. Its application to a (usually highly non-linear) biological system, like that supporting figure detection, requires an assumption of *local* or *quasi*-linearity (specified for those operating conditions), approximate time invariance (i.e., behavioral consistency), and *temporal superposition* of responses evoked at different spatial locations (**Figure 1A**). In order to be accepted as a dynamical model, it must be validated for the range of conditions over which it is supposed to be applicable. The aim of this paper is to promote understanding of the STAF methodology by describing the theory, the experimental context, and the analysis techniques surrounding the formalism in detail. In addition, we describe instances of its application to visual figure detection, including special measures taken to ensure its validity, the results so obtained, and finally provide relevant software and documentation to facilitate the use of the technique.

# **METHODS**

### **APPLICATION OF THE M-SEQUENCE TECHNIQUE TO FIGURE TRACKING IN FLIES: DEPENDENCE ON FIGURE AND ELEMENTARY MOTION**

It has long been established that fruit flies will attempt to track (i.e., exert yaw torque to turn toward and fixate) vertically elongated objects in their visual fields (Reichardt and Wenking, 1969; Maimon et al., 2008). The figure-centering fixation response in *Drosophila* is clearly seen for figures corresponding to actual physical objects, i.e., those in which the motion of any internal luminance patterns corresponds to the motion of the mean luminance distribution defining the object itself. In addition to such first-order or *Fourier motion*, flies also track figures defined by the envelope of mean luminance (second-order), as well as figures defined by higher-order properties that do not correspond to or do not contain first-order signals (Theobald et al., 2008; Aptekar et al., 2012; Zhang et al., 2013). For example, figures that comprise moving windows in which flickering patterns are displayed, or even figures in which the elementary or first-order motion of the internal texture is opposed to the direction of motion of the window itself [the so-called "theta" stimulus (Zanker, 1993)], all elicit a fixation response. The characteristics of the responses to these various types of figures do, however, differ measurably.

Based on prior experimental and theoretical work, it has been posited that there are two components to figure tracking efforts: an optomotor response aligned with the velocity of motion, and an orientation response toward the position of flicker generated by motion (Reichardt and Wenking, 1969; Pick, 1976; Reichardt and Poggio, 1976; Wehrhahn and Hausen, 1980; Wehrhahn, 1981; Kimmerle et al., 2000). Recent evidence suggests that flies can in fact distinguish figures based on a broad range of spatiotemporal disparities, including cases in which flicker is uniform throughout the visual field (Theobald et al., 2010b). Our hypothesis with respect to figure tracking behavior is that the visual system of the fly extracts two streams of information in response to general figure motion: one related to the elementary motion of luminance edges of internal texture (small-field Elementary Motion, sf-EM), if present, and the other to the overall motion of the figure itself (Figure Motion, FM). We assume that the FM system encapsulates not only the position of local flicker (i.e., classical "position" system input), but also any other higher-order spatio-temporal statistical disparities generated by a *moving* figure, and that the total behavioral response approximates a superposition of efforts commanded by the two streams (Aptekar et al., 2012). In order to design practical experiments to test this hypothesis, a time-efficient and reliable assay methodology is needed. For this we use a technique based on the *maximum length sequence*, or m-sequence, which has proved to be a useful tool for linear time-invariant system identification. (For reference, the m-sequence technique and its mathematical underpinnings are reviewed in the Supplementary Material, Section 7.) We used m-sequence techniques to extract two independent, additive components, represented in terms of two functions termed the "EM STAF" and the "FM STAF," that together characterize visual behaviors in response to vertically-oriented moving figures.

The experimental context in which these concepts were studied (Aptekar et al., 2012) is illustrated in **Figure 1**. Details of the wingbeat analyzer, LED flight arena, control software, and data acquisition have been published previously (Reiser and Dickinson, 2008; Fox et al., 2014). All experimental and analysis scripts are freely available as Matlab code (see Supplementary Material). The visual figures used in all experiments were vertical bars or windows (subtending 120° vertically and 30° azimuthally in a fly's field of view), displayed against a static background in a cylindrical arena with the fly tethered at center (**Figure 1B**). The interpixel separation was 3.75°. Within the figure window was displayed a spatial pattern with the same spatial statistics as the background. Motion of a Fourier bar, in which the EM and FM are identical, is illustrated in the first three frames of **Figure 1C**, whereas a presentation of FM with no EM (a "drift-balanced" stimulus) is displayed in frames 4–6. On the digital display, a triangle sweep of a Fourier figure (EM = FM, **Figure 1Di**) is produced by discrete velocity impulses that periodically reverse direction (**Figure 1Dii**). **Figures 1E,F** illustrate the application of the m-sequence technique. The figure is stepped one pixel in one direction or the other according to a periodically-applied m-sequence (**Figure 1E**) and the steering effort produced by the fly, quantified as the difference Δ*WBA* between left and right wingbeat amplitudes (Tammero et al., 2004), is measured and regarded as the system output *y*. If it is assumed that the responses to individual steps die out within the period of the m-sequence (an assumption to be examined in further detail below), circular cross-correlation of the output with the m-sequence can be used to obtain an estimate of a velocity impulse response or kernel function *g* (**Figure 1F**).
This procedure relies on the fact that the autocorrelation of an m-sequence approximates a delta function (this approximation is imperfect due to the presence of a small dc error, as discussed in the Supplementary Material).
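
The estimation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Matlab code distributed with the paper: the function names, the seed, the trinomial recurrence used to generate the sequence, and the kernel `g_true` are all hypothetical choices, and the simulated "fly" is a noiseless linear system.

```python
def m_sequence(order=7, tap=1):
    """Return one period of a +/-1 m-sequence.

    Uses the recurrence s[n + order] = s[n + tap] XOR s[n]. The defaults
    (order 7, tap 1) correspond to the primitive trinomial x^7 + x + 1;
    other (order, tap) pairs are only valid if the trinomial is primitive.
    """
    bits = [1] + [0] * (order - 1)            # any non-zero seed works
    period = 2 ** order - 1
    while len(bits) < period:
        n = len(bits) - order
        bits.append(bits[n + tap] ^ bits[n])
    return [1 - 2 * b for b in bits]           # bit 0 -> +1, bit 1 -> -1


def circ_conv(m, g):
    """Circular convolution: steady-state output of kernel g driven by periodic m."""
    p = len(m)
    return [sum(g[j] * m[(n - j) % p] for j in range(len(g))) for n in range(p)]


def circ_xcorr(m, y):
    """Circular cross-correlation of m with y, normalized by the period."""
    p = len(m)
    return [sum(m[n] * y[(n + k) % p] for n in range(p)) / p for k in range(p)]


m = m_sequence(7, 1)
autocorr = circ_xcorr(m, m)                    # 1 at lag 0, -1/p at every other lag

g_true = [0.0, 0.5, 1.0, 0.6, 0.3, 0.1]       # hypothetical impulse response
y = circ_conv(m, g_true)                       # simulated noiseless steering output
g_est = circ_xcorr(m, y)[:len(g_true)]         # kernel estimate
```

Because the off-peak autocorrelation is −1/*p* rather than exactly zero, each estimate `g_est[k]` differs from `g_true[k]` by a small offset proportional to the sum of the remaining kernel values divided by *p*; this offset is the dc error referred to in the text.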

**FIGURE 1 | Systems identification approach for studying figure tracking behavior. (A)** The amplitude of steering responses to arbitrary figure motion (or stationary flicker) may be non-linear over the visual field (asterisks highlight two regions with different local rates of change in the dynamics of the steering response), but can be approximated over small spatial domains by a linear function (red). The STAF methodology approximates this steering response by estimating linear filters from m-sequences that are localized in space. **(B)** A circular display subtends 330° of the fly's visual field. The stimulus sequences are panoramic and 96 pixels in extent, but 8 physical pixels subtending 30° are omitted from the back of the display for access. A vertical grating of randomly segregated ON and OFF elements makes a stationary background containing broad band spatial wavelengths. A figure is defined by a 30° window (delineated in blue), within which the surface texture (denoted in red) varies from and replaces the background. The spatial statistics of the internal texture match those of the background. The figure window itself can be displaced independently from the texture within it. **(C)** Example of figure motion. The figure is composed of the same pseudo-random pattern as the ground, therefore the figure is defined only by its relative movement. Displacement of the window provides figure motion (FM, highlighted in blue) that is undetectable by a standard motion detection model, which can be modulated independently from the displacement of the surface texture that generates small-field elementary motion that would be readily detected by an EMD-based system (small-field elementary motion [sf-EM] highlighted in red). In this simple case, a first-order "Fourier bar," FM and sf-EM move coherently in the same direction for frames 1–3. In frames 4–6, FM is toward the right and there is no sf-EM within the figure window (i.e., the pattern within this window remains stationary). **(D)** A Fourier bar is displaced in one pixel steps 90° back-and-forth across the visual azimuth. (i) is a space-time plot of the stimulus (in which azimuth constitutes the only spatial dimension), and (ii) illustrates how each 3.75° step (minimum pixel-spacing in LED arena) in the position of the figure corresponds to an impulse in velocity. **(E)** Motion of the solid Fourier bar (i.e., FM = sf-EM) is modulated by velocity impulses controlled by an m-sequence (see Methods) producing a pseudo-random motion trajectory centered in this case near visual midline. (i) Space-time plot of movie; (ii) m(t), pseudorandom sequence of velocity impulses; (iii) position [time-integral of m(t)] of the figure; (iv) y(t), animal steering response to stimulus in (i). **(F)** Cross-correlation of the m-sequence (*m*) in degrees with the animal's steering response (*y*), proportional to the difference in amplitude across the two wings (Δ*WBA*), provides an estimate of the velocity impulse response (*g*).

There is ample evidence that magnitudes of reactions to first-order motion (Krapp et al., 1998) and to figures (Pick, 1976; Reichardt and Poggio, 1979) vary with stimulus location in the visual field. The STAF representation that characterizes such variation is therefore constructed by applying the stimuli around the entire visual field in the azimuthal direction. Because m-sequences are non-stationary and applied periodically, the required cross-correlations can be performed over sliding windows at various azimuths, each containing one full period of the m-sequence. The spatial dependence of the system is assumed to be approximately linear over the corresponding narrow range of figure positions, as illustrated in **Figure 1A**. A kernel function *g*(*t*) extracted from a single period is associated with the average position of the figure centroid during the period, and the set of kernel functions for all such positions are *concatenated* to obtain a STAF representation (**Figure 2**). The STAF is therefore defined at discrete times *t* (i.e., at multiples of the sampling interval) and discrete azimuth angles γ (the mean locations assumed by the figure centroids over the various individual m-sequences). With respect to the spatial resolution of this scheme, it can be shown that if the position dependence of a kernel function is approximately linear over the range of figure positions assumed during a single cycle of the m-sequence, then the estimate of the kernel computed over that cycle is very nearly equal to its value at the average position. In order to ensure that this was the case, we used relatively short m-sequences (of length *p* in the range 127–255) so that the total excursion of the figure was limited during any single period. For example, the standard deviation of the displacement from the mean position for a 7th order (127 element) m-sequence is between 3 and 4 pixels, or about 15°. The STAFs obtained for several sequence lengths were compared to verify that the spatial dependence was captured at these lengths.
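
The excursion of the figure during one period can be checked numerically for any candidate sequence. The Python sketch below is illustrative only: the generator, its seed, and the variable names are hypothetical, and the sequence it produces is not necessarily one of those used in the experiments, so the value it yields need not match the 3 to 4 pixels quoted above.

```python
import statistics


def m_sequence(order=7, tap=1):
    """One period of a +/-1 m-sequence from s[n + order] = s[n + tap] XOR s[n].

    (order 7, tap 1) corresponds to the primitive trinomial x^7 + x + 1.
    """
    bits = [1] + [0] * (order - 1)
    while len(bits) < 2 ** order - 1:
        n = len(bits) - order
        bits.append(bits[n + tap] ^ bits[n])
    return [1 - 2 * b for b in bits]


PIXEL_DEG = 3.75                      # interpixel separation stated in the text

steps = m_sequence(7, 1)              # one-pixel steps, +1 or -1
position = []                         # figure position (pixels) after each step
x = 0
for s in steps:
    x += s
    position.append(x)

mean_position = statistics.fmean(position)    # azimuth assigned to this kernel
sd_pixels = statistics.pstdev(position)       # spread about the mean position
sd_degrees = sd_pixels * PIXEL_DEG
net_displacement = position[-1]               # always +1 or -1 pixel per period
```

Since an m-sequence contains one more element of one sign than of the other, the figure ends each period exactly one pixel from where it started, which is what allows successive periods to sample successive azimuths.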

In the primary set of experiments reported in Aptekar et al. (2012), we used a compound stimulus, in which the position of the figure window and the spatial texture internal to the figure were stepped *independently* at the same times: the figure according to one m-sequence *m<sub>FM</sub>*, and the internal pattern according to a second *distinct* m-sequence *m<sub>EM</sub>* of the same order, as suggested in **Figure 2**. Under the hypothesis that EM- and FM-driven components of the response are quasilinear and superpose, two independent kernel functions, *g<sub>FM</sub>*(*t*) for figure motion and *g<sub>EM</sub>*(*t*) for internal elementary motion, can be obtained by cross-correlation of the output with *m<sub>FM</sub>* and *m<sub>EM</sub>*, respectively. The function *g<sub>FM</sub>*(*t*) represents the impulse response with respect to figure velocity, i.e., γ̇, and *g<sub>EM</sub>*(*t*) the impulse response with respect to the velocity *v<sub>EM</sub>* of the internal first-order motion. In addition to the autocorrelation property of m-sequences, this analysis relies on the fact that the cross-correlation of distinct m-sequences is nearly zero (see the Supplementary Material). The *g<sub>FM</sub>* and *g<sub>EM</sub>* obtained at different locations may each be concatenated around the azimuth to obtain respective STAF representations *G<sub>FM</sub>*(*t*, γ) and *G<sub>EM</sub>*(*t*, γ), as illustrated at bottom in **Figure 2**.
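
The decoupling of the two channels can be illustrated with a small simulation. The Python sketch below is a toy example under stated assumptions: the two sequences are generated from two different primitive trinomials of order 7 (an illustrative choice, not necessarily the pair used in the experiments), the kernels `g_fm` and `g_em` are invented, and the simulated response is a noiseless superposition.

```python
def m_sequence(order, tap):
    """One period of a +/-1 m-sequence from s[n + order] = s[n + tap] XOR s[n]."""
    bits = [1] + [0] * (order - 1)
    while len(bits) < 2 ** order - 1:
        n = len(bits) - order
        bits.append(bits[n + tap] ^ bits[n])
    return [1 - 2 * b for b in bits]


def circ_conv(m, g):
    p = len(m)
    return [sum(g[j] * m[(n - j) % p] for j in range(len(g))) for n in range(p)]


def circ_xcorr(m, y):
    p = len(m)
    return [sum(m[n] * y[(n + k) % p] for n in range(p)) / p for k in range(p)]


m_fm = m_sequence(7, 1)    # x^7 + x + 1   drives the figure window
m_em = m_sequence(7, 3)    # x^7 + x^3 + 1 drives the internal texture

g_fm = [0.0, 0.2, 0.5, 0.7, 0.8, 0.8]    # hypothetical step-like FM kernel
g_em = [0.0, 0.9, 0.6, 0.2, 0.0, -0.1]   # hypothetical transient EM kernel

# Superposed response to both inputs applied at the same times.
y = [a + b for a, b in zip(circ_conv(m_fm, g_fm), circ_conv(m_em, g_em))]

est_fm = circ_xcorr(m_fm, y)[:len(g_fm)]
est_em = circ_xcorr(m_em, y)[:len(g_em)]

# Residual coupling between the two inputs: small, but not exactly zero.
cross = circ_xcorr(m_fm, m_em)
worst_cross = max(abs(c) for c in cross)
```

The quantity `worst_cross` bounds the crosstalk that each kernel estimate can pick up from the other channel, so inspecting it for a candidate pair of sequences is a quick way to judge how well "nearly zero" holds before running an experiment.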

The Fourier transforms of these STAFs, according to the customary linear time-invariant systems interpretation, would give the frequency-domain representation of the system transfer functions parameterized by azimuth. These may be useful for qualitative characterization of the STAFs (e.g., how they may be interpreted as filters), but due to the restrictions discussed below, they cannot be interpreted as general models of the optomotor figure response.

**FIGURE 2 | Dissociating Figure Motion (FM) from small-field Elementary Motion (sf-EM) and measuring the non-linear variation in the impulse response to the motion of each over space.** Two m-sequences (*m*) are used to independently modulate the elementary motion of the small-field surface of the figure (sf-EM, red) and figure motion (FM, blue). FM in the absence of sf-EM would resemble a drift-balanced figure in which the figure "overwrites" the ground pattern with a new random texture, but generates no coherent motion signals. A property of the m-sequence is that the figure ends the trial displaced one pixel from its starting location, and the mean position is centered on the starting location. Cross-correlation of each of the two m-sequence signals with the difference of left and right wingbeat amplitude (ΔWBA) steering response data yields two impulse response estimates for the sf-EM stimulus (*g<sub>EM</sub>*(*t*)) and the FM stimulus (*g<sub>FM</sub>*(*t*)). By evenly sampling the visual azimuth of the LED display, the impulse response filters are concatenated into a function of space and time, a spatio-temporal action field (STAF) for the sf-EM and FM signals, respectively (at bottom). These functions are spatially smoothed with a four pixel boxcar.

The most basic restriction on STAFs as models relates to the limits of quasilinear behavior of the system relative to the experimental protocols used to determine them. When stimuli conform to such limits, then under the assumption of temporal superposition, a STAF-based model for the optomotor figure response can be expressed in the time domain in terms of convolutions of the position-dependent kernels *G<sub>FM</sub>*(*t*, γ) and *G<sub>EM</sub>*(*t*, γ) with, respectively, azimuthal figure velocity γ̇ and the velocity *v<sub>EM</sub>* of elementary motion (if present). In these time-domain convolutions, figure positions must be parameterized according to the times at which they were assumed. If motion begins at time *t* = 0, then the complete expression for the steering response is:

$$y(t) = \int_{\tau=0}^{t} \left[ G_{FM}\left(t-\tau, \gamma(\tau)\right) \cdot \dot{\gamma}(\tau) + G_{EM}\left(t-\tau, \gamma(\tau)\right) \cdot v_{EM}(\tau) \right] d\tau + \int_{\theta=0}^{\gamma(0)} G_{FM}(t, \theta) \, d\theta. \tag{1}$$

The origin for the azimuth angle γ is identified with the figure location at which no steering effort is exerted by the FM system, that is, at front center of the animal. The second integral term in (1) represents the effect of the initial figure position as predicted by this model; it is zero if the figure starts at front center. The response of the FM system to a *stationary* figure at azimuth γ predicted by the model would be $\int_{\theta=0}^{\gamma} G_{FM}(\infty, \theta) \, d\theta$.

In practice, the STAF estimates are computed (that is to say, *sampled*) only at discrete times and positions. The STAFs obtained in our study (Aptekar et al., 2012) vary smoothly and could be interpolated to obtain values off of this sampling grid when dealing with continuous motion, or with discrete time and position grids differing from the original. In point of fact, most laboratory display technologies produce sequences of discrete image frames and will thus impose position steps/velocity impulses at discrete times. In such case, the convolution in (1) becomes a sum:

$$y(t) = \sum_{\tau=0}^{t} \left[ G_{FM}\left(t-\tau, \gamma(\tau)\right) \cdot \Delta_{FM}(\tau) + G_{EM}\left(t-\tau, \gamma(\tau)\right) \cdot \Delta_{EM}(\tau) \right] + \sum_{\theta=0}^{\gamma(0)} G_{FM}(t, \theta) \cdot \Delta_{F}(\theta) \tag{2}$$

where Δ*<sub>FM</sub>*(τ) represents the step in figure position and Δ*<sub>EM</sub>*(τ) the step in internal pattern position at discrete time τ over the particular stimulus history, and Δ*<sub>F</sub>* is the magnitude of the figure step at each discrete angle θ for which *G<sub>FM</sub>*(*t*, θ) is defined between θ = 0 and θ = γ(0).
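
As a concrete reading of Equation (2), the following Python sketch evaluates the discrete sum for an arbitrary stimulus history. It is a hedged illustration rather than the authors' implementation: the function name and argument names are hypothetical, positions are counted in whole pixels with 0 at front center, γ(τ) is taken as the figure position just before the step at time τ, and the initial-position term is treated as a series of unit steps from front center out to γ(0). The STAFs are passed as functions of (lag, azimuth), which in practice would wrap interpolated lookup tables.

```python
def predict_steering(G_fm, G_em, gamma0, d_fm, d_em):
    """Predicted steering response y[t] following the discrete STAF model.

    G_fm, G_em : callables (lag, azimuth) -> kernel value
    gamma0     : initial figure position in pixels (0 = front center)
    d_fm, d_em : per-frame steps of the figure window and the internal texture
    """
    # Figure position at the moment each step is applied (assumption: before the step).
    gamma, pos = [], gamma0
    for step in d_fm:
        gamma.append(pos)
        pos += step

    # Angles swept when moving from front center to the starting position.
    sign = 1 if gamma0 >= 0 else -1
    initial_angles = range(0, gamma0, sign)

    y = []
    for t in range(len(d_fm)):
        total = 0.0
        for tau in range(t + 1):
            lag = t - tau
            total += G_fm(lag, gamma[tau]) * d_fm[tau]
            total += G_em(lag, gamma[tau]) * d_em[tau]
        total += sum(G_fm(t, theta) * sign for theta in initial_angles)
        y.append(total)
    return y


# Toy check: a position-independent, persistent (step-like) FM kernel of 0.5
# makes the prediction proportional to the current figure position.
y_demo = predict_steering(lambda lag, az: 0.5, lambda lag, az: 0.0,
                          gamma0=2, d_fm=[1, 1, -1], d_em=[0, 0, 0])
```

With this toy kernel the figure sits at 3, 4, and 3 pixels after the three steps, and the predicted response is half of each of those values, which shows how a step-like FM kernel yields a sustained, position-dependent steering effort.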

#### **EXPERIMENTAL DESIGN: SPECIAL MEASURES FOR FIGURE TRACKING**

With this approach, care must be taken to consider likely deviations from linearity and other effects that influence the interpretation of the STAF as characterizing the optomotor system. To this end, a number of special measures were taken in the design of experiments and analysis of the resulting data.

For instance, there is a great deal of evidence that elementary motion is processed in the visual system by local elementary motion detectors (EMDs) that compute spatiotemporal luminance correlations between neighboring or nearby visual sampling units (Buchner, 1976; Egelhaaf et al., 1989; Haag et al., 2004). Because the EM STAF depends on first-order motion, it is reasonable to assume that the neural machinery underlying it must involve EMDs. The operation of the EMD is inherently nonlinear, and its output depends not just on velocity of motion but other characteristics of the visual scene as well. However, areas of visual texture in our experimental protocols conform to consistent spatial statistics, and when they move they are stepped at a regular rate by a single pixel, which is on the order of the interreceptor angle—so under these conditions it may be justifiable to interpret the mean EMD response to an individual step as an impulse response function. We also expect that if a number of EMD outputs were summed over a region of retinotopic space, such as the area subtended by a finite-sized object, there would be a relative reduction in the standard deviation of the resulting signal. If the downstream processing that transforms the summed outputs into a motor command is approximately linear, then the interpretation of a *behavioral* step response may be justifiable. Prior results suggest that this is indeed the case for optomotor responses to wide-field motion (Theobald et al., 2010b).

However, from this qualitative discussion it is clear that constraints must be imposed on the design of experiments used to determine a STAF that depends on EMD processing—and similarly, that limits apply to interpretation of the results. For one, motion impulse responses ought not be estimated based on object steps much greater than the spatial basis of the EMD correlation; the variance of the output increases while its expected value approaches zero as the longest spatial wavelengths in the image are exceeded by the step. In addition, because the dependence of mean EMD output on image speed is non-linear (and in fact nonmonotonic), the accuracy of an EM STAF as a representation of the optomotor control system is likely to degrade as object speeds vary significantly from the product of the step size and image update rate used in its experimental determination.
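
The non-monotonic dependence of mean EMD output on image speed can be reproduced with a textbook delay-and-correlate (Hassenstein-Reichardt) detector. The Python sketch below is a generic illustration under stated assumptions, not a model fitted to the fly: the stimulus is a drifting sinusoidal grating, the delay is a first-order low-pass filter, and the function name, the receptor spacing of a quarter wavelength, and the simulation settings are all hypothetical.

```python
import math


def mean_emd_output(omega_tau, phase_step=math.pi / 2, direction=1,
                    steps_per_cycle=200, warmup_cycles=20, measure_cycles=10):
    """Time-averaged output of a two-arm correlator for a drifting sine grating.

    omega_tau  : temporal frequency times the low-pass time constant; for a
                 grating of fixed wavelength this is proportional to image speed
    phase_step : spatial phase difference between the two receptors
    direction  : +1 for motion from receptor A to B, -1 for the reverse
    """
    dt_over_tau = 2 * math.pi / (omega_tau * steps_per_cycle)
    alpha = dt_over_tau / (1.0 + dt_over_tau)      # low-pass update coefficient
    lp_a = lp_b = 0.0
    total = 0.0
    n_warm = warmup_cycles * steps_per_cycle
    n_meas = measure_cycles * steps_per_cycle
    for n in range(n_warm + n_meas):
        phase = 2 * math.pi * n / steps_per_cycle
        a = math.sin(phase)                                  # receptor A
        b = math.sin(phase - direction * phase_step)         # receptor B
        lp_a += alpha * (a - lp_a)                           # delayed copy of A
        lp_b += alpha * (b - lp_b)                           # delayed copy of B
        if n >= n_warm:                                      # skip the transient
            total += lp_a * b - a * lp_b                     # opponent correlation
    return total / n_meas


slow = mean_emd_output(0.1)
medium = mean_emd_output(1.0)
fast = mean_emd_output(10.0)
reverse = mean_emd_output(1.0, direction=-1)
```

In this idealized detector the mean output rises with speed, peaks when the temporal frequency is near the inverse of the filter time constant, and then falls again, while reversing the direction of motion flips its sign. A kernel measured at one step size and update rate therefore cannot be assumed to scale to very different speeds.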

Currently, little is known about the processing that enables the fly visual system to distinguish a figure from background based on the variety of spatiotemporal differences that have been shown to support figure tracking in behavioral experiments. Thus, there is no guidance available from computational theory about the limits of an experimentally-determined FM STAF as a representation for optomotor behavior. However, one result of prior studies is especially significant with respect to its estimation: as mentioned in the prior section, there is a component of figure response that both theory and experiment suggest is fundamentally *position-dependent* (Pick, 1974; Buchner et al., 1984), in that steering efforts can persist for seconds when the position of a figure is *stationary* and it is located away from front center in the visual field (Pick, 1976). It is not known at present if this effect can be well-represented as the asymptotic behavior of an FM STAF obtained from experiments with *moving* figures. However, we should at least expect that reactions of the FM system to steps in figure position may be more akin to *step* than *impulse* responses, in that (unlike an EM-dependent STAF) they may assume non-zero values at long times. In order to extract the kernel associated with the figure response, we assumed that the *slope* of such position step responses does approach very small values over times corresponding to the duration of one cycle of an m-sequence, and made use of the fact that the time derivative of the output in response to figure motion can be written:

$$\frac{dy}{dt} = \frac{d(m_{FM} \ast g_{FM})}{dt} = m_{FM} \ast \frac{dg_{FM}}{dt},\tag{3}$$

where ∗ indicates temporal convolution. In this case, the cross-correlation *uFM* of *dy*/*dt* with *mFM* may be computed to provide an estimate of the derivative *dgFM*/*dt* of the desired kernel function, and this may (in principle) be integrated to obtain *gFM*. However, the *dc error term* also present in this cross-correlation, when integrated, would result in an accumulating error that would nearly cancel the desired result at times approaching the duration, *t* = *p* − 1, of the m-sequence. Thus, it is desirable to take measures to correct for this dc error. We note that this error, which takes the value $-\frac{1}{p}\sum_{j=0}^{p-1} \frac{dg_{FM}}{dt}$, is proportional to the asymptotic value of *gFM* at long times, and thus may be eliminated if this asymptote can be estimated and added to *uFM* prior to integration. For this purpose, we use the average of $\sum_{j=0}^{k} u_{FM}$ over times *k* corresponding to 2–5 s. During this interval the slope *dy*/*dt* typically assumes small values. Formally, this approximates the DC response term as having the same magnitude as terms for very low bar velocity, for which there is no measurable deviation of the steering effort from the static bar position, consistent with the fly tracking the absolute position, rather than the very low velocities of the bar.
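The origin of the dc error can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the analysis code used in this study: the 5-bit shift register, its feedback taps, the toy kernel `g`, the 1/*p* normalization, and all function names are illustrative choices. It convolves a 31-element ±1 m-sequence circularly with a known kernel and cross-correlates the output with the sequence. Because the circular autocorrelation of such a sequence is *p* at zero lag and −1 at every other lag, the estimate equals the kernel scaled by (1 + 1/*p*) plus a constant offset of −(1/*p*) Σ *g*.

```python
def m_sequence(n_bits=5, taps=(5, 3)):
    """Return one period (2**n_bits - 1 samples) of a +/-1 m-sequence.

    Assumed recurrence: a[t] = a[t-5] XOR a[t-3], which corresponds to the
    primitive polynomial x^5 + x^2 + 1 (an illustrative choice).
    """
    bits = [1] * n_bits
    period = 2 ** n_bits - 1
    while len(bits) < period:
        new_bit = 0
        for k in taps:
            new_bit ^= bits[-k]
        bits.append(new_bit)
    return [1 - 2 * b for b in bits]


def circular_convolve(m, g):
    """y[t] = sum_j g[j] * m[(t - j) mod p]."""
    p = len(m)
    return [sum(g[j] * m[(t - j) % p] for j in range(len(g))) for t in range(p)]


def cross_correlate(m, y):
    """u[k] = (1/p) * sum_t m[t] * y[(t + k) mod p]."""
    p = len(m)
    return [sum(m[t] * y[(t + k) % p] for t in range(p)) / p for k in range(p)]


m = m_sequence()
p = len(m)
g = [0.0, 0.5, 1.0, 0.6, 0.3, 0.1] + [0.0] * (p - 6)  # toy kernel (hypothetical)
y = circular_convolve(m, g)
u = cross_correlate(m, y)

# Constant offset shared by every lag of the estimate.
dc_offset = -sum(g) / p
```

For an impulse-like kernel whose values sum to a small number the offset is negligible, but for a kernel that is itself the derivative of a step-like response the sum equals the asymptote, which is why the correction matters for the FM estimate.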

It should be emphasized that the contribution to *GFM*(*t*, γ ) obtained by integration of (3) at a particular γ represents the *change* in the FM-driven figure response induced by a step in figure position at that location—i.e., the FM STAF is an *incremental* representation.
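The integrate-and-correct procedure can be sketched in the same way. Again this is an illustration under assumptions rather than the study's implementation: the 9-bit register and taps, the exponential toy kernel, and the window of lags 30 to 59 (standing in for the 2–5 s interval) are all hypothetical. The derivative of the output is simulated directly from Equation (3), cross-correlated with the m-sequence, and integrated with and without the asymptote correction.

```python
import math


def m_sequence(n_bits=9, taps=(9, 5)):
    """One period of a +/-1 m-sequence (assumed recurrence a[t] = a[t-9] XOR a[t-5])."""
    bits = [1] * n_bits
    period = 2 ** n_bits - 1
    while len(bits) < period:
        new_bit = 0
        for k in taps:
            new_bit ^= bits[-k]
        bits.append(new_bit)
    return [1 - 2 * b for b in bits]


def cumulative_sum(values):
    """Discrete-time integration (running sum)."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


m = m_sequence()
p = len(m)

# Toy step-like kernel rising to an asymptote of 1.0 (time constant assumed).
g_true = [1.0 - math.exp(-(k + 1) / 5.0) for k in range(p)]
dg_true = [g_true[0]] + [g_true[k] - g_true[k - 1] for k in range(1, p)]

# Eq. (3): dy/dt = m * dg/dt, simulated here as a circular convolution.
dy = [sum(dg_true[j] * m[(t - j) % p] for j in range(p)) for t in range(p)]

# Cross-correlation of dy/dt with the m-sequence: dg/dt plus a dc error.
u = [sum(m[t] * dy[(t + k) % p] for t in range(p)) / p for k in range(p)]

# Integrating directly lets the dc error accumulate with lag.
g_naive = cumulative_sum(u)

# Estimate the asymptote where the slope is small, add it back, re-integrate.
window = range(30, 60)
asymptote = sum(g_naive[k] for k in window) / len(window)
g_corrected = cumulative_sum([v + asymptote / p for v in u])
```

In this toy case the uncorrected integral returns almost to zero at the end of the sequence, whereas the corrected one stays near the true asymptote. A residual bias remains because the asymptote is itself estimated from the uncorrected running sum; it shrinks as the estimation window becomes short relative to the sequence length.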

When elementary motion is present within the figure, the analysis of *its* contribution is complicated by the figure position response: if *gEM* is estimated by cross-correlation of *mEM* with *y*, the estimate is contaminated by the dc component of the FM response whenever the figure is off of the midline. In addition, when relatively short m-sequences are used (as was the case in our experimental design), the cross-correlation between *mFM* and *mEM* may also be appreciably different from zero. This results in cross-contamination of the estimates for both *gFM* and *gEM*; that is, each would be the sum of the desired kernel and a small proportion of the other when a simple cross-correlation is used. In order to reduce these sources of error, our full protocol comprised two sets of stimuli, interleaved randomly in time and each covering the entire visual field. In one, *mFM* and *mEM* respectively drove the figure and internal pattern steps, whereas in the second, *mFM* and −*mEM* were used. The outputs in these two cases are, respectively,

$$\begin{aligned} y_1 &= m_{FM} \ast g_{FM} + m_{EM} \ast g_{EM}, \\ y_2 &= m_{FM} \ast g_{FM} - m_{EM} \ast g_{EM}. \end{aligned}$$

During analysis, an estimate of *gEM* can be formed by cross-correlating *mEM* with the *difference* between these two output sequences (or equivalently, taking the difference between the cross-correlations with each):

$$2u_{EM} = m_{EM} \ast y_1 - m_{EM} \ast y_2,\tag{4}$$

in theory eliminating the effect of the dc figure position response as well as any cross-contamination due to finite cross-correlation. Similarly, the *sum* of cross-correlations of the derivatives of the output sequences with *mFM* yields a cross-contamination-free estimate of *d*(*gFM*)/*dt*:

$$2u_{FM} = m_{FM} \ast dy_1/dt + m_{FM} \ast dy_2/dt,\tag{5}$$

where the use of the dc error correction methodology discussed above is implicitly assumed.
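The benefit of the paired-stimulus protocol can be checked with a small simulation. The sketch below is illustrative only: the two 31-element sequences, the toy kernels, and the function names are assumptions, and the system is taken to be exactly linear. It compares a single-run estimate of *gEM* with the difference estimate of Equation (4) for two different FM kernels.

```python
def lfsr_sequence(taps, n_bits=5):
    """One period of a +/-1 m-sequence from a 5-bit register (assumed taps)."""
    bits = [1] * n_bits
    period = 2 ** n_bits - 1
    while len(bits) < period:
        new_bit = 0
        for k in taps:
            new_bit ^= bits[-k]
        bits.append(new_bit)
    return [1 - 2 * b for b in bits]


def convolve(m, g):
    """Circular convolution of sequence m with kernel g."""
    p = len(m)
    return [sum(g[j] * m[(t - j) % p] for j in range(len(g))) for t in range(p)]


def correlate(m, y):
    """Circular cross-correlation of y with m, normalized by the period."""
    p = len(m)
    return [sum(m[t] * y[(t + k) % p] for t in range(p)) / p for k in range(p)]


def run_protocol(g_fm, g_em, m_fm, m_em):
    """Return (single-run estimate, difference estimate) of g_EM."""
    fm_part = convolve(m_fm, g_fm)
    em_part = convolve(m_em, g_em)
    y1 = [a + b for a, b in zip(fm_part, em_part)]  # stimulus set 1: +m_EM
    y2 = [a - b for a, b in zip(fm_part, em_part)]  # stimulus set 2: -m_EM
    c1 = correlate(m_em, y1)
    c2 = correlate(m_em, y2)
    naive = c1
    diff = [0.5 * (a - b) for a, b in zip(c1, c2)]   # Eq. (4), divided by 2
    return naive, diff


m_fm = lfsr_sequence((5, 3))
m_em = lfsr_sequence((5, 2))
g_em = [0.0, 1.0, 0.5, 0.2]                          # toy EM kernel

# Same EM kernel, two different (hypothetical) FM kernels.
naive_a, diff_a = run_protocol([0.0, 0.2, 0.4], g_em, m_fm, m_em)
naive_b, diff_b = run_protocol([1.0, 0.2, 0.4], g_em, m_fm, m_em)
```

The single-run estimates change when the FM kernel changes, because the two short sequences are not perfectly uncorrelated, whereas the difference estimates are identical. The same argument, with the sum in place of the difference, applies to Equation (5).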

Due to the nature of the compound stimulus, one additional and subtle source of cross-contamination between the kernel estimates is present. When the figure and internal pattern are stepped syndirectionally in our protocol, the entire 8-pixel-wide pattern is shifted by one pixel in the common direction of motion, and there is the potential for spatiotemporally correlated changes across 8 interpixel boundaries. However, when the two are stepped antidirectionally, only the center 6 pixels of the internal pattern are visible before and after the step, so that spatiotemporally correlated changes can appear only across six boundaries. Thus, the effective extent of the coherently moving pattern is larger for syndirectional motion, and we would expect the response component driven by elementary motion to also be larger than for antidirectional steps. The stimulus used in practice was therefore modified to eliminate this source of cross-contamination by replacing the boundary pixels of the figure at random for each syndirectional step in the entire sequence.

Finally, a related issue with short sequences is the presence by sheer chance of more spatiotemporal correlations in one direction than the other during a cycle of the sequence, as a figure passes over and the fixed background becomes visible. In order to reduce this effect, we replaced the random background pattern every three periods of m-sequence excitation during the course of an entire experiment.

In our study, the magnitudes of the figure and elementary motion steps were 3.75° for all applied stimuli (although the signs of each of course varied with time in a manner unique to each stimulus). In any event, the validity of this representation should be expected to hold only for circumstances in which the mean velocity of motion approximates the product of the step size and image update rate used in the experimental determination of the STAFs.

# **RESULTS**

#### **APPLICATION TO TRACKING OF GENERAL FIGURES IN DROSOPHILA**

The results of our original figure tracking study using white noise techniques support the hypothesis that the total response to a figure against a static background approximates a superposition of efforts commanded by two processing streams, as characterized by the EM and FM STAFs (Aptekar et al., 2012). The spatial and temporal characteristics of the two STAFs differ significantly. The temporal dependence of the EM-STAF shows a clear "impulse-response" shape, with a short onset delay, rapid integration time, and near-zero asymptote, consistent with response to the velocity of the EM. In contrast, the FM-STAF displays a slow onset delay and persists for many seconds, consistent with a slower effort to track the retinotopic position of the figure. The EM response is strongest when the figure is present within the frontal field of view, diminishing gradually in amplitude with increasing displacement of the figure away from midline. In contrast, the spatial profile of the FM-STAF resembles a classic "center-surround" function in that the peripheral response is inverted relative to the response at the midline, and the spatial integral over the entire azimuth is near zero. This indicates that an incremental change in figure position within the frontal field of view results in an increment in the steering effort toward the figure (positive gain), but a position step within the periphery results in a decrement in the steering effort (negative gain, although not necessarily a reversal in the steering direction since the STAF is an incremental representation). Furthermore, our experiments confirmed that the FM system can operate in the absence of any coherent motion. We presented a moving figure that was dynamically updated with a new random internal pattern at each time step, such that no net coherent motion was present in the stimulus in any direction.
When the motion of such a figure is driven by a single white noise sequence, the spatial and temporal characteristics of the turning reactions and the derived STAFs are nearly identical to those of the FM-STAFs obtained from the original figures containing uncorrelated EM. Furthermore, for a stimulus in which EM and FM of the figure covary (i.e., a Fourier bar), the resultant STAF is, to good approximation, simply the sum of the FM-STAF and EM-STAF obtained from the original experiment.

# **VALIDATION OF ASSUMPTIONS AND MODELS**

#### *Response to standard figure stimuli*

The most authoritative and general validation of the STAF-based model is its predictive power with respect to arbitrary stimulus scenarios. In Aptekar et al. (2012), we predicted responses to triangle sweeps of Fourier bars, of theta bars (in which the EM of texture within the bar is opposite in direction to the FM), and to trajectories in which EM and FM were driven by novel independent m-sequences (i.e., sequences different than those used to obtain the STAFs). During these simulations, the EM and FM step magnitudes and update rates were maintained at the same values as in the experiments used to determine the STAFs, and responses were predicted as the superposition of EM and FM responses as in (1). Predictive power was assessed by computing Pearson's *R*<sup>2</sup> values for modeled vs. experimentally measured responses to these stimuli, and was found to be 0.9 or greater in all three cases. We have reproduced one such comparison for a Fourier figure sweeping at constant velocity across the frontal 180° of the visual field (**Figure 3Ai**). Measured and STAF-modeled results are in very close agreement (**Figure 3Aii**, with the STAFs used for the model indicated within the inset at left).
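A prediction of this kind can be sketched as a position-dependent convolution. The example below is a minimal illustration, and its form is an assumption: Equation (1) is not reproduced in this section, so the code simply takes each STAF to be a table of temporal kernels indexed by azimuth bin, and sums, over past time steps, the kernel at the figure's position at the time of each step, weighted by the sign of that step. The function name, the table layout, and the toy numbers are hypothetical.

```python
def predict_response(positions, em_steps, fm_steps, staf_em, staf_fm):
    """Predict steering effort as a superposition of EM and FM contributions.

    positions[t] : azimuth bin occupied by the figure at time step t
    em_steps[t]  : signed EM step at time t (e.g., -1, 0, +1)
    fm_steps[t]  : signed FM step at time t (e.g., -1, 0, +1)
    staf_x[b][k] : kernel value at azimuth bin b and time lag k (assumed layout)
    """
    n_lags = len(staf_em[0])
    response = []
    for t in range(len(positions)):
        total = 0.0
        for lag in range(min(n_lags, t + 1)):
            s = t - lag                      # time at which the step occurred
            b = positions[s]                 # where the figure was at that time
            total += em_steps[s] * staf_em[b][lag]
            total += fm_steps[s] * staf_fm[b][lag]
        response.append(total)
    return response


# Toy STAFs with two azimuth bins and three time lags (hypothetical values).
staf_em = [[1.0, 0.5, 0.0], [2.0, 1.0, 0.5]]
staf_fm = [[0.0, 0.1, 0.2], [0.0, 0.3, 0.6]]

positions = [1, 1, 1, 1]
em_only = predict_response(positions, [1, 0, 0, 0], [0, 0, 0, 0], staf_em, staf_fm)
fm_only = predict_response(positions, [0, 0, 0, 0], [1, 0, 0, 0], staf_em, staf_fm)
both = predict_response(positions, [1, 0, 0, 0], [1, 0, 0, 0], staf_em, staf_fm)
```

A single EM step at bin 1 simply reads out that bin's temporal kernel, and the response to simultaneous EM and FM steps is the sum of the two individual responses, which is the superposition property the comparison in **Figure 3A** tests.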

#### *Response symmetry*

A corollary of our assumption of quasilinearity is that the responses to progressive and regressive motion (either EM or FM) at a given velocity are roughly equal and opposite in sign at any location in the visual field. While the white noise technique captures the first-order component of behavior, i.e., the first-order Volterra kernel, even when non-linearity is present, the accuracy of the STAF as a dynamical model depends on how well linearity is approximated. However, results from other studies have been interpreted as suggesting that an asymmetry between progressive and regressive responses is in fact present. For example, Bahl et al. (2013) postulated that figure responses can be decomposed into "position" and "motion" components (roughly comparable to our FM and EM responses, respectively) and attempted to isolate these components in two distinct experiments. Discrepancies between the results of these experiments were taken as evidence for response asymmetry in that study.

To examine this issue, we considered the results of this prior study (Bahl et al., 2013), which addressed the cellular mechanism of EM detection for figure tracking by a tethered fly walking on an air-supported ball. In such an experiment, the fixed fly can "steer" the ball by walking in different directions. The apparatus is surrounded by several computer monitors that project perspective-corrected revolutions of a solid black vertical bar on a white background. The bar was rotated at constant velocity, and the fly's turning effort was measured by the displacement of the ball below the tethered fly. In response to constant velocity revolution of the bar in each of two directions (clockwise and counterclockwise), the animals tend to show smaller responses to the bar as it revolves from the rear toward the frontal field of view (back-to-front, BTF) by comparison to the steering response when the bar crosses midline and moves front-to-back (FTB). We used the STAFs collected from *flying* animals to predict the responses of the *walking* flies. Convolving the stimulus trajectory (**Figure 3B**) with the EM and FM STAFs (**Figure 3A** inset) produces modeled estimates that qualitatively match the behavioral responses of walking flies plotted in Bahl et al. (**Figure 3Bii**). To estimate the response component generated by the static position of the bar, Bahl et al. added the CCW and CW spatial trajectories, which are well approximated by our STAF predictions (**Figure 3Biii**). To estimate the response component generated by the motion of the bar, Bahl et al. subtracted the spatial trajectories. This predicts that the fly's response to elementary motion is at a minimum for an object in the frontal visual field, which directly opposes the prevailing evidence in the field.
However, our STAF predictions show that this phenomenon is not a result of insensitivity to motion in the frontal visual field because we can recapitulate this apparent result using our STAFs which show maximal sensitivity to EM and FM in the frontal visual field (**Figure 3Biv**).

We conclude that the result observed by Bahl et al. was accentuated by a stimulus that moved at a rate that maximizes the apparent effect of hysteresis on the fly's steering behavior, and we show that the same effect is observed when the stimulus from Bahl et al. is convolved with our STAFs. We concede that it may be surprising that the results would be so similar for walking and flying animals, but argue that this explanation is more parsimonious than the unexpected alternative that walking flies are relatively insensitive to frontal motion (i.e., a prominent dip in the motion response function for a figure positioned near 0°, **Figure 3Biv**).

Hence, the STAF functions provide robust predictions of figure tracking responses to arbitrary visual stimuli presented in the same behavioral apparatus in which the STAFs were measured (**Figure 3A**), as well as qualitatively reasonable approximations to behavioral measures taken with walking flies in a completely different apparatus (**Figure 3B**).

Based on these results, we conclude that response asymmetry occurs for figure motion along extended continuous paths, and is a consequence of the spatial variations of the response characteristics in combination with their temporal dependence. Small displacements, conversely, do not produce the asymmetry. This view is supported by results of studies on these animals under stimulus conditions similar to ours (Buchner, 1976; Reichardt and Poggio, 1979; Kimmerle et al., 2000; Maimon et al., 2008; Theobald et al., 2010b). One example appears in Figure 4B of Maimon et al. (2008), in which a solid dark bar was oscillated about several mean positions relative to the visual midline. The fly's steering response has two components: a slow sustained turn toward the bar's position when it is off the midline, and a superimposed oscillatory steering response. At every mean azimuth for which the periodic response is significant, it is symmetric; there is no clear evidence of the pronounced harmonic distortion that would result from significant asymmetry between front-to-back and back-to-front responses. Similar results are obtained from experiments in our own lab (**Figure 4**). By way of comparison, asymmetry is apparent in experiments using longer trajectories (Götz, 1968; Reichardt and Poggio, 1976; Maimon et al., 2008; Bahl et al., 2013). Both sets of findings are valid, but the key finding with respect to our work is that the STAF model is capable of capturing extended-path results.

**FIGURE 3 (continued) |** … and s.e.m. indicated by gray shaded envelope), and responses predicted by convolution of trisweep trajectory with both of the sf-EM and FM STAFs (indicated with insets at left) (red). R<sup>2</sup> = coefficient of determination, indicating degree of correlation between STAF estimate and actual behavior. **(B)** STAFs predict responses measured under different experimental conditions. (i) The stimulus trajectory of a bar revolving around a circular arena at constant velocity in either the clockwise (CW) or counterclockwise (CCW) directions. (ii) Convolution of the stimulus trajectory from (i) with both the sf-EM and FM STAFs models to predict turning responses (red). Overlaid (gray) are the mean STAF predictions. (iii) For a sufficiently slow stimulus, addition of the bi-directional fly turning responses to the revolving bar produces an estimate of turning response to the bar's position (gray), which is well-approximated by the addition of the two STAF predictions from (ii) (red). (iv) Subtraction of the bi-directional fly turning responses to the revolving bar produces an estimate of the turning response to the local motion of the bar (gray), which is well-approximated by the subtraction of the two STAF predictions from (ii) (red).

#### *STAFs predict reverse-phi illusion for wide-field yaw, but not small-field EM*

Visual systems that compute motion from space-time luminance correlations sampled at neighboring receptors are susceptible to a visual illusion called reverse-phi (Anstis, 1970). For example, a black and white vertical grating pattern displayed on a computer screen and drifting to the right is perceived to drift to the left instead if the contrast polarity flickers (black to white and vice versa). Virtually every animal, including humans, that perceives apparent motion is susceptible to the reverse-phi illusion. The standard implementation of the Hassenstein-Reichardt elementary motion detector (HR-EMD) (Hassenstein and Reichardt, 1956) is also susceptible to this illusion, which provides strong evidence for this model in the computation of motion in biological vision (Aptekar and Frye, 2013), particularly in flies (Tuthill et al., 2011). Proof positive of the primacy of an EMD circuit to a navigational task is mirror-symmetric reversal of an animal's steering effort to a "reverse-phi" stimulus relative to a "phi" stimulus (**Figure 4**). Furthermore, for the normal phi motion stimuli, the responses to motion in each of two opposing directions are equal in magnitude and time course (**Figure 4**). Under the same constant-velocity stimulus conditions used to evaluate response symmetry, we tested reverse-phi motion responses in the same flies, and the results confirm prior findings demonstrating opposite directional steering responses (**Figure 5A**) (Tuthill et al., 2011).
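The susceptibility of a correlation-based detector to reverse-phi can be shown with a toy model. The sketch below is an idealized illustration, not a model fitted to fly data: it assumes a circular array of opponent delay-and-correlate units in which a one-frame delay stands in for the EMD's temporal filter, a random ±1 texture, and a one-pixel step per frame. The function names and parameters are hypothetical.

```python
import random


def emd_output(frames):
    """Summed output of an array of opponent delay-and-correlate detectors.

    Each unit multiplies the previous-frame signal at pixel x by the
    current-frame signal at pixel x+1 and subtracts the mirror-image product.
    Positive totals indicate net rightward motion.
    """
    total = 0
    for prev, curr in zip(frames, frames[1:]):
        n = len(prev)
        for x in range(n):
            right = (x + 1) % n
            total += prev[x] * curr[right] - prev[right] * curr[x]
    return total


def moving_pattern(pattern, n_frames, reverse_phi=False):
    """Shift a circular +/-1 pattern one pixel rightward per frame.

    With reverse_phi=True the contrast polarity is inverted on every
    other frame.
    """
    n = len(pattern)
    frames = []
    for t in range(n_frames):
        sign = -1 if (reverse_phi and t % 2 == 1) else 1
        frames.append([sign * pattern[(x - t) % n] for x in range(n)])
    return frames


rng = random.Random(0)
pattern = [rng.choice((-1, 1)) for _ in range(64)]

phi = emd_output(moving_pattern(pattern, 20))
reverse_phi = emd_output(moving_pattern(pattern, 20, reverse_phi=True))
```

In this idealized correlator the reverse-phi output is exactly the negative of the phi output, because every product spans one polarity inversion. Any departure from an equal and opposite response in behavior therefore points to processing beyond the bare correlator.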

**FIGURE 4 (continued) |** … with shading. Note that the motion-induced responses are nearly symmetric, and opposite for reverse-phi. BTF indicates regressive, back-to-front motion on the eye; FTB indicates progressive, front-to-back motion. For the stimulus at ±45°, the rapid motion-induced oscillations […] superimposed upon the following BTF response, which has been reflected about the vertical axis, to demonstrate that the time course and steering trajectory of FTB responses in this case are nearly equal and opposite to the BTF responses.

Accordingly, the EM STAF is sign-inverted for the reverse-phi stimulus (**Figure 5B**). However, consistent with our model of figure-motion (FM) being an EMD-independent quality of a figure-like input, STAFs collected with reverse-phi stimuli reveal that the FM stream is entirely insensitive to the reverse-phi illusion, showing similar spatial and temporal properties for phi and reverse-phi conditions in the same flies (**Figure 5B**). These results are consistent with a model of figure detection that is described as "flicker dependent," as both a phi and reverse-phi figure on a stationary ground contain similar flicker signals. We note, however, that a flicker-based model fails to explain figure-tracking on a moving ground when both figure and ground contain similar local flicker (Fox et al., 2014), or when the figure and ground flicker at the same rate (Theobald et al., 2010b).

However, we also note that while, as predicted by the EMD model, the EM STAF shows an inversion of its kernel, consistent with a reversal of the perceived direction of motion encoded by the EM within the figure (**Figure 5B**), the response is not equal and opposite to the phi response (**Figure 5C**). This may be expected for some range of pattern velocities because the reverse-phi version of a stimulus tends to flicker at approximately twice the rate of the complementary phi stimulus (Tuthill et al., 2011). To examine this idea we recorded full-field yaw kernels (Theobald et al., 2010a) at the same frame update rate as the STAFs. Wide-field phi and reverse-phi kernels were collected with an identical group of m-sequences to those used for the STAFs, and the wide-field version of the EM response is nearly perfectly inverted (**Figure 5C**). Taken together, these results suggest that the output of EMDs integrated for tracking elementary motion within a moving figure is treated differently than standard EMD-based motion processing implemented within the wide-field motion pathway, and may be worthy of further exploration. This example highlights the power of the STAF technique to identify nuanced differences in the combinatorial processing of multiple motion cues simultaneously.

#### *STAFs to assess eye occlusion and binocular overlap*

A useful application of the STAF methodology is to interrogate visual field-specific deficits that may be imposed by limited genetic lesions. Such experiments place stringent requirements on a behavioral assay to be both highly sensitive (able to identify small lesions) and precise (able to be repeated across a number of subjects to similar effect). To validate that the STAF methodology is able to identify such retinotopic deficits, we undertook a set of experiments where we painted over one eye in adult female wild-type *D. melanogaster* before compiling STAFs for these flies. Animals were tethered to tungsten pins and head-fixed with dental acrylic. Once tethered, while still under cold anesthesia, an eyelash brush was used to apply two coats of water-diluted acrylic paint (Carbon Black, Golden Fluid Acrylics, New Berlin, NY) to the cuticle overlying one or the other eye. To verify total coverage of the eye, each preparation was observed and photographed under a 10x magnification dissecting microscope prior to being run. Subjects were rejected if any part of the occluded eye was visible to inspection or if the paint had entrapped the ipsilateral antenna. Subjects were run through the STAF assay according to standard protocol. While we did not expect that eye painting completely blinds the treated eye, we expected the retinal input to be significantly attenuated.

**FIGURE 5 | STAF validation: reverse-phi illusory motion.** For a periodic stimulus, reversing the contrast polarity of the pattern during apparent motion generates the illusion of motion moving in the opposite direction for any motion detection system based on the EMD. **(A)** Data replotted from **Figure 4** for normal phi motion (blue), superimposed with results from reverse-phi stimuli (red) collected in the same animals. **(B)** STAFs collected with normal phi apparent motion compared to those collected with reverse-phi stimulation in the same group of individual flies. Note that the EM-STAF is negative, indicating the reverse-phi illusion, but the FM STAF is essentially unaffected by the motion illusion. **(C)** Full-field yaw kernels measured for phi and reverse-phi motion collected from the same flies. By comparison, the "slices" of the EM STAFs at zero-degrees azimuth for the normal phi and reverse-phi stimuli are not equal in amplitude (arrowhead).

Our results clearly demonstrate a significant reduction in behavioral response amplitude in the occluded visual field under the STAF protocol in both the EM and FM channels (**Figure 6A**). Furthermore, to verify the retinotopic accuracy of the STAF technique, we mounted a fly in a two-axis gimbal under our dissecting scope and, using the GFP epifluorescence channel, took photos of the fly pseudopupil over the full azimuth and pitch axes (**Figure 6B**). The pseudopupil is the region of the compound eye that appears dark when viewed from a particular angle due to collinearity of the viewpoint with the acceptance angle of the ommatidia. We used a machine-vision algorithm to count the number of ommatidial facets from each eye visible at each point on the sphere and to reconstruct the region of binocular overlap. The fly was restrained and imaged at 10x magnification with coaxial illumination in a dissecting microscope using a DAPI filter set. This produced strong reflectance from the photopigment and made clear the position of the pseudopupil in one or both eyes. We then produced a threshold mask over the pseudopupil to capture its shape. To calculate how many ommatidia it contained, we tessellated this mask over the original image at eight random locations very near to the pseudopupil where the curvature of the eye was approximately the same. Within each of these tessellated windows, we created a binary mask to identify the septa and applied a watershed algorithm to count the number of disjoint regions in this mask (the number of discrete ommatidia). Finally, we averaged this count across all eight tessellated windows and used that as the final ommatidial count for the pseudopupil from that vantage.
We found that, when convolved with the width of the stimulus bar (30°), the anatomically measured region of azimuthal binocular overlap was in good agreement with the behaviorally measured region of binocular overlap defined as the overlap between the two single-eye occluded EM STAFs (**Figure 6C**), and also in agreement with prior measurements using a different method (Wolf and Heisenberg, 1984). The implication here is that the spatial tuning of the STAFs is in part determined by the region of binocular visual overlap, thus forming a sort of "motion fovea" in the frontal field of view.
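The counting step can be illustrated with a simplified stand-in. The sketch below does not implement a watershed transform; it substitutes plain 4-connected component labeling by flood fill, which gives the same count when the septa in the binary mask fully separate neighboring facets. The function name and the toy window are hypothetical.

```python
def count_regions(mask):
    """Count 4-connected regions of True cells in a 2-D boolean grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1                      # new, unvisited region found
                stack = [(r, c)]
                seen[r][c] = True
                while stack:                    # flood-fill the whole region
                    i, j = stack.pop()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            stack.append((ni, nj))
    return count


# Toy window: '#' marks septa, '.' marks facet interiors.
window = [
    "..#..#..",
    "..#..#..",
    "########",
    "..#..#..",
    "..#..#..",
]
facet_mask = [[ch == "." for ch in row] for row in window]
n_facets = count_regions(facet_mask)
```

Averaging such counts over several windows, as described above, reduces the influence of any single window in which septa are poorly resolved. A true watershed would additionally split facets whose septa are incomplete, which this simplified version cannot do.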

#### *Statistical analysis to compare STAFs across experimental treatments*

In order to establish the general utility of the STAF, it is important to demonstrate that the methodology is sufficiently precise to provide robust statistics for inter-group comparison. This requires enough self-similarity between subjects within a particular group with respect to our method of measurement that groups may be differentiated by a *t*-test or ANOVA. To demonstrate this principle, we provide a set of single-animal STAFs from the eye occlusion study in **Figure 6**, where one can clearly observe strong features of the average STAF manifest at the level of individual subjects (Figure S1). The dimensionality of the STAF representation is very low with respect to a singular value decomposition, such that a single principal component captures nearly 90% of the population variance (Figure S1), demonstrating that the STAF is in fact a relatively low-dimensional function. This suggests that, although each STAF is composed of ∼10<sup>5</sup> data points, we may significantly correct our false discovery rate (FDR) to reflect this low dimensionality. We used the Benjamini-Hochberg algorithm to control for the FDR (Benjamini and Hochberg, 1995). This algorithm is suited to control for the FDR in cases where many of the observations (pixels of the STAF) may be positively correlated. Because of the relatively large contribution of low spatial and temporal frequencies to the STAFs (i.e., they are relatively smooth), it is suitable to assume a high level of correlation in the values of neighboring pixels and, therefore, the B-H method is well-suited to control for the FDR. Results of the B-H corrected comparisons between the single-eye occluded STAFs are shown in **Figure 7**.
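For reference, the Benjamini-Hochberg step-up procedure itself is short. The sketch below is a generic implementation, assuming one p-value per STAF pixel from a pixel-wise test between groups; the function name and the example p-values are illustrative and are not data from this study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null is rejected at FDR level q.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= k * q / m, and reject every hypothesis with rank <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected


# Illustrative p-values for three pixels.
example = benjamini_hochberg([0.9, 0.02, 0.03], q=0.05)
```

In the example, the smallest p-value (0.02) exceeds its own threshold (0.05/3) but is still rejected because the second-smallest (0.03) passes its threshold (2 × 0.05/3). This step-up behavior is what distinguishes the procedure from a pixel-by-pixel Bonferroni cutoff.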

These difference maps demonstrate that the STAF methodology has sufficient precision to provide a robust interpretation of subtle phenotypes resulting from perturbing the underlying circuitry. Animal-to-animal variation is certainly apparent in the STAFs (Figure S1), and analysis of such variation could be facilitated by the STAF method.

Alternatively, to evaluate individual animal performance, a fit-then-compare method to identify significant differences between STAFs is also feasible, as has been deployed to analyze spatiotemporal receptive fields (Woolley et al., 2006). Fitting with a sum of exponentials model identifies the time constants and asymptotic amplitudes of STAFs (Fox et al., 2014). Statistical comparison of the fit coefficients would be more sensitive to small differences that may not reach significance under the pure probabilistic approach given here, and should be employed in cases where the mode of differentiation between test and control groups can be hypothesized *a priori*.

# **DISCUSSION**

In summary, our work has demonstrated the utility of a white-noise-based system identification technique for analysis of complex, visually-mediated behavior in the fruit fly. In particular, it has painted a clearer picture of two distinct perceptual streams that contribute to figure-tracking behavior:


(3) A total tracking effort approximated by a superposition of the outputs of the two streams.

These results are embodied in Spatio-Temporal Action Fields, a representation that yields a model for optomotor behavior, whose derivation is described in detail in this paper along with the conditions, experimental measures, and limitations required for their validity. We contend that the STAF methodology, when applicable, offers more in this regard than the measurement of raw steady-state responses to the classic repertoire of stimuli (periodic or unidirectional motion of periodic gratings and solid bars) that has been used in past studies of optomotor behavior.

By modifying the STAF methodology, a recent study explored the influence of active figure tracking against a moving visual surround. Instead of displaying separate EM and FM components of a figure on a stationary visual surround, the movement of a solid Fourier bar (EM = FM) and the visual panorama were controlled by two m-sequences (Fox et al., 2014). The composite Figure STAF is well approximated by the superposition of the EM and FM STAFs (Aptekar et al., 2012), containing both the rapid EM-driven impulse response and the slow FM-driven step response. The Figure STAF and the Ground STAF show distinct spatial and dynamical characteristics, most importantly demonstrating that the presence of a figure in the frontal visual field either suppresses the normal optomotor response that is driven by azimuthal background motion or that the total control effort is shared by the two subsystems. A potential problem with using the STAF methodology in this manner is that the two m-sequences control EM visual stimuli in adjacent regions of the visual field. The two m-sequences are typically updated at the same frame rate. Thus, for half of the total displacements, the figure and the ground are displaced in the same direction by the same amount (a single 3.75° pixel), and the figure, defined here only by its relative motion, would disappear from view. We therefore examined the influence of phase-shifting the displacement of the figure and ground so that the two stimuli are interleaved rather than displaced simultaneously in time. By running these two conditions on the same group of flies, we demonstrated that there is no significant influence of shifting the two m-sequences.

The development and application of the STAF methodology bears significantly on an unresolved dispute in the literature between the view that "position" detection emerges from the D(psi) function (Poggio and Reichardt, 1973), which is based solely on the asymmetry between front-to-back and back-to-front responses to a moving figure, and the view that motion responses are approximately symmetric and position detection is instead based on static receptive fields that are driven by flicker (Pick, 1974, 1976; Buchner et al., 1984). There were two limitations that impeded a broader understanding of the mechanisms at work. First, the temporal dynamics of the two subsystems are crucial to the interpretation, and, prior to our method, there was no way to fully separate the "velocity" component from the "position" component of feature detection *without holding the figure stationary*. A slowly revolving solid bar might generate little flicker but generates other higher-order spatiotemporal statistical disparities that flies track; similarly, a stationary flickering bar is a relatively weak stimulus because it is not moving. By separating the first-order and higher-order properties of a moving visual figure, our prior work generally supports the Pick model, since we deploy low angle displacements (for which no asymmetry can be detected), measure the influence of first-order and higher-order components simultaneously for a *moving* figure, and find that the superposition of the EM and FM components *predicts* the Reichardt model responses, *including* the misleading "notch" in the derived motion function (**Figure 3**). In summary, the EM component is equivalent to a classical "velocity" servo, and the FM component captures a classical "position" servo driven by flicker. However, flicker is not the sole determinant of the FM component (Theobald et al., 2010b). Instead, other spatio-temporal disparities also contribute.

In more general terms, the decomposition of visual information into visual features is an important function of any high performance visual system. For humans, the field of psychophysics has explored these capacities for more than a century. The evidence from that work points generally to cortical mechanisms for feature extraction. In contrast, a half century of work in flies has shown that these animals accomplish similar feature extraction within the secondary and tertiary optic ganglia—the medulla, lobula, and lobula plate (Egelhaaf, 1985a,b,c; Reichardt et al., 1989; Egelhaaf et al., 1993, 2003; Kimmerle and Egelhaaf, 2000; Aptekar et al., 2012; Fox et al., 2014). As these systems become more tractable with the advent of genetic tools for lesioning and imaging specific subsets of cells within these parts of the fly brain, in addition to the completion of full-fledged wiring diagrams, the need for more nuanced behavioral tools is acute.

The specificity of new genetic tools that robustly and repeatedly target an identifiable cell pathway presents a complementary set of challenges to the behavioral neuroscientist: while it is technically easier to determine the behavioral effects of large lesions to the nervous system of the fly—e.g., the genetic inactivation of many neurons—it is correspondingly harder to identify the functional role of small sets of neurons playing highly specialized roles in visual processing. Lesions that affect few or single neurons may often have only subtle effects on behavior, so that while the identity of the lesioned cells may be well-determined, the behavioral relevance may not be. To overcome these challenges, fine-grained and sensitive approaches to studying behavior are needed.

Because the STAF characterizes both the spatial organization and dynamical properties of an optomotor figure tracking response, it provides a tool for an integrated understanding of the functional components of the visual pathway—and in addition, can help the behavioral neuroscientist who studies genetically targeted lesions to understand where a deficit occurs and what sort of visual processing has been affected. Specific advantages include:


To conclude, this work has demonstrated how two relatively modest innovations on classical white noise analysis—the inclusion of space as a way to organize response kernels and the use of linear decoupling to measure the response to two channels of visual information simultaneously—could substantially improve our basic understanding of the fly visual system. The aim of this paper has been to extend understanding of the STAF methodology by describing the set of behavioral assays and analysis techniques surrounding the STAF formalism in detail, to discuss the particular value of the STAF technique to the study of lesions in the visual system, and to provide relevant software and documentation to facilitate the use of the STAF technique.
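As a reminder of what white noise analysis with m-sequences involves, the following minimal sketch recovers a linear response kernel by circular cross-correlation of a ±1 m-sequence with the response it evokes. It is not the authors' software: the exponential kernel, the 7-bit register, and all function names are assumptions made for illustration only.

```python
"""Sketch: recover a linear kernel by cross-correlating with an m-sequence."""
import math


def m_sequence_pm1(nbits=7, taps=(7, 6)):
    """One period of a maximal-length sequence, mapped to -1.0 / +1.0."""
    state = [1] * nbits
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return out


def circular_response(stimulus, kernel):
    """Response of a linear system to a periodically repeated stimulus."""
    n = len(stimulus)
    return [sum(kernel[k] * stimulus[(t - k) % n] for k in range(len(kernel)))
            for t in range(n)]


def estimate_kernel(stimulus, response, n_lags):
    """Circular cross-correlation of stimulus and response at lags 0..n_lags-1."""
    n = len(stimulus)
    return [sum(stimulus[t] * response[(t + lag) % n] for t in range(n)) / n
            for lag in range(n_lags)]


if __name__ == "__main__":
    stim = m_sequence_pm1()
    true_kernel = [math.exp(-k / 3.0) for k in range(10)]  # hypothetical kernel
    resp = circular_response(stim, true_kernel)
    est = estimate_kernel(stim, resp, n_lags=10)
    worst = max(abs(e - h) for e, h in zip(est, true_kernel))
    print(f"largest estimation error: {worst:.3f}")
```

Because the autocorrelation of an m-sequence is nearly a delta function, the estimate matches the kernel up to a small offset of order 1/N, where N is the sequence length.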

# **ACKNOWLEDGMENTS**

This work was supported by grant FA9550-12-1-0034 from the US Air Force Office of Scientific Research. Mark Frye is an Early Career Scientist with the Howard Hughes Medical Institute. We are grateful for the rigorous, thoughtful and thorough efforts of the two Reviewers.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fncir.2014.00130/abstract

# **REFERENCES**


Zhang, X., Liu, H., Lei, Z., Wu, Z., and Guo, A. (2013). Lobula-specific visual projection neurons are involved in perception of motion-defined second-order motion in *Drosophila*. *J. Exp. Biol.* 216(Pt 3), 524–534. doi: 10.1242/jeb.079095

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 June 2014; accepted: 09 October 2014; published online: 31 October 2014. Citation: Aptekar JW, Keles MF, Mongeau J-M, Lu PM, Frye MA and Shoemaker PA (2014) Method and software for using m-sequences to characterize parallel components of higher-order visual tracking behavior in Drosophila. Front. Neural Circuits 8:130. doi: 10.3389/fncir.2014.00130*

*This article was submitted to the journal Frontiers in Neural Circuits.*

*Copyright © 2014 Aptekar, Keles, Mongeau, Lu, Frye and Shoemaker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Motion as a source of environmental information: a fresh view on biological motion computation by insect brains

# **Martin Egelhaaf \*, Roland Kern and Jens Peter Lindemann**

Department of Neurobiology and Center of Excellence "Cognitive Interaction Technology" (CITEC), Bielefeld University, Bielefeld, Germany

#### **Edited by:**

Davide Zoccolan, International School for Advanced Studies, Italy

#### **Reviewed by:**

Pavel M. Itskov, Champalimaud Centre for the Unknown, Portugal

Thomas Collett, University of Sussex, UK

#### **\*Correspondence:**

Martin Egelhaaf, Department of Neurobiology and Center of Excellence "Cognitive Interaction Technology" (CITEC), Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany e-mail: martin.egelhaaf@uni-bielefeld.de

Despite their miniature brains, insects such as flies, bees and wasps are able to navigate by highly aerobatic flight maneuvers in cluttered environments. They rely on spatial information that is contained in the retinal motion patterns induced on the eyes while moving around ("optic flow") to accomplish their extraordinary performance. Thereby, they employ an active flight and gaze strategy that separates rapid saccade-like turns from translatory flight phases where the gaze direction is kept largely constant. This behavioral strategy facilitates the processing of environmental information, because information about the distance of the animal to objects in the environment is only contained in the optic flow generated by translatory motion. However, motion detectors as are widespread in biological systems do not represent veridically the velocity of the optic flow vectors, but also reflect textural information about the environment. This characteristic has often been regarded as a limitation of a biological motion detection mechanism. In contrast, we conclude from analyses challenging insect movement detectors with image flow as generated during translatory locomotion through cluttered natural environments that this mechanism represents the contours of nearby objects. Contrast borders are a main carrier of functionally relevant object information in artificial and natural sceneries. The motion detection system thus segregates, in a computationally parsimonious way, the environment into behaviorally relevant nearby objects and, in many behavioral contexts, less relevant distant structures. Hence, by making use of an active flight and gaze strategy, insects are capable of performing extraordinarily well even with a computationally simple motion detection mechanism.

**Keywords: optic flow, motion detection, spatial vision, insects, natural environments**

# **ENVIRONMENTAL INFORMATION AS A BASIS FOR VISUALLY GUIDED ORIENTATION**

A key function of vision is to extract behaviorally relevant information about the outside world from the activity patterns evoked in the retina. Especially fast locomotion requires information about the spatial layout of the environment to allow for meaningful behavioral decisions. Spatial information can be obtained from the relative movements of the retinal image, the optic flow patterns that are generated on the eyes during locomotion.

Visual motion information is not only generated on the eyes when a moving object crosses the visual field, but also all the time while the animal moves around in the environment. Despite this ongoing movement on the retina, we usually perceive the outside world as static. Nevertheless, the retinal motion information is conventionally thought to be important to signal self-motion. One particular type of self-motion has been studied intensively, especially in tethered animals: confronted with a rotating environment, most animals generate eye- or body-movements following this rotation. These rotational responses of the eyes and/or the body to visual motion were monitored and interpreted to compensate for deviations from an intended course of locomotion or an intended gaze direction. In this context the retinal motion is regarded as a disturbance that needs to be compensated (reviews: Götz, 1972; Taylor and Krapp, 2008; Borst, 2014). Although this view may be correct in many behavioral situations, it misses one important point: retinal image motion induced by self-motion of the animal is not just a nuisance, but may also be a highly relevant source of environmental information. In particular, fast flying animals, such as many insects, heavily rely on environmental information derived from optic flow, for instance, to avoid collisions with obstacles, to find a landing site and control landing maneuvers or when learning the landmark constellation around a goal and when later navigating towards this previously learnt site. However, also sitting animals may induce specific body, head, and eye movements for estimating distances to objects in their environment (for review see Collett and Zeil, 1996; Kral, 2003; Srinivasan, 2011; Egelhaaf et al., 2012; Zeil, 2012).

The working hypothesis of much of our recent research on insects, such as flies and bees (Egelhaaf et al., 2012), and thus the assumption underlying this article, is that the output of the motion vision system combines two highly relevant cues of environmental information: nearness and contrast borders. As a consequence, it segments the time-dependent retinal images into potentially relevant nearby structures and, in many behavioral contexts, potentially less relevant distant objects. On the one hand, we will argue that all this is likely to be accomplished by simple computational principles that have been conceptually lumped into a well-known and well-established computational model, i.e., the correlation-type movement detector (often also termed Hassenstein-Reichardt detector or elementary motion detector, EMD; Reichardt, 1961; Borst and Egelhaaf, 1989; Egelhaaf and Borst, 1993; Borst, 2000). On the other hand, we will sketch the current knowledge about how local motion information is further processed to guide orientation behavior.

# **INSECT MOTION DETECTION REFLECTS THE PROPERTIES OF THE ENVIRONMENT IN ADDITION TO VELOCITY**

The correlation-type motion detection scheme was originally derived as a computational model on the basis of behavioral and electrophysiological experiments on insects (Reichardt, 1961; Egelhaaf and Borst, 1993; Borst et al., 2003; Borst, 2004; Lindemann et al., 2005; Straw et al., 2008; Brinkworth and O'Carroll, 2010; Meyer et al., 2011). Only recently have the underlying computational principles begun to be dissected at the circuit level. The neural networks and synaptic interactions underlying motion detection are investigated mainly in the fruit fly *Drosophila* by employing the sophisticated repertoire of novel genetic tools (e.g., Freifeld et al., 2013; Joesch et al., 2013; Maisak et al., 2013; Reiser and Dickinson, 2013; Silies et al., 2013; Tuthill et al., 2013; Behnia et al., 2014; Hopp et al., 2014; Mauss et al., 2014; Meier et al., 2014; Strother et al., 2014). Since we are focusing here especially on the overall output of the motion detection system, rather than on the cellular details of its internal structure, our considerations are mainly based on model analyses of EMDs. Variants of this computational model can account for many features of motion detection, as they manifest themselves in the activity of output cells of the motion vision pathway and even in the behavioral performance of the entire animal.

In its simplest form, an EMD is composed of two mirror-symmetrical subunits (**Figure 1A**). In each subunit, the signals of adjacent light-sensitive cells receiving the filtered brightness signals from neighboring points in visual space are multiplied after one of them has been delayed. The final detector response is obtained by subtracting the outputs of two such subunits with opposite preferred directions, thereby considerably enhancing the direction selectivity of the motion detection circuit. Each motion detector reacts with a positive signal to motion in a given direction and with a negative signal to motion in the opposite direction (reviews: Reichardt, 1961; Borst and Egelhaaf, 1989, 1993). Various elaborations of this basic motion detection scheme have been proposed to account for the responses of insect motion-sensitive neurons under a wide range of stimulus conditions including even natural optic flow as experienced under free-flight conditions (e.g., Borst et al., 2003; Lindemann et al., 2005; Shoemaker et al., 2005; Brinkworth et al., 2009; Hennig et al., 2011; Hennig and Egelhaaf, 2012).
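The basic scheme described above (delay, multiply, subtract the mirror-symmetrical subunit) can be written down in a few lines. The following is a minimal sketch, not one of the elaborated models cited. It uses a first-order low-pass filter as the delay stage, and the time constant (35 ms), the sampling-point spacing (2°), the grating wavelength (20°), and all function names are assumptions chosen only for illustration.

```python
"""Minimal correlation-type elementary motion detector (EMD) sketch."""
import math


def low_pass(signal, tau, dt):
    """First-order low-pass filter, used here as the 'delay' stage."""
    alpha = dt / (tau + dt)
    out, y = [], signal[0]
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def emd_response(left, right, tau=0.035, dt=0.001):
    """Opponent output: delayed-left * right minus delayed-right * left."""
    left_delayed = low_pass(left, tau, dt)
    right_delayed = low_pass(right, tau, dt)
    return [ld * r - rd * l
            for ld, r, rd, l in zip(left_delayed, right, right_delayed, left)]


def grating_inputs(velocity, wavelength=20.0, spacing=2.0,
                   duration=2.0, dt=0.001):
    """Brightness at two neighbouring points viewing a drifting sine grating.

    velocity in deg/s (positive = from the left to the right input);
    wavelength and spacing in degrees of visual angle.
    """
    n = int(round(duration / dt))
    k = 2.0 * math.pi / wavelength
    left = [math.sin(k * (0.0 - velocity * i * dt)) for i in range(n)]
    right = [math.sin(k * (spacing - velocity * i * dt)) for i in range(n)]
    return left, right


def mean_response(velocity):
    """Time-averaged EMD output, skipping the first half as filter settling."""
    left, right = grating_inputs(velocity)
    resp = emd_response(left, right)
    half = len(resp) // 2
    return sum(resp[half:]) / (len(resp) - half)


if __name__ == "__main__":
    for v in (-50.0, 10.0, 50.0, 90.0):
        print(f"velocity {v:7.1f} deg/s -> mean response {mean_response(v):+.4f}")
```

In this sketch the sign of the mean output follows the direction of motion, while its magnitude depends on the temporal frequency of the grating rather than on velocity alone.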

As a consequence of their computational structure, EMDs and their counterparts in the insect brain have a number of peculiar features that deviate in many respects from those of veridical velocity sensors. Therefore, they often have been interpreted as the consequence of a simple, but somehow deficient computational mechanism. The most relevant of these features are:


the sensitivity to temporal discontinuities in the retinal input (Maddess and Laughlin, 1985; Liang et al., 2008, 2011; Kurtz et al., 2009).

These response features of EMDs make their responses ambiguous with respect to a representation of the retinal velocity. Because these ambiguities, especially the contrast- and texture-dependent response modulations, deteriorate the quality of representing pattern velocity, they have often been discussed as "pattern noise" (Dror et al., 2001; Shoemaker et al., 2005; Rajesh et al., 2006; O'Carroll et al., 2011) and, thus, as a limitation of the biological motion detection mechanism. Here we want to take an alternative stance by proposing that these pattern-dependent modulations of the movement detector output do not reflect noise in the context of velocity coding. Rather, they can be interpreted as being relevant from a functional point of view, as they reflect potentially useful information about the environment and, thus, may be relevant for visually guided orientation behavior (Meyer et al., 2011; O'Carroll et al., 2011; Hennig and Egelhaaf, 2012; Schwegmann et al., 2014; Ullrich et al., 2014b).

# **ENHANCING THE OVERALL POWER OF INSECT BRAINS: REDUCING COMPUTATIONAL LOAD BY ACTIVE VISION STRATEGY**

It is indispensable that the animal is active and moves to be able to use the environmental information provided by EMDs. This is because movement detectors do not respond in a stationary world if the animal is also stationary. However, not every type of self-motion is equally suitable for the brain to extract useful information about the environment from the image flow and, thus, from the EMD responses. Especially, if spatial information is concerned only the optic flow component generated by translational self-motion is useful. During pure translational self-motion the retinal images of objects close to the observer move faster than those of more distant ones. More specifically, for a given translation velocity, retinal image velocity evoked by an environmental object at a given viewing angle increases linearly with its nearness, i.e., the inverse of its distance. However, the retinal velocity of an object even at a given distance also depends on its viewing angle relative to the direction of motion: the optic flow vectors are maximal at 90° relative to the direction of motion and decrease according to a sine function from here towards the direction of self-motion, where they are zero. Hence, at this singular point, i.e., the direction in which the agent is heading, it is not possible to obtain nearness information. The geometrical situation differs much for pure rotational self-movements of the agent. Then the retinal image displacements are independent of the distance to objects in the environment (Koenderink, 1986).

**FIGURE 2 | Saccadic flight strategy and variability of translational self-motion**. Saccadic flight and gaze strategy of free-flying bumblebees and honeybees. **(A)** Inset: Trajectory of a typical learning flight of a bumblebee as seen from above during a navigational task involving landmarks (black objects). Each line indicates a point in space and the corresponding viewing direction of the bee's head each 20 ms. The color code indicates time (given in ms after the start of the learning flight at the goal). Upper diagram: Angular orientation of longitudinal axis of body (black line) and head (red line) of a sample flight trajectory of a bumblebee during a learning flight after departing from a visually inconspicuous feeder surrounded by three landmarks. Note that step-like, i.e., saccadic direction changes are more pronounced for the head than for the body. Bottom diagram: Angular yaw velocity of body (black line) and head (red line) of the same flight (Boeddeker et al., submitted; Data from Mertes et al., 2014). **(B)** Translational and rotational prototypical movements of honeybees during local landmark navigation. Flight sequences while the bee was searching for a visually inconspicuous feeder located between three cylindrical landmarks can be decomposed into nine prototypical movements using clustering algorithms in order to reduce the behavioral complexity. Each prototype is depicted in a coordinate system as explained by the inset. The length of each arrow determines the value of the corresponding velocity component. Percentage values provide the relative occurrence of each prototype. More than 80% of flight-time corresponds to a varied set of translational prototypical movements (light blue background) and less than 20% has significantly non-zero rotational velocity corresponding to the saccades (light red background) (Data from Braun et al., 2012).

If locomotion is characterized by an arbitrary combination of translation and rotation, the optic flow field is more complex, and information about the spatial structure of the environment cannot readily be derived. Nevertheless, a segregation of the optic flow into its rotational and translational components can, at least in principle, be accomplished computationally for most realistic situations (Longuet-Higgins and Prazdny, 1980; Prazdny, 1980; Dahmen et al., 2000). However, such a computational strategy is demanding, and it is not clear whether it can be pursued by a nervous system. Several insect species with their tiny brains appear to employ other computationally much more parsimonious strategies.
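The geometry of translational and rotational optic flow described in this section can be illustrated with a short numerical sketch. It assumes a static point object, a point-like observer, and, for the rotational case, an object lying in the plane perpendicular to the rotation axis. The function names and the example values are hypothetical.

```python
"""Sketch of translational versus rotational optic flow for a single point."""
import math


def translational_flow(speed, distance, viewing_angle_deg):
    """Retinal angular velocity (deg/s) of a static point during pure translation.

    speed: translation speed in m/s
    distance: distance to the point in m (nearness = 1 / distance)
    viewing_angle_deg: angle between heading direction and line of sight
    """
    nearness = 1.0 / distance
    omega = speed * nearness * math.sin(math.radians(viewing_angle_deg))
    return math.degrees(omega)


def rotational_flow(yaw_rate_deg, distance):
    """Retinal angular velocity (deg/s) during pure yaw rotation.

    For a point in the plane perpendicular to the rotation axis the image
    moves at the yaw rate; the distance argument is deliberately unused.
    """
    return yaw_rate_deg


if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0):
        flow = translational_flow(1.0, d, 90.0)
        print(f"distance {d:4.2f} m, 90 deg: {flow:7.1f} deg/s")
    for angle in (0.0, 30.0, 90.0):
        flow = translational_flow(1.0, 0.5, angle)
        print(f"viewing angle {angle:4.1f} deg: {flow:7.1f} deg/s")
```

Halving the distance doubles the translational flow, the flow vanishes in the heading direction, and the rotational flow is the same for near and far objects.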

Specific combinations of rotatory and translatory self-motion may generate an optic flow pattern that contains useful spatial information. For instance, when the animal circles around a pivot point while fixating it, the retinal images of objects before and behind the pivot point move in opposite directions and, thus, provide distance information relative to the pivot point, rather than to the moving observer (Collett and Zeil, 1996; Zeil et al., 1996). Other insects generate pure translational self-motion to obtain distance information relative to the animal. For instance, mantids, dragonflies, and locusts perform lateral body and head translations and employ the resulting optic flow for gaining distance information when sitting in ambush to catch prey or preparing for a jump (Collett, 1978; Sobel, 1990; Collett and Paterson, 1991; Kral and Poteser, 1997; Olberg et al., 2005). During flight, flies, wasps and bees reveal a distinctive behavior that is characterized by sequences of rapid saccade-like turns of body and head interspersed with virtually pure translational, i.e., straight flight phases (Schilstra and Van Hateren, 1999; van Hateren and Schilstra, 1999; Mronz and Lehmann, 2008; Boeddeker et al., 2010, submitted; Braun et al., 2010, 2012; Geurten et al., 2010; Kern et al., 2012; van Breugel and Dickinson, 2012; Zeil, 2012). Saccadic gaze changes have a rather uniform time course and are shorter than 100 ms. Angular velocities of up to several thousand °/s can occur during saccades (**Figures 2A**, **3A**). Rotational movements associated with body saccades are shortened for the visual system by coordinated head movements, and roll rotations performed for steering purposes during sideways translations are compensated by counter-directed head movements. As a consequence, the animal's gaze direction is kept virtually constant during intersaccades (Schilstra and Van Hateren, 1999; van Hateren and Schilstra, 1999; Boeddeker and Hemmi, 2010; Boeddeker et al., 2010, submitted; Braun et al., 2010, 2012; Geurten et al., 2010, 2012). Hence, turns that are essential to reach behavioral goals are minimized in duration and separated from translational flight phases in which the direction of gaze is kept largely constant. This peculiar time structure of insect flight facilitates the processing of distance information from the translational intersaccadic optic flow. With regard to gathering information about the outside world, it is highly relevant from a functional perspective that the intersaccadic translational motion phases last for more than 80% of the entire flight time (van Hateren and Schilstra, 1999; Boeddeker and Hemmi, 2010; Boeddeker et al., 2010; Braun et al., 2012; van Breugel and Dickinson, 2012). Still, the individual intersaccadic time intervals are short and usually last for only some tens of milliseconds; they are only rarely longer than 100 to 200 ms in blowflies, for example (Kern et al., 2012).
This characteristic dynamic feature of the active flight and gaze strategy of insects, thus, constrains considerably the timescales on which spatial information can be extracted from the optic flow patterns during flight, a fact the underlying neuronal mechanisms have to cope with (Egelhaaf et al., 2012).

Although the translational intersaccadic flight phases are diverse with regard to the direction and velocity of motion they appear to be adjusted to the respective behavioral context (**Figure 2B**; Braun et al., 2010, 2012; Dittmar et al., 2010; Geurten et al., 2010). This is especially true for the overall velocity of translational self-motion, although it does not change much during individual intersaccadic intervals (**Figure 3B**; Schilstra and Van Hateren, 1999; van Hateren and Schilstra, 1999; Boeddeker et al., 2010; Kern et al., 2012). For instance, insects tend to decelerate when their flight path is obstructed, and flight speed is thought to be controlled by optic flow generated during flight (David, 1979, 1982; Farina et al., 1995; Srinivasan et al., 1996; Kern and Varjú, 1998; Baird et al., 2005, 2006, 2010; Frye and Dickinson, 2007; Fry et al., 2009; Dyhr and Higgins, 2010; Straw et al., 2010; Kern et al., 2012). Thereby, they appear to regulate their intersaccadic translational flight velocity to keep the retinal velocities in the frontolateral visual field largely constant at a "preset" level (Baird et al., 2010; Portelli et al., 2011; Kern et al., 2012). This level appears to lie within the part of the operating range of the motion detection system where the response amplitude still increases with increasing retinal velocity (**Figure 3**; See Section Insect Motion Detection Reflects the Properties of the Environment in Addition to Velocity). These features are likely to be of functional significance from the perspective of spatial vision, because they help to reduce the ambiguities in extracting nearness information from the EMD outputs that represent the optic flow in the visual system. 
On the other hand, since insects may adjust their translational velocity to the behavioral context (see above, but also Srinivasan et al., 2000), no absolute nearness cues can be obtained by any mechanism extracting spatial information from optic flow: this is because a given retinal velocity and, thus, response level of a motion detection system may be obtained for different combinations of translation velocity and nearness. Hence, nearness information can be extracted only in relative terms, unless translation velocity is known. This implies that translation velocity should be kept constant, if from the response modulations of EMDs (See Section Insect Motion Detection Reflects the Properties of the Environment in Addition to Velocity) nearness information needs to be determined. If also the translation velocity varies, the resulting response modulations are ambiguous with regard to their origin: they could be a consequence of either changes in self-motion or the spatial structure of the surroundings.

# **REPRESENTATION OF CLUTTERED ENVIRONMENTS BY ARRAYS OF MOTION DETECTORS**

Insects provide the basis for a computationally efficient representation of environmental information from the optic flow generated during the intersaccadic intervals of largely translational self-motion. However, optic flow information is not explicitly given at the retinal input. Rather, it needs to be computed from the spatiotemporal brightness fluctuations that are sensed by the array of photoreceptors of the retina. This is accomplished by local neural circuits residing in the visual neuropils. As explained in Section Insect Motion Detection Reflects the Properties of the Environment in Addition to Velocity, the overall performance of these circuits can be lumped together and explained by variants of the correlation-type EMD. Despite the detailed knowledge at the cellular and computational level, the functional significance of the information provided by these movement detectors has not been clearly unraveled yet. Since EMDs are sensitive to velocity, they may exploit the different speeds of objects at different nearnesses during translational self-motion and, thus, may represent information about the depth structure of the environment. However, EMDs are also sensitive to textural features of the environment (See Section Insect Motion Detection Reflects the Properties of the Environment in Addition to Velocity). Is this pattern dependence of the EMD output just an unwanted by-product of a simple computational mechanism, or could it have any functional significance?

Recent model simulations of arrays of EMDs provided evidence that their pattern dependence may make sense from a functional perspective during translatory self-motion in cluttered natural environments. Although several experimental and modeling studies probed the insect motion vision system already before with moving natural images, they only employed image sequences that did not contain any depth structure and, thus, differed much from what an animal experiences in natural environments (Straw et al., 2008; Wiederman et al., 2008; Brinkworth et al., 2009; Barnett et al., 2010; Meyer et al., 2011; O'Carroll et al., 2011). The potential significance of the combined velocity and pattern dependence of correlation-type EMDs became obvious by comparing the activity profiles of EMD arrays induced by image sequences that were obtained from constant-velocity translational movements through a variety of cluttered natural environments containing the full depth information and after the depth structure of the environment was removed. For both types of situations, sample activity profiles of EMD arrays are shown in **Figure 4**. They differ much, because without depth structure all environmental objects move at the same velocity and, thus, lead to responses irrespective of their distance. It is obvious that the activity profile evoked by motion through the environment with its natural depth structure preserved is most similar not to the nearness map *per se*, but to the contrast-weighted nearness map, which is the nearness multiplied by the contrast. However, the activity profile evoked by the artificially depth-removed image sequences matches best the contrast map (Schwegmann et al., 2014). This exemplary finding is corroborated by correlation analysis based on translatory motion through several different natural environments (**Figure 4**).
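The kind of comparison described above, in which a response profile is correlated with a nearness map, a contrast map, and their product, can be sketched as follows. This is not the analysis pipeline of Schwegmann et al. (2014). The maps are flattened into short lists, the numbers are invented for illustration, and the function names are hypothetical.

```python
"""Sketch: correlate a response profile with nearness, contrast and their product."""
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def contrast_weighted_nearness(nearness, contrast):
    """Element-wise product of the nearness map and the contrast map."""
    return [n * c for n, c in zip(nearness, contrast)]


def compare_maps(response, nearness, contrast):
    """Correlation of a response profile with the three candidate maps."""
    cwn = contrast_weighted_nearness(nearness, contrast)
    return {
        "nearness": pearson(response, nearness),
        "contrast": pearson(response, contrast),
        "contrast_weighted_nearness": pearson(response, cwn),
    }


if __name__ == "__main__":
    # Invented example values: distances in metres and local contrast (0..1).
    distance = [0.5, 1.0, 2.0, 4.0, 0.8, 3.0]
    nearness = [1.0 / d for d in distance]
    contrast = [0.1, 0.6, 0.9, 0.3, 0.7, 0.2]
    # Stand-in response: an idealized profile equal to contrast * nearness.
    response = contrast_weighted_nearness(nearness, contrast)
    for name, r in compare_maps(response, nearness, contrast).items():
        print(f"{name}: r = {r:.2f}")
```

With real data the response profile would come from an EMD array simulation or a recording; here it is set equal to the contrast-weighted nearness only to show how the three correlations are computed.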

Hence, EMD arrays do not respond best to the retinal velocity *per se* and, thus, to the nearness of environmental structures, but to the contrast-weighted nearness. This means that, during translational self-motion in natural environments, the arrays of EMDs represent to a large degree the nearness of high-contrast contours of objects. This conclusion holds true as long as the translational velocity varies only little and, thus, does not induce time-dependent response changes on its own (See Section Enhancing the Overall Power of Insect Brains: Reducing Computational Load by Active Vision Strategy). As mentioned above, this condition is met to a large extent for the short time of most intersaccadic intervals (**Figure 3B**; Schilstra and Van Hateren, 1999; van Hateren and Schilstra, 1999; Kern et al., 2012). By representing the contours of nearby objects, the distinctive feature of EMDs to jointly represent contrast and nearness information may make perfect sense from a functional point of view. Cluttered spatial sceneries are segmented in this way, without much computational expenditure, into nearby and distant objects. This finding underlines the notion that the mechanism of motion detection has been tweaked by evolution to allow the tiny brains of insects to gather behaviorally relevant information in a computationally efficient way.

However, motion measurements cannot be made instantaneously. As is reflected by the time constants that are an integral constituent of any motion detection mechanism, including correlation-type EMDs, it may take some time until reliable motion information and, thus, spatial cues can be extracted from their responses. This may be a challenge because the uninterrupted translational movement phases during intersaccadic intervals are short, ranging from 30 ms to little more than 100 ms (**Figure 3B**). It takes a few milliseconds after a change from a saccadic rotation to an intersaccadic translational movement for the EMD response to reach a kind of steady-state level. This finding indicates that the initial part of a translational sequence cannot be used by the animal for a reliable estimation of nearness information from the EMD responses (Schwegmann et al., 2014). Even under such constraints, the duration of most intersaccadic intervals appears to be long enough to allow spatial information to be extracted from the optic flow patterns on the eyes.
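The role of the detector time constant can be illustrated with a minimal simulation of a single correlation-type EMD at motion onset. This is a sketch under stated assumptions: a first-order low-pass filter serves as the delay element, the stimulus is a drifting sine grating, and the time constant, temporal frequency, and phase shift between the two inputs are hypothetical values rather than parameters from the studies cited above.

```python
import numpy as np

def lowpass(signal, tau_s, dt_s):
    """First-order low-pass filter (forward Euler), starting from rest."""
    out = np.zeros_like(signal)
    alpha = dt_s / tau_s
    for i in range(1, len(signal)):
        out[i] = out[i - 1] + alpha * (signal[i - 1] - out[i - 1])
    return out

def emd_response(contrast, temporal_freq_hz, phase_shift_rad, tau_s, dt_s, duration_s):
    """Response of one correlation-type EMD to a sine grating that starts moving at t = 0."""
    t = np.arange(0.0, duration_s, dt_s)
    omega = 2.0 * np.pi * temporal_freq_hz
    s1 = contrast * np.sin(omega * t)                    # first input channel
    s2 = contrast * np.sin(omega * t - phase_shift_rad)  # neighboring channel, lagging
    # Two mirror-symmetric half-detectors: the delayed signal of one channel is
    # multiplied by the undelayed signal of the other, and the products are subtracted.
    response = lowpass(s1, tau_s, dt_s) * s2 - lowpass(s2, tau_s, dt_s) * s1
    return t, response

def settling_time(t, response, tolerance=0.05):
    """Time after which the response stays within `tolerance` of its steady-state level."""
    n_tail = len(response) // 5
    steady = response[-n_tail:].mean()
    outside = np.flatnonzero(np.abs(response - steady) > tolerance * abs(steady))
    return t[outside[-1] + 1], steady

tau_s = 0.010  # hypothetical filter time constant
t, r = emd_response(contrast=1.0, temporal_freq_hz=4.0, phase_shift_rad=np.pi / 2,
                    tau_s=tau_s, dt_s=1e-4, duration_s=0.5)
settle_s, steady = settling_time(t, r)
print(f"steady-state response {steady:.3f}, settled after {settle_s * 1e3:.1f} ms")
```

In this sketch the response starts at zero and approaches its steady-state level only after several filter time constants, so the usable part of a short translational segment shrinks as the assumed time constant grows.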

In conclusion, during constant-velocity translatory locomotion the largest responses of the motion detection system are induced by contrast borders of nearby objects. Hence, it appears to be of functional significance that insects, such as flies and bees, move essentially straight for more than 80% of their flight time and change their direction by interspersed saccadic turns of variable amplitude (**Figure 2B**). Since translation velocity does not change much during intersaccadic intervals, the output of the motion detection system during individual intersaccadic intervals highlights contrast borders of nearby objects. Thus, what has often been conceived to be a limitation of the insect motion detection system may turn out to be a means that, in combination with the active flight and gaze strategy, allows the environment to be parsed into near and far and, at the same time, enhances the representation of object borders in a computationally extremely parsimonious way. By combining contrast edge information and motion-based segmentation of the scene in a single representation, the insect vision pathway may reflect an elegant and computationally parsimonious mechanism for cue integration.

In computer vision, optic flow is also used for segmentation purposes as well as for solving other spatial vision tasks, such as the recovery of the shape and relative depth of three-dimensional surface structures or the determination of the time-to-collision to an obstacle and the position of the focus of expansion to detect the heading direction (Beauchemin and Barron, 1995; Zappella et al., 2008). For quite some time, a variety of approaches to optic flow computation have been proposed and applied in robotics. These algorithms are based on different assumptions about image motion and operate on different image representations, e.g., directly on the gray level values or the edges in the image sequences (Beauchemin and Barron, 1995; Fleet and Weiss, 2005). In contrast to EMDs, which jointly provide information about motion and contrast edges during translatory motion, these technical optic flow approaches have in common that they attempt to estimate the optic flow field veridically, i.e., the flow vectors (up to a scaling factor) according to their velocity in the image plane. When applied to natural image sequences, however, this proved to be possible only to some extent, and erroneous velocity estimates are a common result, depending on the pattern properties of the sceneries (Barron et al., 1994; McCarthy and Barnes, 2004). To what extent segmentation algorithms that compute segment borders from discontinuities in a dense field of optic flow estimates, as provided by the various computer vision algorithms (Zappella et al., 2008), may also be applicable to a motion image computed by EMDs remains to be tested.
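To illustrate what a veridical flow field makes possible, the following sketch recovers the focus of expansion and the time-to-collision from flow vectors. It assumes pure translation and a noise-free, pinhole-type flow field in which every vector points radially away from the focus of expansion; the function names and the synthetic data are hypothetical and do not reproduce any of the cited algorithms.

```python
import numpy as np

def estimate_foe(points, flow):
    """Least-squares focus of expansion (FOE) for a purely translational flow field."""
    x, y = points[:, 0], points[:, 1]
    u, v = flow[:, 0], flow[:, 1]
    # Each flow vector is parallel to (point - FOE):
    #   v * (x - x0) - u * (y - y0) = 0   =>   v * x0 - u * y0 = v * x - u * y
    A = np.column_stack([v, -u])
    b = v * x - u * y
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

def time_to_collision(points, flow, foe):
    """Time-to-collision per point: image distance from the FOE divided by flow speed."""
    radial_distance = np.linalg.norm(points - foe, axis=1)
    speed = np.linalg.norm(flow, axis=1)
    return radial_distance / speed

# Synthetic, noise-free test field (all values hypothetical).
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(50, 2))
true_foe = np.array([0.2, -0.1])
true_ttc = rng.uniform(0.5, 3.0, size=50)      # seconds
flow = (points - true_foe) / true_ttc[:, None]  # image units per second

foe = estimate_foe(points, flow)
ttc = time_to_collision(points, flow, foe)
```

With noisy or pattern-dependent motion estimates, such as those delivered by EMDs, the same least-squares step would return biased values, which is one way to see why these technical approaches depend on veridical flow.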

# **EXPLOITATION OF ENVIRONMENTAL INFORMATION FROM MOTION DETECTORS BY DOWNSTREAM MECHANISMS**

Is the environmental information provided by the insect motion detection system during the translational phases of intersaccadic intervals really used by downstream processes in the nervous system and does it eventually play a role in controlling orientation behavior? Answers to this question can only be tentative so far, although it is suggested by two lines of evidence that the EMD-based environmental information might be functionally relevant. On the one hand, detailed knowledge is available of the computational properties of one neural pathway processing the information provided by the arrays of local motion detectors. On the other hand, behavioral studies and current modeling attempts suggest that the motion-based information about the environment may well be exploited for solving behavioral tasks such as collision avoidance and landmark navigation. Both aspects will be dealt with briefly in the following.

# **CONSEQUENCES OF SPATIAL INTEGRATION**

The outputs of the local motion-sensitive elements in insects are spatially pooled to a varying degree in one neural pathway, depending on the computational tasks that are being solved (Hausen, 1981; Krapp, 2000; Borst and Haag, 2002; Egelhaaf, 2006; Borst et al., 2010). However, spatial pooling inevitably reduces the precision with which a moving stimulus can be localized. Although this might appear, at least at first sight, to be a disadvantage, this is not necessarily the case. The determination of self-motion of the animal is one obvious task of motion vision systems. In this case, the retinal motion need not be localized; rather, only a few output variables, i.e., the translational and rotational velocities of the animal, are to be computed from the global optic flow. Information about self-motion is thought to be relevant for solving tasks such as, for instance, attitude control during flight, the compensation of involuntary disturbances by corrective steering maneuvers or the determination of the direction of heading (Dahmen et al., 2000; Lappe, 2000; Vaina et al., 2004; Taylor and Krapp, 2008; Egelhaaf et al., 2012). Accordingly, spatial pooling of local motion information over relatively large parts of the visual system, as is done by wide-field cells (LWCs) in the lobula complex of insects, enhances the specificity of the system for different types of self-motion (Hausen, 1981; Krapp et al., 1998, 2001; Franz and Krapp, 2000; Horstmann et al., 2000; Dror et al., 2001; Karmeier et al., 2003; Franz et al., 2004; Wertz et al., 2009).

In contrast, if information about the spatial layout of the environment is required, it might be relevant to localize objects together with their nearness to the animal. Then spatial pooling over only a relatively small spatial area will be acceptable. Integration of the outputs of neighboring EMDs was found to increase considerably the reliability with which the boundaries of nearby objects are represented in the activity profile of EMDs; pooling of the direct and second neighbors is already sufficient. Increasing the pooling area further does not increase the contrast-weighted nearness information significantly, but reduces the localizability of environmental features to a spatial range as given by the receptive field size of the pooling neuron (**Figure 5;** Schwegmann et al., 2014). Spatial pooling across larger areas of the visual field provides only information about the averaged spatial information within the pooling areas during translational self-motion without being able to localize environmental features within this area of the visual field.
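The trade-off between reliability and localizability described above can be sketched with a one-dimensional toy profile. The box-shaped pooling window, the noise level, and the array size are assumptions chosen for illustration; they do not reproduce the analysis of Schwegmann et al. (2014).

```python
import numpy as np

def pool_neighbors(profile, half_width):
    """Average each element with its neighbors up to `half_width` positions away."""
    width = 2 * half_width + 1
    kernel = np.ones(width) / width
    return np.convolve(profile, kernel, mode="same")

rng = np.random.default_rng(1)
n = 200
edge = np.zeros(n)
edge[100] = 1.0                         # idealized response to one object border
noise = 0.2 * rng.standard_normal(n)    # hypothetical response variability

# Pooling the direct and second neighbors (half_width=2) versus a wide pooling area.
noise_small_pool = pool_neighbors(noise, 2)
footprint_small = np.count_nonzero(pool_neighbors(edge, 2) > 0)
footprint_large = np.count_nonzero(pool_neighbors(edge, 20) > 0)

print(noise.std(), noise_small_pool.std())   # variability before and after pooling
print(footprint_small, footprint_large)      # positions over which the edge is smeared
```

Pooling over a small neighborhood already reduces the variability of the profile, whereas a wide pooling window mainly spreads the response to a single border across many positions, so that the border can no longer be localized within the pooled area.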

Experimentally, most information about how the spatial layout of the environment might be represented by the visual motion pathway during translational self-motion is available from recent experiments on LWCs, the neurons that have usually been conceived as sensors for self-motion estimation because of their relatively large receptive fields (see above). However, individual LWCs are far from ideal for self-motion estimation, as their receptive fields are spatially clearly restricted and show distinct spatial sensitivity peaks. Accordingly, they show pronounced response modulations even during constant-velocity motion resulting from textural features of the environment (Meyer et al., 2011; O'Carroll et al., 2011; Ullrich et al., 2014b). In addition, the responses of LWCs provide information about the spatial layout of the environment, at least on a coarse spatial scale, and do so even on the short timescale of intersaccadic intervals: the intersaccadic response amplitudes evoked by ego-perspective movies were found to depend on the distance to the walls of the flight arena in which the corresponding behavioral experiments were performed or on objects that were inserted close to the flight trajectory (Boeddeker et al., 2005; Kern et al., 2005; Karmeier et al., 2006; Liang et al., 2008, 2012; Hennig and Egelhaaf, 2012). Moreover, LWC responses are found to reflect the overall depth structure of different natural environments (**Figure 6;** Ullrich et al., 2014a). Recently, it could even be shown that the intersaccadic responses of bee LWCs to visual stimuli as experienced during navigation flights in the vicinity of a goal strongly depend on the spatial layout of the environment. The spatial landmark constellation that guides the bees to their goal leads to a characteristic time-dependent response profile in LWCs during the intersaccadic intervals of navigation flights (Mertes et al., 2014).

What is the range within which spatial information is represented on the basis of motion information? Under spatially constrained conditions, with the flies flying at translational velocities of only slightly more than 0.5 m/s, the spatial range within which significant distance-dependent intersaccadic responses are evoked amounts to approximately two meters (Kern et al., 2005; Liang et al., 2012). Since the retinal velocity induced by an object increases with the velocity of self-motion and decreases with the distance to the object, the spatial range that is represented by LWCs can be expected to increase with increasing translational velocity. Accordingly, at higher translation velocities, as are characteristic of flights under spatially less constrained conditions, the spatial range within which environmental objects lead to significant intersaccadic response increments is extended to a few more meters (Ullrich et al., 2014a). From an ecological perspective, it appears to be economical and efficient that the behaviorally relevant spatial range that is represented by motion detection systems scales with locomotion velocity: a fast-moving animal can thus initiate an avoidance maneuver at a greater distance from an obstacle than when moving slowly.
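The scaling of the represented range with locomotion velocity follows from the geometry of translational optic flow. The sketch below assumes the standard relation that an object at distance d and bearing θ relative to the direction of motion moves across the retina at angular velocity v·sin(θ)/d, and it assumes a fixed, hypothetical response threshold calibrated from the approximate values quoted above (0.5 m/s and two meters); the higher flight speed is purely illustrative.

```python
import math

def retinal_velocity(speed_m_s, distance_m, bearing_rad=math.pi / 2):
    """Angular velocity (rad/s) of a static object during pure translation."""
    return speed_m_s * math.sin(bearing_rad) / distance_m

def detection_range(speed_m_s, min_retinal_velocity_rad_s, bearing_rad=math.pi / 2):
    """Largest distance at which the retinal velocity still reaches the threshold."""
    return speed_m_s * math.sin(bearing_rad) / min_retinal_velocity_rad_s

# Hypothetical threshold: the retinal velocity of a lateral object at 2 m when
# flying at 0.5 m/s (values loosely based on the range reported in the text).
threshold_rad_s = retinal_velocity(0.5, 2.0)

range_slow_m = detection_range(0.5, threshold_rad_s)
range_fast_m = detection_range(1.0, threshold_rad_s)  # illustrative higher speed
print(range_slow_m, range_fast_m)
```

Under these assumptions the range grows in proportion to flight speed. Real neurons need not follow such a fixed threshold, so the numbers indicate only the direction of the effect.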

We can conclude from this experimental evidence that, during translational self-motion as is characteristic of the intersaccadic flight phases of flies and bees, even motion-sensitive cells with relatively large receptive fields provide spatial information about the environment. Although it is still not clear to what extent this information is exploited for behavioral control (see below), its potential functional significance is underlined by the fact that the object-induced responses observed during intersaccadic intervals are further increased relative to the background activity of the cell as a consequence of motion adaptation (Liang et al., 2008, 2011, 2012; Ullrich et al., 2014b).

# **BEHAVIORAL SIGNIFICANCE OF MOTION-BASED SPATIAL INFORMATION**

Fast flying animals, such as many insects, need to respond to environmental cues often already at some distance, for instance, when they have to evade a potential obstacle in their flight path or when using objects as landmarks in guiding them to a previously learnt goal location. Then optic flow is likely to be the most relevant cue to provide spatial information. Accordingly, motion cues have been implicated on the basis of many behavioral analyses to be decisive in controlling behavioral components of flying insects. Optic flow processing determines several aspects of the landing behavior (Wagner, 1982; Lehrer et al., 1988; Srinivasan et al., 1989, 2001; Kimmerle et al., 1996; Evangelista et al., 2010; van Breugel and Dickinson, 2012; Baird et al., 2013), and is used for flower distance estimation and tracking (Lehrer et al., 1988; Kern and Varjú, 1998). Insects also seem to exploit retinal motion in the context of collision avoidance (Tammero and Dickinson, 2002a,b; Reiser and Dickinson, 2003; Lindemann et al., 2008, 2012; Kern et al., 2012; van Breugel and Dickinson, 2012; Lindemann and Egelhaaf, 2013). Moreover, insects, such as bees and wasps, show a rich repertoire of visual navigation behavior employing motion cues on a wide range of spatial scales. When a large distance to a goal needs to be spanned, odometry, i.e., determining flown distances, based on optic flow cues is a central constituent of navigation mechanisms of bees (Srinivasan et al., 1997; Esch et al., 2001; Si et al., 2003; Tautz et al., 2004; Wolf, 2011; Eckles et al., 2012). However, even if the animal is already in the vicinity of its goal it can use spatial cues based on optic flow to find the goal (Zeil, 1993b; Lehrer and Collett, 1994; Dittmar et al., 2010, 2011), although also textural and other cues play an important role in local navigation (Collett et al., 2002, 2006; Zeil et al., 2009; Zeil, 2012). 
Bees even seem to orchestrate their flights in specific ways that facilitate gathering spatial information by intersaccadic movements with a strong sideways component (Lehrer, 1991; Zeil et al., 2009; Dittmar et al., 2010; Braun et al., 2012; Collett et al., 2013; Philippides et al., 2013; Riabinina et al., 2014; Boeddeker et al., submitted).

Turns, at least of flies and bees, are thought to be accomplished in a saccadic fashion in most behavioral contexts, including collision avoidance behavior. Hence, understanding the mechanisms underlying collision avoidance means understanding what visual input during an intersaccadic interval elicits evasive saccades. There is consensus that intersaccadic optic flow plays a decisive role in controlling the direction and amplitude of saccades in this behavioral context. Despite discrepancies in detail, all proposed mechanisms of evoking saccades rely on extracting asymmetries between the optic flow patterns in front of the two eyes. Asymmetries may be due to the location of the expansion focus in front of one eye or to a difference between the overall optic flow in the visual fields of the two eyes (Tammero and Dickinson, 2002b; Lindemann et al., 2008, 2012; Mronz and Lehmann, 2008; Kern et al., 2012; Lindemann and Egelhaaf, 2013). Not all parts of the visual field appear to be involved in saccade control of blowflies in the context of collision avoidance: the intersaccadic optic flow in the lateral parts of the visual field does not play a role in determining saccade direction (Kern et al., 2012). This feature appears to be functional, as blowflies fly mainly forwards during intersaccadic intervals, with only relatively small sideways components occurring mainly directly after saccades. These sideways components shift the pole of expansion of the flow field slightly towards frontolateral locations (Kern et al., 2012). In contrast, in *Drosophila*, which often hover and fly sideways (Ristroph et al., 2009), the optic flow and, thus, the spatial information sensed in lateral and even rear parts of the visual field has been concluded to be also involved in saccade control in the context of collision avoidance (Tammero and Dickinson, 2002b).
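The common core of these proposals, a comparison of the optic flow seen by the two eyes, can be written down as a minimal decision rule. The sketch is not one of the published models: the normalization, the threshold, the gain, and the sign convention (positive values denote a turn to the right) are all assumptions made for illustration, and the inputs are taken to be flow magnitudes from whatever part of the visual field is assumed to contribute.

```python
import numpy as np

def saccade_command(flow_left, flow_right, threshold=0.2, gain_deg=90.0):
    """Signed saccade amplitude in degrees (positive = turn right), or 0.0 for no saccade."""
    left = float(np.mean(np.abs(flow_left)))
    right = float(np.mean(np.abs(flow_right)))
    total = left + right
    if total == 0.0:
        return 0.0  # no optic flow at all, nothing to avoid
    # Normalized asymmetry in [-1, 1]; positive means stronger flow on the left.
    asymmetry = (left - right) / total
    if abs(asymmetry) < threshold:
        return 0.0  # flow is roughly balanced, keep flying straight
    # Turn away from the side with the stronger flow, i.e., from the nearer structures.
    return gain_deg * asymmetry

turn_right = saccade_command([2.0, 2.0], [1.0, 1.0])   # stronger flow on the left
turn_left = saccade_command([1.0, 1.0], [2.0, 2.0])    # stronger flow on the right
no_turn = saccade_command([1.0, 1.0], [1.0, 1.0])      # balanced flow
```

Because the rule operates on flow magnitudes, a contrast- and texture-dependent motion signal such as the EMD output would shift the asymmetry even when the distances on both sides are equal.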

Nonetheless, systematic analyses based on models of LWCs with EMDs as their input revealed difficulties with regard to collision avoidance performance of a simulated insect arising from the contrast and texture dependence of the local motion detectors (Lindemann et al., 2008, 2012; Lindemann and Egelhaaf, 2013). The difficulties with these models can be reduced to some extent by implementing contrast normalization in the peripheral visual system (Babies et al., 2011). Recent modeling based on a somewhat different approach indicates an even more robust solution to the problem. Here, a spatial profile of the environment is determined along the horizontal extent of the visual field from local EMD-based motion measurements. The motion measurements are performed during short intersaccadic translatory flight segments. Although this spatial profile does not represent pure nearness information, but also the contours of nearby environmental structures (**Figure 7**), it allows a locomotion vector to be determined that points in the direction in which a collision is least likely and, thus, makes it possible, under most circumstances, to avoid colliding with obstacles. This is even true when the objects are camouflaged by being covered with the same texture as the background of the environment (Bertrand et al., submitted). If the collision avoidance algorithm is combined with an overall goal direction, leading for example to a previously learnt food source or a nest, the model insect tends to move on quite similar trajectories to the goal through a heavily cluttered environment irrespective of the exact starting conditions by employing just the local motion-based collision avoidance mechanism, but no genuine route knowledge (**Figure 7**). It is interesting to note that these trajectories are reminiscent of routes of ants heading for their nest hole from different starting locations that are usually interpreted within the conceptual framework of navigation mechanisms (Wehner, 2003; Kohler and Wehner, 2005).

**FIGURE 7 | Collision avoidance while heading for a goal**. The model insect starts at the left at three different positions (colored arrows) in two different cluttered environments (top and bottom diagrams). The goal is indicated at the right of the environment. The three resulting trajectories in each environment are given in red, blue and black. The objects as seen from above are indicated by black rectangles. The walls enclosing the environment are represented by thick black lines. The walls and the objects were covered with the same random texture. Direction of locomotion is indicated by arrows underneath trajectories (Data from Bertrand et al., submitted).
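One way to picture how a locomotion direction could be derived from a horizontal profile and combined with a goal direction is sketched below. This is an illustrative construction, not the algorithm of Bertrand et al.: it assumes that the profile values act as weights on the viewing directions, that the avoidance direction is opposite to the resulting weighted vector sum, and that the weighting between avoidance and goal direction grows linearly with the length of that vector. The function name, the gain, and the sign convention (positive azimuth = left) are hypothetical.

```python
import numpy as np

def heading_from_profile(azimuth_rad, profile, goal_azimuth_rad, gain=2.0):
    """Blend a goal direction with a direction pointing away from nearby structures."""
    azimuth_rad = np.asarray(azimuth_rad, dtype=float)
    profile = np.asarray(profile, dtype=float)
    # Weighted vector average of the viewing directions: points towards the
    # part of the visual field where the profile is largest.
    obstacle_vec = np.array([np.mean(profile * np.cos(azimuth_rad)),
                             np.mean(profile * np.sin(azimuth_rad))])
    strength = np.linalg.norm(obstacle_vec)
    goal_vec = np.array([np.cos(goal_azimuth_rad), np.sin(goal_azimuth_rad)])
    if strength == 0.0:
        return float(np.arctan2(goal_vec[1], goal_vec[0]))  # nothing nearby
    avoid_vec = -obstacle_vec / strength       # unit vector away from nearby structures
    weight = min(1.0, gain * strength)         # stronger profile, more avoidance
    heading_vec = weight * avoid_vec + (1.0 - weight) * goal_vec
    return float(np.arctan2(heading_vec[1], heading_vec[0]))

azimuth = np.linspace(-np.pi / 2, np.pi / 2, 19)        # frontal half of the visual field
free_profile = np.zeros_like(azimuth)                   # no nearby structures
left_obstacle = np.exp(-((azimuth - 0.5) / 0.2) ** 2)   # bump at about 0.5 rad to the left

heading_free = heading_from_profile(azimuth, free_profile, goal_azimuth_rad=0.0)
heading_avoid = heading_from_profile(azimuth, left_obstacle, goal_azimuth_rad=0.0)
```

With an empty profile the heading equals the goal direction. With a structure to the front left, the heading is deflected to the right while still pointing roughly towards the goal, which is the kind of behavior that produces goal-directed trajectories without route knowledge.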

Whereas collision avoidance and landing are spatial tasks that must be solved by any flying insect, local navigation is relevant especially for particular insects, such as bees, wasps and ants, which care for their brood and, thus, have to return to their nest after foraging. Apart from finding without collisions a way towards the area where the goal may reside, motion information may be employed to determine the exact goal location by using the spatial configuration of objects, i.e., landmarks located in the vicinity of the goal (Lehrer, 1991; Zeil, 1993a,b; Lehrer and Collett, 1994; Collett and Zeil, 1996; Zeil et al., 2009; Dittmar et al., 2010, 2011; Braun et al., 2012; Collett et al., 2013; Philippides et al., 2013; Boeddeker et al., submitted). Motion information is especially relevant, if the landmarks are largely camouflaged by similar textural properties as those of the background (Dittmar et al., 2010). Information about the landmark constellation around the goal is memorized during elaborate learning flights: the animal flies characteristic sequences of ever increasing arcs while facing the area around the goal. During these learning flights, the animal is thought to gather relevant information about the spatial relationship of the goal and its surroundings. This information is subsequently used to relocate the goal when returning to it after an excursion (Collett et al., 2002, 2006; Zeil et al., 2009; Zeil, 2012). The mechanisms by which information about the landmark constellation is learnt and subsequently used to localize the goal are still controversial. However, optic flow information is likely to be required to detect texturally camouflaged landmarks and to derive spatial cues that are generated actively during the intersaccadic intervals of translational flight. 
Also textural cues characterizing the landmarks seem to be relevant for localizing the goal, since bees were found to adjust their flight movements in the vicinity of the landmarks according to the landmarks' specific textural properties (Dittmar et al., 2010; Braun et al., 2012). It remains to be shown in future behavioral experiments and model analyses, whether the optic flow information and textural cues relevant for navigation performance can be accounted for on the basis of the joint velocity and texture dependence of biological movement detectors and of EMDs as their model equivalents. Alternatively, mechanisms may be required that process optic flow and environmental texture separately and combine both cues only at a later processing stage.

# **CONCLUSIONS**

The nearness of objects is reflected in the optic flow generated on the eyes during translational self-motion as is characteristic of the intersaccades of insect flight. In many behavioral contexts nearby objects are particularly relevant. Examples are obstacles that need to be evaded, landing sites, or landmarks that indicate the location of an inconspicuous goal. The main assumption of this review is that the behaviorally highly relevant spatial information can be gained without sophisticated computational mechanisms from the optic flow generated as a consequence of translational locomotion through the environment.

However, movement detectors, as are widespread in biological systems and can be modeled by correlation-type EMDs, do not veridically represent the velocity vectors of the optic flow, but also reflect textural information of the environment. This distinguishing feature has often been regarded as nothing but a nuisance of a simple motion detection mechanism. This opinion has been challenged recently by analyzing motion detectors with image flow as generated during translational movements through a wide range of cluttered natural environments. On this basis, the texture information has been suggested to be potentially of functional significance, because it basically reflects the contours of nearby objects. Contrast borders have long been thought to be the main carrier of functionally relevant information about objects in artificial and natural sceneries. This is evidenced by the well-established finding that contrast borders are enhanced by early visual processing in biological visual systems including that of primates (e.g., Marr, 1982; van Hateren and Ruderman, 1998; Simoncelli and Olshausen, 2001; Seriès et al., 2004; Girshick et al., 2011; Berens et al., 2012). One major function of this type of peripheral information processing is thought to be the enhancement of contrast borders at the expense of the overall brightness of the image, but also redundancy reduction in images. Independent of the particular conceptual framework, enhancing contrast borders is seen as advantageous with regard to representing visual environments.

The main conclusion of this paper is that the motion vision system of insects combines both nearness and contour information and preferentially represents contrast borders of nearby environmental structures and/or objects during translatory self-motion. It simply exploits the fact that, in normal behavioral situations, all this information is required only when an animal is moving. Then the motion vision system segregates, in a computationally parsimonious way, the environment into behaviorally relevant nearby objects and, at least in many behavioral contexts, less relevant distant structures. This characteristic matches, as we think, one major task of the motion detection system: to provide behaviorally relevant information about the environment, rather than only to extract the velocity of self-motion or the velocity of moving objects. Based on this conclusion, motion detection should be conceptualized not exclusively in the context of velocity representation, which is certainly important in many contexts, but also in the context of gathering behaviorally relevant information about the environment.

#### **ACKNOWLEDGMENTS**

The research of our group has been generously supported by the Deutsche Forschungsgemeinschaft (DFG). We also acknowledge the support for the publication fee by the Deutsche Forschungsgemeinschaft and the Open Access Publication Funds of Bielefeld University.

#### **REFERENCES**


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 August 2014; accepted: 05 October 2014; published online: 28 October 2014*.

*Citation: Egelhaaf M, Kern R and Lindemann JP (2014) Motion as a source of environmental information: a fresh view on biological motion computation by insect brains. Front. Neural Circuits 8:127. doi: 10.3389/fncir.2014.00127*

*This article was submitted to the journal Frontiers in Neural Circuits*.

*Copyright © 2014 Egelhaaf, Kern and Lindemann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.
